This tutorial assumes that you have downloaded the source distribution and unzipped/ungzipped it to some folder.
The examples assume the tutorial lives in $TUPLESPACE_HOME/tutorial (or %TUPLESPACE_HOME%\tutorial on Windows), where TUPLESPACE_HOME is set to C:\tuplespace in this tutorial.
You will need the Java 2 SDK 1.4 or later to run the examples, as well as Ant 1.6 or later. The other libraries (JUnit and jMock) are already included in the lib/ folder.
Let's say we have to write some kind of web crawler. The crawler needs to download a page, transform it, extract some information, and finally, post the report.
Traditionally, one would write such an application in a single-threaded manner. Something like this:
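A minimal sketch of such a single-threaded crawler follows. The class and method names are hypothetical, and the download step is a stub that returns a fixed string so the example runs without a network.

```java
// Hypothetical single-threaded crawler: every step waits for the previous one.
class SequentialCrawler {

    // Stand-in for the network call; a real crawler would block on I/O here.
    static String download(String url) {
        return "  Hello tuple space  ";
    }

    // Stand-in for the transformation step: trim the page and lower-case it.
    static String transform(String rawPage) {
        return rawPage.trim().toLowerCase();
    }

    // Stand-in for the extraction step: count whitespace-separated words.
    static int extractWordCount(String page) {
        return page.split("\\s+").length;
    }

    // Runs the steps strictly one after another for a single URL.
    static String crawl(String url) {
        String rawPage = download(url);   // the only thread idles here while the network responds
        String page = transform(rawPage);
        int wordCount = extractWordCount(page);
        return url + ": " + wordCount + " words";   // the "report"
    }

    public static void main(String[] args) {
        String[] urls = { "http://example.com/a", "http://example.com/b" };
        for (int i = 0; i < urls.length; i++) {
            System.out.println(crawl(urls[i]));
        }
    }
}
```

Nothing overlaps in this version: the second page is not requested until the first report has been printed.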
The problem with such an architecture is that we are going to spend most of our time waiting for the network. While the thread is waiting for the downloaded information, another one could be reading the previous document, extracting the necessary information. This would enhance performance tremendously.
This application architecture would be vastly more complex, though.
Each thread would need to make sure that it is not modifying a data structure that another thread is reading. Sure, Java has the synchronized primitive, and also has the wait() and notify() methods. These are good, but they are too low-level for most programmers.
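To give an idea of what "low-level" means here, the following sketch hands a single object from one thread to another using only synchronized, wait() and notifyAll(). The Mailbox class and its methods are made up for this illustration. Note how much care the loop conditions and notifications require for such a small task.

```java
// Hypothetical one-slot mailbox written directly against the Java monitor primitives.
class Mailbox {
    private Object message;   // null means "empty"

    // Blocks while the slot is full, then stores the message.
    synchronized void put(Object newMessage) throws InterruptedException {
        while (message != null) {
            wait();           // must be in a loop: wake-ups can be spurious
        }
        message = newMessage;
        notifyAll();          // wake any thread blocked in take()
    }

    // Blocks while the slot is empty, then removes and returns the message.
    synchronized Object take() throws InterruptedException {
        while (message == null) {
            wait();
        }
        Object result = message;
        message = null;
        notifyAll();          // wake any thread blocked in put()
        return result;
    }

    // Sends a value through a fresh mailbox from a second thread and returns what arrives.
    static Object roundTrip(final Object value) {
        final Mailbox box = new Mailbox();
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    box.put(value);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        try {
            return box.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```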
The new java.util.concurrent package in JDK 1.5 will allow programmers to write code at a higher abstraction level, lowering the number of synchronization bugs. For those who are working on pre-JDK 1.5 JREs, there is always the original library that became java.util.concurrent: util.concurrent, by Doug Lea.
Unfortunately, these libraries are huge! util.concurrent alone contains over 50 public classes. This is unwieldy.
TupleSpace aims to provide a dead-simple, can't go wrong view of the world. The TupleSpace interface contains three methods:
That's it! It can't get simpler than that.
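Going by the calls used throughout this tutorial, the three methods are addToSpace, takeFromSpace and takeFromSpaceWithTimeout. The sketch below shows how the interface might be declared. The parameter names, the InterruptedException clauses, and the shapes of Tuple and Template (beyond getValueAt) are assumptions, not the library's actual declarations; see the javadocs for those.

```java
// Sketch only: method names come from this tutorial, everything else is assumed.
interface Tuple {
    // Returns the element at the given zero-based position.
    Object getValueAt(int index);
}

interface Template {
    // Hypothetical method: tells whether a tuple satisfies this template.
    boolean matches(Tuple candidate);
}

interface TupleSpace {
    // Makes a tuple available to every thread using the space.
    void addToSpace(Tuple tuple);

    // Removes and returns a matching tuple, waiting until one is available.
    Tuple takeFromSpace(Template template) throws InterruptedException;

    // Same as above, but gives up and returns null once the timeout has elapsed.
    Tuple takeFromSpaceWithTimeout(Template template, long timeoutMillis)
            throws InterruptedException;
}
```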
A TupleSpace is a region (of memory, disk space, network, etc.) which contains Tuples. Tuples are ordered collections of elements, which need not be the same type. A TupleSpace allows coupling to be reduced in an application. Threads wait for an object of the appropriate type to be available, and will receive it when it is.
This reduces coupling because all objects are talking to a common TupleSpace, and not to each other.
So, let's start coding…
We're going to implement UnitOfWork to define what the thread that downloads documents will do.
Let's say the specs call for returning the page's textual content. This means we'll need to download the page and create a string with the full contents of the page.
So, let's start to implement the unit of work:
// Wait for a URL to become available in the space
Tuple tuple = tupleSpace.takeFromSpace(new SimpleTemplate(URL.class));
Here, we're instantiating a Template implementation. This implementation matches tuples by wildcards. A type wildcard is indicated by providing an object's class instead of a value. So, we are asking the space to return as soon as it finds a tuple with a single element, where that element is an instance of URL or of one of its subclasses.
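The matching rule can be pictured with the sketch below. It is not the library's SimpleTemplate: the ClassWildcardTemplate name, the use of plain Object arrays for the pattern and the tuple, and the rule that non-class entries must be equal are all assumptions made for illustration.

```java
// Hypothetical stand-in for a class-wildcard template.
class ClassWildcardTemplate {
    private final Object[] pattern;

    ClassWildcardTemplate(Object[] pattern) {
        this.pattern = pattern;
    }

    // A tuple matches when it has the same length and every position matches.
    boolean matches(Object[] tupleValues) {
        if (tupleValues.length != pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            Object expected = pattern[i];
            Object actual = tupleValues[i];
            if (expected instanceof Class) {
                // Type wildcard: any instance of the class or of a subclass matches.
                if (!((Class<?>) expected).isInstance(actual)) {
                    return false;
                }
            } else if (expected == null ? actual != null : !expected.equals(actual)) {
                // Plain value: the tuple must hold an equal value at this position.
                return false;
            }
        }
        return true;
    }
}
```

With a pattern of { Number.class, "x" }, the values { Integer.valueOf(1), "x" } match because Integer is a subclass of Number, while { "1", "x" } do not.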
// Get the actual URL
URL page = (URL) tuple.getValueAt(0);
Now that we've retrieved the tuple, we need the tuple's values. Tuples are like Java arrays, which are zero-based. So, we retrieve the first value, which just happens to be a URL (guaranteed, because our Template matched on class).
String pageContent = getPageContent(page);
Next, we get the page's content (the implementation is available in the tutorial/ folder).
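For readers who do not have the tutorial/ folder at hand, one plausible shape for such a method is sketched below. This is not the tutorial's implementation: the PageFetcher class, the readFully helper and the UTF-8 assumption are all hypothetical.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;

// Hypothetical helper class; only the getPageContent name comes from the tutorial.
class PageFetcher {

    // Opens the URL and returns everything it serves as one string.
    static String getPageContent(URL page) throws IOException {
        // Assumption: the page is UTF-8 encoded. A real crawler would read
        // the charset from the Content-Type header instead.
        Reader reader = new InputStreamReader(page.openStream(), "UTF-8");
        try {
            return readFully(reader);
        } finally {
            reader.close();
        }
    }

    // Drains a Reader into a String, up to 4096 characters per read.
    static String readFully(Reader reader) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            content.append(buffer, 0, count);
        }
        return content.toString();
    }
}
```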
tupleSpace.addToSpace(new SimpleTuple(page, pageContent));
Finally, we return whatever we downloaded to the space.
Simple, isn't it?
The second part of our spec is the part where we read each downloaded page, and extract the relevant information. In this case, we're going to wait for a [URL, String] tuple to become available.
// Retrieve a URL and content from the space
Tuple tuple = tupleSpace.takeFromSpace(new SimpleTemplate(URL.class, String.class));
Again, we retrieve something from the space. In this case, we're interested only in a tuple that has a URL as the first item, and a string as the second item. Other tuples will never be returned.
// Get the values from the Tuple
URL url = (URL) tuple.getValueAt(0);
String pageContent = (String) tuple.getValueAt(1);
Data extraction phase…
// Create the report
Report report = createReport(url, pageContent);
And we create the report.
// Add the report only if one was created
if (null != report) {
    tupleSpace.addToSpace(new SimpleTuple(report));
}
Finally, we add the report back to the space, but only if one was created.
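The tutorial does not show createReport itself. As a rough illustration of why it may return null, here is a sketch that extracts the page title and gives up when there is none. Only getPage() and getPageTitle() appear in this tutorial; the constructor, the ReportFactory class and the regular expression are assumptions.

```java
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value object exposing the two getters used in this tutorial.
class Report {
    private final URL page;
    private final String pageTitle;

    Report(URL page, String pageTitle) {
        this.page = page;
        this.pageTitle = pageTitle;
    }

    URL getPage() {
        return page;
    }

    String getPageTitle() {
        return pageTitle;
    }
}

class ReportFactory {
    // Matches the text between <title> and </title>, ignoring case and line breaks.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns a report holding the title, or null when the page has no title element.
    static Report createReport(URL url, String pageContent) {
        Matcher matcher = TITLE.matcher(pageContent);
        if (!matcher.find()) {
            return null;
        }
        return new Report(url, matcher.group(1).trim());
    }
}
```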
We now need an object that will configure and instantiate everything, plus start each thread. The following would work nicely:
// Instantiate a TupleSpace
TupleSpace space = new SimpleTupleSpace();
We start by instantiating a TupleSpace. The SimpleTupleSpace implementation is non-distributed and non-copying. This means that the objects that we put in the space are the exact same references that we're going to retrieve. In this case, this is not very important, as we are dealing with value objects, but it could matter for reference objects. See the javadocs for details.
// Instantiate a downloader thread
Thread downloaderThread = new Thread(
    new UnitOfWorkRunner(new UrlDownloaderUnitOfWork(), space));
downloaderThread.setDaemon(true);
downloaderThread.start();
Now, we instantiate a new thread, which will download the pages. Notice that we are marking this thread as being a daemon. This is to allow us to terminate the application when the main thread terminates. We also start the thread.
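UnitOfWorkRunner is not shown in this tutorial. Presumably it is a Runnable that keeps executing its unit of work against the space. The sketch below illustrates that idea under heavy assumptions: the execute method name is invented, a java.util.concurrent BlockingQueue of strings stands in for the TupleSpace so that the example needs no other classes, and the code is written against a current JDK rather than 1.4.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical shape of a unit of work: one take/process/add cycle per call.
interface UnitOfWork {
    void execute(BlockingQueue<String> space) throws InterruptedException;
}

// Hypothetical runner: repeats the unit of work until its thread is interrupted.
class UnitOfWorkRunner implements Runnable {
    private final UnitOfWork unitOfWork;
    private final BlockingQueue<String> space;

    UnitOfWorkRunner(UnitOfWork unitOfWork, BlockingQueue<String> space) {
        this.unitOfWork = unitOfWork;
        this.space = space;
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                unitOfWork.execute(space);   // blocks until there is something to do
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // restore the flag and let the thread end
        }
    }
}

class RunnerDemo {
    // Feeds two items to a daemon worker and returns what it produced, in order.
    static String run() throws InterruptedException {
        BlockingQueue<String> space = new LinkedBlockingQueue<String>();
        final BlockingQueue<String> results = new LinkedBlockingQueue<String>();
        UnitOfWork shout = new UnitOfWork() {
            public void execute(BlockingQueue<String> source) throws InterruptedException {
                results.put(source.take().toUpperCase());
            }
        };
        Thread worker = new Thread(new UnitOfWorkRunner(shout, space));
        worker.setDaemon(true);   // do not keep the JVM alive for this worker
        worker.start();
        space.put("a");
        space.put("b");
        String output = results.take() + results.take();
        worker.interrupt();       // ask the worker to stop
        return output;
    }
}
```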
// Instantiate a reporter thread
Thread reporterThread = new Thread(
    new UnitOfWorkRunner(new ReporterUnitOfWork(), space));
reporterThread.setDaemon(true);
reporterThread.start();
Now, we're doing the same, but with the reporter thread instead.
// Start the download
space.addToSpace(new SimpleTuple(new URL(args[0])));
Finally, we're making a URL object available in the space. Some time later (this is non-deterministic, but should be a small amount of time), the downloader thread will be notified that a URL is available. The thread will wake up, download the page, and put the content back into the space. The downloader thread will then go back to sleep, waiting for another URL.
The same thing will happen with the reporter thread: wait for a [URL, String] tuple, process it, and return a result to the space.
// Wait until the report is available
Tuple reportTuple = space.takeFromSpaceWithTimeout(
    new SimpleTemplate(Report.class), ONE_MINUTE_AS_MILLIS);
if (null == reportTuple) {
    throw new ReportUnavailableException(
        "Could not create report within one minute");
}
Continuing our example, we now wait for the report. The Template we are instantiating will only match if it finds a Tuple with a single Report element.
In case something goes horribly wrong, and the report is not available in time, we throw an exception to inform callers that the report could not be created.
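How a timed take can return null is easiest to see in a toy version. The ToySpace class below is not the library's SimpleTupleSpace: it stores plain objects instead of tuples, matches on class only, and all of its names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy in-memory space holding plain objects and matching on class only.
class ToySpace {
    private final List<Object> items = new ArrayList<Object>();

    synchronized void add(Object item) {
        items.add(item);
        notifyAll();   // wake every waiting taker so each can re-check for its own type
    }

    // Removes and returns the first item of the given type,
    // or returns null if none shows up within timeoutMillis.
    synchronized Object takeWithTimeout(Class<?> type, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (Iterator<Object> it = items.iterator(); it.hasNext();) {
                Object candidate = it.next();
                if (type.isInstance(candidate)) {
                    it.remove();
                    return candidate;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;   // timed out; the caller decides how to react
            }
            wait(remaining);   // releases the lock while sleeping
        }
    }
}
```

Returning null rather than throwing leaves the policy to the caller, which is why the example above checks for null and raises its own ReportUnavailableException.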
// Retrieve the tuple's values
Report report = (Report) reportTuple.getValueAt(0);
As usual, we retrieve the value from the tuple.
// Format and print the report
System.out.println(
    report.getPage() + "'s title is: " + report.getPageTitle());
And we format the report for the screen.
This example is very basic, but it neatly solves multiple problems. For example, downloading the page is not very CPU intensive, but creating the report is. So, we might have two or three ReporterUnitOfWork instances running at the same time, while we could have ten UrlDownloaderUnitOfWork instances running simultaneously.
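Starting several workers of each kind could be as simple as a loop. The WorkerPool helper below is hypothetical, and sharing a single Runnable between threads assumes that the task keeps no per-thread state.

```java
// Hypothetical helper for starting several identical daemon workers.
class WorkerPool {

    // Starts `count` daemon threads that all run the same task.
    static Thread[] start(int count, Runnable task, String namePrefix) {
        Thread[] threads = new Thread[count];
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(task, namePrefix + "-" + i);
            threads[i].setDaemon(true);   // let the JVM exit once the main thread is done
            threads[i].start();
        }
        return threads;
    }
}
```

With the classes from this tutorial, that might read WorkerPool.start(10, new UnitOfWorkRunner(new UrlDownloaderUnitOfWork(), space), "downloader") followed by WorkerPool.start(3, new UnitOfWorkRunner(new ReporterUnitOfWork(), space), "reporter"). Because every worker talks only to the space, changing these counts requires no change to the units of work themselves.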