Tutorial

Prerequisites

This tutorial assumes that you have downloaded the source distribution and unpacked it into a folder of your choice.

The examples assume that the tutorial lives in $TUPLESPACE_HOME/tutorial (or %TUPLESPACE_HOME%\tutorial on Windows), where TUPLESPACE_HOME is set to C:\tuplespace.

You will need the Java 2 SDK 1.4 or later to run the examples, as well as Ant 1.6 or later. The other libraries (JUnit and jMock) are already included in the lib/ folder.

What application are we going to write?

Let's say we have to write some kind of web crawler. The crawler needs to download a page, transform it, extract some information, and finally, post the report.

Traditionally, one would write such an application in a single-threaded manner, something like this:

Figure: A single-threaded view of the example application. A URL comes in at the top, is downloaded, information is extracted, and the results are posted at the end.
The problem with such an architecture is that we spend most of our time waiting for the network. While one thread waits for a page to download, another could be extracting information from the previously downloaded document. Overlapping network waits with processing in this way can improve throughput considerably.

Such an architecture would be vastly more complex, though. Each thread would need to make sure that it is not modifying a data structure that another thread is reading. Java does provide the synchronized keyword, along with the wait() and notify() methods, but these are too low-level for most programmers.
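To make "too low-level" concrete, here is a minimal one-slot mailbox written with nothing but synchronized, wait() and notifyAll(). It is an illustrative sketch only: the Mailbox class is hypothetical and is not part of TupleSpace.

```java
// A one-slot mailbox built only from synchronized, wait() and notifyAll().
// Hypothetical illustration; not part of the TupleSpace library.
class Mailbox {
    private Object item;          // null means "the slot is empty"

    // Blocks until the slot is empty, then stores the item.
    synchronized void put(Object newItem) throws InterruptedException {
        if (newItem == null) {
            throw new IllegalArgumentException("null items are not allowed");
        }
        while (item != null) {    // re-check in a loop: wakeups can be spurious
            wait();
        }
        item = newItem;
        notifyAll();              // wake any thread blocked in take()
    }

    // Blocks until the slot is full, then removes and returns the item.
    synchronized Object take() throws InterruptedException {
        while (item == null) {
            wait();
        }
        Object result = item;
        item = null;
        notifyAll();              // wake any thread blocked in put()
        return result;
    }
}
```

Every detail here (the while loop around wait(), notifyAll() rather than notify() because producers and consumers share one monitor, holding the lock for every access) is an opportunity for a subtle bug.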

The java.util.concurrent package, new in JDK 1.5, lets programmers write code at a higher level of abstraction, which lowers the number of synchronization bugs.

For those working on pre-1.5 JREs, there is always the library that became java.util.concurrent: Doug Lea's util.concurrent.

Unfortunately, these libraries are large! util.concurrent alone contains over 50 public classes, which is unwieldy.

TupleSpace aims to provide a dead-simple, can't-go-wrong view of the world. The TupleSpace interface contains three methods:

addToSpace
takeFromSpace
takeFromSpaceWithTimeout

That's it! It can't get much simpler than that.

What is a TupleSpace?

A TupleSpace is a region (of memory, disk space, network, etc.) that contains Tuples. Tuples are ordered collections of elements, which need not all be of the same type. A TupleSpace reduces coupling in an application: threads wait for an object of the appropriate type to become available, and receive it as soon as it is.

This reduces coupling because all objects are talking to a common TupleSpace, and not to each other.
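To see how little machinery the idea needs, here is a toy, in-memory space. It is a hypothetical sketch, not SimpleTupleSpace: tuples are plain List<Object> values, a "template" is any Predicate over a tuple, and it assumes Java 8 or later (unlike the 1.4 baseline above). Internally it uses the same synchronized, wait() and notifyAll() primitives, but callers never see them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// A toy, in-memory tuple space. Hypothetical sketch: the real
// SimpleTupleSpace may be implemented quite differently.
class ToySpace {
    private final List<List<Object>> tuples = new ArrayList<>();

    // Adds a tuple and wakes every thread waiting for a match.
    synchronized void addToSpace(Object... values) {
        tuples.add(Arrays.asList(values));
        notifyAll();
    }

    // Removes and returns the first matching tuple, blocking until one exists.
    synchronized List<Object> takeFromSpace(Predicate<List<Object>> template)
            throws InterruptedException {
        List<Object> match;
        while ((match = removeFirstMatch(template)) == null) {
            wait();
        }
        return match;
    }

    // Like takeFromSpace, but returns null once timeoutMillis has elapsed.
    synchronized List<Object> takeFromSpaceWithTimeout(
            Predicate<List<Object>> template, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<Object> match;
        while ((match = removeFirstMatch(template)) == null) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;          // timed out
            }
            wait(remaining);
        }
        return match;
    }

    // Caller must already hold this object's lock.
    private List<Object> removeFirstMatch(Predicate<List<Object>> template) {
        for (Iterator<List<Object>> it = tuples.iterator(); it.hasNext(); ) {
            List<Object> tuple = it.next();
            if (template.test(tuple)) {
                it.remove();
                return tuple;
            }
        }
        return null;
    }
}
```

A producer only ever calls addToSpace() and a consumer only calls takeFromSpace(); neither holds a reference to the other, which is exactly the decoupling described above.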

So, let's start coding…

Writing the download thread

We're going to implement UnitOfWork to create the thread that will download documents.

Let's say the specs call for returning the page's textual content. This means we'll need to download the page and create a string with the full contents of the page.

So, let's start to implement the unit of work:

// Wait for a URL to become available in the space
Tuple tuple = tupleSpace.takeFromSpace(new SimpleTemplate(URL.class));

Here, we're instantiating a Template implementation that matches tuples using wildcards. A type wildcard is indicated by providing a class rather than a value. So, we are asking the space to return as soon as it finds a tuple with exactly one element, where that element is an instance of URL (or of a subclass).
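The matching rule can be sketched as a small static method. This is a hypothetical illustration of the behaviour described above, not the SimpleTemplate source: here a template is an Object[] in which a Class element acts as a type wildcard and any other element must equal the corresponding tuple value.

```java
import java.util.List;

// Hypothetical sketch of wildcard matching; not the actual SimpleTemplate code.
final class WildcardMatcher {
    private WildcardMatcher() {}

    static boolean matches(Object[] template, List<?> tuple) {
        if (template.length != tuple.size()) {
            return false;                       // arity must be identical
        }
        for (int i = 0; i < template.length; i++) {
            Object expected = template[i];
            Object actual = tuple.get(i);
            if (expected instanceof Class) {
                // Type wildcard: accept an instance of the class or a subclass.
                if (!((Class<?>) expected).isInstance(actual)) {
                    return false;
                }
            } else if (expected == null ? actual != null : !expected.equals(actual)) {
                return false;                   // a literal value must be equal
            }
        }
        return true;
    }
}
```

One consequence of this particular rule is that a Class object itself can never be matched by value, since it is always treated as a wildcard.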

// Get the actual URL
URL page = (URL) tuple.getValueAt(0);

Now that we've retrieved the tuple, we need its values. Like Java arrays, tuples are zero-based, so we retrieve the first value, which is guaranteed to be a URL because our Template matched on class.

String pageContent = getPageContent(page);

Next, we get the page's content (the implementation is available in the tutorial/ folder).
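The tutorial's own getPageContent() is not reproduced here. As a rough sketch of what such a method can look like, the hypothetical PageFetcher below reads the whole resource behind a URL into a String. It assumes the page is UTF-8 encoded (a real crawler would honour the charset the server declares) and uses try-with-resources, so it needs Java 7 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical getPageContent(); assumes UTF-8 content.
final class PageFetcher {
    private PageFetcher() {}

    static String getPageContent(URL page) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[4096];
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = page.openStream();
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int read;
            while ((read = reader.read(buffer)) != -1) {
                content.append(buffer, 0, read);
            }
        }
        return content.toString();
    }
}
```

Because it only relies on URL.openStream(), the same method also works with file: URLs, which is convenient for testing without a network.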

tupleSpace.addToSpace(new SimpleTuple(page, pageContent));

Finally, we put the URL and its downloaded content back into the space as a [URL, String] tuple.

Simple, isn't it?

Writing the reporter

The second part of our spec reads each downloaded page and extracts the relevant information. In this case, we're going to wait for a [URL, String] tuple to become available.

// Retrieve a URL and content from the space
Tuple tuple = tupleSpace.takeFromSpace(new SimpleTemplate(URL.class,
        String.class));

Again, we retrieve something from the space. This time we're interested only in a tuple that has a URL as its first element and a String as its second. Tuples of any other shape will never be returned by this call.

// Get the values from the Tuple
URL url = (URL) tuple.getValueAt(0);
String pageContent = (String) tuple.getValueAt(1);

Data extraction phase…

// Create the report
Report report = createReport(url, pageContent);

And we create the report.
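The tutorial does not show createReport() or the Report class beyond its getPage() and getPageTitle() accessors. A hypothetical sketch that pulls the page title out with a regular expression might look like this; a regex is a crude stand-in for a real HTML parser, and the Report constructor, the Reports holder class and the TITLE pattern are all assumptions.

```java
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical Report value object; only getPage() and getPageTitle()
// are known from the tutorial.
final class Report {
    private final URL page;
    private final String pageTitle;

    Report(URL page, String pageTitle) {
        this.page = page;
        this.pageTitle = pageTitle;
    }

    URL getPage() { return page; }
    String getPageTitle() { return pageTitle; }
}

final class Reports {
    // Case-insensitive; DOTALL lets a title span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private Reports() {}

    // Returns null when the page has no <title> element.
    static Report createReport(URL url, String pageContent) {
        Matcher m = TITLE.matcher(pageContent);
        if (!m.find()) {
            return null;
        }
        return new Report(url, m.group(1).trim());
    }
}
```

Returning null when nothing can be extracted is what makes the reporter's null check worthwhile.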

// Add the report only if one was created
if (null != report) {
    tupleSpace.addToSpace(new SimpleTuple(report));
}

Finally, we add the report back to the space, but only if one was created.

Linking it all together

We now need an object that will configure and instantiate everything, plus start each thread. The following would work nicely:

// Instantiate a TupleSpace
TupleSpace space = new SimpleTupleSpace();

We start by instantiating a TupleSpace. The SimpleTupleSpace implementation is non-distributed and non-copying. This means that the objects we retrieve from the space are the exact same references that we put into it. Here this is not very important, since we are dealing with value objects, but it can matter for mutable reference objects. See the javadocs for details.
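The difference between copying and non-copying only shows up with mutable objects. In this hypothetical illustration a plain ArrayList stands in for a non-copying space (it is not SimpleTupleSpace): the consumer gets back the very reference the producer added, so a mutation made after the add is visible to the consumer.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of non-copying semantics; a plain list stands in for the space.
final class NonCopyingDemo {
    private NonCopyingDemo() {}

    // Returns {sameReference, sawLateMutation}.
    static boolean[] run() {
        List<Object> space = new ArrayList<>();

        StringBuilder mutable = new StringBuilder("draft");
        space.add(mutable);                 // producer adds to the "space"
        mutable.append(" (edited later)");  // ...and keeps mutating the object

        Object taken = space.remove(0);     // consumer takes from the "space"
        boolean sameReference = taken == mutable;
        boolean sawLateMutation = taken.toString().equals("draft (edited later)");
        return new boolean[] {sameReference, sawLateMutation};
    }
}
```

Immutable values such as String and URL cannot be changed after they are added, so in this tutorial the distinction is harmless.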

// Instantiate a downloader thread
Thread downloaderThread = new Thread(
        new UnitOfWorkRunner(new UrlDownloaderUnitOfWork(), space));
downloaderThread.setDaemon(true);
downloaderThread.start();

Now, we instantiate and start a thread that will download the pages. Notice that we mark this thread as a daemon, so that the JVM can exit when the main thread terminates.

// Instantiate a reporter thread
Thread reporterThread = new Thread(
        new UnitOfWorkRunner(new ReporterUnitOfWork(), space));
reporterThread.setDaemon(true);
reporterThread.start();

Now, we're doing the same, but with the reporter thread instead.

// Start the download
space.addToSpace(new SimpleTuple(new URL(args[0])));

Finally, we make a URL object available in the space. Some time later (exactly when is non-deterministic, but it should be a short delay), the downloader thread will be notified that a URL is available. It will wake up, download the page, and put the content back into the space. The downloader thread then goes back to sleep, waiting for another URL.

The same thing happens with the reporter thread: it waits for a [URL, String] tuple, processes it, and returns a result to the space.

// Wait until the report is available
// (ONE_MINUTE_AS_MILLIS is assumed to be a constant equal to 60 * 1000)
Tuple reportTuple = space.takeFromSpaceWithTimeout(
        new SimpleTemplate(Report.class), ONE_MINUTE_AS_MILLIS);
if (null == reportTuple) {
    throw new ReportUnavailableException(
            "Could not create report within one minute");
}

Continuing our example, we now wait for the report. The Template we are instantiating will only match if it finds a Tuple with a single Report element.

In case something goes horribly wrong, and the report is not available in time, we throw an exception to inform callers that the report could not be created.

// Retrieve the tuple's values
Report report = (Report) reportTuple.getValueAt(0);

As usual, we retrieve the value from the tuple.

// Format and print the report
System.out.println(
        report.getPage() + "'s title is: " + report.getPageTitle());

And we format the report for display on screen.

Other things to try…

This example is very basic, but it neatly solves multiple problems. For example, downloading a page is not very CPU-intensive, but creating the report is. So we might run two or three ReporterUnitOfWork instances at the same time, alongside ten UrlDownloaderUnitOfWork instances.
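To try different pool sizes per stage, a small helper can start several identical daemon workers. This is a hypothetical sketch (Workers and startDaemons are not part of TupleSpace) that assumes Java 8 or later for Supplier and lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper that starts `count` daemon threads, each running a fresh
// Runnable from the factory. With the tutorial's classes, a call might be:
//   Workers.startDaemons("reporter", 3,
//       () -> new UnitOfWorkRunner(new ReporterUnitOfWork(), space));
//   Workers.startDaemons("downloader", 10,
//       () -> new UnitOfWorkRunner(new UrlDownloaderUnitOfWork(), space));
final class Workers {
    private Workers() {}

    static List<Thread> startDaemons(String name, int count, Supplier<Runnable> factory) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(factory.get(), name + "-" + i);
            t.setDaemon(true);   // let the JVM exit when the main thread finishes
            t.start();
            threads.add(t);
        }
        return threads;
    }
}
```

Because every worker talks only to the shared space, adding more workers to a stage requires no change to the units of work themselves.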
