DIET Agents tutorial

Author: Erwin Bonsma
Revision: 1.26
Date: 2004/07/08 15:42:40

Introduction

About this tutorial

The aim of this tutorial is to provide you with a general idea of how to program using the DIET Agents platform. It describes the main concepts used in DIET and how these fit together. It should also give you an idea of the DIET design philosophy. You should use this tutorial in combination with the API documentation and the sample application that are part of the standard DIET Agents platform.

This tutorial introduces you to the main DIET classes and their most important methods. If you want detailed knowledge of specific DIET classes and/or methods, you should still refer to the API.

The tutorial makes minimal use of example code. This hopefully enhances the readability of the tutorial. It should also reduce the risk that the tutorial becomes out of date and make it easier to maintain. If you want the implementation details of a specific feature, you can look at how it has been done in the sample code. In many places the tutorial refers to sample applications that illustrate how to implement particular features.

DIET background

DIET Agents is a novel platform for developing agent-based applications. It was created as part of the EU-funded DIET project, where DIET stands for Decentralised Information Ecosystem Technologies. The DIET project was part of the wider Universal Information Ecosystem Initiative. DIET's "bottom-up", "ecosystem-inspired" design makes it possible to develop scalable and adaptive systems that are robust to failure. On the other hand, programming in DIET requires a significant mind-shift from the traditional "top-down", centralised approaches used in many existing frameworks for agent applications.

DIET Philosophy

The DIET Agents platform has been designed to be scalable, robust and adaptive using a "bottom-up" design approach:

It is scalable at a local and at a global level. Local scalability is achieved because DIET agents can be very lightweight. This makes it possible to run large numbers of agents, up to several hundred thousand, on a single machine. DIET is also globally scalable, because the architecture does not impose any constraints on the size of distributed DIET applications. This is mainly achieved because the architecture is fully decentralised and therefore has no centralised bottlenecks.
It is robust and supports adaptive applications. The DIET kernel itself is robust to hardware failure and/or system overload. The effects of these failures are localised, and the kernel provides feedback when failure occurs allowing applications to adapt accordingly. The decentralised nature of DIET also makes the platform less susceptible to failure.
It is based on a bottom-up, nature-inspired design approach. DIET agents are not assumed to be highly intelligent and/or to use complex communication protocols. Instead, agents can be very small and simple, allowing intelligent behaviour to emerge from the interactions between large numbers of agents.

Of course, using DIET does not guarantee that your application is scalable, robust and adaptive. The DIET platform supports these features, but you still have to design your application carefully to ensure it has them. If you use DIET to write a client-server application, the application will only be as scalable and robust as (the access to) the server is. For more details about the "DIET approach" in agent system design, please refer to:

Cefn Hoile, Fang Wang, Erwin Bonsma and Paul Marrow, "Core Specification and Experiments in DIET: A Decentralised Ecosystem-inspired Mobile Agent System", Proc. 1st Int. Conf. on Autonomous Agents and Multi-Agent Systems (AAMAS2002), pp. 623-630, July 2002, Bologna, Italy

P. Marrow, M. Koubarakis, R.H. van Lengen, F. Valverde-Albacete, E. Bonsma, J. Cid-Suerio, A.R. Figueiras-Vidal, A. Gallardo-Antolín, C. Hoile, T. Koutris, H. Molina-Bulla, A. Navia-Vázquez, P. Raftopoulou, N. Skarmeas, C. Tryfonopoulos, F. Wang, C. Xiruhaki, "Agents in Decentralised Information Ecosystems: the DIET Approach", Proc. of the AISB’01 Symposium on Information Agents for Electronic Commerce, pp. 109-117, 2001, York, UK

Layered architecture

The DIET Agents platform is designed as a three-layer architecture:

The core layer is the lowest layer. It provides the minimal software needed to implement multi-agent functionality in the DIET framework. This is through the DIET platform kernel, which provides the underlying "physics" of the DIET ecosystem. It also includes basic support for debugging and visualisation.
The ARC layer contains Application Reusable Components. These components provide functionality that does not need to be in the core layer, yet can be used by different applications. It includes, amongst others:
- various schemes for remote communication,
- a framework for pluggable agent behaviours and
- support for scheduling events for execution at a later time.
The application layer is the top layer. It contains code specific to particular applications, along with debugging and visualisation code that may also be specific to these applications. The platform includes a dozen small sample applications, each demonstrating specific features of the platform.

The core layer

Element hierarchy

The basic classes and interfaces in the com.btexact.diet.core package define five fundamental elements. These elements are arranged in the following hierarchy:

worlds,
environments,
agents,
connections and
messages.

At the heart of this conceptual hierarchy are the agents, which all execute autonomously. Agents in the core are designed to be very lightweight. An agent only has minimal capabilities to execute and to communicate.

Each agent resides in an environment. Environments implement the DIET "physics", enabling agent creation, agent destruction, agent communication and agent migration. An environment can host large numbers of agents. The execution time of each function provided by an environment does not go up when the number of agents increases but stays constant.

A world is a placeholder for environments. It manages functionality that can be shared by environments, such as agent migration to other worlds. A world is also the access point for attaching debugging and visualisation components to the DIET platform. A DIET application typically creates a single world, and does so during start-up (distributed applications create one world per machine). So usually there is only one world per JVM. One notable exception is applets that are running in a browser. These applets share the same JVM. So when each applet creates its own world, which is recommended for security reasons, there are multiple worlds in a JVM.

Agents can communicate with other agents in the same environment. To do so, they create connections. A connection is a bi-directional communication channel between a pair of agents. After a connection has been set up, agents can use it to pass messages. The DIET platform does not enforce a particular communication protocol. It provides agents with the ability to exchange text messages and optionally objects. This allows each agent to use a protocol most suited to its functionality and capabilities.

Agent addressing

An agent is uniquely specified by its address. An agent address consists of two parts:

An environment address. This is the address of the environment that the agent resides in. It changes when the agent migrates to a different environment.
An agent identity. This identity is established when the agent is created, and remains fixed throughout its lifetime. It consists of two parts:
- a name tag, and
- a family tag.

A name tag is the part of an agent's identity that uniquely identifies it. The name tag makes it possible for an agent to restore contact with a specific agent. Name tags are randomly generated by the DIET kernel. The only thing that an agent needs to specify, by overriding the com.btexact.diet.core.imp.BasicAgent#getNameTagLength method, is the length of the name tag in bits. In general, 128 bits is a good length. It does not require too much memory and makes identity clashes extremely unlikely. When name tags are 128 bits long, the probability that any two agents in a group of one million have an identical name tag is approximately 1.5e-27 (by the birthday bound, n(n-1)/2 pairs divided by 2^128 possible tags, under the assumption that the process of generating name tags is perfectly random).
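The clash estimate for a given name tag length can be checked with the birthday bound. The following is a minimal sketch; the class and method names are illustrative and are not part of the DIET API. It computes the expected number of clashing pairs, which closely approximates the clash probability whenever the result is much smaller than one.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

/** Illustrative helper (not part of DIET) for estimating name tag clashes. */
final class NameTagClash {
    /**
     * Expected number of clashing pairs among numAgents random tags of
     * tagBits bits: n(n-1)/2 pairs, each clashing with probability 2^-bits.
     */
    static double expectedClashingPairs(long numAgents, int tagBits) {
        BigInteger n = BigInteger.valueOf(numAgents);
        // n(n-1) is always even, so the shift is an exact division by two.
        BigInteger pairs = n.multiply(n.subtract(BigInteger.ONE)).shiftRight(1);
        BigDecimal tagSpace = new BigDecimal(BigInteger.ONE.shiftLeft(tagBits));
        return new BigDecimal(pairs).divide(tagSpace, MathContext.DECIMAL64).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(expectedClashingPairs(1_000_000L, 128));
        System.out.println(expectedClashingPairs(1_000_000L, 64));
    }
}
```

Running it for a few tag lengths gives a feel for how quickly the clash risk grows when tags are shortened to save memory.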

Sometimes a different name tag length can be useful. For example, if you specify a name tag length of zero bits, there is only one name tag possible. So an agent with a given family tag and zero-length name tag can only be created when such an agent does not yet exist. When this is done on start-up, no other agent with this identity can be created in the environment or migrate into it. Therefore, when this mechanism is used, incoming agents can connect to this agent and be sure that it is a local agent and not a potentially malicious agent from elsewhere.

Family tags can be used to identify agents by their functionality. Agents that offer the same functionality typically have identical family tags. An agent can connect to another agent in its environment by specifying only the required family tag. If an agent with this family tag exists, a connection is created between both agents. When there are multiple agents with the same family tag, the agent that initiated the connection is connected to one of them, chosen at random.

You specify the family tag for an agent by implementing the com.btexact.diet.core.imp.BasicAgent#getFamilyTag method. This method is called only once, when the agent is created. The agent's family tag is then fixed throughout its lifetime to the value that was returned. So, it is no use returning a family tag that depends on the state of the agent. It won't work. There is no way that you can change the identity of an agent after it has been created.

Often family tags are constructed based on the name of the class that implements the agent. The com.btexact.diet.core.Tag#Tag(String, int) constructor can be used for this, as it constructs a tag from a text string. The length of the tag needs to be chosen as well. Generally, 128 bits is a good length. It makes accidental clashes (two family tags based on different classnames that are the same) very unlikely, yet does not require too much memory. However, occasionally it makes sense to use a shorter family tag, e.g. 32 bits. This is, for instance, done in the sorting sample application. Here, we wanted the application to be able to support several hundred thousand agents on an ordinary desktop machine. With these numbers, any reduction in the memory used by an individual agent is significant.

Family tags are not necessarily based on classnames. For example, imagine an application where some agents are responsible for hosting files. The application is designed such that there is one agent for each file. If an agent wants to access a file, it needs to contact the agent that hosts the file, not just any one of the file-hosting agents. In this case, it makes sense to base the family tag on the filename. It is still recommended to base the family tag on the classname as well, e.g. by XOR-ing both strings, so that there is no name clash when other agents want to associate a different service with the same file. Sometimes it can even be useful to randomly choose a family tag. In DIET, agents cannot connect to an agent if they do not know its family tag. Therefore, randomly generating a family tag is a simple and efficient way of restricting access.
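One way to combine a classname and a filename into a single fixed-length tag is sketched here. This is only an illustration under stated assumptions: it does not use the DIET Tag class, and the hashing scheme (SHA-256 truncated to 128 bits, then XOR) and all names are hypothetical choices, not the way the platform derives tags from strings.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical helper: derives a 128-bit tag from a classname and a filename. */
final class FamilyTagSketch {
    /** Hashes the text and keeps the first numBytes bytes (at most 32). */
    static byte[] tagFrom(String text, int numBytes) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(digest, numBytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** Byte-wise XOR of two tags of equal length. */
    static byte[] xor(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("tag lengths differ");
        }
        byte[] out = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = (byte) (a[i] ^ b[i]);
        }
        return out;
    }

    /** Tag for "the agent of this class that hosts this file". */
    static byte[] fileHostTag(String className, String fileName) {
        return xor(tagFrom(className, 16), tagFrom(fileName, 16));
    }
}
```

Two different agent classes that serve the same file then end up with different tags, while any agent that knows both the classname and the filename can recompute the tag it needs.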

The next section describes how to configure the rest of your agent and how to create it.

Agent configuration and creation

Agents are created using prototyping instead of using constructors. To let the application create agents directly you use: com.btexact.diet.core.imp.BasicEnvironment#create. To let an agent create another agent you should use: com.btexact.diet.core.Environment#create. Both methods are similar and take two arguments. The first, prototype, specifies the type of agent to create. The second, params, are the parameters that are used to configure the agent. The prototype parameter should be an empty, uninitialised instance of the class of agent that you want to create. Cloning is used to create a new instance, which is subsequently initialised using the parameters that are provided.

Prototyping is used because an agent should never interact with another agent by directly invoking its methods. Doing so would create security problems. For instance, if an agent has a direct reference to another agent, it could simply invoke its destroyMe method to kill the other agent. Secondly, it would create various multi-threading issues including inconsistent object states and execution deadlocks. The prototype creation mechanism ensures that a reference to an agent is only known by the DIET kernel and by the agent itself, but not by any other agents (not even the agent that created it).

The use of prototyping for agent creation means that you should not initialise member variables in the agent's constructor or using field initialisers. For instance, it is typically wrong to declare a member variable as follows:

  List  my_list = new ArrayList();

If you do so, all instances of that agent class would refer to the same list, instead of having their own instance. This is probably not what you want and, unless you synchronize access to the list, would also create multi-threading problems. You should also not initialise the list in the constructor, as this would create similar problems.
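The effect can be reproduced in plain Java, without the DIET classes. The sketch below assumes that the kernel creates agents with a shallow clone of the prototype, as Object.clone() does; the spawn and initialise methods here are stand-ins for the kernel's creation mechanism, not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Wrong: the field initialiser runs once, for the prototype only. */
class SharedListAgent implements Cloneable {
    List<String> myList = new ArrayList<>();

    SharedListAgent spawn() {
        try {
            // Shallow copy: the clone gets the same list reference.
            return (SharedListAgent) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Right: the prototype leaves the field unset; each clone initialises it. */
class OwnListAgent implements Cloneable {
    List<String> myList;

    void initialise() {
        myList = new ArrayList<>();
    }

    OwnListAgent spawn() {
        try {
            OwnListAgent copy = (OwnListAgent) super.clone();
            copy.initialise();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Adding an element to the list of one SharedListAgent clone makes it visible in every other clone, whereas OwnListAgent clones each keep their own list.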

The proper place to initialise an agent's member variables is in its com.btexact.diet.core.imp.BasicAgent#initialise method. This method takes a single argument, which is used to specify parameters for configuring the agent. The argument must be an instance of the com.btexact.diet.core.imp.BasicAgent.Params member class. This class has a few parameters that can be used to configure any BasicAgent. Three parameters specify the size of each of the agent's event buffers (more about these buffers when event handling is described). There is also a parameter for specifying the agent's so-called "friendly name". This name can be used as a human-readable name for the agent, for debugging and visualisation purposes.

Usually, when you implement your own agent you would like to use one or more additional parameters, specific to its functionality. For example, for agents that periodically perform a task, you probably want to specify the task period. You do so by extending the com.btexact.diet.core.imp.BasicAgent.Params class and adding member variables for each parameter you want to add. Subsequently, in the initialise method you cast the method argument to the right class, and use it to initialise the agent's member variables.

Finally, you should provide a static prototype field for your agent, which is the prototype agent that can be used to create new instances of your agent:

  public static final MyAgent  PROTOTYPE = new MyAgent();

It is also good practice to provide an empty constructor as follows:

  protected MyAgent() {}

This prevents a similar constructor with public access being created. It is good not to have any public constructors, as it makes others aware that they should use a different mechanism for creating new agents.
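Putting the pieces of this section together, the overall shape of an agent class might look like the following sketch. It is self-contained and deliberately does not use the real DIET classes: AgentBase, its Params class and the create method are simplified stand-ins for BasicAgent, BasicAgent.Params and the environment's create method, and PeriodicAgent with its periodMillis parameter is a hypothetical example. Consult the API documentation for the actual signatures.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for BasicAgent (not the real DIET class). */
abstract class AgentBase implements Cloneable {
    static class Params {
        String friendlyName = "agent";
    }

    private String friendlyName;

    protected void initialise(Params params) {
        friendlyName = params.friendlyName;
    }

    String getFriendlyName() {
        return friendlyName;
    }

    /** Stand-in for kernel-side creation: clone the prototype, then initialise. */
    static AgentBase create(AgentBase prototype, Params params) {
        try {
            AgentBase agent = (AgentBase) prototype.clone();
            agent.initialise(params);
            return agent;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }
}

/** Hypothetical agent that performs a task periodically. */
class PeriodicAgent extends AgentBase {
    /** Adds an agent-specific parameter to the base parameters. */
    static class Params extends AgentBase.Params {
        long periodMillis = 1000;
    }

    public static final PeriodicAgent PROTOTYPE = new PeriodicAgent();

    protected PeriodicAgent() {}

    // No field initialisers: everything is set in initialise().
    private long periodMillis;
    private List<String> history;

    @Override
    protected void initialise(AgentBase.Params params) {
        super.initialise(params);
        PeriodicAgent.Params own = (PeriodicAgent.Params) params;
        periodMillis = own.periodMillis;
        history = new ArrayList<>();
    }

    long getPeriodMillis() {
        return periodMillis;
    }

    List<String> getHistory() {
        return history;
    }
}
```

A caller would fill in a PeriodicAgent.Params object and pass it to the creation method together with PeriodicAgent.PROTOTYPE; it never calls a constructor itself.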

Agent start-up and close-down

The agent's initialise method is called when the agent is being created. This happens before the agent is put in a DIET environment. So during initialisation the agent cannot yet interact with other agents. If the agent wants to initiate any actions after it has been created it should do so in its com.btexact.diet.core.imp.BasicAgent#startUp method. This method is called when the agent starts up. This is after it has just been successfully created, and also when it has arrived in a new environment after successful migration. By default, an agent does nothing when it starts up, but you can override the startUp method to initiate active behaviour. For instance, if an agent relies on services provided by another agent, it can connect to this "service provider" when it starts up.

Complementary to the startUp method, there is a com.btexact.diet.core.imp.BasicAgent#closeDown method. It is called just before the agent is destroyed and just before the agent migrates to another environment. You can override this method to provide any final clean-up. If the agent still has any connections to other agents, it could gracefully terminate these interactions, for instance by sending some kind of "goodbye" message. However, often this is not necessary, as the kernel will automatically disconnect all connections that are still open after the agent has closed down. So as long as the agents at the other end of the connections can cope with these sudden disconnections, there is no need to explicitly close down existing connections in the closeDown method.

Basic kernel actions

The DIET kernel provides agents with four actions that they can perform in the DIET universe. Each of these is accessible through the agent's environment, which the agent can access using com.btexact.diet.core.imp.BasicAgent#getEnvironment. The environment provides functionality enabling each agent to:

... communicate with other agents in its environment. It does so by first creating a connection to the other agent, using one of the com.btexact.diet.core.Environment#connectMe methods (either specifying only a family tag, or the complete identity of the other agent). Subsequently, it can send messages using the com.btexact.diet.core.Connection#send method. Each message is an instance of com.btexact.diet.core.Message and consists of a text string, and an optional object. There are no restrictions on the protocol that agents use.
... migrate to other environments. It does this by using the com.btexact.diet.core.Environment#migrateMe method, which requires the address of the destination environment. The agent may have obtained this address from com.btexact.diet.core.Environment#getNeighbouringEnvironment, but this is not required. Agents can migrate to any environment, as long as they know the address.
... create other agents. It can do so using the com.btexact.diet.core.Environment#create method. You need to specify the agent prototype, which determines the type of agent, and a parameter object, which contains all its configuration settings.
... self-destruct. When an agent is not required anymore, it can destroy itself by calling the com.btexact.diet.core.Environment#destroyMe method.

The kernel implementation for each of these actions is "resource constrained" and "fail-fast". The kernel actions are resource constrained because there are explicit limits on the resources that they can use. For example, threads are a constrained resource. The number of threads that are used by agents in a DIET world is limited (to a number specified by the user when the world is created). The kernel actions are fail-fast because when an action cannot be executed instantaneously, it fails immediately. The kernel does not retry the actions later and/or block execution until it has successfully executed the action. So when an attempt is made to create a new agent, but there is no thread available for it, this will fail.

The fail-fast, resource-constrained implementation of the kernel actions protects the system against overload. For example, the buffer where each agent receives incoming messages is of limited size. When an agent attempts to send a message to an agent whose message buffer is already full, the message is rejected. If this did not happen, problems would arise when, for instance, a service-providing agent processes messages more slowly than they arrive. Its buffer of incoming messages would constantly grow, and eventually the JVM would run out of memory (although this may take a while). A more immediate effect is that as the number of pending messages grows, the time it takes before a reply is received to each message goes up as well. This may mean that the reply is too late and obsolete by the time it is received. Although you could use watchdog timers and/or check if a reply is still needed before handling a message, this can be quite cumbersome. Furthermore, unless the system load is reduced, it is inevitable that messages need to be dropped. Limiting the size of the agent event buffers is a simple yet effective way to cope with overload. As long as the system load is low, it has no effect. When the system becomes overloaded, it offers basic protection and allows agents to rapidly adapt their behaviour accordingly.

Agents can quickly adapt to overload because the actions fail directly, and the agents receive feedback when this happens. The kernel provides feedback by throwing an exception when an agent has requested an action that the kernel cannot fulfil. The agent can then adapt its behaviour. For instance, it can reduce its level of activity. The Trigger agents in the sorting sample application, for instance, do so after an attempt to send a message has failed.
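The interplay between a bounded buffer, fail-fast rejection and an adapting sender can be sketched in a few lines of plain Java. Everything here is illustrative: BoundedInbox, BufferFullException and AdaptiveSender are hypothetical names, not DIET classes, and the halve-or-double delay policy is just one simple way to adapt.

```java
import java.util.ArrayDeque;

/** Hypothetical exception signalling that a fail-fast action was rejected. */
class BufferFullException extends Exception {
    BufferFullException(String message) {
        super(message);
    }
}

/** Bounded message buffer that rejects immediately instead of blocking. */
class BoundedInbox {
    private final ArrayDeque<String> queue = new ArrayDeque<>();
    private final int capacity;

    BoundedInbox(int capacity) {
        this.capacity = capacity;
    }

    synchronized void offerOrFail(String message) throws BufferFullException {
        if (queue.size() >= capacity) {
            throw new BufferFullException("inbox full (" + capacity + " messages)");
        }
        queue.addLast(message);
    }

    /** Returns the oldest message, or null when the inbox is empty. */
    synchronized String poll() {
        return queue.pollFirst();
    }
}

/** Sender that slows down after a rejection and speeds up after a success. */
class AdaptiveSender {
    private static final long MIN_DELAY_MILLIS = 10;
    private static final long MAX_DELAY_MILLIS = 1000;
    private long delayMillis = MIN_DELAY_MILLIS;

    boolean trySend(BoundedInbox inbox, String message) {
        try {
            inbox.offerOrFail(message);
            delayMillis = Math.max(MIN_DELAY_MILLIS, delayMillis / 2);
            return true;
        } catch (BufferFullException e) {
            // Immediate feedback: back off instead of retrying at once.
            delayMillis = Math.min(MAX_DELAY_MILLIS, delayMillis * 2);
            return false;
        }
    }

    /** Time the sender intends to wait before its next attempt. */
    long getDelayMillis() {
        return delayMillis;
    }
}
```

Because the rejection is reported at the moment of sending, the sender can lower its rate straight away rather than discovering the overload later through missing replies.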

Event handling

Agents are made aware of actions by other agents by way of events. The kernel supports three different types of events. These are used to notify an agent that:

a new connection has been created to it,
it has received a message sent along one of its connections, or
one of its connections has been disconnected.

These events are called external events because they are all generated by other agents. Incoming events are stored in event buffers, waiting for the agent to handle each event (in the agent's thread). There are separate buffers for each of these three event types. Next to these buffers, each agent uses an event portal for managing these events. The event portal enables the agent to block execution until an external event occurs. The default implementation of BasicAgent manages the external event buffers and blocks execution when there are temporarily no events ready to be handled. This behaviour is sufficient most of the time. What you still need to do when developing your own agent is to respond to each event appropriately. You do so by overriding the various event handling methods, as is discussed in the next three subsections.

Handling messages

Most agents need to be able to handle incoming messages. They can do so by sending a reply message, performing another action, changing their state or any combination of these. An agent responds to messages in its com.btexact.diet.core.imp.BasicAgent#handleMessage method.

A simple implementation of handleMessage can be found in the PrimeChecker agent in the primes sample application. The agent can handle "is-prime?" messages. When it receives such a message it calculates if the attached number is a prime number, and either replies "yes" or "no". In the case of the PrimeChecker, its replies do not depend on the agent's state.

A second, somewhat more complicated, implementation of handleMessage can be found in the Linker agent in the sorting sample application. The agent can handle different types of messages. It can reply to queries about its current links. It can also forward messages across its links or update its links in response to incoming messages. Here the state of the agent, which includes its current links, affects how it handles messages. However, how it handles a message is independent of who sent it.

Sometimes agents need to associate a state with a connection to determine how to handle incoming messages. A simple example is an agent that can calculate running sums. When other agents connect to it, they can send messages with an associated integer value. In response to any message, the RunningSum agent will reply with a message containing the sum of values that it has received over the connection so far. This means that the agent needs to maintain a running sum for each of its current connections. Where should it do this? Storing the running sums in a look-up table, using the identities of the client agents as a key, seems natural. However, this is not the most efficient with respect to memory usage and access time. Furthermore, it would go wrong when one or more agents have multiple connections to the RunningSum agent.

A better approach is to use connection contexts. The com.btexact.diet.core.Connection#setContext and com.btexact.diet.core.Connection#getContext methods can be used to do so. The context is a local state that an agent associates with a specific connection. It can use the context to decide what to do when handling events related to the connection. In the case of the RunningSum agent, each connection context would contain an integer value representing the current value of the sum. When a new message is received, the agent has immediate access to the appropriate running sum. It can update it and immediately send the reply, without the need to use a look-up table.
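A minimal sketch of the context idea for the RunningSum agent follows. The Conn class is a simplified stand-in for a DIET connection that only models setContext, getContext and a send method that records replies; the handleMessage signature is likewise an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a connection (not the real DIET interface). */
class Conn {
    private Object context;
    final List<String> sent = new ArrayList<>();

    void setContext(Object context) {
        this.context = context;
    }

    Object getContext() {
        return context;
    }

    void send(String text) {
        sent.add(text);
    }
}

/** Mutable per-connection state kept as the connection's context. */
class SumContext {
    long sum;
}

class RunningSumAgent {
    /** Adds the value to the sum kept for this connection and replies with it. */
    void handleMessage(Conn connection, int value) {
        SumContext ctx = (SumContext) connection.getContext();
        if (ctx == null) {
            // First message on this connection: attach fresh state.
            ctx = new SumContext();
            connection.setContext(ctx);
        }
        ctx.sum += value;
        connection.send("sum " + ctx.sum);
    }
}
```

The agent holds no table of its own. Two connections from the same client simply carry two independent contexts, which avoids the problem described for identity-keyed look-up tables.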

An additional advantage of using contexts instead of look-up tables is that agents do not need to clean up state after a disconnection. As long as the state is maintained only as a context with the connection, it is automatically available for garbage collection when the connection is disconnected. If, on the other hand, the state is maintained in a look-up table, it needs to be explicitly removed from the table, otherwise it will continue to take up space. Therefore, it is recommended to use contexts whenever possible. Sometimes you cannot do so, as is discussed in the next section.

When implementing the handleMessage method, you need to decide which agent disconnects the connection along which the message is sent, and when. For instance, if you use a "query-reply" protocol, the agent that receives the query can disconnect the connection after it has sent the reply. Alternatively, the other agent could disconnect the connection after it has received the reply. Both would work, and there is not much difference between these approaches. However, you should ensure that at least one agent disconnects the connection. Otherwise the connection could remain open indefinitely. There is a limit on the number of connections an agent owns. An agent's owned connections are its currently active connections that it initiated itself. Therefore, if connections are not disconnected, some agents would eventually be unable to open new connections. You need to take particular care to ensure that connections are disconnected when failure occurs. For instance, if an agent has successfully opened a connection but fails to send the query message, it should either retry sending the query at a later moment, or disconnect the connection immediately.
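The "disconnect on failure" rule can be sketched as follows. The types are hypothetical stand-ins: QueryConn models a connection whose send may be rejected, and SendFailedException stands for whatever exception the kernel throws in that case.

```java
/** Hypothetical exception for a rejected send. */
class SendFailedException extends Exception {
    SendFailedException(String message) {
        super(message);
    }
}

/** Stand-in connection whose peer may have a full message buffer. */
class QueryConn {
    private final boolean peerBufferFull;
    private boolean open = true;

    QueryConn(boolean peerBufferFull) {
        this.peerBufferFull = peerBufferFull;
    }

    void send(String text) throws SendFailedException {
        if (!open) {
            throw new SendFailedException("connection is closed");
        }
        if (peerBufferFull) {
            throw new SendFailedException("peer buffer is full");
        }
    }

    void disconnect() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }
}

class QueryClient {
    /**
     * Sends the query. If sending fails, the connection is closed at once
     * so that it does not keep counting as an owned connection.
     */
    boolean sendQuery(QueryConn connection, String query) {
        try {
            connection.send(query);
            return true;
        } catch (SendFailedException e) {
            connection.disconnect();
            return false;
        }
    }
}
```

The alternative mentioned above, retrying later, would keep the connection open and schedule another call to sendQuery instead of disconnecting.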

Handling disconnections

Some agents need to maintain state with connections "outside" the context associated with the connection. For instance, imagine a DatagramTransceiver agent that maintains a UDP socket. You would want it to send UDP packets on behalf of other agents, its clients. So clients can connect to the DatagramTransceiver and pass it the UDP packet they want to send, and it would send the packet using the socket. However, it would be nice if clients could also receive UDP packets. To do so, the DatagramTransceiver can associate a unique ID with each client. Each incoming UDP packet would have a client ID associated with it. The DatagramTransceiver agent would send the UDP packet across the connection associated with the client with that ID. To support this, it would maintain a look-up table where each key is a client ID, and the value is the connection to the corresponding client. This will work fine. However, if you implement the DatagramTransceiver agent you have to make sure that when a client disconnects, its entry gets removed from the look-up table.

To do so, you should override the com.btexact.diet.core.imp.BasicAgent#handleDisconnection method. It gets called on an agent when another agent that was connected to it has disconnected the connection. You can then perform any necessary clean-up. For example, in the case of the DatagramTransceiver, you would remove the client entry from the look-up table. It would still be useful to associate a context with each client, containing the client's ID. Using the client ID, you can efficiently remove the client's entry from the look-up table.
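The combination of a look-up table and a context that stores the table key might be sketched like this. ClientConn is a simplified stand-in for a DIET connection, the register method stands for whatever the agent does when a client first connects, and the method signatures are assumptions for the example. The UDP socket handling itself is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for a client connection with a context slot. */
class ClientConn {
    private Object context;

    void setContext(Object context) {
        this.context = context;
    }

    Object getContext() {
        return context;
    }
}

class TransceiverSketch {
    // Client ID -> connection, used to route incoming packets to clients.
    private final Map<Integer, ClientConn> clients = new HashMap<>();
    private int nextClientId = 0;

    /** Assigns an ID to a new client and remembers it as the context. */
    int register(ClientConn connection) {
        int id = nextClientId++;
        connection.setContext(id);
        clients.put(id, connection);
        return id;
    }

    /** Returns the connection for an incoming packet, or null if unknown. */
    ClientConn lookUp(int clientId) {
        return clients.get(clientId);
    }

    /** Removes the table entry, using the ID stored in the context as key. */
    void handleDisconnection(ClientConn connection) {
        Object id = connection.getContext();
        if (id instanceof Integer) {
            clients.remove((Integer) id);
        }
    }

    int numClients() {
        return clients.size();
    }
}
```

Without the ID in the context, the agent would have to search the table's values for the disconnected connection, which takes time proportional to the number of clients.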

Implementing handleDisconnection on its own is not sufficient to ensure that state associated with connections is always cleaned up after disconnection. As was mentioned earlier, the buffers where the agent's incoming events are stored (before they are handled) are of limited size. It may therefore happen, when many disconnections happen at once and/or the CPU is very heavily utilised, that the disconnection event buffer overflows. When this happens, the handleDisconnection method is not called for these missed events. The easiest way to cope with these missed disconnections is to prevent them. Since Version 0.94 of the platform, it is possible to use disconnection event buffers with unlimited capacity. At first sight, this seems to go against the resource-constrained, fail-fast nature of the basic kernel actions, and thus lose the associated benefits. Luckily, this is not the case. The number of disconnection events that an agent may receive is implicitly limited by the number of connections it maintains, which is something the agent can fully control.

So, for agents that associate state with connections outside the connection's context, and which therefore need to handle disconnection events to ensure this state is cleaned up, the recommended approach is to give them an unlimited disconnection event buffer. When these agents limit the number of connections they handle concurrently, they automatically limit the number of events in their disconnection event buffer. Note that agents should never use "infinitely" sized message and/or connection buffers, except maybe for debugging. For these buffers, agents have no control over the number of events in them. So without limiting these buffers, there is no protection against overload. Latency can go up unacceptably and the system can run out of memory.

It is possible to handle all disconnections while still using a disconnection event buffer with a limited size. In this case, an agent can check if it has missed any events by examining the "rejected elements count" of the buffer. If it is not zero, it can go over all of its client connections, which are maintained in its look-up table. For each connection it checks if it is still enabled, and if not, cleans up the state associated with the connection. The rejected elements count can subsequently be reset to zero. In fact, you should actually reset the rejected element count when you check if it is greater than zero, using com.btexact.diet.core.imp.BufferWithRejection#clearNumRejectedElements as follows:

  if (getDisconnectionBuffer().clearNumRejectedElements() > 0) {
    // Iterate over all client connections, and clean-up those that
    // have been disconnected.
  }

Otherwise you run the risk of failing to clean up all dead client connections. This could happen if the rejected element count goes up while you are iterating over the client connections.
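Why the count must be read and reset in a single step can be illustrated with a small stand-alone sketch. It assumes that clearNumRejectedElements returns the previous count and resets it atomically, which is what the snippet above relies on. RejectionCounter and Sweeper are hypothetical names, and the map of "connection still enabled" flags stands in for the agent's look-up table.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for the rejection bookkeeping of an event buffer. */
class RejectionCounter {
    private final AtomicInteger rejected = new AtomicInteger();

    /** Called (by another thread) when an event is dropped. */
    void recordRejection() {
        rejected.incrementAndGet();
    }

    /** Returns the count and resets it to zero in one atomic step. */
    int clearNumRejectedElements() {
        return rejected.getAndSet(0);
    }
}

class Sweeper {
    // Client ID -> whether that client's connection is still enabled.
    final Map<Integer, Boolean> clientEnabled = new HashMap<>();

    /** Sweeps dead clients if any disconnection event was missed. */
    int sweepIfNeeded(RejectionCounter counter) {
        if (counter.clearNumRejectedElements() == 0) {
            return 0;
        }
        // A rejection recorded from here on raises the count again,
        // so the next call sweeps once more and nothing is lost.
        int removed = 0;
        Iterator<Map.Entry<Integer, Boolean>> it = clientEnabled.entrySet().iterator();
        while (it.hasNext()) {
            if (!it.next().getValue()) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

If the count were instead read first and reset only after the sweep, a rejection that arrived during the sweep would be wiped out by the reset and its dead connection would never be cleaned up.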

There is one more thing to be aware of when handling "missed" disconnection events this way. It is possible that a connection is cleaned up a little prematurely. More specifically, it can cause the clean-up of connections that still have one or more pending events associated with them. This is unavoidable given the multi-threading model that is used. As a result, though, agents may occasionally be unable to properly handle a message, because the state associated with the connection has been cleaned up already. Modifying the agent protocol can help to cope with this. For instance, you may constrain which agent can disconnect and when, in order to guarantee that all messages sent along the connection are handled properly. Of course, agents may still disconnect prematurely, but in this case, agents cannot assume that all the messages they have sent have indeed been handled.

Handling connections

When an agent wants to respond to new connections that have been created to it, it can do so by overriding the com.btexact.diet.core.imp.BasicAgent#handleConnection method. In practice, this is not done very often, as there are not too many uses for it. You can, however, use it when you have a simple agent that is connected to a sensor. If all it does is notify other agents of the current sensor reading, it could send out the current value as soon as another agent connects to it. This would have the (minor) advantage that the other agent does not need to send a query message.

You could also override the handleConnection method of an agent if you would like to restrict which agents can connect to it (note that if you, maybe temporarily, do not want any agents to connect to it, you should use com.btexact.diet.core.imp.BasicAgent#setAcceptConnections). In the handleConnection method you can examine the identity of the agent that has just connected. If it does not meet your specific requirements (e.g. it does not use the "secret" family tag), you can disconnect the connection. Be aware that the agent may already have sent one or more messages. Replying to those messages will fail, because the connection has been disabled by then. Despite that, it is still recommended to check in the handleMessage method if the connection along which a message was received is enabled before handling the message. If nothing else, it avoids unnecessary work.
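The following stand-alone sketch illustrates the filtering pattern just described. FakeConnection and GuardedAgent are hypothetical classes made up for this illustration. They do not reproduce the DIET connection or agent API, so refer to the API documentation for the actual method signatures.

```java
// Hypothetical sketch; FakeConnection stands in for a DIET connection.
class FakeConnection {
    final String familyTag;
    private boolean enabled = true;

    FakeConnection(String familyTag) { this.familyTag = familyTag; }
    boolean isEnabled() { return enabled; }
    void disconnect() { enabled = false; }
}

class GuardedAgent {
    private final String requiredTag;
    int handledMessages = 0;

    GuardedAgent(String requiredTag) { this.requiredTag = requiredTag; }

    // Disconnects agents that do not use the required family tag.
    void handleConnection(FakeConnection c) {
        if (!requiredTag.equals(c.familyTag)) {
            c.disconnect();
        }
    }

    // Skips messages that arrived along a connection that is no longer enabled.
    void handleMessage(FakeConnection c, String message) {
        if (!c.isEnabled()) {
            return;
        }
        handledMessages++;
    }
}
```

A message that was already sent along a rejected connection is simply ignored, so the agent does no work on behalf of agents it has refused.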

Thread use

Agents are running autonomously. Each runs in its own thread. The DIET kernel ensures that an agent's thread is safe from malicious intervention by other agents. Messages are, for instance, passed asynchronously. When an agent sends a message, it is impossible for the receiving agent to use or even block the thread of the sending agent.

Agents are designed such that each is using only one single thread (at most). This means that when one of the agent's event handling methods is called (such as handleMessage), the agent is never simultaneously handling other events. This is convenient because it means that within an agent, no synchronization is required. So, for example, when an agent uses a look-up table for storing/retrieving values in response to messages, access to the table does not need to be synchronized.

Since agents only have one thread, they should generally not let the thread sleep (using Thread.sleep()). The following example illustrates what not to do. Imagine you want to implement an AlarmClock agent. It can handle "wake-up call request" messages, which take a single argument indicating the number of seconds after which the other agent should be sent a "wake-up call" message. If the AlarmClock agent receives a request to wake up another agent in delay seconds, it should not go to sleep for delay seconds and then reply with the "wake-up call" message. If it did so, it would not be able to handle any other messages in the meantime! Instead, the AlarmClock agent should use a Scheduler for managing all wake-up calls. This is an ARC-layer component, which is described in more detail later.
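As a rough illustration of the non-blocking alternative, the sketch below queues wake-up calls by due time instead of sleeping. WakeUpQueue is a hypothetical class written for this tutorial passage. It is not the DIET AlarmClock or Scheduler, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not the DIET AlarmClock or Scheduler: pending wake-up
// calls are queued by due time, so handling a request never blocks the thread.
class WakeUpQueue {
    private static final class Call {
        final String client;
        final long dueMillis;

        Call(String client, long dueMillis) {
            this.client = client;
            this.dueMillis = dueMillis;
        }
    }

    private final PriorityQueue<Call> pending =
        new PriorityQueue<>((a, b) -> Long.compare(a.dueMillis, b.dueMillis));

    // Handles a "wake-up call request": records it and returns immediately.
    void request(String client, long delaySeconds, long nowMillis) {
        pending.add(new Call(client, nowMillis + delaySeconds * 1000));
    }

    // Removes and returns the clients whose wake-up call is due at nowMillis.
    List<String> takeDue(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().dueMillis <= nowMillis) {
            due.add(pending.poll().client);
        }
        return due;
    }
}
```

Between requests the agent remains free to handle other messages. It only needs to call takeDue now and then, for example each time round its event loop.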

The only occasion when an agent can safely let its thread sleep is when it does not need to handle any incoming events. This is for instance the case for the Trigger agent in the sorting application, and the Migrator agent in the migrate application. Both execute a periodic active behaviour, but do not accept any connections and messages.

You should take care not to manipulate an agent using more than one thread, as they have not been designed for that. It would lead to occasional problems that are hard to track down. So in general, agents should not create their own thread(s) but only use the one provided by the kernel. Furthermore, you should not manipulate agents directly from the application's thread(s), for instance in response to a GUI interaction. There are potential concurrency problems when you do so, as the agent may be doing stuff in its own thread simultaneously. Occasionally, however, you want to manipulate agents from your application. You can do this, safely, using the ExternalControl component provided in the ARC-layer, and described in a later section.

Agents can temporarily give up their thread when they do not need it. They can do so simply by returning from their com.btexact.diet.core.imp.BasicAgent#doRun method instead of executing an "endless" loop:

  protected void doRun() {
    while (getExternalEventPortal().eventReady()) {
      update();
    }
  }

This code lets the agent retain its thread as long as there are any external events, for instance incoming messages, ready to be handled. As soon as there are no events ready, the event handling loop is exited and the agent gives up its thread. The DIET kernel will then attempt to give the agent a thread in response to subsequent external events. However, as the number of threads is limited, this is not guaranteed to succeed. When all available threads are in use, the agent does not get a thread and the event is rejected. So, if an attempt was made to send the agent a message, the message is rejected. The advantage of not having to allocate a thread for each agent is that a single DIET world can support large numbers of agents: up to several hundred thousand. It would be impossible to reach such high numbers otherwise, as the number of threads that a JVM can typically support is considerably lower. Even though agents can give up their thread, at no time is a thread shared by more than one agent. So agents are still running completely autonomously. Only the number of agents that can run at any moment is limited.
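The effect of a limited number of threads can be imitated with standard Java classes. The sketch below is only an analogy: it uses java.util.concurrent.ThreadPoolExecutor with a single thread and no queue, and the LimitedThreads class is a hypothetical name. The DIET kernel's own thread management is not shown and may well work differently.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustration with standard Java classes only; not the DIET kernel.
class LimitedThreads {
    // Tries to run two "agents" on a pool with a single thread and reports
    // whether the second one was rejected while the first held the thread.
    static boolean secondEventRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new SynchronousQueue<Runnable>(),      // waiting work is not queued
            new ThreadPoolExecutor.AbortPolicy()); // reject when no thread is free
        CountDownLatch release = new CountDownLatch(1);
        boolean rejected = false;
        try {
            // The first agent takes the only thread and keeps it until released.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                // The second agent needs a thread, but none is available.
                pool.execute(() -> { });
            } catch (RejectedExecutionException e) {
                rejected = true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return rejected;
    }
}
```

The sender finds out about the rejection immediately, which is what allows it to adapt, for instance by retrying later or by sending the message elsewhere.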

Basic debugging and visualisation

There is basic support for debugging and visualisation built into the kernel. Events can be generated, amongst others, when any of the following occurs:

an environment is added or removed from a world,
a neighbourhood link is added or removed between two environments,
an agent is created or destroyed, or migrates between environments,
a connection is established or broken between two agents,
a message is passed along a connection, or
an agent starts, terminates, suspends or resumes its execution.

In addition, there are events that signal failure, for instance when messages or connections are rejected. As long as there are interested event listeners, the kernel generates these events. Your agent does not need to do anything to enable this.

There is, however, one type of event that agents can generate directly: property events. Property events signal when properties internal to an agent have changed. They therefore allow debugging and visualisation tools to monitor, to some extent, the internal state of your agent. However, it is still up to you when you implement the agent to decide which part of the agent's state to make externally visible using properties. To add a new property, you have to give it a name and fire a property event whenever the value of the property has changed.

Two types of property can be distinguished: persistent properties and volatile properties. By definition, persistent properties are those for which the agent fires a property event in its com.btexact.diet.core.imp.BasicAgent#fireAllProperties method, and volatile properties are all other properties that the agent supports. Persistent properties have the advantage that debugging and visualisation components can directly retrieve their value as soon as a new agent is created or arrives in an environment. For volatile properties, on the other hand, property listeners only become aware of the property and its value when it first changes value.

In general, if you can make a property persistent by firing a property event with its current value in the fireAllProperties method you should do so. So properties that are based on the agent's state, maintained in its member variables, are typically persistent. Volatile properties are useful to signal specific events that do not directly affect the agent's state, and for which the agent does not have a corresponding member variable. This is for instance the case for the last_sequence_id property supported by the Crawler agent in the sorting sample application. The Crawler supports this property, as it makes it easy for debugging components to track and visualise the sequence of Linkers that the Crawler has so far "crawled" along. It is fired by the Crawler as soon as it receives a new sequence ID. There is, however, no need for the Crawler to remember this ID. So it is not stored in a member variable, and therefore it is also not fired in its fireAllProperties method.
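The difference between the two kinds of property can be sketched as follows. The PropertyListener interface and the CountingAgent class are hypothetical and much simpler than the DIET property event API. In particular, the sketch assumes that persistent properties are replayed to a listener at the moment it registers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types do not reproduce the DIET property event API.
interface PropertyListener {
    void propertyChanged(String name, Object value);
}

class CountingAgent {
    private final List<PropertyListener> listeners = new ArrayList<>();
    private int handled = 0; // state kept in a member variable: persistent property

    void addListener(PropertyListener l) {
        listeners.add(l);
        fireAllProperties(l); // a new listener learns persistent values at once
    }

    // Fires every persistent property with its current value.
    private void fireAllProperties(PropertyListener l) {
        l.propertyChanged("handled", handled);
    }

    void handleMessage(String sequenceId) {
        handled++;
        fire("handled", handled);
        // Volatile property: signalled once, never stored, so never replayed.
        fire("last_sequence_id", sequenceId);
    }

    private void fire(String name, Object value) {
        for (PropertyListener l : listeners) {
            l.propertyChanged(name, value);
        }
    }
}
```

A listener that registers after the first message still sees the current value of "handled" straight away, but only learns about "last_sequence_id" when the next message arrives.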

The DIET kernel protects the access to its listening infrastructure. The kernel allows debugging and visualisation components to register as listeners to agents (and their connections). However, agents themselves cannot listen to events. The reason is that allowing them to do so, would enable agents to monitor and even control other agents. The kernel uses a "cookie" to protect access. You can only register a listener to an agent if you know the cookie. This cookie can be obtained from the DIET world, but only before any environments are created. This means that agents cannot obtain it, but "trusted" components created before the DIET world is initialised can.

How to put it all together

After having written one or more agents, you need to put them together in one or more environments to let them execute. This is most conveniently done by subclassing com.btexact.diet.app.shared.BasicApp. Even if you decide not to use this class, it is still a good place to see how to go about initialising a DIET world.

The BasicApp class provides the following functionality:

It provides basic runtime configuration support, using a commandline interface. You can extend it to support extra parameters that are specific to your application. You do so by extending com.btexact.diet.app.shared.BasicAppArgumentParser and overriding the com.btexact.diet.app.shared.BasicApp#createArgumentParser factory method accordingly. You should also override com.btexact.diet.app.shared.BasicApp#helpOptionSummary and com.btexact.diet.app.shared.BasicApp#helpOptionDetails to provide help about the command line arguments that you added.
It creates the world, and optionally enables remote access to the world.
It creates one or more environments, and optionally neighbourhood links between them. By default, it creates a single environment, but you can change this default by overriding com.btexact.diet.app.shared.BasicAppArgumentParser#setDefaultEnvironmentsAndLinks. You can also use the commandline to configure how many environments to create, and how to link them.
It creates one or more thread pools. These pools control how many threads are allocated per environment, and whether or not environments can share the same threads.
It can generate basic output for debugging and visualisation. Basically, you can choose to enable one or more event dumpers. Event dumpers monitor all events of a specific type, and generate very basic output each time such an event occurs.

The minimal thing you have to do when extending BasicApp is overriding the com.btexact.diet.app.shared.BasicApp#createAgents method, to fill the world with agents.

Sample applications

The following three sample applications illustrate various aspects of the DIET core layer:

helloworld is a very minimal application with "Hello world!" functionality. It demonstrates what the minimum is you need to do to create your own DIET application.
migrate is an application that demonstrates agent migration. It can be used to experiment with creating worlds, environments and neighbourhood links. You can run the application across multiple machines, letting agents migrate between them.
sorting is probably the most interesting application out of the three. It uses three different types of agents, and demonstrates how simple, local interactions can be used to build an organised structure: a sorted chain of agents. It also demonstrates how to make the applications robust to system overload, by letting agents adapt in response to failure.

The ARC layer

This section provides a quick introduction to the Application Reusable Component layer. It is quite concise because it only aims to give you an idea of the main functionality provided by the ARC layer. For details on how to use specific components, you should refer to either the API or the sample applications.

Service-providing agents

The ARC layer provides several service-providing agents and jobs. The most important ones are:

com.btexact.diet.arc.services.Carrier. It provides message-based remote communication. A Carrier can carry a message to a remote environment, and deliver it to a specific agent (either specified by a family tag or by a complete identity). Carriers are short-lived. During their lifetime they only carry a single message (and optionally the reply). You create Carriers as and when you need them.
com.btexact.diet.arc.remote.MasterMirrorJob. It provides connection-based remote communication. You can use Mirrors to establish a virtual connection between two agents in different environments. Both agents are locally connected to a Mirror in their environment. As far as the agents connected to the Mirrors are concerned, their connection to the Mirror can be considered as a direct connection to the other, remote agent. It is up to the Mirrors to make the remote communication as transparent as possible, using Carriers to do so. Due to the nature of remote communication, the virtual connection between both remote agents is inevitably different from a local connection. For instance, having successfully sent a message across the connection to the Mirror does not mean that it has been or will be successfully delivered to the remote agent. There may be a network failure, the remote world may have crashed, or the remote agent may have simply disappeared. Mirrors are also created on demand, when a connection to a remote agent is required. The mirror, mirrorchat and running sample applications all demonstrate how to use the Mirror's functionality.
com.btexact.diet.arc.remote.MessageChannelProviderJob. Like Mirror agents, a MessageChannelProvider agent also provides connection-based remote communication. The functionality provided by message channels is more low-level than that provided by Mirrors though. In fact, Mirrors use message channels to implement their functionality. Using message channels directly is therefore slightly more efficient. On the other hand, agents need to have explicit support for message channels to use them. So, use message channels if you want to minimise the communication overhead, and do not mind the extra code that is required in the agents that use them. The channelchat sample application demonstrates how agents can use message channels for remote communication.
com.btexact.diet.arc.services.DatagramTransceiver. It provides UDP-based remote communication. This differs from the remote communication provided by Carriers, Mirrors and MessageChannelProviders, which all indirectly use TCP sockets for communication. TCP is connection-based and more reliable than UDP. However, UDP is more lightweight, and especially suitable when messages are short and when fast, efficient message delivery is more important than reliable message delivery. A single DatagramTransceiver agent can manage the remote communication for multiple clients. So, when needed, it is typically created when the world is initialised, and clients connect to it as and when they need to.
com.btexact.diet.arc.services.AlarmClock. It provides the ability to send agents a "wake up call" message at a specified time in the future. It can be used by other agents to give up their thread, even when they want to perform an action some time later. Typically, you do not interact with this agent directly, but do so through the Scheduler interface, as is discussed in the next section.

Scheduling

Some agents want to schedule actions for execution sometime later. One example is if an agent periodically wants to perform an action. Another example is when an agent wants to perform a "time out" check after having sent a message. For instance, if it has not received a reply within 200 ms, it may disconnect and try sending the query elsewhere. As was discussed earlier, an agent should not put its thread to sleep as this will prevent it from handling any external events in the meantime. Instead, it should use the scheduling functionality provided in the ARC layer.

The com.btexact.diet.arc.Scheduler interface should be used for scheduling events for execution in the future. It can be used to schedule multiple events at once. The com.btexact.diet.arc.ScheduleEvent class is the baseclass for schedule events. It includes the time when the event is due, and implements the Runnable interface so that the event can be executed when it is due.

The ARC package provides two different implementations of the Scheduler interface:

com.btexact.diet.arc.SchedulerEventManager is an implementation of the scheduler functionality that is entirely internal to the agent. This event manager is responsible for managing all schedule events, and for awaking the agent when a schedule event is due. If an agent uses this scheduler, it always retains its thread when one or more schedule events are still awaiting execution.
com.btexact.diet.arc.jobs.SchedulerJob implements the scheduler functionality partly "outside" the agent. More specifically, the job manages the schedule events itself, but tries to use an AlarmClock agent to notify the agent when the first event is due. This way, the agent can actually give up its thread, even when one or more schedule events are still awaiting execution.

The primes sample application demonstrates how to use the scheduling functionality. It uses scheduling for periodically initiating "prime checking" sessions. It also uses scheduling to check if replies to queries have been received in time, and if not, it sends the query to a different agent.

Event managers

Every agent has some event managing capabilities built into it: the ability to respond to external events such as incoming messages. However, some agents also want to be able to handle other types of events, for instance the schedule events introduced in the previous section. The ARC layer defines a more general event managing infrastructure to facilitate this.

The com.btexact.diet.arc.EventManager is the interface that reusable event managers should implement. It can be used by the agent to check if an event is ready, and to handle any such event. It also defines the way that multiple event managers can be combined, which is by cascading them. The first event manager can respond to any event related method call itself, but otherwise can forward the call to the next manager. Therefore, the order in which event managers are chained together affects the priority with which each type of event is handled. However, when there is a low system load, and each event can be handled as soon as it is ready, the order of the event managers does not make much difference.
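The cascading idea can be sketched as a simple chain in which each manager handles its own events first and otherwise forwards the call. ChainedManager, ScheduleManager and MessageManager are hypothetical classes. They only illustrate the principle and are not the DIET EventManager interface.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of cascading; this is not the DIET EventManager interface.
abstract class ChainedManager {
    private final ChainedManager next; // null at the end of the chain
    protected final Deque<String> events = new ArrayDeque<>();

    ChainedManager(ChainedManager next) { this.next = next; }

    void post(String event) { events.add(event); }

    // True if this manager or any manager further down the chain has an event.
    boolean eventReady() {
        return !events.isEmpty() || (next != null && next.eventReady());
    }

    // Handles one event: its own if available, otherwise forwards the call.
    String handleEvent() {
        if (!events.isEmpty()) {
            return label() + ":" + events.poll();
        }
        return next != null ? next.handleEvent() : null;
    }

    abstract String label();
}

class ScheduleManager extends ChainedManager {
    ScheduleManager(ChainedManager next) { super(next); }
    String label() { return "schedule"; }
}

class MessageManager extends ChainedManager {
    MessageManager(ChainedManager next) { super(next); }
    String label() { return "message"; }
}
```

With the ScheduleManager placed first, a pending schedule event is handled before a pending message, which is how the order of the chain determines priority.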

The com.btexact.diet.arc.EventManagingAgent is the base class for agents that want to use one or more reusable event managers. If you use this agent, you can specify which event managers it should use through its parameters when you create the agent.

Event managers can, amongst others, be used to:

... manage scheduled events and execute each scheduled event when it is due. See for example com.btexact.diet.arc.SchedulerEventManager.
... enable users to externally control agents. The com.btexact.diet.arc.ExternalControlEventManager uses "external control" events to control an agent's behaviour, as is discussed in more detail later.
... handle messages in a different order. By default, messages are handled in the order in which they arrive. However, if there are multiple messages, you could choose to handle the message with the highest priority first. This functionality is provided by the com.btexact.diet.arc.jobs.MessageOrderingJob class. It uses the external message event to sort the messages according to their priority. It then generates an "internal message" event to signal that there is a message ready to be handled.
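To give an idea of the last use listed above, priority-based message handling, the sketch below orders waiting messages by priority and falls back to arrival order for equal priorities. PriorityInbox is a hypothetical class. How MessageOrderingJob itself is implemented is not shown here, so treat the tie-breaking rule as an assumption.

```java
import java.util.PriorityQueue;

// Hypothetical sketch; it does not reproduce MessageOrderingJob itself.
class PriorityInbox {
    private static final class Entry {
        final String message;
        final int priority;
        final long arrival;

        Entry(String message, int priority, long arrival) {
            this.message = message;
            this.priority = priority;
            this.arrival = arrival;
        }
    }

    private long arrivals = 0;
    private final PriorityQueue<Entry> queue = new PriorityQueue<>((a, b) -> {
        int byPriority = Integer.compare(b.priority, a.priority); // higher first
        return byPriority != 0 ? byPriority : Long.compare(a.arrival, b.arrival);
    });

    // Stores an incoming message together with its priority.
    void receive(String message, int priority) {
        queue.add(new Entry(message, priority, arrivals++));
    }

    // Returns the highest-priority message, or null if none is waiting.
    String next() {
        Entry e = queue.poll();
        return e == null ? null : e.message;
    }
}
```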

An example of the use of an event manager can be found in the primes sample application. The PrimeMaster agent extends EventManagingAgent in order to use a SchedulerEventManager for scheduling events.

Jobs

Many behaviours are not specific to a single type of agent, for instance connecting to an AlarmClock agent to use a scheduler without necessarily retaining a thread. The ARC layer provides a job infrastructure that supports modular, reusable agent behaviours.

The com.btexact.diet.arc.Job interface needs to be implemented by all reusable behaviours, which are called jobs in short. Jobs are notified of external events that the agent receives through various event handling methods similar to those in com.btexact.diet.core.imp.BasicAgent. Since multiple jobs can be running in parallel, a job should not necessarily handle all events it receives. In general, a job should check if the event is intended for it, and if so, handle it. If not, its event handling method should return false so that one of the other jobs can handle the event.
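The convention of returning false for events that are not intended for a job can be sketched as follows. MiniJob, PrefixJob and JobDispatcher are hypothetical names, and the real Job interface has more methods than the single one shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; MiniJob is not the DIET Job interface.
interface MiniJob {
    // Returns true if the job handled the message, false to pass it on.
    boolean handleMessage(String message);
}

class PrefixJob implements MiniJob {
    private final String prefix;
    final List<String> handled = new ArrayList<>();

    PrefixJob(String prefix) { this.prefix = prefix; }

    public boolean handleMessage(String message) {
        if (!message.startsWith(prefix)) {
            return false; // not intended for this job
        }
        handled.add(message);
        return true;
    }
}

class JobDispatcher {
    private final List<MiniJob> jobs = new ArrayList<>();

    void add(MiniJob job) { jobs.add(job); }

    // Offers the message to each job in turn until one of them handles it.
    boolean dispatch(String message) {
        for (MiniJob job : jobs) {
            if (job.handleMessage(message)) {
                return true;
            }
        }
        return false;
    }
}
```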

When you implement a job, you also have to be careful how to handle missed events (which have been rejected because the corresponding event buffer was full). A job should not inspect and clear the rejected element count of the buffers directly. This goes wrong when multiple jobs run in parallel, as some jobs will be unaware that events have been missed. Instead, jobs should handle missed events in the three methods provided especially for this: #missedConnections, #missedMessages and #missedDisconnections.

Jobs have access to the agent's internal state, and through it can control the agent, using the com.btexact.diet.arc.AgentGuts. This interface provides access to the agent's protected methods. By default, all jobs have full access to the agent through its guts. However, agents may use a different implementation of AgentGuts to limit access to certain jobs.

Some jobs are active throughout the lifetime of an agent. Other jobs may only perform a temporary task, and finish when this task is completed. When a job finishes, it should notify its com.btexact.diet.arc.JobManager. If the job is the agent's only job, this will destroy the agent. Otherwise, the job is simply removed, leaving the other jobs to control the agent.

The com.btexact.diet.arc.BasicJob is a baseclass for jobs. It provides the minimal functionality common to all jobs. You typically create new jobs by subclassing from this class.

The com.btexact.diet.arc.JobAgent is an agent that supports reusable jobs. Typically you can use this class directly, without having to subclass it, because you can use jobs and event managers to fully control the agent's behaviour. The jobs and event managers it should use can be specified as parameters when the agent is created.

Agents can compose their behaviour by combining multiple jobs. The com.btexact.diet.arc.SerialJobManager can be used to execute several jobs in sequence. After the first job has finished, it will start the second job, and so on. This can be useful when an agent's behaviour can be split into various stages. For instance, an agent may first perform a random walk (one job) before it starts its main task (another job). The com.btexact.diet.arc.ParallelJobManager can be used to run multiple jobs concurrently. For instance, an agent may use a scheduler job to manage its schedule events, and another job that implements the behaviour specific to the agent which requires scheduling functionality. There is even an abstract com.btexact.diet.arc.SingleJobManager class for managing only a single job. It is useful as a baseclass for jobs that want to "wrap" their functionality around other jobs.
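Running jobs in sequence can be pictured with the minimal sketch below, in which each stage reports when it has finished so that the next one can take over. Stage and SerialStages are hypothetical and far simpler than the DIET SerialJobManager, which also has to route events to the active job.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; not the DIET SerialJobManager.
interface Stage {
    // Performs one step and returns true when the stage has finished.
    boolean step(StringBuilder log);
}

class SerialStages {
    private final Deque<Stage> remaining = new ArrayDeque<>();

    // Appends a stage to the end of the sequence.
    SerialStages then(Stage stage) {
        remaining.add(stage);
        return this;
    }

    // Steps the current stage and moves on to the next one when it finishes.
    // Returns false once all stages have completed.
    boolean step(StringBuilder log) {
        Stage current = remaining.peek();
        if (current == null) {
            return false;
        }
        if (current.step(log)) {
            remaining.poll();
        }
        return !remaining.isEmpty();
    }
}
```

For example, a "random walk" stage that needs two steps followed by a one-step "main task" stage would run as walk, walk, work.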

Jobs are being used in quite a few sample applications. First of all, helloworldjob is a job-based implementation of the "Hello world!" application. It is instructive to see how it differs from the helloworld application which is functionally equivalent but does not use jobs. The job sample application demonstrates how to compose fairly complicated agent behaviours from several simple jobs. The priority sample application is also entirely job-based. It for instance uses a subclass of the com.btexact.diet.arc.jobs.MessageOrderingJob to let agents handle messages in a priority-based order.

External control

When the user invokes a command through a GUI, you may want to let an agent perform an action in response. However, you should not manipulate the agent from the GUI thread, as this introduces multi-threading related bugs. Instead you should control agents externally using the functionality provided in the ARC-layer.

The com.btexact.diet.arc.ExternalControl interface provides a means to control agents outside their own thread. Most importantly, the #invokeLater method can be used to schedule an action which the agent will execute as soon as possible, from its own thread.
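The principle behind invokeLater can be sketched with a thread-safe queue of actions that the agent drains from its own thread. ControlledAgent is a hypothetical class and is not the DIET ExternalControl implementation. The method name invokeLater is taken from the text above, while runPendingActions, rename and names are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; not the DIET ExternalControl implementation.
class ControlledAgent {
    private final ConcurrentLinkedQueue<Runnable> actions = new ConcurrentLinkedQueue<>();
    private final List<String> state = new ArrayList<>(); // agent thread only

    // Safe to call from any thread, for example a GUI thread.
    void invokeLater(Runnable action) {
        actions.add(action);
    }

    // Called by the agent from its own thread, as part of its event loop.
    // Returns the number of actions that were executed.
    int runPendingActions() {
        int count = 0;
        Runnable action;
        while ((action = actions.poll()) != null) {
            action.run();
            count++;
        }
        return count;
    }

    void rename(String name) { state.add(name); }

    List<String> names() { return state; }
}
```

Only the queue is shared between threads. The agent's own state is touched exclusively from the agent's thread, so it still needs no synchronization.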

The com.btexact.diet.arc.jobs.ExternalControlJob is a job that makes it very easy to let job-based agents support external control. It takes care of everything, including managing the external control events and running them from the agent's thread. It also ensures that an "external-control" property is fired. This allows visualisation and debugging tools to get hold of the external control.

Even if your agent does not support jobs, it can still support external control. It can do so using the com.btexact.diet.arc.jobs.ExternalControlEventManager when it does support event managers, or com.btexact.diet.arc.jobs.BasicExternalControl when the agent does not use event managers either. When using either of these classes, you still need to ensure that the external control is accessible to the application and that the external control events are actually being handled.

How to do so is demonstrated in the zombies sample application. It defines Zombie agents that can be controlled externally through a simple GUI. There are two implementations of the Zombie agent, which are functionally equivalent: a job-based one and one that doesn't use jobs.

Conclusion

This tutorial has provided a first introduction to the DIET Agents platform. You should now have a general idea of what the DIET Agents platform is about, the underlying philosophy, its main components and how they fit together. To explore DIET further, you can examine the sample applications in detail, look more closely at the API, or maybe best of all, start writing a simple DIET application yourself.

Table of Contents