Chapter 23. ProActive File Transfer Model

23.1. Introduction and Concepts

Currently we provide support for the following types of transfer:

  • To a remote node (Push)

  • From a remote node (Pull)

The transfer can take place at any of the following moments:

  • Deployment Time: At the beginning of the application, to supply the input data.

  • Retrieval Time: At the end of the application to collect results.

  • During the user application: To transfer information between nodes.

To achieve this, we have implemented File Transfer support in two ways:

  • File Transfer API

  • Descriptor File Transfer support.

23.2. File Transfer API

23.2.1. API Definition

import org.objectweb.proactive.filetransfer.*;

static public FileVector FileTransfer.pushFile(Node n, File source, File destination);
static public FileVector FileTransfer.pushFile(Node n, File[] source, File[] destination);
static public FileVector FileTransfer.pullFile(Node n, File source, File destination);
static public FileVector FileTransfer.pullFile(Node n, File[] source, File[] destination);

These methods can be used to put and get files on a remote Node while the user's application is running. Note that these methods behave asynchronously: the call returns a FileVector without waiting for the transfer, and in the case of the pullFile method, the files it holds are futures. For further information on asynchronism and futures, please refer to the Asynchronous calls and futures section of this manual.
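
The calling pattern implied by this asynchronous behaviour can be illustrated without ProActive. The following is a minimal sketch that uses the JDK's CompletableFuture as a stand-in for a ProActive future. The class FutureSketch and the method pullFileAsync are hypothetical names chosen for illustration; they are not part of the ProActive API, and no real transfer is performed.

```java
import java.io.File;
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in: models an asynchronous pull with a JDK future.
// This is NOT the ProActive API, only an illustration of the calling pattern.
final class FutureSketch {

    // Returns immediately; the File becomes available once the simulated transfer ends.
    static CompletableFuture<File> pullFileAsync(File remoteSource, File localDest) {
        return CompletableFuture.supplyAsync(() -> {
            // A real transfer would copy remoteSource into localDest here.
            return localDest;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<File> pulled = pullFileAsync(
                new File("/remote/source/path/file"),
                new File("/local/destination/path/file"));
        // The caller is free to do other work here while the transfer runs.
        File result = pulled.join(); // blocks only when the result is actually needed
        System.out.println(result.getPath());
    }
}
```

The point of the sketch is that the caller blocks only at the moment the file is accessed, not at the moment the transfer is requested.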

23.2.2. How to use the API

In the following example, a Node is deployed using a descriptor file. A file is then pushed from localhost@localSource to nodehost@remoteDest, using the paths specified in a java.io.File type object. Afterwards, a file is pulled from nodehost@remoteSource and saved at localhost@localDest, in the same fashion.

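import java.io.File;

import org.objectweb.proactive.ProActive;
import org.objectweb.proactive.core.descriptor.data.ProActiveDescriptor;
import org.objectweb.proactive.core.descriptor.data.VirtualNode;
import org.objectweb.proactive.core.node.Node;
import org.objectweb.proactive.filetransfer.*;

// Load the descriptor and deploy the virtual node
ProActiveDescriptor pad = ProActive.getProactiveDescriptor(XML_LOCATION);

VirtualNode testVNode = pad.getVirtualNode("example");
testVNode.activate();
Node[] examplenode = testVNode.getNodes();

// Push a local file to the first deployed node
File localSource = new File("/local/source/path/file");
File remoteDest = new File("/remote/destination/path/file");
FileVector filePushed = FileTransfer.pushFile(examplenode[0], localSource, remoteDest);
filePushed.waitForAll(); // wait for push to finish

// Pull a remote file from the same node
File remoteSource = new File("/remote/source/path/file");
File localDest = new File("/local/destination/path/file");
FileVector filePulled = FileTransfer.pullFile(examplenode[0], remoteSource, localDest);
File file = filePulled.getFile(0); // wait for pull to finish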

23.3. Descriptor File Transfer

File Transfers can also be specified using ProActive Descriptors. The main advantage of this scheme is that it allows the deployment of input files and the retrieval of output files. This section concentrates on three main topics:

  • XML Descriptor File Transfer Tags

  • Deployment File Transfer

  • Retrieval File Transfer

23.3.1. XML Descriptor File Transfer Tags

The File Transfer related tags are placed at three different parts (or levels) of the descriptor.

The first one corresponds to the fileTransferDefinitions tag, which contains a list of FileTransfer definitions. A FileTransfer definition is a high level representation of the File Transfer, containing mainly the file names. It is designed so that no low level information, such as hosts, protocols or prefixes, is present (this is the role of the low level representation). The following example shows a FileTransfer definition named example:

....
</deployment>
<fileTransferDefinitions>
   <fileTransfer id="example">
      <file src="hello.dat" dest="world.dat"/>
      <file src="hello.jar" dest="world.jar"/>
      <file src="hello.class" dest="world.class"/>
      <dir src="exampledir" dest="exampledir"/>
   </fileTransfer>
   <fileTransfer id="anotherExample">
      ...
   </fileTransfer>
   ...
</fileTransferDefinitions>
<infrastructure>
....         

The FileTransfer definitions can be referenced by name from the VirtualNode tags, using two attributes: fileTransferDeploy and fileTransferRetrieve. The first one corresponds to the file transfer that will take place at deployment time, and the second one corresponds to the file transfer that the user will trigger once the user application is done.

<virtualNode name="exampleVNode" fileTransferDeploy="example" fileTransferRetrieve="example"/>

All the low level information, such as hosts, username, protocols, prefix, etc., is declared inside each process. Both fileTransferDeploy and fileTransferRetrieve are specified separately using a refid attribute. The refid can be a direct reference to a FileTransfer definition, or set using the keyword implicit. If implicit is used, then the reference will be inherited from the corresponding VirtualNode. In the following example, the Deploy mechanism references the example definition indirectly (through implicit), while the Retrieve mechanism references it directly:

<processDefinition id="xyz">
  <sshProcess>
  ...  
    <!-- Inside the process, the FileTransfer tag becomes an element instead of
         an attribute. This happens because FileTransfer information is process
         specific. Note that the destination hostname and username can be
         omitted, and implicitly inferred from the process information. -->

    <fileTransferDeploy refid="implicit"> <!-- referenceID or keyword "implicit" (inherit)-->
      <copyProtocol>processDefault, rcp, scp, pftp</copyProtocol>
      <sourceInfo prefix="/home/user"/>
      <destinationInfo prefix="/tmp" hostname="foo.org" username="smith" />
    </fileTransferDeploy>

    <fileTransferRetrieve refid="example">
      <sourceInfo prefix="/tmp"/>
      <destinationInfo prefix="/home/user"/>
    </fileTransferRetrieve>
  </sshProcess>
</processDefinition>

In the example above, fileTransferDeploy has an implicit refid. This means that the File Transfer definitions used will be inherited from the VirtualNode. The first element shown inside this tag is copyProtocol. The copyProtocol tag specifies the sequence of protocols that will be tried to achieve the FileTransfer at deployment time. Notice the processDefault keyword, which specifies the usage of the default copy protocol associated with this process. In the example, the process is an sshProcess, and therefore the Secure Copy Protocol (scp) will be tried first. To complement the higher level File Transfer definition, other information can be specified as attributes of the sourceInfo and destinationInfo elements. For fileTransferDeploy, these attributes currently are: prefix, hostname and username.

For fileTransferRetrieve, no copyProtocol needs to be specified. ProActive will use its internal mechanism to transfer the files. This implies that neither hostname nor username is required.
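
The refid rule described above (a direct reference, or the keyword implicit to inherit from the VirtualNode) can be sketched as a small helper. This is only an illustration under stated assumptions: the class RefidSketch and its resolve method are hypothetical and do not correspond to any ProActive class, and the error raised when nothing can be inherited is an assumption about reasonable behaviour.

```java
// Hypothetical helper, not part of ProActive: illustrates how a refid could be resolved.
final class RefidSketch {
    static final String IMPLICIT = "implicit";

    // processRefid:   value of the refid attribute inside the process definition.
    // virtualNodeRef: value of fileTransferDeploy / fileTransferRetrieve on the VirtualNode.
    static String resolve(String processRefid, String virtualNodeRef) {
        if (IMPLICIT.equals(processRefid)) {
            if (virtualNodeRef == null) {
                // Assumption: inheriting from a VirtualNode that declares nothing is an error.
                throw new IllegalStateException("implicit refid, but VirtualNode has no reference");
            }
            return virtualNodeRef; // inherited from the VirtualNode
        }
        return processRefid; // direct reference to a FileTransfer definition
    }

    public static void main(String[] args) {
        System.out.println(resolve("implicit", "example")); // inherited reference
        System.out.println(resolve("example", null));       // direct reference
    }
}
```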

23.3.1.1. Currently supported protocols for file transfer deployment

  • pftp (ProActive File Transfer Protocol)

  • scp (ssh processDefault)

  • rcp (rsh processDefault)

  • unicore (Unicore processDefault)

  • nordugrid (Nordugrid processDefault)
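
The way a copyProtocol sequence and the processDefault keyword combine with the list above can be sketched as follows. This is a minimal illustration, not ProActive code: the class CopyProtocolSketch, the process type labels ("ssh", "rsh", ...) and the decision to skip duplicate protocol names are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expands a copyProtocol value into an ordered protocol list.
final class CopyProtocolSketch {

    // Default copy protocol per process type, following the list above.
    // The keys are illustrative labels, not ProActive identifiers.
    static final Map<String, String> PROCESS_DEFAULT = Map.of(
            "ssh", "scp",
            "rsh", "rcp",
            "unicore", "unicore",
            "nordugrid", "nordugrid");

    static List<String> expand(String copyProtocol, String processType) {
        List<String> result = new ArrayList<>();
        for (String token : copyProtocol.split(",")) {
            String name = token.trim();
            if (name.equals("processDefault")) {
                name = PROCESS_DEFAULT.get(processType); // null if the type is unknown
            }
            // Assumption: a protocol already in the sequence is not tried twice.
            if (name != null && !name.isEmpty() && !result.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(expand("processDefault, rcp, scp, pftp", "ssh"));
    }
}
```

For an sshProcess, processDefault expands to scp, so scp ends up first in the sequence, which matches the behaviour described for the descriptor example.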

23.3.1.2. Triggering File Transfer Deploy

The File Transfer is triggered when the descriptor file is deployed. With external protocols (scp, rcp), the transfer takes place before the process deployment. With internal protocols (unicore, nordugrid), it takes place together with the process deployment. In either case, interesting things can be achieved, such as transferring the ProActive libraries to the target machine on the fly. This means that it is possible to deploy on remote machines without having ProActive pre-installed. Furthermore, when the network allows, it is also possible to transfer other required libraries such as the JRE (Java Runtime Environment).

One protocol behaves differently from the rest: the ProActive File Transfer Protocol (pftp). The pftp uses the ProActive FileTransfer API (described earlier) to transfer files between nodes. The main advantage of using the pftp is that no external copy protocols are required to transfer files at deployment time. Therefore, if the grid infrastructure does not provide a way to transfer files, a FileTransfer Deploy can still take place using the pftp. On the other hand, the main drawback of using pftp is that ProActive must already be installed on the remote machines, and thus on-the-fly deployment is not possible.

23.3.1.3. Triggering File Transfer Retrieve

Since the termination of a distributed application is difficult to detect, the responsibility for triggering the retrieval lies with the user. To achieve this, we provide a specific method that triggers the retrieval of all files associated with a VirtualNode.

import org.objectweb.proactive.core.descriptor.data.*;

public FileWrapper VirtualNode.fileTransferRetrieve();

This will trigger the retrieval, using the pftp, of all the files specified in the descriptor from every node that was deployed through this virtual node. The following shows an example:

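import java.io.File;

import org.objectweb.proactive.ProActive;
import org.objectweb.proactive.core.descriptor.data.*;
import org.objectweb.proactive.core.node.Node;

ProActiveDescriptor pad = ProActive.getProactiveDescriptor(XML_LOCATION);

VirtualNode testVNode = pad.getVirtualNode("example");
testVNode.activate();
Node[] examplenode = testVNode.getNodes();

...

// Trigger the retrieval once the application has finished its work
FileWrapper fw = testVNode.fileTransferRetrieve();
...
File[] f = fw.getFiles(); // wait for the files to arrive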

Calling getFiles() on the returned FileWrapper yields an array of type File[], representing all the retrieved files.

23.4. Advanced: FileTransfer Design

This section provides internal details and information on how the File Transfer is implemented. It is not necessary to read it in order to use the File Transfer mechanisms provided by ProActive.

23.4.1. Abstract Definition (High level)

These definitions can be referenced from a VirtualNode. They contain the most basic information of a FileTransfer:

  • A unique definition identification name.

  • Files: source and optionally the destination name.

  • Directories: source and optionally the destination name, as well as the exclude and include patterns (this feature is not yet available).

References from the VirtualNode are made using the unique definition name.

23.4.2. Concrete Definition (Low level)

These definitions contain more architecture specific information, and are therefore contained within the Process:

  • A reference to an abstract definition, or the "implicit" keyword indicating that the reference will be inherited from the VirtualNode.

  • A sequence of Copy Protocols that will be used.

  • Source and Destination information: prefix, username, hostname, file separator, etc...

If some of this information (like username or hostname) can be inferred from the process, it is not necessary to declare it in the definition. Optionally, the information contained in the protocol can be overridden if specified.
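
How an abstract definition (file names) and a concrete definition (prefixes) might combine into full paths can be sketched as follows. This is a minimal illustration under stated assumptions, not the ProActive implementation: the class MergeSketch is hypothetical, "/" is assumed as the file separator, the prefix is assumed to be prepended to the file name, and a missing destination name is assumed to default to the source name.

```java
// Hypothetical sketch: combines file names from an abstract definition
// with prefixes from a concrete definition. Not ProActive code.
final class MergeSketch {

    // Joins a prefix and a name with a single "/" (assumed file separator).
    static String join(String prefix, String name) {
        if (prefix == null || prefix.isEmpty()) {
            return name;
        }
        return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
    }

    // Returns { sourcePath, destinationPath }.
    static String[] merge(String src, String dest, String srcPrefix, String destPrefix) {
        // Assumption: when no destination name is given, the source name is reused.
        String destName = (dest == null || dest.isEmpty()) ? src : dest;
        return new String[] { join(srcPrefix, src), join(destPrefix, destName) };
    }

    public static void main(String[] args) {
        // Values taken from the descriptor examples earlier in this chapter.
        String[] paths = merge("hello.dat", "world.dat", "/home/user", "/tmp");
        System.out.println(paths[0] + " -> " + paths[1]);
    }
}
```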

23.4.3. How Deployment File Transfer Works

Figure 23.1. File Transfer Design

When a FileTransfer starts, both abstract and concrete information are merged using the FileTransfer Workshop. The result of this process corresponds to a sequence of CopyProtocols, as specified in the Concrete Definition.

Each CopyProtocol will be tried in turn before the deployment takes place, until one succeeds. After one succeeds or all fail, the process deployment will take place.
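
The "try each protocol until one succeeds" rule can be sketched as a simple loop. This is an illustration only: the class DeploySketch and the method firstSuccessful are hypothetical names, and the Predicate stands in for whatever a real copy attempt would do.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of the fallback loop; not the ProActive implementation.
final class DeploySketch {

    // Tries each protocol in order and returns the first one whose attempt succeeds.
    // An empty result means every protocol failed.
    static Optional<String> firstSuccessful(List<String> protocols, Predicate<String> attempt) {
        for (String protocol : protocols) {
            if (attempt.test(protocol)) {
                return Optional.of(protocol);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated environment in which only rcp works.
        Optional<String> used = firstSuccessful(
                List.of("scp", "rcp", "pftp"), p -> p.equals("rcp"));
        System.out.println("copy protocol used: " + used.orElse("none"));
        // The process deployment proceeds whether or not a protocol succeeded.
        System.out.println("deploying process");
    }
}
```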

23.4.4. How File Transfer API Works

The File Transfer API is built on top of ProActive's active object and future-based asynchronism model. When pulling or pushing a file from a Node, two service Active Objects (AO) are created. One is placed on the local machine and the other one on the remote site. The file is then split into blocks, and transferred over the network using remote invocations between these two AOs.
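
The block-wise transfer can be sketched in plain Java. This is a minimal illustration, not the ProActive implementation: the class BlockTransferSketch is hypothetical, the block size is an arbitrary parameter, and the remote invocations between the two active objects are replaced by passing a list of byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: splits data into blocks on the sending side
// and reassembles them on the receiving side. Not ProActive code.
final class BlockTransferSketch {

    // Sender side: cut the data into blocks of at most blockSize bytes.
    static List<byte[]> split(byte[] data, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        for (int offset = 0; offset < data.length; offset += blockSize) {
            int end = Math.min(offset + blockSize, data.length);
            blocks.add(Arrays.copyOfRange(data, offset, end));
        }
        return blocks;
    }

    // Receiver side: append the blocks in the order they arrive.
    static byte[] reassemble(List<byte[]> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] block : blocks) {
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "contents of the file to transfer".getBytes();
        List<byte[]> blocks = split(data, 8);
        byte[] received = reassemble(blocks);
        System.out.println(blocks.size() + " blocks, intact: " + Arrays.equals(data, received));
    }
}
```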

23.4.5. How Retrieve File Transfer Works

For a given VirtualNode, a File Transfer pull will take place with all the nodes deployed from this VirtualNode. The details of the file transfer will correspond to the ones specified in the descriptor file.
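
The retrieval step can be sketched as one asynchronous pull per node and per file, followed by a wait for all of them. This is only an illustration: the class RetrieveSketch is hypothetical, nodes and files are represented by plain strings, and the JDK's CompletableFuture stands in for ProActive futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a retrieval over all nodes of a virtual node.
// Nodes and files are plain strings here; no real transfer takes place.
final class RetrieveSketch {

    static List<String> retrieveAll(List<String> nodes, List<String> files) {
        // Start one simulated pull per (node, file) pair without waiting.
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String node : nodes) {
            for (String file : files) {
                pending.add(CompletableFuture.supplyAsync(() -> node + ":" + file));
            }
        }
        // Wait for every pull, keeping the order in which they were started.
        List<String> retrieved = new ArrayList<>();
        for (CompletableFuture<String> pull : pending) {
            retrieved.add(pull.join());
        }
        return retrieved;
    }

    public static void main(String[] args) {
        System.out.println(retrieveAll(List.of("node1", "node2"), List.of("world.dat")));
    }
}
```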