Grid Data Management
Moab Workload Manager® for Grids

17.13 Grid Data Management

17.13.1 Grid Data Management Overview

Moab provides a highly generalized data manager interface that allows both simple and advanced data management services to migrate data among peer clusters. Using a flexible script interface, services such as scp, NFS, and gridftp can be used to address data staging needs. This section describes data management in a peer-to-peer environment; it uses the same data staging features that are available in a single-cluster configuration.

17.13.2 Peer-to-Peer Initial Data Configuration

As with cluster data staging, there are several models that can be used separately or in concert to manage data within a peer-based grid. These models include global file systems, replicated data servers, and need-based direct input and output data migration. Peer-to-peer systems use the same configuration semantics as single-cluster systems.

At a high level, configuring data staging across a peer-to-peer relationship consists of configuring one or more storage managers, associating them with the appropriate peer resource managers, and then specifying data requirements at the local level--when the job is submitted.

17.13.3 Peer-to-Peer SCP Key Authentication

In order to use scp as the data staging protocol, we will need to create SSH keys which allow users to copy files between the two peers, without the need for passwords. For example, if UserA is present on the source peer, and his counterpart is UserB on the destination peer, then UserA will need to create an SSH key and configure UserB to allow password-less copying. This will enable UserA to copy files to and from the destination peer using Moab's data staging capabilities.

Another common scenario is that several users on the source peer are mapped to a single user on the destination peer. In this case, each user on the source peer must create a key and register it with the user at the destination peer. The steps below can be used to set up SSH keys between two (or more) peers:

NOTE: These directions were written using OpenSSH version 3.6 and may not transfer correctly to older versions.

Generate SSH Key on Source Peer

As the user who will be submitting jobs on the source peer, run the following command:

ssh-keygen -t rsa

You will be prompted for a file location and an optional passphrase. Press Return at each prompt to accept the defaults and leave the passphrase empty. When finished, this command creates two files, id_rsa and id_rsa.pub, inside the user's ~/.ssh/ directory.
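When keys must be created for many source users, answering the prompts by hand becomes tedious. The following is a minimal sketch of a non-interactive equivalent; the function name generate_staging_key and its default path are illustrative assumptions, not part of Moab.

```shell
# Sketch: create an RSA key pair without interactive prompts.
# generate_staging_key is a hypothetical helper name, not a Moab tool.
generate_staging_key() {
    local key_file="${1:-$HOME/.ssh/id_rsa}"

    # Never overwrite an existing private key.
    if [ -e "$key_file" ]; then
        echo "key already exists: $key_file" >&2
        return 1
    fi

    mkdir -p "$(dirname "$key_file")" || return 1

    # -q: quiet, -t rsa: key type, -N "": empty passphrase, -f: output file.
    ssh-keygen -q -t rsa -N "" -f "$key_file"
}
```

Calling generate_staging_key with no argument writes to ~/.ssh/id_rsa, matching the interactive defaults described above.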

Copy the Public SSH Key to the Destination Peer

Transfer the newly created public key (id_rsa.pub) to the destination peer:

scp ~/.ssh/id_rsa.pub ${DESTPEERHOST}:~

Disable Strict SSH Checking on Source Peer (Optional)

By appending the following to your ~/.ssh/config file you can disable SSH prompts which ask to add new hosts to the "known hosts file." (These prompts can often cause problems with data staging functionality.) Note that the ${DESTPEERHOST} should be the name of the host machine running the destination peer:

Host ${DESTPEERHOST}
CheckHostIP no
StrictHostKeyChecking no
BatchMode yes
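If this block is added by a setup script rather than by hand, it helps to avoid appending it twice. The sketch below assumes a hypothetical helper named add_peer_host_block; the host name and the config path are supplied by the caller.

```shell
# Sketch: append the Host block for a destination peer exactly once.
# add_peer_host_block is a hypothetical helper name.
add_peer_host_block() {
    local host="$1"
    local config="${2:-$HOME/.ssh/config}"

    if [ -z "$host" ]; then
        echo "usage: add_peer_host_block HOST [CONFIG]" >&2
        return 2
    fi

    mkdir -p "$(dirname "$config")" || return 1
    touch "$config" && chmod 600 "$config" || return 1

    # Do nothing if a block for this host is already present.
    if grep -qxF "Host $host" "$config"; then
        return 0
    fi

    cat >> "$config" <<EOF

Host $host
    CheckHostIP no
    StrictHostKeyChecking no
    BatchMode yes
EOF
}
```

For example, add_peer_host_block "${DESTPEERHOST}" updates the current user's ~/.ssh/config, and a second call with the same host leaves the file unchanged.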

Configure Destination Peer User

Now, log in to the destination peer as the destination user and set up the newly created public key to be trusted:

ssh ${DESTPEERUSER}@${DESTPEERHOST}
mkdir -p .ssh; chmod 700 .ssh
cat id_rsa.pub >> .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
rm id_rsa.pub

If multiple source users map to a single destination user, repeat the above commands for each source user's SSH public key.
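For the many-to-one case, the per-key commands can be wrapped in a loop. The following sketch assumes the public key files have already been copied to the destination peer; trust_public_keys is a hypothetical helper name, and it skips keys that are already listed so it can be rerun safely.

```shell
# Sketch: trust several public keys for one destination user.
# trust_public_keys is a hypothetical helper name.
# Usage: trust_public_keys AUTHORIZED_KEYS_FILE KEY.pub [KEY.pub ...]
trust_public_keys() {
    local auth_file="$1"
    shift
    local key_file line

    mkdir -p "$(dirname "$auth_file")" || return 1
    chmod 700 "$(dirname "$auth_file")" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    for key_file in "$@"; do
        if [ ! -r "$key_file" ]; then
            echo "cannot read $key_file" >&2
            return 1
        fi
        # Append each non-empty key line unless it is already present.
        while IFS= read -r line || [ -n "$line" ]; do
            [ -n "$line" ] || continue
            grep -qxF -- "$line" "$auth_file" || printf '%s\n' "$line" >> "$auth_file"
        done < "$key_file"
    done
}
```

A call such as trust_public_keys ~/.ssh/authorized_keys usera.pub userc.pub (file names are illustrative) covers two source users in one step.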

Configure SSH Daemon on Destination Peer

Some configuration of the SSH daemon may be required on the destination peer. Typically, this is done by editing the /etc/ssh/sshd_config file. To verify correct configuration, see that the following attributes are set (not commented):

 
RSAAuthentication    yes
PubkeyAuthentication yes
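The check for set, uncommented attributes can be scripted. The sketch below is a simple textual test only: it assumes the keywords are spelled exactly as shown and ignores Match blocks and included files. The function names are hypothetical.

```shell
# Sketch: report whether the key-authentication options are set (not commented).
# sshd_option_enabled and check_sshd_key_auth are hypothetical helper names.
sshd_option_enabled() {
    local option="$1"
    local config="${2:-/etc/ssh/sshd_config}"
    # Match an uncommented line of the form "<option> yes".
    grep -Eq "^[[:space:]]*${option}[[:space:]]+yes[[:space:]]*$" "$config"
}

check_sshd_key_auth() {
    local config="${1:-/etc/ssh/sshd_config}"
    local status=0 option

    for option in RSAAuthentication PubkeyAuthentication; do
        if sshd_option_enabled "$option" "$config"; then
            echo "$option: set"
        else
            echo "$option: not set (missing or commented out)"
            status=1
        fi
    done
    return "$status"
}
```

Running check_sshd_key_auth with no argument inspects /etc/ssh/sshd_config and returns non-zero if either attribute still needs to be enabled.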

If configuration changes were required, the SSH daemon will need to be restarted:

/etc/init.d/sshd restart

Validate Correct SSH Configuration

If everything is configured properly, the following command, issued on the source peer, should succeed without prompting for a password:

scp ${DESTPEERHOST}:/etc/motd /tmp/
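Because data staging runs unattended, it can be useful to make this check fail outright instead of waiting at a password prompt. The sketch below passes BatchMode=yes on the command line for that purpose. The function name and the SCP_CMD override (which only exists so the check can be exercised with a stand-in command) are assumptions for illustration.

```shell
# Sketch: verify password-less scp from a destination peer.
# verify_passwordless_scp and SCP_CMD are hypothetical names.
verify_passwordless_scp() {
    local host="$1"
    local scp_cmd="${SCP_CMD:-scp}"
    local dest

    dest="$(mktemp)" || return 1

    # BatchMode=yes makes scp fail rather than prompt for a password.
    if "$scp_cmd" -q -o BatchMode=yes "$host:/etc/motd" "$dest"; then
        rm -f "$dest"
        echo "password-less copy from $host succeeded"
        return 0
    fi

    rm -f "$dest"
    echo "password-less copy from $host failed" >&2
    return 1
}
```

Run it as the job-submitting user on the source peer, for example verify_passwordless_scp "${DESTPEERHOST}".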

17.13.4 Peer-to-Peer SCP Data Staging Setup

After SSH key authentication is set up between users on the source and destination peers, Moab can be configured to use SCP-based data staging. A single configuration file in the $TOOLSDIR ($PREFIX/tools) directory must be modified to enable data staging: config.dstage.pl. Modify the $removeExec, $remoteCopy, and $dataSpaceUser parameters to match your system's requirements.

Next, follow the example below to create a storage resource manager using the helper scripts included in Moab 4.5.0 or higher. After making these changes, restart Moab for them to take effect:

moab.cfg (source peer)
...
SCHEDCFG[source] MODE=NORMAL SERVER=gridhead:5353

RMCFG[peer1] SERVER=moab://gridcluster1:5353 DATARM=scp_storage
RMCFG[peer2] SERVER=moab://gridcluster2:5353 DATARM=scp_storage

DATASTAGEMODEL LOOSE
RMCFG[scp_storage] TYPE=NATIVE RESOURCETYPE=STORAGE 
RMCFG[scp_storage] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.dstage.pl
RMCFG[scp_storage] SYSTEMMODIFYURL=exec://$TOOLSDIR/system.modify.dstage.pl
RMCFG[scp_storage] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage.pl
...
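Before restarting Moab, it may be worth confirming that the files this configuration refers to are actually present. The sketch below checks the script names used in the example above; check_dstage_tools is a hypothetical helper, and the expectation that the exec:// targets must carry the execute bit is an assumption of this sketch.

```shell
# Sketch: confirm the data staging helper files exist in the tools directory.
# check_dstage_tools is a hypothetical helper name.
check_dstage_tools() {
    local tools_dir="$1"
    local status=0 script

    # The configuration file only needs to be readable.
    if [ ! -r "$tools_dir/config.dstage.pl" ]; then
        echo "missing or unreadable: $tools_dir/config.dstage.pl" >&2
        status=1
    fi

    # Assumption: scripts named in exec:// URLs need the execute bit.
    for script in cluster.query.dstage.pl system.modify.dstage.pl system.query.dstage.pl; do
        if [ ! -x "$tools_dir/$script" ]; then
            echo "missing or not executable: $tools_dir/$script" >&2
            status=1
        fi
    done

    return "$status"
}
```

Pass the directory that $TOOLSDIR resolves to on your system as the only argument.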

17.13.5 Other Peer-to-Peer Data Staging Examples

Below are other examples of how different data staging methods can be configured for different destination peers. Note that copies of the existing scripts have been modified so that they read different config.dstage.pl files--one for SCP and one for GridFTP.

Example: Two Destination Peers with SCP Server and One with GridFTP

moab.cfg (source peer)
...
SCHEDCFG[source] MODE=NORMAL SERVER=gridhead:5353
ADMINCFG[1] USERS=sys

RMCFG[peer1] SERVER=moab://gridcluster1:5353 DATARM=scp_storage
RMCFG[peer2] SERVER=moab://gridcluster2:5353 DATARM=scp_storage

RMCFG[peer3] SERVER=moab://gridcluster3:5353 DATARM=gridftp_storage  # Utilizes Globus GridFTP

RMCFG[scp_storage] TYPE=NATIVE RESOURCETYPE=STORAGE 
RMCFG[scp_storage] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.dstage-scp.pl  # These files import config.dstage-scp.pl
RMCFG[scp_storage] SYSTEMMODIFYURL=exec://$TOOLSDIR/system.modify.dstage-scp.pl
RMCFG[scp_storage] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage-scp.pl

RMCFG[gridftp_storage] TYPE=NATIVE RESOURCETYPE=STORAGE
RMCFG[gridftp_storage] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.dstage-gridftp.pl  # These files import config.dstage-gridftp.pl
RMCFG[gridftp_storage] SYSTEMMODIFYURL=exec://$TOOLSDIR/system.modify.dstage-gridftp.pl
RMCFG[gridftp_storage] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage-gridftp.pl
...
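The per-protocol script copies mentioned before this example could be produced along the following lines. This is only a sketch: it assumes each shipped script refers to its configuration file by the literal name config.dstage.pl, which should be verified against the scripts in your installation, and make_dstage_variant is a hypothetical helper name.

```shell
# Sketch: create "-SUFFIX" copies of the data staging scripts and config.
# make_dstage_variant is a hypothetical helper name.
# Usage: make_dstage_variant TOOLS_DIR SUFFIX   (for example: scp or gridftp)
make_dstage_variant() {
    local tools_dir="$1"
    local suffix="$2"
    local kind src dst

    cp "$tools_dir/config.dstage.pl" "$tools_dir/config.dstage-$suffix.pl" || return 1

    for kind in cluster.query system.modify system.query; do
        src="$tools_dir/$kind.dstage.pl"
        dst="$tools_dir/$kind.dstage-$suffix.pl"
        # ASSUMPTION: the script names its config file literally as config.dstage.pl.
        sed "s/config\.dstage\.pl/config.dstage-$suffix.pl/g" "$src" > "$dst" || return 1
        chmod +x "$dst" || return 1
    done
}
```

After creating a variant, edit its config.dstage-SUFFIX.pl copy to set the copy and execution commands appropriate for that protocol; the original scripts are left untouched.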

As the two examples above show, data staging management involves site-specific configuration. Moab's data staging capabilities provide the flexibility to accommodate most site requirements. Sample storage manager interface scripts are provided with Moab Workload Manager and may be customized as needed to support other protocols or methods. For more information about these interfaces, refer to Interface Scripts for a Storage Resource Manager.