This page explains how to use KOALA on the DAS. Users interact with KOALA through so-called runners, each of which submits (and monitors) a particular kind of job. Currently, the following KOALA runners are operational on the DAS:
Apart from the PRunner, all runners require jobs to be described in the Resource Specification Language (RSL). Before any runner can be used, the environment has to be set up. Although the main KOALA server runs on the cluster in Delft, the runners can be used from any of the DAS sites.
Before a job can be run with KOALA, the following preparations need to be made:
The first three preparations only need to be done once; a job description file has to be created for every job you run.
On the DAS, KOALA is installed in /usr/local/package/koala. To avoid typing the full path of the KOALA runners every time, add its bin directory to your PATH:
export PATH=$PATH:/usr/local/package/koala/bin
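If the export line is placed in ~/.bashrc, every new interactive shell appends the directory again, so PATH keeps growing. A minimal sketch of an idempotent alternative follows; the function name path_append is hypothetical and not part of KOALA.

```shell
#!/bin/sh
# Append a directory to PATH only if it is not already present, so that
# sourcing ~/.bashrc several times does not keep growing PATH.
# path_append is a hypothetical helper, not part of KOALA.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already in PATH: nothing to do
        *) PATH="${PATH:+$PATH:}$1" ;;    # append, avoiding a leading ':'
    esac
    export PATH
}

path_append /usr/local/package/koala/bin
```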
To load the modules that the runners depend on, use the module command (module add and module load are synonyms). For example:
module add sge
module load openmpi/gcc/default
module load globus/4.0.3
module load default-ethernet
module load default-myrinet
The last module, default-myrinet, should only be loaded if the high-speed Myri-10G interconnect is to be used. Myri-10G is not available in DAS-3/Delft, so this module should not be loaded there. You can also add the above commands to your ~/.bashrc file so that you do not have to type them again the next time you log in. If you are planning to use the OMRunner and/or the PRunner, the openmpi module and the appropriate interconnect module need to be added to the ~/.bashrc file on all five DAS-3 clusters. Here is a minimal .bashrc file that should get you started.
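Because the right interconnect module depends on the site, the same ~/.bashrc could be copied to all five clusters and choose the module by hostname. The sketch below assumes that default-ethernet and default-myrinet are alternatives and that the DAS-3/Delft head node's hostname starts with fs3; interconnect_module is a hypothetical helper. It only matches the head-node name, so if your jobs also source ~/.bashrc on compute nodes, the pattern would need extending.

```shell
#!/bin/sh
# Pick the interconnect module for the current DAS-3 site (sketch).
# ASSUMPTIONS: default-ethernet and default-myrinet are alternatives, and
# the DAS-3/Delft head node's hostname starts with "fs3".
interconnect_module() {
    case "$1" in
        fs3*) echo default-ethernet ;;  # Myri-10G is not available in Delft
        *)    echo default-myrinet ;;   # use Myri-10G at the other sites
    esac
}

# In ~/.bashrc, after the other module commands, one would then add:
#   module load "$(interconnect_module "$(hostname)")"
```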
SSH public key authentication only needs to be set up if you are planning to use the OMRunner and/or the PRunner. It is required so that ssh can run between the DAS-3 sites without asking for a password. We have written a script to assist those who do not know how to set up SSH public key authentication. Simply run:
/usr/local/package/koala/bin/kssh_keygen.sh -all
and follow the instructions. This script can also be used to push your ssh keys to a remote host. Below, we show all the options available from this script.
[koala1@fs3 ~]$ /usr/local/package/koala/KOALA2_RC1/bin/kssh_keygen.sh -h
Usage: kssh_keygen.sh <-g|-all> | <-rh user@hostname>
The -g (or -all) option generates the DSA keys if they do not exist yet and pushes them to all the DAS-3 clusters; the -rh (or --rhost) user@hostname option generates the keys if needed and pushes them to the given remote host.
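On the receiving host, pushing a key essentially means appending the public key to ~/.ssh/authorized_keys. For readers who want to see that step, here is a minimal sketch that does it idempotently with standard tools; authorize_key is a hypothetical helper, not an option of kssh_keygen.sh, and id_dsa.pub is simply the usual OpenSSH default name for a DSA public key.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file unless it is already
# listed there (hypothetical helper, not part of kssh_keygen.sh).
# $1: public key file, e.g. ~/.ssh/id_dsa.pub
# $2: authorized_keys file, e.g. ~/.ssh/authorized_keys
authorize_key() {
    mkdir -p "$(dirname "$2")"
    chmod 700 "$(dirname "$2")"
    touch "$2"
    chmod 600 "$2"   # sshd's StrictModes check rejects files others can write to
    if ! grep -qxF -f "$1" "$2"; then
        cat "$1" >> "$2"
    fi
}

# Example (on a host that shares your home directory):
#   authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
```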
If you are planning to use the KRunner or the IRunner, you need a valid Globus certificate, which can be requested from DutchGrid. Initialise the Globus proxy and make sure enough time is left on it to run your job:
[hashim@fs3 ~]$ grid-proxy-init
Your identity: /O=dutchgrid/O=users/O=tudelft/CN=xxxx
Enter GRID pass phrase for this identity:
Creating proxy .......................... Done
Your proxy is valid until: Fri Aug 17 23:20:03 2007
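The remaining lifetime of an existing proxy can be read with grid-proxy-info -timeleft, which prints the number of seconds left. A minimal sketch of a pre-submission check follows, assuming that maxWallTime is given in minutes as in Globus RSL; proxy_covers_job is a hypothetical helper. Time spent waiting in the queue comes on top of the wall time, so in practice leave some headroom.

```shell
#!/bin/sh
# Succeeds if a proxy with $1 seconds left outlives a job whose
# maxWallTime is $2 minutes (ASSUMPTION: minutes, as in Globus RSL).
# An empty $1 (no proxy found) makes the test fail as well.
proxy_covers_job() {
    [ "$1" -ge $(( $2 * 60 )) ]
}

# Typical use before submitting the example job (maxWallTime = 15):
#   if ! proxy_covers_job "$(grid-proxy-info -timeleft)" 15; then
#       grid-proxy-init -valid 12:00   # request a 12-hour proxy
#   fi
```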
The CSRunner uses a different job description format, which is described here.
All other runners use the same job description format. A Job Description File (JDF) contains information about the job to be submitted, and Globus RSL is used to describe the components of the job. Each component requires a number of nodes, given by its count attribute. All the nodes of a component are always started at the same location, whereas different components can be scheduled on the same or on different locations. How to use RSL to construct a job description file can be found here.
Below is an example of a JDF for a semi-fixed job request. It describes an Open MPI job consisting of two separate components, the first requesting eight nodes and the second four nodes. The first component does not request a specific execution site; the second is fixed to run at fs0.das3.cs.vu.nl through its resourcemanagercontact attribute. The job writes its output (stdout) to the file output.out, and its executable takes two arguments: 3 and 4.
+
(
&( count = "8" )
( directory = "/home/hashim/demos/bin" )
( executable = "pois-das3" )
( arguments = "3" "4" )
( stdout = "output.out" )
( maxWallTime = "15" )
)
(
&( count = "4" )
( directory = "/home/hashim/demos/bin" )
( executable = "pois-das3" )
( resourcemanagercontact = "fs0.das3.cs.vu.nl" )
( stdout = "output.out" )
( arguments = "3" "4" )
( maxWallTime = "15" )
)
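When a job has many similar components, the JDF can be generated instead of typed by hand. The sketch below rebuilds the example above with a hypothetical shell function rsl_component; the directory, executable, arguments, and wall time are copied from the example and would be adapted for another job, and the output name job.jdf is likewise only an illustration.

```shell
#!/bin/sh
# Settings shared by every component (values copied from the example JDF).
JDF_DIR=/home/hashim/demos/bin
JDF_EXE=pois-das3
JDF_WALLTIME=15

# Print one RSL component (hypothetical helper).
# $1: number of nodes (count); $2: optional execution site.
rsl_component() {
    printf '(\n&( count = "%s" )\n' "$1"
    printf '( directory = "%s" )\n' "$JDF_DIR"
    printf '( executable = "%s" )\n' "$JDF_EXE"
    if [ -n "${2:-}" ]; then
        printf '( resourcemanagercontact = "%s" )\n' "$2"
    fi
    printf '( arguments = "3" "4" )\n'
    printf '( stdout = "output.out" )\n'
    printf '( maxWallTime = "%s" )\n)\n' "$JDF_WALLTIME"
}

# Write the semi-fixed two-component job: 8 nodes anywhere, 4 nodes at VU.
{
    printf '+\n'
    rsl_component 8
    rsl_component 4 fs0.das3.cs.vu.nl
} > job.jdf
```

The order of attributes inside a component differs slightly from the hand-written example, which should not matter in RSL.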
A complete list of the RSL attributes used by KOALA is described below.
The attributes count and executable are mandatory and must be specified for every job component.
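Because count and executable are mandatory, a quick check before submission can catch a component that lacks either one. This rough sketch (check_jdf is hypothetical) assumes the layout of the example above, with each component opening on a line containing "&(" and one attribute per line; it is not a full RSL parser.

```shell
#!/bin/sh
# Report JDF components that lack the mandatory count or executable
# attribute; exits non-zero if any component is incomplete.
# ASSUMPTION: layout as in the example (one attribute per line,
# each component opened by a line containing "&(").
check_jdf() {
    awk '
        function finish() {
            if (n > 0 && !(has_count && has_exe)) {
                printf "component %d is missing count and/or executable\n", n
                bad = 1
            }
        }
        /&\(/                    { finish(); n++; has_count = 0; has_exe = 0 }
        /\([ ]*count[ ]*=/       { has_count = 1 }
        /\([ ]*executable[ ]*=/  { has_exe = 1 }
        END {
            finish()
            if (n == 0) { print "no components found"; bad = 1 }
            exit bad
        }
    ' "$1"
}

# Example: check_jdf job.jdf && echo "job.jdf looks complete"
```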
We are now ready to start running the job using one of KOALA's runners.