Using KOALA

This page explains how to use KOALA on the DAS. Users interact with KOALA through so-called runners. KOALA provides several runners for submitting and monitoring different kinds of jobs. Currently, the following KOALA runners are operational on the DAS:

  • OMRunner is the runner for submitting OpenMPI co-allocated jobs. In addition, any other job that does not have special requirements can be submitted with the OMRunner. The OMRunner uses SSH for the remote startup of processes.
  • PRunner is a shell-like interface to the OMRunner trimmed down for non-co-allocated jobs. The advantage of the PRunner is that you do not have to write a Job Description File; all the requirements of the job are passed on the command line.
  • KRunner is the job submission tool for clusters running Globus middleware. Any job that does not have special requirements can be submitted with the KRunner through the Globus job submission mechanisms.
  • IRunner is used for submitting Ibis jobs.
  • CSRunner (Cycle Scavenging Runner) is designed for running parameter sweep applications at low priority without getting in the way of regular grid users or local users. It uses SSH and DRMAA to perform job submissions. It first submits launchers (pilot jobs) to the execution sites, and the launchers pull jobs from the CSRunner and execute them. For the details of the CSRunner, please see the paper on the publications page.
  • MR-Runner (MapReduce Runner) is designed to deploy multiple MapReduce clusters (MR clusters) within the same physical infrastructure. KOALA is extended with two new components: a specific MR cluster configuration module called the MR-Launcher, and a global manager of all active MR-Runners called the MR-ClusterManager.

Apart from the PRunner, all the runners require jobs to be defined in the Resource Specification Language (RSL). Before any runner can be used, the environment has to be set up. Although the main KOALA server runs on the cluster in Delft, the runners can be accessed from any of the DAS sites.

Preparation before running a job

Before a job can be run with KOALA the following preparations need to be made:

  • Setting up the environment (All Runners)
  • Setting SSH public key authentication (OMRunner, PRunner, IRunner, CSRunner)
  • Obtaining a Globus certificate (KRunner and IRunner)
  • Creating a Job Description File (OMRunner, KRunner, IRunner, CSRunner)

The first three preparations only need to be done once; a Job Description File has to be created for every job you run.

Setting up the environment

On the DAS, the KOALA binaries can be found in /usr/local/package/koala. To avoid typing the full path of the KOALA runners every time, simply add the bin directory to your PATH:

export PATH=$PATH:/usr/local/package/koala/bin

To configure the shell for the various modules that the runners depend on, use the module add (or module load) command. For example:

module add sge
module load openmpi/gcc/default
module load globus/4.0.3
module load default-ethernet
module load default-myrinet

The default-myrinet module should only be loaded if the high-speed Myri-10G interconnect is to be used; this interconnect is not available in DAS-3/Delft, so the module should not be loaded there. You can also add the above commands to your ~/.bashrc file so that you do not have to type them again the next time you log in. If you are planning to use the OMRunner and/or the PRunner, then the openmpi and the appropriate interconnect modules need to be added to the ~/.bashrc file on all five DAS-3 clusters. A minimal .bashrc that should get you started is shown below.
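
The sketch below simply combines the commands from this section into a single ~/.bashrc; it is only an illustration, so adjust the module names, and in particular the interconnect module, to the site and setup you are using.

# ~/.bashrc (sketch): make the KOALA runners available and load the modules they rely on
export PATH=$PATH:/usr/local/package/koala/bin

module add sge
module load openmpi/gcc/default
module load globus/4.0.3

# Load the interconnect module that matches the site:
module load default-ethernet
# module load default-myrinet   # only on sites with Myri-10G (not DAS-3/Delft)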

Setting SSH public key authentication

SSH public key authentication only needs to be set up if you are planning to use the OMRunner or the PRunner. It is required so that password-less SSH logins between the DAS-3 sites are possible. We have written a script to assist those who do not know how to set up SSH public key authentication. Simply run:

/usr/local/package/koala/bin/kssh_keygen.sh -all

and follow the instructions. This script can also be used to push your SSH keys to a remote host. All the options available from this script are shown below.

[koala1@fs3 ~]$ /usr/local/package/koala/KOALA2_RC1/bin/kssh_keygen.sh -h
Usage: kssh_keygen.sh <-g|-all> | <-rh user@hostname>

Here, -g|-all generates the DSA keys if they do not yet exist and pushes them to all the DAS-3 clusters, while -rh|--rhost hostname generates the keys if they do not yet exist and pushes them to the given remote host.
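
If you prefer to set up the keys manually, the standard OpenSSH tools can be used to the same effect. The commands below are a generic sketch rather than part of KOALA; the user name and host name are only examples.

# Generate a DSA key pair; leave the passphrase empty to allow password-less logins
ssh-keygen -t dsa

# Push the public key to the head node of another DAS-3 site (repeat for each site)
ssh-copy-id koala1@fs0.das3.cs.vu.nl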

Obtaining a Globus certificate

If you are planning to use the KRunner or the IRunner, then a valid Globus certificate, which can be requested at DutchGrid, is required. Initialise the Globus proxy and make sure that enough time is left to run your job:

[hashim@fs3 ~]$ grid-proxy-init
Your identity: /O=dutchgrid/O=users/O=tudelft/CN=xxxx
Enter GRID pass phrase for this identity:
Creating proxy .......................... Done
Your proxy is valid until: Fri Aug 17 23:20:03 2007
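
To check how much lifetime is left on an existing proxy, or to request a proxy with a longer lifetime than the default, the standard Globus proxy tools can be used. The commands below are a sketch; the exact options may differ per Globus version, so consult grid-proxy-init -help on your site.

# Show the remaining lifetime of the current proxy, in seconds
grid-proxy-info -timeleft

# Request a proxy that is valid for 24 hours instead of the default lifetime
grid-proxy-init -valid 24:00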

Creating a Job Description File

The format of the job description is different for the CSRunner and is described here.

For the other runners we use the same job description format. A Job Description File (JDF) contains the information about the job to be submitted; we use Globus RSL to describe the components of the job. Each component requests a number of nodes, and all the (count) nodes of a component are always started at the same location. The components themselves can be scheduled on the same or on different locations. How to use RSL to construct a job description file can be found here. Below is an example of a JDF for a semi-fixed job request, which describes an OpenMPI job consisting of two separate components, one requesting eight nodes and the other requesting four nodes. The first component does not request a specific execution site; the second component is fixed to run at fs0.das3.cs.vu.nl. The job writes its output (stdout) to the file output.out, and its executable takes two arguments: 3 and 4.

+
(
&( count = "8" )
( directory = "/home/hashim/demos/bin" )
( executable = "pois-das3" )
( arguments = "3" "4" )
( stdout = "output.out" )
( maxWallTime = "15" )
)
(
&( count = "4" )
( directory = "/home/hashim/demos/bin" )
( executable = "pois-das3" )
( resourcemanagercontact = "fs0.das3.cs.vu.nl" )
( stdout = "output.out" )
( arguments = "3" "4" )
( maxWallTime = "15" )
)

A complete list of the RSL attributes used by KOALA is given below.

  • (count=value): the number of nodes requested by the component.
  • (directory=value): specifies the path of the directory to be used as the default directory for the requested job.
  • (executable=value): the name of the executable to be run on the remote machine. If the executable is not in the path specified by the directory attribute, then the absolute path of the executable is required.
  • (arguments=value [value] ["value"] ...): the command line arguments for the executable.
  • (resourcemanagercontact=value): specifies the execution site of the component.
  • (stdout=value): the name of the file in which to store the standard output of the job. If stdout is not specified, the output is written to the user's screen.
  • (stderr=value): the name of the file where the runtime errors of the application are to be redirected to.
  • (maxWallTime=value): the estimated runtime of the application in minutes. The default value of the attribute is 15 minutes.
  • (stagein="file1" ["file2"] ...): the names of the input files to be transferred before the application starts. You need to specify the full path of the input files.
  • (stageout="file1" ["file2"]...): the names of the output files to be fetched after the execution of the application finishes.
  • (environment=(var value) [(var value)] ...): the environment variables to be exported before the application starts. For example, (environment = ("LD_LIBRARY_PATH" "/home/hashim/lib:/usr/local/lib") ("TMPDIR" "/home/hashim/tmp")).

The attributes count and executable are mandatory and must be specified for every job component.
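
As an illustration of the optional attributes, the sketch below shows a JDF with a single component that stages its input and output files and sets an environment variable. It follows the layout of the example above; the executable name, file names, and paths are made up for this example.

+
(
&( count = "4" )
( directory = "/home/hashim/demos/bin" )
( executable = "render" )
( arguments = "scene.dat" )
( stdout = "render.out" )
( stderr = "render.err" )
( maxWallTime = "30" )
( stagein = "/home/hashim/demos/input/scene.dat" )
( stageout = "render.out" )
( environment = ("TMPDIR" "/home/hashim/tmp") )
)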

We are now ready to start running the job using one of KOALA's runners.

KOALA News

  • January 2013: MR-Runner upgraded! Now the MR-Runner deploys Hadoop-1.0.0 clusters, compatible with Pig-0.10.0. 

  • December 2012: KOALA 2.1 released! Deploy MapReduce clusters on DAS-4 with the KOALA MR-Runner.

  • November 2012: Best Paper Award at the MTAGS12 workshop (co-located with SC12) with work on MapReduce!

  • November 2009: KOALA 2.0 released! You can now run parameter sweep applications (PSAs) with the KOALA CSRunner.

  • April 2008: New KOALA runner! The OMRunner enables DRMAA and OpenMPI job submissions. 

  • July 2007: Paper accepted at Grid07 conference with work on scheduling malleable jobs in KOALA.

  • May 2007: KOALA has now been ported successfully to DAS-3. All the KOALA runners are operational apart from the DRunner.

  • April 2007: The KOALA IRunner has been updated to include recommendations made by the Ibis group.