KRunner - The default Globus runner

Synopsis

krunner [OPTIONS] -f jdf

Description

The KRunner is the default Globus runner of KOALA. It implements the most basic way of running a job on a grid. It can be used for almost any kind of job, but it does not implement the specific requirements that certain job types may have.

Options

-f <jdf> : the job description file to run
-l <LEVEL> : set the log4j output level (FATAL|ERROR|WARN|DEBUG)
-g : stage the executable to the execution site
-flex : the job request is flexible
-optComm : if possible, try to optimize communication
-cm : if possible, try to minimize the number of clusters used
-x <clusters> : comma-separated list of clusters not to be used
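Several of these options can be combined in a single invocation. A hypothetical example (the jdf file name and the excluded cluster are placeholders, and krunner must be on the PATH):

```shell
# Hypothetical: run myjob.jdf with DEBUG logging, a flexible job request,
# and the cluster fs0.das3.cs.vu.nl excluded from placement.
krunner -l DEBUG -flex -x fs0.das3.cs.vu.nl -f myjob.jdf
```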

Examples

The following are examples of running jobs with the KRunner.

Simple single job execution.

The first example is a very simple job that just executes "uname -n" and exits. This can be done with the rsl given below, which in this example is stored in the file 'uname-1.jdf'. The simplest way of starting a job is shown.

& ( directory = "/bin" )
  ( arguments = "-n" )
  ( executable = "uname" )
  ( maxWallTime = "15" )
  ( count = "5" )

[hashim@fs3 JDFs]$ krunner -f uname-1.jdf
Ksched - Assigned job ID 78624
Ksched - Job 78624 Assigned LOW_PRIORITY
Ksched - Reservation for component 1 succeed
Ksched - Placed component 1 on fs3.das3.tudelft.nl
Ksched - Claiming for processors for job 78624 begins
Runner - Submitting for execution component 1 to fs3.das3.tudelft.nl
GRAM - Component1 @ fs3.das3.tudelft.nl: PENDING
node358
node301
node362
node342
node310
GRAM - Component1 @ fs3.das3.tudelft.nl: DONE
Runner - Job 78624 has completed successfully

The KRunner sends a new job request to the Ksched, the KOALA scheduler. If the rsl is correct, the Ksched responds with a KOALA job id and the assigned priority level of the job. After the job has been placed successfully, the Ksched informs the runner of the execution site selected for the component, in this case fs3.das3.tudelft.nl. At the predetermined job claiming time, the Ksched instructs the runner to start claiming processors for the job components. The runner then submits the job component to the selected execution site for execution. The lines node358, node301, node362, node342, and node310 are stdout redirected from the nodes on which the command uname -n ran. The status messages are the state-transition messages from the local resource manager informing us about the progress of the job. A successful job component goes through the following stages:

  • UNSUBMITTED
  • STAGE_IN
  • PENDING
  • ACTIVE
  • STAGE_OUT
  • DONE
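These stages can be read directly from the GRAM lines in the runner output. A minimal sketch of extracting them, assuming the runner output has been saved to a file (the file name and the sample log contents here are illustrative, taken from the example above):

```shell
# Create a sample log containing runner output from the first example.
cat > krunner.log <<'EOF'
Ksched - Assigned job ID 78624
GRAM - Component1 @ fs3.das3.tudelft.nl: PENDING
GRAM - Component1 @ fs3.das3.tudelft.nl: DONE
Runner - Job 78624 has completed successfully
EOF

# Print only the state transitions reported by GRAM:
# split each "GRAM - ..." line on ": " and keep the state after it.
awk -F': ' '/^GRAM - /{print $2}' krunner.log
```

This prints one state per GRAM line (here PENDING, then DONE), which can be useful when scanning long runner logs for stuck components.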

An MPI job execution

In this example we run an MPICH application that calculates pi. The job request, shown below, is semi-fixed and consists of two components. In this example, we want the standard output of the run to be appended to the file out.dat. Note that in the rsl we have added the "jobtype" attribute; this is required by the Globus GRAM for MPI jobs.

[hashim@fs3 JDFs]$ cat cpi-mpich.jdf
+ ( &( count = "2" )
    ( directory = "/home/hashim/bin" )
    ( maxWallTime = "15" )
    ( jobtype = "mpi" )
    ( stdout = "out.dat" )
    ( executable = "/home/hashim/bin/cpi.mpich" )
    ( resourcemanagercontact = "fs2.das3.science.uva.nl" ) )
  ( &( count = "2" )
    ( directory = "/home/hashim/bin" )
    ( maxWallTime = "15" )
    ( jobtype = "mpi" )
    ( stdout = "out.dat" )
    ( executable = "/home/hashim/bin/cpi.mpich" ) )

[hashim@fs3 JDFs]$ krunner -f cpi-mpich.jdf
Ksched - Assigned job ID 78647
Ksched - Job 78647 Assigned LOW_PRIORITY
Ksched - Reservation for component 1 succeed
Ksched - Placed component 2 on fs0.das3.cs.vu.nl
Ksched - Reservation for component 2 succeed
Ksched - Placed component 1 on fs2.das3.science.uva.nl
Ksched - Claiming for processors for job 78647 begins
Runner - Submitting for execution component 2 to fs0.das3.cs.vu.nl
Runner - Submitting for execution component 1 to fs2.das3.science.uva.nl
GRAM - Component1 @ fs2.das3.science.uva.nl: STAGE_IN
GRAM - Component2 @ fs0.das3.cs.vu.nl: STAGE_IN
GRAM - Component2 @ fs0.das3.cs.vu.nl: PENDING
GRAM - Component1 @ fs2.das3.science.uva.nl: PENDING
GRAM - Component2 @ fs0.das3.cs.vu.nl: DONE
GRAM - Component1 @ fs2.das3.science.uva.nl: DONE
Runner - Job 78647 has completed successfully

[hashim@fs3 JDFs]$ more out.dat
Process 1 of 2 on node011.beowulf.cluster
Process 0 of 2 on node004.beowulf.cluster
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000000
Process 1 of 2 on node218.beowulf.cluster
Process 0 of 2 on node230.beowulf.cluster
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000000

The components are sent to fs2.das3.science.uva.nl, which was fixed in the rsl, and fs0.das3.cs.vu.nl. Since the KRunner does not support co-allocation, the two components are executed independently and hence each produces its own output.
