IRunner - The Ibis job runner

Synopsis

irunner [OPTIONS] -f jdf

Description

The IRunner is the KOALA Globus runner designed to run Ibis applications. Ibis applications use the specialized Ibis Java communication library developed at the Vrije Universiteit in Amsterdam. Ibis provides an efficient and flexible Java-based programming environment and runtime system for grid computing. More information about Ibis is available on the Ibis project website.

Options

-flex : the job request is flexible
-optComm : if possible, try to optimize communication
-cm : if possible, try to minimize the number of clusters used
-x <clusters> : comma-separated list of clusters not to be used
-ns <hostname> : specifies the hostname on which the nameserver runs
-ns-port <port> : specifies the port number on which the nameserver is listening
-key <key> : use <key> for identification with the nameserver
-no-pool : do not pass on any node-pool information to the application
-cp <classes> : extra Java classes to be included
-ibisroot <path> : use the Ibis version of your choice
-globus|-ssh : use either Globus or SSH to submit the job. The default is SSH
-l <LEVEL> : set the log4j output level (FATAL, ERROR, WARN, or DEBUG)
-jvm-params <jvm parameters> : extra Java virtual machine parameters
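
The options above can be combined on a single command line. The following is a minimal Python sketch of a helper that assembles such a command line for use in scripts. The function name `build_irunner_command` and its parameters are hypothetical and are not part of KOALA; only the flags themselves (-globus, -ssh, -x, -flex, -l, -f) come from the list above.

```python
import shlex


def build_irunner_command(jdf, submit_via=None, flexible=False,
                          exclude_clusters=(), log_level=None):
    """Return an irunner argument list built from a few documented options.

    Hypothetical helper: only the flag spellings are taken from the
    option list; everything else is an illustration.
    """
    if submit_via not in (None, "globus", "ssh"):
        raise ValueError("submit_via must be 'globus', 'ssh', or None")
    if log_level not in (None, "FATAL", "ERROR", "WARN", "DEBUG"):
        raise ValueError("unsupported log level: %r" % (log_level,))

    args = ["irunner"]
    if submit_via:
        args.append("-" + submit_via)            # -globus or -ssh
    if exclude_clusters:
        args += ["-x", ",".join(exclude_clusters)]  # comma-separated list
    if flexible:
        args.append("-flex")
    if log_level:
        args += ["-l", log_level]
    args += ["-f", jdf]                          # the job description file
    return args


if __name__ == "__main__":
    cmd = build_irunner_command("jdfs/nqueens-base.jdf",
                                flexible=True, exclude_clusters=["fs4"])
    # Print a shell-quoted form that could be pasted into a terminal.
    print(shlex.join(cmd))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without going through a shell.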

Examples

The following are examples of running jobs with the IRunner.

Simple single Ibis job execution.

This example uses Globus to submit the NQueens application and then exits. The RSL for the job is given below. The directory attribute of the RSL points to the path that holds the application's Java classes, including the class that contains the main method. For this application, the classes have been archived with the jar tool into a jar file called "nqueens.jar". The executable attribute names the class that contains the Java main method. The remaining RSL attributes have the same meaning as in the other KOALA runners. The simplest way of starting a job is shown below.

[hashim@fs3 jdfs]$ more nqueens-base.jdf
+
(&(count = "4")
(directory = "/home/hashim/share/demos/ASCI_2007/Ibis-satin/bin")
(maxWallTime = "10" )
( arguments = "20" )
( resourcemanagercontact = "fs3.das3.tudelft.nl" )
(executable = "NQueens" )
)
(&(count = "4")
(directory = "/home/hashim/share/demos/ASCI_2007/Ibis-satin/bin")
(maxWallTime = "60" )
( arguments = "20" )
( resourcemanagercontact = "fs3.das3.tudelft.nl" )
(executable = "NQueens" )
)

[hashim@fs3 jdfs]$ irunner -globus -f nqueens-base.jdf
Ksched - Assigned job ID 78651
Ksched - Job 78651 Assigned SUPERLOW_PRIORITY
Ksched - Reservation for component 1 succeed
Ksched - Reservation for component 2 succeed
Ksched - Claiming for processors for job 78651 begins
Ksched - Placed component 2 on fs3.das3.tudelft.nl
Ksched - Placed component 1 on fs3.das3.tudelft.nl
Runner - Submitting for execution component 2 to fs3.das3.tudelft.nl
Runner - No Nameserver host provided, Nameserver will be started automatically
Runner - Submitting for execution component 1 to fs3.das3.tudelft.nl
GRAM - Component1 @ fs3.das3.tudelft.nl: PENDING
GRAM - Component1 @ fs3.das3.tudelft.nl: ACTIVE
GRAM - Component2 @ fs3.das3.tudelft.nl: PENDING
GRAM - Component2 @ fs3.das3.tudelft.nl: ACTIVE
nqueens 20 started
application result nqueens (20) = 0 2 4 1 3 12 14 11 17 19 16 8 15 18 7 9 6 13 5 10
application time nqueens (20) took 35.911 s
----------SATIN STATISTICS-------
SATIN: SPAWN: 67,087 spawns, 67,087 executed, 5,328 syncs
SATIN: STEAL: 599 attempts, 153 successes (25.543 %)
SATIN: MESSAGES: intra 1,351 msgs, 97,183 bytes; inter 0 msgs, 0 bytes
--------SATIN TOTAL TIMES--------

SATIN: STEAL_TIME: total 21.899 s time/req 36.559 ms
SATIN: HANDLE_STEAL_TIME: total 14.648 s time/handle 24.454 ms
SATIN: INV SERIALIZATION_TIME: total 52.213 ms time/write 341.261 us
SATIN: INV DESERIALIZATION_TIME: total 126.980 ms time/read 829.935 us
SATIN: RET SERIALIZATION_TIME: total 44.271 ms time/write 289.353 us
SATIN: RET DESERIALIZATION_TIME: total 29.841 ms time/read 195.039 us
-------SATIN RUN TIME BREAKDOWN-----
SATIN: TOTAL_RUN_TIME: 35.913 s
SATIN: LOAD_BALANCING_TIME: agv. per machine 2.706 s ( 7.534 %)
SATIN: (DE)SERIALIZATION_TIME: agv. per machine 31.663 ms ( 0.088 %)
SATIN: TOTAL_PARALLEL_OVERHEAD: agv. per machine 2.737 s ( 7.622 %)
SATIN: USEFUL_APP_TIME: agv. per machine 33.175 s (92.378 %)
GRAM - Component1 @ fs3.das3.tudelft.nl: DONE
GRAM - Component2 @ fs3.das3.tudelft.nl: DONE
Runner - Job 78651 has completed successfully
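
A job description file like nqueens-base.jdf can also be generated from a script. The following Python sketch writes components in the same layout as the listing in this example. The function names `jdf_component` and `render_jdf` are hypothetical, and the sketch assumes that the attribute order and spacing used here are acceptable to the parser, which this page does not state.

```python
def jdf_component(count, directory, executable, arguments,
                  max_wall_time, contact=None, stagein=None):
    """Render one job component as RSL text (hypothetical helper)."""
    lines = [
        '(&(count = "%d")' % count,
        '(directory = "%s")' % directory,
        '(maxWallTime = "%d")' % max_wall_time,
        '(arguments = "%s")' % arguments,
    ]
    if contact is not None:
        # Pins the component to one cluster, as in the example listing.
        lines.append('(resourcemanagercontact = "%s")' % contact)
    if stagein is not None:
        lines.append('(stagein = "%s")' % stagein)
    lines.append('(executable = "%s")' % executable)
    lines.append(')')
    return "\n".join(lines)


def render_jdf(components):
    """Join components under the leading '+' used for multi-component jobs."""
    return "+\n" + "\n".join(components) + "\n"


if __name__ == "__main__":
    bin_dir = "/home/hashim/share/demos/ASCI_2007/Ibis-satin/bin"
    parts = [
        jdf_component(4, bin_dir, "NQueens", "20", wall,
                      contact="fs3.das3.tudelft.nl")
        for wall in (10, 60)
    ]
    print(render_jdf(parts), end="")
```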

A flexible Ibis job execution

In this example, a flexible Ibis job is submitted using SSH and DRMAA. The application is NQueens, as in the example above. The job request, shown below, asks for a total of 164 nodes, which is more than any single DAS cluster has. To ensure that the jar file nqueens.jar is present at the selected execution sites, we add the stagein parameter to the jdf. The scheduler is also told that the job is flexible and that fs4 must not be used for this execution.

[hashim@fs3 Ibis-satin]$ more jdfs/nqueens-base.jdf
+
(&(count = "164")
(directory = "/home/hashim/share/demos/ASCI_2007/Ibis-satin/bin")
(maxWallTime = "10" )
( arguments = "20" )
( stagein="/home/hashim/share/demos/ASCI_2007/Ibis-satin/bin/nqueens.jar")
(executable = "NQueens" )
)

[hashim@fs3 Ibis-satin]$ irunner -x fs4 -flex -f jdfs/nqueens-base.jdf
Ksched - Assigned job ID 78751
Ksched - Job 78751 Assigned HIGH_PRIORITY
Ksched - Splitted the flexible job 78751 into 3 components
Ksched - Reservation for component 1 succeed
Ksched - Reservation for component 2 succeed
Ksched - Reservation for component 3 succeed
Ksched - Claiming for processors for job 78751 begins
Ksched - Placed component 3 on fs2.das3.science.uva.nl
Ksched - Placed component 2 on fs3.das3.tudelft.nl
Ksched - Placed component 1 on fs0.das3.cs.vu.nl
Runner - Submitting for execution component 2 to fs3.das3.tudelft.nl
Runner - No Nameserver host provided, Nameserver will be started automatically
DRMAA - Component2 @ fs3.das3.tudelft.nl: QUEUED
DM - Input file already present on the destination cluster
Runner - Submitting for execution component 3 to fs2.das3.science.uva.nl
DM - Stagein file /home/hashim/share/demos/ASCI_2007/Ibis-satin/bin/nqueens.jar to fs0.das3.cs.vu.nl
#
DM - Transferred /home/hashim/share/demos/ASCI_2007/Ibis-satin/bin/nqueens.jar to fs0.das3.cs.vu.nl
Runner - Submitting for execution component 1 to fs0.das3.cs.vu.nl
DRMAA - Component3 @ fs2.das3.science.uva.nl: QUEUED
DRMAA - Component1 @ fs0.das3.cs.vu.nl: QUEUED
DRMAA - Component2 @ fs3.das3.tudelft.nl: ACTIVE
DRMAA - Component3 @ fs2.das3.science.uva.nl: ACTIVE
DRMAA - Component1 @ fs0.das3.cs.vu.nl: ACTIVE
nqueens 20 started
application result nqueens (20) = 0 2 4 1 3 12 14 11 17 19 16 8 15 18 7 9 6 13 5 10
application time nqueens (20) took 7.095 s
---------------SATIN STATISTICS----------------

SATIN: SPAWN: 67,087 spawns, 67,087 executed, 5,328 syncs
SATIN: STEAL: 16,780 attempts, 1,396 successes (8.319 %)
SATIN: MESSAGES: intra 32,545 msgs, 1,544,222 bytes; inter 0 msgs, 0 bytes
-------------SATIN TOTAL TIMES-------------------
SATIN: STEAL_TIME: total 807.258 s time/req 48.108 ms
SATIN: HANDLE_STEAL_TIME: total 239.199 s time/handle 14.255 ms
SATIN: INV SERIALIZATION_TIME: total 730.623 ms time/write 523.369 us
SATIN: INV DESERIALIZATION_TIME: total 4.464 s time/read 3.197 ms
SATIN: RET SERIALIZATION_TIME: total 564.583 ms time/write 404.429 us
SATIN: RET DESERIALIZATION_TIME: total 949.429 ms time/read 680.107 us
-----------SATIN RUN TIME BREAKDOWN----------------
SATIN: TOTAL_RUN_TIME: 7.096 s
SATIN: LOAD_BALANCING_TIME: agv. per machine 4.881 s (68.786 %)
SATIN: (DE)SERIALIZATION_TIME:agv. per machine 40.904 ms ( 0.576 %)
SATIN: TOTAL_PARALLEL_OVERHEAD:agv. per machine 4.922 s (69.363 %)
SATIN: USEFUL_APP_TIME: agv. per machine 2.174 s (30.637 %)
Runner - Job 78751 has completed successfully
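
When the runner is driven from a script, it can be useful to extract where each component was placed and whether the job finished. The following Python sketch parses the "Ksched - Placed component" and "Runner - Job ... has completed successfully" lines as they appear in the output above. The function names are hypothetical, and the sketch assumes the log lines keep exactly the wording shown in these examples.

```python
import re

# Patterns copied from the wording of the example output on this page.
PLACED = re.compile(r"^Ksched - Placed component (\d+) on (\S+)$")
COMPLETED = re.compile(r"^Runner - Job (\d+) has completed successfully$")


def parse_placements(log_text):
    """Map component number -> cluster head node for each placement line."""
    placements = {}
    for line in log_text.splitlines():
        match = PLACED.match(line.strip())
        if match:
            placements[int(match.group(1))] = match.group(2)
    return placements


def completed_job_id(log_text):
    """Return the job ID reported as completed, or None if no such line."""
    for line in log_text.splitlines():
        match = COMPLETED.match(line.strip())
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    sample = """\
Ksched - Placed component 3 on fs2.das3.science.uva.nl
Ksched - Placed component 2 on fs3.das3.tudelft.nl
Ksched - Placed component 1 on fs0.das3.cs.vu.nl
Runner - Job 78751 has completed successfully
"""
    for component, host in sorted(parse_placements(sample).items()):
        print(component, host)
    print("completed job:", completed_job_id(sample))
```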
