Next: Job Submission - Doing Up: Queue commands - Basic Previous: qstat Basics   Contents

qsub Basics



The qsub command is used to submit jobs to the queue. A job, as previously mentioned, is a program or task that may be assigned to run on a cluster system. The qsub command is itself simple; however, getting it to actually run your desired program may be a bit tricky. This is because qsub, when used as designed, will only run scripts. A script is a text file containing a series of instructions or commands that are carried out in sequence when run on a computer. Scripts can vary a great deal in complexity: a simple script may just change your directory and run a program, while a complex script may change your directory, run multiple programs, sort the output, compress it, and then copy it to a specific location.
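
To make the distinction concrete, here is a minimal sketch of the kind of complex script described above. The generate_numbers function, the file names, and the destination directory are hypothetical stand-ins chosen for illustration; they are not part of any real cluster setup.

```shell
#!/bin/bash
# Sketch of a multi-step script: change directory, run a program,
# sort the output, compress it, and copy it to a specific location.
# All names below are hypothetical placeholders.

set -e                                   # stop at the first failing command

work_dir="$(mktemp -d)"                  # stand-in for your working directory
dest_dir="${work_dir}/results"           # stand-in for the final location
mkdir -p "${dest_dir}"

# Stand-in for "your program": prints a few unsorted numbers.
generate_numbers() {
    printf '%s\n' 34 5 89 1 13
}

cd "${work_dir}"                         # 1. change directory
generate_numbers > raw.txt               # 2. run the program
sort -n raw.txt > sorted.txt             # 3. sort the output
gzip -f sorted.txt                       # 4. compress it (creates sorted.txt.gz)
cp sorted.txt.gz "${dest_dir}/"          # 5. copy it to a specific location

echo "Results stored in ${dest_dir}/sorted.txt.gz"
```

A simple script would consist of only steps 1 and 2.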


Note: For advanced users, although you can technically submit any type of script to qsub, as a best practice you should only submit shell scripts (e.g. bash), even if this script only invokes another script written in a different scripting language. This is because the data the scheduling system provides to a job during runtime is passed to the shell as environment variables. Also, in general, to simplify debugging, it is prudent to keep the actual job submission scripting separate from the program or task you want to run.
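
The separation recommended in this note can be sketched as a thin bash wrapper that reads the scheduler's environment variables and then hands off to the real task. PBS_JOBID and PBS_O_WORKDIR are set by PBS at run time; the fallback values and the run_task function are assumptions added so that the sketch also runs outside the cluster.

```shell
#!/bin/bash
# Thin submission wrapper: scheduler-specific details stay here, and the
# actual work lives in a separate program or script.

# PBS sets these variables for a running job. The ":-" fallbacks are only
# for trying the wrapper by hand on a machine without the scheduler.
job_id="${PBS_JOBID:-interactive}"
submit_dir="${PBS_O_WORKDIR:-$PWD}"

# Hypothetical stand-in for the real task. In practice this would be the
# line that invokes your own program or a script in another language.
run_task() {
    echo "task started for job ${1}"
}

cd "${submit_dir}" || exit 1
echo "Job ${job_id} working in ${submit_dir}"
run_task "${job_id}"
```

Keeping the wrapper this small means a failure can usually be traced either to the submission step or to the task itself, not to a mixture of both.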


Let's say I have a program that calculates an arbitrary Fibonacci number, and I want to run this program on the cluster. I have written a script file ``script.sh'' which runs this program. To submit this job I can use the following method.


Example:

[jdpoisso@axiom ~]$ cd qsub 
[jdpoisso@axiom qsub]$ ls 
fib  script.sh 
[jdpoisso@axiom qsub]$ qsub script.sh 
1056.axiom.localdomain 
[jdpoisso@axiom qsub]$ qstat 
Job id                    Name             User            Time Use S Queue 
------------------------- ---------------- --------------- -------- - ----- 
1056.axiom                script.sh        jdpoisso        00:00:38 R first          
[jdpoisso@axiom qsub]$


Here the qsub command was given the script as an argument. This tells the qsub command to connect to the scheduling system and submit the script ``script.sh'' to the cluster. If the submission is accepted, you are given a job number, in this case 1056, which can be used to monitor the job. Once resources are available to run the program, the scheduling system will send the script to one of the cluster nodes and follow its instructions to run the program. As shown in the example by the qstat command, the script started running soon after submission. The job will continue to run until it completes, at which point it will copy back the results and disappear from the job listing provided by qstat.
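
Because qsub prints the job identifier on standard output, a script can capture it and reuse it later. The sketch below uses the identifier from the example as a fixed string so that it runs without a scheduler; on a cluster the first assignment would typically be replaced by a command substitution around the qsub call. The variable names are illustrative only.

```shell
#!/bin/bash
# Identifier exactly as printed by qsub in the example above.
# On a cluster this would typically be: job_id=$(qsub script.sh)
job_id="1056.axiom.localdomain"

# Strip everything from the first dot onward to get the job number.
job_number="${job_id%%.*}"

echo "Full job identifier: ${job_id}"
echo "Job number:          ${job_number}"
```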

Example:

[jdpoisso@axiom qsub]$ qstat 
[jdpoisso@axiom qsub]$ ls 
fib  script.sh  script.sh.o1056 
[jdpoisso@axiom qsub]$



By default the only results copied back are the standard output and standard error (script.sh.o1056). These are Unix/Linux terms for anything that would be printed by the program being run. Many programs write their results to a separate file, rather than printing them. On most cluster configurations, copying these files back (if necessary) is the responsibility of the user. The instructions to do so may be placed at the end of your submitted script. More will be said about this in the sections on scripting.
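
As a sketch of what copying a result file back might look like, the script below runs a stand-in program that writes to a file and then copies that file to the submission directory as its last step. PBS_O_WORKDIR is set by PBS; the scratch directory, the my_program function, and the file name result.dat are hypothetical, and the fallback to a temporary directory exists only so that the sketch runs outside the cluster.

```shell
#!/bin/bash
# Run in a scratch directory, then copy the result file back.

# Directory the job was submitted from (temporary directory if run by hand).
submit_dir="${PBS_O_WORKDIR:-$(mktemp -d)}"

# Hypothetical scratch location on the compute node.
scratch_dir="$(mktemp -d)"

# Stand-in for a program that writes its result to a file instead of
# printing it.
my_program() {
    echo "example result" > result.dat
}

cd "${scratch_dir}" || exit 1
my_program

# Last step of the script: copy the result file back to where the job
# was submitted from.
cp result.dat "${submit_dir}/"
echo "Copied result.dat to ${submit_dir}"
```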



Note: You may or may not be notified by email when your job completes, depending on the cluster configuration. If you wish to receive an email when your job completes (or fails), instructions for doing so may be found in the Scripting Details section under PBS directives.
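
For orientation, the directives involved usually look like the following. This is a sketch assuming a standard PBS/Torque installation, where -m ae requests mail when the job aborts or ends and -M names the recipient. The address is a placeholder, and whether mail is actually delivered depends on the cluster configuration.

```shell
#!/bin/bash
#PBS -m ae
#PBS -M user@example.com

# To bash the #PBS lines are ordinary comments; only qsub interprets them.
echo "Job body runs here"
```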



Example Script -



The script used in the previous qsub example and its output are listed below. To learn about scripting, see the next section and the section on Scripting Details, or refer to other documentation on scripting in Unix/Linux.

script.sh :

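	#!/bin/bash 
	#PBS -j oe 
	# The directive above merges standard error into the standard output file.

	# List the node(s) assigned to this job.
	echo "Running on: " 
	cat "${PBS_NODEFILE}" 

	echo 
	echo "Program Output begins: " 

	# Change to the directory the job was submitted from, then run the program.
	cd "${PBS_O_WORKDIR}" || exit 1

	./fib 46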

script.sh.o1056 :

	Running on: 
	compute-0-19 

	Program Output begins: 
	1,      1,      2,      3,      5, 
	8,      13,     21,     34,     55, 
	89,     144,    233,    377,    610, 
	987,    1597,   2584,   4181,   6765, 
	10946,  17711,  28657,  46368,  75025, 
	121393, 196418, 317811, 514229, 832040, 
	1346269,        2178309,        3524578,        5702887,        9227465, 
	14930352,       24157817,       39088169,       63245986,       102334155, 
	165580141,      267914296,      433494437,      701408733,      1134903170, 
	1836311903,
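
The fib program itself is not listed in this section. For readers who want to try the example, the following bash function is a hypothetical stand-in (not the author's program) that prints the first N Fibonacci numbers, five per line, in a layout similar to the output above.

```shell
#!/bin/bash
# Hypothetical stand-in for ./fib: print the first N Fibonacci numbers,
# each followed by a comma, five per line.
fib() {
    local count="$1"
    local a=1 b=1 i next
    for (( i = 1; i <= count; i++ )); do
        printf '%s,' "${a}"
        if (( i % 5 == 0 || i == count )); then
            printf '\n'          # end the row after five numbers or at the end
        else
            printf '\t'          # otherwise separate the columns with a tab
        fi
        next=$(( a + b ))
        a="${b}"
        b="${next}"
    done
}

fib 46
```

Bash integer arithmetic is 64-bit on common platforms, so the 46th value fits without overflow.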



2010-08-27