shell - Wait for all jobs of a user to finish before submitting subsequent jobs to a PBS cluster
I am trying to adjust some bash scripts to make them run on a (PBS) cluster.
The individual tasks are performed by several scripts that are started by a main script. So far, this main script starts multiple scripts in the background (by appending &), making them run in parallel on one multi-core machine. I want to substitute these calls with qsubs to distribute the load across the cluster nodes.
However, some jobs depend on others to be finished before they can start. So far, this was achieved with wait statements in the main script. What is the best way to do this using the grid engine?
I found this question as well as the -W after:jobid[:jobid...] documentation in the qsub man page, but I hope there is a better way. We are talking about several thousand jobs that run in parallel first, and another set of the same size that should run simultaneously after the last one of these has finished. This means I would have to queue a lot of jobs depending on a lot of jobs.
I could bring this down by using a dummy job in between that does nothing but depend on the first group of jobs, and on which the second group could depend. This would decrease the number of dependencies from millions to thousands, but still: it feels wrong, and I am not sure whether such a long command line would be accepted by the shell.
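The dummy-job idea can be sketched in shell roughly as follows. This is a minimal sketch, not a tested recipe: the helpers submit_stage and run_two_stages and all script names are hypothetical, it assumes qsub prints the new job ID on stdout, and whether the server accepts a dependency list with thousands of colon-separated IDs is not verified here.

```shell
#!/usr/bin/env bash
# Sketch of the "dummy barrier job" idea: stage two depends on one dummy job,
# which in turn depends (afterok) on every stage-one job.
# Assumption: qsub prints the new job ID on stdout.

# submit_stage COUNT SCRIPT [EXTRA_QSUB_ARGS...]
# Submits COUNT copies of SCRIPT and prints their IDs joined by colons.
submit_stage() {
    local count=$1 script=$2 ids="" id i=0
    shift 2
    while [ "$i" -lt "$count" ]; do
        id=$(qsub "$@" "$script") || return 1
        ids="${ids:+$ids:}$id"   # append, colon-separated
        i=$((i + 1))
    done
    printf '%s\n' "$ids"
}

# run_two_stages COUNT STAGE1 STAGE2 DUMMY   (hypothetical script names)
# Prints the ID of the dummy barrier job.
run_two_stages() {
    local first_ids barrier
    first_ids=$(submit_stage "$1" "$2") || return 1
    # One job carries all the afterok dependencies ...
    barrier=$(qsub -W depend=afterok:"$first_ids" "$4") || return 1
    # ... and each stage-two job depends only on that barrier.
    submit_stage "$1" "$3" -W depend=afterok:"$barrier" >/dev/null || return 1
    printf '%s\n' "$barrier"
}
```

With this layout each stage-two job carries a single dependency, so the total number of dependencies grows linearly with the number of jobs rather than quadratically.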
- Isn't there a way to wait for all my jobs to finish (something like qwait -u <user>)?
- Or for all jobs submitted from this script (something like qwait [-p <pid>])?
Of course it would be possible to write something like this using qstat and sleep in a while loop, but I guess this use case is important enough to have a built-in solution, and I was just incapable of figuring that one out.
What would you recommend / use in such a situation?
Addendum I:
Since it was requested in a comment:
$ qsub --version
version: 2.4.8
Maybe this is also helpful to determine the exact PBS system:
$ qsub --help
usage: qsub [-a date_time] [-A account_string] [-b secs]
      [-c [ none | { enabled | periodic | shutdown |
      depth=<int> | dir=<path> | interval=<minutes>}... ]
      [-C directive_prefix] [-d path] [-D path]
      [-e path] [-h] [-I] [-j oe] [-k {oe}] [-l resource_list] [-m n|{abe}]
      [-M user_list] [-N jobname] [-o path] [-p priority] [-P proxy_user] [-q queue]
      [-r y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] path
      [-W otherattributes=value...] [-v variable_list] [-V] [-x] [-X] [-z] [script]
Since the comments point to job arrays, I have searched the qsub man page so far, with the following results:
[...]
DESCRIPTION
[...] In addition to the above, the following environment variables will be available to the batch job.
[...]
PBS_ARRAYID
    each member of a job array is assigned a unique identifier (see -t)
[...]
OPTIONS
[...]
-t array_request
    Specifies the task ids of a job array. Single task arrays are allowed. The array_request argument is an integer id or a range of integers. Multiple ids or id ranges can be combined in a comma delimited list. Examples: -t 1-100 or -t 1,10,50-100
[...]
Addendum II:
I have tried the Torque solution given by Dmitri Chubarov, but it does not work as described.
Without a job array, it works as expected:
testuser@headnode ~ $ qsub -W depend=afterok:`qsub ./test1.sh` ./test2 && qstat
2553.testserver.domain
Job id                  Name             User            Time Use S Queue
----------------------- ---------------- --------------- -------- - -----
2552.testserver         test1            testuser               0 Q testqueue
2553.testserver         test2            testuser               0 H testqueue
testuser@headnode ~ $ qstat
Job id                  Name             User            Time Use S Queue
----------------------- ---------------- --------------- -------- - -----
2552.testserver         test1            testuser               0 R testqueue
2553.testserver         test2            testuser               0 H testqueue
testuser@headnode ~ $ qstat
Job id                  Name             User            Time Use S Queue
----------------------- ---------------- --------------- -------- - -----
2553.testserver         test2            testuser               0 R testqueue
However, when using job arrays, the second job won't start:
testuser@headnode ~ $ qsub -W depend=afterok:`qsub -t 1-2 ./test1.sh` ./test2 && qstat
2555.testserver.domain
Job id                  Name             User            Time Use S Queue
----------------------- ---------------- --------------- -------- - -----
2554-1.testserver       test1-1          testuser               0 Q testqueue
2554-2.testserver       test1-1          testuser               0 Q testqueue
2555.testserver         test2            testuser               0 H testqueue
testuser@headnode ~ $ qstat
Job id                  Name             User            Time Use S Queue
----------------------- ---------------- --------------- -------- - -----
2554-1.testserver       test1-1          testuser               0 R testqueue
2554-2.testserver       test1-2          testuser               0 R testqueue
2555.testserver         test2            testuser               0 H testqueue
testuser@headnode ~ $ qstat
Job id                  Name             User            Time Use S Queue
----------------------- ---------------- --------------- -------- - -----
2555.testserver         test2            testuser               0 H testqueue
I guess this is due to the lack of an array indication in the job ID returned by the first qsub:
testuser@headnode ~ $ qsub -t 1-2 ./test1.sh
2556.testserver.domain
As you can see, there is no ...[] indicating that this is a job array. Also, in the qstat output there are no ...[]s, only ...-1 and ...-2 indicating the array.
So the remaining question is how to format -W depend=afterok:... to make a job depend on a specified job array.
Filling in the following solution, as suggested by Jonathan in the comments.
There are several resource managers based on the original Portable Batch System: OpenPBS, Torque and PBS Professional. These systems have diverged and use different command syntax for newer features such as job arrays.
Job arrays are a convenient way to submit multiple similar jobs based on the same job script. Quoting the manual:
Sometimes users will want to submit large numbers of jobs based on the same job script. Rather than using a script to repeatedly call qsub, a feature known as job arrays exists to allow the creation of multiple jobs with one qsub command.
To submit a job array, PBS provides the following syntax:
qsub -t 0-10,13,15 script.sh
This submits jobs with ids 0,1,2,...,10,13,15.
Within the script, the variable PBS_ARRAYID carries the id of the job within the array and can be used to pick the necessary configuration.
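As an illustration, a job script could use PBS_ARRAYID to select its input. This is a hypothetical sketch: params.txt (one configuration per line, line N+1 for array index N) and pick_config are made-up names, and PBS_O_WORKDIR is assumed to hold the directory qsub was called from, as in Torque.

```shell
#!/usr/bin/env bash
# Hypothetical job script for: qsub -t 0-10,13,15 script.sh
# Assumption: params.txt holds one configuration per line; array index N
# (starting at 0, as in the example above) selects line N+1.

# pick_config INDEX FILE: print the configuration line for INDEX.
pick_config() {
    sed -n "$(($1 + 1))p" "$2"
}

# Only do real work when running as an array member.
if [ -n "${PBS_ARRAYID:-}" ]; then
    cd "${PBS_O_WORKDIR:-.}" || exit 1
    config=$(pick_config "$PBS_ARRAYID" params.txt)
    echo "array member $PBS_ARRAYID uses: $config"
    # ./my_program $config   # hypothetical payload
fi
```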
Job arrays can have specific dependency options.
Torque
The Torque resource manager appears to be the one used by the OP. It provides additional dependency options, as can be seen in the following example:
$ qsub -t 1-1000 script.sh
1234[].pbsserver.domainname
$ qsub -t 1001-2000 -W depend=afterokarray:1234[] script.sh
1235[].pbsserver.domainname
This will result in the following qstat output:
1234[]    script.sh    user    0 R queue
1235[]    script.sh    user    0 H queue
Tested on Torque version 3.0.4.
The full afterokarray syntax is documented in the qsub(1) manual.
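Wrapped in a script, the Torque recipe above might look like the following sketch. Assumptions: Torque 2.5 or later, so that qsub -t prints an ID containing [], and that afterokarray accepts the short 1234[] form used in the example, so the server suffix is stripped. submit_array_chain and the script names are hypothetical. The check for [] reflects what Addendum II observed with Torque 2.4.8, where qsub -t returned a plain ID.

```shell
#!/usr/bin/env bash
# Torque (>= 2.5, assumed): make one job array wait for another to succeed.

# submit_array_chain SCRIPT1 SCRIPT2 RANGE1 RANGE2
# e.g. submit_array_chain script.sh script.sh 1-1000 1001-2000
submit_array_chain() {
    local first second
    first=$(qsub -t "$3" "$1") || return 1
    case $first in
        *'[]'*) ;;   # looks like an array ID such as 1234[].pbsserver.domainname
        *)  echo "qsub returned '$first' without []; array dependencies" \
                 "probably need Torque >= 2.5" >&2
            return 1 ;;
    esac
    # Strip the server part: 1234[].pbsserver.domainname -> 1234[]
    second=$(qsub -t "$4" -W depend=afterokarray:"${first%%.*}" "$2") || return 1
    printf '%s\n%s\n' "$first" "$second"
}
```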
pbs professional
In PBS Professional, dependencies work uniformly on ordinary jobs and array jobs. Here is an example:
$ qsub -J 1-1000 -r y script.sh
1234[].pbsserver.domainname
$ qsub -J 1001-2000 -r y -W depend=afterok:1234[] script.sh
1235[].pbsserver.domainname
This will result in the following qstat output:
1234[]    script.sh    user    0 B queue
1235[]    script.sh    user    0 H queue
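The PBS Professional variant differs only in the array flag and in using plain afterok. A hypothetical sketch along the same lines, assuming qsub -J prints an ID like 1234[].pbsserver.domainname as in the example and that the short 1234[] form is accepted in the dependency:

```shell
#!/usr/bin/env bash
# PBS Professional sketch: afterok applied to a whole array ID.
# submit_pbspro_chain SCRIPT1 SCRIPT2 RANGE1 RANGE2   (hypothetical helper)
submit_pbspro_chain() {
    local first
    first=$(qsub -J "$3" -r y "$1") || return 1
    # Depend on the whole array, e.g. afterok:1234[] (server suffix stripped).
    qsub -J "$4" -r y -W depend=afterok:"${first%%.*}" "$2"
}
```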
Update on Torque versions
Array dependencies have been available in Torque since version 2.5.3. Job arrays in version 2.5 are not compatible with job arrays in versions 2.3 or 2.4; in particular, the [] syntax was introduced in Torque 2.5.
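Since the right approach depends on the Torque version, a script could check it first. A minimal sketch, assuming qsub --version prints a line like "version: 2.4.8" (as in Addendum I) and taking 2.5.3 as the threshold stated above; version_at_least, torque_version and dependency_strategy are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Pick a dependency strategy from the Torque version (sketch).

# version_at_least A B: succeed if dotted version A >= B (first three fields).
version_at_least() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# torque_version: print the version reported by qsub, e.g. "2.4.8".
torque_version() {
    qsub --version 2>&1 | sed -n 's/^[Vv]ersion:[[:space:]]*//p' | head -n 1
}

# dependency_strategy: "afterokarray" for Torque >= 2.5.3, else "delimiter".
dependency_strategy() {
    if version_at_least "$(torque_version)" 2.5.3; then
        echo afterokarray
    else
        echo delimiter
    fi
}
```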
Update on using a delimiter job
For Torque versions prior to 2.5, a different solution may work, based on submitting dummy delimiter jobs between the batches of jobs to be separated. It uses three dependency types: on, before (here in its beforeany form) and after.
Consider the following example:
$ delim=`qsub -W depend=on:1000 dummy.sh`
$ qsub -W depend=beforeany:$delim script.sh
1001.pbsserver.domainname
... 998 jobs ...
$ qsub -W depend=beforeany:$delim script.sh
2000.pbsserver.domainname
$ qsub -W depend=after:$delim script.sh
2001.pbsserver.domainname
...
This will result in a queue state like this:
1000      dummy.sh     user    0 H queue
1001      script.sh    user    0 R queue
...
2000      script.sh    user    0 R queue
2001      script.sh    user    0 H queue
...
This means that job #2001 will run only after the previous 1000 jobs have terminated. The rudimentary job array facilities available in Torque 2.4 can probably also be used to submit the script jobs.
This solution should also work with Torque version 2.5 and higher.
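The delimiter recipe can be scripted roughly as below. This is a sketch under assumptions: the on:count, beforeany and after dependency types behave as described above, qsub prints the job ID on stdout, and submit_with_delimiter plus the script names are hypothetical. The count given to on: must match the number of beforeany jobs, which the function ensures by using the same variable for both.

```shell
#!/usr/bin/env bash
# Delimiter-job sketch for Torque versions without array dependencies.

# submit_with_delimiter N1 SCRIPT1 N2 SCRIPT2 DUMMY
# Prints the ID of the delimiter (dummy) job.
submit_with_delimiter() {
    local n1=$1 s1=$2 n2=$3 s2=$4 dummy=$5 delim i
    # The dummy job waits for exactly n1 "before" registrations.
    delim=$(qsub -W depend=on:"$n1" "$dummy") || return 1
    i=0
    while [ "$i" -lt "$n1" ]; do
        # Each first-stage job registers itself as a predecessor of the dummy.
        qsub -W depend=beforeany:"$delim" "$s1" >/dev/null || return 1
        i=$((i + 1))
    done
    i=0
    while [ "$i" -lt "$n2" ]; do
        # Second-stage jobs may start once the dummy job has started.
        qsub -W depend=after:"$delim" "$s2" >/dev/null || return 1
        i=$((i + 1))
    done
    printf '%s\n' "$delim"
}
```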