Wait for all jobs of a user to finish before submitting subsequent jobs to a PBS cluster


I am trying to adjust some bash scripts to make them run on a () cluster.

The individual tasks are performed by several scripts that are started by a main script. So far the main script starts multiple scripts in the background (by appending &), making them run in parallel on one multi-core machine. I want to substitute these calls with qsubs to distribute the load across the cluster nodes.

However, some jobs depend on others being finished before they can start. So far, I achieved this with wait statements in the main script. What is the best way to do this using the grid engine?
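For comparison, the current single-machine approach can be sketched roughly as follows. This is a reconstruction, not the actual main script: `run_task` and `run_phase` are hypothetical names, and `run_task` only appends a marker line to `$LOG` (or to /dev/null when `LOG` is unset).

```shell
#!/usr/bin/env bash
# Single-machine pattern: start a batch of tasks in the background,
# block on `wait` until all of them have exited, then start the next batch.
# run_task is a hypothetical stand-in for one of the individual scripts.
run_task() {
  echo "$1-$2" >>"${LOG:-/dev/null}"
}

run_phase() {
  local phase=$1 count=$2 i=1
  while [ "$i" -le "$count" ]; do
    run_task "$phase" "$i" &   # each task runs in parallel on a local core
    i=$((i + 1))
  done
  wait                         # returns once every background child has exited
}
```

Used as `run_phase first 8; run_phase second 8`. Note that `wait` only waits for child processes of the current shell, so it has nothing to wait on once the tasks are handed to qsub and run on other nodes.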

I found this question and the -W depend=after:jobid[:jobid...] documentation in the qsub man page, but I hope there is a better way. We are talking about several thousand jobs that run in parallel first, and another set of the same size that runs simultaneously after the last one of these has finished. This means I would have to queue a lot of jobs, each depending on a lot of jobs.

I could bring this down by using a dummy job in between that does nothing but depend on the first group of jobs, and on which the second group would depend. This would decrease the number of dependencies from millions to thousands, but still: it feels wrong, and I am not sure whether such a long command line would be accepted by the shell.
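The dummy-job idea can be sketched as a small helper that joins the collected job IDs into one `afterok` list and submits the barrier job. The function names, `BARRIER_SCRIPT` and `./dummy.sh` are hypothetical; only `qsub -W depend=afterok:...` comes from the man page.

```shell
#!/usr/bin/env bash
# Build the dependency attribute for a barrier job that may only start after
# every listed job has finished successfully, e.g.
#   afterok_arg 12.srv 13.srv  ->  depend=afterok:12.srv:13.srv
afterok_arg() {
  local IFS=:                      # "$*" joins the arguments with ':'
  printf 'depend=afterok:%s\n' "$*"
}

# Submit the barrier job (./dummy.sh is a hypothetical no-op job script).
# qsub prints the barrier's job ID, which the second group then depends on.
submit_barrier() {
  qsub -W "$(afterok_arg "$@")" "${BARRIER_SCRIPT:-./dummy.sh}"
}
```

Usage would be along the lines of collecting `ids="$ids $(qsub ./task.sh)"` in the first loop, then `barrier=$(submit_barrier $ids)` and `qsub -W depend=afterok:"$barrier" ./task2.sh` for every job of the second group. The whole list ends up in a single argument: `getconf ARG_MAX` reports the total argument size limit, and Linux additionally caps a single argument at 128 KiB, which a few thousand IDs of about 25 bytes each stay below. Whether the PBS server accepts an attribute of that length is a separate question this sketch does not answer.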

  • Isn't there a way to wait for all my jobs to finish (something like qwait -u <user>)?
  • Or for all jobs submitted by this script (something like qwait [-p <pid>])?

Of course it is possible to write this using qstat and sleep in a while loop, but I guess this use case is important enough to have a built-in solution, and I am just incapable of figuring that one out.
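A minimal version of the qstat/sleep loop might look like this. It assumes Torque-style `qstat <id>` output (two header lines, state in the fifth column, as in the outputs in Addendum II) and treats a job that qstat no longer knows about, or that is shown in state C, as finished. `wait_for_jobs` and `POLL_INTERVAL` are hypothetical names and the 60 second default is arbitrary.

```shell
#!/usr/bin/env bash
# Poll qstat until none of the given job IDs is still queued, held or running.
wait_for_jobs() {
  local interval=${POLL_INTERVAL:-60} pending id state
  while :; do
    pending=0
    for id in "$@"; do
      # Skip the two header lines and take the "S" (state) column.
      state=$(qstat "$id" 2>/dev/null | awk 'NR > 2 { print $5 }')
      # Empty state: job already purged; C: completed but still listed.
      if [ -n "$state" ] && [ "$state" != C ]; then
        pending=1
        break
      fi
    done
    [ "$pending" -eq 0 ] && return 0
    sleep "$interval"
  done
}
```

The main script would record the IDs printed by qsub and call `wait_for_jobs $ids` before submitting the next group. The trade-off is that the main script has to stay alive on the head node for the whole first phase, and the loop does not distinguish failed jobs from successful ones.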

What would you recommend / use in such a situation?

Addendum I:

Since it was requested in a comment:

    $ qsub --version
    version: 2.4.8

Maybe this is helpful to determine the exact system:

    $ qsub --help
    usage: qsub [-a date_time] [-A account_string] [-b secs]
          [-c [ none | { enabled | periodic | shutdown |
          depth=<int> | dir=<path> | interval=<minutes>}... ]
          [-C directive_prefix] [-d path] [-D path]
          [-e path] [-h] [-I] [-j oe] [-k {oe}] [-l resource_list] [-m n|{abe}]
          [-M user_list] [-N jobname] [-o path] [-p priority] [-P proxy_user] [-q queue]
          [-r y|n] [-S path] [-t number_to_submit] [-T type] [-u user_list] [-w] path
          [-W otherattributes=value...] [-v variable_list] [-V] [-x] [-X] [-z] [script]

Since the comments point to job arrays, I have so far searched the qsub man page with the following results:

    [...]
    DESCRIPTION
    [...]
           In addition to the above, the following environment variables will be available to the batch job.
    [...]
           PBS_ARRAYID
                  each member of a job array is assigned a unique identifier (see -t)
    [...]
    OPTIONS
    [...]
           -t array_request
                  Specifies the task ids of a job array. Single task arrays are allowed.
                  The array_request argument is an integer id or a range of integers. Multiple ids or id ranges can be combined in a comma delimited list. Examples : -t 1-100 or -t 1,10,50-100
    [...]

Addendum II:

I have tried the solution given by Dmitri Chubarov, but it does not work as described.

Without a job array it works as expected:

    testuser@headnode ~ $ qsub -W depend=afterok:`qsub ./test1.sh` ./test2 && qstat
    2553.testserver.domain
    Job id                  Name             User            Time Use S Queue
    ----------------------- ---------------- --------------- -------- - -----
    2552.testserver         test1            testuser               0 Q testqueue
    2553.testserver         test2            testuser               0 H testqueue
    testuser@headnode ~ $ qstat
    Job id                  Name             User            Time Use S Queue
    ----------------------- ---------------- --------------- -------- - -----
    2552.testserver         test1            testuser               0 R testqueue
    2553.testserver         test2            testuser               0 H testqueue
    testuser@headnode ~ $ qstat
    Job id                  Name             User            Time Use S Queue
    ----------------------- ---------------- --------------- -------- - -----
    2553.testserver         test2            testuser               0 R testqueue

However, when using job arrays the second job won't start:

    testuser@headnode ~ $ qsub -W depend=afterok:`qsub -t 1-2 ./test1.sh` ./test2 && qstat
    2555.testserver.domain
    Job id                  Name             User            Time Use S Queue
    ----------------------- ---------------- --------------- -------- - -----
    2554-1.testserver       test1-1          testuser               0 Q testqueue
    2554-2.testserver       test1-1          testuser               0 Q testqueue
    2555.testserver         test2            testuser               0 H testqueue
    testuser@headnode ~ $ qstat
    Job id                  Name             User            Time Use S Queue
    ----------------------- ---------------- --------------- -------- - -----
    2554-1.testserver       test1-1          testuser               0 R testqueue
    2554-2.testserver       test1-2          testuser               0 R testqueue
    2555.testserver         test2            testuser               0 H testqueue
    testuser@headnode ~ $ qstat
    Job id                  Name             User            Time Use S Queue
    ----------------------- ---------------- --------------- -------- - -----
    2555.testserver         test2            testuser               0 H testqueue

I guess this is due to the lack of an array indication in the job ID returned by the first qsub:

    testuser@headnode ~ $ qsub -t 1-2 ./test1.sh
    2556.testserver.domain

As you can see, there is no ...[] indicating that this is a job array. Also, in the qstat output there are no ...[]s, only ...-1 and ...-2 indicating the array.

So the remaining question is how to format -W depend=afterok:... to make a job depend on a specified job array.

Filling in the following solution, as suggested by Jonathan in the comments.

There are several resource managers based on the original Portable Batch System: OpenPBS, Torque and PBS Professional. The systems have diverged and use different command syntax for newer features such as job arrays.

Job arrays are a convenient way to submit multiple similar jobs based on the same job script. Quoting the manual:

Sometimes users will want to submit large numbers of jobs based on the same job script. Rather than using a script to repeatedly call qsub, a feature known as job arrays now exists to allow the creation of multiple jobs with one qsub command.

To submit a job array, PBS provides the following syntax:

 qsub -t 0-10,13,15 script.sh 

This submits jobs with IDs 0,1,2,...,10,13,15.
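To make the array_request syntax concrete, here is an illustrative expansion of a request into the task IDs it denotes. `expand_array_request` is a hypothetical helper, not how qsub parses the option internally; it relies on `seq` and `paste` from coreutils.

```shell
#!/usr/bin/env bash
# Expand an array_request such as "0-10,13,15" into its task IDs:
# comma-separated single IDs or lo-hi ranges, printed space-separated.
expand_array_request() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS=- read -r lo hi; do
    seq "$lo" "${hi:-$lo}"    # a single ID is treated as the range id-id
  done | paste -s -d ' ' -
}
```

For example, `expand_array_request 0-10,13,15` prints the same thirteen IDs listed above.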

Within the script, the variable PBS_ARRAYID carries the ID of the job within the array and can be used to pick the necessary configuration.
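One way to pick the configuration is to keep one parameter set per line in a file and select the line from PBS_ARRAYID. This is a sketch under assumptions: `task_params` and `params.txt` are hypothetical, and since array IDs may start at 0 (as in `-t 0-10,13,15`) while sed counts lines from 1, the helper adds 1; that request would therefore need a file of at least 16 lines.

```shell
#!/usr/bin/env bash
# For use inside the array job script: print this task's line of a
# parameter file (one configuration per line, line N+1 for PBS_ARRAYID=N).
task_params() {
  local file=${1:-params.txt}
  # Abort with a message if the script is not running as part of an array.
  : "${PBS_ARRAYID:?PBS_ARRAYID is not set; not running inside a job array}"
  sed -n "$((PBS_ARRAYID + 1))p" "$file"
}
```

Inside script.sh this could be used as `set -- $(task_params) && ./worker "$@"`, where `./worker` again stands for whatever the task actually runs.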

Job arrays may have their own specific dependency options.

Torque

The Torque resource manager is the one used by the OP. It provides additional dependency options, as can be seen in the following example:

    $ qsub -t 1-1000 script.sh
    1234[].pbsserver.domainname
    $ qsub -t 1001-2000 -W depend=afterokarray:1234[] script.sh
    1235[].pbsserver.domainname

This will result in the following qstat output:

    1234[]         script.sh    user          0 R queue
    1235[]         script.sh    user          0 H queue

Tested on Torque version 3.0.4.

The full afterokarray syntax is in the qsub(1) manual.
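The Torque example above can be wrapped in a small function. This assumes Torque 2.5.3 or later and that qsub prints the array ID in the `1234[].pbsserver.domainname` form shown; since the example passes only the short form `1234[]` to afterokarray, the sketch strips the server part. `submit_two_arrays` is a hypothetical name; script.sh and the ranges come from the example.

```shell
#!/usr/bin/env bash
# Submit two job arrays; the second is held until every task of the first
# has completed successfully (afterokarray, Torque >= 2.5.3).
submit_two_arrays() {
  local first
  first=$(qsub -t 1-1000 script.sh) || return 1
  # 1234[].pbsserver.domainname -> 1234[] (drop everything from the first dot)
  qsub -t 1001-2000 -W "depend=afterokarray:${first%%.*}" script.sh
}
```

Keeping the dependency inside double quotes also prevents the shell from treating the `[]` in the array ID as a glob pattern.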

PBS Professional

In PBS Professional, dependencies work uniformly on ordinary jobs and array jobs. Here is an example:

    $ qsub -J 1-1000 -r y script.sh
    1234[].pbsserver.domainname
    $ qsub -J 1001-2000 -r y -W depend=afterok:1234[] script.sh
    1235[].pbsserver.domainname

This will result in the following qstat output:

    1234[]         script.sh    user          0 B queue
    1235[]         script.sh    user          0 H queue
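Because afterok accepts an array ID directly in PBS Professional, the example above extends to a chain of equally sized batches. This is a sketch under assumptions: `submit_batches`, the batch size and the batch count are illustrative, and the short `1234[]` form is derived from qsub's output the same way as in the example.

```shell
#!/usr/bin/env bash
# Submit `count` consecutive job arrays of `size` tasks each; every array
# after the first is held until the previous one has finished successfully.
submit_batches() {
  local size=$1 count=$2 prev="" b=0 lo hi
  while [ "$b" -lt "$count" ]; do
    lo=$((b * size + 1))
    hi=$(((b + 1) * size))
    if [ -z "$prev" ]; then
      # First batch: no dependency.
      prev=$(qsub -J "$lo-$hi" -r y script.sh) || return 1
    else
      # Later batches: wait on the previous array (short form, e.g. 1234[]).
      prev=$(qsub -J "$lo-$hi" -r y -W "depend=afterok:${prev%%.*}" script.sh) || return 1
    fi
    echo "$prev"
    b=$((b + 1))
  done
}
```

For example, `submit_batches 1000 2` issues the same two qsub calls as the example above and prints both array IDs.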

Update on Torque versions

Array dependencies became available in Torque with version 2.5.3. Job arrays in version 2.5 are not compatible with job arrays in versions 2.3 or 2.4. In particular, the [] syntax was introduced in Torque with version 2.5.
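A script that has to run on several clusters could choose between the two approaches from the version string. This sketch assumes GNU `sort -V` (version sort) is available on the head node; `version_ge` and `dependency_strategy` are hypothetical names, and the 2.5.3 threshold is the one stated above.

```shell
#!/usr/bin/env bash
# True if version $1 >= version $2 (GNU sort -V orders version strings).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Decide which dependency approach applies to a given Torque version.
dependency_strategy() {
  if version_ge "$1" 2.5.3; then
    echo afterokarray   # array dependencies available
  else
    echo delimiter      # fall back to a dummy delimiter job
  fi
}
```

With the `version: 2.4.8` output from Addendum I, `dependency_strategy "$(qsub --version 2>&1 | awk '{print $2}')"` would print `delimiter`; redirecting stderr is a precaution, since it is not confirmed here which stream qsub writes the version to.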

Update on using a delimiter job

For Torque versions prior to 2.5, a different solution may work, based on submitting dummy delimiter jobs between the batches of jobs that have to be separated. It uses three dependency types: on, before, and after.

Consider the following example:

    $ delim=`qsub -W depend=on:1000 dummy.sh`
    $ qsub -W depend=beforeany:$delim script.sh
    1001.pbsserver.domainname
    ... 998 jobs ...
    $ qsub -W depend=beforeany:$delim script.sh
    2000.pbsserver.domainname
    $ qsub -W depend=after:$delim script.sh
    2001.pbsserver.domainname
    ...

This will result in a queue state like this:

    1000         dummy.sh    user          0 H queue
    1001         script.sh   user          0 R queue
    ...
    2000         script.sh   user          0 R queue
    2001         script.sh   user          0 H queue
    ...

Job #2001 will therefore only run after the previous 1000 jobs have terminated. The rudimentary job array facilities available in Torque 2.4 can be used to submit the script jobs.

This solution also works with Torque version 2.5 and higher.
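The delimiter example can be scripted as a loop. This sketch mirrors the commands above: `submit_separated_batches` is a hypothetical name, dummy.sh and script.sh are the scripts from the example, and it prints the IDs of all submitted script jobs.

```shell
#!/usr/bin/env bash
# Delimiter-job pattern: `first` jobs registered as beforeany on a dummy job,
# followed by `second` jobs that depend on the dummy having started.
submit_separated_batches() {
  local first=$1 second=$2 delim i
  # The dummy is held until `first` before-dependencies have been satisfied.
  delim=$(qsub -W "depend=on:$first" dummy.sh) || return 1
  i=0
  while [ "$i" -lt "$first" ]; do
    # Each job releases one of the dummy's dependencies when it terminates.
    qsub -W "depend=beforeany:$delim" script.sh || return 1
    i=$((i + 1))
  done
  i=0
  while [ "$i" -lt "$second" ]; do
    # after: may start once the dummy has begun, i.e. after the first batch.
    qsub -W "depend=after:$delim" script.sh || return 1
    i=$((i + 1))
  done
}
```

`submit_separated_batches 1000 1000` reproduces the example. The count passed to `on:` must match the number of beforeany jobs exactly, otherwise the dummy job is never released or is released too early.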

