CUDA - The behavior of stream 0 (default) and other streams


In CUDA, how is stream 0 related to other streams? Does stream 0 (the default stream) execute concurrently with other streams in a context or not?

Consider the following example:

    cudaMemcpy(dst, src, sizeof(float)*datasize, cudaMemcpyHostToDevice); // stream 0
    cudaStream_t stream1;
    /* ...creating stream1... */
    somekernel<<<blocks, threads, 0, stream1>>>(dst); // stream 1

In the above code, can the compiler ensure that somekernel always launches after cudaMemcpy finishes, or will somekernel execute concurrently with cudaMemcpy?

The cudaMemcpy call is (in all but a few particular cases) a synchronous call. The host thread running that code blocks until the memory transfer has completed. It cannot proceed to launch the kernel until the cudaMemcpy call has returned, which doesn't happen until the copy operation is completed.
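The snippet from the question can be filled out into a complete translation unit roughly as follows. This is only a sketch under stated assumptions: the kernel body, the extra `n` parameter, the `CHECK` macro, and the helper name `copy_then_launch` do not appear in the question and are made up for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Illustrative helper: abort with a message if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            std::fprintf(stderr, "CUDA error: %s\n",             \
                         cudaGetErrorString(err_));              \
            std::exit(1);                                        \
        }                                                        \
    } while (0)

// Hypothetical kernel body: doubles every element in place.
__global__ void somekernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Copies src to the device via stream 0, runs the kernel in stream1,
// and returns true if every element came back doubled.
bool copy_then_launch(int datasize)
{
    std::vector<float> src(datasize, 1.0f), out(datasize, 0.0f);
    float *dst = nullptr;
    CHECK(cudaMalloc(&dst, sizeof(float) * datasize));

    // Stream 0: the host thread blocks here until the call returns.
    CHECK(cudaMemcpy(dst, src.data(), sizeof(float) * datasize,
                     cudaMemcpyHostToDevice));

    cudaStream_t stream1;
    CHECK(cudaStreamCreate(&stream1));

    // Stream 1: the launch is only issued after cudaMemcpy has returned.
    int threads = 256;
    int blocks = (datasize + threads - 1) / threads;
    somekernel<<<blocks, threads, 0, stream1>>>(dst, datasize);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream1));

    CHECK(cudaMemcpy(out.data(), dst, sizeof(float) * datasize,
                     cudaMemcpyDeviceToHost));
    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(dst));

    for (float v : out)
        if (v != 2.0f) return false;
    return true;
}
```

Calling `copy_then_launch(1 << 20)` from a `main` function should return true on a working device, since the kernel only ever sees data that the earlier copy has already placed in `dst`.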

More generally, the default stream (0 or null) implicitly serializes operations on the GPU whenever an operation is active in that stream. If you create streams and push operations into them at the same time as an operation is being performed in the default stream, any concurrency in those streams is lost until the default stream is idle.
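The implicit serialization described here applies to streams created with plain `cudaStreamCreate`. The runtime also offers `cudaStreamCreateWithFlags` with the `cudaStreamNonBlocking` flag, which creates a stream that does not synchronize implicitly with stream 0. A minimal sketch of creating both kinds and reading the flags back follows; the helper name `stream_flags_as_expected` is hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: creates one blocking and one non-blocking stream
// and reports whether the runtime records the expected flags for each.
bool stream_flags_as_expected()
{
    cudaStream_t blocking, nonblocking;

    // Plain creation: work in this stream serializes with stream 0.
    if (cudaStreamCreate(&blocking) != cudaSuccess) return false;

    // Non-blocking creation: no implicit synchronization with stream 0.
    if (cudaStreamCreateWithFlags(&nonblocking, cudaStreamNonBlocking)
            != cudaSuccess) {
        cudaStreamDestroy(blocking);
        return false;
    }

    unsigned int f1 = 0, f2 = 0;
    bool ok = cudaStreamGetFlags(blocking, &f1) == cudaSuccess &&
              cudaStreamGetFlags(nonblocking, &f2) == cudaSuccess &&
              f1 == cudaStreamDefault &&
              f2 == cudaStreamNonBlocking;

    cudaStreamDestroy(blocking);
    cudaStreamDestroy(nonblocking);
    return ok;
}
```

With a non-blocking stream, any ordering against work in stream 0 has to be made explicit, for example with events or `cudaStreamSynchronize`.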

