CUDA - The behavior of stream 0 (default) and other streams
In CUDA, how is stream 0 related to other streams? Does stream 0 (the default stream) execute concurrently with other streams in a context or not?
Consider the following example:
cudaMemcpy(dst, src, sizeof(float)*datasize, cudaMemcpyHostToDevice); // stream 0
cudaStream_t stream1;
/* ...creating stream1... */
somekernel<<<blocks, threads, 0, stream1>>>(dst); // stream 1
In the above code, can the compiler ensure that somekernel launches after cudaMemcpy finishes, or will somekernel execute concurrently with cudaMemcpy?
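The cudaMemcpy call is (except in particular cases) a synchronous call. The host thread running that code blocks until the memory transfer is complete. It cannot proceed to launch the kernel until the cudaMemcpy call has returned, which doesn't happen until the copy operation has completed. The ordering therefore comes from the runtime behavior of the call, not from anything the compiler does.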
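To make that concrete, here is a minimal sketch of the pattern from the question. The kernel `addOne`, the helper `copyThenKernel`, and the `CHECK` macro are hypothetical names made up for illustration. It assumes the program is built with nvcc's default (legacy) default-stream behavior and that the stream is created with plain `cudaStreamCreate`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                         cudaGetErrorString(err_), __LINE__);         \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: adds 1.0f to every element.
__global__ void addOne(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Copies n floats (all 1.0f) to the device in the default stream, runs
// addOne in a second stream, and returns true if every element is 2.0f.
bool copyThenKernel(int n) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> host(n, 1.0f);

    float* dev = nullptr;
    CHECK(cudaMalloc(&dev, bytes));

    cudaStream_t stream1;
    CHECK(cudaStreamCreate(&stream1));

    // Default stream: the host thread waits here until the call returns.
    CHECK(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice));

    // Only issued after cudaMemcpy has returned.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addOne<<<blocks, threads, 0, stream1>>>(dev, n);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream1));

    // Copy back and verify that the kernel saw the copied data.
    CHECK(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (host[i] != 2.0f) { ok = false; break; }
    }

    CHECK(cudaStreamDestroy(stream1));
    CHECK(cudaFree(dev));
    return ok;
}

int main() {
    std::printf("kernel saw copied data: %s\n",
                copyThenKernel(1 << 20) ? "yes" : "no");
    return 0;
}
```

If the copy were issued with `cudaMemcpyAsync` in a different stream instead, the host would not block and you would need an explicit event or stream synchronization to get the same ordering.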
More generally, the default stream (0 or null) implicitly serializes operations on the GPU whenever an operation is active in that stream. If you create streams and push operations into them at the same time as an operation is being performed in the default stream, any concurrency in those streams is lost until the default stream is idle.
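A rough way to observe this serialization is sketched below. The kernels `slowSetFlag` and `readFlag`, the helper `seenInBlockingStream`, and the `CHECK` macro are hypothetical names for illustration only. The sketch assumes the legacy default stream (that is, not compiled with `--default-stream per-thread`). A slow kernel in the default stream sets a flag at the very end; a kernel launched afterwards in an ordinary stream should only start once the default stream is idle, so it should read the flag as set. A stream created with `cudaStreamNonBlocking` opts out of that implicit synchronization, so what it reads is not guaranteed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                         cudaGetErrorString(err_), __LINE__);         \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Spins for roughly `cycles` device clock ticks, then sets *flag to 1.
__global__ void slowSetFlag(volatile int* flag, long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
    *flag = 1;
}

// Stores whatever value *flag holds at the moment this kernel runs.
__global__ void readFlag(const volatile int* flag, int* seen) {
    *seen = *flag;
}

// Returns the flag value observed from an ordinary (blocking) stream;
// also reports what a non-blocking stream observed via *seenNonBlocking.
int seenInBlockingStream(int* seenNonBlocking) {
    int* flag = nullptr;
    int* seen = nullptr;  // seen[0]: blocking stream, seen[1]: non-blocking
    CHECK(cudaMalloc(&flag, sizeof(int)));
    CHECK(cudaMalloc(&seen, 2 * sizeof(int)));
    CHECK(cudaMemset(flag, 0, sizeof(int)));
    CHECK(cudaMemset(seen, 0, 2 * sizeof(int)));

    cudaStream_t blocking, nonBlocking;
    CHECK(cudaStreamCreate(&blocking));
    CHECK(cudaStreamCreateWithFlags(&nonBlocking, cudaStreamNonBlocking));
    CHECK(cudaDeviceSynchronize());  // make sure the setup has finished

    // Long-running work in the default stream.
    slowSetFlag<<<1, 1>>>(flag, 100000000LL);
    // May overlap with the default stream: result is not guaranteed.
    readFlag<<<1, 1, 0, nonBlocking>>>(flag, seen + 1);
    // Implicitly waits for the default stream to become idle.
    readFlag<<<1, 1, 0, blocking>>>(flag, seen);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    int result[2] = {0, 0};
    CHECK(cudaMemcpy(result, seen, sizeof(result), cudaMemcpyDeviceToHost));
    if (seenNonBlocking) *seenNonBlocking = result[1];

    CHECK(cudaStreamDestroy(blocking));
    CHECK(cudaStreamDestroy(nonBlocking));
    CHECK(cudaFree(seen));
    CHECK(cudaFree(flag));
    return result[0];
}

int main() {
    int nb = 0;
    int b = seenInBlockingStream(&nb);
    std::printf("blocking stream saw flag = %d, non-blocking saw %d\n", b, nb);
    return 0;
}
```

The value seen from the non-blocking stream depends on the hardware and scheduling, so treat it as an observation rather than something to rely on.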