Calling global function inside a CUDA Kernel


I'm trying to write a CUDA kernel function that contains a matrix multiplication, like this:

    __device__ void matrix_multi(matrix a, matrix b, matrix c);

    __global__ void foo(type para) {
        ....
        matrix_multi(a, b, c);
        ....
    }

I want to accelerate the matrix multiplication operation. I have two choices:

First, use the cuBLAS library. Second, write my own kernel for the matrix multiplication and call it inside foo().

I failed in both cases.

Can anyone help?

I suggest you do not write your own mat-mul kernel at this time. Try the cuBLAS way first.
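For reference, here is a minimal sketch of the host-side cuBLAS route, assuming single-precision square matrices stored in column-major order (the layout cuBLAS expects). The helper name `gemm_on_device` is made up for this example, and the error handling is deliberately coarse.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper: computes C = A * B for n x n column-major matrices
// by calling cublasSgemm from the host. Returns true on success.
bool gemm_on_device(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, int n) {
    const size_t bytes = sizeof(float) * n * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cublasHandle_t handle = nullptr;

    // Allocate device buffers and create the cuBLAS context.
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess &&
              cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;

    if (ok) {
        const float alpha = 1.0f, beta = 0.0f;  // C = 1 * A * B + 0 * C
        ok = cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
             cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                         &alpha, dA, n, dB, n, &beta, dC, n) == CUBLAS_STATUS_SUCCESS &&
             cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // Release resources; cudaFree accepts null pointers.
    if (handle) cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main() {
    const int n = 2;
    std::vector<float> A = {1, 3, 2, 4};  // column-major storage of [[1, 2], [3, 4]]
    std::vector<float> B = {1, 0, 0, 1};  // identity matrix
    std::vector<float> C(n * n, 0.0f);

    if (!gemm_on_device(A, B, C, n)) {
        std::printf("GPU matrix multiply failed\n");
        return 1;
    }
    for (float v : C) std::printf("%g ", v);  // C in column-major order
    std::printf("\n");
    return 0;
}
```

Build it with something like `nvcc example.cu -lcublas`. Since B is the identity here, C should come back equal to A, which makes it easy to sanity-check the setup before plugging in real data.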

The cuBLAS library can be called from inside a kernel only on devices with a compute capability of at least 3.5. Otherwise it can only be called from the host side. Check your CC version before using the cuBLAS library this way.
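A quick way to check the compute capability is to query the device properties through the CUDA runtime. The sketch below assumes device 0 is the GPU you care about, and `at_least_cc35` is just an illustrative helper name. Also, as far as I know, CUDA toolkits from 10.0 onward no longer ship the device-side cuBLAS library, so it is worth checking your toolkit version as well as the hardware.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when compute capability major.minor is >= 3.5.
bool at_least_cc35(int major, int minor) {
    return major > 3 || (major == 3 && minor >= 5);
}

int main() {
    const int device = 0;  // assumption: the first GPU is the one of interest
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        std::printf("Could not query device %d\n", device);
        return 1;
    }

    std::printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    if (at_least_cc35(prop.major, prop.minor)) {
        std::printf("Compute capability is at least 3.5\n");
    } else {
        std::printf("Compute capability is below 3.5: call cuBLAS from the host only\n");
    }
    return 0;
}
```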

