Calling a global function inside a CUDA kernel
I'm trying to write a CUDA kernel function that contains a matrix multiplication, like:
__device__ void matrix_multi(matrix a, matrix b, matrix c);

__global__ void foo(type para)
{
    ....
    matrix_multi(a, b, c);
    ....
}
I want to accelerate the matrix multiplication operation. I have two choices:
First, use the cuBLAS library. Second, write a kernel for the matrix multiplication and call it inside foo().
I failed in both cases.
Can anyone help?
I suggest you do not write your own mat-mul kernel at this time. Try the cuBLAS way.
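For reference, here is a minimal sketch of the cuBLAS route when called from the host, which works on any compute capability. The helper name `host_sgemm`, the 2x2 test data, and the build line are my own assumptions for illustration, not something from the question. Keep in mind that cuBLAS expects column-major storage.

```cuda
// Host-side cuBLAS sketch: C = A * B for column-major float matrices.
// Assumed build line: nvcc sgemm_host.cu -lcublas
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper (not part of cuBLAS): multiplies the m x k matrix A by
// the k x n matrix B and stores the m x n result in C. All three are
// column-major host arrays. Returns true on success.
bool host_sgemm(const float *A, const float *B, float *C, int m, int n, int k)
{
    const size_t bytesA = sizeof(float) * m * k;
    const size_t bytesB = sizeof(float) * k * n;
    const size_t bytesC = sizeof(float) * m * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cublasHandle_t handle = nullptr;
    bool ok = false;

    if (cudaMalloc(&dA, bytesA) == cudaSuccess &&
        cudaMalloc(&dB, bytesB) == cudaSuccess &&
        cudaMalloc(&dC, bytesC) == cudaSuccess &&
        cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS &&
        cudaMemcpy(dA, A, bytesA, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(dB, B, bytesB, cudaMemcpyHostToDevice) == cudaSuccess) {
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C, no transposes; the leading
        // dimensions are the row counts because storage is column-major.
        cublasStatus_t st = cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                                        m, n, k, &alpha,
                                        dA, m, dB, k, &beta, dC, m);
        ok = (st == CUBLAS_STATUS_SUCCESS) &&
             cudaMemcpy(C, dC, bytesC, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    if (handle) cublasDestroy(handle);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return ok;
}

int main()
{
    // A = [1 2; 3 4] and B = [5 6; 7 8], written column by column.
    const float A[4] = {1.0f, 3.0f, 2.0f, 4.0f};
    const float B[4] = {5.0f, 7.0f, 6.0f, 8.0f};
    float C[4] = {0.0f, 0.0f, 0.0f, 0.0f};

    if (!host_sgemm(A, B, C, 2, 2, 2)) {
        std::printf("cuBLAS SGEMM failed\n");
        return 1;
    }
    // Mathematically, A * B = [19 22; 43 50].
    std::printf("%g %g\n%g %g\n", C[0], C[2], C[1], C[3]);
    return 0;
}
```

If the matrices already live in device memory, the copies can be dropped and only the `cublasSgemm` call remains, issued from the host between your own kernel launches.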
The cuBLAS library can be called from inside a kernel only on devices with a compute capability of at least 3.5. Otherwise it can only be called from the host side. Check your compute capability (CC) version before using the cuBLAS library.
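One way to check the compute capability is to query the device properties through the CUDA runtime API. The sketch below is only illustrative; `meets_cc_35` is a hypothetical helper name of mine. Also note that, as far as I know, toolkits from CUDA 10.0 onward no longer ship the device-callable cuBLAS library, so with those versions the host-side call is the only option regardless of CC.

```cuda
// Prints the compute capability of every visible CUDA device.
// Assumed build line: nvcc cc_check.cu
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true when compute capability major.minor is >= 3.5.
bool meets_cc_35(int major, int minor)
{
    return major > 3 || (major == 3 && minor >= 5);
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
            continue;  // skip devices that cannot be queried
        }
        std::printf("Device %d (%s): CC %d.%d, at least 3.5: %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    meets_cc_35(prop.major, prop.minor) ? "yes" : "no");
    }
    return 0;
}
```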