CUDA constant memory, namespace, and weird bugs

Edit: the usage of cudaMemcpyToSymbol describded below is deprecated since CUDA 4.1 (See also my new entry Upgrade to CUDA 5.0: cudaMemcpyToSymbol invalid device symbol error)

Today I want to discuss some issues I had with CUDA constant memory and share some workarounds.

Constant memory is a usefull feature that enables CUDA programmers to share data within a kernel. Without constant  memory one can declare and use a global variable like this:

__device__ float c_array[10]

__global__ void kernel(float* d_array){ d_array[0] = c_array[0]; }

void test(){
    float* d_array = 0;
    float h_array[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};    
    cudaMalloc((void**)&d_array, 10 * sizeof(float));
    cudaMemcpyToSymbol("c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );
    kernel<<< 1, 1 >>>(d_array);
}

 The global variable c_array is in global device memory. This type of memory is known to be the slowest memory you can access from a kernel. Instead one should prefer constant memory by putting __constant__ before c_array:

__device__ __constant__ float c_array[10]

[GPU specs]

I've been facing the runtime error "invalid device symbol" when doing something like:

namespace Test_name {
__device__ __constant__ float c_array[10]

void test(){    
    cudaMemcpyToSymbol("c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );
}
} 

I'm sure people doing langage designs would find the error trivial but for me it took some time to figure out what it was. NVIDIA documentation is not really helping here. Explanations about constant memory and its usage are quiet sparse...

So here it goes: nvcc doesn't find the c_array symbol because it is declared within a namespace. The solution is to give the correct symbol:

cudaMemcpyToSymbol("Test_name::c_array", h_array, sizeof(float)*10, 0, cudaMemcpyHostToDevice );

A shame I had to guess this syntax because it is not describded anywhere in the CUDA documentation. Or at least it is too well hidden for me to find it!

You may face weird bugs with constant memory depending on your project and GPU architectures. Here is few things you can try:

Hope this will help.