开发者

OpenCL: Strange buffer or image bahaviour with NVidia but not Amd

开发者 https://www.devze.com 2023-03-14 01:36 出处:网络
I have a big problem (on Linux): I create a buffer with defined data, then an OpenCL kernel takes this data and puts it into an image2d_t. When working on an AMD C50 (Fusion CPU/GPU) the program works

I have a big problem (on Linux): I create a buffer with defined data, then an OpenCL kernel takes this data and puts it into an image2d_t. When working on an AMD C50 (Fusion CPU/GPU) the program works as desired, but on my GeForce 9500 GT the given kernel computes the correct result very rarely. Sometimes the result is correct, but very often it is incorrect. Sometimes it depends on very strange changes like removing unused variable declarations or adding a newline. I realized that disabling the optimization will increase the probability to fail. I have the most actual display driver in both systems.

Here is my reduced code:

#include <CL/cl.h>
#include <string>
#include <iostream>
#include <sstream>
#include <cmath>

    void checkOpenCLErr(cl_int err, std::string name){
        const char* errorString[] = {
            "CL_SUCCESS",
            "CL_DEVICE_NOT_FOUND",
            "CL_DEVICE_NOT_AVAILABLE",
            "CL_COMPILER_NOT_AVAILABLE",
            "CL_MEM_OBJECT_ALLOCATION_FAILURE",
            "CL_OUT_OF_RESOURCES",
            "CL_OUT_OF_HOST_MEMORY",
            "CL_PROFILING_INFO_NOT_AVAILABLE",
            "CL_MEM_COPY_OVERLAP",
            "CL_IMAGE_FORMAT_MISMATCH",
            "CL_IMAGE_FORMAT_NOT_SUPPORTED",
            "CL_BUILD_PROGRAM_FAILURE",
            "CL_MAP_FAILURE",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "",
            "CL_INVALID_VALUE",
            "CL_INVALID_DEVICE_TYPE",
            "CL_INVALID_PLATFORM",
            "CL_INVALID_DEVICE",
            "CL_INVALID_CONTEXT",
            "CL_INVALID_QUEUE_PROPERTIES",
            "CL_INVALID_COMMAND_QUEUE",
            "CL_INVALID_HOST_PTR",
            "CL_INVALID_MEM_OBJECT",
            "CL_INVALID_IMAGE_FORMAT_DESCRIPTOR",
            "CL_INVALID_IMAGE_SIZE",
            "CL_INVALID_SAMPLER",
            "CL_INVALID_BINARY",
            "CL_INVALID_BUILD_OPTIONS",
            "CL_INVALID_PROGRAM",
            "CL_INVALID_PROGRAM_EXECUTABLE",
            "CL_INVALID_KERNEL_NAME",
            "CL_INVALID_KERNEL_DEFINITION",
            "CL_INVALID_KERNEL",
            "CL_INVALID_ARG_INDEX",
            "CL_INVALID_ARG_VALUE",
            "CL_INVALID_ARG_SIZE",
            "CL_INVALID_KERNEL_ARGS",
            "CL_INVALID_WORK_DIMENSION",
            "CL_INVALID_WORK_GROUP_SIZE",
            "CL_INVALID_WORK_ITEM_SIZE",
            "CL_INVALID_GLOBAL_OFFSET",
            "CL_INVALID_EVENT_WAIT_LIST",
            "CL_INVALID_EVENT",
            "CL_INVALID_OPERATION",
            "CL_INVALID_GL_OBJECT",
            "CL_INVALID_BUFFER_SIZE",
            "CL_INVALID_MIP_LEVEL",
            "CL_INVALID_GLOBAL_WORK_SIZE",
        };
        if (err != CL_SUCCESS) {
            std::stringstream str;
            str << errorString[-err] << " (" << err << ")";
            throw std::string(name)+(str.str());
        }
    }

int main(){
    try{
        cl_context m_context;
        cl_platform_id* m_platforms;
        unsigned int m_numPlatforms;
        cl_command_queue m_queue;
        cl_device_id m_device;
        cl_int error = 0;   // Used to handle error codes
        clGetPlatformIDs(0,NULL,&m_numPlatforms);
        m_platforms = new cl_platform_开发者_JAVA百科id[m_numPlatforms];
        error = clGetPlatformIDs(m_numPlatforms,m_platforms,&m_numPlatforms);
        checkOpenCLErr(error, "getPlatformIDs");

        // Device
        error = clGetDeviceIDs(m_platforms[0], CL_DEVICE_TYPE_GPU, 1, &m_device, NULL);
        checkOpenCLErr(error, "getDeviceIDs");

        // Context
        cl_context_properties properties[] =
            { CL_CONTEXT_PLATFORM, (cl_context_properties)(m_platforms[0]), 0};
        m_context = clCreateContextFromType(properties, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);
        //  m_private->m_context = clCreateContext(properties, 1, &m_private->m_device, NULL, NULL, &error);
        checkOpenCLErr(error, "Create context");
        // Command-queue
        m_queue = clCreateCommandQueue(m_context, m_device, 0, &error);
        checkOpenCLErr(error, "Create command queue");
        //Build program and kernel
        const char* source = "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enable\n"
            "\n"
            "__kernel void bufToImage(__global unsigned char* in,  __write_only image2d_t out, const unsigned int offset_x, const unsigned int image_width , const unsigned int maxval ){\n"
                "\tint i = get_global_id(0);\n"
                "\tint j = get_global_id(1);\n"
                "\tint width = get_global_size(0);\n"
                "\tint height = get_global_size(1);\n"
                "\n"
                "\tint pos = j*image_width*3+(offset_x+i)*3;\n"
                "\tif( maxval < 256 ){\n"
                    "\t\tfloat4 c = (float4)(in[pos],in[pos+1],in[pos+2],1.0f);\n"
                    "\t\tc.x /= maxval;\n"
                    "\t\tc.y /= maxval;\n"
                "\t\tc.z /= maxval;\n"
                "\t\twrite_imagef(out, (int2)(i,j), c);\n"
            "\t}else{\n"
                "\t\tfloat4 c = (float4)(255.0f*in[2*pos]+in[2*pos+1],255.0f*in[2*pos+2]+in[2*pos+3],255.0f*in[2*pos+4]+in[2*pos+5],1.0f);\n"
                "\t\tc.x /= maxval;\n"
                "\t\tc.y /= maxval;\n"
                "\t\tc.z /= maxval;\n"
                "\t\twrite_imagef(out, (int2)(i,j), c);\n"
            "\t}\n"
        "}\n"
        "\n"
        "__constant sampler_t imageSampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;\n"
        "\n"
        "__kernel void imageToBuf(__read_only image2d_t in, __global unsigned char* out, const unsigned int offset_x, const unsigned int image_width ){\n"
            "\tint i = get_global_id(0);\n"
            "\tint j = get_global_id(1);\n"
            "\tint pos = j*image_width*3+(offset_x+i)*3;\n"
            "\tfloat4 c = read_imagef(in, imageSampler, (int2)(i,j));\n"
            "\tif( c.x <= 1.0f && c.y <= 1.0f && c.z <= 1.0f ){\n"
                "\t\tout[pos] = c.x*255.0f;\n"
                "\t\tout[pos+1] = c.y*255.0f;\n"
                "\t\tout[pos+2] = c.z*255.0f;\n"
            "\t}else{\n"
                "\t\tout[pos] = 200.0f;\n"
                "\t\tout[pos+1] = 0.0f;\n"
                "\t\tout[pos+2] = 255.0f;\n"
            "\t}\n"
        "}\n";
    cl_int err;
    cl_program prog = clCreateProgramWithSource(m_context,1,&source,NULL,&err);
    if( -err != CL_SUCCESS ) throw std::string("clCreateProgramWithSources");
    err = clBuildProgram(prog,0,NULL,"-cl-opt-disable",NULL,NULL);
    if( -err != CL_SUCCESS ) throw std::string("clBuildProgram(fromSources)");
    cl_kernel kernel = clCreateKernel(prog,"bufToImage",&err);
    checkOpenCLErr(err,"CreateKernel");

    cl_uint imageWidth = 80;
    cl_uint imageHeight = 90;
    //Initialize datas
    cl_uint maxVal = 255;
    cl_uint offsetX = 0;
    int size = imageWidth*imageHeight*3;
    int resSize = imageWidth*imageHeight*4;
    cl_uchar* data = new cl_uchar[size];
    cl_float* expectedData = new cl_float[resSize];
    for( int i = 0,j=0; i < size; i++,j++ ){
        data[i] = (cl_uchar)i;
        expectedData[j] = (cl_float)((unsigned char)i)/255.0f;
        if ( i%3 == 2 ){
            j++;
            expectedData[j] = 1.0f;
        }
    }
    cl_mem inBuffer = clCreateBuffer(m_context,CL_MEM_READ_ONLY|CL_MEM_COPY_HOST_PTR,size*sizeof(cl_uchar),data,&err);
    checkOpenCLErr(err, "clCreateBuffer()");

    clFinish(m_queue);
    cl_image_format imgFormat;
    imgFormat.image_channel_order = CL_RGBA;
    imgFormat.image_channel_data_type = CL_FLOAT;
    cl_mem outImg = clCreateImage2D( m_context, CL_MEM_READ_WRITE, &imgFormat, imageWidth, imageHeight, 0, NULL, &err );
    checkOpenCLErr(err,"get2DImage()");
    clFinish(m_queue);
    size_t kernelRegion[]={imageWidth,imageHeight};
    size_t kernelWorkgroup[]={1,1};
    //Fill kernel with data
    clSetKernelArg(kernel,0,sizeof(cl_mem),&inBuffer);
    clSetKernelArg(kernel,1,sizeof(cl_mem),&outImg);
    clSetKernelArg(kernel,2,sizeof(cl_uint),&offsetX);
    clSetKernelArg(kernel,3,sizeof(cl_uint),&imageWidth);
    clSetKernelArg(kernel,4,sizeof(cl_uint),&maxVal);

    //Run kernel
    err = clEnqueueNDRangeKernel(m_queue,kernel,2,NULL,kernelRegion,kernelWorkgroup,0,NULL,NULL);
    checkOpenCLErr(err,"RunKernel");
    clFinish(m_queue);
    //Check resulting data for validty
    cl_float* computedData = new cl_float[resSize];;
    size_t region[]={imageWidth,imageHeight,1};
    const size_t offset[] = {0,0,0};
    err = clEnqueueReadImage(m_queue,outImg,CL_TRUE,offset,region,0,0,computedData,0,NULL,NULL);
    checkOpenCLErr(err, "readDataFromImage()");
    clFinish(m_queue);


    for( int i = 0; i < resSize; i++ ){
        if( fabs(expectedData[i]-computedData[i])>0.1 ){
            std::cout << "Expected: \n";
            for( int j = 0; j < resSize; j++ ){
                std::cout << expectedData[j] << " ";
            }
            std::cout << "\nComputed: \n";
            std::cout << "\n";
            for( int j = 0; j < resSize; j++ ){
                std::cout << computedData[j] << " ";
            }
            std::cout << "\n";
            throw std::string("Error, computed and expected data are not the same!\n");
            }
        }

    }catch(std::string& e){
        std::cout << "\nCaught an exception: " << e << "\n";
        return 1;
    }
    std::cout << "Works fine\n";
    return 0;
}

I also uploaded the source code for you to make it easier to test it: http://www.file-upload.net/download-3524302/strangeOpenCLError.cpp.html

Please can you tell me if I've done wrong anything? Is there any mistake in the code or is this a bug in my driver?

Best reagards, Alex

Edit: changed the program (both: here and the linked one) a little bit to make it more likely to get a mismatch.


I found the bug and this is an annoying one:

When working under linux and just linking the OpenCL program with the most actual "OpenCV" library (yes, the computation lib), the binary parts of the kernels, which get compiled and cached in ~/.nv are damaged.

Can you please install the actual OpenCV library and execute following commands:

Generating bad kernel maybe leading sometimes to bad behaviour:

rm -R ~/.nv && g++ strangeOpenCLError.cpp -lOpenCL -lopencv_gpu -o strangeOpenCLError && ./strangeOpenCLError && ls -la ~/.nv/ComputeCache/*/*

Generating good kernel which performs as desired:

rm -R ~/.nv && g++ strangeOpenCLError.cpp -lOpenCL -o strangeOpenCLError && ./strangeOpenCLError && ls -la ~/.nv/ComputeCache/*/*

In my system when using -lopencv_gpu or -lopencv_core I get a kernel object in ~/.nv with a slightly other size due to sightly different binary parts. So these smaller kernels computed bad results in my systems.

The problem is that the bug does not always appear: Sometimes just when working on buffers, which are big enough. So the more relyable measurement is the different kernel-cache size. I edited the program in my question, now it is more likely that it will create the bad result.

Best regards, Alex

PS: I also created a bug report at NVidia and it is in progress. They could reproduce the bug on their system.


To turn off Nvidia compiler cache, set env. variable CUDA_CACHE_DISABLE=1. That may helps to avoid the problem in future.


In line

m_context = clCreateContextFromType(properties, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

you should use &error as last parameter to get a meaningful error. Without it I got some silly error messages. (I needed to change the platform to get my GPU board.)

I can not reproduce the error with my nVidia GeForce 8600 GTS. I get a 'Works fine'. I tried it >20 times without any issue.

I also can not see any error beside that you code is a little confusing. You should remove all commented out code and introduce some blank lines for grouping the code a little bit.

Do you have the latest drivers? The behavior you describe sounds very familiar like an uninitialized buffer or variable, but I do not see anything like that.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号