
OpenMP won't utilize all cores?

Developer https://www.devze.com 2023-04-08 18:11 Source: Internet
I'm trying to use OpenMP to make some code parallel.

    omp_set_num_threads( 8 );
    #pragma omp parallel 
    for (int i = 0; i < verSize; ++i)
    {
        #pragma omp single nowait
        { 
            neighVec[i].index = i;
            mesh.getBoxIntersecTets(mesh.vertexList->at(i), &neighVec[i]);
        }
    }

verSize is about 90k, and getBoxIntersecTets is quite expensive, so I expected the code to fully utilize a quad-core CPU. However, CPU usage is only about 25%. Any ideas?

I also tried the omp parallel for construct, with the same result.

getBoxIntersecTets uses STL unordered_set, vector and deque, but I guess OpenMP should be agnostic about them, right?

Thanks.


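First up, #pragma omp single lets only one thread of the team execute each iteration's body, and a bare #pragma omp parallel does not split the loop among threads by itself (every thread walks all the iterations). You definitely don't want that; use #pragma omp parallel for so the iterations are divided across the threads.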


Try this instead:

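    #pragma omp parallel for
    for (int i = 0; i < verSize; ++i)
    {
        // Declared inside the loop body, so each iteration gets its own private copy;
        // no private() clause is needed (and naming tempVec there would not compile,
        // since it is not declared before the pragma).
        auto tempVec = neighVec[i];
        tempVec.index = i;
        mesh.getBoxIntersecTets(mesh.vertexList->at(i), &tempVec);
        neighVec[i] = tempVec;  // single write back to the shared array
    }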
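For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch. The names Neighbors, fillNeighbors, availableThreads and computeAll are hypothetical stand-ins (the real mesh class and getBoxIntersecTets are not shown in the question), and the schedule(dynamic, 64) clause is just one plausible choice for iterations of uneven cost. The _OPENMP check is included because a program built without OpenMP enabled (for example, without -fopenmp on GCC) silently ignores the pragma and runs on one thread.

```cpp
#include <cassert>
#include <utility>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for one element of neighVec.
struct Neighbors {
    int index = -1;
    std::vector<int> tets;
};

// Hypothetical stand-in for mesh.getBoxIntersecTets. It only touches its
// own arguments, so concurrent calls do not share any state.
void fillNeighbors(int vertex, Neighbors* out) {
    assert(out != nullptr);
    for (int k = 0; k < 4; ++k) out->tets.push_back(vertex + k);
}

// Number of threads OpenMP may use, or 1 if the program was compiled
// without OpenMP support (the pragmas are then ignored).
int availableThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

void computeAll(std::vector<Neighbors>& neighVec) {
    const int n = static_cast<int>(neighVec.size());
    #pragma omp parallel for schedule(dynamic, 64)
    for (int i = 0; i < n; ++i) {
        Neighbors temp;                 // loop-local, so private to the iteration
        temp.index = i;
        fillNeighbors(i, &temp);        // all the heavy work happens on the local copy
        neighVec[i] = std::move(temp);  // one write to the shared array
    }
}
```

If availableThreads() reports 1 on a quad-core machine, the compiler flags are worth checking before anything else.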

The problem with your original code is that different threads are writing to adjacent elements of an array. Adjacent elements are placed next to each other in memory, which means they probably share a cache line (this is known as false sharing). Since only one core can hold a cache line in a writable state at once, the cores can end up taking turns instead of working concurrently. Or worse, your program may spend more time transferring ownership of the cache line than doing actual work.

By introducing a temporary variable, each worker can operate on an independent cache line, and then you only need access to the shared cache line at the end to store results. You should do the same thing for the first parameter if it's being passed by non-const reference.
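The same idea can be sketched on a toy problem: accumulate into a local variable, then store once into a slot that has a cache line to itself. Everything here is illustrative rather than taken from the question: PaddedSlot, chunkedSum and kCacheLine are hypothetical names, and 64 bytes is an assumed cache line size (common on x86, not guaranteed). Note that std::vector only honours the over-alignment reliably from C++17 onwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache line size; adjust for the target hardware.
constexpr std::size_t kCacheLine = 64;

// alignas pads each slot to a full cache line, so two threads writing to
// neighbouring slots never write to the same line.
struct alignas(kCacheLine) PaddedSlot {
    long value = 0;
};

long chunkedSum(const std::vector<int>& data, int chunks) {
    assert(chunks > 0);
    std::vector<PaddedSlot> partial(chunks);
    const std::size_t n = data.size();

    #pragma omp parallel for
    for (int c = 0; c < chunks; ++c) {
        const std::size_t begin = n * c / chunks;
        const std::size_t end = n * (c + 1) / chunks;
        long local = 0;                // private accumulator, no shared writes
        for (std::size_t j = begin; j < end; ++j) local += data[j];
        partial[c].value = local;      // one shared write per chunk
    }

    // Combine the per-chunk results serially.
    long total = 0;
    for (const PaddedSlot& s : partial) total += s.value;
    return total;
}
```

For a plain sum like this, OpenMP's reduction(+:total) clause would be the more idiomatic choice; the padded slots are only here to make the cache line layout visible.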
