C++ - Porting threads to Windows: critical sections are very slow
I'm porting some code to Windows and found threading to be extremely slow. The task takes 300 seconds on Windows (with 2 Xeon E5-2670 8-core 2.6 GHz = 16 cores) and 3.5 seconds on Linux (Xeon E5-1607 4-core 3 GHz). I'm using VS2012 Express.
I've got 32 threads all calling EnterCriticalSection(), popping an 80-byte job off a std::stack, calling LeaveCriticalSection(), and doing some work (250k jobs in total).
Before and after every critical section call I print the thread ID and the current time.
- The wait time for a single thread's lock is ~160 ms
- Popping the job off the stack takes ~3 ms
- Calling leave takes ~3 ms
- The job takes ~1 ms
(Roughly the same in debug and release; debug takes a little longer. I'd love to be able to profile the code :P)
Commenting out the job call makes the whole process take 2 seconds (still more than Linux).
I've tried both QueryPerformanceCounter and timeGetTime; both give approximately the same result.
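For reference, a minimal sketch of how such a timestamp could be taken with QueryPerformanceCounter. The helper name now_ms is hypothetical (it is not from the original code), and the non-Windows branch using clock_gettime is an assumption added only so the sketch also builds on Linux.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

// Hypothetical helper (not part of the original code): returns a
// monotonic timestamp in milliseconds, suitable for measuring intervals.
double now_ms() {
#ifdef _WIN32
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);   // counter ticks per second
    QueryPerformanceCounter(&count);    // current tick count
    return 1000.0 * static_cast<double>(count.QuadPart)
                  / static_cast<double>(freq.QuadPart);
#else
    // Assumed fallback for non-Windows builds.
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1.0e6;
#endif
}
```

Taking now_ms() before and after a call and subtracting gives the elapsed time in milliseconds. Note that timeGetTime has only millisecond granularity at best, so intervals of a few milliseconds are at the edge of what it can resolve.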
AFAIK the job never makes any sync calls, but I can't explain the slowdown unless it does.
I have no idea why copying from a stack and calling pop takes so long. The most confusing thing is why a call to leave() takes so long.
Can anyone speculate on why it's running so slowly?
I wouldn't have thought the difference in processor would give a 100x performance difference, but could it be at all related to the dual CPUs (i.e. having to sync between separate CPUs rather than internal cores)?
By the way, I'm aware of std::thread, but I want my library code to work pre-C++11.
Edit:
// in the while(hasjobs) loop...
event qwe1 = {"lock", timeGetTime(), id};
events.push_back(qwe1);

scene->jobmutex.lock();

event qwe2 = {"getjob", timeGetTime(), id};
events.push_back(qwe2);

hasjobs = !scene->jobs.empty();
if (hasjobs)
{
    job = scene->jobs.front();
    scene->jobs.pop();
}

event qwe3 = {"gotjob", timeGetTime(), id};
events.push_back(qwe3);

scene->jobmutex.unlock();

event qwe4 = {"unlock", timeGetTime(), id};
events.push_back(qwe4);

if (hasjobs)
    scene->performjob(job);
And the mutex class, with the Linux #ifdef stuff removed:
CRITICAL_SECTION mutex;
...
Mutex::Mutex()
{
    InitializeCriticalSection(&mutex);
}

Mutex::~Mutex()
{
    DeleteCriticalSection(&mutex);
}

void Mutex::lock()
{
    EnterCriticalSection(&mutex);
}

void Mutex::unlock()
{
    LeaveCriticalSection(&mutex);
}
Windows' CRITICAL_SECTION spins in a tight loop when you first enter it. It does not suspend the thread that called EnterCriticalSection unless a substantial period has elapsed in the spin loop. So having 32 threads contending for the same critical section will burn and waste a lot of CPU cycles. Try a mutex instead (see CreateMutex).
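A minimal sketch of what the suggested swap might look like, keeping the same lock()/unlock() interface as the class in the question. The class name KernelMutex is hypothetical, and the pthread branch is an assumption that stands in for the Linux #ifdef code the question omitted. It avoids C++11 features, in line with the pre-C++11 requirement. Whether it actually helps would need to be measured.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

// Hypothetical wrapper (name not from the original code) with the same
// lock()/unlock() interface as the question's mutex class.
class KernelMutex {
public:
    KernelMutex() {
#ifdef _WIN32
        // Default security attributes, not initially owned, unnamed.
        handle_ = CreateMutex(NULL, FALSE, NULL);
        assert(handle_ != NULL);
#else
        // Assumed stand-in for the Linux code removed from the question.
        int rc = pthread_mutex_init(&mutex_, NULL);
        assert(rc == 0);
        (void)rc;
#endif
    }

    ~KernelMutex() {
#ifdef _WIN32
        CloseHandle(handle_);
#else
        pthread_mutex_destroy(&mutex_);
#endif
    }

    void lock() {
#ifdef _WIN32
        // Blocks in the kernel until the mutex is available.
        WaitForSingleObject(handle_, INFINITE);
#else
        pthread_mutex_lock(&mutex_);
#endif
    }

    void unlock() {
#ifdef _WIN32
        ReleaseMutex(handle_);
#else
        pthread_mutex_unlock(&mutex_);
#endif
    }

private:
    // Non-copyable, written the pre-C++11 way (declared, never defined).
    KernelMutex(const KernelMutex&);
    KernelMutex& operator=(const KernelMutex&);

#ifdef _WIN32
    HANDLE handle_;
#else
    pthread_mutex_t mutex_;
#endif
};
```

A Win32 mutex object is a kernel object, so every lock and unlock goes through the kernel, including uncontended ones. That trades the spinning described above for a higher fixed cost per call.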