Faster NMS & body part connector
gineshidalgo99 committed May 2, 2019
1 parent 6d3ff8b commit 5f4cf6b
Showing 6 changed files with 504 additions and 138 deletions.
5 changes: 4 additions & 1 deletion doc/release_notes.md
@@ -261,7 +261,10 @@ OpenPose Library - Release Notes
## Current version - future OpenPose 1.5.0
1. Main improvements:
1. Added initial single-person tracker for further speed up or visual smoothing (`--tracking` flag).
- 2. Greedy body part connector implemented in CUDA: +~30% speed up in Nvidia (CUDA) version with default flags and +~10% in maximum accuracy configuration. In addition, it provides a small 0.5% boost in accuracy (default flags).
+ 2. Speed up of the CUDA functions of OpenPose:
+ 1. Greedy body part connector implemented in CUDA: +~30% speedup in Nvidia (CUDA) version with default flags and +~10% in maximum accuracy configuration. In addition, it provides a small 0.5% boost in accuracy (default flags).
+ 2. +5-30% additional speedup for the body part connector of point 1.
+ 3. 2-4x speedup for NMS.
3. Unity binding of OpenPose released. OpenPose adds the flag `BUILD_UNITY_SUPPORT` on CMake, which enables special Unity code so it can be built as a Unity plugin.
4. If camera is unplugged, OpenPose GUI and command line will display a warning and try to reconnect it.
5. Wrapper classes simplified and renamed. Wrapper renamed as WrapperT, and created Wrapper as the non-templated class equivalent.
1 change: 1 addition & 0 deletions doc/speed_up_openpose.md
@@ -39,6 +39,7 @@ Some speed tips to maximize the OpenPose runtime speed while preserving the accu
2. Change GPU rendering by CPU rendering to get approximately +0.5 FPS (`--render_pose 1`).
3. Use cuDNN 5.1 or 7.2 (cuDNN 6 is ~10% slower).
4. Use the `BODY_25` model for simultaneously maximum speed and accuracy (both COCO and MPII models are slower and less accurate). But it does increase the GPU memory, so it might go out of memory more easily in low-memory GPUs.
5. Enable the AVX flag in CMake-GUI (if your computer supports it).



1 change: 1 addition & 0 deletions include/openpose/utilities/profiler.hpp
@@ -60,6 +60,7 @@ namespace op
} \
cudaDeviceSynchronize(); \
(finalTime) = (factor)/(float)(REPS)*getTimeSeconds(timerInit); \
cudaCheck(__LINE__, __FUNCTION__, __FILE__); \
}

// Enable PROFILER_ENABLED on Makefile.config or CMake in order to use this function. Otherwise nothing will be outputted.
17 changes: 12 additions & 5 deletions src/openpose/net/bodyPartConnectorBase.cpp
@@ -1,3 +1,4 @@
+ #include <set>
#include <openpose/utilities/check.hpp>
#include <openpose/utilities/fastMath.hpp>
#include <openpose/pose/poseParameters.hpp>
@@ -459,6 +460,7 @@ namespace op
const auto peaksOffset = (maxPeaks+1);
// Save which body parts have been already assigned
std::vector<int> personAssigned(numberBodyParts*maxPeaks, -1);
+ std::set<int, std::greater<int>> indexesToRemoveSortedSet;
// Iterate over each PAF pair connection detected
// E.g., neck1-nose2, neck5-Lshoulder0, etc.
for (const auto& pairConnection : pairConnections)
@@ -592,18 +594,23 @@ namespace op
// Update score
peopleVector[assigned1].second += peopleVector[assigned2].second + pafScore;
// Erase the non-merged person
- peopleVector.erase(peopleVector.begin()+assigned2);
+ // peopleVector.erase(peopleVector.begin()+assigned2); // x2 slower when removing on-the-fly
+ indexesToRemoveSortedSet.emplace(assigned2); // Add into set so we can remove them all at once
// Update associated personAssigned (person indexes have changed)
for (auto& element : personAssigned)
{
if (element == assigned2)
element = assigned1;
- else if (element > assigned2)
- element--;
+ // No need because I will only remove them at the very end
+ // else if (element > assigned2)
+ // element--;
}
}
}
}
+ // Remove unused people
+ for (const auto& index : indexesToRemoveSortedSet)
+ peopleVector.erase(peopleVector.begin()+index);
// Return result
return peopleVector;
}
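
The hunk above stops erasing merged people inside the loop and instead records their indexes in a `std::set` ordered by `std::greater<int>`, erasing them all in one pass at the end. A minimal standalone sketch of that pattern follows, assuming a generic `std::vector<T>` in place of `peopleVector`. The helper name `eraseIndexesDescending` is hypothetical and is not part of OpenPose.

```cpp
#include <functional>
#include <set>
#include <vector>

// Hypothetical helper (not an OpenPose function): erase every position
// listed in `indexes` from `values`.
// The set iterates from the largest index to the smallest, so each erase
// only shifts elements located after the erased position. The smaller
// indexes that are still pending therefore remain valid, and no index
// bookkeeping (such as decrementing later indexes) is needed.
template <typename T>
void eraseIndexesDescending(std::vector<T>& values,
                            const std::set<int, std::greater<int>>& indexes)
{
    for (const auto index : indexes)
        values.erase(values.begin() + index);
}
```

Because the container is a set, an index recorded more than once is erased only once. Iterating in ascending order instead would invalidate the remaining indexes after the first erase, which is why the descending comparator matters here.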
@@ -685,7 +692,7 @@ namespace op
poseKeypoints.reset();
poseScores.reset();
}
- const auto numberBodyPartsAndPAFs = numberBodyParts + numberBodyPartPairs;
+ const auto oneOverNumberBodyPartsAndPAFs = 1/T(numberBodyParts + numberBodyPartPairs);
for (auto person = 0u ; person < validSubsetIndexes.size() ; person++)
{
const auto& personPair = peopleVector[validSubsetIndexes[person]];
@@ -701,7 +708,7 @@ namespace op
poseKeypoints[baseOffset + 2] = peaksPtr[bodyPartIndex];
}
}
- poseScores[person] = personPair.second / T(numberBodyPartsAndPAFs);
+ poseScores[person] = personPair.second * oneOverNumberBodyPartsAndPAFs;
}
}
catch (const std::exception& e)
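
The last two hunks of `bodyPartConnectorBase.cpp` replace a per-person division with a multiplication by a reciprocal computed once before the loop. A minimal sketch of the same idea, using hypothetical names (`normalizeScores`, `rawScores`, `count`) rather than the actual OpenPose variables:

```cpp
#include <vector>

// Hypothetical stand-in for the score normalization loop: every raw score
// is divided by the same count, so the reciprocal is computed once outside
// the loop and each iteration performs only a multiplication.
template <typename T>
std::vector<T> normalizeScores(const std::vector<T>& rawScores, const int count)
{
    const auto oneOverCount = 1 / T(count);
    std::vector<T> scores(rawScores.size());
    for (auto i = 0u; i < rawScores.size(); i++)
        scores[i] = rawScores[i] * oneOverCount;
    return scores;
}
```

In floating point, `x * (1 / y)` is not always bit-identical to `x / y`, so results may differ in the last digit when `count` is not a power of two. For an averaged confidence score that difference is normally negligible.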