
Inline TimePos and TimeSig functions to improve performance #7549

Merged (2 commits into LMMS:master, Oct 26, 2024)

Conversation

@sakertooth (Contributor) commented Oct 18, 2024

Inline the functions within TimePos and TimeSig to improve performance. I discovered this performance issue while profiling a project that had ~36K notes in it, running at 999 BPM.

While trying to figure out why this project uses so much CPU, I tweaked the code inside InstrumentTrack::play a bit (this function was of interest since it is the one generating the NotePlayHandles) and removed the two while loops. With them gone, CPU usage was of course 0%, but what I found interesting was that adding back a loop that iterates over all the notes and does some small computation with Note::pos took around 20-30% of CPU on its own, while doing the same with other functions like Note::getVolume kept the CPU at 0%. With some more profiling I eventually realized that the functions in TimePos carry a lot of function call overhead, and inlining TimePos::operator int() made the CPU meter reach 0% again.
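For reference, inlining here presumably means making these small function bodies visible at the call site (for example by defining them in the header) so the compiler can inline them. The snippet below is only a minimal sketch of that idea, not the actual LMMS header; it assumes TimePos stores its position as a plain tick count (the member name m_ticks is illustrative):

class TimePos
{
public:
	TimePos(int ticks = 0) : m_ticks(ticks) {}

	// With the body visible in the header, the compiler can inline this
	// conversion. When it lives in a .cpp file instead, every comparison
	// such as ( *nit )->pos() < cur_start pays for a full function call.
	operator int() const { return m_ticks; }

private:
	int m_ticks;
};

The same reasoning applies to the small accessors in TimeSig.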

These are the changes I made when investigating the performance inside the InstrumentTrack::play function:

...
// if( cur_start > 0 )
// {
// 	// skip notes which are posated before start-bar
// 	while( nit != notes.end() && ( *nit )->pos() < cur_start )
// 	{
// 		++nit;
// 	}
// }
for (const auto& note : notes)
{
	played_a_note = note->pos() > 0; // the meaningless computation
	// const auto currentNote = *nit;
	...
}
...
return played_a_note;

I've attached a screenshot of the CPU meter after removing most of the note audio generation code in InstrumentTrack::play, and instead only doing a small amount of meaningless computation (master branch, Release build):
[Screenshot: Pasted image 20241017221119]

After inlining the functions inside TimePos:
[Screenshot: Pasted image 20241017221427]

In addition, this correlates strongly with the number of cache misses during execution. After inlining these functions, the cache miss count dropped significantly, most likely because there are fewer instructions to fetch and deal with. I discovered this using perf, and it was how I realized that something odd was happening with TimePos::operator int().

perf report results from perf record -e cache-misses -p $(pidof lmms) -- sleep 5s before inlining (in normal use you will most likely see cache misses coming from many other places as well):
[Screenshot: Pasted image 20241017222429]

Same perf report after inlining:
[Screenshot: Pasted image 20241017222721]

@Rossmaxx (Contributor) left a comment:
Everything looks sane. A few minor comments on my end.

include/TimePos.h: 2 resolved review threads
@Rossmaxx (Contributor) left a comment:

There is an improvement in performance, though not quite at stable levels yet. A good step in the right direction nevertheless.

@sakertooth merged commit 1f37c9b into LMMS:master on Oct 26, 2024. 11 checks passed.
@sakertooth deleted the inline-timepos branch on October 26, 2024 at 16:41.
mmeeaallyynn pushed a commit to mmeeaallyynn/lmms that referenced this pull request Oct 27, 2024
…#7549)

* Inline TimeSig functions

* Inline TimePos functions