Skip to content

Conversation

alex
Copy link
Contributor

@alex alex commented Aug 17, 2025

This shaves roughly 40% of the runtime off of pyca/cryptography's combine.

A major limitation is that it currently doesn't work with in-memory sqlite databases, I'm not sure if there's a good way to do that.

@alex
Copy link
Contributor Author

alex commented Aug 17, 2025

Also a few tests are failing 😂.

Before I run too much farther on this path, wanted to see how you feel about it?

@nedbat
Copy link
Owner

nedbat commented Aug 17, 2025

It's an interesting idea, thanks for starting it. What in-memory databases are you concerned about?

@alex
Copy link
Contributor Author

alex commented Aug 17, 2025

on_disk=True on CoverageData

@nedbat
Copy link
Owner

nedbat commented Aug 17, 2025

Right, i know the API makes it possible, I'm just wondering realistically when it happens.

@alex
Copy link
Contributor Author

alex commented Aug 17, 2025

Oh, I have no idea. I just assumed I wasn't allowed to break it either way :-) I think there's a way I can make it work though.

@alex alex force-pushed the combine-in-sqlite3 branch 4 times, most recently from 378c05b to 5da41b8 Compare August 17, 2025 23:38
@alex alex marked this pull request as ready for review August 17, 2025 23:43
@alex
Copy link
Contributor Author

alex commented Aug 18, 2025

This should be ready for review now, all green :-)

@nedbat
Copy link
Owner

nedbat commented Aug 18, 2025

Looking at a few recent runs, this seems to make my combine step slightly slower, though these are hardly precise measurements (with this: 2m 57s, 2m 59s), (without this: 2m 35s, 2m 48s). How many files are you combining, and what are the times?

@alex
Copy link
Contributor Author

alex commented Aug 18, 2025

29 .coverage files + 30 lcov files (that I pre-merge into a single CoverageData), in total its ~415MB of stuff.

On current coverage its consistently ~15 seconds and with this patch its consistently ~9 seconds. (Measured 5x times)

I'm guessing you must have a lot more data for it to nearly 3 minutes? I can profile (and maybe do some SQL EXPLAIN) on these queries to see where the offender is, though it may not be representative of your workload.

@alex
Copy link
Contributor Author

alex commented Aug 18, 2025

With some quick instrumentation, I can see that ~85% of my time is being spent merge the arcs data, are you using branch coverage for yours or line?

@nedbat
Copy link
Owner

nedbat commented Aug 18, 2025

I am using branch coverage, but I was also quoting the time for my "combine and report" step. Splitting it into parts shows that combine is not the problem:

Time for load: 0.00s
Time for combine: 2.34s
Time for save: 0.00s
Time for load: 0.00s
Wrote HTML report to htmlcov/index.html
Time for html: 22.74s
Wrote JSON report to coverage.json
Time for json: 120.34s

I have to assume my timing difference is noise. (the coverage.json is 2.8Gb, and I don't even keep it from CI!)

This shaves roughly 40% of the runtime off of pyca/cryptography's combine.

A major limitation is that it currently doesn't work with in-memory sqlite databases, I'm not sure if there's a good way to do that.
@alex alex force-pushed the combine-in-sqlite3 branch from 5da41b8 to 270ed94 Compare August 18, 2025 23:52
@alex
Copy link
Contributor Author

alex commented Aug 18, 2025

Well, investigating I found another ~2-2.5 seconds to save, I'm now down to 7s consistently.

@nedbat
Copy link
Owner

nedbat commented Aug 19, 2025

It's good: before this: 3.59s, 3.56s. with this: 1.47s, 1.51s. Excellent

@alex
Copy link
Contributor Author

alex commented Aug 19, 2025

Perfect!

@nedbat nedbat merged commit 81e0189 into nedbat:master Aug 19, 2025
46 checks passed
@nedbat
Copy link
Owner

nedbat commented Aug 19, 2025

This was amazing, thanks so much!

@alex
Copy link
Contributor Author

alex commented Aug 19, 2025

Thank you!

@alex alex deleted the combine-in-sqlite3 branch August 19, 2025 11:17
@nedbat
Copy link
Owner

nedbat commented Aug 23, 2025

This is now released as part of coverage 7.10.5.

@alex
Copy link
Contributor Author

alex commented Aug 23, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants