Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Datastore Dependency & Fix Badger Slow Tests #6

Merged
merged 4 commits into from
May 7, 2020
Merged

Conversation

bonedaddy
Copy link
Contributor

@bonedaddy bonedaddy commented Apr 30, 2020

Upgrading ipfs/go-datastore to v0.4.4 from v0.3.1 caused this issue to come up. The solution to that was to enable async writes. I've only enabled async writes within the test, because I don't think defaulting to that is a good idea as it increases the risk fo data loss.

There are also reports of windows being faster than linux, however this is because windows ignores they sync flag

@github-actions
Copy link

CLA Assistant Lite All Contributors have signed the CLA.

@codecov
Copy link

codecov bot commented Apr 30, 2020

Codecov Report

Merging #6 into master will increase coverage by 0.36%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master       #6      +/-   ##
==========================================
+ Coverage   76.50%   76.86%   +0.36%     
==========================================
  Files           2        2              
  Lines         434      432       -2     
==========================================
  Hits          332      332              
+ Misses         78       76       -2     
  Partials       24       24              
Impacted Files Coverage Δ
badger/datastore.go 72.90% <ø> (+0.40%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2bbb979...d278522. Read the comment docs.

@jarifibrahim
Copy link

Hey! I work on badger. Could you please point me to slow badger tests? I can help fix them.

@jarifibrahim
Copy link

I see that you have test that do something like

for item := range list {
    // create badger transaction
    // add a single entry
    // commit the transaction
}

The single-entry transactions are known to be super slow. The reason being that every transaction has to be persisted to the disk. If you have many small transactions, it would mean there will be many small write system calls to store the transaction to the write-ahead log.

Badger has a writeBatch API which works similar to a transaction but allows you to bulk import data. It reduces the amount of write system calls that are required.

@bonedaddy
Copy link
Contributor Author

bonedaddy commented May 1, 2020

Hey! I work on badger. Could you please point me to slow badger tests? I can help fix them.

@jarifibrahim hey thanks for the response, this badger datastore is basically verbatim of https://github.com/ipfs/go-ds-badger except for badgerv2 and removal of jbenet/goprocess, and as such relies on upstream tests. The exact test that is being problematic is this one which is part of the TestSuite that ipfs/go-datastore uses to ensure consistency between different datastores.

Based on your comment however it looks like there's room for more improvement within badger.

The single-entry transactions are known to be super slow. The reason being that every transaction has to be persisted to the disk. If you have many small transactions, it would mean there will be many small write system calls to store the transaction to the write-ahead log.
Badger has a writeBatch API which works similar to a transaction but allows you to bulk import data. It reduces the amount of write system calls that are required.

Interesting, I did not know this, I will investigate this and see if the writeBatch API can be used instead.


@xiegeo I'm going to drop this PR back into draft mode until I can implement the suggestions by Ibrahim

@bonedaddy bonedaddy marked this pull request as draft May 1, 2020 20:11
@bonedaddy bonedaddy changed the title Update Datastore Dependency & Fix Badger Slow Tests Update Datastore Dependency & Fix Badger Slow Tests (WIP) May 1, 2020
@xiegeo
Copy link
Contributor

xiegeo commented May 5, 2020

IMO, using async just for testing is fine.

@jarifibrahim
Copy link

I agree with @xiegeo, unless you're trying to test some odd data consistency issues, async mode should be fine.

@bonedaddy
Copy link
Contributor Author

Makes sense, I will mark PR as ready and the suggestions you made about how to use batch can be done in a separate PR

@bonedaddy bonedaddy marked this pull request as ready for review May 7, 2020 01:44
@bonedaddy bonedaddy requested a review from xiegeo May 7, 2020 01:44
@bonedaddy bonedaddy changed the title Update Datastore Dependency & Fix Badger Slow Tests (WIP) Update Datastore Dependency & Fix Badger Slow Tests May 7, 2020
@bonedaddy bonedaddy merged commit 4671873 into master May 7, 2020
@github-actions github-actions bot locked and limited conversation to collaborators May 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants