-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Syncstorage inital sqlite support #1520
base: master
Are you sure you want to change the base?
Conversation
To pass the checks I have the choice between upgrading diesel or ignoring the issue with libsqlite-sys (which tbh is probably a non-issue because we should not be using SQLite printf anyway). |
1047197
to
9d83956
Compare
6927869
to
62774e3
Compare
62774e3
to
5ea3b39
Compare
I choose to ignore the libsqlite3-sys CVE because it's not impacting this project. That should let me run the other tests in the CI. |
clippy_sqlite: | ||
# Matches what's run in circleci | ||
cargo clippy --workspace --all-targets --no-default-features --features=syncstorage-db/sqlite,tokenserver-db/sqlite -- -D warnings | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if you saw #1513, but one of the things we're trying to do is to pull more of tokenserver into syncstorage-rs.
(Weird history lesson: Tokenserver was supposed to support more than just sync, but that never really happened. Having it distinct turned out to be a headache for a lot of reasons, so we're folding as much of Tokenserver as we can into the main syncstorage)
As a side note, this also means that the tokenserver storage stuff can probably live along-side the syncstorage data fairly easily. Possibly as just a bunch of other tables.
It's a bit more complicated for us since we have yet to migrate tokenserver over to spanner, but that's on our internal roadmap. When we get to doing that, however, is unknown.
We also have a few crate refactoring tickets open. I'll update those to indicate that we should really put the data handlers for both syncstorage and tokenserver into their own distinct crate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not saw that PR.
You want to remove tokenserver completely. Is that right ?
I don't understand what it means for my PR.
Should I change how I handle tokenserver ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that it impacts much of your PR, since you're only focused on the Storage part. There may be a few opportunities for folk running their own "Stand Alone" version to have just one database running instead of two, and depending on how we decide to do this for Spanner, there may be some additional functions we introduce to the database layer. We'll try to isolate and highlight those so that we don't cause more work.
331d54f
to
cd814d9
Compare
I'm really jazzed about this PR, thanks for all the work on it! I noticed this in the PR desc:
If this is a question of building a Docker container from this codebase that bundles all the SQLite deps and nothing more, I'm happy to take on that work. |
Something like that yes. There might be some docker-compose configuration files to make too. That would leave me to clean up my code |
Sounds good, I'll give it a shot! |
Okay, I think I've got something approximately working in Eragonfr/syncstorage-rs@syncstorage-sqlite...bcspragu:syncstorage-rs:syncstorage-sqlite E.g. docker build -t mozsync-sqlite --build-arg DATABASE_BACKEND=sqlite .
# The --userns line is a rootless Podman thing, can be ignored mostly
docker run --rm -it \
--userns=keep-id:uid=10001,gid=10001 \
-v $PWD/temp-data-do-not-submit:/data \
-e "SYNC_HOST=0.0.0.0" \
-e "SYNC_MASTER_SECRET=secret0" \
-e "SYNC_SYNCSTORAGE__DATABASE_URL=sqlite:///data/syncdb.sqlite" \
-e "SYNC_TOKENSERVER__DATABASE_URL=sqlite:///data/tokenserverdb.sqlite" \
-e "SYNC_TOKENSERVER__RUN_MIGRATIONS=true" \
mozsync-sqlite Seems to start up successfully, but I'm not brave enough to connect it to my real Firefox sync yet |
Thanks @bcspragu ! |
For sure! Also, looking at the current syncstorage-rs/.circleci/config.yml Line 67 in 4a503f8
Not sure what the team would prefer here, but there's two options:
Unrelated, one thing I forgot to mention is that regardless of which features are enabled, MySQL client libs are required on the host (e.g. there are places where it isn't gated behind the feature flag, it gets linked in regardless). I started adding the relevant |
I thought of dropping the default flags because the user should choose their storage backend instead of relying on a default one. We want to have the proper |
63942fe
to
f781fc2
Compare
5e2b64c
to
22d3aaf
Compare
The ci fails on sqlite e2e tests. I think it is because the /data/ folder is missing, which means that the database can't be created and that makes the ci to fail. The checks are unrelated and are probably failing on the master branch too. |
🤦♀️ I'm dumb the e2e tests can't work because they require a connection to the database from the python script and syncserver at the same time. Which is not possible with sqlite. |
7ca1578
to
1c08e86
Compare
Hi @Eragonfr , thanks for your hard work! I am trying to get this to run locally, but I can't execute this query against the token database, because the database file is empty. Am I doing something wrong? Could you try running the server with a missing token database and check if the server correctly sets it up? Thanks again! I am really looking forward to this! |
@JulianWgs I also have the issue. I think that I broke the auto migration when doing some refactor. EDIT: Can't reproduce that bug anymore… wtf |
ff5be44
to
a8dda23
Compare
8eb5041
to
75a9a26
Compare
If anyone has an idea on how to get the e2e tests to run, I'll take it. But I can't figure out how to force the Rust SQLite driver (diesel with r2d2) and the Python driver (sqlalchemy) to close the database after each transaction. One option would be to stop syncstorage after each transaction and restart it after the check. That's why there is currently no new progress. I just don't understand enough of both database driver to force them to do something that they are not designed to do. (I mean closing your connection is horrible in term of efficiency, a good driver should never do that after each transaction) |
I tried, it only caused the MySQL e2e test to fail c8f21bd |
At this point ffd1bd9 it's certain that it's python that holds the lock on the database but I really don't know how I can force it to close the file |
I found a way for python to not pool anything and not keep the lock on the file. One idea for syncstorage is that I have to implement a fake DbPool for SQLite e2e tests. This pool would open a new connection when a connection is requested from it. And close the connection as it is returned. EDIT: It seems far easier for me to add PostgreSQL as a backend and pass e2e tests immediately than to fix the mess with SQLite |
a2a033f
to
45f5c82
Compare
45f5c82
to
ed0e727
Compare
I'm just a passer-by offering my opinion, and I haven't tested this, but hopefully it helps. SQLite supports a busy handler and WAL mode to allow multiple writers and readers, across multiple processes, to use the same database without blocking each other. It's still only allowing one write transaction at a time, but processes automatically coordinate with each other through a shared memory file as to who has the lock. Here is a good article on StackOverflow on enabling WAL with Diesel / r2d2, including some sample code for inspiration. I imagine you would only want to enable this for tests where you want this concurrent access. I haven't used SQLAlchemy, but the Python tests should use the same SQLite pragmas as Rust to access the database file, e.g. |
Description
Re-implementation of #935 on the newer codebase.
Thanks for @mqus for the first implementation.
Issue(s)
Partially fixes #498
Todo
Current state is that it runs on my computer, but it's probably not clean at all, there's still a lot of duplicate code, and some SQL queries are still in MySQL-flavored SQL but could be used by the SQLite backend.