-
-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to remove the schemaCache #6743
Conversation
Can you write some test cases? I’m thinking updating Mongo directly to see if Parse Cache updates? |
Oh I'm so bad at test... 😔 |
Just to let you know, I've tried this update in production and there's an improvement of around 30% in our latency |
@dplewis sorry I tried but I clearly know nothing about tests and flow type so it's a fail... |
@dplewis apparently there's a memory issue with my changes, but I can't figure out where... any idea? |
@SebC99 Thanks for testing this. I think we can go forward with this and improve it. |
@SebC99 Can you give me access to your branch? |
@SebC99 I recieved the following error. Uncaught exception: MongoError: The $changeStream stage is only supported on replica sets I use mongo atlas which always runs Replica Sets, we would need to document this as a requirement for using parse for with than one instance or just running tests. |
Sure! |
There problem I'm running into (it may not be a problem) there are a few places where the SchemaCache needs to be cleared reloaded such as deleting from the schema. I will try to get the tests to pass first then we can look into this proplem. |
You can still "clear" the cache with something like |
Where is this PR stuck at the moment? I think this would be a considerable improvement for many deployments. |
Don't know exactly but we have run it in production for quite a while now, and it's working great. |
As I understand from the thread, the open issues are
Did I miss anything? |
Also run benchmarks against it. I have some time this week to look back into it |
@davimacedo @SebC99 @TomWFox @mtrezza I wanted to get your opinion on this before I get started. This feature requires replica set for Mongo. For users with standalone instance they can be changed to single node replica set. Do you see any problems with this or a way to approach this? |
Do I understand this correctly: The change would disable the schema cache for MongoDB, which means that Parse Server will only work with replica set when using MongoDB.
In principle I don't see an issue with that. I think that any serious production and development environment is using a replica set. Replica sets are the de-facto standard, regardless of the node count. Even the smallest, free MongoDB Atlas plan uses a replica set. However, this seems to be a significant change in Parse Server. It may be that we come across certain edge cases or limitations when using these DB hooks. As a transition I would suggest to introduce a Parse Server configuration option to switch between using the schema cache and using DB hooks. Or maybe add the feature as experimental? |
First of all, why is replica set mandatory for this feature? I think it is important to understand. If that's the only way to go, I don't see any problem but I believe it should be a feature for the developers to opt in (disabled by default). I know that replica set is the right way to go with mongo but I am afraid that it can add more complexity for the developers that are trying out Parse Server for the first time. |
My plan was to add a new configuration Replica set if required to read the oplogs which is what $changeStream uses under the hood. |
Here are a few points.
The purpose of this feature is for users who have multiple parse instances that share the same db. If a user has a single instance this will work outside of the box which is the case for most new users. We can add a note saying: "If you are using Multiple Parse Server instances with the same MongoDB database, your database must be a sharded cluster / replica set and storage engine WiredTiger (default). Set the
Keeping the old singleSchemaCache will create complexity in the future when we try to optimize this further (reduce the number of loadSchema(), reduces calls to schema reloadData(), remove shared SchemaController, etc).
This is basically reverting most the changes made here. Unfortunately these DB hooks didn't exist 4 years ago.
We can probably do this as an alpha release or a major release @SebC99 has been using it for a while now in production. I'd prefer a missed db hook call (this will be handled by the current schema validation system) than doing 22 schema lookups to redis when I Parse.Object.saveAll([10 objects]); Most users had to migrate from MLab (deadline was 2 days ago) and like me chose Atlas for migration, which all plans use replica sets. Thoughts? |
It sounds good for me. Let's go fot it. |
Codecov Report
@@ Coverage Diff @@
## master #6743 +/- ##
===========================================
- Coverage 93.85% 81.07% -12.79%
===========================================
Files 169 168 -1
Lines 12407 12361 -46
===========================================
- Hits 11645 10022 -1623
- Misses 762 2339 +1577
Continue to review full report at Codecov.
|
|
IMO, we should be cautious of adding features only to one adapter when they can be implemented on both. If the changes to the Mongo side can be summarized and pointed to (it’s hard for me to tell from this PR what changes actually have to deal with the feature), I can help or attempt to implement the change for Postgres. Also, there are changes to the Postgres adapter, let me know if these were attempts to implement the feature and what problems are being run into. Of course, pointing to the added test cases will help also. |
@cbaker6 you should only look at the very first commits of this PR, which are not impacted by the merge of master branch for rebasing purposes.
So I'm guessing you should only look at the changes on
|
Are most of the tests currently failing because the CI does not start a MongoDB replica set but a single instance? Because from what I understand, the implementation for MongoDB is basically complete. |
I'm not sure, we should see an explicit "MongoError: The $changeStream stage is only supported on replica sets" error somewhere in the tests, shouldn't we? |
@SebC99 I created a new branch https://github.com/parse-community/parse-server/compare/improve-schema-cache It passes all test and has Postgres support. I'm going to run it in production for a while. I believe there are a few more places for optimization. |
@dplewis Can you already quantify any performance improvement? |
I'm currently building a test suite. Here is a simple test saving, querying and deleting 10,000 objects. Schema Lookups means querying the database for the schema.
I'll keep you all posted with further improvements. |
That's interesting. I would assume that there are no schema look-ups at all anymore except for the initial one on server start and a look-up triggered by the DB hook when the schema is modified. So the right column should always be 0. Is that what you meant by "few more places for optimization" or am I missing something? |
@mtrezza You got it right, there are still schema looks ups that haven't been removed yet or changed to integrate with the new schema cache. I'll create a PR with the new branch when its ready. |
Maybe the test somehow also counted queries to the schema object instead of only to the DB? |
@dplewis I ran Maybe I misunderstand the table above - or did you fix these in the meantime and hence they pass? |
Thats the difference between running in production and writing tests cases. In our tests cases we start with a brand new database between which allows to see what happens when we create a new schema, update and delete, what would happen during batch updates and requests etc. In production your schema rarely changes. @SebC99 Feel free to copy SchemaPerformance.spec from my branch to your branch and compare schema lookups.
These have been fixed since I posted that table. The Postgres tests started failing on my branch. I don't know if it's my local PG database or something else. |
@SebC99 Would you want to compare dplewis' branch to your current approach and see whether you have any feedback?
I'll look into that, maybe @cbaker6 also has some input there. |
@SebC99 I think we should use your implementation for now. We can add improvements slowly in the future. My branch will be a proof of concept. |
@dplewis I am currently trying out your branch in a dev environment and it works fine for MongoDB. It also passes all the tests. I assumed your branch was more recent, including also tests and Postgres support. |
To put this PR into a cost context: a deployment that has |
I had the same though when I saw a previous GraphQL conversation on TTL. Will the schemaCache always be up to date when you have multiple instances? I can foresee hooks being more performant for new additions to schema cache. I’ll run some tests to be sure. |
Yes, I think the main advantage of hooks is the reduction of schema inconsistencies among multiple instances, compared to long TTLs with |
I’ve never thought about that but yes I guess it’s kind of an infinite TTL with a better inconsistency management as there can’t be any missed cache invalidation |
and to replace it with the database live change streams