Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to remove the schemaCache #6743

Closed
wants to merge 8 commits into from

Conversation

SebC99
Copy link
Contributor

@SebC99 SebC99 commented Jun 19, 2020

and to replace it with the database live change streams

  • add single schema cache for MongoAdapter
  • add single schema cache for PostgreSQL adapter
  • add specific tests

@dplewis
Copy link
Member

dplewis commented Jun 20, 2020

Can you write some test cases?

I’m thinking updating Mongo directly to see if Parse Cache updates?

@SebC99
Copy link
Contributor Author

SebC99 commented Jun 20, 2020

Oh I'm so bad at test... 😔
That's what I've done locally, changing the Schema with mongo shell to see it's changed in Parse, but I have no idea how could I do that in tests....
It would need to create a fake Schema class with MongoClient, and update it, but the cache is no more publicly available as it's just a private property of the SchemaController.

@SebC99
Copy link
Contributor Author

SebC99 commented Jun 22, 2020

Just to let you know, I've tried this update in production and there's an improvement of around 30% in our latency

@SebC99
Copy link
Contributor Author

SebC99 commented Jun 25, 2020

@dplewis sorry I tried but I clearly know nothing about tests and flow type so it's a fail...
If you have time to look at it and add the postgresql version, it would great!

@SebC99
Copy link
Contributor Author

SebC99 commented Jul 5, 2020

@dplewis apparently there's a memory issue with my changes, but I can't figure out where... any idea?

@SebC99
Copy link
Contributor Author

SebC99 commented Jul 8, 2020

Ok so no memory leaks after all...
we do have a memory leak but it’s due to something else (logs...), and the reduced latency increased our servers lifetime, making this leak appear. 😅
So we can go on with this work.
Do you have time to look into it @dplewis? Or @mtrezza?

@dplewis
Copy link
Member

dplewis commented Jul 10, 2020

@SebC99 Thanks for testing this. I think we can go forward with this and improve it.

@dplewis
Copy link
Member

dplewis commented Jul 17, 2020

@SebC99 Can you give me access to your branch?

@dplewis
Copy link
Member

dplewis commented Jul 17, 2020

@SebC99 I recieved the following error.

Uncaught exception: MongoError: The $changeStream stage is only supported on replica sets

I use mongo atlas which always runs Replica Sets, we would need to document this as a requirement for using parse for with than one instance or just running tests.

@SebC99
Copy link
Contributor Author

SebC99 commented Jul 17, 2020

Sure!
Ah yes you’re right, replica set is mandatory for this. I use Atlas too, so it was not a subject for me

@dplewis
Copy link
Member

dplewis commented Jul 30, 2020

There problem I'm running into (it may not be a problem) there are a few places where the SchemaCache needs to be cleared reloaded such as deleting from the schema. watch() will update the SchemaCache but after a few milliseconds. I think we can get away with this as user rarely delete from the schema.

I will try to get the tests to pass first then we can look into this proplem.

@SebC99
Copy link
Contributor Author

SebC99 commented Jul 30, 2020

You can still "clear" the cache with something like this._cache.allClasses = undefined; if you need an immediate change of the cache I think

@mtrezza
Copy link
Member

mtrezza commented Oct 25, 2020

Where is this PR stuck at the moment? I think this would be a considerable improvement for many deployments.

@SebC99
Copy link
Contributor Author

SebC99 commented Oct 25, 2020

Don't know exactly but we have run it in production for quite a while now, and it's working great.

@mtrezza
Copy link
Member

mtrezza commented Oct 25, 2020

As I understand from the thread, the open issues are

  • only works on replica sets -> needs proper handling / error message for non-replica and docs
  • watch latency when deleting from schema -> we can tolerate the milliseconds latency; manually deleting from schema wouldn't work in a cluster with multiple server instances, because deleting would only affect one instance
  • writing tests
  • writing docs
  • optional: add support for Postgres -> could be a separate, later PR if that's complicated

Did I miss anything?

@dplewis
Copy link
Member

dplewis commented Oct 25, 2020

Also run benchmarks against it. I have some time this week to look back into it

@dplewis
Copy link
Member

dplewis commented Nov 13, 2020

@davimacedo @SebC99 @TomWFox @mtrezza

I wanted to get your opinion on this before I get started. This feature requires replica set for Mongo. For users with standalone instance they can be changed to single node replica set.
https://docs.mongodb.com/manual/tutorial/convert-standalone-to-replica-set/

Do you see any problems with this or a way to approach this?

@mtrezza
Copy link
Member

mtrezza commented Nov 13, 2020

Do I understand this correctly:

The change would disable the schema cache for MongoDB, which means that Parse Server will only work with replica set when using MongoDB.

  • no action required for deployments using MongoDB with replica set (single or multi-node)
  • action required for deployments using MongoDB without replica set -> need to convert according to MongoDB guide
  • no action required for deployments using Postgres (Postgres still uses schema cache for now)

In principle I don't see an issue with that. I think that any serious production and development environment is using a replica set. Replica sets are the de-facto standard, regardless of the node count. Even the smallest, free MongoDB Atlas plan uses a replica set.

However, this seems to be a significant change in Parse Server. It may be that we come across certain edge cases or limitations when using these DB hooks. As a transition I would suggest to introduce a Parse Server configuration option to switch between using the schema cache and using DB hooks. Or maybe add the feature as experimental?

@davimacedo
Copy link
Member

First of all, why is replica set mandatory for this feature? I think it is important to understand.

If that's the only way to go, I don't see any problem but I believe it should be a feature for the developers to opt in (disabled by default).

I know that replica set is the right way to go with mongo but I am afraid that it can add more complexity for the developers that are trying out Parse Server for the first time.

@dplewis
Copy link
Member

dplewis commented Nov 13, 2020

My plan was to add a new configuration replicaSet: true for users to opt in. We can keep the existing singleSchemaCache and if users opt in load the new schema cache.

Replica set if required to read the oplogs which is what $changeStream uses under the hood.

@dplewis
Copy link
Member

dplewis commented Nov 13, 2020

Here are a few points.

add more complexity for the developers that are trying out Parse Server for the first time

The purpose of this feature is for users who have multiple parse instances that share the same db. If a user has a single instance this will work outside of the box which is the case for most new users.

We can add a note saying: "If you are using Multiple Parse Server instances with the same MongoDB database, your database must be a sharded cluster / replica set and storage engine WiredTiger (default). Set the replicaSet: true in your Parse Server Configuration"

it should be a feature for the developers to opt in

Keeping the old singleSchemaCache will create complexity in the future when we try to optimize this further (reduce the number of loadSchema(), reduces calls to schema reloadData(), remove shared SchemaController, etc).

However, this seems to be a significant change in Parse Server.

This is basically reverting most the changes made here. Unfortunately these DB hooks didn't exist 4 years ago.

It may be that we come across certain edge cases or limitations when using these DB hooks.

We can probably do this as an alpha release or a major release @SebC99 has been using it for a while now in production. I'd prefer a missed db hook call (this will be handled by the current schema validation system) than doing 22 schema lookups to redis when I Parse.Object.saveAll([10 objects]);

Most users had to migrate from MLab (deadline was 2 days ago) and like me chose Atlas for migration, which all plans use replica sets.

Thoughts?

@davimacedo
Copy link
Member

It sounds good for me. Let's go fot it.

@codecov
Copy link

codecov bot commented Nov 13, 2020

Codecov Report

Merging #6743 (873168f) into master (c1971b2) will decrease coverage by 12.78%.
The diff coverage is 59.21%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #6743       +/-   ##
===========================================
- Coverage   93.85%   81.07%   -12.79%     
===========================================
  Files         169      168        -1     
  Lines       12407    12361       -46     
===========================================
- Hits        11645    10022     -1623     
- Misses        762     2339     +1577     
Impacted Files Coverage Δ
src/Adapters/Auth/vkontakte.js 89.47% <0.00%> (ø)
src/Options/Definitions.js 100.00% <ø> (ø)
src/Options/index.js 100.00% <ø> (ø)
src/PromiseRouter.js 90.58% <ø> (+0.47%) ⬆️
...dapters/Storage/Postgres/PostgresStorageAdapter.js 2.42% <1.14%> (-93.54%) ⬇️
src/GraphQL/helpers/objectsQueries.js 58.46% <40.00%> (-32.38%) ⬇️
src/GraphQL/helpers/objectsMutations.js 33.33% <50.00%> (-50.01%) ⬇️
src/Adapters/Auth/google.js 92.95% <66.66%> (ø)
src/GraphQL/loaders/filesMutations.js 83.33% <66.66%> (ø)
src/GraphQL/loaders/defaultGraphQLTypes.js 90.10% <89.47%> (-6.96%) ⬇️
... and 40 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c1971b2...873168f. Read the comment docs.

@mtrezza
Copy link
Member

mtrezza commented Nov 26, 2020

  • Do we need to add Postgres support with the PR already, or can this come at a later stage?
    @cbaker6 would you want to look into replicating this feature for Postgres?

@cbaker6
Copy link
Contributor

cbaker6 commented Nov 28, 2020

IMO, we should be cautious of adding features only to one adapter when they can be implemented on both.

If the changes to the Mongo side can be summarized and pointed to (it’s hard for me to tell from this PR what changes actually have to deal with the feature), I can help or attempt to implement the change for Postgres. Also, there are changes to the Postgres adapter, let me know if these were attempts to implement the feature and what problems are being run into. Of course, pointing to the added test cases will help also.

@SebC99
Copy link
Contributor Author

SebC99 commented Nov 28, 2020

@cbaker6 you should only look at the very first commits of this PR, which are not impacted by the merge of master branch for rebasing purposes.
To summarize:

  • the goal is to remove all the calls to SchemaCache
  • and to replace it by a simple object singleSchemaCache in the SchemaController
  • and to do that, the MongoStorageAdapter subscribes to change events on the _Schema collection
  • a new watch(callback:() => void): void method in the storageAdapter is responsible to set the method called by a change event. The SchemaController uses this method to cache the new version of the schema each time the adapter receives a change event.

So I'm guessing you should only look at the changes on MongoStorageAdapter in the first commit to add the same on PostGres:

  • add the watch method to save the callback somewhere in the adapter (for Mongo it is saved in the _onchange property of the adapter)
  • subscribe to any changes on the _Schema collection, and call the saved callback (_onchange)

@mtrezza
Copy link
Member

mtrezza commented Nov 28, 2020

Are most of the tests currently failing because the CI does not start a MongoDB replica set but a single instance? Because from what I understand, the implementation for MongoDB is basically complete.

@SebC99
Copy link
Contributor Author

SebC99 commented Nov 28, 2020

I'm not sure, we should see an explicit "MongoError: The $changeStream stage is only supported on replica sets" error somewhere in the tests, shouldn't we?
Maybe the merge?

@dplewis
Copy link
Member

dplewis commented Dec 13, 2020

@SebC99 I created a new branch https://github.com/parse-community/parse-server/compare/improve-schema-cache

It passes all test and has Postgres support. I'm going to run it in production for a while. I believe there are a few more places for optimization.

@mtrezza
Copy link
Member

mtrezza commented Dec 15, 2020

@dplewis Can you already quantify any performance improvement?

@dplewis
Copy link
Member

dplewis commented Dec 15, 2020

I'm currently building a test suite. Here is a simple test saving, querying and deleting 10,000 objects. Schema Lookups means querying the database for the schema.

10,000 objects Schema Lookups Before Schema Lookups After
save 12937 12902
saveAll 520 520
destroyAll 500 0
query 1 0

I'll keep you all posted with further improvements.

@mtrezza
Copy link
Member

mtrezza commented Dec 15, 2020

That's interesting. I would assume that there are no schema look-ups at all anymore except for the initial one on server start and a look-up triggered by the DB hook when the schema is modified. So the right column should always be 0. Is that what you meant by "few more places for optimization" or am I missing something?

@dplewis
Copy link
Member

dplewis commented Dec 15, 2020

@mtrezza You got it right, there are still schema looks ups that haven't been removed yet or changed to integrate with the new schema cache. I'll create a PR with the new branch when its ready.

@SebC99
Copy link
Contributor Author

SebC99 commented Dec 15, 2020

@dplewis @mtrezza I don't understand, there should be almost no request to the _Schema class in database. In our case we have almost none!
We do have requests to the schema object containing the cache, but not to the database...

@mtrezza
Copy link
Member

mtrezza commented Dec 15, 2020

We do have requests to the schema object containing the cache, but not to the database

Maybe the test somehow also counted queries to the schema object instead of only to the DB?

@mtrezza
Copy link
Member

mtrezza commented Dec 26, 2020

10,000 objects Schema Lookups Before Schema Lookups After
save 12937 12902
saveAll 520 520
destroyAll 500 0
query 1 0

@dplewis I ran SchemaPerformance.spec.js but test new object and test saveAll / destroyAll shows 0 calls for saveAll / save. Also, couldn't see any flakiness for test test saveAll / destroyAll.

Maybe I misunderstand the table above - or did you fix these in the meantime and hence they pass?

@dplewis
Copy link
Member

dplewis commented Dec 26, 2020

I don't understand, there should be almost no request to the _Schema class in database. In our case we have almost none!

Thats the difference between running in production and writing tests cases. In our tests cases we start with a brand new database between which allows to see what happens when we create a new schema, update and delete, what would happen during batch updates and requests etc. In production your schema rarely changes.

@SebC99 Feel free to copy SchemaPerformance.spec from my branch to your branch and compare schema lookups.

Maybe I misunderstand the table above - or did you fix these in the meantime and hence they pass?

These have been fixed since I posted that table. The Postgres tests started failing on my branch. I don't know if it's my local PG database or something else.

@mtrezza
Copy link
Member

mtrezza commented Dec 26, 2020

@SebC99 Feel free to copy SchemaPerformance.spec from my branch to your branch and compare schema lookups

@SebC99 Would you want to compare dplewis' branch to your current approach and see whether you have any feedback?

The Postgres tests started failing on my branch. I don't know if it's my local PG database or something else.

I'll look into that, maybe @cbaker6 also has some input there.

@mtrezza mtrezza closed this Dec 26, 2020
@mtrezza mtrezza reopened this Dec 26, 2020
@dplewis
Copy link
Member

dplewis commented Dec 30, 2020

@SebC99 I think we should use your implementation for now. We can add improvements slowly in the future. My branch will be a proof of concept.

@mtrezza
Copy link
Member

mtrezza commented Dec 30, 2020

@dplewis I am currently trying out your branch in a dev environment and it works fine for MongoDB. It also passes all the tests. I assumed your branch was more recent, including also tests and Postgres support.

@mtrezza
Copy link
Member

mtrezza commented Jan 5, 2021

To put this PR into a cost context: a deployment that has singleSchemaCache enabled with a long enough TTL won't see a significant reduction in DB schema reads when switching to using hooks with this PR, is that understanding correct?

@dplewis
Copy link
Member

dplewis commented Jan 5, 2021

I had the same though when I saw a previous GraphQL conversation on TTL. Will the schemaCache always be up to date when you have multiple instances? I can foresee hooks being more performant for new additions to schema cache. I’ll run some tests to be sure.

@mtrezza
Copy link
Member

mtrezza commented Jan 5, 2021

I can foresee hooks being more performant for new additions to schema cache.

Yes, I think the main advantage of hooks is the reduction of schema inconsistencies among multiple instances, compared to long TTLs with singleSchemaCache. The read once every few minutes with singleSchemaCache already should avoid a huge data transfer or performance impact, even for multiple instances.

@SebC99
Copy link
Contributor Author

SebC99 commented Jan 5, 2021

I’ve never thought about that but yes I guess it’s kind of an infinite TTL with a better inconsistency management as there can’t be any missed cache invalidation

@dplewis dplewis mentioned this pull request Feb 12, 2021
6 tasks
@dplewis dplewis mentioned this pull request Feb 21, 2021
8 tasks
@dplewis
Copy link
Member

dplewis commented Mar 16, 2021

@SebC99 Thanks again for getting started on this. There were some moments when I wanted to give up.

There is still room for improvement.

Closing via #7214

@dplewis dplewis closed this Mar 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants