Multi-Vendor setup extremely slow on loading #2910
Hi, thanks for the detailed report. I think there are 2 separate issues here:
Hi @Swampy469, is the slowness happening when you log in as an admin user who has access to all facets and channels, or for all users, including users with restricted channels and permissions?
Hi @snapshotleisure, the slowness occurs in two different situations: during the login process of a SuperAdmin user (as you mentioned), and when loading facets (as described in the original report).
Hi @snapshotleisure, I'm curious: did you notice this slowness after migrating to Vendure 2.2.4, or did it develop over time? In that case, what was the latest version that was still working fine for you? I'm currently troubleshooting why all my requests hang or take an abnormally long time (or never respond) just after upgrading Vendure from version 2.1.8 to 2.2.7. I'm using Postgres on Google Cloud SQL. I don't have much more information. I'm using around 60 channels and 1,000 products. I cannot reproduce this behavior in development, where I have just a few channels and products. I don't have any error logs or anything for now, so it's quite hard to understand what is happening.
Hi @gkielwasser, unfortunately I haven't encountered the problem yet; I haven't fully used Vendure outside of a POC. However, it was @Swampy469 who was having the problem, so he might be able to share some insight into what he was experiencing. Sorry!
Hi @gkielwasser, in the early days of using Vendure (version 2.1.4) in our production environment, everything ran smoothly. Even with over 300 channels, the requests responded rapidly as usual. However, we started noticing some slowness after updating to version 2.2.0. Initially, it was just the login requests (only for the superadmin user) that experienced delays, taking about 5 to 10 seconds to respond. But since updating to version 2.2.4, overall performance has noticeably declined, as we've reported in this issue.
@snapshotleisure Indeed, I quoted the wrong issue owner, sorry for that. @Swampy469 In my case I have 2 Cloud Run revisions in the production environment: one is on 2.1.8, one is on 2.2.7. The slowness issue is reproducible: as soon as I swap to 2.2.7, all requests hang. Going back to 2.1.8, request times return to "normal", but not immediately. I will investigate between 2.1.4 and 2.2.4 to determine since which version this has been happening.
I managed to reproduce it in my development env after migrating my SQLite dev database to Postgres. It will be far easier to solve. The issue starts happening specifically in Vendure 2.2.0: it does not happen in Vendure 2.1.9, and it does happen in 2.2.7. Environment: 500 products. In my local env the … @michaelbromley Do you know what introduced such a behavior in 2.2.0?
@gkielwasser good to hear that you can reproduce this locally. That's a major step towards isolating the cause so we can get it fixed. Vendure v2.2.0 made a significant update to the underlying TypeORM version, so there was quite a lot of work done on the data layer. In most cases that we measured, this work actually sped things up, sometimes significantly. But it looks like there are cases where the opposite is true. With the … I'll try to reproduce this myself.
Analyzing the issue with @Swampy469, we noticed CPU usage spikes on the server while the database did not seem to be impacted. This leads me to think that the server might be heavily parsing the data returned by the queries. Could this be the cause?
@michaelbromley Some more insights. In a development environment with a fresh Vendure 2.2.7 project, initial data populated, and using Postgres, I cannot reproduce any request-hang issues. From what I have observed on my side, I would say the issue is linked to Postgres: I was not able to reproduce it with the exact same data using SQLite. However, once I migrated the data from SQLite to Postgres using pgloader, I was able to migrate the db and reproduce the issue, although I'm not sure of the integrity of my new Postgres db. I have the … It would be interesting to know, @Swampy469, if you can reproduce the issue in your development environment. Also, are you using Postgres in your development setup? Currently, I'm not convinced that the issue is related to the number of channels or products; it could also be project-specific. I will try to streamline my database to validate this, particularly by cleaning up channels.
I have caught the function that takes all of that time on its own. In my case it iterates over the permissions of 73 channels.
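As a general illustration of how this kind of hotspot can be caught (a rough sketch only; `timed` and `buildPermissionsForChannels` are hypothetical names, not the actual Vendure internals), wrapping the suspect call with timing instrumentation makes the per-call cost visible:

```ts
// Minimal timing helper; performance.now() is available globally in Node 16+.
async function timed<T>(label: string, fn: () => Promise<T>): Promise<T> {
    const start = performance.now();
    try {
        return await fn();
    } finally {
        console.log(`${label} took ${(performance.now() - start).toFixed(1)}ms`);
    }
}

// Example usage around a suspected call (buildPermissionsForChannels is a
// hypothetical placeholder for whichever internal function is under suspicion):
//
// const permissions = await timed('buildPermissionsForChannels', () =>
//     buildPermissionsForChannels(channels),
// );
```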
Great detective work. Yes, I can see that code path is absolutely un-optimized. I'm certain that this execution time can be drastically reduced. In particular, the part at vendure/packages/core/src/service/services/role.service.ts lines 212 to 215 (commit 27432c0) is being executed for every channel, but only needs to be executed once and then cached.
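To illustrate the pattern being described (this is a sketch only, not the actual role.service.ts code; every name below is a hypothetical stand-in), the fix amounts to hoisting the channel-independent work out of the per-channel loop and reusing its cached result:

```ts
interface Channel {
    id: string;
    code: string;
}

interface ChannelPermissions {
    channelId: string;
    permissions: string[];
}

// Stand-in for the expensive, channel-independent work (e.g. a DB round trip).
async function loadPermissionDefinitions(): Promise<string[]> {
    return ['ReadCatalog', 'UpdateCatalog', 'ReadOrder'];
}

function resolvePermissions(channel: Channel, definitions: string[]): ChannelPermissions {
    return { channelId: channel.id, permissions: definitions };
}

// Before: the expensive call is awaited once per channel, so the cost grows
// linearly with the channel count (400 channels means 400 round trips).
async function buildSlow(channels: Channel[]): Promise<ChannelPermissions[]> {
    const result: ChannelPermissions[] = [];
    for (const channel of channels) {
        const definitions = await loadPermissionDefinitions();
        result.push(resolvePermissions(channel, definitions));
    }
    return result;
}

// After: the expensive call runs a single time and its result is reused, so
// the per-channel work is only the cheap in-memory mapping.
async function buildFast(channels: Channel[]): Promise<ChannelPermissions[]> {
    const definitions = await loadPermissionDefinitions();
    return channels.map(channel => resolvePermissions(channel, definitions));
}
```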
Some of my initial assumptions were a bit biased. While debugging the slowness, mainly related to the GetActiveAdministrator query, I discovered that my production package.json was enforcing typeorm 0.3.13 with a resolution rule, which I have now removed. I also updated pg to 8.12.0. Now, in production with Vendure 2.2.7, my requests no longer hang like before. However, I still have this famous request that takes around 1 second. I have 53 channels, so it averages out to less than 20ms per channel. While this is not critical in my case, it could become problematic with a higher channel count. I'm still pleased to have completely removed SQLite and to now have the same Postgres database type mirrored on dev and prod. This has been on my roadmap for a long time, and removing SQLite has eliminated a lot of unnecessary extra work and complexity.
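For reference, mirroring dev and prod on Postgres essentially means pointing Vendure's dbConnectionOptions at a Postgres instance in both environments. A minimal sketch (the host, credentials, and environment variable names are placeholders, not details from the report above):

```ts
import { VendureConfig } from '@vendure/core';

// Use the same driver and connection shape in dev and prod; only the
// environment variable values differ between the two environments.
export const dbConnectionOptions: VendureConfig['dbConnectionOptions'] = {
    type: 'postgres',
    host: process.env.DB_HOST ?? 'localhost',
    port: Number(process.env.DB_PORT ?? 5432),
    username: process.env.DB_USERNAME ?? 'vendure',
    password: process.env.DB_PASSWORD ?? 'vendure',
    database: process.env.DB_NAME ?? 'vendure',
    synchronize: false, // rely on migrations rather than auto-sync
};
```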
Good news on this: I ran some benchmarks locally to measure the performance optimization I mention above, using the following query to the Admin API (the exact same query run by the Admin UI app):

```graphql
query GetActiveAdministrator {
activeAdministrator {
...Administrator
__typename
}
}
fragment Administrator on Administrator {
id
createdAt
updatedAt
firstName
lastName
emailAddress
user {
id
identifier
lastLogin
roles {
...Role
__typename
}
__typename
}
__typename
}
fragment Role on Role {
id
createdAt
updatedAt
code
description
permissions
channels {
id
code
token
__typename
}
__typename
}
```

Baseline
5 channels, unoptimized: average response time 85ms

Many channels
505 channels, unoptimized: average response time 2,905ms

Optimized
505 channels, optimized per the commit above: average response time 106ms

So that's a speedup of around 27x for ~500 channels.
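For anyone wanting to repeat a similar measurement, here is a rough sketch of a timing script. The endpoint URL, bearer-token auth, and run count are assumptions rather than details from the benchmark above, so adapt them to your setup:

```ts
// Node 18+ (global fetch and performance). Times the GetActiveAdministrator
// query against a local Admin API endpoint and reports the average latency.
const ADMIN_API_URL = process.env.ADMIN_API_URL ?? 'http://localhost:3000/admin-api';
const AUTH_TOKEN = process.env.VENDURE_AUTH_TOKEN ?? ''; // assumes bearer-token sessions

const query = /* GraphQL */ `
    query GetActiveAdministrator {
        activeAdministrator {
            id
            user {
                id
                roles {
                    id
                    channels {
                        id
                    }
                }
            }
        }
    }
`;

async function timeQuery(runs = 10): Promise<void> {
    const timings: number[] = [];
    for (let i = 0; i < runs; i++) {
        const start = performance.now();
        const res = await fetch(ADMIN_API_URL, {
            method: 'POST',
            headers: {
                'content-type': 'application/json',
                authorization: `Bearer ${AUTH_TOKEN}`,
            },
            body: JSON.stringify({ query }),
        });
        await res.json(); // ensure the full body arrives before stopping the clock
        timings.push(performance.now() - start);
    }
    const average = timings.reduce((a, b) => a + b, 0) / timings.length;
    console.log(`average response time over ${runs} runs: ${average.toFixed(0)}ms`);
}

timeQuery().catch(err => console.error(err));
```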
Describe the bug
We are experiencing performance issues in our production (multi-vendor setup) environment. The main slowdowns occur in the login process and in loading facets.
We believe the cause of the slowness is having around 400 channels (one for each seller) in production, whereas our test environment has only 20 channels and is very fast.
Regarding the login process, the slowness seems to be caused by loading the list of all channels. As for the facets, we are almost certain the issue is related to some interaction with the channels that somehow burdens the server.
The Vendure instance is hosted on an AWS EC2 instance with a powerful CPU (c5.2xlarge). However, we are seeing CPU usage spike to 100% when these operations are performed. We have not noticed any slowdowns on the RDS Postgres database (in general) or on Elasticsearch, where the products are loaded.
From your performance tests, we understand they were conducted with about 22 channels. Could you verify with a multi-vendor configuration and around 400 channels? We can provide all the details related to our data and environment if necessary.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The login process should be very fast.
Loading the facets should be quick within the facets page.
Loading the facets should be fast in the facet selection modal on the product page.
Environment (please complete the following information):
Additional context
The affected slow queries are:
activeAdministrator ~ 11 seconds to load
facets ~ 7 seconds
facetValues ~ 12 seconds
We think the issue is inside the ListQueryBuilder and the way it handles the relationships between Vendure entities, in particular in the build method.