-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[PM-21031] Optimize GET Members endpoint performance #5907
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PM-21031] Optimize GET Members endpoint performance #5907
Conversation
…Domains that uses CTE for better performance
…_V2 for retrieving user details, group associations, and collection associations by organization ID.
…zationWithClaimedDomainsAsync_vNext in IOrganizationUserRepository to enhance performance by reducing database round trips.
…ery when the feature flag is enabled
…n the feature flag is enabled
…c_vNext to validate behavior with verified and unverified domains.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5907 +/- ##
==========================================
+ Coverage 48.52% 52.05% +3.53%
==========================================
Files 1734 1734
Lines 76764 76932 +168
Branches 6852 6864 +12
==========================================
+ Hits 37246 40044 +2798
+ Misses 38021 35320 -2701
- Partials 1497 1568 +71 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
New Issues (1)Checkmarx found the following issues in this Pull Request
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just had a question around the domain splitting in the CTE
WITH CTE_UserWithDomain AS ( | ||
SELECT | ||
OU.*, | ||
SUBSTRING(U.Email, CHARINDEX('@', U.Email) + 1, LEN(U.Email)) AS EmailDomain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional idea might to be to make this a computed column so we could index against it. Check with @rkac-bw to see if that's necessary at all. We've been making improvements to sprocs for getting the domain out of the email, so maybe it's time to do it. 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So creating a computed column on 12 million rows might have an extended down time during migration, can we not just create a column and have a migration that populates it in batches to reduce locking and then parse in the code to insert domain name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a great idea to me. I'll work on the batch script
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A second thought on this, @withinfocus suggested an indexed view and I tested it out and it is non blocking on creation:
CREATE VIEW dbo.UserEmailDomain
WITH SCHEMABINDING
AS
SELECT
Id,
Email,
SUBSTRING(Email, CHARINDEX('@', Email) + 1, LEN(Email)) AS EmailDomain
FROM dbo.[User]
WHERE Email IS NOT NULL
AND CHARINDEX('@', Email) > 0
CREATE UNIQUE CLUSTERED INDEX IX_UserEmailDomain_Id
ON dbo.UserEmailDomain (Id);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call these "details views" generally and you'll see some around. What's new for us here is being schema-bound, so we should think through how these are applied because the view will need to be dropped before column changes happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkac-bw and @withinfocus - any further thoughts on our direction here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After being out and re-reading the thread, the schema-bound view stands out to me as the cleanest with respect to data presence (and lack of duplication) as well as maintenance. Yes, there's a little more to do when working with one but we'll see impacts and bugs immediately with what we have in CI. Is there something I am missing besides what Thomas stated around its recreation?
I like where the third solution could be headed but that's a significant refactor the team would need to take on -- is that something you want to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The consensus seems to be that the schema-bound view is the simplest and best approach for now.
@eliykat, would you be okay with me going ahead with this for now? We can revisit the idea of the third option as part of a future refactor if it makes sense then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some performance testing and found that rewriting query as CTE, using with recompile (due to small number of large orgs skew) and three non clustered indexes that it out performs the indexed view
Jira Task: PR-5907 Performance Analysis: OrganizationUser Domain Filtering
This will avoid schemabinding and deployment issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless @withinfocus or any one else has comments I think we should go ahead with rewriting query as CTE, using with recompile (due to small number of large orgs skew) and three non clustered indexes and the view without schema binding, let me now if you have questions
var organizationUsersWithClaimedDomain = _featureService.IsEnabled(FeatureFlagKeys.MembersGetEndpointOptimization) | ||
? await _organizationUserRepository.GetManyByOrganizationWithClaimedDomainsAsync_vNext(organizationId) | ||
: await _organizationUserRepository.GetManyByOrganizationWithClaimedDomainsAsync(organizationId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🌱 Run this in parallel and wait for them before L38?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but the organizationAbility
value comes from the cache, so it's quick to access. If if (organizationAbility is { Enabled: true, UseOrganizationDomains: true })
returns false
there's no need to query the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache may have issues for that lookup https://bitwarden.atlassian.net/wiki/x/iYDibg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats an issue on the cache mechanism, right? If so, that seems to be out of scope here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…stom user types in OrganizationUserUserDetailsQuery.
…rformance # Conflicts: # src/Core/Constants.cs
…that uses UserEmailDomainView to fetch Email domains
…pository and its implementations
…tored procedure to use UserEmailDomainView for email domain extraction, improving query efficiency and clarity.
…etManyDetailsByOrganizationAsync method, clarifying its purpose and performance optimizations. Added remarks for better understanding of its functionality.
…Domain_V2.sql to adhere to coding standards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rkac-bw please give this a final review.
…use CTE and add indexes
…d related migration script adjustments
I have implemented the suggestions by @rkac-bw above in #5907 (comment) |
…rformance # Conflicts: # src/Core/AdminConsole/Repositories/IOrganizationUserRepository.cs
END | ||
GO | ||
|
||
IF NOT EXISTS(SELECT name FROM sys.indexes WHERE name = 'IX_OrganizationDomain_Org_VerifiedDomain') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check uses wrong name (IX_OrganizationDomain_Org_VerifiedDomain) but creates
IX_OrganizationDomain_OrganizationId_VerifiedDate
- This will create duplicate indexes on every migration run
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely spotted! Its fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just clearing my approval here.
|
🎟️ Tracking
https://bitwarden.atlassian.net/browse/PM-21031
📔 Objective
This PR improves the performance of the GET
/organizations/{orgId}/users
endpoint behind theMembersGetEndpointOptimization
feature flag. When enabled, it significantly reduces DB round trips and improves query times.This is the benchmark result from running
OrganizationUsersControllerPerformanceTest
, comparing the current (unoptimized) version with the new (optimized) version using the feature flag:Summary of Changes
Optimized Endpoint Methods
Get_vNext()
andGetAccountRecoveryEnrolledUsers_vNext()
Task.WhenAll()
to run multiple DB queries in parallelRepository Enhancements
GetManyDetailsByOrganizationAsync_vNext()
OrganizationUserUserDetails_ReadByOrganizationId_V2
GetManyByOrganizationWithClaimedDomainsAsync_vNext()
OrganizationUser_ReadByOrganizationIdWithClaimedDomains_V2
Performance Gains
Integration Tests
OrganizationUserRepositoryTests.cs
⏰ Reminders before review
🦮 Reviewer guidelines
:+1:
) or similar for great changes:memo:
) or ℹ️ (:information_source:
) for notes or general info:question:
) for questions:thinking:
) or 💭 (:thought_balloon:
) for more open inquiry that's not quite a confirmed issue and could potentially benefit from discussion:art:
) for suggestions / improvements:x:
) or:warning:
) for more significant problems or concerns needing attention:seedling:
) or ♻️ (:recycle:
) for future improvements or indications of technical debt:pick:
) for minor or nitpick changes