Skip to content

GEN-5150/GEN-5151 sdk #76

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

GEN-5150/GEN-5151 sdk #76

wants to merge 3 commits into from

Conversation

andreatamezm
Copy link
Contributor

@andreatamezm andreatamezm commented May 12, 2025

Intermittent failure: SDK > TimeOfUseServiceTests.addPrivateTouGroupWorksCorrectly

The test itself passes (adding the TOU group works)
The failure occurs during cleanup with a 404 error when trying to delete the TOU group
The resource was getting deleted by another process or there's a race condition

family-main-sdk-lnc5p: addPrivateTouGroupWorksCorrectly(com.genability.client.api.service.TimeOfUseServiceTests)  Time elapsed: 0.06 sec  <<< ERROR!
family-main-sdk-lnc5p: com.genability.client.api.service.GenabilityException: Failed DELETE http://family-main-preview/rest/timeofuses/734/634: HTTP error code : 404

Intermittent failure: SDK > AccountServiceTests.testPaginatedAccountList:139

family-main-sdk-lnc5p: testPaginatedAccountList(com.genability.client.api.service.AccountServiceTests)  Time elapsed: 20.444 sec  <<< ERROR!
family-main-sdk-lnc5p: com.genability.client.api.service.GenabilityException: Failed GET http://family-main-preview/rest/v1/accounts?fields=ext&pageCount=5&search=JAVA+CLIENT+TEST+ACCOUNT&searchOn=accountName: HTTP error code : 500
Screenshot 2025-05-12 at 1 15 36 PM

@andreatamezm andreatamezm changed the title cleanup_fix GEN-5150/GEN-5151 sdk May 12, 2025
@andreatamezm andreatamezm requested a review from a team May 12, 2025 20:16
Copy link
Contributor

@dlopuch dlopuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of tighter scoping / more context on the TimeOfUseServiceTests please.

On the other one, how did you identify it's an indexing concern? We've been tracking an issue with intermittent errors trying to use profiles shortly after creation... that one's looking like replication lag between read-replicas. Might that be also what's going on here?

@@ -213,7 +213,17 @@ public void deletePrivateTouGroupWorksCorrectly() {
}

private void cleanUpPrivateTou(TimeOfUseGroup grp) {
touService.deletePrivateTimeOfUseGroup(grp.getLseId(), grp.getTouGroupId());
try {
// Check if the TOU group has been deleted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add context to someone reading the code without knowledge of this issue: "Try to delete the private TOU only if it still exists (sometimes it might be already deleted)"

touService.deletePrivateTimeOfUseGroup(grp.getLseId(), grp.getTouGroupId());
}
} catch (GenabilityException e) {
if (!e.getMessage().contains("404")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have two calls, which throws the GenabilityException? Wrap the try/catch around just that one -- you generally want try/catches as tightly scoped as possible.

Also explain why a 404-containing message should be ignored (didn't you already check it exists?)

@@ -132,6 +132,13 @@ public void testPaginatedAccountList() {
}

try {
// Add a delay to allow for indexing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh. How did you figure out it's an indexing concern? (Can you add a comment or note on the PR?)

How confident are you that 2000ms is enough? If it's just a guess / subject to additional tuning, it's helpful for future developers to eg add a comment to the line that errors like "This line fail with xxxException? Consider tuning the sleep above to give more time..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question - I assumed it was indexing or a consistency issue because the test creates accounts and immediately searches for them using a name filter. I added the .sleep(2000) to give mongo time to catch up, which fixed the flakiness locally, but its definitely just a best guess. Its kind of tricky to test this since the failures were intermittent.

// Could be due to indexing delay or read-replica sync delay.
// 2000ms is a best-guess based on some testing; if this test continues to fail, consider increasing this delay.
try {
Thread.sleep(2000);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 seconds is a high value. I think replica lag would be under 100ms typically. I wonder if 500ms or 200ms are better starting values.

try {
touService.deletePrivateTimeOfUseGroup(grp.getLseId(), grp.getTouGroupId());
} catch (GenabilityException e) {
// It's possible the TOU group was deleted after the check but before the delete call.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my concern that we are retrieving/creating TOU grp by generic test name but deleting TOUs by integer ID. This opens us up to concurrency condition with other simultaneous tests. Not sure with github job runner if this a real issue, but why would the tou group be deleted before the clean up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants