Ignoring all corrupt opportunities to show #2995

Jignesh-dimagi · 2025-03-20T14:49:57Z

Product Description

No change in UI as such

Technical Summary

Part of https://dimagi.atlassian.net/browse/CCCT-825

Feature Flag

Bug fixes

Safety Assurance

Safety story

Tested on prod server

coderabbitai · 2025-03-20T14:51:34Z

📝 Walkthrough

Walkthrough

The changes enhance the error handling within the processSuccess method of the ConnectJobsListsFragment class. A new inner try-catch block is introduced to individually handle JSON parsing errors (namely JSONException and ParseException) for each JSON object parsed from the response. This modification ensures that when a parsing error occurs for one object, it is caught and logged, allowing the iteration over the remaining objects to continue uninterrupted. Additionally, the outer catch block has been updated to no longer catch ParseException and provides an updated log message indicating that errors may now encompass parsing and database operations.

Sequence Diagram(s)

sequenceDiagram
    participant PS as processSuccess
    participant JI as JSON Iterator
    participant JP as JSON Parser
    participant EL as Error Logger
    participant OC as Outer Catch

    PS->>JI: Start iterating over JSON objects
    loop For each JSON object
        JI->>JP: Attempt to parse JSON object
        alt Parsing succeeds
            JP-->>JI: Parsed object returned
        else Parsing fails (JSONException/ParseException)
            JP--xJI: Exception thrown
            JI->>EL: Log individual parsing error
        end
    end
    alt Global error occurs outside iteration
        PS->>OC: Exception caught in outer catch
        OC-->>PS: Log general error (parsing or database)
    end

Suggested labels

skip-integration-tests

Suggested reviewers

avazirna
OrangeAndGreen
pm-dimagi

✨ Finishing Touches

📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (2)
140-140: Consider removing or updating the TODO comment.

Since you've implemented a solution that keeps the outer try-catch but adds inner try-catch blocks for individual records, the TODO comment about possibly removing the try-catch is no longer relevant and might be confusing for future developers.
-                //TODO: Sounds like we don't want a try-catch here, better to crash. Verify before changing
+                // We maintain the outer try-catch for JSON array parsing while handling individual record parse errors separately
158-159: Consider adding error count tracking.

Since you're now handling individual record parsing errors, it might be useful to track how many records failed to parse and log this information or consider exposing it to the user if the failure rate is significant.
                        //Store retrieved jobs
                        totalJobs = jobs.size();
+                       int failedParseCount = json.length() - jobs.size();
+                       if (failedParseCount > 0) {
+                           Logger.log("ERROR", "Failed to parse " + failedParseCount + " out of " + json.length() + " opportunities");
+                       }
                        newJobs = ConnectDatabaseHelper.storeJobs(getContext(), jobs, true);
                        setJobListData(jobs);

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d887c2 and 4a40d19.

📒 Files selected for processing (1)

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (1 hunks)

🧰 Additional context used

🧬 Code Definitions (1)

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (1)

app/src/org/commcare/connect/ConnectDatabaseHelper.java (1) (1)

ConnectDatabaseHelper (56-1024)

🔇 Additional comments (2)

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (2)

149-154: Good enhancement to error isolation when parsing job opportunities.

Adding individual try-catch blocks around each JSON object processing is a solid approach. This ensures that a single corrupt record won't prevent the entire batch from being processed, improving the application's resilience.

162-163: Appropriate update to exception handling and logging message.

The updated catch block and error message better reflects the potential sources of errors - either in parsing the overall JSON array or in database operations. Removing ParseException here makes sense as it's now handled at the individual record level.

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java

shubham1g5

@Jignesh-dimagi Our strategy in cases like this to have been crash hard and get to know about the errors as soon as possible rather than have them lingering into the codebase and silently affecting users. This is also because we don't always have a great connect with users and struggle to hear about issues happening out into the field until late.

I do see the tradeoff in why sometimes not crashing the app and handling gracefully seems like a more convenient/better UX but with our user demographics, we would rather crash hard and fix our code sooner than have lingering bugs in our app.

That said we are not rigid and will be curious to hear your thoughts here as well.

Jignesh-dimagi · 2025-03-21T06:39:37Z

@shubham1g5 I do understand that knowing the problem is very important but whenever it effects business, we should not compromise. User have 10 opportunities but due to 1 corrupt opportunity, he/she is not able to use remaining 9 and it effects business.

I also agree with @OrangeAndGreen that we should inform users that one of the opportunity is corrupt and so not showing here. But as discussed with @OrangeAndGreen we will find some better way for such messaging going forward.
As such we already have the event logged here but I guess that's not enough.

I am open to your suggestions.

shubham1g5 · 2025-03-21T07:03:45Z

I do understand that knowing the problem is very important but whenever it effects business, we should not compromise. User have 10 opportunities but due to 1 corrupt opportunity, he/she is not able to use remaining 9 and it effects business.

Agree that It does seems intuitive in this case for user to be able to continue with other opportunities but also strongly thing that we would never have found this issue if we were not hard crashing potentially causing a lot more users to risk not getting new opportunities. Also a lot of times catching an exception may introduce some very hard to debug app states which are preferrable to prevent upfront with crashing the app. Plus I think we can fix these errors quite quickly on web side once we get to know them and add better server side validations to protect against more of such failures. As such I think I still prefer crashing and fixing things early specially since we don't have a good error/crash monitoring process at the moment.

Also inviting opinions from @OrangeAndGreen @ctsims on the same.

Jignesh-dimagi · 2025-03-21T08:31:47Z

@shubham1g5 App was not crashing previously also but was just showing blank connect screen.
I do believe that it should be reported through some alert. For the same, can we somehow set up an email notification for such events (e.g. corrupt_opportunity)? In this way, we can create channel for reporting other use cases also. I guess we can use Firebase key events and Google cloud functions to achieve the same. https://firebase.google.com/docs/functions/analytics-events and https://firebase.google.com/docs/functions. Of course its bit of work and might cost extra for cloud but seems worth. Thoughts?

ctsims · 2025-03-25T21:54:48Z

@shubham1g5 Strong agreement on a few points:

If things are malformed on the web side, it's better to fix them on the web side, and we shouldn't try to make the mobile ignore issues that are clearly a sign of misconfiguration
We should be very concerned with fault tolerant processing resulting in invisible errors, and our design philosophy should be failfast whenever practical.
We likely would not have identified the core issue, or identified it as easily, if the app had been more fault tolerant.

One thing that's kind of funny in this case is that we were actually in the worst of all worlds, in that we both were overly fault tolerant (IE: The app didn't crash or show a visible error that would have allowed the user to know something was wrong) and under-tolerant (failure of one model prevented the use of another model uncoupled with it).

There are two other considerations that come to mind for me in how we think about this going forward.

One major thing to highlight is that Opportunities that are shown to a user are independent models defined by different users between different Opportunities. To me that does make how we manage fault tolerance something to consider, because it's weird for, say, LLO A to be actively working with an FLW, and for LLO B to be able to invite them to an Opportunity with a malformed unicode character in the name and have that brick the app.
I don't think we've quite fully addressed how we are going to be adapting protocols here for future looking fault tolerance. IE: What's the sequencing for changing the payload data on the Server side, and how does an old copy of the phone runtime handle the payloads in between.

I think that we should consider a few different approaches in this case, some of which are probably worth doing simultaneously.

We need a generally good answer to how to fail on this page (and within Connect screens backed ~fully by online data) when the envelope itself is bad. Even if we were fault tolerant at a more granular level, if we can't process the response correctly at a full payload level, the current behavior of displaying an empty screen isn't good enough. We need an experience that can't be confused with Bad Network or with a valid empty response.
We should consider having a design pattern / style guidance for when "partial processing tolerance" is acceptable, if ever, and how. In this case, I think it's probably justifiable since the payloads include user-defined data, contain models that are totally independent from each other, and the end user doesn't necessarily even have a feedback loop upstream (say, if the error came from a new invite from a totally new org). If we accept partial processing I think we should have stated guidance on how to handle it. For example, I think silent partial failures are poisonous, but it might be acceptable to include a row in the list that has an error in it to let the user know that something is wrong. (IE: A dummy row inserted for any models that failed to process, telling the user that there was an error loading this opportunity).
No matter what, though, I think it's probably more important to address how we manage signaling back upstream to us. In a busy, dense future, I don't think we should assume that users reporting errors through the LLO (which is how CommCare does most significant mobile error reporting) is practical or likely, so how we can get this kind of error (where we directly in our code detected that there is malformed payload that should never happen) back to us is probably more important that how it appears to a user. I think if the clearest way to guarantee we get that is a hard crash, that's more important than my considerations above. That's a fairly noisy channel today, though, so I think it's worth looking into whether we can start using Crashlytics non-fatal exception logging or something similar that we can set up specific monitoring around. My guess is that Crashlytics is a better starting point than analytics for these kinds of Soft Asserts, but it would need to be assessed whether there's a good system in place there for non-fatal logging.

shubham1g5 · 2025-03-27T09:40:11Z

Thanks @ctsims for the detailed thoughts here.

I don't think we've quite fully addressed how we are going to be adapting protocols here for future looking fault tolerance.

That's a good catch. Something we should try to get more attention to as a followup here.

We need a generally good answer to how to fail on this page (and within Connect screens backed ~fully by online data) when the envelope itself is bad.

We have been changing the current behaviour to hard crash by removing a lot of JsonException catches during the merge work. I think we will continue to assume failing hard and fast is still a generic rule in most scenarios for the merge work.

A dummy row inserted for any models that failed to process, telling the user that there was an error loading this opportunity

+1 to this approach specifically in context of this issue.

how we manage signaling back upstream to us.

Reg. error reporting, We do have a good mechanism for errors to reported on Firebase be it hard crashes or non fatals. The problem there is the amount of non-fatals can be quite high as we get almost most catched exceptions reported over there and unless a particular issue is very frequent, it's unlikely for it to be stand out in that reporting. With an issue rare enough like this, I am quite sure we would not notice it in both hard crash and non fatals reporting unless we try to look for it intentionally in response to a raised bug. But I do think crashing still gives this issue higher probability to get reported sooner from our users given it's a more disrupting experience for the user.

Jignesh-dimagi · 2025-04-01T08:08:34Z

App will show the corrupt opportunities along with other

shubham1g5 · 2025-04-01T12:26:02Z

app/src/org/commcare/adapters/JobListConnectHomeAppsAdapter.java

+        if(viewType==NON_CORRUPT_JOB_VIEW){
+            ItemLoginConnectHomeAppsBinding binding = ItemLoginConnectHomeAppsBinding.inflate(
+                    LayoutInflater.from(parent.getContext()), parent, false);
+            return new NonCorruptJobViewHolder(binding);
+        }else{
+            ItemLoginConnectHomeCorruptAppsBinding binding = ItemLoginConnectHomeCorruptAppsBinding.inflate(
+                    LayoutInflater.from(parent.getContext()), parent, false);
+            return new CorruptJobViewHolder(binding);
+        }
+


nit: code spacing seems off (format all new code with style file guide)

@shubham1g5 I am already using this style

Think you will need to manually format the new code in Android Studio once you apply that style.

app/src/org/commcare/android/database/connect/models/ConnectJobRecord.java

shubham1g5 · 2025-04-01T12:33:55Z

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java

+                            try {
+                                obj = (JSONObject)json.get(i);
+                                jobs.add(ConnectJobRecord.fromJson(obj));
+                            }catch (JSONException | ParseException e) {


It seems to me that ParseException is getting thrown uncessarily from underlying code and think would be great to be removed.

@shubham1g5 Its required ConnectJobRecord.fromJson is throwing this ParseException

I think we can remove it from there as well. The underlying method is throwing this in method signature without it actually being thrown ever from the code.

@shubham1g5 Underlying method is throwing this error when it tries to parse payment related stuff.

this is what I meant - dba5628

shubham1g5 · 2025-04-01T12:36:56Z

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java

-                    Logger.exception("Parsing return from Opportunities request", e);
+                } catch (IOException | JSONException e) {
+                    Logger.exception("Parsing / database error return from Opportunities request", e);
+                    handleCorruptJobs();


think we need to still crash altogether from here as coming here means that the complete response/Json is incorrect and user won't be able to see any opportunitities in this case.

@shubham1g5 done, app will crash after showing message to user

I think we want to crash instantly without doing any special handling for exception here. IMO there is not any solid benefit of user seeing a page for sometime that will crash in 3 seconds to do this special handling.

Also we would always want to crash in these cases even in future as user can't meanigfully proceed in the app workflow once they are on this page with no opportunities.

@shubham1g5 We are not handling this scenario gracefully as of now so I believe lets at least show user some message before crashing so that they are aware what is happening with their account. That's my view point.

Getting back to the discussion we had I think we still prefer crashing instead of any graceful handling in scenarios where user can't do anything else even if we handle them gracefully. Crashing already gives a strong and self sufficent signal to the user that something is wrong with the app and I am wary of introducing timed exceptions at various places to the code if they don't have a value addition.

@shubham1g5 App is crashing and not handling gracefully. handleCorruptJobs is crashing it. Its just showing message to user before crashing. I can make it immediate crash if you still insist.

Thanks, I do think we should crash immediately as there is not much value added by user messaging just before crash, in both cases we are conveying the same thing that we are unexpectedly closing.

…tunities' into jignesh/fix/ignore_corrupt_opportunities # Conflicts: # app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java

shubham1g5 · 2025-04-03T05:19:51Z

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java

+                } catch (IOException | JSONException e) {
+                    Logger.exception("Parsing / database error return from Opportunities request", e);
+                    throw new RuntimeException(e);


I think we only want to crash for JSONException and not IOException here. IOException doesn't necessarily mean a code error and can happen due to reasons outside the control of our app.

@shubham1g5 You are right for other cases but here IOException is raised if any error while reading the network response so I think we should include it.

I think that signifies an error in network stream which should not be related to our application logic.

shubham1g5 · 2025-04-04T10:29:26Z

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java


    public void refreshData() {
+        corruptJobs.clear();
        ApiConnect.getConnectOpportunities(getContext(), new IApiCallback() {


Can we call retrieveOpportunities directly here, seems like a lot of code duplication here to me.

@shubham1g5 Sorry I didn't get you. Which retrieveOpportunities are you referring?

ConnectUnlockFragment.retrieveOpportunities

They both are in different fragment so not sure if we can use that. Already code is separate originally.

shubham1g5 · 2025-04-04T10:30:03Z

app/src/org/commcare/fragments/connect/ConnectUnlockFragment.java

+                                ConnectJobRecord job = ConnectJobRecord.fromJson(obj);
+                                jobs.add(job);
+                            }catch (JSONException  e) {
+                                Logger.exception("Parsing return from Opportunities request", e);


does this mean we would swallow exception without showing any errors to the user ?

@shubham1g5 Here, there is no UI to display and code is just retrieving opportunities only to store locally. Also, local store should only have good opportunities and not corrupt ones.

Agree on DB containing only good opportunities. I am trying to think if it's possible for user to not see an error on jobs list page when there is a corrupt opportunity in the response here ? For eg. it would be confusing if the corrupt opportunity shows only some times on UI and not all times due to us reusing the stored jobs list from DB that doesn't have corrupt opportunities.

In both cases, DB has good jobs only. ConnectUnlockFragment is not showing any UI for opportunities.

shubham1g5

Offlined and approving based on flowwing discussion -

We can't show UI indicators for background tasks, so it's ok for user to just see the error when user navigates to the job list page itself.
we are crashing even IOException here as it's a result of bad response characters instead of network errors.

Ignoring all corrupt opportunities to show

4a40d19

Jignesh-dimagi requested a review from OrangeAndGreen March 20, 2025 14:49

Jignesh-dimagi self-assigned this Mar 20, 2025

coderabbitai bot reviewed Mar 20, 2025

View reviewed changes

OrangeAndGreen requested changes Mar 20, 2025

View reviewed changes

app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java Show resolved Hide resolved

shubham1g5 reviewed Mar 20, 2025

View reviewed changes

added corrupt opportunities in the list

ea89121

shubham1g5 requested changes Apr 1, 2025

View reviewed changes

Jignesh-dimagi and others added 6 commits April 2, 2025 13:13

App crash if failed to load full opportunities envelope

e341068

remove un-necessary parseException as it's not getting thrown anywhere

dba5628

Crashing application when opportunity envelope is corrupt

d3de1e2

Merge remote-tracking branch 'origin/jignesh/fix/ignore_corrupt_oppor…

282cdb9

…tunities' into jignesh/fix/ignore_corrupt_opportunities # Conflicts: # app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java

Merge branch 'connect_qa' into jignesh/fix/ignore_corrupt_opportunities

e764235

Corrected line spacing

7d90f3d

shubham1g5 reviewed Apr 4, 2025

View reviewed changes

shubham1g5 approved these changes Apr 7, 2025

View reviewed changes

OrangeAndGreen approved these changes Apr 7, 2025

View reviewed changes

Jignesh-dimagi merged commit 52c7e4a into connect_qa Apr 7, 2025
1 of 2 checks passed

OrangeAndGreen deleted the jignesh/fix/ignore_corrupt_opportunities branch April 7, 2025 15:27

coderabbitai bot mentioned this pull request Jun 5, 2025

Improved Connect exception handling #3158

Merged

Uh oh!

Ignoring all corrupt opportunities to show #2995

Ignoring all corrupt opportunities to show #2995

Uh oh!

Conversation

Jignesh-dimagi commented Mar 20, 2025

Product Description

Technical Summary

Feature Flag

Safety Assurance

Safety story

Uh oh!

coderabbitai bot commented Mar 20, 2025

Walkthrough

Sequence Diagram(s)

Suggested labels

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

shubham1g5 left a comment

Choose a reason for hiding this comment

Uh oh!

Jignesh-dimagi commented Mar 21, 2025

Uh oh!

shubham1g5 commented Mar 21, 2025

Uh oh!

Jignesh-dimagi commented Mar 21, 2025

Uh oh!

ctsims commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shubham1g5 commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jignesh-dimagi commented Apr 1, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jignesh-dimagi Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shubham1g5 Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shubham1g5 Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

CodeRabbit Configuration File (`.coderabbit.yaml`)

ctsims commented Mar 25, 2025 •

edited

Loading

shubham1g5 commented Mar 27, 2025 •

edited

Loading

Jignesh-dimagi Apr 2, 2025 •

edited

Loading

shubham1g5 Apr 2, 2025 •

edited

Loading

shubham1g5 Apr 2, 2025 •

edited

Loading

shubham1g5 Apr 4, 2025 •

edited

Loading