Skip to content

Conversation

@Jignesh-dimagi
Copy link
Contributor

Product Description

No change in UI as such

Technical Summary

Part of https://dimagi.atlassian.net/browse/CCCT-825

Feature Flag

Bug fixes

Safety Assurance

Safety story

Tested on prod server

@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2025

📝 Walkthrough

Walkthrough

The changes enhance the error handling within the processSuccess method of the ConnectJobsListsFragment class. A new inner try-catch block is introduced to individually handle JSON parsing errors (namely JSONException and ParseException) for each JSON object parsed from the response. This modification ensures that when a parsing error occurs for one object, it is caught and logged, allowing the iteration over the remaining objects to continue uninterrupted. Additionally, the outer catch block has been updated to no longer catch ParseException and provides an updated log message indicating that errors may now encompass parsing and database operations.

Sequence Diagram(s)

sequenceDiagram
    participant PS as processSuccess
    participant JI as JSON Iterator
    participant JP as JSON Parser
    participant EL as Error Logger
    participant OC as Outer Catch

    PS->>JI: Start iterating over JSON objects
    loop For each JSON object
        JI->>JP: Attempt to parse JSON object
        alt Parsing succeeds
            JP-->>JI: Parsed object returned
        else Parsing fails (JSONException/ParseException)
            JP--xJI: Exception thrown
            JI->>EL: Log individual parsing error
        end
    end
    alt Global error occurs outside iteration
        PS->>OC: Exception caught in outer catch
        OC-->>PS: Log general error (parsing or database)
    end
Loading

Suggested labels

skip-integration-tests

Suggested reviewers

  • avazirna
  • OrangeAndGreen
  • pm-dimagi
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (2)

140-140: Consider removing or updating the TODO comment.

Since you've implemented a solution that keeps the outer try-catch but adds inner try-catch blocks for individual records, the TODO comment about possibly removing the try-catch is no longer relevant and might be confusing for future developers.

-                //TODO: Sounds like we don't want a try-catch here, better to crash. Verify before changing
+                // We maintain the outer try-catch for JSON array parsing while handling individual record parse errors separately

158-159: Consider adding error count tracking.

Since you're now handling individual record parsing errors, it might be useful to track how many records failed to parse and log this information or consider exposing it to the user if the failure rate is significant.

                        //Store retrieved jobs
                        totalJobs = jobs.size();
+                       int failedParseCount = json.length() - jobs.size();
+                       if (failedParseCount > 0) {
+                           Logger.log("ERROR", "Failed to parse " + failedParseCount + " out of " + json.length() + " opportunities");
+                       }
                        newJobs = ConnectDatabaseHelper.storeJobs(getContext(), jobs, true);
                        setJobListData(jobs);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d887c2 and 4a40d19.

📒 Files selected for processing (1)
  • app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (1 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (1)
app/src/org/commcare/connect/ConnectDatabaseHelper.java (1) (1)
  • ConnectDatabaseHelper (56-1024)
🔇 Additional comments (2)
app/src/org/commcare/fragments/connect/ConnectJobsListsFragment.java (2)

149-154: Good enhancement to error isolation when parsing job opportunities.

Adding individual try-catch blocks around each JSON object processing is a solid approach. This ensures that a single corrupt record won't prevent the entire batch from being processed, improving the application's resilience.


162-163: Appropriate update to exception handling and logging message.

The updated catch block and error message better reflects the potential sources of errors - either in parsing the overall JSON array or in database operations. Removing ParseException here makes sense as it's now handled at the individual record level.

Copy link
Contributor

@shubham1g5 shubham1g5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Jignesh-dimagi Our strategy in cases like this to have been crash hard and get to know about the errors as soon as possible rather than have them lingering into the codebase and silently affecting users. This is also because we don't always have a great connect with users and struggle to hear about issues happening out into the field until late.

I do see the tradeoff in why sometimes not crashing the app and handling gracefully seems like a more convenient/better UX but with our user demographics, we would rather crash hard and fix our code sooner than have lingering bugs in our app.

That said we are not rigid and will be curious to hear your thoughts here as well.

@Jignesh-dimagi
Copy link
Contributor Author

@shubham1g5 I do understand that knowing the problem is very important but whenever it effects business, we should not compromise. User have 10 opportunities but due to 1 corrupt opportunity, he/she is not able to use remaining 9 and it effects business.

I also agree with @OrangeAndGreen that we should inform users that one of the opportunity is corrupt and so not showing here. But as discussed with @OrangeAndGreen we will find some better way for such messaging going forward.
As such we already have the event logged here but I guess that's not enough.

I am open to your suggestions.

@shubham1g5
Copy link
Contributor

I do understand that knowing the problem is very important but whenever it effects business, we should not compromise. User have 10 opportunities but due to 1 corrupt opportunity, he/she is not able to use remaining 9 and it effects business.

Agree that It does seems intuitive in this case for user to be able to continue with other opportunities but also strongly thing that we would never have found this issue if we were not hard crashing potentially causing a lot more users to risk not getting new opportunities. Also a lot of times catching an exception may introduce some very hard to debug app states which are preferrable to prevent upfront with crashing the app. Plus I think we can fix these errors quite quickly on web side once we get to know them and add better server side validations to protect against more of such failures. As such I think I still prefer crashing and fixing things early specially since we don't have a good error/crash monitoring process at the moment.

Also inviting opinions from @OrangeAndGreen @ctsims on the same.

@Jignesh-dimagi
Copy link
Contributor Author

@shubham1g5 App was not crashing previously also but was just showing blank connect screen.
I do believe that it should be reported through some alert. For the same, can we somehow set up an email notification for such events (e.g. corrupt_opportunity)? In this way, we can create channel for reporting other use cases also. I guess we can use Firebase key events and Google cloud functions to achieve the same. https://firebase.google.com/docs/functions/analytics-events and https://firebase.google.com/docs/functions. Of course its bit of work and might cost extra for cloud but seems worth. Thoughts?

@ctsims
Copy link
Member

ctsims commented Mar 25, 2025

@shubham1g5 Strong agreement on a few points:

  1. If things are malformed on the web side, it's better to fix them on the web side, and we shouldn't try to make the mobile ignore issues that are clearly a sign of misconfiguration
  2. We should be very concerned with fault tolerant processing resulting in invisible errors, and our design philosophy should be failfast whenever practical.
  3. We likely would not have identified the core issue, or identified it as easily, if the app had been more fault tolerant.

One thing that's kind of funny in this case is that we were actually in the worst of all worlds, in that we both were overly fault tolerant (IE: The app didn't crash or show a visible error that would have allowed the user to know something was wrong) and under-tolerant (failure of one model prevented the use of another model uncoupled with it).

There are two other considerations that come to mind for me in how we think about this going forward.

  1. One major thing to highlight is that Opportunities that are shown to a user are independent models defined by different users between different Opportunities. To me that does make how we manage fault tolerance something to consider, because it's weird for, say, LLO A to be actively working with an FLW, and for LLO B to be able to invite them to an Opportunity with a malformed unicode character in the name and have that brick the app.
  2. I don't think we've quite fully addressed how we are going to be adapting protocols here for future looking fault tolerance. IE: What's the sequencing for changing the payload data on the Server side, and how does an old copy of the phone runtime handle the payloads in between.

I think that we should consider a few different approaches in this case, some of which are probably worth doing simultaneously.

  1. We need a generally good answer to how to fail on this page (and within Connect screens backed ~fully by online data) when the envelope itself is bad. Even if we were fault tolerant at a more granular level, if we can't process the response correctly at a full payload level, the current behavior of displaying an empty screen isn't good enough. We need an experience that can't be confused with Bad Network or with a valid empty response.
  2. We should consider having a design pattern / style guidance for when "partial processing tolerance" is acceptable, if ever, and how. In this case, I think it's probably justifiable since the payloads include user-defined data, contain models that are totally independent from each other, and the end user doesn't necessarily even have a feedback loop upstream (say, if the error came from a new invite from a totally new org). If we accept partial processing I think we should have stated guidance on how to handle it. For example, I think silent partial failures are poisonous, but it might be acceptable to include a row in the list that has an error in it to let the user know that something is wrong. (IE: A dummy row inserted for any models that failed to process, telling the user that there was an error loading this opportunity).
  3. No matter what, though, I think it's probably more important to address how we manage signaling back upstream to us. In a busy, dense future, I don't think we should assume that users reporting errors through the LLO (which is how CommCare does most significant mobile error reporting) is practical or likely, so how we can get this kind of error (where we directly in our code detected that there is malformed payload that should never happen) back to us is probably more important that how it appears to a user. I think if the clearest way to guarantee we get that is a hard crash, that's more important than my considerations above. That's a fairly noisy channel today, though, so I think it's worth looking into whether we can start using Crashlytics non-fatal exception logging or something similar that we can set up specific monitoring around. My guess is that Crashlytics is a better starting point than analytics for these kinds of Soft Asserts, but it would need to be assessed whether there's a good system in place there for non-fatal logging.

@shubham1g5
Copy link
Contributor

shubham1g5 commented Mar 27, 2025

Thanks @ctsims for the detailed thoughts here.

I don't think we've quite fully addressed how we are going to be adapting protocols here for future looking fault tolerance.

That's a good catch. Something we should try to get more attention to as a followup here.

We need a generally good answer to how to fail on this page (and within Connect screens backed ~fully by online data) when the envelope itself is bad.

We have been changing the current behaviour to hard crash by removing a lot of JsonException catches during the merge work. I think we will continue to assume failing hard and fast is still a generic rule in most scenarios for the merge work.

A dummy row inserted for any models that failed to process, telling the user that there was an error loading this opportunity

+1 to this approach specifically in context of this issue.

how we manage signaling back upstream to us.

Reg. error reporting, We do have a good mechanism for errors to reported on Firebase be it hard crashes or non fatals. The problem there is the amount of non-fatals can be quite high as we get almost most catched exceptions reported over there and unless a particular issue is very frequent, it's unlikely for it to be stand out in that reporting. With an issue rare enough like this, I am quite sure we would not notice it in both hard crash and non fatals reporting unless we try to look for it intentionally in response to a raised bug. But I do think crashing still gives this issue higher probability to get reported sooner from our users given it's a more disrupting experience for the user.

@Jignesh-dimagi
Copy link
Contributor Author

App will show the corrupt opportunities along with other

Failed Opportunity Display

Comment on lines 50 to 59
if(viewType==NON_CORRUPT_JOB_VIEW){
ItemLoginConnectHomeAppsBinding binding = ItemLoginConnectHomeAppsBinding.inflate(
LayoutInflater.from(parent.getContext()), parent, false);
return new NonCorruptJobViewHolder(binding);
}else{
ItemLoginConnectHomeCorruptAppsBinding binding = ItemLoginConnectHomeCorruptAppsBinding.inflate(
LayoutInflater.from(parent.getContext()), parent, false);
return new CorruptJobViewHolder(binding);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: code spacing seems off (format all new code with style file guide)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 I am already using this style

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think you will need to manually format the new code in Android Studio once you apply that style.

try {
obj = (JSONObject)json.get(i);
jobs.add(ConnectJobRecord.fromJson(obj));
}catch (JSONException | ParseException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that ParseException is getting thrown uncessarily from underlying code and think would be great to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 Its required ConnectJobRecord.fromJson is throwing this ParseException

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove it from there as well. The underlying method is throwing this in method signature without it actually being thrown ever from the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 Underlying method is throwing this error when it tries to parse payment related stuff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what I meant - dba5628

Logger.exception("Parsing return from Opportunities request", e);
} catch (IOException | JSONException e) {
Logger.exception("Parsing / database error return from Opportunities request", e);
handleCorruptJobs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think we need to still crash altogether from here as coming here means that the complete response/Json is incorrect and user won't be able to see any opportunitities in this case.

Copy link
Contributor Author

@Jignesh-dimagi Jignesh-dimagi Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 done, app will crash after showing message to user

Copy link
Contributor

@shubham1g5 shubham1g5 Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to crash instantly without doing any special handling for exception here. IMO there is not any solid benefit of user seeing a page for sometime that will crash in 3 seconds to do this special handling.

Also we would always want to crash in these cases even in future as user can't meanigfully proceed in the app workflow once they are on this page with no opportunities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 We are not handling this scenario gracefully as of now so I believe lets at least show user some message before crashing so that they are aware what is happening with their account. That's my view point.

Copy link
Contributor

@shubham1g5 shubham1g5 Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting back to the discussion we had I think we still prefer crashing instead of any graceful handling in scenarios where user can't do anything else even if we handle them gracefully. Crashing already gives a strong and self sufficent signal to the user that something is wrong with the app and I am wary of introducing timed exceptions at various places to the code if they don't have a value addition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 App is crashing and not handling gracefully. handleCorruptJobs is crashing it. Its just showing message to user before crashing. I can make it immediate crash if you still insist.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I do think we should crash immediately as there is not much value added by user messaging just before crash, in both cases we are conveying the same thing that we are unexpectedly closing.

Comment on lines +167 to +169
} catch (IOException | JSONException e) {
Logger.exception("Parsing / database error return from Opportunities request", e);
throw new RuntimeException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we only want to crash for JSONException and not IOException here. IOException doesn't necessarily mean a code error and can happen due to reasons outside the control of our app.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 You are right for other cases but here IOException is raised if any error while reading the network response so I think we should include it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that signifies an error in network stream which should not be related to our application logic.


public void refreshData() {
corruptJobs.clear();
ApiConnect.getConnectOpportunities(getContext(), new IApiCallback() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call retrieveOpportunities directly here, seems like a lot of code duplication here to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 Sorry I didn't get you. Which retrieveOpportunities are you referring?

Copy link
Contributor

@shubham1g5 shubham1g5 Apr 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectUnlockFragment.retrieveOpportunities

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They both are in different fragment so not sure if we can use that. Already code is separate originally.

ConnectJobRecord job = ConnectJobRecord.fromJson(obj);
jobs.add(job);
}catch (JSONException e) {
Logger.exception("Parsing return from Opportunities request", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean we would swallow exception without showing any errors to the user ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shubham1g5 Here, there is no UI to display and code is just retrieving opportunities only to store locally. Also, local store should only have good opportunities and not corrupt ones.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on DB containing only good opportunities. I am trying to think if it's possible for user to not see an error on jobs list page when there is a corrupt opportunity in the response here ? For eg. it would be confusing if the corrupt opportunity shows only some times on UI and not all times due to us reusing the stored jobs list from DB that doesn't have corrupt opportunities.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both cases, DB has good jobs only. ConnectUnlockFragment is not showing any UI for opportunities.

Copy link
Contributor

@shubham1g5 shubham1g5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offlined and approving based on flowwing discussion -

  1. We can't show UI indicators for background tasks, so it's ok for user to just see the error when user navigates to the job list page itself.

  2. we are crashing even IOException here as it's a result of bad response characters instead of network errors.

@Jignesh-dimagi Jignesh-dimagi merged commit 52c7e4a into connect_qa Apr 7, 2025
1 of 2 checks passed
@OrangeAndGreen OrangeAndGreen deleted the jignesh/fix/ignore_corrupt_opportunities branch April 7, 2025 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants