Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for the IllegalArgumentException required value was null crash #13191

Merged

Conversation

kidinov
Copy link
Contributor

@kidinov kidinov commented Dec 23, 2024

Closes: #13178

Description

There has been a fundamentally flawed approach regarding the manner in which an order is fetched and accessed. The Order is retrieved asynchronously, while they can be accessed in anytime, without waiting if it's retireved or. This discrepancy may, in certain instances, lead to accessing fields before they have been initialized. Furthermore, there are instances where view model methods directly return data to the fragment, complicating the resolution of these issues.

The PR makes accessing the order via the suspend function, so if the order has not yet been retrieved, then the caller side will be waiting for it.
As mentioned above the whole structure of the fragment and VM brings challenges and without deep refactoring the solution also brings downsides, like using lifecycle scope to access things in VM

I am frankly not sure if the solution is better than the problem as this probably also may create another unforeseen issue. @AnirudhBhat @samiuelson wdyt?

Steps to reproduce

I don't know how to reproduce the crash, it's race condition

Testing information

Try to validate that order details functionality is not broken as much as possible

The tests that have been performed

Above

Images/gif

  • I have considered if this change warrants release notes and have added them to RELEASE-NOTES.txt if necessary. Use the "[Internal]" label for non-user-facing changes.

Reviewer (or Author, in the case of optional code reviews):

Please make sure these conditions are met before approving the PR, or request changes if the PR needs improvement:

  • The PR is small and has a clear, single focus, or a valid explanation is provided in the description. If needed, please request to split it into smaller PRs.
  • Ensure Adequate Unit Test Coverage: The changes are reasonably covered by unit tests or an explanation is provided in the PR description.
  • Manual Testing: The author listed all the tests they ran, including smoke tests when needed (e.g., for refactorings). The reviewer confirmed that the PR works as expected on big (tablet) and small (phone) in case of UI changes, and no regressions are added.

@kidinov kidinov requested a review from AnirudhBhat December 23, 2024 13:43
@kidinov kidinov linked an issue Dec 23, 2024 that may be closed by this pull request
@kidinov kidinov requested a review from samiuelson December 23, 2024 13:43
@kidinov kidinov added the type: crash The worst kind of bug. label Dec 23, 2024
@kidinov kidinov added this to the 21.4 milestone Dec 23, 2024
@kidinov kidinov changed the title Fix for the illegalargumentexception required value was null crash Fix for the IllegalArgumentException required value was null crash Dec 23, 2024
@wpmobilebot
Copy link
Collaborator

wpmobilebot commented Dec 23, 2024

📲 You can test the changes from this Pull Request in WooCommerce-Wear Android by scanning the QR code below to install the corresponding build.
App Name WooCommerce-Wear Android
Platform⌚️ Wear OS
FlavorJalapeno
Build TypeDebug
Commitb97b210
Direct Downloadwoocommerce-wear-prototype-build-pr13191-b97b210.apk

@wpmobilebot
Copy link
Collaborator

wpmobilebot commented Dec 23, 2024

📲 You can test the changes from this Pull Request in WooCommerce Android by scanning the QR code below to install the corresponding build.

App Name WooCommerce Android
Platform📱 Mobile
FlavorJalapeno
Build TypeDebug
Commitb97b210
Direct Downloadwoocommerce-prototype-build-pr13191-b97b210.apk

@codecov-commenter
Copy link

codecov-commenter commented Dec 23, 2024

Codecov Report

Attention: Patch coverage is 59.39850% with 54 lines in your changes missing coverage. Please review.

Project coverage is 40.64%. Comparing base (9a2b63d) to head (b97b210).
Report is 54 commits behind head on trunk.

Files with missing lines Patch % Lines
.../android/ui/orders/details/OrderDetailViewModel.kt 59.39% 44 Missing and 10 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##              trunk   #13191      +/-   ##
============================================
+ Coverage     40.62%   40.64%   +0.01%     
+ Complexity     6372     6368       -4     
============================================
  Files          1345     1345              
  Lines         77271    77322      +51     
  Branches      10608    10612       +4     
============================================
+ Hits          31395    31429      +34     
- Misses        43115    43130      +15     
- Partials       2761     2763       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@AnirudhBhat AnirudhBhat self-assigned this Dec 25, 2024
Copy link
Contributor

@AnirudhBhat AnirudhBhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement @kidinov. I tested the order detail screen and couldn't reproduce any crash either.

Here are my observations regarding the new approach:

  1. If _order starts with null, consumers relying on awaitOrder will suspend until a valid value is provided. While this behaviour is intentional, we should ensure that all paths calling awaitOrder account for this suspension and avoid deadlocks or unexpected delays.

What do you think about doing something like this? Is it worth it?

suspend fun awaitOrderWithTimeout(timeoutMillis: Long): Order {
    return withTimeout(timeoutMillis) { _order.filterNotNull().first() }
}
  1. The viewState update logic in updateOrder is decoupled from _order. If _order is updated outside updateOrder (e.g., directly emitting a new value), viewState might become inconsistent with the _order state.

We have already taken care of this by making _order private and just exposing updateOrder method to outside class. However, this still doesn't prevent from someone using _order directly inside the view model itself. Probably, commenting on the updateOrder method to just use that method to update the order and nothing else would help create awareness?

  1. Use a synchronized block or atomic operation to ensure _order and viewState are updated atomically?
@Synchronized
fun updateOrder(newOrder: Order) {
...
}
  1. If multiple coroutines simultaneously call awaitOrder before an Order is available, they will all suspend. Once an Order is emitted, they will resume at the same time. Right?

We could limit simultaneous collectors by using a mechanism like StateIn to share the state between multiple coroutines and avoid redundant suspensions:

private val _orderFlow = MutableStateFlow<Order?>(null)

val orderFlow: StateFlow<Order?> = _orderFlow.filterNotNull()
    .stateIn(
        scope = viewModelScope,
        started = SharingStarted.WhileSubscribed(5000L),
        initialValue = null
    )

fun updateOrder(newOrder: Order) {
    _orderFlow.value = newOrder
}

All these are just my observations and I'm mostly interested in hearing your thoughts more than any changes to the codebase.

@@ -939,7 +939,7 @@ class OrderDetailViewModelTest : BaseUnitTest() {
if (it is ShowSnackbar) snackbar = it
}

viewModel.order = order
viewModel.updateOrder(order)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ What do you think about writing test that ensures both _order and viewState are updated when we call updateOrder method?

Something like below


@Test
fun `updateOrder updates both _order and viewState correctly`() {
    val newOrder = Order(id = 1, ...)
    viewModel.updateOrder(newOrder)

    assertEquals(newOrder, viewModel.order.value)
    assertEquals(newOrder, viewModel.viewState.orderInfo?.order)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly on this case but I added a test on the update logic

e0da78e

@kidinov
Copy link
Contributor Author

kidinov commented Dec 25, 2024

Hey @AnirudhBhat. Thanks for the review valid points

If _order starts with null, consumers relying on awaitOrder will suspend until a valid value is provided. While this behaviour is intentional, we should ensure that all paths calling awaitOrder account for this suspension and avoid deadlocks or unexpected delays.

Could you please explain a bit on this idea? My understanding is that after the timeout is gone, we will just crash with TimeoutCancellationException. Is this to make sure that this case won't go unnoticed?

We have already taken care of this by making _order private and just exposing updateOrder method to outside class. However, this still doesn't prevent from someone using _order directly inside the view model itself. Probably, commenting on the updateOrder method to just use that method to update the order and nothing else would help create awareness?

Good point! I thought about the same problem. Changed the approach to make it impossible to update the order without updating the state

e0da78e

Use a synchronized block or atomic operation to ensure _order and viewState are updated atomically?

That method is gone. Now, it's not transactional (is this what you mean by atomically?), but I don't think that this is a problem, or its? Wdyt?

If multiple coroutines simultaneously call awaitOrder before an Order is available, they will all suspend. Once an Order is emitted, they will resume at the same time. Right?
Yep, it should

We could limit simultaneous collectors by using a mechanism like StateIn to share the state between multiple coroutines and avoid redundant suspensions:

Sorry, I’m not following this one. Could you please explain it a bit more? How can we avoid redundant suspensions? If no order has been fetched yet, then everyone who accesses that order field must wait (suspend) until it's available. How does the flow helps with redundancy?

@AnirudhBhat
Copy link
Contributor

Could you please explain a bit on this idea? My understanding is that after the timeout is gone, we will just crash with TimeoutCancellationException. Is this to make sure that this case won't go unnoticed?

Yes, you’re absolutely correct. The idea behind using withTimeout is to avoid indefinite suspension in cases where no Order is ever emitted. A TimeoutCancellationException ensures that such scenarios are detected and handled explicitly rather than leading to silent deadlocks. For example:

If a bug or unexpected behaviour prevents an Order from being emitted, the timeout provides a fallback mechanism.
Consumers of awaitOrderWithTimeout can catch the exception and implement fallback logic, such as showing an error message or retrying the operation.

Good point! I thought about the same problem. Changed the approach to make it impossible to update the order without updating the state

Awesome. Thanks!

That method is gone. Now, it's not transactional (is this what you mean by atomically?), but I don't think that this is a problem, or its? Wdyt?

Yes. By “atomic,” I meant ensuring that _order and viewState are updated together as a single, consistent operation.

Sorry, I’m not following this one. Could you please explain it a bit more? How can we avoid redundant suspensions? If no order has been fetched yet, then everyone who accesses that order field must wait (suspend) until it's available. How does the flow helps with redundancy?

The idea behind using stateIn is not to eliminate suspension entirely but to reduce redundant upstream computations and ensure that all collectors share the same state. However, you’re right that if no value has been emitted yet, all collectors must suspend until a value is available. I just realised that we are not doing any expensive calculations upstream. In that case, We are good here.

@dangermattic
Copy link
Collaborator

1 Warning
⚠️ This PR is larger than 300 lines of changes. Please consider splitting it into smaller PRs for easier and faster reviews.

Generated by 🚫 Danger

@kidinov
Copy link
Contributor Author

kidinov commented Dec 26, 2024

@AnirudhBhat

Yes, you’re absolutely correct. The idea behind using withTimeout is to avoid indefinite suspension in cases where no Order is ever emitted. A TimeoutCancellationException ensures that such scenarios are detected and handled explicitly rather than leading to silent deadlocks. For example:

If a bug or unexpected behaviour prevents an Order from being emitted, the timeout provides a fallback mechanism.
Consumers of awaitOrderWithTimeout can catch the exception and implement fallback logic, such as showing an error message or retrying the operation.

Yeah, good call. I was thinking about what timeout it should be, and end up with three seconds. As per the current code, it looks like we even fetched the order from the server if it's locally not stored, but before the change, that would always be a crash, as clearly the order field would be accessed by then. Now, it's not the case, but I guess even 3 seconds of showing nothing on the screen maybe too much and we should crash. If this happens we will fix this behavior. wdyt? Maybe set the timeout even lower?

@AnirudhBhat
Copy link
Contributor

Yeah, good call. I was thinking about what timeout it should be, and end up with three seconds. As per the current code, it looks like we even fetched the order from the server if it's locally not stored, but before the change, that would always be a crash, as clearly the order field would be accessed by then. Now, it's not the case, but I guess even 3 seconds of showing nothing on the screen maybe too much and we should crash. If this happens we will fix this behavior. wdyt? Maybe set the timeout even lower?

A 3-second timeout seems reasonable in most cases. If we set a short timeout (e.g., 1-2 seconds) and the user is on a slow or unstable connection, it might prematurely lead to a crash or fallback behaviour. In such cases:

  1. A timeout that's too short could penalise users unnecessarily, especially when the fetch might succeed given slightly more time.
  2. Crashing after a short timeout may obscure real-world performance issues rather than address them effectively.

I guess 3 second timeout should cover best and worst (slow internet) cases? WDYT?

@@ -993,6 +992,13 @@ class OrderDetailViewModel @Inject constructor(
}
}

suspend fun awaitOrder(): Order =
withTimeout(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Do you think its worth writing test to ensure the logic crash after 3 seconds of suspension with not result?

@Test
    fun `awaitOrder should throw TimeoutCancellationException after 3 seconds`() = testBlocking {
        val exception = assertThrows(TimeoutCancellationException::class.java) {
            awaitOrder()
        }
}

I realise that there's no point in testing withTimeout function as the Kotlin team has already done it. My thought process is that the test explicitly documents that the function is expected to crash with a TimeoutCancellationException if no value is emitted within the specified time. It helps clarify the intended behaviour of the function to future developers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test here 0cca9c0

@kidinov
Copy link
Contributor Author

kidinov commented Dec 26, 2024

@AnirudhBhat

A 3-second timeout seems reasonable in most cases. If we set a short timeout (e.g., 1-2 seconds) and the user is on a slow or unstable connection, it might prematurely lead to a crash or fallback behaviour. In such cases:

The thing is that remote fetch now causes the crash if I understand correctly, as if the order is not in the VM property by the time everyone accesses it - there is a crash. So, it looks like there is an assumption that the order must be present locally in the DB. BUT At the same time, there is this code

    fun start() {
        if (navArgs.orderId != -1L) {
            launch {
                orderDetailRepository.getOrderById(navArgs.orderId)?.let {
                    order = it
                    displayOrderDetails()
                    fetchOrder(showSkeleton = false)
                } ?: "**fetchOrder**(showSkeleton = true)"
            }
        } else {
            viewState = viewState.copy(isOrderDetailSkeletonShown = true)
        }
    }

Which, iic doesn't make sense, and that should just crash anyway. Maybe this is only for the cases when the view was recreated, and somehow the order is gone from the database but still in the memory in the VM - no idea.

The timeout may not be measured based on the network call duration. If it would, I would put it much bigger than 3s, probably 30s as I believe our network timeout

wdyt?

@AnirudhBhat
Copy link
Contributor

@kidinov Thanks for explaining! I see the point now. In that case, your below suggestion sounds good to me.

Now, it's not the case, but I guess even 3 seconds of showing nothing on the screen maybe too much and we should crash. If this happens we will fix this behaviour. wdyt?

I think 3 seconds works well for now. We can monitor the impact of this change and decide later if further refinements are needed.

@AnirudhBhat AnirudhBhat merged commit 6f99cb6 into trunk Dec 30, 2024
16 of 18 checks passed
@AnirudhBhat AnirudhBhat deleted the 13178-illegalargumentexception-required-value-was-null branch December 30, 2024 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: crash The worst kind of bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IllegalArgumentException: Required value was null.
5 participants