Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Locking to TableInfo to Support Mutating a DataFrame from Multiple Threads #427

Closed

Conversation

mdemoret-nv
Copy link
Contributor

If you run multiple threads with the AddScores or AddClassification stages, they may try to append a new column to the dataframe. In this situation, you can have race conditions where some threads are using the dataframe object at the same time you are adding/removing columns.

To fix this, this PR separates the TableInfo class into two classes:

  1. TableInfo
    2. Allow for read operations to happen in parallel
  2. MutableTableInfo
    4. Requires exclusive access to the underlying dataframe before it can be constructed. Allows for add/remove operations on columns

The effect is that holding an instance of TableInfo will ensure that the column types and count do not change for the lifetime of that object. Holding an instance of MutableTableInfo ensures that you have exclusive access to the dataframe and can mutate it.

Outstanding questions:

  • How do we handle the as_py_object() method on TableInfo? This should probably make a copy.
  • How do we properly test concurrent access?

@mdemoret-nv mdemoret-nv marked this pull request as ready for review October 31, 2022 22:35
@mdemoret-nv mdemoret-nv requested a review from a team as a code owner October 31, 2022 22:35
@mdemoret-nv mdemoret-nv added breaking Breaking change feature request New feature or request labels Nov 2, 2022
@mdemoret-nv
Copy link
Contributor Author

Moving to 23.01 since the majority of the multi-threaded issues have been addressed.


// return py_table;
return self.get_py_table();
return self.get_info().copy_to_py_object();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make this change, we should also make a copy in the Python impl of this class.

Otherwise we will get different behaviors in Python stages when C++ execution is enabled/disabled.

@mdemoret-nv
Copy link
Contributor Author

Closing in favor of #586

@mdemoret-nv mdemoret-nv closed this Jan 9, 2023
ghost pushed a commit that referenced this pull request Feb 6, 2023
* Builds on changes in #427
* Adds a `PreallocatorMixin` which when added to a stage performs pre-allocation. This should be added to the first stage in a pipeline which emits a DataFrame or MessageMeta in a pipeline.
* Morpheus' TypeId enum exposed to the Python API, allowing stages to define types for columns needing pre-allocation
* `MutableTableInfo` exposed to Python via a context manager to be used in `with` blocks
* `type_util` (`Dtype`) and `type_util_detail` (`DataType`) merged into a new compilation unit `dtype` fixes #490 


fixes #456

Authors:
  - David Gardner (https://github.com/dagardner-nv)
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #586
jjacobelli pushed a commit to jjacobelli/Morpheus that referenced this pull request Mar 7, 2023
* Builds on changes in nv-morpheus#427
* Adds a `PreallocatorMixin` which when added to a stage performs pre-allocation. This should be added to the first stage in a pipeline which emits a DataFrame or MessageMeta in a pipeline.
* Morpheus' TypeId enum exposed to the Python API, allowing stages to define types for columns needing pre-allocation
* `MutableTableInfo` exposed to Python via a context manager to be used in `with` blocks
* `type_util` (`Dtype`) and `type_util_detail` (`DataType`) merged into a new compilation unit `dtype` fixes nv-morpheus#490 


fixes nv-morpheus#456

Authors:
  - David Gardner (https://github.com/dagardner-nv)
  - Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: nv-morpheus#586
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Breaking change feature request New feature or request
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants