Add Locking to TableInfo to Support Mutating a DataFrame from Multiple Threads #427
…ainst the classification labels
deserialize stage to slice up the input and trigger the bug
…us into david-add-scores-bug
…etection stage tests, also fix multi-segment tests
Moving to 23.01 since the majority of the multi-threaded issues have been addressed.
// return py_table;
return self.get_py_table();
return self.get_info().copy_to_py_object();
If we make this change, we should also make a copy in the Python impl of this class.
Otherwise we will get different behaviors in Python stages when C++ execution is enabled/disabled.
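To make the concern above concrete, here is a minimal sketch of how returning a copy versus returning the underlying object changes what a stage observes. `MetaShared`, `MetaCopying`, `get_table` and `stage` are hypothetical names used only for illustration, and a plain `dict` stands in for the dataframe; none of this is the Morpheus API.

```python
import copy


class MetaShared:
    """Hypothetical wrapper that hands out its internal table directly."""

    def __init__(self, table):
        self._table = table

    def get_table(self):
        return self._table


class MetaCopying(MetaShared):
    """Hypothetical wrapper that hands out a deep copy instead."""

    def get_table(self):
        return copy.deepcopy(self._table)


def stage(meta):
    """Add a column to the returned table, then check whether the wrapper sees it."""
    table = meta.get_table()
    table["score"] = [0.9, 0.1]
    return "score" in meta.get_table()


# Same stage code, different outcome depending on the copy semantics.
shared_sees_column = stage(MetaShared({"data": [1, 2]}))
copy_sees_column = stage(MetaCopying({"data": [1, 2]}))
```

With the shared wrapper the new column is visible through the wrapper afterwards; with the copying wrapper it is silently dropped. If only one of the two implementations copies, the same stage behaves differently depending on which implementation is active.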
Closing in favor of #586
* Builds on changes in #427
* Adds a `PreallocatorMixin` which, when added to a stage, performs pre-allocation. This should be added to the first stage in a pipeline which emits a DataFrame or MessageMeta.
* Morpheus' TypeId enum exposed to the Python API, allowing stages to define types for columns needing pre-allocation
* `MutableTableInfo` exposed to Python via a context manager to be used in `with` blocks
* `type_util` (`Dtype`) and `type_util_detail` (`DataType`) merged into a new compilation unit `dtype`

fixes #490
fixes #456

Authors:
- David Gardner (https://github.com/dagardner-nv)
- Michael Demoret (https://github.com/mdemoret-nv)

Approvers:
- Michael Demoret (https://github.com/mdemoret-nv)

URL: #586
If you run multiple threads with the `AddScores` or `AddClassification` stages, they may try to append a new column to the dataframe. In this situation, you can have race conditions where some threads are using the dataframe object at the same time you are adding/removing columns.

To fix this, this PR separates the `TableInfo` class into two classes:

1. `TableInfo`: Allows read operations to happen in parallel.
2. `MutableTableInfo`: Requires exclusive access to the underlying dataframe before it can be constructed. Allows add/remove operations on columns.

The effect is that holding an instance of `TableInfo` will ensure that the column types and count do not change for the lifetime of that object. Holding an instance of `MutableTableInfo` ensures that you have exclusive access to the dataframe and can mutate it.

Outstanding questions:

* `as_py_object()` method on `TableInfo`? This should probably make a copy.
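The read/exclusive split described above can be sketched in Python with a small reader/writer lock. This is an illustration under stated assumptions, not the PR's implementation: `ReadWriteLock`, `SharedTable`, `info()` and `mutable_info()` are hypothetical names, a `dict` of lists stands in for the dataframe, and the lock is a simple one that does not guard against writer starvation.

```python
import threading
from contextlib import contextmanager
from types import MappingProxyType


class ReadWriteLock:
    """Minimal reader/writer lock: many concurrent readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class SharedTable:
    """Hypothetical owner of the table (a dict of columns stands in for a dataframe)."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self._lock = ReadWriteLock()

    @contextmanager
    def info(self):
        # Shared access: the column set cannot change while this view is held.
        with self._lock.read():
            yield MappingProxyType(self._columns)

    @contextmanager
    def mutable_info(self):
        # Exclusive access: safe to add or remove columns.
        with self._lock.write():
            yield self._columns


def add_scores(table, name, n_rows):
    """Stage-like worker that appends one column under exclusive access."""
    with table.mutable_info() as cols:
        cols[name] = [0.0] * n_rows


table = SharedTable({"data": [1, 2, 3]})
threads = [threading.Thread(target=add_scores, args=(table, f"score_{i}", 3))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with table.info() as cols:
    column_names = sorted(cols)
```

The read view is a `MappingProxyType`, so an attempt to add a column through it fails with a `TypeError`; mutation is only possible through the exclusive `mutable_info()` block, which mirrors the intent of constructing `MutableTableInfo` only after exclusive access is acquired.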