Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table locking & column preallocation #586

Merged
152 commits merged into from
Feb 6, 2023
Merged
Show file tree
Hide file tree
Changes from 148 commits
Commits
Show all changes
152 commits
Select commit Hold shift + click to select a range
815ad6a
Add an AddScoresStage to the sid pipeline, and validate the scores ag…
dagardner-nv Oct 27, 2022
3a8109a
Parametarize running the test for both row and column major
dagardner-nv Oct 28, 2022
e24d4bc
Use get_meta method to ensure slixing is applied
dagardner-nv Oct 28, 2022
3761187
Parametarize the pipeline_batch_size
dagardner-nv Oct 28, 2022
feb8589
Perform cuda copy using cudf::type_dispatcher
dagardner-nv Oct 29, 2022
be4a60c
Parametarize values
dagardner-nv Oct 29, 2022
d35343a
Move compare_class_to_scores to utils.py
dagardner-nv Oct 29, 2022
6cc99ab
Add add-scores test
dagardner-nv Oct 29, 2022
f1c2636
Merge branch 'branch-22.11' into david-add-scores-bug
dagardner-nv Oct 29, 2022
ac3fda4
Concatinate the input rather that repeating it, this forces the
dagardner-nv Oct 31, 2022
4be8737
Merge branch 'david-add-scores-bug' of github.com:dagardner-nv/Morphe…
dagardner-nv Oct 31, 2022
eb1dc97
fix tests
dagardner-nv Oct 31, 2022
e1f65e6
Fix type hint on convmesg class
dagardner-nv Oct 31, 2022
81207a3
Fix offset in ConvMdg
dagardner-nv Oct 31, 2022
0edcf0a
Remove unused imports
dagardner-nv Oct 31, 2022
86317f6
Parametrize column ordering, pipeline size, and data size in filter d…
dagardner-nv Oct 31, 2022
a6dae37
Adding fixes for making sliced copies and for calling set_meta() with…
mdemoret-nv Oct 31, 2022
a35743c
FIrst pass at making a TableInfo and MutableTableInfo
mdemoret-nv Oct 31, 2022
c2be38c
Merge branch 'branch-22.11' into mdd_table-info-locking
mdemoret-nv Oct 31, 2022
4843335
Partially working
mdemoret-nv Nov 1, 2022
62f4dd3
All tests passing locally
mdemoret-nv Nov 1, 2022
61e9759
Undoing incorrectly committed changes to test_sid.py
mdemoret-nv Nov 2, 2022
2f2598b
General cleanup
mdemoret-nv Nov 2, 2022
3064543
Forgot to add imports
mdemoret-nv Nov 2, 2022
dcdd653
Fixing file source stage and ensuring the repeat option works correctly
mdemoret-nv Nov 2, 2022
3a2cf11
Fixing imports
mdemoret-nv Nov 2, 2022
a165a92
Merge branch 'branch-22.11' into mdd_table-info-locking
mdemoret-nv Nov 4, 2022
f4e9f98
Merge branch 'branch-22.11' into david-mdd_table-info-locking
dagardner-nv Nov 14, 2022
927f6b8
Remove warning about tests not having a return value, also removes co…
dagardner-nv Nov 14, 2022
a98f415
wip
dagardner-nv Nov 14, 2022
0e091a9
Merge branch 'branch-22.11' into david-mdd_table-info-locking
dagardner-nv Nov 14, 2022
215391f
Add debug logging
dagardner-nv Nov 14, 2022
91393fb
Fixed preallocation in python only exec
dagardner-nv Nov 14, 2022
81e9a30
wip
dagardner-nv Nov 15, 2022
cf9de2e
wip
dagardner-nv Nov 15, 2022
954ddd1
logging
dagardner-nv Nov 15, 2022
6432a09
Fix handling of bool string '?' as '?1' isn't supported by numpy
dagardner-nv Nov 15, 2022
7c7d28c
wip
dagardner-nv Nov 15, 2022
994f233
Clean up includes
dagardner-nv Nov 15, 2022
5e369c7
Handle multi messages
dagardner-nv Nov 15, 2022
afcabae
Use an ordered dict
dagardner-nv Nov 15, 2022
6052d41
Use an ordered dict and remove debug logs
dagardner-nv Nov 15, 2022
3a957f2
PreallocateStage is now a template class allowing support for both Me…
dagardner-nv Nov 15, 2022
95ecb35
Add some doc strings
dagardner-nv Nov 15, 2022
21cc146
Perform column pre-allocation on a per-segment basis
dagardner-nv Nov 15, 2022
f011732
Change type string to one supported by Morpheus' DType class
dagardner-nv Nov 15, 2022
39a369f
Update ConvMsg to set a specific type for probs
dagardner-nv Nov 15, 2022
601a17c
Merge branch 'branch-22.11' into david-mdd_table-info-locking
dagardner-nv Nov 15, 2022
4bb82db
Merge branch 'branch-22.11' into mdd_table-info-locking
mdemoret-nv Nov 15, 2022
411a91c
Move 'Building Segment' logs to the inner_build so that they corelate…
dagardner-nv Nov 15, 2022
ebdbf01
Use the C++ impl of Preallocate if we can even if the souce stage doe…
dagardner-nv Nov 15, 2022
028d2ff
Merge branch 'mdd_table-info-locking' into david-mdd_table-info-locking
dagardner-nv Nov 15, 2022
caba4a2
Add docstring for probs_type
dagardner-nv Nov 15, 2022
dc5c973
Remove wip prints
dagardner-nv Nov 15, 2022
bccfb43
Fixing merge issue
mdemoret-nv Nov 16, 2022
ddf5e6a
Return the boundary egress/ingress nodes that were created
dagardner-nv Nov 16, 2022
ecb890d
Tests for preallocation feature
dagardner-nv Nov 16, 2022
541b67a
Group the docstring to apply to both overloads
dagardner-nv Nov 16, 2022
6abe441
Move preallocation logic to a mixin class
dagardner-nv Nov 16, 2022
43992fd
Add a docstring for the class and re-introduce support for DataFrames
dagardner-nv Nov 16, 2022
e95b879
wip
dagardner-nv Nov 17, 2022
5432ab6
Move inheritance order
dagardner-nv Nov 17, 2022
ba5ec3b
Add PreallocatorMixin class to the other sources
dagardner-nv Nov 17, 2022
ce47522
Replace needed_columns property with get_needed_columns & set_needed_…
dagardner-nv Nov 17, 2022
22ff5a2
Add some docstrings
dagardner-nv Nov 17, 2022
a487f37
Pre-instantiate PreallocateStage & PreallocateStageInterfaceProxy for…
dagardner-nv Nov 17, 2022
4d4aa1d
wip
dagardner-nv Nov 17, 2022
4acb7ac
Use a single vector of tuples rather than two vectors
dagardner-nv Nov 17, 2022
0df54ad
Merge branch 'mdd_table-info-locking' into david-mdd_table-info-locking
dagardner-nv Nov 17, 2022
2425dc7
Tighten up the set_meta method
dagardner-nv Nov 17, 2022
3793491
Catch any potential exceptions encountered fetching the column, and t…
dagardner-nv Nov 17, 2022
280e0c3
Move error handling to get_meta which is where the exception is throw…
dagardner-nv Nov 17, 2022
b099550
wip
dagardner-nv Nov 17, 2022
11de2e5
wip
dagardner-nv Nov 17, 2022
5e32eae
Revert "wip"
dagardner-nv Nov 18, 2022
84a265e
Update the real world phishing example to reflect that the returned d…
dagardner-nv Nov 18, 2022
f7ec0db
Merge branch 'david-mdd_table-info-locking-mixin' into david-mdd_tabl…
dagardner-nv Nov 18, 2022
2a8b6c8
Add PreallocatorMixin to NLPVizFileSource
dagardner-nv Nov 18, 2022
48927ca
Merge DataType & DType classes, rename type_util.cu to type_util.cpp …
dagardner-nv Nov 21, 2022
f2d4381
Rename type_util to dtype
dagardner-nv Nov 21, 2022
b0ff09b
Replace make_str_to_type_id with an initializer list, make str_to_typ…
dagardner-nv Nov 21, 2022
6280c0f
Expose TypeId enum to python
dagardner-nv Nov 21, 2022
792ee37
Use TypeId enum on python side to specify needed column types
dagardner-nv Nov 21, 2022
a369c8a
Remove unused function
dagardner-nv Nov 22, 2022
7434bb5
Use cudf.Scalar to define the new columns instead of cupy.zeros as th…
dagardner-nv Nov 22, 2022
5e07b46
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Dec 13, 2022
9e1f184
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Dec 29, 2022
fe5d394
Fix missed srf ref
dagardner-nv Dec 29, 2022
9cf28d0
srf->mrc
dagardner-nv Dec 29, 2022
aa5e26a
Remove reundant definition of size_in_bits and fix doc group
dagardner-nv Dec 29, 2022
cb95bef
Fix merge error
dagardner-nv Dec 29, 2022
51765c1
srf->mrc
dagardner-nv Dec 29, 2022
c430ecb
First pass at a context manager around mutating the dataframe
dagardner-nv Dec 29, 2022
9f4ec9d
First pass at exposing MutableTableInfo to python as a context manage…
dagardner-nv Dec 29, 2022
db966bb
wip
dagardner-nv Dec 30, 2022
1b81dec
Define explicit constructors for MessageMeta and subclasses, dataclas…
dagardner-nv Dec 30, 2022
0adf557
Update MultiMessage to use the mutable_dataframe ctx manager and _df …
dagardner-nv Dec 30, 2022
69a4781
Update preallocate mixin to use mutable_dataframe ctx manager
dagardner-nv Dec 30, 2022
ada52cb
Unittest for new MessageMeta functionality
dagardner-nv Dec 30, 2022
6c9d8ac
Fixes a problem where `with` blocks don't have their own scope, and t…
dagardner-nv Dec 31, 2022
4fe3b68
Both MutableTableInfo and pybind11::object are now unique_ptrs, allow…
dagardner-nv Jan 3, 2023
4ac1dc8
Update to mutable dataframe changes
dagardner-nv Jan 3, 2023
50f2e00
Mark any test that executes a pipeline as slow
dagardner-nv Jan 3, 2023
6d83cb4
Ignore our own warning about the df property when runnin tests
dagardner-nv Jan 3, 2023
59bd99e
Manipulate the dataframe in-place using the new mutable_dataframe con…
dagardner-nv Jan 3, 2023
f6b0183
Add a comment explaining the usage of the _needed_columns attribute
dagardner-nv Jan 3, 2023
a7af011
Use the _sep_token attribute
dagardner-nv Jan 3, 2023
0f09ed5
Update docs to reflect the new preallocate and mutable_dataframe changes
dagardner-nv Jan 3, 2023
da76b60
Restore _preallocate_df method
dagardner-nv Jan 3, 2023
008309c
Document PreallocatorMixin and fix type-os
dagardner-nv Jan 3, 2023
24c9999
punctuation
dagardner-nv Jan 3, 2023
2bc98ea
Add blank line
dagardner-nv Jan 3, 2023
b5ca5bb
Update copyright year
dagardner-nv Jan 4, 2023
0db9003
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 4, 2023
4c0c3eb
Add MORPHEUS_CONTAINER as an arg
dagardner-nv Jan 4, 2023
caa432d
Update CR year
dagardner-nv Jan 4, 2023
1af8a82
Fix DFP example due to recent changes
dagardner-nv Jan 4, 2023
3089088
Formatting fixes
dagardner-nv Jan 4, 2023
4677c06
Remove unused import
dagardner-nv Jan 4, 2023
3c803ef
Remove unused imports
dagardner-nv Jan 4, 2023
df9ae01
Formatting fixes
dagardner-nv Jan 4, 2023
026dbcd
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 6, 2023
bb7cd0c
Move TypeId & tyepid_to_numpy_str into the common module
dagardner-nv Jan 19, 2023
6b0f0be
Rename MutableCtxMgr --> MutableTableCtxMgr per PR feedback
dagardner-nv Jan 19, 2023
b68925c
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 19, 2023
804ad30
Don't call get_mutable_info until the __enter__ method
dagardner-nv Jan 19, 2023
32a887b
Update to return the dataframe from the enter method of the context m…
dagardner-nv Jan 20, 2023
a16fa52
Move MutableTableCtxMgr to its own compilation unit
dagardner-nv Jan 20, 2023
6f6a369
Add mutable_table_ctx_mgr.cpp to cmake
dagardner-nv Jan 20, 2023
cea873a
Update docs
dagardner-nv Jan 20, 2023
340363e
MutableTableCtxMgr moved to separate compilation unit
dagardner-nv Jan 21, 2023
31484f4
py::attribute_error is probably a better error than a runtime
dagardner-nv Jan 21, 2023
bc44913
MutableTableCtxMgr moved to separate compilation unit
dagardner-nv Jan 21, 2023
550ebd5
Update python to match cpp changes
dagardner-nv Jan 21, 2023
6a98a61
More tests
dagardner-nv Jan 21, 2023
1d665b6
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 21, 2023
9a30f5b
Fix CR year
dagardner-nv Jan 21, 2023
6ff759c
Add CR header
dagardner-nv Jan 21, 2023
ced00b8
Merge branch 'david-mdd_table-info-locking' of github.com:dagardner-n…
dagardner-nv Jan 21, 2023
29ac4ae
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 25, 2023
d1e44c3
Fix merge issues
dagardner-nv Jan 25, 2023
d0dabbd
Fix merge errors
dagardner-nv Jan 25, 2023
fc10fd1
fix_all.sh updates
dagardner-nv Jan 26, 2023
7594eda
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 26, 2023
ef879ec
Merge branch 'branch-23.01' into david-mdd_table-info-locking
dagardner-nv Jan 27, 2023
e4622a1
Merge branch 'branch-23.03' into david-mdd_table-info-locking
dagardner-nv Jan 31, 2023
4faa7ca
Merge branch 'branch-23.03' into david-mdd_table-info-locking
dagardner-nv Feb 1, 2023
64b98de
Merge branch 'branch-23.03' into david-mdd_table-info-locking
dagardner-nv Feb 1, 2023
6d972c9
Merge branch 'branch-23.03' into david-mdd_table-info-locking
dagardner-nv Feb 6, 2023
bd553ca
Merge branch 'david-mdd_table-info-locking' of github.com:dagardner-n…
dagardner-nv Feb 6, 2023
2bf0059
Remove redundant import
dagardner-nv Feb 6, 2023
678bc71
Merge branch 'branch-23.03' into david-mdd_table-info-locking
dagardner-nv Feb 6, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions ci/scripts/copyright.py
Original file line number Diff line number Diff line change
Expand Up @@ -462,4 +462,6 @@ def checkCopyright_main():
}

if __name__ == "__main__":
import logging
logging.basicConfig(level=logging.DEBUG)
sys.exit(checkCopyright_main())
180 changes: 133 additions & 47 deletions docs/source/developer_guide/guides/2_real_world_phishing.md

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions docs/source/developer_guide/guides/4_source_cpp_stage.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,7 @@ RabbitMQSourceStage(const std::string& host,
const std::string& exchange_type = "fanout"s,
const std::string& queue_name = ""s,
std::chrono::milliseconds poll_interval = 100ms);

~RabbitMQSourceStage() override = default;
```

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@

import mrc

from morpheus._lib.common import TypeId
from morpheus.cli.register_stage import register_stage
from morpheus.config import Config
from morpheus.config import PipelineModes
Expand Down Expand Up @@ -48,6 +49,17 @@ def __init__(self, config: Config, sep_token: str = '[SEP]'):
else:
raise ValueError("sep_token cannot be an empty string")

# This stage adds new columns to the DataFrame, as an optimization we define the columns that are needed,
# ensuring that these columns are pre-allocated with null values. This action is performed by Morpheus for any
# stage defining this attribute.
self._needed_columns.update({
'to_count': TypeId.INT32,
'bcc_count': TypeId.INT32,
'cc_count': TypeId.INT32,
'total_recipients': TypeId.INT32,
'data': TypeId.STRING
})

@property
def name(self) -> str:
return "recipient-features"
Expand All @@ -59,18 +71,17 @@ def supports_cpp_node(self) -> bool:
return False

def on_data(self, message: MessageMeta) -> MessageMeta:
# Get the DataFrame from the incoming message
df = message.df

df['to_count'] = df['To'].str.count('@')
df['bcc_count'] = df['BCC'].str.count('@')
df['cc_count'] = df['CC'].str.count('@')
df['total_recipients'] = df['to_count'] + df['bcc_count'] + df['cc_count']

# Attach features to string data
df['data'] = (df['to_count'].astype(str) + '[SEP]' + df['bcc_count'].astype(str) + '[SEP]' +
df['cc_count'].astype(str) + '[SEP]' + df['total_recipients'].astype(str) + '[SEP]' +
df['Message'])
# Open the DataFrame from the incoming message for in-place modification
with message.mutable_dataframe() as df:
df['to_count'] = df['To'].str.count('@')
df['bcc_count'] = df['BCC'].str.count('@')
df['cc_count'] = df['CC'].str.count('@')
df['total_recipients'] = df['to_count'] + df['bcc_count'] + df['cc_count']

# Attach features to string data
df['data'] = (df['to_count'].astype(str) + self._sep_token + df['bcc_count'].astype(str) + self._sep_token +
df['cc_count'].astype(str) + self._sep_token + df['total_recipients'].astype(str) +
self._sep_token + df['Message'])

# Return the message for the next stage
return message
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,14 +26,15 @@
from morpheus.cli.register_stage import register_stage
from morpheus.config import Config
from morpheus.messages.message_meta import MessageMeta
from morpheus.pipeline.preallocator_mixin import PreallocatorMixin
from morpheus.pipeline.single_output_source import SingleOutputSource
from morpheus.pipeline.stream_pair import StreamPair

logger = logging.getLogger(__name__)


@register_stage("from-rabbitmq")
class RabbitMQSourceStage(SingleOutputSource):
class RabbitMQSourceStage(PreallocatorMixin, SingleOutputSource):
"""
Source stage used to load messages from a RabbitMQ queue.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,14 +27,15 @@
from morpheus.cli.register_stage import register_stage
from morpheus.config import Config
from morpheus.messages.message_meta import MessageMeta
from morpheus.pipeline.preallocator_mixin import PreallocatorMixin
from morpheus.pipeline.single_output_source import SingleOutputSource
from morpheus.pipeline.stream_pair import StreamPair

logger = logging.getLogger(__name__)


@register_stage("from-rabbitmq")
class RabbitMQSourceStage(SingleOutputSource):
class RabbitMQSourceStage(PreallocatorMixin, SingleOutputSource):
"""
Source stage used to load messages from a RabbitMQ queue.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ services:
dockerfile: ./Dockerfile
target: jupyter
args:
- MORPHEUS_CONTAINER=${MORPHEUS_CONTAINER:-nvcr.io/nvidia/morpheus/morpheus}
- MORPHEUS_CONTAINER_VERSION=${MORPHEUS_CONTAINER_VERSION:-v23.01.00-runtime}
deploy:
resources:
Expand Down Expand Up @@ -71,6 +72,7 @@ services:
dockerfile: ./Dockerfile
target: runtime
args:
- MORPHEUS_CONTAINER=${MORPHEUS_CONTAINER:-nvcr.io/nvidia/morpheus/morpheus}
- MORPHEUS_CONTAINER_VERSION=${MORPHEUS_CONTAINER_VERSION:-v23.01.00-runtime}
image: dfp_morpheus
container_name: morpheus_pipeline
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,13 +16,15 @@
import logging
import typing

import pandas as pd

from morpheus.messages.message_meta import MessageMeta
from morpheus.messages.multi_message import MultiMessage

logger = logging.getLogger(__name__)


@dataclasses.dataclass
@dataclasses.dataclass(init=False)
class DFPMessageMeta(MessageMeta, cpp_class=None):
"""
This class extends MessageMeta to also hold userid corresponding to batched metadata.
Expand All @@ -37,11 +39,15 @@ class DFPMessageMeta(MessageMeta, cpp_class=None):
"""
user_id: str

def __init__(self, df: pd.DataFrame, user_id: str) -> None:
super().__init__(df)
self.user_id = user_id

def get_df(self):
return self.df

def set_df(self, df):
self.df = df
self._df = df


@dataclasses.dataclass
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
from morpheus._lib.common import FileTypes
from morpheus.config import Config
from morpheus.io.deserializers import read_file_to_df
from morpheus.pipeline.preallocator_mixin import PreallocatorMixin
from morpheus.pipeline.single_port_stage import SinglePortStage
from morpheus.pipeline.stream_pair import StreamPair
from morpheus.utils.column_info import DataFrameInputSchema
Expand Down Expand Up @@ -78,7 +79,7 @@ def _single_object_to_dataframe(file_object: fsspec.core.OpenFile,
return s3_df


class DFPFileToDataFrameStage(SinglePortStage):
class DFPFileToDataFrameStage(PreallocatorMixin, SinglePortStage):

def __init__(self,
c: Config,
Expand Down
3 changes: 2 additions & 1 deletion examples/sid_visualization/run.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
from morpheus.config import PipelineModes
from morpheus.io.deserializers import read_file_to_df
from morpheus.pipeline.linear_pipeline import LinearPipeline
from morpheus.pipeline.preallocator_mixin import PreallocatorMixin
from morpheus.pipeline.single_output_source import SingleOutputSource
from morpheus.pipeline.stream_pair import StreamPair
from morpheus.stages.general.monitor_stage import MonitorStage
Expand All @@ -39,7 +40,7 @@
from morpheus.utils.logger import configure_logging


class NLPVizFileSource(SingleOutputSource):
class NLPVizFileSource(PreallocatorMixin, SingleOutputSource):
"""
Source stage is used to load messages from a file and dumping the contents into the pipeline immediately. Useful for
testing performance and accuracy of a pipeline.
Expand Down
6 changes: 3 additions & 3 deletions morpheus/_lib/cmake/libraries/cuda_utils.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -19,8 +19,7 @@ find_package(pybind11 REQUIRED)
# Place the two cuda sources in their own target and disable IWYU for that target.
add_library(cuda_utils_objs
OBJECT
${MORPHEUS_LIB_ROOT}/src/utilities/matx_util.cu
${MORPHEUS_LIB_ROOT}/src/utilities/type_util.cu
${MORPHEUS_LIB_ROOT}/src/utilities/matx_util.cu
)

set_target_properties(
Expand Down Expand Up @@ -51,11 +50,12 @@ target_link_libraries(cuda_utils_objs
add_library(cuda_utils
SHARED
$<TARGET_OBJECTS:cuda_utils_objs>
${MORPHEUS_LIB_ROOT}/src/objects/data_table.cpp
${MORPHEUS_LIB_ROOT}/src/objects/dev_mem_info.cpp
${MORPHEUS_LIB_ROOT}/src/objects/dtype.cpp
${MORPHEUS_LIB_ROOT}/src/objects/table_info.cpp
${MORPHEUS_LIB_ROOT}/src/objects/tensor_object.cpp
${MORPHEUS_LIB_ROOT}/src/utilities/tensor_util.cpp
${MORPHEUS_LIB_ROOT}/src/utilities/type_util_detail.cpp
)

target_include_directories(cuda_utils
Expand Down
2 changes: 2 additions & 0 deletions morpheus/_lib/cmake/libraries/morpheus.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ add_library(morpheus
${MORPHEUS_LIB_ROOT}/src/messages/multi_tensor.cpp
${MORPHEUS_LIB_ROOT}/src/objects/fiber_queue.cpp
${MORPHEUS_LIB_ROOT}/src/objects/file_types.cpp
${MORPHEUS_LIB_ROOT}/src/objects/mutable_table_ctx_mgr.cpp
${MORPHEUS_LIB_ROOT}/src/objects/wrapped_tensor.cpp
${MORPHEUS_LIB_ROOT}/src/objects/python_data_table.cpp
${MORPHEUS_LIB_ROOT}/src/objects/rmm_tensor.cpp
Expand All @@ -44,6 +45,7 @@ add_library(morpheus
${MORPHEUS_LIB_ROOT}/src/stages/file_source.cpp
${MORPHEUS_LIB_ROOT}/src/stages/filter_detection.cpp
${MORPHEUS_LIB_ROOT}/src/stages/kafka_source.cpp
${MORPHEUS_LIB_ROOT}/src/stages/preallocate.cpp
${MORPHEUS_LIB_ROOT}/src/stages/preprocess_fil.cpp
${MORPHEUS_LIB_ROOT}/src/stages/preprocess_nlp.cpp
${MORPHEUS_LIB_ROOT}/src/stages/serialize.cpp
Expand Down
34 changes: 0 additions & 34 deletions morpheus/_lib/cmake/python_modules/file_types.cmake

This file was deleted.

37 changes: 10 additions & 27 deletions morpheus/_lib/cudf_helpers.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -37,23 +37,17 @@ from cudf._lib.utils cimport table_view_from_table

cdef extern from "morpheus/objects/table_info.hpp" namespace "morpheus" nogil:

cdef cppclass IDataTable:
IDataTable()

cdef cppclass TableInfo:
TableInfo()
TableInfo(shared_ptr[const IDataTable] parent,
table_view view,
vector[string] index_names,
vector[string] column_names)
cdef cppclass TableInfoData:
TableInfoData()
TableInfoData(table_view view,
vector[string] indices,
vector[string] columns)

table_view get_view() const
vector[string] get_index_names()
vector[string] get_column_names() const
table_view table_view
vector[string] index_names
vector[string] column_names

int num_indices() const
int num_columns() const
int num_rows() const

cdef public api:
object make_table_from_table_with_metadata(table_with_metadata table, int index_col_count):
Expand Down Expand Up @@ -83,19 +77,8 @@ cdef public api:

return cudf.DataFrame._from_data(data, index)

object make_table_from_table_info(TableInfo info, object owner):

i_names = info.get_index_names()
c_names = info.get_column_names()

index_names = [x.decode() for x in i_names]
column_names = [x.decode() for x in c_names]

data, index = data_from_table_view(info.get_view(), owner, column_names=column_names, index_names=index_names)

return cudf.DataFrame._from_data(data, index)

TableInfo make_table_info_from_table(object table, shared_ptr[const IDataTable] parent):
TableInfoData make_table_info_data_from_table(object table):

cdef vector[string] temp_col_names = get_column_names(table, True)

Expand Down Expand Up @@ -123,4 +106,4 @@ cdef public api:

column_names.push_back(str.encode(name))

return TableInfo(parent, input_table_view, index_names, column_names)
return TableInfoData(input_table_view, index_names, column_names)
12 changes: 8 additions & 4 deletions morpheus/_lib/include/morpheus/io/serializers.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -55,18 +55,22 @@ void df_to_csv(const TableInfo& tbl, std::ostream& out_stream, bool include_head
* @param tbl : A wrapper around data in the dataframe
* @param include_index_col : Determines whether or not to include the dataframe index
* @return std::string
*
* Note the include_index_col is currently being ignored in both versions of `df_to_json` due to a known issue in
* Pandas: https://github.com/pandas-dev/pandas/issues/37600
* Requires MutableTableInfo since there is no C++ implementation of the JSON writer
*/
// Note the include_index_col is currently being ignored in both versions of `df_to_json` due to a known issue in
// Pandas: https://github.com/pandas-dev/pandas/issues/37600
std::string df_to_json(const TableInfo& tbl, bool include_index_col = true);
std::string df_to_json(MutableTableInfo& tbl, bool include_index_col = true);

/**
* @brief Serialize a dataframe into a JSON formatted string
* @param tbl : A wrapper around data in the dataframe
* @param out_stream : Output stream to write the results to a destination
* @param include_index_col : Determines whether or not to include the dataframe index
*
* Requires MutableTableInfo since there is no C++ implementation of the JSON writer
*/
void df_to_json(const TableInfo& tbl, std::ostream& out_stream, bool include_index_col = true);
void df_to_json(MutableTableInfo& tbl, std::ostream& out_stream, bool include_index_col = true);

/** @} */ // end of group
} // namespace morpheus
Loading