Fix detecting index column when reading from CSV in C++ #714

Merged
54 commits
4fe6771
Fix finding index col
dagardner-nv Feb 16, 2023
b4b833a
Use load_table_from_file instead of CuDFTableUtil::load_table
dagardner-nv Feb 16, 2023
749488f
Add _should_use_cpp class method giving class methods in message clas…
dagardner-nv Feb 16, 2023
cc9018f
Expose make_from_file method
dagardner-nv Feb 16, 2023
5c1dd2a
New tests
dagardner-nv Feb 16, 2023
ce1562b
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Feb 16, 2023
db45ee8
Remove unused includes
dagardner-nv Feb 16, 2023
f26fb18
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Feb 22, 2023
94e4aa5
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Feb 22, 2023
34bb25b
Merge branch 'david-from-file-no-index' of github.com:dagardner-nv/Mo…
dagardner-nv Feb 22, 2023
44961f1
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Feb 23, 2023
a0fcc45
Split off the mutating bit of get_index_col_count into prepare_df_index
dagardner-nv Feb 23, 2023
06212ac
Split GetIndexColCount test into GetIndexColCountNoIdx and GetIndexCo…
dagardner-nv Feb 23, 2023
eee1e4d
Make index_regex a const so we aren't building it on every invocation…
dagardner-nv Feb 23, 2023
2526817
More tests
dagardner-nv Feb 23, 2023
c044213
fix year
dagardner-nv Feb 23, 2023
597c397
Use prepare_df_index
dagardner-nv Feb 23, 2023
05ea4ad
wip
dagardner-nv Feb 23, 2023
857b581
Make serializers visible
dagardner-nv Feb 23, 2023
d6a4aa0
wip
dagardner-nv Feb 23, 2023
7e4853a
Support make_from_file in subclasses
dagardner-nv Feb 24, 2023
685846e
Applying Devin's changes
dagardner-nv Feb 24, 2023
6f101b4
Merge branch 'david-devin-test-embedded' into david-from-file-no-index
dagardner-nv Feb 24, 2023
91a9419
wip
dagardner-nv Feb 24, 2023
4154591
wip
dagardner-nv Feb 24, 2023
292f40a
Merge branch 'david-devin-test-embedded' into david-from-file-no-index
dagardner-nv Feb 24, 2023
53b5d22
Fix tests
dagardner-nv Feb 24, 2023
19205e1
Remove unused include
dagardner-nv Feb 24, 2023
652bff3
Remove unused include
dagardner-nv Feb 24, 2023
accac71
Merge branch 'david-devin-test-embedded' into david-from-file-no-index
dagardner-nv Feb 24, 2023
8211db9
Add missing include
dagardner-nv Feb 24, 2023
a2cd582
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Feb 27, 2023
92acc38
Revert classmethod
dagardner-nv Feb 27, 2023
199af79
Update load_table_from_file to use determine_file_type, add read_file…
dagardner-nv Feb 27, 2023
1e5baf5
Remove FileTypesInterfaceProxy as determine_file_type didn't need any…
dagardner-nv Feb 27, 2023
ba412a2
Add binding for read_file_to_df
dagardner-nv Feb 27, 2023
9c46e22
FileTypes::Auto is working
dagardner-nv Feb 27, 2023
e81b6f9
Use the C++ deserializers when C++ is enabled and df_type is cudf
dagardner-nv Feb 27, 2023
b750a9c
Update tests
dagardner-nv Feb 27, 2023
9690518
Formatting
dagardner-nv Feb 27, 2023
f07b091
IWYU changes
dagardner-nv Feb 28, 2023
d97f363
Remove unused import
dagardner-nv Feb 28, 2023
5a99001
Add a comment about the cudf issue I ran into
dagardner-nv Feb 28, 2023
6a58650
Fix pybind11 link error
dagardner-nv Feb 28, 2023
9a212ae
IWYU
dagardner-nv Feb 28, 2023
3a136da
Merge branch 'branch-23.03' into david-from-file-no-index [no ci]
dagardner-nv Mar 7, 2023
89e345b
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 7, 2023
5403d67
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 8, 2023
c26d719
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 13, 2023
58e79b6
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 13, 2023
092837d
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 17, 2023
3d9c333
Fix merge errors
dagardner-nv Mar 17, 2023
2728e5c
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 20, 2023
b116f74
Merge branch 'branch-23.03' into david-from-file-no-index
dagardner-nv Mar 20, 2023
New tests
dagardner-nv committed Feb 16, 2023
commit 5c1dd2ad5b81d03de9d5b867b9f1a4bb6435c360
25 changes: 25 additions & 0 deletions tests/test_file_in_out_stage_pipe.py
@@ -110,3 +110,28 @@ def test_file_rw_multi_segment_pipe(tmp_path, config, output_type):
    # Somehow 0.7 ends up being 0.7000000000000001
    output_data = np.around(output_data, 2)
    assert output_data.tolist() == input_data.tolist()


@pytest.mark.slow
@pytest.mark.parametrize("input_file",
                         [
                             os.path.join(TEST_DIRS.tests_data_dir, "filter_probs.csv"),
                             os.path.join(TEST_DIRS.tests_data_dir, "filter_probs_w_id_col.csv")
                         ])
def test_file_rw_index_pipe(tmp_path, config, input_file):
    out_file = os.path.join(tmp_path, 'results.csv')

    pipe = LinearPipeline(config)
    pipe.set_source(FileSourceStage(config, filename=input_file))
    pipe.add_stage(WriteToFileStage(config, filename=out_file, overwrite=False, include_index_col=False))
    pipe.run()

    assert_path_exists(out_file)

    validation_file = os.path.join(TEST_DIRS.tests_data_dir, "filter_probs.csv")
    validation_data = np.loadtxt(validation_file, delimiter=",", skiprows=1)
    output_data = np.loadtxt(out_file, delimiter=",", skiprows=1)

    # Somehow 0.7 ends up being 0.7000000000000001
    output_data = np.around(output_data, 2)
    assert output_data.tolist() == validation_data.tolist()
18 changes: 18 additions & 0 deletions tests/test_message_meta.py
@@ -17,10 +17,12 @@
import operator
import os

import numpy as np
import pytest

from morpheus._lib.common import FileTypes
from morpheus.io.deserializers import read_file_to_df
from morpheus.io.serializers import df_to_csv
from morpheus.messages.message_meta import MessageMeta
from utils import TEST_DIRS

@@ -67,3 +69,19 @@ def test_copy_dataframe(config):

    assert meta.copy_dataframe()['v2'][3] != 47
    assert meta.df != 47


def test_make_from_file(config, tmp_path):
    input_file = os.path.join(TEST_DIRS.tests_data_dir, "filter_probs_w_id_col.csv")
    out_file = os.path.join(tmp_path, 'results.csv')
    meta = MessageMeta.make_from_file(input_file)
    with meta.mutable_dataframe() as df:
        assert list(df.columns) == ['v1', 'v2', 'v3', 'v4']

        with open(out_file, 'w') as fh:
            fh.writelines(df_to_csv(df, include_header=True, include_index_col=True))

    input_data = np.loadtxt(input_file, delimiter=",", skiprows=1)
    output_data = np.loadtxt(out_file, delimiter=",", skiprows=1)
    output_data = np.around(output_data, 2)
    assert output_data.tolist() == input_data.tolist()
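
Both tests above rely on the reader recognizing a leading index column in a CSV file, so that a file with one and a file without one yield the same data columns. The sketch below illustrates that idea in plain Python using only the standard library. It is not the Morpheus implementation: the regex, the Python `get_index_col_count` helper (named after the C++ function mentioned in the commit list), and `read_csv_rows` are all hypothetical, and the rule that an empty, `Unnamed: 0`, or `id` header marks an index column is an assumption made for illustration.

```python
import csv
import io
import re

# Assumption: the first column is treated as an index column when its header
# is empty, pandas' default "Unnamed: 0", or "id". Compiled once at module
# level so it is not rebuilt on every call. Illustrative pattern only.
INDEX_COL_REGEX = re.compile(r"^\s*(unnamed: 0|id)?\s*$", re.IGNORECASE)


def get_index_col_count(header):
    """Return 1 if the first header cell looks like an index column, else 0."""
    if header and INDEX_COL_REGEX.match(header[0]):
        return 1
    return 0


def read_csv_rows(text):
    """Parse CSV text and return (columns, rows) with any index column dropped."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    skip = get_index_col_count(header)
    rows = [row[skip:] for row in reader]
    return header[skip:], rows


# Hypothetical sample inputs: the same data with and without an index column.
no_index = "v1,v2\n0.1,0.2\n0.3,0.4\n"
with_index = ",v1,v2\n0,0.1,0.2\n1,0.3,0.4\n"
with_id = "id,v1,v2\n0,0.1,0.2\n1,0.3,0.4\n"

# All three inputs reduce to the same columns and rows.
assert read_csv_rows(no_index) == read_csv_rows(with_index) == read_csv_rows(with_id)
```

Keeping the detection (`get_index_col_count`) free of side effects and doing the column removal in a separate step mirrors the split described in commit a0fcc45, though the actual C++ code may differ in both the pattern and the handling.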