Warn & replace dataframes with non-unique indexes #691

Merged

63 commits
5b90244
Add unittest for issue #686
dagardner-nv Feb 9, 2023
1f168a3
wip
dagardner-nv Feb 10, 2023
f38a07f
wip
dagardner-nv Feb 10, 2023
eca479a
Add 'has_unique_index' helper method to MessageMeta
dagardner-nv Feb 10, 2023
d824440
Add integration test for deserialization stage, along with test for is…
dagardner-nv Feb 10, 2023
5bf99bf
Test for has_unique_index method
dagardner-nv Feb 10, 2023
3add91b
Remove parametrize variables not needed for this test
dagardner-nv Feb 10, 2023
e3be4cf
First pass at replacing a non-unique index
dagardner-nv Feb 10, 2023
e43ac89
Add cpp impl for has_unique_index
dagardner-nv Feb 10, 2023
e60742c
wip
dagardner-nv Feb 10, 2023
53ee170
Move index reset to MutableTableInfo so that the column & index names…
dagardner-nv Feb 10, 2023
3fd0ea3
use logger.warning instead of logger.warn
dagardner-nv Feb 10, 2023
c651744
Update multi-segment test
dagardner-nv Feb 10, 2023
1d41fd6
Select only the columns in the view when writing json
dagardner-nv Feb 10, 2023
f9396be
Log and ignore include_index_col=false, otherwise cudf will throw an …
dagardner-nv Feb 11, 2023
fb141c9
wip
dagardner-nv Feb 11, 2023
c77360f
Document work-around
dagardner-nv Feb 11, 2023
ef3eb30
Fix casing for cuDF
dagardner-nv Feb 13, 2023
0554785
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 13, 2023
ae7d4af
Change fatal log to an error log
dagardner-nv Feb 13, 2023
ccc6e6c
Only set include_index_col=False when writing CSV
dagardner-nv Feb 13, 2023
d9669e2
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 14, 2023
bf4d4e6
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 15, 2023
7acff20
wip
dagardner-nv Feb 15, 2023
2a14392
Move index reset logic to a method on MessageMeta
dagardner-nv Feb 15, 2023
123d216
Merge branch 'david-warn-non-unique-686' of github.com:dagardner-nv/M…
dagardner-nv Feb 15, 2023
aad70ca
Repeat test with dup id occurring at the front and the end of the df
dagardner-nv Feb 15, 2023
d63bdcc
Only use the index for slicing if the index is unique, otherwise use …
dagardner-nv Feb 15, 2023
a46d43e
Add test for replace_non_unique_index method
dagardner-nv Feb 15, 2023
4715a5d
rename reset_index to replace_non_unique_index
dagardner-nv Feb 15, 2023
829cc3d
Remove unused import
dagardner-nv Feb 15, 2023
50dcce7
Add missing docstring
dagardner-nv Feb 15, 2023
a0f1388
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 16, 2023
09c1e0f
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 22, 2023
7c2f770
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 22, 2023
6ad14c8
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 23, 2023
ec0ed04
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 24, 2023
22daccd
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 25, 2023
1f35d00
Add missing includes
dagardner-nv Feb 25, 2023
7ad8d2d
Cleanup includes
dagardner-nv Feb 25, 2023
e4976db
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Feb 27, 2023
0b2d13c
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Mar 7, 2023
4b99d1e
Merge branch 'branch-23.03' into david-warn-non-unique-686
dagardner-nv Mar 7, 2023
482fd45
Adding additional tests to MultiMessage and fixing the bugs it discovers
mdemoret-nv Mar 10, 2023
bb54dad
All multi message tests passing
mdemoret-nv Mar 14, 2023
d4b8761
Most tests now passing
mdemoret-nv Mar 14, 2023
c002a93
Merge branch 'branch-23.03' into david-warn-non-unique-686
mdemoret-nv Mar 15, 2023
f4fb726
Removing files that should not have been committed
mdemoret-nv Mar 15, 2023
51e4e71
Removing stub generation
mdemoret-nv Mar 15, 2023
76921d3
Fixing up post merge failures
mdemoret-nv Mar 15, 2023
65e7edb
Large cleanup and added multi tensor tests
mdemoret-nv Mar 16, 2023
b55f50d
Merge branch 'branch-23.03' into david-warn-non-unique-686
mdemoret-nv Mar 16, 2023
4e92c8b
Style cleanup
mdemoret-nv Mar 16, 2023
68ff815
Merge branch 'branch-23.03' into david-warn-non-unique-686
mdemoret-nv Mar 16, 2023
77e2db0
Cleaning up the code
mdemoret-nv Mar 16, 2023
1ac0c6a
Large cleanup
mdemoret-nv Mar 16, 2023
39beb1f
Non-slow tests passing
mdemoret-nv Mar 17, 2023
42a70b9
Large cleanup. All tests passing locally
mdemoret-nv Mar 17, 2023
1cfa57d
Merge branch 'branch-23.03' into david-warn-non-unique-686
mdemoret-nv Mar 17, 2023
5bf02e9
Removing stubs from the build in CI
mdemoret-nv Mar 17, 2023
345fa78
IWYU fixes
mdemoret-nv Mar 17, 2023
365f583
Final changes to get CI to pass
mdemoret-nv Mar 17, 2023
1d9fe36
Style fixes
mdemoret-nv Mar 17, 2023
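
The commits above introduce two MessageMeta helpers, has_unique_index and replace_non_unique_index. As a rough sketch of the behavior those names describe (not Morpheus's actual cuDF-backed implementation; the integer label type and the warning text are assumptions), the standalone C++ below treats the index as a plain vector of row labels:

    #include <cstdint>
    #include <iostream>
    #include <numeric>
    #include <unordered_set>
    #include <vector>

    using IndexType = std::int64_t;  // assumption: row labels modeled as integers

    // True when no row label repeats (the property has_unique_index reports).
    bool has_unique_index(const std::vector<IndexType>& index)
    {
        std::unordered_set<IndexType> seen;
        for (IndexType label : index)
        {
            if (!seen.insert(label).second)
            {
                return false;  // label was already present
            }
        }
        return true;
    }

    // Warn and replace a non-unique index with 0..n-1, mirroring the
    // warn-and-replace behavior the PR title describes.
    void replace_non_unique_index(std::vector<IndexType>& index)
    {
        if (has_unique_index(index))
        {
            return;  // nothing to do
        }
        std::cerr << "Warning: non-unique index detected; replacing with a sequential index\n";
        std::iota(index.begin(), index.end(), IndexType{0});
    }

For example, an index of {0, 1, 1, 2} fails has_unique_index, so replace_non_unique_index warns and rewrites it to {0, 1, 2, 3}.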
Merge branch 'branch-23.03' into david-warn-non-unique-686
mdemoret-nv committed Mar 15, 2023
commit c002a93ccc7e7b6753461932b0d735d8c24e0e9b
9 changes: 4 additions & 5 deletions morpheus/_lib/include/morpheus/messages/multi.hpp
@@ -250,7 +250,7 @@ class MultiMessage : public DerivedMultiMessage<MultiMessage>
      * @param o : Offset into the metadata batch
      * @param c : Messages count
      */
-    MultiMessage(std::shared_ptr<MessageMeta> m, size_t offset = 0, std::optional<size_t> count = std::nullopt);
+    MultiMessage(std::shared_ptr<MessageMeta> m, TensorIndex offset = 0, TensorIndex count = -1);

     std::shared_ptr<MessageMeta> meta;
     TensorIndex mess_offset{0};
@@ -333,8 +333,8 @@ struct MultiMessageInterfaceProxy
      * TODO(Documentation)
      */
     static std::shared_ptr<MultiMessage> init(std::shared_ptr<MessageMeta> meta,
-                                              int32_t mess_offset,
-                                              std::optional<int32_t> mess_count);
+                                              TensorIndex mess_offset,
+                                              TensorIndex mess_count);

     /**
      * TODO(Documentation)
@@ -379,8 +379,7 @@ struct MultiMessageInterfaceProxy
     /**
      * TODO(Documentation)
      */
-    // Use ssize_t here to give better error messages on negative values
-    static std::shared_ptr<MultiMessage> get_slice(MultiMessage& self, ssize_t start, ssize_t stop);
+    static std::shared_ptr<MultiMessage> get_slice(MultiMessage& self, TensorIndex start, TensorIndex stop);

     static std::shared_ptr<MultiMessage> copy_ranges(MultiMessage& self,
                                                      const std::vector<RangeType>& ranges,
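
The recurring edit in this header, and in the headers that follow, replaces std::optional<size_t> parameters with a signed TensorIndex defaulting to -1, so "unset" can be expressed without std::optional. A minimal sketch of that sentinel pattern, with hypothetical Meta and Message types standing in for MessageMeta and MultiMessage (the resolution shown, count() - offset, is one reasonable choice, not necessarily the PR's exact behavior):

    #include <cstdint>
    #include <stdexcept>

    using TensorIndex = std::int64_t;  // assumption: Morpheus's actual alias may differ

    struct Meta
    {
        TensorIndex count() const { return 10; }  // stand-in for MessageMeta::count()
    };

    struct Message
    {
        // -1 plays the role std::nullopt used to: "derive the count from the metadata".
        Message(const Meta& meta, TensorIndex offset = 0, TensorIndex count = -1)
        {
            if (count == -1)
            {
                count = meta.count() - offset;  // resolve the sentinel
            }
            if (offset < 0 || count <= 0 || offset + count > meta.count())
            {
                throw std::invalid_argument("invalid offset/count");
            }
            this->offset = offset;
            this->count  = count;
        }

        TensorIndex offset;
        TensorIndex count;
    };

One payoff of the signed sentinel is that the pybind11 default arguments become plain integers rather than optionals, and negative values arriving from Python can be detected and reported directly.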
16 changes: 8 additions & 8 deletions morpheus/_lib/include/morpheus/messages/multi_inference.hpp
@@ -62,11 +62,11 @@ class MultiInferenceMessage : public DerivedMultiMessage<MultiInferenceMessage,
      * @param count Message count in inference memory instance
      */
     MultiInferenceMessage(std::shared_ptr<MessageMeta> meta,
-                          size_t mess_offset = 0,
-                          std::optional<size_t> mess_count = std::nullopt,
+                          TensorIndex mess_offset = 0,
+                          TensorIndex mess_count = -1,
                           std::shared_ptr<InferenceMemory> memory = nullptr,
-                          size_t offset = 0,
-                          std::optional<size_t> count = std::nullopt);
+                          TensorIndex offset = 0,
+                          TensorIndex count = -1);

     /**
      * @brief Returns the input tensor for the given `name`.
@@ -111,11 +111,11 @@ struct MultiInferenceMessageInterfaceProxy : public MultiTensorMessageInterfaceP
      * @return std::shared_ptr<MultiInferenceMessage>
      */
     static std::shared_ptr<MultiInferenceMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                       size_t mess_offset,
-                                                       std::optional<size_t> mess_count,
+                                                       TensorIndex mess_offset,
+                                                       TensorIndex mess_count,
                                                        std::shared_ptr<InferenceMemory> memory,
-                                                       size_t offset,
-                                                       std::optional<size_t> count);
+                                                       TensorIndex offset,
+                                                       TensorIndex count);
 };
 #pragma GCC visibility pop
 /** @} */  // end of group
morpheus/_lib/include/morpheus/messages/multi_inference_fil.hpp
@@ -58,11 +58,11 @@ class MultiInferenceFILMessage : public DerivedMultiMessage<MultiInferenceFILMes
      * @param count Message count in inference memory object
      */
     MultiInferenceFILMessage(std::shared_ptr<MessageMeta> meta,
-                             size_t mess_offset = 0,
-                             std::optional<size_t> mess_count = std::nullopt,
-                             std::shared_ptr<InferenceMemory> memory = nullptr,
-                             size_t offset = 0,
-                             std::optional<size_t> count = std::nullopt);
+                             TensorIndex mess_offset = 0,
+                             TensorIndex mess_count = -1,
+                             std::shared_ptr<morpheus::InferenceMemory> memory = nullptr,
+                             TensorIndex offset = 0,
+                             TensorIndex count = -1);

     /**
      * @brief Returns the 'input__0' tensor, throws a `std::runtime_error` if it does not exist
@@ -116,11 +116,11 @@ struct MultiInferenceFILMessageInterfaceProxy : public MultiInferenceMessageInte
      * @return std::shared_ptr<MultiInferenceFILMessage>
      */
     static std::shared_ptr<MultiInferenceFILMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                          size_t mess_offset,
-                                                          std::optional<size_t> mess_count,
+                                                          TensorIndex mess_offset,
+                                                          TensorIndex mess_count,
                                                           std::shared_ptr<InferenceMemory> memory,
-                                                          size_t offset,
-                                                          std::optional<size_t> count);
+                                                          TensorIndex offset,
+                                                          TensorIndex count);

     /**
      * @brief Get 'input__0' tensor as a python object
morpheus/_lib/include/morpheus/messages/multi_inference_nlp.hpp
@@ -58,11 +58,11 @@ class MultiInferenceNLPMessage : public DerivedMultiMessage<MultiInferenceNLPMes
      * @param count Message count in inference memory object
      */
     MultiInferenceNLPMessage(std::shared_ptr<MessageMeta> meta,
-                             size_t mess_offset = 0,
-                             std::optional<size_t> mess_count = std::nullopt,
+                             TensorIndex mess_offset = 0,
+                             TensorIndex mess_count = -1,
                              std::shared_ptr<InferenceMemory> memory = nullptr,
-                             size_t offset = 0,
-                             std::optional<size_t> count = std::nullopt);
+                             TensorIndex offset = 0,
+                             TensorIndex count = -1);

     /**
      * @brief Returns the 'input_ids' tensor, throws a `std::runtime_error` if it does not exist.
@@ -132,11 +132,11 @@ struct MultiInferenceNLPMessageInterfaceProxy : public MultiInferenceMessageInte
      * @return std::shared_ptr<MultiInferenceNLPMessage>
      */
     static std::shared_ptr<MultiInferenceNLPMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                          size_t mess_offset,
-                                                          std::optional<size_t> mess_count,
+                                                          TensorIndex mess_offset,
+                                                          TensorIndex mess_count,
                                                           std::shared_ptr<InferenceMemory> memory,
-                                                          size_t offset,
-                                                          std::optional<size_t> count);
+                                                          TensorIndex offset,
+                                                          TensorIndex count);

     /**
      * @brief Get 'input_ids' tensor as a python object
16 changes: 8 additions & 8 deletions morpheus/_lib/include/morpheus/messages/multi_response.hpp
@@ -65,11 +65,11 @@ class MultiResponseMessage : public DerivedMultiMessage<MultiResponseMessage, Mu
      * @param count Message count in inference memory instance
      */
     MultiResponseMessage(std::shared_ptr<MessageMeta> meta,
-                         size_t mess_offset = 0,
-                         std::optional<size_t> mess_count = std::nullopt,
+                         TensorIndex mess_offset = 0,
+                         TensorIndex mess_count = -1,
                          std::shared_ptr<ResponseMemory> memory = nullptr,
-                         size_t offset = 0,
-                         std::optional<size_t> count = std::nullopt);
+                         TensorIndex offset = 0,
+                         TensorIndex count = -1);

     /**
      * @brief Returns the output tensor with the given name.
@@ -118,11 +118,11 @@ struct MultiResponseMessageInterfaceProxy : public MultiTensorMessageInterfacePr
      * @return std::shared_ptr<MultiResponseMessage>
      */
     static std::shared_ptr<MultiResponseMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                      size_t mess_offset,
-                                                      std::optional<size_t> mess_count,
+                                                      TensorIndex mess_offset,
+                                                      TensorIndex mess_count,
                                                       std::shared_ptr<ResponseMemory> memory,
-                                                      size_t offset,
-                                                      std::optional<size_t> count);
+                                                      TensorIndex offset,
+                                                      TensorIndex count);

     /**
      * @brief Returns the output tensor for a given name
morpheus/_lib/include/morpheus/messages/multi_response_probs.hpp
@@ -64,11 +64,11 @@ class MultiResponseProbsMessage : public DerivedMultiMessage<MultiResponseProbsM
      * @param count Message count in inference memory instance
      */
     MultiResponseProbsMessage(std::shared_ptr<MessageMeta> meta,
-                              size_t mess_offset = 0,
-                              std::optional<size_t> mess_count = std::nullopt,
+                              TensorIndex mess_offset = 0,
+                              TensorIndex mess_count = -1,
                               std::shared_ptr<ResponseMemoryProbs> memory = nullptr,
-                              size_t offset = 0,
-                              std::optional<size_t> count = std::nullopt);
+                              TensorIndex offset = 0,
+                              TensorIndex count = -1);

     /**
      * @brief Returns the `probs` (probabilities) output tensor
@@ -104,11 +104,11 @@ struct MultiResponseProbsMessageInterfaceProxy : public MultiResponseMessageInte
      * @return std::shared_ptr<MultiResponseProbsMessage>
      */
     static std::shared_ptr<MultiResponseProbsMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                           size_t mess_offset,
-                                                           std::optional<size_t> mess_count,
+                                                           TensorIndex mess_offset,
+                                                           TensorIndex mess_count,
                                                            std::shared_ptr<ResponseMemoryProbs> memory,
-                                                           size_t offset,
-                                                           std::optional<size_t> count);
+                                                           TensorIndex offset,
+                                                           TensorIndex count);

     /**
      * @brief Return the `probs` (probabilities) output tensor
16 changes: 8 additions & 8 deletions morpheus/_lib/include/morpheus/messages/multi_tensor.hpp
@@ -74,11 +74,11 @@ class MultiTensorMessage : public DerivedMultiMessage<MultiTensorMessage, MultiM
      * @param count Message count in tensor memory instance
      */
     MultiTensorMessage(std::shared_ptr<MessageMeta> meta,
-                       size_t mess_offset = 0,
-                       std::optional<size_t> mess_count = std::nullopt,
+                       TensorIndex mess_offset = 0,
+                       TensorIndex mess_count = -1,
                        std::shared_ptr<TensorMemory> memory = nullptr,
-                       size_t offset = 0,
-                       std::optional<size_t> count = std::nullopt);
+                       TensorIndex offset = 0,
+                       TensorIndex count = -1);

     std::shared_ptr<morpheus::TensorMemory> memory;
     TensorIndex offset{0};
@@ -144,11 +144,11 @@ struct MultiTensorMessageInterfaceProxy
      * @return std::shared_ptr<MultiTensorMessage>
      */
     static std::shared_ptr<MultiTensorMessage> init(std::shared_ptr<MessageMeta> meta,
-                                                    size_t mess_offset,
-                                                    std::optional<size_t> mess_count,
+                                                    TensorIndex mess_offset,
+                                                    TensorIndex mess_count,
                                                     std::shared_ptr<TensorMemory> memory,
-                                                    size_t offset,
-                                                    std::optional<size_t> count);
+                                                    TensorIndex offset,
+                                                    TensorIndex count);

     /**
      * @brief Returns a shared pointer of a tensor memory object
20 changes: 13 additions & 7 deletions morpheus/_lib/src/messages/multi.cpp
@@ -45,7 +45,6 @@
 #include <cstdint>    // for uint8_t
 #include <sstream>
 #include <stdexcept>  // for runtime_error
-#include <tuple>
 // IWYU pragma: no_include <unordered_map>

 namespace morpheus {
@@ -55,8 +54,8 @@ using namespace py::literals;

 /****** Component public implementations *******************/
 /****** MultiMessage****************************************/
-MultiMessage::MultiMessage(std::shared_ptr<morpheus::MessageMeta> m, size_t offset, std::optional<size_t> count) :
-  meta(std::move(m)),
+MultiMessage::MultiMessage(std::shared_ptr<MessageMeta> meta, TensorIndex offset, TensorIndex count) :
+  meta(std::move(meta)),
   mess_offset(offset)
 {
     if (!this->meta)
@@ -65,7 +64,12 @@ MultiMessage::MultiMessage(std::shared_ptr<morpheus::MessageMeta> m, size_t offs
     }

     // Default to using the count from the meta if it is unset
-    this->mess_count = count.value_or(this->meta->count());
+    if (mess_count == -1)
+    {
+        mess_count = this->meta->count();
+    }
+
+    this->mess_count = mess_count;

     if (this->mess_offset < 0 || this->mess_offset >= this->meta->count())
     {
@@ -222,8 +226,8 @@ std::vector<RangeType> MultiMessage::apply_offset_to_ranges(TensorIndex offset,

 /****** MultiMessageInterfaceProxy *************************/
 std::shared_ptr<MultiMessage> MultiMessageInterfaceProxy::init(std::shared_ptr<MessageMeta> meta,
-                                                               int32_t mess_offset,
-                                                               std::optional<int32_t> mess_count)
+                                                               TensorIndex mess_offset,
+                                                               TensorIndex mess_count)
 {
     return std::make_shared<MultiMessage>(std::move(meta), mess_offset, mess_count);
 }
@@ -397,7 +401,9 @@ void MultiMessageInterfaceProxy::set_meta(MultiMessage& self, pybind11::object c
     mutable_info.return_obj(std::move(df));
 }

-std::shared_ptr<MultiMessage> MultiMessageInterfaceProxy::get_slice(MultiMessage& self, ssize_t start, ssize_t stop)
+std::shared_ptr<MultiMessage> MultiMessageInterfaceProxy::get_slice(MultiMessage& self,
+                                                                    TensorIndex start,
+                                                                    TensorIndex stop)
 {
     if (start < 0)
     {
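
With get_slice now taking signed TensorIndex arguments directly, negative Python-style bounds reach C++ intact and can be validated or normalized there, which is the goal the removed ssize_t comment hinted at. A hedged sketch of that kind of bounds handling; the helper name is hypothetical and the choice to normalize rather than reject negatives is an assumption:

    #include <cstdint>
    #include <stdexcept>
    #include <utility>

    using TensorIndex = std::int64_t;  // assumption: a signed index type

    // Normalize Python-style slice bounds against a length; negative values count
    // from the end. Illustrative only; the real proxy may simply reject negatives.
    std::pair<TensorIndex, TensorIndex> normalize_slice(TensorIndex start, TensorIndex stop, TensorIndex length)
    {
        if (start < 0) { start += length; }
        if (stop < 0)  { stop  += length; }
        if (start < 0 || stop > length || start > stop)
        {
            throw std::out_of_range("invalid slice bounds");
        }
        return {start, stop};
    }

For example, normalize_slice(-3, -1, 10) would yield {7, 9}, matching Python's slice semantics for a sequence of length 10.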
16 changes: 8 additions & 8 deletions morpheus/_lib/src/messages/multi_inference.cpp
@@ -29,11 +29,11 @@ namespace morpheus {
 /****** Component public implementations *******************/
 /****** <MultiInferenceMessage>****************************************/
 MultiInferenceMessage::MultiInferenceMessage(std::shared_ptr<MessageMeta> meta,
-                                             size_t mess_offset,
-                                             std::optional<size_t> mess_count,
+                                             TensorIndex mess_offset,
+                                             TensorIndex mess_count,
                                              std::shared_ptr<InferenceMemory> memory,
-                                             size_t offset,
-                                             std::optional<size_t> count) :
+                                             TensorIndex offset,
+                                             TensorIndex count) :
   DerivedMultiMessage(meta, mess_offset, mess_count, memory, offset, count)
 {}

@@ -55,11 +55,11 @@ void MultiInferenceMessage::set_input(const std::string& name, const TensorObjec
 /****** <MultiInferenceMessage>InterfaceProxy *************************/
 std::shared_ptr<MultiInferenceMessage> MultiInferenceMessageInterfaceProxy::init(
     std::shared_ptr<MessageMeta> meta,
-    size_t mess_offset,
-    std::optional<size_t> mess_count,
+    TensorIndex mess_offset,
+    TensorIndex mess_count,
     std::shared_ptr<InferenceMemory> memory,
-    size_t offset,
-    std::optional<size_t> count)
+    TensorIndex offset,
+    TensorIndex count)
 {
     return std::make_shared<MultiInferenceMessage>(
         std::move(meta), mess_offset, mess_count, std::move(memory), offset, count);
16 changes: 8 additions & 8 deletions morpheus/_lib/src/messages/multi_inference_fil.cpp
@@ -29,11 +29,11 @@ namespace morpheus {
 /****** Component public implementations *******************/
 /****** MultiInferenceFILMessage****************************************/
 MultiInferenceFILMessage::MultiInferenceFILMessage(std::shared_ptr<MessageMeta> meta,
-                                                   size_t mess_offset,
-                                                   std::optional<size_t> mess_count,
+                                                   TensorIndex mess_offset,
+                                                   TensorIndex mess_count,
                                                    std::shared_ptr<InferenceMemory> memory,
-                                                   size_t offset,
-                                                   std::optional<size_t> count) :
+                                                   TensorIndex offset,
+                                                   TensorIndex count) :
   DerivedMultiMessage(meta, mess_offset, mess_count, memory, offset, count)
 {}

@@ -60,11 +60,11 @@ void MultiInferenceFILMessage::set_seq_ids(const TensorObject& seq_ids)
 /****** MultiInferenceFILMessageInterfaceProxy *************************/
 std::shared_ptr<MultiInferenceFILMessage> MultiInferenceFILMessageInterfaceProxy::init(
     std::shared_ptr<MessageMeta> meta,
-    size_t mess_offset,
-    std::optional<size_t> mess_count,
+    TensorIndex mess_offset,
+    TensorIndex mess_count,
     std::shared_ptr<InferenceMemory> memory,
-    size_t offset,
-    std::optional<size_t> count)
+    TensorIndex offset,
+    TensorIndex count)
 {
     return std::make_shared<MultiInferenceFILMessage>(
         std::move(meta), mess_offset, mess_count, std::move(memory), offset, count);
You are viewing a condensed version of this merge commit; the full change set is not shown.