Removing support for MultiMessage from stages #1803

Merged

Changes from 1 commit
96 commits:
ba506fa
Remove MultiMessage from AddScoresStage
yczhang-nv Jul 8, 2024
bb18d7e
Remove MultiMessage from AddScoresStage python impl & reformatting
yczhang-nv Jul 8, 2024
6500db8
fix test errors
yczhang-nv Jul 9, 2024
6402965
remove from preprocess_nlp cpp impl
yczhang-nv Jul 9, 2024
7a53076
remove from preprocess_nlp python impl
yczhang-nv Jul 9, 2024
f4f97bd
remove from preprocess_fil cpp impl
yczhang-nv Jul 9, 2024
9e8b2c5
remove from preprocess_fil python impl
yczhang-nv Jul 9, 2024
0bbadbc
remove from preprocess_ae impl
yczhang-nv Jul 9, 2024
a960468
remove from serialize stage impl
yczhang-nv Jul 9, 2024
5c45b62
remove from filter_detections impl
yczhang-nv Jul 9, 2024
6ea61b8
remove from filter_detections_controller
yczhang-nv Jul 10, 2024
81ccd6e
remove from generate_viz_frames
yczhang-nv Jul 10, 2024
fecd7e0
remove from mlflow_drift_stage
yczhang-nv Jul 10, 2024
4a572c9
remove from timeseries stage
yczhang-nv Jul 10, 2024
3e3ad14
remove from validation stage
yczhang-nv Jul 10, 2024
fa3d971
update deserialize stage
yczhang-nv Jul 10, 2024
30a76a2
fix some unit tests
yczhang-nv Jul 11, 2024
6629139
fix unit tests
yczhang-nv Jul 12, 2024
746530c
fix inference
yczhang-nv Jul 12, 2024
f0dbfc0
update fil stage
yczhang-nv Jul 15, 2024
46217dc
rollback to test triton_inference_stage
yczhang-nv Jul 15, 2024
72171fa
test cm for test_inference_stage
yczhang-nv Jul 15, 2024
3753707
passed test_triton_inference_stage
yczhang-nv Jul 17, 2024
b6509f0
fix
yczhang-nv Jul 17, 2024
bddaf5f
debugging test_dfp.py
yczhang-nv Jul 19, 2024
cc945c7
fix test_dfp.py
yczhang-nv Jul 22, 2024
b436176
fix test_phishing.py
yczhang-nv Jul 22, 2024
83e8367
Merge remote-tracking branch 'upstream/branch-24.10' into verify-and-…
yczhang-nv Jul 22, 2024
9ca9320
fix test
yczhang-nv Jul 22, 2024
55c75a6
remove some multimessage branches
yczhang-nv Jul 23, 2024
90ab421
fix ci
yczhang-nv Jul 23, 2024
94e4639
fix ci
yczhang-nv Jul 24, 2024
aa00fbb
fix naming
yczhang-nv Jul 25, 2024
e77a50a
fix ci
yczhang-nv Jul 25, 2024
d8b60d6
Merge branch 'branch-24.10' into complete-remove-multi-message
yczhang-nv Jul 25, 2024
4581346
fix CI
yczhang-nv Jul 26, 2024
8cd1a7d
fix CI
yczhang-nv Jul 26, 2024
66013af
fix CI
yczhang-nv Jul 26, 2024
554856c
test gpg
yczhang-nv Jul 29, 2024
dfff798
test gpg sign
yczhang-nv Jul 29, 2024
5940311
test gpg
yczhang-nv Jul 29, 2024
5b890d2
test gpg
yczhang-nv Jul 29, 2024
9977a6a
gix abp_pcap_detection
yczhang-nv Jul 29, 2024
9351310
Finalize CI
yczhang-nv Jul 29, 2024
10a01ff
rollback
yczhang-nv Jul 29, 2024
db17566
Merge remote-tracking branch 'upstream/branch-24.10' into complete-re…
yczhang-nv Aug 13, 2024
8d9ecf4
fix python checks
yczhang-nv Aug 13, 2024
fa816ff
fix typo
yczhang-nv Aug 13, 2024
5927b62
Merge remote-tracking branch 'upstream/branch-24.10' into complete-re…
yczhang-nv Aug 14, 2024
95725c8
fix ci
yczhang-nv Aug 14, 2024
c4b1cdf
Merge branch 'branch-24.10' into complete-remove-multi-message
yczhang-nv Aug 14, 2024
c596520
Merge branch 'branch-24.10' into complete-remove-multi-message
yczhang-nv Aug 15, 2024
24c3af0
Merge remote-tracking branch 'upstream/branch-24.10' into complete-re…
yczhang-nv Aug 27, 2024
d3e1b45
support casting TensorObject from Python to C++ for ControlMessage
yczhang-nv Aug 28, 2024
371f001
add overload to TensorObject
yczhang-nv Aug 28, 2024
ac94065
Update comment
yczhang-nv Aug 28, 2024
efd9937
Update comments
yczhang-nv Aug 28, 2024
dd0e0a0
Merge branch 'cast-python-tensor-memory-to-cpp-for-control-message' i…
yczhang-nv Aug 28, 2024
4218d74
fix comments
yczhang-nv Aug 29, 2024
08fb90e
fix comments
yczhang-nv Aug 29, 2024
41e9e2f
fic CI format
yczhang-nv Aug 29, 2024
b811206
fix format
yczhang-nv Aug 29, 2024
c2bf5d3
fix CI
yczhang-nv Aug 29, 2024
8033055
Fix CI
yczhang-nv Aug 29, 2024
f4be468
fix CI
yczhang-nv Aug 29, 2024
809bbb5
revert changes that break the build
yczhang-nv Aug 29, 2024
9c03ee9
fix CI
yczhang-nv Aug 29, 2024
66922a7
fix CI
yczhang-nv Aug 29, 2024
f5c16b0
Merge remote-tracking branch 'upstream/branch-24.10' into cast-python…
yczhang-nv Sep 6, 2024
dddd3c8
fix CI
yczhang-nv Sep 6, 2024
c4a7095
Merge remote-tracking branch 'upstream/branch-24.10' into complete-re…
yczhang-nv Sep 6, 2024
9f30383
revert format
yczhang-nv Sep 6, 2024
38e8f9c
Merge remote-tracking branch 'origin/cast-python-tensor-memory-to-cpp…
yczhang-nv Sep 6, 2024
32ec933
fix CI
yczhang-nv Sep 6, 2024
f43f10a
try to minimize CI errors
yczhang-nv Sep 6, 2024
c74f51f
Update ransomware pipeline to use ControlMessage
yczhang-nv Sep 6, 2024
0876026
Merge branch 'branch-24.10' into complete-remove-multi-message
yczhang-nv Sep 6, 2024
9ed5e32
Merge branch 'complete-remove-multi-message' of github.com:yczhang-nv…
yczhang-nv Sep 6, 2024
ae33cab
remove comments
yczhang-nv Sep 6, 2024
84d179d
fix CI
yczhang-nv Sep 6, 2024
b33e128
fix format
yczhang-nv Sep 6, 2024
1776efc
Merge branch 'branch-24.10' into complete-remove-multi-message
yczhang-nv Sep 7, 2024
a848845
fix doc
yczhang-nv Sep 9, 2024
4115cc9
fix merge conflict
yczhang-nv Sep 9, 2024
8201199
Revert "Merge remote-tracking branch 'origin/cast-python-tensor-memor…
yczhang-nv Sep 9, 2024
0dc0947
Revert "fic CI format"
yczhang-nv Sep 9, 2024
6656224
Revert "support casting TensorObject from Python to C++ for ControlMe…
yczhang-nv Sep 9, 2024
21e3d2f
fix revert error
yczhang-nv Sep 9, 2024
2023469
fix CI
yczhang-nv Sep 9, 2024
adbc5ca
fix CI
yczhang-nv Sep 9, 2024
6d65a6d
TensorMemory
yczhang-nv Sep 10, 2024
7d8a64b
Cleanup during review
mdemoret-nv Sep 10, 2024
8b0833d
Fixing formatting on pyi files
mdemoret-nv Sep 10, 2024
9a975c1
fix header and formatting issue
yczhang-nv Sep 10, 2024
91474a3
fix docstring
yczhang-nv Sep 10, 2024
ecb6766
fix CI
yczhang-nv Sep 11, 2024
fix ci
yczhang-nv committed Jul 23, 2024
commit 90ab42135e4740992cfd98d74e928c57aa80a0cd
159 changes: 49 additions & 110 deletions examples/log_parsing/inference.py
@@ -24,9 +24,6 @@
 from morpheus.config import Config
 from morpheus.config import PipelineModes
 from morpheus.messages import ControlMessage
-from morpheus.messages import MultiInferenceMessage
-from morpheus.messages import MultiInferenceNLPMessage
-from morpheus.messages import MultiResponseMessage
 from morpheus.messages import TensorMemory
 from morpheus.pipeline.stage_schema import StageSchema
 from morpheus.stages.inference.triton_inference_stage import TritonInferenceStage
@@ -60,47 +57,26 @@ class TritonInferenceLogParsing(TritonInferenceWorker):
     Determines whether a logits calculation is needed for the value returned by the Triton inference response.
     """
 
-    def build_output_message(self, x: MultiInferenceMessage | ControlMessage) -> MultiResponseMessage | ControlMessage:
-        if isinstance(x, MultiInferenceMessage):
-            seq_ids = cp.zeros((x.count, 3), dtype=cp.uint32)
-            seq_ids[:, 0] = cp.arange(x.mess_offset, x.mess_offset + x.count, dtype=cp.uint32)
-            seq_ids[:, 2] = x.get_tensor('seq_ids')[:, 2]
-
-            memory = TensorMemory(
-                count=x.count,
-                tensors={
-                    'confidences': cp.zeros((x.count, self._inputs[list(self._inputs.keys())[0]].shape[1])),
-                    'labels': cp.zeros((x.count, self._inputs[list(self._inputs.keys())[0]].shape[1])),
-                    'input_ids': cp.zeros((x.count, x.get_tensor('input_ids').shape[1])),
-                    'seq_ids': seq_ids
-                })
-
-            return MultiResponseMessage(meta=x.meta,
-                                        mess_offset=x.mess_offset,
-                                        mess_count=x.mess_count,
-                                        memory=memory,
-                                        offset=0,
-                                        count=x.count)
-        if isinstance(x, ControlMessage):
-            seq_ids = cp.zeros((x.tensors().count, 3), dtype=cp.uint32)
-            seq_ids[:, 0] = cp.arange(0, x.tensors().count, dtype=cp.uint32)
-            seq_ids[:, 2] = x.tensors().get_tensor('seq_ids')[:, 2]
-
-            memory = _messages.TensorMemory(
-                count=x.tensors().count,
-                tensors={
-                    'confidences': cp.zeros((x.tensors().count, self._inputs[list(self._inputs.keys())[0]].shape[1])),
-                    'labels': cp.zeros((x.tensors().count, self._inputs[list(self._inputs.keys())[0]].shape[1])),
-                    'input_ids': cp.zeros((x.tensors().count, x.tensors().get_tensor('input_ids').shape[1])),
-                    'seq_ids': seq_ids
-                })
-
-            resp = ControlMessage(x)
-            resp.payload(x.payload())
-            resp.tensors(memory)
-            return resp
-
-    def _build_response(self, batch: MultiInferenceMessage, result: tritonclient.InferResult) -> TensorMemory:
+    def build_output_message(self, msg: ControlMessage) -> ControlMessage:
+        seq_ids = cp.zeros((msg.tensors().count, 3), dtype=cp.uint32)
+        seq_ids[:, 0] = cp.arange(0, msg.tensors().count, dtype=cp.uint32)
+        seq_ids[:, 2] = msg.tensors().get_tensor('seq_ids')[:, 2]
+
+        memory = _messages.TensorMemory(
+            count=msg.tensors().count,
+            tensors={
+                'confidences': cp.zeros((msg.tensors().count, self._inputs[list(self._inputs.keys())[0]].shape[1])),
+                'labels': cp.zeros((msg.tensors().count, self._inputs[list(self._inputs.keys())[0]].shape[1])),
+                'input_ids': cp.zeros((msg.tensors().count, msg.tensors().get_tensor('input_ids').shape[1])),
+                'seq_ids': seq_ids
+            })
+
+        resp = ControlMessage(msg)
+        resp.payload(msg.payload())
+        resp.tensors(memory)
+        return resp
+
+    def _build_response(self, batch: ControlMessage, result: tritonclient.InferResult) -> TensorMemory:
 
         outputs = {output.mapped_name: result.as_numpy(output.name) for output in self._outputs.values()}
         outputs = {key: softmax(val, axis=2) for key, val in outputs.items()}
@@ -161,83 +137,46 @@ def supports_cpp_node(self) -> bool:
         return False
 
     def compute_schema(self, schema: StageSchema):
-        schema.output_schema.set_type(MultiResponseMessage)
+        schema.output_schema.set_type(ControlMessage)
 
     @staticmethod
-    def _convert_one_response(output: MultiResponseMessage | ControlMessage, inf: MultiInferenceNLPMessage | ControlMessage,
-                              res: TensorMemory) -> MultiResponseMessage | ControlMessage:
-        if isinstance(output, MultiResponseMessage):
-            memory = output.memory
-
-            out_seq_ids = memory.get_tensor('seq_ids')
-            input_ids = memory.get_tensor('input_ids')
-            confidences = memory.get_tensor('confidences')
-            labels = memory.get_tensor('labels')
-
-            seq_ids = inf.get_id_tensor()
-
-            seq_offset = seq_ids[0, 0].item() - output.mess_offset
-            seq_count = (seq_ids[-1, 0].item() + 1 - seq_offset) - output.mess_offset
-
-            input_ids[inf.offset:inf.count + inf.offset, :] = inf.get_tensor('input_ids')
-            out_seq_ids[inf.offset:inf.count + inf.offset, :] = seq_ids
-
-            resp_confidences = res.get_tensor('confidences')
-            resp_labels = res.get_tensor('labels')
-
-            # Two scenarios:
-            if (inf.mess_count == inf.count):
-                assert seq_count == res.count
-                confidences[inf.offset:inf.offset + inf.count, :] = resp_confidences
-                labels[inf.offset:inf.offset + inf.count, :] = resp_labels
-            else:
-                assert inf.count == res.count
-
-                mess_ids = seq_ids[:, 0].get().tolist()
-
-                for i, idx in enumerate(mess_ids):
-                    confidences[idx, :] = cp.maximum(confidences[idx, :], resp_confidences[i, :])
-                    labels[idx, :] = cp.maximum(labels[idx, :], resp_labels[i, :])
-
-            return MultiResponseMessage.from_message(inf, memory=memory, offset=inf.offset, count=inf.mess_count)
-
-        if isinstance(output, ControlMessage):
-            memory = output.tensors()
+    def _convert_one_response(output: ControlMessage, inf: ControlMessage, res: TensorMemory) -> ControlMessage:
+        memory = output.tensors()
 
-            out_seq_ids = memory.get_tensor('seq_ids')
-            input_ids = memory.get_tensor('input_ids')
-            confidences = memory.get_tensor('confidences')
-            labels = memory.get_tensor('labels')
+        out_seq_ids = memory.get_tensor('seq_ids')
+        input_ids = memory.get_tensor('input_ids')
+        confidences = memory.get_tensor('confidences')
+        labels = memory.get_tensor('labels')
 
-            seq_ids = inf.tensors().get_tensor('seq_ids')
+        seq_ids = inf.tensors().get_tensor('seq_ids')
 
-            seq_offset = seq_ids[0, 0].item()
-            seq_count = seq_ids[-1, 0].item() + 1 - seq_offset
+        seq_offset = seq_ids[0, 0].item()
+        seq_count = seq_ids[-1, 0].item() + 1 - seq_offset
 
-            input_ids[0:inf.tensors().count, :] = inf.tensors().get_tensor('input_ids')
-            out_seq_ids[0:inf.tensors().count, :] = seq_ids
+        input_ids[0:inf.tensors().count, :] = inf.tensors().get_tensor('input_ids')
+        out_seq_ids[0:inf.tensors().count, :] = seq_ids
 
-            resp_confidences = res.get_tensor('confidences')
-            resp_labels = res.get_tensor('labels')
+        resp_confidences = res.get_tensor('confidences')
+        resp_labels = res.get_tensor('labels')
 
-            # Two scenarios:
-            if (inf.payload().count == inf.tensors().count):
-                assert seq_count == res.count
-                confidences[0:inf.tensors().count, :] = resp_confidences
-                labels[0:inf.tensors().count, :] = resp_labels
-            else:
-                assert inf.tensors().count == res.count
+        # Two scenarios:
+        if (inf.payload().count == inf.tensors().count):
+            assert seq_count == res.count
+            confidences[0:inf.tensors().count, :] = resp_confidences
+            labels[0:inf.tensors().count, :] = resp_labels
+        else:
+            assert inf.tensors().count == res.count
 
-                mess_ids = seq_ids[:, 0].get().tolist()
+            mess_ids = seq_ids[:, 0].get().tolist()
 
-                for i, idx in enumerate(mess_ids):
-                    confidences[idx, :] = cp.maximum(confidences[idx, :], resp_confidences[i, :])
-                    labels[idx, :] = cp.maximum(labels[idx, :], resp_labels[i, :])
+            for i, idx in enumerate(mess_ids):
+                confidences[idx, :] = cp.maximum(confidences[idx, :], resp_confidences[i, :])
+                labels[idx, :] = cp.maximum(labels[idx, :], resp_labels[i, :])
 
-            resp = ControlMessage(inf)
-            resp.payload(inf.payload())
-            resp.tensors(memory)
-            return resp
+        resp = ControlMessage(inf)
+        resp.payload(inf.payload())
+        resp.tensors(memory)
+        return resp
 
     def _get_inference_worker(self, inf_queue: ProducerConsumerQueue) -> TritonInferenceLogParsing:
         return TritonInferenceLogParsing(inf_queue=inf_queue,
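
The new `build_output_message` and `_convert_one_response` above follow the single-message-type pattern this PR converges on: every stage consumes and emits `ControlMessage`, with the DataFrame rows carried in a `MessageMeta` payload and the model tensors carried in a `TensorMemory` attached via `tensors()`. Below is a minimal sketch of that round trip, assuming the Morpheus Python API on `branch-24.10`; the `raw` column and the score shapes are illustrative only, not part of the diff.

import cudf
import cupy as cp

from morpheus.messages import ControlMessage
from morpheus.messages import MessageMeta
from morpheus.messages import TensorMemory

# Build a message the way the deserialize/preprocess stages do: the payload
# holds the rows, the TensorMemory holds per-row tensors such as 'seq_ids'.
df = cudf.DataFrame({"raw": ["log line one", "log line two"]})
msg = ControlMessage()
msg.payload(MessageMeta(df))
msg.tensors(TensorMemory(count=2, tensors={"seq_ids": cp.zeros((2, 3), dtype=cp.uint32)}))

# Downstream stages read the same objects back; counts must stay in sync.
assert msg.tensors().count == len(msg.payload().df)

# When one DataFrame row fans out to several tensor rows (the 'else' branch in
# _convert_one_response), per-row scores are merged with an element-wise max:
scores = cp.zeros((2, 4))
chunk = cp.ones((1, 4)) * 0.7
scores[0, :] = cp.maximum(scores[0, :], chunk[0, :])
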
1 change: 0 additions & 1 deletion examples/log_parsing/postprocessing.py
@@ -110,7 +110,6 @@ def _postprocess(self, x: MultiResponseMessage | ControlMessage):
             else:
                 parsed_df[col_name] = ext_parsed[label]
 
-
         # decode cleanup
         parsed_df = self.__decode_cleanup(parsed_df)
         parsed_df["doc"] = parsed_dfs.index
68 changes: 32 additions & 36 deletions morpheus/_lib/include/morpheus/stages/deserialize.hpp
@@ -1,6 +1,6 @@
 /*
- * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
- * SPDX-License-Identifier: Apache-2.0
+ * SPDX-FileCopyrightText: Copyright (c) 2021-2024, NVIDIA CORPORATION &
+ * AFFILIATES. All rights reserved. SPDX-License-Identifier: Apache-2.0
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -17,29 +17,23 @@

 #pragma once
 
-#include "morpheus/export.h"
-#include "morpheus/messages/control.hpp"
-#include "morpheus/messages/meta.hpp"
-#include "morpheus/messages/multi.hpp"
-#include "morpheus/types.hpp"                  // for TensorIndex
-#include "morpheus/utilities/python_util.hpp"  // for show_warning_message
-#include "morpheus/utilities/string_util.hpp"  // for MORPHEUS_CONCAT_STR
+#include "morpheus/export.h"              // for MORPHEUS_EXPORT
+#include "morpheus/messages/control.hpp"  // for ControlMessage
+#include "morpheus/messages/meta.hpp"     // for MessageMeta
+#include "morpheus/types.hpp"             // for TensorIndex
 
-#include <glog/logging.h>
-#include <mrc/segment/builder.hpp>
-#include <mrc/segment/object.hpp>
-#include <nlohmann/json.hpp>
-#include <pybind11/pytypes.h>  // for object
-#include <pyerrors.h>          // for PyExc_RuntimeWarning
-#include <pymrc/node.hpp>
-#include <rxcpp/rx.hpp>
+#include <boost/fiber/context.hpp>  // for operator<<
+#include <mrc/segment/builder.hpp>  // for Builder
+#include <mrc/segment/object.hpp>   // for Object
+#include <nlohmann/json.hpp>        // for basic_json, json
+#include <pybind11/pytypes.h>       // for object
+#include <pymrc/node.hpp>           // for PythonNode
+#include <rxcpp/rx.hpp>             // for decay_t, trace_activity, from, observable_member
 
-#include <algorithm>  // IWYU pragma: keep for std::min
-#include <exception>  // for exception_ptr
-#include <memory>
-#include <sstream>  // IWYU pragma: keep for glog
-#include <string>
-#include <utility>  // for pair
+#include <memory>   // for shared_ptr, unique_ptr
+#include <string>   // for string
+#include <thread>   // for operator<<
+#include <utility>  // for move, pair
 
 namespace morpheus {
 /****** Component public implementations *******************/
@@ -72,7 +66,8 @@ class MORPHEUS_EXPORT DeserializeStage
      * @brief Construct a new Deserialize Stage object
      *
      * @param batch_size Number of messages to be divided into each batch
-     * @param ensure_sliceable_index Whether or not to call `ensure_sliceable_index()` on all incoming `MessageMeta`
+     * @param ensure_sliceable_index Whether or not to call
+     * `ensure_sliceable_index()` on all incoming `MessageMeta`
      * @param task Optional task to be added to all outgoing `ControlMessage`s
      */
     DeserializeStage(TensorIndex batch_size,
@@ -98,26 +93,27 @@
 struct MORPHEUS_EXPORT DeserializeStageInterfaceProxy
 {
     /**
-     * @brief Create and initialize a DeserializationStage that emits ControlMessage's, and return the result.
-     * If `task_type` is not None, `task_payload` must also be not None, and vice versa.
+     * @brief Create and initialize a DeserializationStage that emits
+     * ControlMessage's, and return the result. If `task_type` is not None,
+     * `task_payload` must also be not None, and vice versa.
      *
      * @param builder : Pipeline context object reference
      * @param name : Name of a stage reference
      * @param batch_size : Number of messages to be divided into each batch
-     * @param ensure_sliceable_index Whether or not to call `ensure_sliceable_index()` on all incoming `MessageMeta`
+     * @param ensure_sliceable_index Whether or not to call
+     * `ensure_sliceable_index()` on all incoming `MessageMeta`
      * @param task_type : Optional task type to be added to all outgoing messages
-     * @param task_payload : Optional json object describing the task to be added to all outgoing messages
+     * @param task_payload : Optional json object describing the task to be added
+     * to all outgoing messages
      * @return std::shared_ptr<mrc::segment::Object<DeserializeStage>>
      */
-    static std::shared_ptr<mrc::segment::Object<DeserializeStage>> init(
-        mrc::segment::Builder& builder,
-        const std::string& name,
-        TensorIndex batch_size,
-        bool ensure_sliceable_index,
-        const pybind11::object& task_type,
-        const pybind11::object& task_payload);
+    static std::shared_ptr<mrc::segment::Object<DeserializeStage>> init(mrc::segment::Builder& builder,
+                                                                        const std::string& name,
+                                                                        TensorIndex batch_size,
+                                                                        bool ensure_sliceable_index,
+                                                                        const pybind11::object& task_type,
+                                                                        const pybind11::object& task_payload);
 };
 
-
 /** @} */  // end of group
 }  // namespace morpheus
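
The revised interface proxy keeps the contract stated in the doc comment: the stage emits `ControlMessage`s, and `task_type`/`task_payload` are either both given or both omitted. Below is a hedged sketch of how the stage is typically constructed from Python, assuming the `DeserializeStage` keyword names on `branch-24.10`; the source data, task name, and payload contents are illustrative, not taken from this PR.

import cudf

from morpheus.config import Config
from morpheus.pipeline import LinearPipeline
from morpheus.stages.input.in_memory_source_stage import InMemorySourceStage
from morpheus.stages.preprocess.deserialize_stage import DeserializeStage

config = Config()
config.pipeline_batch_size = 256  # batch_size: rows per emitted ControlMessage

pipeline = LinearPipeline(config)
pipeline.set_source(InMemorySourceStage(config, dataframes=[cudf.DataFrame({"raw": ["a", "b"]})]))

# task_type and task_payload must be provided together (or both omitted);
# when given, the task is attached to every outgoing ControlMessage.
pipeline.add_stage(
    DeserializeStage(config,
                     ensure_sliceable_index=True,
                     task_type="inference",                        # illustrative
                     task_payload={"model_name": "log-parsing"}))  # illustrative
pipeline.run()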