[core] log dedup should not dedup number only lines #45485

Merged 3 commits on May 24, 2024
Signed-off-by: hongchaodeng <hongchaodeng1@gmail.com>
hongchaodeng committed May 24, 2024
commit 047b8e7bafde7f6726ddd4fcad6237d2b5ece5a1
35 changes: 27 additions & 8 deletions python/ray/tests/test_log_dedup.py
@@ -19,7 +19,7 @@ def test_nodedup_logs_single_process():
     assert out1 == [batch1]


-def test_nodedup_logs_only_canonicalized_lines():
+def test_nodedup_logs_buffer_only_lines():
     now = 142300000.0

     def gettime():
@@ -32,27 +32,46 @@ def gettime():
         # numbers are canonicalised, so this would lead to empty dedup_key
         "lines": ["1"],
Contributor:
Since we are testing LogDeduplicator, can you add more duplicate lines? For example, "2" followed by another "2", checking that all of them are output.

Member Author:
Done

     }
-    batch2 = {
-        "ip": "node1",
-        "pid": 200,
-        "lines": ["2"],
-    }

     # Immediately prints always.
     out1 = dedup.deduplicate(batch1)
     assert out1 == [batch1]

     now += 1.0
-    # Should buffer duplicates.
+
+    # Should print new lines even if it is number only again
+    batch2 = {
+        "ip": "node2",
+        "pid": 200,
+        "lines": ["2"],
+    }
     out2 = dedup.deduplicate(batch2)
     assert out2 == [
         {
-            "ip": "node1",
+            "ip": "node2",
             "pid": 200,
             "lines": ["2"],
         }
     ]
+
+    now += 3.0
+
+    # Should print new lines even if it is same number
+    batch3 = {
+        "ip": "node3",
+        "pid": 300,
+        "lines": ["2"],
+    }
+    # Should buffer duplicates.
+    out3 = dedup.deduplicate(batch3)
+    assert out3 == [
+        {
+            "ip": "node3",
+            "pid": 300,
+            "lines": ["2"],
+        }
+    ]


 def test_dedup_logs_multiple_processes():
     now = 142300000.0
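For readers unfamiliar with the behaviour under test, here is a minimal sketch of the idea: a line whose canonicalised key is empty (for example a number-only line) is always printed and never treated as a duplicate. This is not Ray's LogDeduplicator. The names canonicalise and SketchDeduplicator, and the rule "drop any token containing a digit", are assumptions made for illustration only, and the sketch leaves out time windows and summary output.

```python
import re
from typing import Dict, List, Tuple

# Assumed canonicalisation rule (hypothetical): drop every
# whitespace-separated token that contains a digit.
_HAS_DIGIT = re.compile(r"\d")


def canonicalise(line: str) -> str:
    """Build a dedup key by dropping tokens that contain digits."""
    return " ".join(tok for tok in line.split() if not _HAS_DIGIT.search(tok))


class SketchDeduplicator:
    """Illustrative only; not Ray's LogDeduplicator."""

    def __init__(self) -> None:
        # dedup key -> (ip, pid) of the first source that emitted it
        self.seen: Dict[str, Tuple[str, int]] = {}

    def deduplicate(self, batch: dict) -> List[dict]:
        source = (batch["ip"], batch["pid"])
        kept = []
        for line in batch["lines"]:
            key = canonicalise(line)
            if not key:
                # Empty key (e.g. a number-only line): always print it.
                kept.append(line)
                continue
            first = self.seen.setdefault(key, source)
            if first == source:
                kept.append(line)
            # Otherwise another source repeated the key. A fuller
            # implementation would buffer and summarise; here it is held back.
        return [dict(batch, lines=kept)] if kept else []


dedup = SketchDeduplicator()
for ip, pid, text in [("node1", 100, "1"), ("node2", 200, "2"), ("node3", 300, "2")]:
    batch = {"ip": ip, "pid": pid, "lines": [text]}
    # Every number-only batch passes through unchanged.
    assert dedup.deduplicate(batch) == [batch]
```

The loop at the end mirrors the three batches in the test above: "1" from node1, then "2" from node2, then "2" again from node3, all printed. A line with a non-empty key, by contrast, is held back when a second source repeats it.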