RzShell: refactor string, regex and byte search #4919
Open. Rot127 wants to merge 10 commits into dev from dist-fuzz-rz-search
+15,181
−2,008
39 tasks
…pe annotations. Part 1/9. Likely won't build in between parts. Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
This commit adds several improvements, updates and fixes to Unicode related logic.
- Update Unicode tables to version 16.
- Escaped strings now escape valid Unicode code points to /Uhhhhhh and invalid code points to /xhh.
- Generally applies RzStrEscOptions much more consistently. The legacy escape is still used in some places though.
- Fix inconsistencies in Unicode decoders/encoders and checkers. They now either return 0 on an invalid decode or the number of bytes the code point requires.
- Add many unit tests for Unicode related logic.
- Add helpers to check code points.
Part 2/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
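The decoder contract described above (return 0 on an invalid decode, otherwise the number of bytes the code point requires) can be sketched for UTF-8 as follows. This is a minimal illustration, not Rizin's code: the name `utf8_decode_sketch` and its signature are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Rizin's API): decode one UTF-8 code point.
 * Returns the number of bytes consumed (1-4), or 0 if the input is invalid. */
static size_t utf8_decode_sketch(const uint8_t *buf, size_t len, uint32_t *cp) {
	assert(buf && cp);
	if (len == 0) {
		return 0;
	}
	uint8_t b0 = buf[0];
	size_t need;    /* total length of the sequence */
	uint32_t value; /* payload bits collected so far */
	uint32_t min;   /* smallest value allowed for this length (rejects overlong forms) */
	if (b0 < 0x80) {
		*cp = b0;
		return 1;
	} else if ((b0 & 0xE0) == 0xC0) {
		need = 2; value = b0 & 0x1F; min = 0x80;
	} else if ((b0 & 0xF0) == 0xE0) {
		need = 3; value = b0 & 0x0F; min = 0x800;
	} else if ((b0 & 0xF8) == 0xF0) {
		need = 4; value = b0 & 0x07; min = 0x10000;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}
	if (len < need) {
		return 0; /* truncated sequence */
	}
	for (size_t i = 1; i < need; i++) {
		if ((buf[i] & 0xC0) != 0x80) {
			return 0; /* not a continuation byte */
		}
		value = (value << 6) | (buf[i] & 0x3F);
	}
	/* Reject overlong encodings, values above U+10FFFF and surrogates. */
	if (value < min || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
		return 0;
	}
	*cp = value;
	return need;
}
```

A caller can advance through a buffer by the returned length and treat a return value of 0 as "escape this byte and move on by one".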
This commit changes several settings. The main reason is to have them contained in one search group, and not spread over the search and string group. This becomes important with the search refactor, since the search is now also more contained in a single module and can make use of more settings.
- Remove str.search.max_uni_blocks
  - Effectively a metric the user should not know about; adds too much complexity. Also not documented.
- str.search.encoding -> str.encoding
  - Valid for all string interpretations.
- str.search.max_threads -> search.max_threads
  - This is a general setting for the search now.
- str.search.raw_alignment -> search.str.raw_alignment
  - Unify settings (only used for RzBin search).
- str.search.min_length -> search.str.min_length
  - Unify settings.
- str.search.buffer_size -> search.str.max_length
  - Unify settings.
- str.search.max_region_size -> search.str.max_region_size
  - Unify settings.
- str.search.check_ascii_freq -> search.str.check_ascii_freq
  - Unify settings.
Part 3/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
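For anyone migrating scripts or config files, the renames above can be captured in a small lookup table. The sketch below is only an illustration: the key names are copied from the commit message, while the `SettingRename` type and the `migrate_setting_key` helper are hypothetical and not part of Rizin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Old key -> new key, as listed in the commit message. NULL means "removed". */
typedef struct {
	const char *old_key;
	const char *new_key;
} SettingRename;

static const SettingRename renames[] = {
	{ "str.search.max_uni_blocks", NULL },
	{ "str.search.encoding", "str.encoding" },
	{ "str.search.max_threads", "search.max_threads" },
	{ "str.search.raw_alignment", "search.str.raw_alignment" },
	{ "str.search.min_length", "search.str.min_length" },
	{ "str.search.buffer_size", "search.str.max_length" },
	{ "str.search.max_region_size", "search.str.max_region_size" },
	{ "str.search.check_ascii_freq", "search.str.check_ascii_freq" },
};

/* Hypothetical helper: returns the new key, NULL if the setting was removed,
 * or the input itself if the key is not affected by this commit. */
static const char *migrate_setting_key(const char *key) {
	assert(key);
	for (size_t i = 0; i < sizeof(renames) / sizeof(renames[0]); i++) {
		if (strcmp(renames[i].old_key, key) == 0) {
			return renames[i].new_key;
		}
	}
	return key;
}
```

Note that str.search.buffer_size is the one rename that also changes the last path component (to max_length), so a plain prefix substitution would not be enough.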
- This commit adds the ability to print any supported string encoding with 'ps' (also EBCDIC).
- Adds alias 'psu' for 'ps utf8'.
- It also allows selecting unprintable characters as string delimiter.
Part 4/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
The commit moves the new and legacy search commands to RzShell, adds more details to the search help, deletes some undocumented or unnecessary commands and adds the stubs for the new search handler implementations. Legacy commands still do their string parsing on arguments and are not touched. The new searches (string and bytes) have their actual implementation in the following commits.
Renamed and replaced commands:
- Renamed '/' -> '/z'
- Replaced '/e' -> '/z' or '/xr'
- Replaced '/w' - All Unicode is searched now properly with '/z'.
Removed commands:
- '/!' - Because the command modifiers are not properly handled in RzShell yet and the advantage of this one is dubious.
- '/f' - Modifiers are obsolete, because search is dispatched into threads.
- '/b' - Modifiers are obsolete, because search is dispatched into threads.
- '/+' - Because no idea what it does. Seems not particularly useful.
Part 5/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
…ad). Adds the core implementation of the new search. The rough architecture is the following: A search for a certain type of information (strings, bytes, keys etc.) creates a collection of items to search for (byte patterns, regular expressions etc.). It then specifies settings for how the search (number of threads, maximum hits...) and the finding (string length, inverse match etc.) are performed. It also defines a search space, which is currently only the IO buffer, but it can be anything in the future, like a graph or the knowledge base. The search splits up the search space into windows (for IO: address ranges) and dispatches each search window into a 'find()' thread. The 'find()' handler (provided by a specific search implementation) checks the given window and produces search hits matching the elements in the search collection. The main search handler collects the hits of the dispatched workers and returns them to the user.
Note: The byte and string search implementations are added in the next two commits.
Part 6/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
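The window-splitting idea described above can be sketched in a few lines of C. This is not the PR's implementation: all names (`Window`, `HitList`, `find_bytes`, `search_sketch`, `MAX_HITS`) are hypothetical, the search space is a plain byte buffer standing in for IO, and the windows are processed sequentially rather than in threads to keep the example self-contained. Letting 'find()' read a few bytes past its window so that matches crossing a boundary are not lost is an assumption of this sketch; the commit message does not say how boundaries are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_HITS 16 /* illustrative cap, stands in for a "maximum hits" setting */

typedef struct {
	size_t offsets[MAX_HITS];
	size_t count;
} HitList;

typedef struct {
	size_t start; /* first offset this window is responsible for */
	size_t end;   /* one past the last offset this window is responsible for */
} Window;

/* find() for a literal byte pattern: reports hits that START inside the window.
 * It may read up to pat_len - 1 bytes past w->end, so boundary matches are kept
 * and no hit is reported by two windows. */
static void find_bytes(const uint8_t *space, size_t space_len, const Window *w,
	const uint8_t *pat, size_t pat_len, HitList *hits) {
	for (size_t off = w->start; off < w->end && off + pat_len <= space_len; off++) {
		if (hits->count == MAX_HITS) {
			return;
		}
		if (memcmp(space + off, pat, pat_len) == 0) {
			hits->offsets[hits->count++] = off;
		}
	}
}

/* Main handler: split the space into windows, run find() on each, collect hits.
 * Sequential here; a threaded version would hand each window to a worker. */
static void search_sketch(const uint8_t *space, size_t space_len, size_t window_size,
	const uint8_t *pat, size_t pat_len, HitList *hits) {
	assert(space && pat && hits && window_size > 0 && pat_len > 0);
	hits->count = 0;
	for (size_t start = 0; start < space_len; start += window_size) {
		size_t end = start + window_size < space_len ? start + window_size : space_len;
		Window w = { start, end };
		find_bytes(space, space_len, &w, pat, pat_len, hits);
	}
}
```

In a threaded variant each worker would need its own hit list (or a lock around the shared one), and the collected hits would have to be sorted by offset afterwards, since workers finish in no particular order.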
Adds the byte search implementation of the new search. The normal byte search works just as before, but this commit adds many more examples in the help message and more test cases. Additionally, it adds a regex byte search.
Part 7/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
Adds the new string search implementation, fixes many bugs and makes performance improvements.
- Adds support to search reliably for all supported encodings (fixes non-ASCII string search).
- Fixes some wrong assumptions about what valid code points are (e.g. 0x000000ff is a valid code point in UTF-32/UTF-16 BE).
- Adds several '/z' command options for how to perform the string search (literal, regex, extended regex, caseless).
- Checks every decoded code point for validity to improve correctness.
- Improves performance of string decoding by not writing to the heap in all cases.
Part 8/9. Likely won't build in between parts.
Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
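The point about valid code points can be illustrated with a small UTF-32 BE decoder that checks every decoded value. This is a sketch under stated assumptions, not the PR's code: `is_valid_code_point` and `utf32be_decode_sketch` are hypothetical names, and "valid" is taken here to mean a Unicode scalar value (at most U+10FFFF and not a surrogate).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for Unicode scalar values,
 * i.e. code points up to U+10FFFF that are not surrogates. */
static bool is_valid_code_point(uint32_t cp) {
	return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Hypothetical helper: decode one UTF-32 big-endian unit.
 * Returns 4 on success, 0 if the input is truncated or not a valid code point. */
static size_t utf32be_decode_sketch(const uint8_t *buf, size_t len, uint32_t *cp) {
	assert(buf && cp);
	if (len < 4) {
		return 0;
	}
	uint32_t v = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
		((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
	if (!is_valid_code_point(v)) {
		return 0;
	}
	*cp = v;
	return 4;
}
```

With this check the byte sequence 00 00 00 ff decodes to U+00FF, matching the example in the commit message, while a surrogate such as 00 00 d8 00 is rejected.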
de92480 to e1c4129
e1c4129 to 4d49f17
…search refactor. Co-authored-by: wargio <deroad@kumo.xn--q9jyb4c>
4d49f17 to 37a85a5
notxvilka approved these changes Feb 21, 2025
LGTM.
Your checklist for this pull request
Supersedes #4762
Detailed description
Changes made
- / to /z.
- /xr.
- ps
- psu alias for ps utf8
- str.search.max_uni_blocks - Effectively a metric the user should not know about; adds too much complexity.
- str.search.max_threads -> search.max_threads - This is a general setting for the search now.
- str.search.raw_alignment -> search.str.raw_alignment - Unify settings (only used for RzBin search).
- str.search.encoding -> str.encoding - Valid for all string interpretations.
- str.search.min_length -> search.str.min_length - Unify settings.
- str.search.buffer_size -> search.str.max_length - Unify settings.
- str.search.max_region_size -> search.str.max_region_size - Unify settings.
- str.search.check_ascii_freq -> search.str.check_ascii_freq - Unify settings.
- /! - Because the command modifiers are not properly handled in RzShell yet and the advantage of this one is dubious (IMHO).
- /f - Modifier and obsolete, because search is dispatched into threads.
- /b - Modifier and obsolete, because search is dispatched into threads.
- /+ - Because no idea what it does. Seems not particularly useful.
- /e - Replaced with regex search in bytes and string search.
- /w - All Unicode is searched now properly with /z.
- RzStrEscOptions were inconsistently used. E.g. show_asciidot (replace non-printable ascii with dot) was ignored for \n, \t etc.
- \U00hhhhhh. All other non-printable bytes are escaped with \xhh. There are still some exceptions (when legacy escape functions are used) but most places are ok now.
- /Uhhhhhh (if not requested otherwise by the user) and invalid code points to /xhh.
TODO Overview
Open issues
- \t, \n etc.).
Documentation
Test plan
Tests were added
Closing issues
closes #4910