-
Notifications
You must be signed in to change notification settings - Fork 95
Support assets with null characters in strings. #567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #567 +/- ##
===========================================
+ Coverage 46.39% 49.14% +2.74%
===========================================
Files 24 24
Lines 3899 3913 +14
===========================================
+ Hits 1809 1923 +114
+ Misses 1818 1702 -116
- Partials 272 288 +16
Continue to review full report at Codecov.
|
brianolson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should probably always do like the '...ForQuery' variant and escape backslashes and then zeros; then the desanitize() function will actually do the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to replace existing backslashes before adding backslashes (this needs to happen beforre line 67)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifically to search for \u0000, if you don't escape the slash it doesn't find anything.
I'm simplifying this to be just return strings.ReplaceAll(str, "\x00", "\\\\u0000").
I think that all other unicode characters would be converted by the json encoder with a single backslash, postgres must convert them into the unicode character.
a779179 to
ac7f352
Compare
| case '\\': | ||
| newlen += 2 | ||
| default: | ||
| newlen+= csize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add space before +=
| } | ||
|
|
||
| // UnescapeNulls is the inverse function of EscapeNulls. | ||
| // UnescapeNulls converts \\ and \uXXXX back into their unescaped form but may not be fully general for input not generated by EscapeNulls(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the method is correct - but relying on the fact that it was correctly generated by EscapeNulls() is an issue. It need to be refactored to return an error in case it's not possible to unescape.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is all only used internally to store a few fields and read them back, then there is no problem?
| input: "has >\000< nu\\ll", | ||
| expected: `has >\u0000< nu\ll`, | ||
| query: `has >\\u0000< nu\ll`, | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| }, | |
| }, | |
| { | |
| name: "already escaped null", | |
| input: "has >\\u0000< nu\\ll", | |
| expected: `has >\u0000< nu\ll`, | |
| query: `has >\u0000< nu\ll`, | |
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but then that's the case where desanitize(expected) != input
|
|
||
| As of April 2020, storing all the raw blocks is about 100 GB and the PostgreSQL database of transactions and accounts is about 1 GB. Much of that size difference is the Indexer ignoring cryptographic signature data; relying on `algod` to validate blocks. Dropping that, the Indexer can focus on the 'what happened' details of transactions and accounts. | ||
|
|
||
| Postgres should be configured to use UTF-8 encoding. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add (this is the default).
| xb := []byte(x) | ||
|
|
||
| escapenull := []byte("\\u0000") | ||
| var out strings.Builder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is worse. The prior implementation will produce less memory garbage by doing exactly one allocation of exactly the right size. Why replace the better implementation with a worse implementation? (I know, readability, but the better implementation already existed so moving backwards is frustrating.)
| } | ||
|
|
||
| // UnescapeNulls is the inverse function of EscapeNulls. | ||
| // UnescapeNulls converts \\ and \uXXXX back into their unescaped form but may not be fully general for input not generated by EscapeNulls(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is all only used internally to store a few fields and read them back, then there is no problem?
|
Replaced by #577 |
Summary
It's possible for users to put null characters in the asset name, asset unit, and asset URL. The indexer should handle these valid transactions in some sensible way.
Test Plan
New unit tests.