Skip to content

Conversation

@winder
Copy link
Contributor

@winder winder commented Jul 12, 2021

Summary

It's possible for users to put null characters in the asset name, asset unit, and asset URL. The indexer should handle these valid transactions in some sensible way.

Test Plan

New unit tests.

@codecov-commenter
Copy link

codecov-commenter commented Jul 12, 2021

Codecov Report

Merging #567 (69aba11) into develop (8980d9e) will increase coverage by 2.74%.
The diff coverage is 75.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #567      +/-   ##
===========================================
+ Coverage    46.39%   49.14%   +2.74%     
===========================================
  Files           24       24              
  Lines         3899     3913      +14     
===========================================
+ Hits          1809     1923     +114     
+ Misses        1818     1702     -116     
- Partials       272      288      +16     
Impacted Files Coverage Δ
idb/postgres/internal/encoding/encode.go 85.36% <66.66%> (-7.74%) ⬇️
idb/postgres/postgres.go 45.26% <100.00%> (+5.60%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8980d9e...69aba11. Read the comment docs.

@winder winder changed the title Support assets with null characts in strings. Support assets with null characters in strings. Jul 12, 2021
Copy link
Contributor

@brianolson brianolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably always do like the '...ForQuery' variant and escape backslashes and then zeros; then the desanitize() function will actually do the right thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to replace existing backslashes before adding backslashes (this needs to happen beforre line 67)

Copy link
Contributor Author

@winder winder Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically to search for \u0000, if you don't escape the slash it doesn't find anything.

I'm simplifying this to be just return strings.ReplaceAll(str, "\x00", "\\\\u0000").

I think that all other unicode characters would be converted by the json encoder with a single backslash, postgres must convert them into the unicode character.

@winder winder force-pushed the will/null-char-support branch from a779179 to ac7f352 Compare July 12, 2021 18:32
@winder winder requested a review from brianolson July 12, 2021 18:55
case '\\':
newlen += 2
default:
newlen+= csize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add space before +=

}

// UnescapeNulls is the inverse function of EscapeNulls.
// UnescapeNulls converts \\ and \uXXXX back into their unescaped form but may not be fully general for input not generated by EscapeNulls().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the method is correct - but relying on the fact that it was correctly generated by EscapeNulls() is an issue. It need to be refactored to return an error in case it's not possible to unescape.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is all only used internally to store a few fields and read them back, then there is no problem?

input: "has >\000< nu\\ll",
expected: `has >\u0000< nu\ll`,
query: `has >\\u0000< nu\ll`,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
},
},
{
name: "already escaped null",
input: "has >\\u0000< nu\\ll",
expected: `has >\u0000< nu\ll`,
query: `has >\u0000< nu\ll`,
},

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but then that's the case where desanitize(expected) != input


As of April 2020, storing all the raw blocks is about 100 GB and the PostgreSQL database of transactions and accounts is about 1 GB. Much of that size difference is the Indexer ignoring cryptographic signature data; relying on `algod` to validate blocks. Dropping that, the Indexer can focus on the 'what happened' details of transactions and accounts.

Postgres should be configured to use UTF-8 encoding.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add (this is the default).

xb := []byte(x)

escapenull := []byte("\\u0000")
var out strings.Builder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is worse. The prior implementation will produce less memory garbage by doing exactly one allocation of exactly the right size. Why replace the better implementation with a worse implementation? (I know, readability, but the better implementation already existed so moving backwards is frustrating.)

}

// UnescapeNulls is the inverse function of EscapeNulls.
// UnescapeNulls converts \\ and \uXXXX back into their unescaped form but may not be fully general for input not generated by EscapeNulls().
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is all only used internally to store a few fields and read them back, then there is no problem?

@winder
Copy link
Contributor Author

winder commented Jul 20, 2021

Replaced by #577

@winder winder closed this Jul 20, 2021
@winder winder deleted the will/null-char-support branch June 22, 2022 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants