Skip to content

Commit

Permalink
[yugabyte#13782] ysql: Import Build de-escaped JSON strings in larger…
Browse files Browse the repository at this point in the history
… chunks during lexing

Summary:
Upstream commit was 3838fa269c15706df2b85ce2d6af8aacd5611655

commit message was :
```
During COPY BINARY with large JSONB blobs, it was found that half
the time was spent parsing JSON, with much of that spent in separate
appendStringInfoChar() calls for each input byte.

Add lookahead loop to json_lex_string() to allow batching multiple bytes
via appendBinaryStringInfo(). Also use this same logic when de-escaping
is not done, to avoid code duplication.

Report and proof of concept patch by Jelte Fennema, reworked by Andres
Freund and John Naylor

Discussion: https://www.postgresql.org/message-id/CAGECzQQuXbies_nKgSiYifZUjBk6nOf2%3DTSXqRjj2BhUh8CTeA%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/flat/PR3PR83MB0476F098CBCF68AF7A1CA89FF7B49@PR3PR83MB0476.EURPRD83.prod.outlook.com
```
This revision also includes upstream commit d3117fc1a3e87717a57be0153408e5387e265e1b

    Fix out-of-bounds read in json_lex_string

    Commit 3838fa269 added a lookahead loop to allow building strings multiple
    bytes at a time. This loop could exit because it reached the end of input,
    yet did not check for that before checking if we reached the end of a
    valid string. To fix, put the end of string check back in the outer loop.

    Per Valgrind animal skink

For the dataset from author, the performance gain is 18%.

Test Plan: Existing test suite

Reviewers: smishra, plee

Reviewed By: plee

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D19199
  • Loading branch information
tedyu committed Sep 14, 2022
1 parent b83fea8 commit 20e75df
Showing 1 changed file with 31 additions and 10 deletions.
41 changes: 31 additions & 10 deletions src/postgres/src/backend/utils/adt/jsonapi.c
Original file line number Diff line number Diff line change
Expand Up @@ -661,6 +661,7 @@ json_lex_string(JsonLexContext *lex)
{
char *s;
int len;
char *const end = lex->input + lex->input_length;
int hi_surrogate = -1;

if (lex->strval != NULL)
Expand All @@ -681,13 +682,6 @@ json_lex_string(JsonLexContext *lex)
}
else if (*s == '"')
break;
else if ((unsigned char) *s < 32)
{
/* Per RFC4627, these characters MUST be escaped. */
/* Since *s isn't printable, exclude it from the context string */
lex->token_terminator = s;
return JSON_ESCAPING_REQUIRED;
}
else if (*s == '\\')
{
/* OK, we have an escape character. */
Expand Down Expand Up @@ -828,14 +822,41 @@ json_lex_string(JsonLexContext *lex)
}

}
else if (lex->strval != NULL)
else
{
char *p;
if (hi_surrogate != -1)
return JSON_UNICODE_LOW_SURROGATE;

appendStringInfoChar(lex->strval, *s);
}
/*
* Skip to the first byte that requires special handling, so we
* can batch calls to appendBinaryStringInfo.
*/
for (p = s; p < end; p++)
{
if (*p == '\\' || *p == '"')
break;
else if ((unsigned char) *p < 32)
{
/* Per RFC4627, these characters MUST be escaped. */
/*
* Since *p isn't printable, exclude it from the context
* string
*/
lex->token_terminator = p;
return JSON_ESCAPING_REQUIRED;
}
}

if (lex->strval != NULL)
appendBinaryStringInfo(lex->strval, s, p - s);

/*
* s will be incremented at the top of the loop, so set it to just
* behind our lookahead position
*/
s = p - 1;
}
}

if (hi_surrogate != -1)
Expand Down

0 comments on commit 20e75df

Please sign in to comment.