Add a blog post for LIKE optimizations #8576

xumingming · 2024-01-27T10:22:54Z

No description provided.

netlify · 2024-01-27T10:22:58Z

✅ Deploy Preview for meta-velox ready!

Name	Link
🔨 Latest commit	`9d2dfd8`
🔍 Latest deploy log	https://app.netlify.com/sites/meta-velox/deploys/65bc375bdfe85b0008d561b9
😎 Deploy Preview	https://deploy-preview-8576--meta-velox.netlify.app

xumingming · 2024-01-27T10:23:08Z

cc @mbasmanova

lingbin · 2024-01-28T16:38:41Z

website/blog/2024-01-27-like-optimization.mdx

+relaxed patterns that are not so straightforward:
+
+- `hello_velox%`: matches inputs that start with 'hello', followed by any character, then followed by 'velox'.
+- `hello_velox%`: matches inputs that end with 'hello', followed by any character, then followed by 'velox'.


typo: `hello_velox%` -> `%hello_velox`?

xumingming · Jan 29, 2024

Nice catch! Fixed.

mbasmanova

@xumingming James, thank you for writing this nice post. It reads very well.

@pedroerp Pedro, would you also take a look?

facebook-github-bot · 2024-02-01T11:58:13Z

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-02-01T17:08:56Z

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aditi-pandit

Thanks @xumingming for this blog post. It is very useful.

Had some nits in the writing.

aditi-pandit · 2024-02-01T19:45:54Z

website/blog/2024-01-27-like-optimization.mdx

+
+## What is LIKE?
+
+<a href="https://prestodb.io/docs/current/functions/comparison.html#like">LIKE</a> is a very useful operation,


Nit: Use multiple short sentences instead of commas.

"LIKE is a very useful SQL operator. It is used to do string pattern matching. The following examples for LIKE usage are from the Presto doc:"

aditi-pandit · 2024-02-01T19:46:45Z

website/blog/2024-01-27-like-optimization.mdx

+
+- Use `%` to match zero or more characters.
+- Use `_` to match exactly one character.
+- If we need to match `%` and `_` literally, we can specify escape char to escape them.


Nit : specify "an" escape character

aditi-pandit · 2024-02-01T19:48:13Z

website/blog/2024-01-27-like-optimization.mdx

+into Velox's function call, e.g. `name LIKE '%b%'` is translated to
+`like(name, '%b%')`. Internally Velox converts the pattern string into a regular
+expression and then uses regular expression library <a href="https://github.com/google/re2">RE2</a>
+to do the pattern matching. RE2 is a very good regular expression library, it is fast


Same here : Use full stop between the sentences "RE2 is a very good regular expression library. It is fast and safe, which gives Velox LIKE function a good performance."
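
For readers following the thread, here is a minimal sketch of the "convert the LIKE pattern to a regular expression" step described in the quoted paragraph. It is not Velox's code: `likeToRegex` and `likeViaRegex` are hypothetical names, `std::regex` stands in for RE2 so the example needs only the standard library, and escape characters are not handled. Note that `std::regex` works on bytes, so `_` here matches one byte rather than one Unicode character, and `.` does not match line terminators.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Hypothetical helper: translate a LIKE pattern (without escape character
// support) into an ECMAScript regular expression string.
std::string likeToRegex(std::string_view pattern) {
  // Characters that have a special meaning in a regex and must be escaped.
  static constexpr std::string_view kMeta = R"re(\^$.|?*+()[]{})re";
  std::string out;
  for (char c : pattern) {
    if (c == '%') {
      out += ".*"; // '%' matches zero or more characters
    } else if (c == '_') {
      out += '.'; // '_' matches exactly one character (one byte here)
    } else {
      if (kMeta.find(c) != std::string_view::npos) {
        out += '\\'; // keep literal characters literal
      }
      out += c;
    }
  }
  return out;
}

// Hypothetical helper: evaluate `input LIKE pattern` through the regex engine.
bool likeViaRegex(const std::string& input, std::string_view pattern) {
  return std::regex_match(input, std::regex(likeToRegex(pattern)));
}
```

With this sketch, `likeViaRegex("abc", "%b%")` goes through a full regex match even though a substring search would do, which is the overhead the blog post's fast paths aim to avoid.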

aditi-pandit · 2024-02-01T19:49:54Z

website/blog/2024-01-27-like-optimization.mdx

+expression and then uses regular expression library <a href="https://github.com/google/re2">RE2</a>
+to do the pattern matching. RE2 is a very good regular expression library, it is fast
+and safe which gives Velox LIKE a good performance. But some popularly used simple patterns
+can be optimized to use simple C++ string functions to implement directly,


Nit : "can be optimized using direct simple C++ string functions instead of regex."

aditi-pandit · 2024-02-01T19:51:48Z

website/blog/2024-01-27-like-optimization.mdx

+and safe which gives Velox LIKE a good performance. But some popularly used simple patterns
+can be optimized to use simple C++ string functions to implement directly,
+e.g. Pattern `hello%` matches inputs that start with `hello`, which can be implemented by
+memory comparing the prefix bytes of inputs:


Nit : " can be implemented by direct memory comparison of prefix ('hello' in this case) bytes of input"
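
As an illustration of the prefix case discussed in this comment, a pattern such as `hello%` can be checked with a single byte comparison. The sketch below is an assumption about the general shape of such a fast path, not the code from the blog post; `matchPrefix` is a hypothetical name.

```cpp
#include <cstring>
#include <string_view>

// Hypothetical helper: returns true when `input` starts with `prefix`.
// For the pattern `hello%`, `prefix` would be "hello".
bool matchPrefix(std::string_view input, std::string_view prefix) {
  if (prefix.empty()) {
    return true; // an empty prefix matches every input
  }
  if (input.size() < prefix.size()) {
    return false; // input is too short to contain the prefix
  }
  // Compare only the leading bytes; the trailing '%' accepts whatever follows.
  return std::memcmp(input.data(), prefix.data(), prefix.size()) == 0;
}
```

Because the comparison is byte-wise, it works the same for ASCII and UTF-8 input as long as the pattern contains no `_` wildcard.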

aditi-pandit · 2024-02-01T19:55:15Z

website/blog/2024-01-27-like-optimization.mdx

+Although these patterns look similar to previous ones, but they are not so straightforward
+to optimize, `_` here matches any single character, we can not simply use memory comparison to
+do the matching. And if user's input is not pure ASCII, `_` might match more than one byte which
+makes the implementation even more complex. And also note that the patterns above are just for


Nit : "Also note that the above patterns are just for illustrative purposes. Actual patterns in practice can be more complex."

aditi-pandit · 2024-02-01T19:58:13Z

website/blog/2024-01-27-like-optimization.mdx

+}
+```
+
+Here `cursor` is the index in the input we are trying to match, `unicodeCharLength` is


Maybe format this as follows

Here :

'cursor' is the index in the input we are trying to match.

'unicodeCharLength' ....

So the logic is basically repeatedly....
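
For context, the "repeatedly calculate the size of the current character and skip it" logic that this comment refers to can be sketched as below. This is a hedged illustration only: the blog post's `unicodeCharLength` wraps utf8proc, whereas `charLengthFromLeadByte` here is a hypothetical stand-in that looks only at the lead byte and assumes well-formed UTF-8. `skipCharacters` is likewise a made-up name.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the utf8proc-based `unicodeCharLength`:
// byte length of the UTF-8 character that starts with `lead`.
std::size_t charLengthFromLeadByte(unsigned char lead) {
  if (lead < 0x80) return 1;          // 0xxxxxxx: ASCII
  if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
  if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
  if ((lead >> 3) == 0x1E) return 4;  // 11110xxx
  return 1; // invalid lead byte: advance one byte so the cursor still moves
}

// Advances `cursor` past `count` characters of `input`, as needed to match
// `count` consecutive `_` wildcards. Returns false if the input is too short.
bool skipCharacters(std::string_view input, std::size_t& cursor, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    if (cursor >= input.size()) {
      return false; // ran out of input before all wildcards were matched
    }
    cursor += charLengthFromLeadByte(static_cast<unsigned char>(input[cursor]));
  }
  // A truncated multi-byte character at the end would push the cursor too far.
  return cursor <= input.size();
}
```

The per-character loop is what makes this path slower than necessary for pure ASCII input, where every character is exactly one byte.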

aditi-pandit · 2024-02-01T19:58:48Z

website/blog/2024-01-27-like-optimization.mdx

+a function which wraps utf8proc function to determine how many bytes current character consists of,
+so the logic is basically repeatedly calculate size of current character and skip it.
+
+It seems not that complex, but we should note that this logic is not effective for pure ASCII input,


End sentence here.

aditi-pandit · 2024-02-01T20:01:54Z

website/blog/2024-01-27-like-optimization.mdx

+so the logic is basically repeatedly calculate size of current character and skip it.
+
+It seems not that complex, but we should note that this logic is not effective for pure ASCII input,
+for pure ASCII input, every character is one byte, to match a sequence of `_`, we don't need to


Nit sentence :
"Every character is one byte in pure ASCII input. So to match a sequence of `_`, we don't need to calculate the size of each character and compare in a for-loop. In fact, we don't need to explicitly match `_` for pure ASCII input at all. We can use the following logic instead:"

aditi-pandit · 2024-02-01T20:02:20Z

website/blog/2024-01-27-like-optimization.mdx

+}
+```
+
+It only matches the kLiteralString pattern at the right position of the inputs, `_` is automatically


End this sentence with full-stop.
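
The ASCII fast path discussed in the last two comments can be sketched roughly as follows. This is an assumption about the general idea (compare each literal run at a precomputed byte offset and never inspect the `_` positions), not the code from the blog post; `LiteralAt` and `matchAsciiPrefix` are hypothetical names.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical description of one literal run inside a pattern.
struct LiteralAt {
  std::size_t offset;        // byte offset where the literal must appear
  std::string_view literal;  // bytes that must be present at that offset
};

// Matches pure ASCII `input` against a prefix-style pattern such as
// `hello_velox%`. `minLength` is the number of characters the pattern
// requires before the trailing '%'.
bool matchAsciiPrefix(
    std::string_view input,
    const std::vector<LiteralAt>& literals,
    std::size_t minLength) {
  if (input.size() < minLength) {
    return false; // not enough characters to cover literals and '_' slots
  }
  for (const auto& item : literals) {
    if (item.offset + item.literal.size() > input.size()) {
      return false; // defensive bounds check for inconsistent offsets
    }
    // One character is one byte in ASCII, so offsets are known up front.
    // The '_' positions between literals are matched implicitly.
    if (input.substr(item.offset, item.literal.size()) != item.literal) {
      return false;
    }
  }
  return true;
}
```

For `hello_velox%`, the literals would be `"hello"` at offset 0 and `"velox"` at offset 6, with `minLength` 11; the single `_` at offset 5 is covered by the length check alone.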

xumingming · 2024-02-02T00:29:55Z

@aditi-pandit Thanks for the review, made corresponding changes to all the comments.

facebook-github-bot · 2024-02-02T00:43:38Z

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-02-02T01:17:50Z

@mbasmanova merged this pull request in 3004e34.

conbench-facebook · 2024-02-02T01:41:56Z

Conbench analyzed the 1 benchmark run on commit 3004e349.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Summary:
Pull Request resolved: facebookincubator#8576
Reviewed By: Yuhta, kgpai
Differential Revision: D53308906
Pulled By: mbasmanova
fbshipit-source-id: 31a1efe0d5472ccc9f2a1c81602e402d2f8c8e8a

facebook-github-bot added the CLA Signed label Jan 27, 2024

lingbin reviewed Jan 28, 2024

Add a blog post for LIKE optimizations

b18effe

xumingming force-pushed the like_opt_blog branch from 91cbba7 to b18effe January 31, 2024 12:25

mbasmanova requested review from bikramSingh91 and pedroerp February 1, 2024 11:53

mbasmanova approved these changes Feb 1, 2024

mbasmanova requested review from majetideepak and aditi-pandit February 1, 2024 12:00

Fix some typos

5dcd2f1

aditi-pandit reviewed Feb 1, 2024

Yuhta approved these changes Feb 1, 2024

Optimizing the sentences

9d2dfd8

xumingming force-pushed the like_opt_blog branch from 4a3769d to 9d2dfd8 February 2, 2024 00:29

facebook-github-bot closed this in 3004e34 Feb 2, 2024

facebook-github-bot added the Merged label Feb 2, 2024
