Skip to content

A simple benchmark to benchmark ways to see about quoting fields in CSVs

License

Michael-S/silly_java_string_search

Repository files navigation

silly java string search

A simple benchmark to benchmark ways to see about quoting fields in CSVs

See the included LICENSE.md for licensing.

I'm working on a patch to a CSV-handling library in the OpenSearch project SQL plugin, and I'm curious about the performance in Java (since OpenSearch is written in Java) of searching potentially millions of Java strings for a quote character (usually, but not always, "), a separator character (usually, but not always, ,), and carriage return \r and new line (aka line feed) \n.

I plan for this to run on Unixes that have bash and a recent-ish JDK. I'm not using a traditional Java project management tool like Maven or Gradle because this is simple enough not to need that kind of structure.

On my machine (Elementary Linux 8, laptop with Intel i7-11800h, nvme drive):

This runs a simple benchmark.
You need javac and java.
Compiling...
Generating Data...
Starting benchmark!
---------------------
Benchmarking SearchByContains...
Average time: 152.624 ms
Count: 229179

Benchmarking SearchByManualLoop...
Average time: 459.023 ms
Count: 229179

Benchmarking SearchByRegex...
Average time: 399.361 ms
Count: 229179

Benchmarking SearchByArray...
Average time: 310.899 ms
Count: 229179

Benchmarking SearchByStream...
Average time: 151.256 ms
Count: 229179

Benchmarking SearchByParallelStream...
Average time: 27.151 ms
Count: 229179

Benchmarking SearchByLineStream...
Average time: 635.126 ms
Count: 229179

Benchmarking SearchByLineCPStream...
Average time: 1090.798 ms
Count: 229179

About

A simple benchmark to benchmark ways to see about quoting fields in CSVs

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •