- Overview
- Usage
- Benchmarks Overview
- Performance Comparison
- Article
In this document you'll find information on how to use the project in terms of how to run it in your local machine, a brief description of each benchmark's pipeline, and a performance comparison in speedup relative to Java's Stream API.
To run these benchmarks on you local machine just run:
mvn clean install
And then:
java -jar target/benchmarks.jar
Benchmarks the allMatch()
operation in the different sequence types.
Collection Sizes: [10, 1000, 100 000]
Pipeline:
sequenceOfEvenIntegers -> allMatch(isEven)
Every is an operation that, based on a user defined predicate, tests if all the elements of a sequence match between corresponding positions. So for instance, if we have:
Stream<String> seq1 = Stream.of("505", "Amsterdam", "Mural");
Stream<String> seq2 = Stream.of("505", "Amsterdam", "Mural");
Stream<String> seq3 = Stream.of("Mural", "Amsterdam", "505");
BiPredicate<String, String> pred = (s1, s2) -> s1.equals(s2);
Then:
every(seq1, seq2, pred); // returns true
every(seq1, seq3, pred); // returns false
For every
to return true, every element of each sequence must match in the same index.
The every
feature can be achieved through a pipeline of zip
-allMatch
operations,
such as:
seq1.zip(seq2, pred::test).allMatch(Boolean.TRUE::equals);
Benchmarked for: [Class, Integer, String, Randomly generated String]
Collection Sizes: [10, 1000, 100 000]
Pipeline:
sourceSequence + copyOfSourceSequence -> zip(Object::Equals) -> allMatch(Boolean.TRUE::equals)
The find
between two sequences is an operation that, based on a user defined
predicate, finds two elements that match, returning one of them in the process.
So if we have:
Stream<String> seq1 = Stream.of("505", "Amsterdam", "Mural");
Stream<String> seq2 = Stream.of("400", "Amsterdam", "Stricken");
Stream<String> seq3 = Stream.of("Amsterdam", "505", "Stricken");
BiPredicate<String, String> pred = (s1, s2) -> s1.equals(s2);
Then:
find(seq1, seq2, pred); // returns "Amsterdam"
find(seq1, seq3, pred); // returns null
find(seq2, seq3, pred); // returns "Stricken"
For find
to return an element, two elements of each sequence must match in the
same index. Here's what an implementation of find
would look like with the support
of a zip
:
zip(seq1, seq2, (t1, t2) -> predicate.test(t1, t2) ? t1 : null)
.filter(Objects::nonNull)
.findFirst()
.orElse(null);
}
Benchmarked for: [Class, Integer, String]
Collection Sizes: [10, 1000, 100 000]
This pipeline has two ways of determining which element will be the matching element:
- For String sequences, the matching element will be on a fixed index, using the
following criteria:
-
For collections with more than 100 elements, the matching index will correspond to COLLECTION_SIZE / 100.
(e.g: for a COLLECTION_SIZE of 100K the pipeline will match with the 1000th element) -
Otherwise, the matching index will correspond to COLLECTION_SIZE / 10.
(e.g: for a COLLECTION_SIZE of 100 elements the pipeline will match with the 10th element)
-
- For all sequences (including String), the match index increments with each iteration
to make sure there are matches in every index of the sequence.
(e.g: On the 1st iteration the match index will be 0, on the 2nd it will be 1, etc... )
Pipeline:
sourceSequence1 + sourceSequence2 ->
-> zip((e1, e2) -> Object.equals(e1, e2) ? t1 : null) ->
-> filter(Objects::nonNull) ->
-> findFirst()
Benchmarks the usage of the findFirst()
operator. This benchmark was run
with three types of Sequences:
- One where the match would be found in the first element.
- One where the match would be found in the middle.
- One where the match would be found in the last element.
Collection Sizes: [10, 1000, 100 000]
Pipeline:
sequenceOfAllButOneEvenIntegers -> filter(isOdd) -> findFirst()
Benchmarks the usage of flatMap
in conjunction with a reduce
.
Pipeline:
nestedSequenceOfIntegers -> flatMap() -> reduce(sum)
Benchmarks zipping a sequence of prime Integers with instances of the class Value.
Collection Sizes: [10, 1000, 100 000]
Pipeline:
(sequenceOfIntegers -> filter(isPrime)) + sequenceOfValues -> zip(Pair::with) -> forEach(bh::consume)
Benchmarks creating two different sequences, one consisting of the top 50 Artists (provided by Last.fm) of each non english speaking country (provided by REST Countries) and the other the exact same thing but for the top 50 Tracks. Then zipping both sequences into a Trio of Country, First Artist and First Track and retrieving the distinct elements by Artist.
Pipelines:
- Sequence of Artists:
sequenceOfCountries -> filter(isNonEnglishSpeaking) -> filter(hasArtists) -> map(Pair.of(country, artists));
- Sequence of Tracks:
sequenceOfCountries -> filter(isNonEnglishSpeaking) -> filter(hasTracks) -> map(Pair.of(country, tracks));
- Pipeline
sequenceOfArtists + sequenceOfTracks ->
-> zip(Trio.of(country, topArtist, topTrack)) ->
-> distinctBy(artist) ->
-> forEach(bh::consume)
[Filter] - Artists who are in a Country's top ten who also have Tracks in the same Country's top ten Benchmark - zip(artists, tracks)
Benchmarks creating two different sequences, one consisting of the top 50 Artists (provided by Last.fm) of each non english speaking country (provided by REST Countries) and the other the exact same thing but for the top 50 Tracks. Then, for each Country, we take the top ten Artists and top ten Track artist's names and zip them into a Trio. After, the top ten artists are filtered by their presence in the top ten Track artist's name, returning a Pair with the Country and the resulting Sequence.
Pipelines:
- Sequence of Artists:
sequenceOfCountries -> filter(isNonEnglishSpeaking) -> filter(hasArtists) -> map(Pair.of(country, artists));
- Sequence of Tracks:
sequenceOfCountries -> filter(isNonEnglishSpeaking) -> filter(hasTracks) -> map(Pair.of(country, tracks));
- Pipeline
sequenceOfArtists + sequenceOfTracks ->
-> zip(Trio.of(country, artists, tracks)) ->
-> map(Pair.of(country, artists)) ->
-> forEach(bh::consume)
For these benchmarks we created two custom operations oddLines
and collapse
. Both these operations are quite simple
oddLines
simply lets through elements in odd indexes of the sequence it is applied to, while collapse
collapses
repeating elements into one, the following snippet exemplifies:
Sequence<String> days = Sequence.of(
"2020-05-08",
"2020-05-09",
"2020-05-10",
"2020-05-11"
);
Sequence<String> temperatures = Sequence.of( "22ºC", "23ºC", "23ºC", "24ºC", "22ºC", "22ºC", "21ºC" );
days.oddLines().forEach(System.out::println);
// Output
// "2020-05-09"
// "2020-05-11"
temperatures.collapse().forEach(System.out::print);
// Output
// "22ºC", "23ºC", "24ºC", "22ºC", "21ºC"
We then queried WorldWeatherOnline for the weather of Lisbon, Portugal between the dates of 2020-05-08 and
2020-11-08, which left us with a .csv
file, that we manipulated with the operations above in three different queries:
- Query number of temperature transitions
- Query max temperature
- Query number of distinct temperatures
In these benchmarks, like the ones from REST Countries and
Last.fm, a part of the pipeline is common between all three. This is due to the fact that
they all manipulate the same set of data. The next snippet shows the common to part, we first filter
all the comments
from the .csv
file, then skip
one line that has "Not available" written in it, then we use oddLines to let through
only the hourly info, and finally map to the temperature on that line.
Sequence.of(weatherData)
.filter(s -> s.charAt(0) != '#') // Filter comments
.skip(1) // Skip line: Not available
.oddLines() // Filter hourly info
.mapToInt(line -> parseInt(line.substring(14, 16))) // Map to Temperature data
From here on out these three queries are quite easy to explain.
Benchmarks querying the number of temperature transitions we simply take the common part of the pipeline and add the collapse
operation, leaving us with all the transitions, followed by a count
.
Sequence.of(weatherData)
.filter(s -> s.charAt(0) != '#') // Filter comments
.skip(1) // Skip line: Not available
.oddLines() // Filter hourly info
.mapToInt(line -> parseInt(line.substring(14, 16))) // Map to Temperature data
.collapse()
.count();
Benchmarks retrieving the highest temperature in the data set by adding the operation max
to the common part of the pipeline.
Sequence.of(weatherData)
.filter(s -> s.charAt(0) != '#') // Filter comments
.skip(1) // Skip line: Not available
.oddLines() // Filter hourly info
.mapToInt(line -> parseInt(line.substring(14, 16))) // Map to Temperature data
.max()
Benchmark querying the distinct number of temperatures by applying distinct
to common part of the pipeline, followed by a count
.
Sequence.of(weatherData)
.filter(s -> s.charAt(0) != '#') // Filter comments
.skip(1) // Skip line: Not available
.oddLines() // Filter hourly info
.mapToInt(line -> parseInt(line.substring(14, 16))) // Map to Temperature data
.distinct()
.count()
The results presented here were based on the results attained from Github Actions, they are presented in relation to Java's Stream performance. For the actual results check the Github Actions Section.
Notes:
- Java's Stream performance is equivalent to 1, all results are presented in relation to it
- For Pipelines where collection sizes vary, only 1k results and 100k results will be
displayed, separated in a pipe format, like so:
- 1K results | 100K results
Benchmark | Time Complexity | Jayield | Sek | Kotlin Sequence | Eclipse Collections | jOOλ | StreamEx | Vavr |
---|---|---|---|---|---|---|---|---|
All Match | Linear | 2.4 | 2.4 | 1.7 | 1.6 | 2.8 | 2.4 | 1.6 | 1.9 | 1.1 | 1.0 | 1.0 | 0.9 | 0.0 | 0.0 |
Collapse | Linear | 3.5 | 1.5 | 2.2 | 1.2 | 1.0 | 0.4 | 0.4 |
First In the Beginning | Constant | 1.3 | 1.3 | 2.7 | 2.8 | 11.2 | 11.8 | 0.1 | 0.0 | 0.5 | 0.5 | 1.1 | 1.0 | 1.9 | 1.9 |
First In the End | Linear | 0.5 | 0.6 | 4.3 | 4.0 | 5.7 | 7.1 | 3.0 | 3.0 | 1.0 | 1.0 | 0.9 | 1.1 | 0.1 | 0.1 |
First In the Middle | Linear | 0.6 | 0.6 | 4.0 | 5.1 | 6.1 | 7.1 | 2.7 | 2.0 | 1.0 | 1.0 | 0.9 | 1.0 | 0.1 | 0.1 |
FlatMap and Reduce | Linear | 1.1 | 1.1 | 1.1 | 1.3 | 1.2 | 1.4 | 0.6 | 0.6 | 0.4 | 0.5 | 1.1 | 1.3 | 0.1 | 0.1 |
Max | Linear | 1.3 | 0.5 | 0.7 | 0.5 | 0.5 | 0.2 | 0.1 |
Distinct Temperatures | Linear | 1.2 | 0.6 | 0.8 | 0.6 | 0.9 | 0.2 | 0.2 |
Benchmark | Time Complexity | Zipline | Jayield | Sek | Kotlin Sequence | Eclipse Collections | jOOλ | StreamEx | Vavr | Guava over Stream | Protonpack over Stream |
---|---|---|---|---|---|---|---|---|---|---|---|
Every Class | Linear | 1.4 | 1.2 | 6.7 | 1.2 | 4.6 | 2.2 | 5.0 | 2.5 | 4.6 | 3.9 | 0.7 | 1.0 | 1.4 | 1.1 | 0.1 | 0.2 | 1.0 | 1.0 | 2.0 | 1.4 |
Every Integer | Linear | 1.4 | 1.2 | 6.5 | 1.2 | 5.2 | 4.1 | 5.1 | 4.2 | 4.1 | 4.0 | 0.8 | 0.7 | 1.4 | 1.4 | 0.1 | 0.1 | 1.0 | 1.1 | 2.3 | 2.2 |
Every Random String | Linear | 1.3 | 1.3 | 5.3 | 1.2 | 4.9 | 3.9 | 5.1 | 3.0 | 4.3 | 2.6 | 0.8 | 0.6 | 1.4 | 1.3 | 0.1 | 0.1 | 1.1 | 1.1 | 1.3 | 1.1 |
Every String | Linear | 1.2 | 1.3 | 6.2 | 1.2 | 4.6 | 4.1 | 4.7 | 2.7 | 4.0 | 2.6 | 0.7 | 0.8 | 1.3 | 1.5 | 0.1 | 0.1 | 1.0 | 1.1 | 1.2 | 1.1 |
Find Class | Linear | 1.2 | 1.0 | 0.6 | 1.2 | 3.3 | 1.3 | 3.1 | 1.3 | 1.3 | 1.4 | 0.8 | 0.8 | 1.3 | 1.0 | 0.1 | 0.5 | 1.1 | 1.0 | 1.1 | 1.4 |
Find Fixed Index | Constant | 1.1 | 1.1 | 0.7 | 0.7 | 2.9 | 4.0 | 3.4 | 3.1 | 0.3 | 0.2 | 0.6 | 0.8 | 1.4 | 1.2 | 0.2 | 0.1 | 1.0 | 1.0 | 1.4 | 1.1 |
Find Integer | Linear | 1.1 | 1.1 | 0.6 | 0.8 | 4.1 | 2.1 | 4.6 | 2.2 | 1.4 | 1.3 | 0.7 | 0.9 | 1.4 | 1.2 | 0.1 | 0.4 | 1.0 | 1.0 | 2.1 | 1.3 |
Find String | Linear | 1.0 | 1.0 | 0.5 | 1.2 | 2.9 | 1.2 | 2.7 | 1.4 | 1.0 | 1.6 | 0.8 | 0.9 | 1.3 | 1.0 | 0.1 | 0.5 | 1.0 | 1.0 | 0.8 | 1.5 |
Filter | Linear | 1.0 | 1.0 | 1.3 | 1.2 | 1.3 | 0.8 | 0.9 | 0.4 | 1.0 | 1.0 |
Zip Primes | Linear | 1.4 | 1.0 | 1.4 | 1.1 | 1.5 | 1.2 | 1.3 | 1.1 | 1.7 | 1.1 | 1.0 | 1.0 | 1.1 | 1.1 | 0.3 | 0.7 | 1.0 | 1.1 | 1.1 | 1.0 |
Distinct | Linear | 1.1 | 1.3 | 1.9 | 1.9 | 1.2 | 0.8 | 0.9 | 0.7 | 1.0 | 1.0 |
(1) Corresponds to "Distinct Top Artist and Top Track by Country Benchmark"
(2) Corresponds to Artists who are in a Country's top ten who also have Tracks"
in the same Country's top ten Benchmark
If you want to read a more in depth analysis on some of these benchmarks, check out our article over at DZONE.