A fast server-side search engine for the Javadoc API of selected JDK releases and other libraries.
Available at https://javadoc-search.maisikoleni.net
Can also be added to a bowser as custom search engine:
-
Search URL Template:
https://javadoc-search.maisikoleni.net/api/v2/libraries/jdk-latest/search/redirect?query=%s
-
Search Suggestions URL Template:
https://javadoc-search.maisikoleni.net/api/v2/libraries/jdk-latest/search/suggestions?query=%s
Warning
|
The above is a fairly weak server and not suitable for benchmarking. |
You can also replace jdk-latest
which always points to the
latest full release with one of the following:
-
jdk-11
for the JDK 11 API -
jdk-17
for the JDK 17 API -
junit5-latest
for the current JUnit 5 API
The same works for the website in the browser as well.
Realized as Quarkus web application mainly with Qute, RESTEasy and Microstream. The client-side uses Bootstrap and htmx.
The search engine is implemented as a sparse radix trie
generated from the xxx-search-index.js
files of a Javadoc site.
It is also mixed with a suffix tree
for name segments of the Javadoc elements.
The following example shows the trie generated for java.lang.Long
and java.util.OptionalLong
, both in the java.base
module.
The trie will be compressed such that the equally colored nodes point to the same object.
To search in that trie like one can in the official Javadoc implementation, the nodes in the trie are being traversed recursively based on a custom regular expression matcher implementation.
For example, j.u.OpL
gets transformed into a regular expression
equivalent to
j\p{javaLowerCase}*+\s*+•*+\.\s*+u\p{javaLowerCase}*+\s*+•*+\.\s*+Op\p{javaLowerCase}*+L[^•]*+
and gets compiled into:
CompiledRegex [
0x0000006a, // matches 'j'
0xc0000000, // star referencing char class #0 -> "\p{javaLowerCase}"
0xc0000001, // star referencing char class #1 -> "\s"
0x80000378, // star referencing literal '•'
0x0000002e, // matches '.'
0xc0000001, // star referencing char class #1 -> "\s"
0x00000075, // matches 'u'
0xc0000000, // star referencing char class #0 -> "\p{javaLowerCase}"
0xc0000001, // star referencing char class #1 -> "\s"
0x80000378, // star referencing literal '•'
0x0000002e, // matches '.'
0xc0000001, // star referencing char class #1 -> "\s"
0x0000004f, // matches 'O'
0x00000070, // matches 'p'
0xc0000000, // star referencing char class #0 -> "\p{javaLowerCase}"
0x0000004c, // matches 'L'
0xc0000002 // star referencing char class #2 -> "[^•]"
]
These instructions are an array of int
and
they are accompanied by an array of CharPredicates,
which are functions of type char → boolean
and represent the character classes in the custom regex implementation.
The instructions are encoded as follows:
Bit 31 | Bit 30 | Bits 29 … 0 |
---|---|---|
Wrapped by a star? |
Represents a char class? |
|
This compiled custom regex implementation has the disadvantage that it does not support nested groups or repetitions of more than a single character. This is however not an issue for our purpose, as we only need to star single characters.
The advantage, however, is
that all its state can be stored in a single long
,
which can easily be passed around methods
and avoids the generation of objects.
This long
value is composed as follows:
Bit 63 | Bits 62 … 32 | Bit 31 | Bits 30 … 0 |
---|---|---|---|
Did the match fail? |
Number of stars that matched at least one character |
Did the current star already match one character? |
Next index in the instruction array |
As the state is only a long
that is stored by the user
and this possessive matcher implementation can advance the state
step by step consuming a single character,
collecting all matches by traversing the trie
and branching at the transitions is fairly simple.
All of that allows for the following search syntax:
-
Query
ja.l.o
will find:-
java.base/java.lang.Object
-
java.base/java.lang.Override
-
java.base/java.lang.OutOfMemoryError
-
-
Query
LDT
will find:-
java.base/java.time.LocalDateTime
-
java.base/java.time.chrono.ChronoLocalDateTime
-
…
-
-
Query
math max
will find:-
java.base/java.lang.Math.max(double, double)
-
java.base/java.lang.Math.max(long, long)
-
…
-
-
Query
j.u.s.Col.jo
will find:-
java.base/java.util.stream.Collectors.joining()
-
java.base/java.util.stream.Collectors.joining(CharSequence, CharSequence, CharSequence)
-
java.base/java.util.stream.Collectors.joining(CharSequence)
-
-
Query
.C~rs
will find:-
java.base/java.util.stream.Collectors
-
…
-
The search engine implementation is mostly compliant with the Javadoc Search Specification but since major changes got introduced with JDK 19 (search terms can be separated by spaces), I felt free to include custom behavior.
Note
|
One example for that is |
As a result of the trie and simplified regex matching combination, the search is very fast and rarely exceeds 20 ms. Most of the time, search completes in about 1 ms. The longer the search term, the faster the results are available.
The matching results are currently ranked by:
-
Where the match of the query starts.
A match starting at the beginning is the best case, a match starting at or after a separator comes next and matches that start within identifiers (upper-case letters, underscore) have the lowest rank.java.desktop/javax.print.attribute.SetOfIntegerSyntax
is therefore a better match forSet
thanjava.base/java.util.AbstractSet
. -
How good they match the query.
java.base/java.io.File.isHidden()
is a better match forFile.isH
thanjava.base/java.nio.file.Files.isHidden(Path)
because it does not require further lower case characters afterFile
. -
Natural order of the entries.
Results with smaller char values at the same position come first. The implementation is very much likeString.compareTo
on the last segment of the qualified name.
Case insensitive matching is only performed if case-sensitive did not yield any results.
This project requires at least JDK 17.
Building should be straightforward since this is a regular Maven project.
-
mvn clean
to clean up generated artifacts (does not remove thedatabase
folder) -
mvn test
runs all tests -
mvn quarkus:dev
starts the Quarkus dev mode (live reload) -
mvn package
bundles the web application using theprod
profile intotarget/quarkus-app
It might be helpful to add the missing search-index files
in src/main/resources/net/maisikoleni/javadoc/service/jdk-latest
(see jdk-index-files-go-here.txt
there for details), but this is not required
as they are then fetched from the official site and cached in database/javadoc-indexes
.
To run outside Maven, --add-exports java.base/jdk.internal.misc=ALL-UNNAMED
is required
as MicroStream requires Unsafe at the moment.
Javadoc Search was created by Christian Femers in 2022
and the program and the accompanying materials are made available
under the terms of the
European Union Public License 1.2
(see the "LICENSE" file,
SPDX-License-Identifier: EUPL-1.2-or-later
)
with the following exceptions:
-
src/test/java/net/maisikoleni/javadoc/parser/JdkUrlGeneration.java
(GPL-2.0-with-classpath-exception
)