Description
Is your suggestion for improvement related to a problem? Please describe.
Currently, JabRef struggles with libraries that have over 1000 entries (#10209).
Short reason and solution: JabRef stores all information in RAM. JabRef needs a mechanism to manage lots of data. This is a perfect use case for databases!
Longer issue description: look at how JabRef manages libraries and entries:
- Load the .bib file.
- Convert the .bib file into BibDatabase (with BibDatabaseContext) and BibEntry. Those are Java objects that are stored in RAM.
- Manipulate the library through those objects.
- Save those objects into a .bib file.
So, JabRef's original philosophy is to be a file editor. However, with a giant library the whole object graph has to fit into the JVM heap, which is limited, so you simply run out of memory.
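To make the load / convert / manipulate / save cycle above concrete, here is a minimal sketch of it. Everything in it is an assumption for illustration: `InMemoryLibrarySketch`, `Entry`, `parse` and `serialize` are hypothetical names, not JabRef's real `BibEntry`/`BibDatabase` API, and the regex-based parser only handles flat `name = {value}` fields. The point is only that every entry lives on the heap between loading and saving.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, heavily simplified stand-in for an in-memory library model.
// These are NOT JabRef's real classes.
public class InMemoryLibrarySketch {

    // One entry: type, citation key and fields, all held on the JVM heap.
    public record Entry(String type, String key, Map<String, String> fields) {}

    // Toy patterns: no nested braces, no '@' inside field values.
    private static final Pattern ENTRY =
            Pattern.compile("@(\\w+)\\s*\\{\\s*([^,\\s]+)\\s*,([^@]*)\\}");
    private static final Pattern FIELD =
            Pattern.compile("(\\w+)\\s*=\\s*\\{([^{}]*)\\}");

    // Steps 1 and 2: turn the text of a .bib file into Java objects.
    public static List<Entry> parse(String bibText) {
        List<Entry> entries = new ArrayList<>();
        Matcher entryMatcher = ENTRY.matcher(bibText);
        while (entryMatcher.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher fieldMatcher = FIELD.matcher(entryMatcher.group(3));
            while (fieldMatcher.find()) {
                fields.put(fieldMatcher.group(1).toLowerCase(), fieldMatcher.group(2));
            }
            entries.add(new Entry(entryMatcher.group(1).toLowerCase(),
                    entryMatcher.group(2), fields));
        }
        return entries;
    }

    // Step 4: write the objects back out as .bib text.
    public static String serialize(List<Entry> entries) {
        StringBuilder sb = new StringBuilder();
        for (Entry e : entries) {
            sb.append('@').append(e.type()).append('{').append(e.key()).append(",\n");
            e.fields().forEach((name, value) ->
                    sb.append("  ").append(name).append(" = {").append(value).append("},\n"));
            sb.append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bib = "@article{knuth1984,\n"
                + "  title = {Literate Programming},\n"
                + "  year = {1984}\n"
                + "}\n";
        List<Entry> library = parse(bib);
        // Step 3: manipulate the library through the in-memory objects.
        library.get(0).fields().put("journal", "The Computer Journal");
        System.out.print(serialize(library));
    }
}
```

In this model, memory use grows with the number of entries, because nothing is ever released between `parse` and `serialize`. That is the property a database-backed design would remove.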
Describe the solution you'd like
JabRef should have a mechanism for managing a lot of data and use it for storing and manipulating libraries.
This is the purpose of databases! A DBMS also caches data: a typical DBMS stores data in pages, keeping some pages in RAM and offloading the others to disk. This is a perfect solution for giant libraries, as you are no longer limited by RAM, but by the space on your HDD/SSD!
Moreover, a DBMS lets you run fast and powerful queries on your data. Here is one place where SQL can be used: #10209 (comment). Search functionality is also a perfect use case for databases.
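The page caching described above can be sketched as a tiny buffer pool. This is only an illustration under stated assumptions: `BufferPoolSketch` and its methods are hypothetical names, the "disk" is simulated with a `HashMap`, and least-recently-used eviction is just one possible policy. Real systems such as Postgres use their own page formats and replacement strategies.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy buffer pool: at most `capacity` pages stay in RAM, the rest live on "disk".
// The disk is simulated with a HashMap; a real DBMS uses files and fixed-size pages.
public class BufferPoolSketch {
    private final int capacity;
    private final Map<Integer, String> disk = new HashMap<>();
    // accessOrder = true: iteration order is least recently used first.
    private final LinkedHashMap<Integer, String> ram = new LinkedHashMap<>(16, 0.75f, true);
    private int diskReads = 0;

    public BufferPoolSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    // Store a page in RAM, pushing older pages out to disk if RAM is full.
    public void write(int pageId, String content) {
        ram.put(pageId, content);
        evictIfNeeded();
    }

    // Return a page, loading it from disk on a cache miss. Unknown pages yield null.
    public String read(int pageId) {
        String page = ram.get(pageId); // a hit also marks the page as recently used
        if (page == null && disk.containsKey(pageId)) {
            page = disk.get(pageId);   // cache miss: fetch the page from disk
            diskReads++;
            ram.put(pageId, page);
            evictIfNeeded();
        }
        return page;
    }

    // Move least recently used pages to disk until RAM is within capacity.
    private void evictIfNeeded() {
        Iterator<Map.Entry<Integer, String>> it = ram.entrySet().iterator();
        while (ram.size() > capacity) {
            Map.Entry<Integer, String> eldest = it.next();
            disk.put(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    public boolean isInRam(int pageId) {
        return ram.containsKey(pageId); // containsKey does not change the access order
    }

    public int pagesInRam() {
        return ram.size();
    }

    public int diskReads() {
        return diskReads;
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(2);
        pool.write(1, "entries 1-100");
        pool.write(2, "entries 101-200");
        pool.write(3, "entries 201-300"); // page 1 is evicted to disk here
        System.out.println("page 1 in RAM? " + pool.isInRam(1));
        System.out.println("read page 1: " + pool.read(1));
        System.out.println("disk reads so far: " + pool.diskReads());
    }
}
```

With this shape, the amount of RAM used is bounded by `capacity` no matter how many pages the library occupies, which is the behaviour an embedded database would give JabRef without extra work on our side.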
Additional context
This is planned as a GSoC project. Beware, while this project is quite important for JabRef, it might turn out to be very complex.
We aim for a relational DBMS such as SQLite, DuckDB, or Postgres. In particular, we want the database to be embedded.
In fact, we want Postgres to be our backend, as Postgres has powerful search capabilities. It can actually be used as an embedded database; check out this library: https://github.com/zonkyio/embedded-postgres.
Here are some materials for this project:
- Postgres: https://www.postgresql.org/.
- Other databases you might consider (though, Postgres is preferable):
  - DuckDB: https://duckdb.org/ -- seems promising too. Can do JNI and thus could save a process: https://github.com/duckdb/duckdb-java/blob/main/src/jni/duckdb_java.cpp#L25
  - SQLite: https://www.sqlite.org/.
  - H2: https://www.h2database.com/html/main.html.
  - HSQLDB: https://hsqldb.org/.
- BibTeX and BibLaTeX (you can use this information to design the schema of the DB):
  - Internals of BibTeX: https://polish-mirror.evolution-host.com/ctan/biblio/bibtex/base/btxdoc.pdf.
  - Internals of BibLaTeX: https://mirrors.ibiblio.org/CTAN/macros/latex/contrib/biblatex/doc/biblatex.pdf.
- How Zotero internally stores data: https://github.com/zotero/zotero/blob/main/resource/schema/userdata.sql.
- Use Postgres as an embedded database: https://github.com/zonkyio/embedded-postgres.
- Take a look at JabRef's code:
  - Search functionality: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/model/search/PostgreConstants.java#L6. (It already uses embedded Postgres).
  - Shared database: https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/shared/PostgreSQLProcessor.java (schemas, etc.)