Scalability and maintainability for big bib-files #371
Comments
I am afraid it's more complex than this. Firstly, biber doesn't save state by design, to make it easier to automate with things like […]. People tend to see that standard […]
@proteusGIT probably knows about it, and PLK already mentions it, but in everyday operation a tool like […]
In my experience, the extra time is only a problem if the same […]. An easy way to avoid this is simply to compute the MD5 hash of the […].
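For concreteness, a minimal sketch of that kind of hash check, written here as an external Python wrapper: the idea is to hash biber's input (the .bcf control file is used here as an example) and skip the run when it is unchanged. The file names and the stamp file are assumptions for illustration, not something stated in the thread.

```python
import hashlib
import pathlib
import subprocess

def md5_of(path: pathlib.Path) -> str:
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()

def run_biber_if_changed(jobname: str = "main") -> None:
    """Run biber only when the .bcf control file has changed since the last run."""
    bcf = pathlib.Path(f"{jobname}.bcf")
    stamp = pathlib.Path(f"{jobname}.bcf.md5")   # hypothetical stamp file

    current = md5_of(bcf)
    previous = stamp.read_text().strip() if stamp.exists() else None

    if current == previous:
        print("biber input unchanged, skipping")
        return

    subprocess.run(["biber", jobname], check=True)
    stamp.write_text(current)

if __name__ == "__main__":
    run_biber_if_changed("main")
```

Build wrappers that already track checksums of generated files give the same effect without any extra scripting.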
To obtain only the bib entries used in a given document, you can use the bibget tool.
Thank you for pointing it out.
I've similarly written a tool like […]. My tool just uses a dumb regex to decide where the boundaries of entries are. I'm not sure whether I'm missing some correctness argument that justifies the extra processing that biber does here, but otherwise, could this maybe be considered?
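For illustration, a minimal sketch of this kind of dumb-regex filtering (not the commenter's actual tool), assuming every entry starts with an `@type{key,` line and that the set of wanted keys is already known:

```python
import re

# Matches the start of a BibTeX/biblatex entry such as "@article{knuth1984,"
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)

def filter_bib(bib_text: str, wanted_keys: set[str]) -> str:
    """Keep only the entries whose citation key is in wanted_keys.

    Entry boundaries are decided purely by the regex above: each entry is
    assumed to run from one "@type{key," line to the start of the next.
    There is no special handling for @string/@preamble blocks or comments.
    """
    starts = list(ENTRY_START.finditer(bib_text))
    kept = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(bib_text)
        if match.group(2) in wanted_keys:
            kept.append(bib_text[match.start():end].rstrip() + "\n")
    return "\n".join(kept)

if __name__ == "__main__":
    with open("references.bib", encoding="utf-8") as f:   # file name is just an example
        print(filter_bib(f.read(), {"knuth1984", "lamport1994"}))
```

A filter like this silently drops entries that are needed only indirectly, for example via crossref or xdata references, which is exactly the kind of special case the next comment points to.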
@thomwiggers The "dumb regex" is fine and will work for your files, but it is likely to fail on the file formatting conventions or special cases found in other users' files.
@dspinellis sure, parsing a non-regular language with a regex is asking for trouble. But that doesn't mean a simpler parser couldn't filter out irrelevant entries before biber applies all of its processing power.
@plk Running with […]
That does require changing the LaTeX file to switch between […]. Regarding not wanting to implement it: fair enough; after a bit of digging, it definitely doesn't seem like a trivial change.
I am having the same problem. I write a lot of documents, and I like keeping my bibliography unified between them. Thus, I made a git repo containing my main bib file. While BibTeX takes less than half a second to process it, biber takes several seconds, eating up a significant part of the overall compilation time.

Now, I understand that biber does a lot more than BibTeX and that it is not written in C. However, it would be nice to have a preliminary parsing pass that keeps only the relevant entries before actually processing them, to speed up compilation. For instance, a first pass could extract the relevant lines and put them into a second, temporary bib file.

In all cases, it is quite annoying, and I do not see alternatives that achieve the following goals: […]
If you have any ideas, please let me know. FYI, to put things into perspective, it is quite common for me to write relatively small documents (i.e., 20–50 pages max) with well over one hundred citations.
I have a bib file with 1086 entries, and biber requires 6 seconds to process it, which is a lot relative to the overall compile time of my document.
I propose to add the following features to biber; a rough sketch of the idea follows the list.
1. biber should (probably based on some parameter) first check whether the list of used keys has changed; if not, it should abort immediately.
2. biber should (probably based on some parameter) export a bib file containing only the used references.
3. Then, if references have only been removed (no new keys added), biber should (probably based on some parameter) reuse the small bib file containing only the used references to generate the .bbl file.
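For concreteness, a minimal sketch of the first proposed feature, written as an external wrapper rather than a change to biber itself. How the set of used keys is obtained (for example from the document's .aux or .bcf file) is left out, and the stamp file name is made up for illustration.

```python
import json
import pathlib
import subprocess

STAMP = pathlib.Path("used-keys.json")   # hypothetical cache of the previous key list

def load_previous_keys() -> set[str] | None:
    """Return the key list remembered from the previous run, if any."""
    if not STAMP.exists():
        return None
    return set(json.loads(STAMP.read_text()))

def run_if_keys_changed(jobname: str, used_keys: set[str]) -> None:
    """Proposed feature 1: abort immediately if the list of used keys is
    unchanged; otherwise run biber and remember the new list."""
    if load_previous_keys() == used_keys:
        print("list of used keys unchanged, skipping biber")
        return
    subprocess.run(["biber", jobname], check=True)
    STAMP.write_text(json.dumps(sorted(used_keys)))
```

The second feature (exporting a bib file with only the used entries) is essentially the filtering sketched earlier in the thread, done with a proper parser inside biber instead of a regex.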