Having Quarto (or Pandoc) use more cores/threads? #1149
-
Hello Quarto, I'm wondering if there is a way to make Quarto (or Pandoc) use more threads/cores? I'm currently compiling a document to HTML (from R), and it takes more than 3 hours. The document is not that long: 2700 lines (more than half of them text). There are no complex computations (the knitting itself takes only a few minutes), but once knitting is done, it takes close to 3 hours to output the HTML document (which is roughly 15 MB in size). During that whole time, the Deno virtual machine uses only ~5% of my CPU, so I was wondering if there is a way to have it use more CPU resources to speed up the process? Thanks :)
-
Using more cores isn't the problem here, and none of the operations that transform .md to .html are fundamentally parallelizable. Depending on what's in your document, it's possible that it's waiting on network IO. To really help here, I think we need a reproducible example; then we can dig into exactly what's happening.
-
@jjallaire Thanks for your input! The document is a "tutorial" on data wrangling. It lists equivalences between You can see it on RPubs. There are a few hidden code chunks at the beginning that define a custom Edit: I'm currently trying to compile this document with Quarto 0.9.449 on Ubuntu 20.04 (WSL2); the last time I compiled it was on Windows 11 (I don't remember the Quarto version, but it should have been around 0.9.350).
-
What happens if you don't use Making a self-contained HTML file can take time, as everything has to be encoded into that single file. The result is also quite heavy (15 MB is a lot for an HTML file). Without self-contained, the file will be lighter and rendering could be quicker, but it will probably rely on external resources and/or need online access. Can you try that? Since there are a lot of outputs (like gt tables), this could also come from all the processing we do after the computation step to support Quarto's various features. 🤔
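For anyone wanting to try the suggestion above, a minimal sketch of the document's YAML header with self-contained output turned off. This assumes the document currently sets self-contained: true under the html format (the actual header isn't shown in this thread), and the title is a placeholder. In the 0.9.x releases discussed here the option is called self-contained; later Quarto releases introduced embed-resources as its replacement.

```yaml
---
title: "Data wrangling tutorial"  # placeholder title
format:
  html:
    # false is the default: supporting files (JS, CSS, figures) are written
    # next to the HTML file instead of being encoded into it
    self-contained: false
---
```

Comparing render times with and without this setting should show whether the encoding step is where the time goes.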
-
Our Lua filters do apply patterns to HTML tables so that they can apply labels and captions from YAML. What happens if you remove any labels or captions you have? @cscheid I don't think this can just be O(n) in the print method if knitr is completing fairly early in the render.
-
Here's some more data as I run it. I created a "manageably bad" version of this file by rendering only the first 500 lines or so. I get the following overall timing:
That version with Even smaller version
So it's not
-
Tagging everyone here to let you know about the most recent development: using As I mentioned in the linked comment, should we move this to a dedicated GH issue (since I didn't think it was an issue when I opened this Q&A)? Also, thank you all for your responsiveness and helpful suggestions. This is the best experience I've ever had getting help with an issue in open-source software!
-
@ma-riviere Take a look at some of the workarounds I found here: #1152 (comment). We're fixing some of the problems in quarto and upstream libraries, but a full treatment will unfortunately take some time. |
-
Here's a self-contained example: https://github.com/kwstat/quarto_test
The .Rmd file renders to PDF in 0:51, while the .qmd file renders to PDF in 3:10.
-
Also note that Quarto uses xelatex by default, whereas R Markdown uses pdflatex by default. xelatex is known to be slower, but it supports UTF-8 characters and the use of local system fonts. You can override this default with: pdf-engine: pdflatex
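For anyone comparing engines on the same document, a minimal sketch of where that override sits in a .qmd file's YAML header (the title is a placeholder). Keep in mind the trade-off mentioned above: pdflatex is typically faster but has more limited UTF-8 and system-font support than xelatex.

```yaml
---
title: "My report"  # placeholder title
format:
  pdf:
    # override Quarto's default engine (xelatex) to match R Markdown's default
    pdf-engine: pdflatex
---
```

Rendering once with this line and once without it isolates how much of the total time is spent in the LaTeX engine itself.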
-
I updated to the latest daily RStudio, 2022.07.2-559.
.Rmd to .pdf: 0:00 Start
.qmd to .pdf: 0:00 Start
It doesn't look like xelatex or pdflatex is to blame. Maybe the new Pandoc filters are slow?
-
OK, I installed RStudio Daily 2022.11.0-daily+101.
qmd to pdf: 0:00 Start
Again, .md creation is fast and PDF creation time is okay, but the step in between is slow. Perhaps Pandoc is the problem?
-
Just an update on the timings. Also, deno.exe seems to take a long time. Maybe my corporate-managed laptop has security software that scans deno.exe when it runs (or something like that).
Test case (creates a 480-page PDF): https://github.com/kwstat/quarto_test
Configuration: Windows 10, RStudio 2022.12.0 Build 353, R 4.2.2
With this setup, I am still seeing slow rendering times for qmd files: