Parallel Computing - using one .qmd to render n output files - cannot open the connection #5861

Open
@marianklose

Description

Bug description

Thanks a lot for your work, quarto is awesome! My goal is to use one single .qmd file and render it n times in parallel on an HPC cluster (via a SLURM job array or MPI processes) to create n output files. The renders could be parameterized, or (as in my case) simply execute the same .qmd n times.

However, only one output file is created: Output created: test1.html

For the rest (e.g. test2.html) I get an error:

Error in readLines(con, warn = FALSE) : cannot open the connection
Calls: .main ... partition_yaml_front_matter -> grep -> is.factor -> read_utf8 -> readLines
In addition: Warning message:
In readLines(con, warn = FALSE) :
  cannot open file 'test.qmd': No such file or directory
Execution halted

It seems that the first (quickest) run renders correctly, and the remaining processes then conflict with each other when reading/writing the .qmd file.

Steps to reproduce

The .qmd content doesn't really matter at this point, but here is an example:

---
title: "Test"
author: "Test"
date: now
format:
  html:
    toc: true
---

# Session Information

```{r}
# display session info
sessionInfo()
```

My shell script (exe.sh) looks like this:

#!/bin/bash

#SBATCH --job-name=quarto_test                  
#SBATCH --ntasks=1      
#SBATCH --cpus-per-task=4                      
#SBATCH --mem-per-cpu=4096                      
#SBATCH --time=00:01:00                         
#SBATCH --qos=standard                          

module add R/4.2.2-foss-2022b         

cd Scripts
                             
quarto render test.qmd --output test${SLURM_ARRAY_TASK_ID}.html

The idea is to pass ${SLURM_ARRAY_TASK_ID} to the --output option to create n individual output files. On the cluster I then call

[xxx@curta 01_exe]$ sbatch --array=1-2 exe.sh

to execute the same .qmd file 2 times to create test1.html and test2.html.
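For reference, the parameterized route mentioned above could look like the following sketch. This assumes Quarto's `-P` render flag together with a knitr `params` entry in the document's YAML header; `run` is a hypothetical parameter name, and the `command -v` guard is only there to keep the snippet safe to execute where quarto is not on the PATH.

```shell
#!/bin/bash
# Hypothetical parameterized variant of the render call: pass the task
# index into the document instead of only baking it into the file name.
# Assumes the .qmd declares a `params:` entry (e.g. `params: run: 1`).
ID=${SLURM_ARRAY_TASK_ID:-1}   # defaults to 1 when run outside SLURM

if command -v quarto >/dev/null 2>&1; then
  quarto render test.qmd -P run:"${ID}" --output "test${ID}.html"
fi
```

Note that this alone does not avoid the bug, since all tasks still read the same test.qmd; it only shows how the task index could reach the document itself.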

Expected behavior

I would expect the .qmd file to remain untouched during the rendering process, so that no reading/writing conflicts can occur.

For the code example above, I would expect the same .qmd file to be executed 2 times, creating test1.html and test2.html. It would be nice to have support for rendering quarto documents in parallel without needing to create n copies of the .qmd file to avoid this error.

Actual behavior

The first job within the array completes; the other jobs throw the error mentioned in the bug description. My workaround so far is to create n copies of the .qmd file, name them test1.qmd, test2.qmd, ..., and then use the following within the shell script:

quarto render test${SLURM_ARRAY_TASK_ID}.qmd

which works fine but does not scale well to a large number of runs.
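Until parallel rendering is supported natively, the copy workaround can at least be automated per task instead of maintained by hand. The following is a sketch (not tested on a real cluster) in which each array task copies the shared test.qmd into its own scratch directory created with mktemp, so concurrent renders never read or write the same file. The stub-creation line and the `command -v` guard exist only to keep the snippet self-contained.

```shell
#!/bin/bash
# Sketch: isolate each SLURM array task in its own scratch directory so
# parallel renders of the same document never share an input file.
set -eu
ID=${SLURM_ARRAY_TASK_ID:-1}     # task index; defaults to 1 off-cluster
SRC_DIR=$PWD
WORK_DIR=$(mktemp -d)            # private per-task directory

# On the cluster test.qmd already exists; this stub only keeps the
# sketch runnable stand-alone.
[ -f test.qmd ] || printf -- '---\ntitle: "Test"\n---\n' > test.qmd

cp test.qmd "$WORK_DIR/"         # each task gets its own private copy
cd "$WORK_DIR"

# Render inside the private directory, then collect the result.
if command -v quarto >/dev/null 2>&1; then
  quarto render test.qmd --output "test${ID}.html"
  cp "test${ID}.html" "$SRC_DIR/"
fi

cd "$SRC_DIR"
rm -rf "$WORK_DIR"               # discard the scratch copy
```

Each task pays the cost of one extra copy, but the copies are created and discarded automatically rather than being n hand-maintained .qmd files.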

Your environment

Platform: x86_64-pc-linux-gnu (64-bit) running on a HPC cluster
Running under: CentOS Linux 7 (Core)

Quarto check output

[✓] Checking versions of quarto binary dependencies...
Pandoc version 3.1.1: OK
Dart Sass version 1.55.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
Version: 1.3.361
Path: /home/mklose/opt/quarto-1.3.361/bin

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
Version: 3.10.8
Path: /trinity/shared/easybuild/software/Python/3.10.8-GCCcore-12.2.0/bin/python3
Jupyter: (None)

  Jupyter is not available in this Python installation.
  Install with python3 -m pip install jupyter

[✓] Checking R installation...........OK
Version: 4.2.2
Path: /trinity/shared/easybuild/software/R/4.2.2-foss-2022b/lib64/R
LibPaths:
- /home/mklose/R/x86_64-pc-linux-gnu-library/4.2
- /trinity/shared/easybuild/software/R/4.2.2-foss-2022b/lib64/R/library
knitr: 1.42
rmarkdown: 2.20

[✓] Checking Knitr engine render......OK

Labels: enhancement (New feature or request)