Last docs updates and release #248

Merged
35 changes: 34 additions & 1 deletion docs/vcf2zarr/tutorial.md
@@ -18,6 +18,17 @@ convert your data, basically providing different levels of
convenience and flexibility corresponding to what you might
need for small, intermediate and large datasets.

:::{warning}
The documentation of vcf2zarr is under development, and
some bits are more polished than others. This "tutorial"
is experimental, and will likely evolve into a slightly
different format in the near future. It is
a work in progress and incomplete. The
{ref}`sec-vcf2zarr-cli-ref` should be complete
and authoritative, however.
:::


## Small dataset

The simplest way to convert VCF data to Zarr is to use the
@@ -229,11 +240,33 @@ granularity). You should be careful to use this value in your scripts


Once ``dexplode-init`` is done and we know how many partitions we have,
- we need to call ``dexplode-partition`` this number of times.
+ we need to call
+ {ref}`dexplode-partition<cmd-vcf2zarr-dexplode-partition>` this number of times:

```{code-cell}
vcf2zarr dexplode-partition sample-dist.icf 0
vcf2zarr dexplode-partition sample-dist.icf 1
vcf2zarr dexplode-partition sample-dist.icf 2
```

This is not how it would be done in practice, of course: you would
use your cluster scheduler of choice to dispatch these operations.

:::{todo}
Document how to do this conveniently over some popular schedulers.
:::
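
Until that is documented, the following is only a minimal sketch of the
dispatch step, not a recommended method. It prints one ``dexplode-partition``
command per partition so that the list can be handed to whatever scheduler
you use. The helper name ``partition_commands`` is hypothetical, and the
partition count of 3 simply mirrors the example above.

```shell
# Hypothetical helper (not part of vcf2zarr): print one
# dexplode-partition command per partition, indexed from 0.
partition_commands() {
    icf_path="$1"        # path to the ICF directory, e.g. sample-dist.icf
    num_partitions="$2"  # number of partitions, as known after dexplode-init
    i=0
    while [ "$i" -lt "$num_partitions" ]; do
        echo "vcf2zarr dexplode-partition $icf_path $i"
        i=$((i + 1))
    done
}

# Print the three commands used in the example above.
partition_commands sample-dist.icf 3
```

Piping this output to ``sh`` would run the commands one after another; a
scheduler would instead take each line as a separate job.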

:::{tip}
Use the ``--one-based`` argument in cases in which it's more convenient
to index the partitions from 1 to n, rather than 0 to n - 1.
:::

Finally we need to call
{ref}`dexplode-finalise<cmd-vcf2zarr-dexplode-finalise>`:
```{code-cell}
vcf2zarr dexplode-finalise sample-dist.icf
```

:::{todo}
Document the process for dencode, noting the information output about
memory requirements.
:::
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -24,7 +24,7 @@ dependencies = [
]
requires-python = ">=3.9"
classifiers = [
- "Development Status :: 3 - Alpha",
+ "Development Status :: 4 - Beta",
"License :: OSI Approved :: Apache Software License",
"Operating System :: POSIX",
"Operating System :: POSIX :: Linux",