Observable framework notebook
Sets up a notebook with content based on the previous jupyter notebook.
See the added notebook/README.md for more details including how to
deploy the notebook to GitHub pages.
jameshadfield committed Mar 7, 2024
1 parent 8cafc40 commit c31e66a
Showing 9 changed files with 3,214 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -13,6 +13,9 @@ stats.json
# temporary directories useful for local development
scratch/

# used as a separate git workdir for gh-pages branch
/dist/

# Downloaded remote files from sources we expect
/nextstrain-ncov-private/
/nextstrain-data/
22 changes: 22 additions & 0 deletions ingest/Snakefile
@@ -141,4 +141,26 @@ rule copy_ingest_files:
        """
        cp {input.sequences} {output.sequences}
        cp {input.aligned} {output.aligned}
        """


rule align_unrotated:
    """
    Align all genomes before rotation for parsing by our notebook.
    This rule must be called explicitly; it is not part of the DAG that produces the outputs of `rule all`.
    """
    input:
        sequences = "data/curated-sequences.fasta",
    output:
        alignment = "data/curated-sequences.aligned.fasta",
    params:
        dataset = config['nextclade_dataset'],
    threads: 4
    shell:
        """
        nextclade run \
            -j {threads} --silent --replace-unknown \
            --input-dataset {params.dataset} \
            --output-fasta {output.alignment} \
            {input.sequences}
        """
5 changes: 5 additions & 0 deletions notebook/.gitignore
@@ -0,0 +1,5 @@
.DS_Store
dist/
docs/.observablehq/cache/
node_modules/
yarn-error.log
49 changes: 49 additions & 0 deletions notebook/README.md
@@ -0,0 +1,49 @@
# Nextstrain Hepatitis-B (HBV) builds

This is an [Observable Framework](https://observablehq.com/framework) project.
The notebook it generates is published to GitHub Pages.


### Environment / Dependencies
You'll need Node.js installed in the same environment as `python3`, ideally the
environment you already use for the bioinformatics workflows. This is because
some of the notebook's data loaders (`./docs/data`) are written in Python. Then
install the notebook dependencies with:
```
npm ci
```

### Running the notebook locally
```
npm run dev
```

### Deploying to GitHub Pages

GitHub Pages serves this notebook at [nextstrain.github.io/hbv](https://nextstrain.github.io/hbv). The deploy process is inspired by [this gist](https://gist.github.com/renatoathaydes/75fcf8c5104134ae112f367d5e4f3f50).

> The deploy process is not yet automated because building the notebook requires data from the ingest pipeline. As we automate these pipelines, we can also automate the notebook build and GitHub Pages deployment.

**prerequisites**

Set up a git worktree to track the `gh-pages` branch, for example:

```
git worktree add -B gh-pages dist origin/gh-pages
```

The files inside `./dist` represent the top-level files for the repo on the (orphaned) `gh-pages` branch.

**deploy**

```sh
# build the notebook
cd notebook && npm run build && cd -
# copy the files to the gh-pages branch (worktree)
rm -rf dist/* && cp -r notebook/dist/* ./dist/
# (optional) preview the build
npx http-server dist
# commit & push the gh-pages branch
cd dist && git add . && git commit -m 'notebook rebuild for github pages' && git push && cd ..
```
10 changes: 10 additions & 0 deletions notebook/docs/index.md
@@ -0,0 +1,10 @@
---
toc: false
---

# Nextstrain Hepatitis-B (HBV) builds


<div class="caution">
This repo, including this notebook, is a work in progress and should not be considered scientifically valid at this time.
</div>
85 changes: 85 additions & 0 deletions notebook/docs/rotated.md
@@ -0,0 +1,85 @@
---
title: Rotating genomes
---

# Alignment lengths

HBV has a circular genome, so the genomes found on NCBI have been linearised, and not always at the same position.
The following plot shows the percentage of each genome which aligns to the reference. For the genomes as found on NCBI (black dots), around 1000 seem to have only a 50% match.
If we rotate the genomes so that they all start at the same place, we can improve this (coloured dots).

Genomes with over 90% match: ${numAbove(rotatedMatchCount, 90)} (rotated) vs ${numAbove(rawMatchCount, 90)} (NCBI)

Genomes with over 80% match: ${numAbove(rotatedMatchCount, 80)} (rotated) vs ${numAbove(rawMatchCount, 80)} (NCBI)
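
The rotation idea described above can be illustrated with a minimal Python sketch. This is not the method used by the workflow (the rotation code is not part of this commit); it assumes an exact-match anchor sequence is available, and the names `rotate` and `rotate_to_anchor` are hypothetical. Real genomes vary, so an actual implementation would likely need an alignment-based way to locate the start position.

```python
from typing import Optional


def rotate(seq: str, start: int) -> str:
    """Rotate a linearised circular sequence so that it begins at index `start`."""
    if not seq:
        return seq
    start %= len(seq)
    return seq[start:] + seq[:start]


def rotate_to_anchor(seq: str, anchor: str) -> Optional[str]:
    """Rotate `seq` so that it begins with `anchor` (exact match only).

    Returns None if the anchor is not found. `anchor` is a hypothetical,
    user-supplied motif marking the desired start of the genome.
    """
    if not anchor or len(anchor) > len(seq):
        return None
    # Append the first len(anchor) - 1 bases so that an anchor spanning the
    # artificial break point of the linearised genome can still be found.
    idx = (seq + seq[: len(anchor) - 1]).find(anchor)
    if idx == -1:
        return None
    return rotate(seq, idx)


# The anchor "CAGT" spans the break point of this toy sequence.
print(rotate_to_anchor("GTACCA", "CAGT"))  # CAGTAC
```

Because the genome is circular, rotating changes only where the linear representation starts, not its content, which is why a consistent start position can improve the alignment to a linear reference.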

```js
display(Plot.plot({
  marginTop: 20,
  marginRight: 20,
  marginBottom: 30,
  marginLeft: 60,
  grid: true,
  inset: 10,
  // aspectRatio: 0.6,
  width: 1000,
  figure: true,
  color: {legend: true},
  className: "largerFont",
  x: { tickSize: 15, label: "Genomes"},
  y: { tickSize: 15, label: "Aln match (%)"},
  marks: [
    Plot.frame(),
    Plot.dot(rawMatchCount, {x: "idx", y: "count", stroke: "black"}),
    Plot.dot(rotatedMatchCount, {x: "idx", y: "count", stroke: "genotype", fill: "genotype", symbol: "circle", r: 3}),
    Plot.crosshair(rotatedMatchCount, {x: "idx", y: "count", stroke: "genotype"}),
  ]
}))
```
<style type="text/css">
  .largerFont {
    font-size: 16px;
  }
</style>



```js
const parseMetadata = async () => {
  return Object.fromEntries(
    (await FileAttachment("data/metadata.tsv").tsv())
      .map((row) => [row.name, row])
  )
}
const metadata = await parseMetadata()
```



```js
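const refLength = 3182; // reference length, used as the denominator for the percentages
const parseAlignment = async (attachment) => {
  return Object.entries(await attachment.json())
    .map(([name, count]) => ({name, count}))
    // sort by descending match count (a numeric comparator also handles ties)
    .sort((a, b) => b.count - a.count)
    // convert the raw count into a percentage of the reference length
    .map((d) => ({...d, count: d.count / refLength * 100}))
    // idx is the rank after sorting; the plot uses it as the x position
    .map((d, idx) => ({...d, idx}))
    .map((d) => ({...d, genotype: metadata[d.name]?.genotype_genbank}))
}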
```
```js
const rotatedMatchCount = await parseAlignment(FileAttachment("data/alignment_rotated.json"))
const rawMatchCount = await parseAlignment(FileAttachment("data/alignment_raw.json"))
// console.log("rotatedMatchCount", rotatedMatchCount.slice(0, 10))
// console.log("rawMatchCount", rawMatchCount.slice(0, 10))
```
```js
const numAbove = (dataset, perc) => {
  return dataset.filter((d) => d.count > perc).length
}
```
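
The cells above load `data/alignment_rotated.json` and `data/alignment_raw.json` as objects mapping a sequence name to a count. The Python data loaders that produce these files are not shown in this part of the commit, so the following is only a sketch of how such a file could be produced from an aligned FASTA. It assumes that "match" means an identical A/C/G/T at the same reference position, which may differ from what the real loaders count; the function names are hypothetical. An Observable Framework data loader is a script that writes the file contents to standard output, which is what the last line imitates.

```python
import json
import sys
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def count_matches(reference: str, aligned: str) -> int:
    """Count positions where the aligned base is an A/C/G/T identical to the reference.

    Assumption: `aligned` is in reference coordinates (insertions removed).
    """
    return sum(
        1
        for r, q in zip(reference.upper(), aligned.upper())
        if q == r and q in "ACGT"
    )


def match_counts(reference: str, records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Build the {name: count} mapping that the notebook cells consume."""
    return {name: count_matches(reference, seq) for name, seq in records}


# Toy input standing in for an aligned FASTA file and its reference.
example_fasta = """>seqA some description
ACGTAC
>seqB
AC--AN
"""
reference = "ACGTAC"
counts = match_counts(reference, parse_fasta(example_fasta.splitlines()))
json.dump(counts, sys.stdout)  # {"seqA": 6, "seqB": 3}
```

Emitting raw counts and leaving the percentage conversion to the notebook keeps the loader independent of the `refLength` constant used in `parseAlignment`.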
28 changes: 28 additions & 0 deletions notebook/observablehq.config.ts
@@ -0,0 +1,28 @@
// See https://observablehq.com/framework/config for documentation.
export default {
  // The project’s title; used in the sidebar and webpage titles.
  title: "Nextstrain Hepatitis-B (HBV) builds",

  // The pages and sections in the sidebar. If you don’t specify this option,
  // all pages will be listed in alphabetical order. Listing pages explicitly
  // lets you organize them into sections and have unlisted pages.
  // pages: [
  //   {
  //     name: "Examples",
  //     pages: [
  //       {name: "Dashboard", path: "/example-dashboard"},
  //       {name: "Report", path: "/example-report"}
  //     ]
  //   }
  // ],

  // Some additional configuration options and their defaults:
  // theme: "default", // try "light", "dark", "slate", etc.
  // header: "", // what to show in the header (HTML)
  // footer: "Built with Observable.", // what to show in the footer (HTML)
  // toc: true, // whether to show the table of contents
  // pager: true, // whether to show previous & next links in the footer
  // root: "docs", // path to the source root for preview
  // output: "dist", // path to the output root for build
  // search: true, // activate search
};