Update kraken2 and add support for custom databases by pvanheus · Pull Request #7257 · galaxyproject/tools-iuc

pvanheus · 2025-09-04T18:20:22Z

FOR CONTRIBUTOR:

I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
License permits unrestricted use (educational + commercial)
This PR adds a new tool or tool collection
This PR updates an existing tool or tool collection
This PR does something else (explain below)

pvanheus · 2025-09-04T18:22:29Z

This PR both updates the kraken2 tool to version 2.1.6 and adds support for custom databases. Custom databases are supplied as datasets of the directory datatype. Because zip files without a common prefix get unpacked to a directory prefixed by the name of the original dataset (see the galaxy code), the tool uses some shell code to find the complete path to the custom database.
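To illustrate the path lookup described above, here is a minimal sketch in Python. It is not the PR's code (the PR does this in shell inside the tool wrapper). The function name `find_kraken2_db` and the constant `REQUIRED_FILES` are hypothetical, and the list of required files is taken from the parameter help text quoted in the review below.

```python
from pathlib import Path
from typing import Optional

# Files that make up a Kraken2 database, per the tool's help text.
REQUIRED_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def find_kraken2_db(root: Path) -> Optional[Path]:
    """Return the shallowest directory at or below root that holds all required files."""
    # Check root first, then subdirectories ordered by depth (ties broken by name).
    subdirs = sorted(
        (p for p in root.rglob("*") if p.is_dir()),
        key=lambda p: (len(p.parts), str(p)),
    )
    for directory in [root] + subdirs:
        if all((directory / name).is_file() for name in REQUIRED_FILES):
            return directory
    # No directory under root looks like a Kraken2 database.
    return None
```

Searching from the dataset's root rather than assuming a fixed layout means the same lookup would work whether the archive unpacked flat or into a prefixed subdirectory.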

bernt-matthias · 2025-09-04T19:32:02Z

+                </param>
+            </when>
+            <when value="history">
+                <param type="data" name="custom_database" label="Kraken2 database" format="directory" help="A Kraken2 database is a directory containing the files hash.k2d, opts.k2d and taxo.k2d"/>


How big are those typically?

Should we subclass directory for that?

We should also add a tool that creates the K2 databases, shouldn't we?

Gigabyte scale. E.g. the Mycobacterium genus database from Hall is 7.6 GB (https://zenodo.org/records/8343322). And yes, ideally we need a datatype and a tool for creating Kraken2 databases, but several Kraken2 databases are already available online, and I am using them for e.g. my M. tuberculosis analyses. I'm happy to work on PRs for the Kraken2 database datatype and the tool that creates them, but I don't think that work should block this PR.

Would this be of use for many users?

How about just adding this to the data manager?

We are updating the DM anyway in #6980.

The DM is very useful, but the use case here is Galaxy's Workflow Landings API, which is oriented around inputs, not DMs. It would also cover custom databases, once there is a kraken2-build tool.

@mvdbeek when are data manager bundles expected? Would they help here?

Directory datasets are a much better idea

> the use case here is in Galaxy's Workflow Landings API

Would this then mean that the user uploads multiple GB "just" for a single workflow run? I have not yet used Galaxy's Workflow Landings API - how is this supposed to be used?

pvanheus · 2025-09-06T06:16:10Z

Some notes on the future kraken2 custom database builder tool:

The kraken2 db building process involves four steps:

1. Download the NCBI taxonomy (kraken2-build --download-taxonomy or k2 download-taxonomy) to the database directory. This downloads some 7GB of data from ftp.ncbi.nlm.nih.gov/pub/taxonomy, uncompressing to 54GB.
2. Either or both of:
   - Download library sequence files (kraken2-build --standard / kraken2-build --special / kraken2-build --download-library or k2 download-library) to the DB directory
   - Add your own FASTA sequence data (kraken2-build --add-to-library or k2 add-to-library)
3. Build the database (kraken2-build --build or k2 build)
4. Clean up (remove temporary files)

This gives flexibility in how databases are built, and therefore it makes sense to have tools for (1), and some combination of (2), (3) and (4), balancing the need to not continually re-download the taxonomy files with the need to not pollute the history with intermediate products.
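The steps above can be sketched as a small planner that assembles the `kraken2-build` invocations without running them. This is only an illustration under stated assumptions: the function name `kraken2_build_commands` and its parameters are hypothetical, and mapping the clean-up step to `kraken2-build --clean` is my assumption, since the notes above do not name a flag for it.

```python
from typing import List, Optional, Sequence


def kraken2_build_commands(
    db_dir: str,
    library: Optional[str] = None,
    fasta_files: Sequence[str] = (),
) -> List[List[str]]:
    """Return kraken2-build argument lists in order: taxonomy, library, build, clean-up."""
    # Step 2 needs a downloaded library, custom FASTA files, or both.
    if library is None and not fasta_files:
        raise ValueError("need a downloaded library, custom FASTA files, or both")

    # Step 1: download the NCBI taxonomy into the database directory.
    commands = [["kraken2-build", "--download-taxonomy", "--db", db_dir]]

    # Step 2a: optionally download a reference library (e.g. "bacteria").
    if library is not None:
        commands.append(["kraken2-build", "--download-library", library, "--db", db_dir])

    # Step 2b: optionally add user-supplied FASTA files, one call per file.
    for fasta in fasta_files:
        commands.append(["kraken2-build", "--add-to-library", fasta, "--db", db_dir])

    # Step 3: build the database.
    commands.append(["kraken2-build", "--build", "--db", db_dir])

    # Step 4: clean up intermediate files (assumed to map to --clean).
    commands.append(["kraken2-build", "--clean", "--db", db_dir])
    return commands
```

Each returned list could be passed to `subprocess.run(..., check=True)`. Keeping the taxonomy download as its own first entry mirrors the idea of splitting (1) into a separate tool so that it need not be repeated for every database.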

bernt-matthias · 2025-11-23T13:01:01Z

> Download the NCBI taxonomy (kraken2-build --download-taxonomy or k2 download-taxonomy) to the database directory. This downloads some 7GB of data from ftp.ncbi.nlm.nih.gov/pub/taxonomy, uncompressing to 54GB

The data is available from the ncbi_taxonomy data table.

The procedure that you outlined is exactly what should happen for building custom entries with the DM https://github.com/galaxyproject/tools-iuc/blob/main/data_managers/data_manager_build_kraken2_database/data_manager/kraken2_build_database.xml

pvanheus · 2026-04-10T05:45:29Z

Just to clarify this most recent work: I have postponed working on the kraken2 database builder and, as per @bernt-matthias' suggestion, will draw on the similar code that is in the data manager. This PR simply updates to the latest kraken2 and adds support for custom databases (via a Directory input).

pvanheus added 2 commits September 4, 2025 12:53

Remove FASTA descriptions

910a335

Update to 2.1.6 and add support for custom databases

f77e3f8

bernt-matthias reviewed Sep 4, 2025

bernt-matthias mentioned this pull request Sep 4, 2025

Kraken2: bump #7254

Closed

5 tasks

mvdbeek reviewed Sep 5, 2025

Comment thread tool_collections/kraken2/kraken2/kraken2.xml

Add test for providing directory directly

ea73387

pvanheus mentioned this pull request Jan 5, 2026

Give kraken2 100G from 100MB input usegalaxy-eu/infrastructure-playbook#1606

Open

pvanheus mentioned this pull request Jan 14, 2026

Add TB variant analysis workflow galaxyproject/iwc#1062

Open

12 tasks

pvanheus added 5 commits April 9, 2026 19:34

Merge branch 'main' into add_kraken2_from_directory

25a997b

Remove hack for dealing with bug related to flat zip files

29e8e4a

Fix broken XML

6a09618

Return description to test output FASTA

00a213d

Bump kraken2 version suffix

b8c02a8

pvanheus mentioned this pull request Apr 10, 2026

Add kraken2 datatype galaxyproject/galaxy#22455

Draft

4 tasks

pvanheus wants to merge 8 commits into galaxyproject:main from pvanheus:add_kraken2_from_directory

Review thread comments, in order: bernt-matthias (Sep 4, 2025), pvanheus (Sep 5, 2025), bernt-matthias (Sep 5, 2025), pvanheus (Sep 5, 2025), bernt-matthias (Sep 5, 2025), mvdbeek (Sep 5, 2025), bernt-matthias (Oct 14, 2025)

3 participants
