[DOC] add docs for collection forking #5229

Merged

philipithomas merged 5 commits into main from philipithomas/forking-docs

Aug 8, 2025

+90 −1

Member

philipithomas commented Aug 7, 2025

Documents forking feature for docs.trychroma.com

Preview: https://chroma-git-philipithomas-forking-docs-chromacore.vercel.app/cloud/collection-forking


          [DOC] add docs for collection forking

24dcf01

Contributor

propel-code-bot bot commented Aug 7, 2025

Add Documentation for Chroma Cloud Collection Forking Feature

This pull request introduces comprehensive documentation for the new collection forking feature in Chroma Cloud, describing its copy-on-write storage model, usage patterns, pricing, and quotas/limits. It adds a dedicated "Collection Forking" documentation page, updates relevant references in the pricing and quotas/limits docs, incorporates an explanatory diagram, and ensures discoverability via the sidebar navigation. Several rounds of feedback have been addressed to clarify copy semantics, quota behaviors, correct example code, and stabilize formatting and terminology throughout.

Key Changes

• Added docs/markdoc/content/cloud/collection-forking.md: new page detailing forking semantics, cost, quotas, and intended usage scenarios; includes example code and a workflow diagram.
• Modified docs/markdoc/content/cloud/pricing.md: added a section about forking costs and linked to the forking documentation.
• Updated docs/markdoc/content/cloud/quotas-limits.md: added the fork edges quota (4,096) and a link to further details in the forking documentation.
• Updated docs/markdoc/content/sidebar-config.ts: added Collection Forking to the Cloud doc section navigation sidebar.
• Added/modified diagram assets (e.g., fork-edges-light.png, fork-edges-dark.png) to visually illustrate fork edge quota/structure.

Affected Areas

• Documentation content for Cloud features
• Pricing documentation
• Quota/limits documentation
• Sidebar navigation/configuration
• Image assets for Cloud documentation

This summary was automatically generated by @propel-code-bot

philipithomas requested review from itaismith and HammadB

August 7, 2025 23:08

vercel bot commented Aug 7, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
chroma	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 8, 2025 8:46pm

vercel bot deployed to Preview

August 7, 2025 23:11


          update pricing

3d703ff

propel-code-bot bot reviewed

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md

+              # Create a forked collection. Name must be unique within the database.
+              forked_collection = source_collection.fork(name="main-repo-index-pr-1234")
+              # Forked collection is immediately queryable; changes are isolated

Contributor

propel-code-bot bot Aug 7, 2025

[Documentation]

Add missing article for grammatical correctness: change "Forked collection is immediately queryable; changes are isolated" to "The forked collection is immediately queryable; changes are isolated."

itaismith reviewed

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated


		Forking lets you create a new collection from an existing one instantly, using copy-on-write under the hood. The forked collection initially shares its data with the source and only incurs additional storage for incremental changes you make afterward.

		{% Banner type="info" %}

Contributor

itaismith Aug 7, 2025

"info" is not supported. It's "note" for yellow, "tip" for blue, and "warn" for red
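
Editorial illustration of the copy-on-write wording in the paragraph under review: a minimal in-memory sketch, not Chroma's storage engine. The class `CowCollection` and its methods `upsert`, `get`, `fork`, and `extra_storage` are hypothetical names invented for this example. It assumes only what the paragraph states: a fork initially shares data with its source, and each side stores only the changes made after the fork.

```python
class CowCollection:
    """Toy copy-on-write collection (hypothetical, for illustration only)."""

    def __init__(self, shared_layers=()):
        # Layers shared with relatives in the fork tree; never mutated.
        self._shared = tuple(shared_layers)
        # Writes made by this collection since its last fork point.
        self._delta = {}

    def upsert(self, key, value):
        self._delta[key] = value

    def get(self, key):
        if key in self._delta:
            return self._delta[key]
        # Fall back to shared layers, newest first.
        for layer in reversed(self._shared):
            if key in layer:
                return layer[key]
        raise KeyError(key)

    def fork(self):
        # Freeze the current delta into a shared layer. No records are
        # copied; source and fork both continue with empty deltas.
        self._shared = self._shared + (self._delta,)
        self._delta = {}
        return CowCollection(self._shared)

    def extra_storage(self):
        # Records stored privately since the fork point.
        return len(self._delta)


source = CowCollection()
source.upsert("doc-1", "alpha")
source.upsert("doc-2", "beta")

forked = source.fork()
forked.upsert("doc-2", "beta-edited")  # isolated from the source

assert source.get("doc-2") == "beta"
assert forked.get("doc-2") == "beta-edited"
assert forked.get("doc-1") == "alpha"  # read through the shared layer
assert forked.extra_storage() == 1     # only the incremental change
```

The layered design keeps shared data immutable, which is what makes the fork step cheap in this toy model: forking appends a reference rather than duplicating records.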

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated


		{% /TabbedCodeBlock %}

		For a longer end-to-end demo, see the advanced forking example in the Chroma repo: [Forking notebook](https://github.com/chroma-core/chroma/blob/main/examples/advanced/forking.ipynb).

Contributor

itaismith Aug 7, 2025

In this notebook you can find a comprehensive demo, where we index a codebase in a Chroma collection, and use forking to efficiently create collections for new branches.

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated


		## Quotas and errors

		Forking is subject to a limit on the total number of fork edges from the root. This counts every edge in the fork graph from the root collection (e.g., A→B→C is 2; A→[B, C], B→D is 3). The current default limit is 4,096. If you exceed it, the fork request returns a quota error for the `NUM_FORKS` rule — catch it and fall back to creating a new collection with a full copy.

Contributor

itaismith Aug 7, 2025

I think a diagram would be better here
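
Editorial illustration of the edge-counting rule in the paragraph under review: a small sketch that reproduces the two worked examples (A→B→C is 2; A→[B, C], B→D is 3). The function `count_fork_edges` and the dictionary representation of the fork graph are assumptions made for this example, not part of the Chroma client.

```python
def count_fork_edges(children, root):
    """Count every fork edge reachable from `root`.

    `children` maps a collection name to the names of the collections
    forked directly from it (a hypothetical representation).
    """
    edges = 0
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        edges += len(kids)  # one edge per fork call on this node
        stack.extend(kids)
    return edges


chain = {"A": ["B"], "B": ["C"]}          # A→B→C
branched = {"A": ["B", "C"], "B": ["D"]}  # A→[B, C], B→D

assert count_fork_edges(chain, "A") == 2
assert count_fork_edges(branched, "A") == 3
```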

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated

+              ## When to use forking
+              - **Data versioning/checkpointing**: Maintain consistent snapshots as your data evolves.
+              - **Git-like workflows**: For example, index a pull request by forking the main repository’s collection, then apply the diff to the fork. This saves both write and storage costs compared to re-ingesting the entire dataset.

Contributor

itaismith Aug 7, 2025

I would say "index a branch by forking from its divergence point". "pull request" is more specific to GitHub.

Also if this refers specifically to code, change "entire dataset" to "entire codebase"?

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated

+              - **Data versioning/checkpointing**: Maintain consistent snapshots as your data evolves.
+              - **Git-like workflows**: For example, index a pull request by forking the main repository’s collection, then apply the diff to the fork. This saves both write and storage costs compared to re-ingesting the entire dataset.
+              - **Experimentation**: Safely test schema changes, new embedding functions, or cleaning pipelines without touching production data.

Contributor

itaismith Aug 7, 2025

I don't think you'd be able to test a new embedding function since the forked collection should have the same configuration?

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated


		## Notes

		- Forking is within the same database.

Contributor

itaismith Aug 7, 2025

Your forked collections will belong to the same db as the source


          copy updates

ed9ec80

propel-code-bot bot reviewed

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated

+              - **Data versioning/checkpointing**: Maintain consistent snapshots as your data evolves.
+              - **Git-like workflows**: For example, index a pull request by forking the main repository’s collection, then apply the diff to the fork. This saves both write and storage costs compared to re-ingesting the entire dataset.
+               - **Git-like workflows**: For example, index a branch by forking from its divergence point, then apply the diff to the fork. This saves both write and storage costs compared to re-ingesting the entire codebase.

Contributor

propel-code-bot bot Aug 8, 2025

[Documentation]

Remove duplicate bullet and fix indentation: two consecutive "Git-like workflows" bullets (lines 68-69) repeat the same heading, and the second one has an extra leading space. Merge or delete one to avoid confusion and ensure proper Markdown rendering.


          update forking docs and add diagram

d962c89

philipithomas requested a review from itaismith

August 8, 2025 20:42

propel-code-bot bot reviewed

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated


		## Quotas and errors

		Chroma limits the number of fork edges in your fork tree. Every time you call "fork", a new edge is created from the parent to the child. The count includes edges created by forks on the root collection and on any of its descendants; see the diagram below. The current default limit is 4,096 edges per tree. If you delete a collection, its edge remains in the tree and still counts.

Contributor

propel-code-bot bot Aug 8, 2025

[Documentation]

Remove the leading space at the start of this paragraph to prevent unintended indentation.
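
Editorial illustration of the quota behavior described in the paragraph under review: a toy model in which deleting a collection does not free its edge, combined with the fallback suggested in the earlier revision (catch the quota error and make a full copy). Every name here (`ForkQuotaExceeded`, `ForkTree`, `fork_or_full_copy`) is hypothetical; the real client's exception type is not shown in this thread, so treat this as a sketch of the pattern only.

```python
class ForkQuotaExceeded(Exception):
    """Hypothetical stand-in for the quota error on the fork rule."""


class ForkTree:
    """Toy model of one fork tree and its edge quota."""

    def __init__(self, root, max_edges=4096):
        self.live = {root}
        self.edge_count = 0
        self.max_edges = max_edges

    def fork(self, parent, child):
        if parent not in self.live:
            raise KeyError(parent)
        if child in self.live:
            raise ValueError(f"name already in use: {child}")
        if self.edge_count >= self.max_edges:
            raise ForkQuotaExceeded("fork edge limit reached")
        self.live.add(child)
        self.edge_count += 1

    def delete(self, name):
        # The collection goes away, but its edge still counts.
        self.live.discard(name)


def fork_or_full_copy(tree, parent, child):
    """Try a fork; on a quota error, report that a full copy is needed."""
    try:
        tree.fork(parent, child)
        return "forked"
    except ForkQuotaExceeded:
        return "full-copy"


tree = ForkTree("A", max_edges=2)  # tiny limit so the example is short
tree.fork("A", "B")
tree.fork("B", "C")
tree.delete("C")

assert tree.edge_count == 2                            # deletion freed nothing
assert fork_or_full_copy(tree, "A", "D") == "full-copy"
```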

vercel bot deployed to Preview

August 8, 2025 20:46

itaismith approved these changes

HammadB reviewed

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated

		@@ -0,0 +1,80 @@
		# Collection Forking

		Collection forking enables instant, zero-copy collection branching in Chroma Cloud.

Collaborator

HammadB Aug 8, 2025

Using the word "branching" is a bit odd; we call it forking.

Collaborator

HammadB Aug 8, 2025

I wouldn't say both zero-copy AND copy-on-write. I would just say "copy-on-write" everywhere.

HammadB reviewed

docs/docs.trychroma.com/markdoc/content/cloud/collection-forking.md Outdated

+              Forking lets you create a new collection from an existing one instantly, using copy-on-write under the hood. The forked collection initially shares its data with the source and only incurs additional storage for incremental changes you make afterward.
+              {% Banner type="tip" %}
+              **Forking is available in Chroma Cloud only.** The file system on single-node Chroma does not support forking — see [Single-Node Chroma: Performance and Limitations](../guides/deploy/performance). Chroma Cloud uses block storage that enables true copy-on-write semantics.

Collaborator

HammadB Aug 8, 2025

Linking out to performance and limitations seems a bit odd.
"Chroma Cloud uses block storage that enables true copy-on-write semantics." -> This doesn't quite make sense to me


          integrate feedback

e093c60

Member Author

philipithomas commented Aug 8, 2025

1 Job Failed:

PR checks / Python tests / test-cluster-rust-frontend (3.9, chromadb/test/property/test_add.py)

No logs available for this step.

Summary: 1 successful workflow, 1 failed workflow

❌ PR checks (52 jobs succeeded, 1 job pending, 1 job failed)
✅ Check PR Title (1 job succeeded)

Last updated: 2025-08-08 21:59:20 UTC

HammadB approved these changes

philipithomas merged commit fcf8d68 into main

57 of 59 checks passed
