
Conversation

elseml
Member

@elseml elseml commented Sep 17, 2025

Summary

TL;DR: LLM code generation for BayesFlow can be sloppy. This PR introduces automatically generated context files that constrain LLM code generation.

Full summary: This PR introduces a pipeline for automatically generating LLM context markdown files for BayesFlow. The goal is to make LLM assistance more accurate and user-friendly by providing lightweight, up-to-date repository snapshots (aka some very basic RAG) based on gitingest. Users can download a single context file to provide their LLM with the latest BayesFlow state.
This addition targets the broad pip-install user base (power users can simply point an IDE-integrated LLM at their local repository clone, but standard users may have neither a local clone nor an IDE-integrated LLM). As discussed previously, this is a WIP PR for openly testing and discussing whether this addition lowers the barrier to entry for BayesFlow or adds confusion. See llm_context/README.md for further information.
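
For readers who have not used gitingest: assuming its Python API (a single ingest() call that returns a summary, a directory tree, and the concatenated file contents), the core ingestion step would look roughly like this; the repository URL is just for illustration:

```python
# Sketch of the underlying gitingest call; assumes gitingest's ingest()
# API, which returns (summary, tree, content) for a repo path or URL.
from gitingest import ingest  # pip install gitingest

summary, tree, content = ingest("https://github.com/bayesflow-org/bayesflow")
print(summary)   # rough size/token statistics of the ingested repo
# `content` is the flattened text that ends up in the context files.
```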

Key Additions

  • llm_context/ folder added
    • Contains the build script (build_llm_context.py), requirements, and generated context files (a minimal sketch of the build logic follows this list).
  • Automatic context file generation
    • Produces two files per release:
      • llm_context_full-<TAG>.md → full project snapshot: README + examples + source code (bayesflow/) (~250k tokens).
      • llm_context_compact-<TAG>.md → compact snapshot: README + examples (~45k tokens).
  • Notebook handling
    • Example notebooks (examples/*.ipynb) are automatically converted to Markdown before context file generation.
  • Automatic cleanup
    • Old bayesflow-context-* files in llm_context/ are removed before generating new ones.
  • GitHub Action integration (currently experimental, not thoroughly tested yet)
    • On each release, the workflow builds fresh context files, substitutes the current release tag for the <TAG> placeholder, and uploads the files to the corresponding GitHub release.
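
To make the pipeline concrete, here is a minimal sketch of what a build script like build_llm_context.py might do. It is a sketch under stated assumptions: the real script uses gitingest, which plain file concatenation stands in for here, and any paths, names, and helpers beyond those given in this description are hypothetical.

```python
"""Minimal sketch of the context-file build pipeline (hypothetical names)."""
from pathlib import Path

from nbconvert import MarkdownExporter  # pip install nbconvert

REPO = Path(".")
OUT = REPO / "llm_context"
TAG = "<TAG>"  # the release workflow substitutes the actual tag and
               # uploads the results, e.g. via: gh release upload <tag> llm_context/*.md


def clean_old_context_files() -> None:
    """Remove previously generated context files before rebuilding."""
    for old in OUT.glob("bayesflow-context-*"):  # pattern per the PR description
        old.unlink()


def notebooks_as_markdown() -> str:
    """Convert examples/*.ipynb to Markdown and concatenate them."""
    exporter = MarkdownExporter()
    parts = []
    for nb in sorted((REPO / "examples").glob("*.ipynb")):
        if nb.name == "From_BayesFlow_1.1_to_2.0.ipynb":
            continue  # BF1 code skewed LLM output, so it is excluded (see below)
        body, _resources = exporter.from_filename(str(nb))
        parts.append(f"# Notebook: {nb.name}\n\n{body}")
    return "\n\n".join(parts)


def source_snapshot() -> str:
    """Concatenate the bayesflow/ source tree (stand-in for gitingest)."""
    parts = []
    for py in sorted((REPO / "bayesflow").rglob("*.py")):
        parts.append(f"# File: {py}\n\n{py.read_text()}")
    return "\n\n".join(parts)


def main() -> None:
    clean_old_context_files()
    readme = (REPO / "README.md").read_text()
    examples = notebooks_as_markdown()

    compact = f"{readme}\n\n{examples}"                 # README + examples (~45k tokens)
    full = f"{compact}\n\n{source_snapshot()}"          # + source code (~250k tokens)

    OUT.mkdir(exist_ok=True)
    (OUT / f"llm_context_compact-{TAG}.md").write_text(compact)
    (OUT / f"llm_context_full-{TAG}.md").write_text(full)


if __name__ == "__main__":
    main()
```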

Open Questions

  • Most importantly: is it useful? It would be great if you could try out both context files and share your experiences here. Feedback is extra valuable if you have a pro subscription for your LLM of choice, since that lets us take the latest LLM capabilities into account and run more tests.
  • Compact vs. full context files: Do we need two files or can we go with a single one for the most straightforward user experience?
  • (Implementation: not polished yet; I suggest first discussing whether we want to move forward with this at all.)

Downloads for Pre-Generated Context Files

llm_context_full_dev.md
llm_context_compact_dev.md

(Tagging some people I remember discussing this with @niels-leif-bracher @han-ol @stefanradev93 @vpratz @marvinschmitt @paul-buerkner)

@elseml elseml added the feature and draft labels Sep 17, 2025
@elseml
Member Author

elseml commented Sep 17, 2025

I am currently leaning towards going with the compact file only:

  1. It increased the likelihood of generating working BayesFlow code in my preliminary testing (ChatGPT free version);
  2. it focuses on the files with the highest information density for actual BayesFlow usage (i.e., the tutorials);
  3. and it provides the LLM with concrete, focused guidance while still leaving leeway to leverage the increasing agentic/search abilities of LLMs (i.e., looking up BayesFlow source code when needed).


codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

@vpratz
Collaborator

vpratz commented Sep 17, 2025

Thanks for opening the discussion. As I already indicated in the previous discussion, I'm skeptical that we do ourselves a favor with this addition and am currently opposed to merging it. Besides the general concerns surrounding LLMs, I think our users are better served by working with the tutorials and documentation. If we include files like the above, we suggest to our users that using LLMs is a viable strategy for writing code with BayesFlow, and that this is a workflow we are willing to support. Given the widely reported accuracy problems of LLMs, also for tasks like summarization, I'm skeptical that the output is sufficiently accurate to be useful for anyone who is not already familiar with the code.

Also, the tutorials convey important information on how to use ABI and highlight potential issues, so using an LLM as a shortcut might backfire when users no longer encounter this kind of information.

@elseml Could you provide a few non-cherrypicked example interactions of how a user would use an LLM with this setup, assuming they have little prior knowledge of BayesFlow, and the results they would obtain? If possible, please use tasks that are not part of the documentation. This would help me judge what kind of code we would expect to see.

@elseml
Member Author

elseml commented Sep 19, 2025

Hi Valentin, thanks for bringing in your concerns! I fully agree about the reliability problems of LLMs. However, I am pretty convinced that a substantial share of BF users already use LLM assistance and that LLM-assisted coding will continue to increase. Given these assumptions, the feature is for me mainly about equipping BF users with tools for mitigating the damage of hallucinations (i.e., by giving LLMs a better grounding in the BF code). This is a bit like companies hosting their own internal LLM instances to prevent data leakage, since employees already use LLMs for their tasks anyway.

But yeah, this WIP PR is meant as an open testing ground, so we might very well conclude after more testing that, even when provided with context, current LLMs are not accurate enough to be of any use for BF users. If we decide to proceed with this feature, I would support adding a prominent disclaimer highlighting these concerns and emphasizing that LLM assistance is supplementary to, rather than a replacement for, consulting the tutorials.

Concerning example interactions: here are two trials for the frequent use case of evidence accumulation modeling (only testing the compact context file, since ChatGPT's upload limits are pretty strict):

  • Without context
  • With compact (= tutorials only) context file

Some observations:

  • Without context, ChatGPT derailed pretty quickly from the typical BayesFlow workflow, mixing up lots of concepts and, in particular, not being aware of the latest BF2 API despite looking up the documentation (this might also be a problem for the compact context with tutorials only, where the LLM is instructed to look up the source code if needed).
  • With the compact context provided, adherence to the BF2 API improved a lot despite less CoT reasoning. Still, there were some errors (e.g., sometimes suggesting TimeSeriesNetwork for exchangeable observations; see the sketch after this list). The generated DDM simulator placeholder is quite peculiar, but I think that is okay since it is an explicit placeholder and we are interested in the BF part here. After simply copy-pasting two TypeError tracebacks, the code was in a runnable state up to the pandas DataFrame creation at the end. Of course, this run might just have been a lucky exception.
  • I also tested combining a pyddm simulator with BayesFlow, where using the pyddm API already failed in all conditions. Not totally surprising, since the LLM is not given pyddm context in any condition.
  • (Also, the BF1 code in From_BayesFlow_1.1_to_2.0.ipynb previously had a pretty big effect on LLM code generation, so it is now excluded from the context files. I will update the context files provided here accordingly.)
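
To illustrate the TimeSeriesNetwork mix-up mentioned in the second point above: assuming the BF2 networks module, the distinction the LLM kept missing is roughly the following (a sketch, not a vetted recommendation):

```python
import bayesflow as bf

# Exchangeable (i.i.d.) observations have no meaningful ordering, so the
# summary network should be permutation-invariant (e.g., a deep set).
summary_network = bf.networks.DeepSet()

# TimeSeriesNetwork assumes ordered sequences and is the wrong choice here:
# summary_network = bf.networks.TimeSeriesNetwork()
```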

I think more constrained applications (e.g., debugging or asking specialized questions about existing code, as in the forum) might be better test cases than full code generation from scratch. I will test that as well when upload limits allow. As stated above, I would also be very interested in which other use cases people come up with.

