Authoring Recipes

Adam Pingel edited this page Oct 31, 2024 · 26 revisions

General Requirements

For information about contributing to this repo, code of conduct guidelines, etc., see the community CONTRIBUTING and Code of Conduct guides.

Checklist

See the pull request template for a thorough list of checks that should be completed before a PR will be merged.

Cooking Analogy

A "cookbook" is composed of "recipes". In this version of the Granite Code Cookbook, a "recipe" is a Python notebook.

Under the implied cooking analogy, there are three key defining elements of a good recipe:

1. State ingredients up-front

In the first point, "ingredients" (and the "tools" that go with them) mean the data and software at hand.

2. Be straightforward to reproduce efficiently

The second point is straightforward to map into the technical space: the reader should be able to run the cells of the notebook sequentially and get the same result that was published in the original notebook. Our objective is to make these recipes reproducible in 15 minutes or less. This may inform decisions about recipe granularity.

3. Result in something delicious to eat

The third point is more subtle, and is what sets this kind of writing apart. In a technical sense, "something delicious to eat" means a system that demonstrates useful functionality in such a way that it sets the reader on the path to adopting it in their environment. It clearly articulates the business value achieved by the resulting system.

Style Guide

The basic case is a list of Python cells, each of which has a markdown cell preceding it. Each markdown cell should begin with an imperative phrase (a command), followed by a description of what the following Python code does, how it does it, and how it fits into the overall flow.

Longer docs may need multiple sections. In that case, the section name should be a gerund phrase at an H2 level (##), with commands as imperative phrases at H3 (###).
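The pairing rule above (every Python cell preceded by a markdown cell) can be checked mechanically. The following is a minimal sketch, not part of the cookbook tooling: the function name `code_cells_missing_intro` and the sample cells are hypothetical, and the cells are plain dictionaries shaped like the `cells` array of a notebook's JSON.

```python
def code_cells_missing_intro(cells):
    """Return the indexes of code cells that are not immediately
    preceded by a markdown cell."""
    missing = []
    for i, cell in enumerate(cells):
        if cell["cell_type"] != "code":
            continue
        if i == 0 or cells[i - 1]["cell_type"] != "markdown":
            missing.append(i)
    return missing


# Hypothetical notebook contents, for illustration only.
cells = [
    {"cell_type": "markdown", "source": "### Load the example data\n\nRead the CSV file."},
    {"cell_type": "code", "source": "rows = []"},
    {"cell_type": "code", "source": "print(rows)"},  # no markdown cell before this one
]

print(code_cells_missing_intro(cells))  # [2]
```

The check only looks at cell order; whether the markdown text actually opens with an imperative phrase still needs a human reviewer.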

Clear Outputs

In general, cell outputs should not be checked in with the recipe notebook. You can clear outputs while in Jupyter notebook.

Execution counts (execution_count in the notebook json) should be nulled out as well. This can be done by clearing outputs using a fresh (just-restarted) kernel.
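If it helps to see what "cleared" means in the notebook JSON, here is a minimal sketch that empties `outputs` and nulls `execution_count` on every code cell. It works on an in-memory dictionary using only the standard library; the function name `strip_notebook` and the sample notebook are hypothetical, and this is not a substitute for clearing outputs in Jupyter.

```python
import json


def strip_notebook(nb):
    """Clear outputs and execution counts, in place, on a parsed notebook dict."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None  # serialized as null in the JSON
    return nb


# Hypothetical notebook with one executed code cell.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hello\n"]}],
            "source": ["print('hello')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

strip_notebook(nb)
print(json.dumps(nb["cells"][0]["execution_count"]))  # null
print(nb["cells"][0]["outputs"])  # []
```

For a notebook on disk, the same function could be applied between `json.load` and `json.dump`.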

Pre-commit hook

Each cookbook repository has a git pre-commit hook called nbstripout that will clear the notebook cell outputs and execution counts for you upon commit. You can activate this for a given repository by running pre-commit install in the root directory of the cloned repo. (You might need to install it first: pip install pre-commit.)

Once activated, the nbstripout hook will clear outputs from each staged notebook and write the modified notebook over your working copy. Your commit will fail if there are staged files with uncleared outputs, but you can git add the modified files, then re-issue the commit, and the hook will pass.

Disable or bypass pre-commit hook

Use pre-commit uninstall to deactivate all pre-commit hooks, or git commit --no-verify to bypass them during commit.

Keep some outputs

To keep outputs on individual cells, set the keep_output tag on the cell metadata. You can edit the cell metadata by selecting "Edit Metadata" on the Cell Toolbar, or by opening the notebook in a text editor and editing the json directly. The tag should look like this:

{
  "keep_output": true
}

See the nbstripout documentation.
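For authors who prefer to script the metadata edit rather than use the Cell Toolbar, the following is a minimal sketch that sets `keep_output` on one cell of a parsed notebook. The function name `mark_keep_output` and the sample notebook are hypothetical; only the `keep_output` key comes from the text above.

```python
import json


def mark_keep_output(nb, cell_index):
    """Set "keep_output": true in the metadata of one cell, in place."""
    cell = nb["cells"][cell_index]
    cell.setdefault("metadata", {})["keep_output"] = True
    return nb


# Hypothetical notebook with a single code cell whose output should be kept.
nb = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('keep me')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

mark_keep_output(nb, 0)
print(json.dumps(nb["cells"][0]["metadata"]))  # {"keep_output": true}
```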

Example Data

Many recipes must provide example data. This can either be data committed alongside the recipe, or downloaded during recipe execution.

Checking In Example Data

Checking in the data has the advantage of eliminating a moving part from the recipe. Not only might a referenced server disappear, but the data might change unexpectedly.

If the data is "small" (under 100 KB) and can be committed to a cookbook repository and made available under the CDLA Permissive 2.0 license (see the Licenses page), then this is a good option. Also consider the implications of the DCO requirement for commits; see the legal section of the community CONTRIBUTING guide.

Downloading Example Data

In many cases the data is larger than something we'd want to manage in a Cookbook repository, or the data cannot be made available under the CDLA Permissive 2.0 license in the Cookbook repository.

In those cases, the recipe should obtain (download) the data during the execution of the notebook. Be careful to state any login requirements at the top of the recipe. Ideally the data can be obtained from a public source without authentication.

Finally, also consider the testability of the download. Recipe notebooks are executed automatically as a quality gate on pull requests. This means that all data downloads should work unassisted/headless. If this is not possible, consider using a flag (defaulting to true) to indicate to the recipe that it is running as an automated test. In that case, the recipe could take an alternate path to use some smaller, stand-in dataset that is committed to the Cookbook repository.
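One way such a flag could look is sketched below. Everything named here is an assumption made for illustration: the environment variable `RECIPE_AUTOMATED_TEST`, the placeholder URL, and the stand-in file path are hypothetical and are not defined by the Cookbook repositories.

```python
import os
from pathlib import Path


def is_automated_test(environ=os.environ):
    """Read the (hypothetical) RECIPE_AUTOMATED_TEST flag, defaulting to true."""
    value = environ.get("RECIPE_AUTOMATED_TEST", "true")
    return value.strip().lower() not in {"0", "false", "no"}


def choose_data_source(automated, full_url, stand_in_path):
    """Pick the committed stand-in file for automated runs, else the full download."""
    return str(stand_in_path) if automated else full_url


FULL_DATA_URL = "https://example.com/full-dataset.csv"  # placeholder, not a real dataset
STAND_IN_PATH = Path("data") / "sample.csv"  # hypothetical small file committed with the recipe

source = choose_data_source(is_automated_test(), FULL_DATA_URL, STAND_IN_PATH)
print(f"Reading data from: {source}")
```

With this shape, a reader who wants the full dataset sets the flag to false before running the notebook, while the automated quality gate runs unchanged against the small committed file.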

Other Guidelines

For recipe authors with strong familiarity with a specific capability or tool, the first inclination may be to write a recipe oriented around the tool. Consider alternate ways to phrase the recipe so that the end result is showcased, rather than the tool.

Under the cooking analogy, that would mean writing a great soup recipe rather than one that talks about the features of a food processor. If the soup tastes great and is easy to prepare, the reader will likely want to know more about how it was made.

Recipes will vary in complexity. Some may be single inference calls. Others may illustrate useful agentic workflows.

A "cookbook" is not intended to be a comprehensive guide to all issues that may arise during development with Granite Code. Recipes will link to helpful external resources on topics including: distributed systems, UI, design, AI/ML theory, metrics, etc.

  • Prefer an opinionated recipe over one that is flexible.
  • With that said, sometimes offering examples from multiple domains (e.g., for "text to SQL") can be helpful. If that brings too much complexity, split into smaller recipes.
  • Keep in mind a specific user persona when writing a recipe. Rather than writing for a general audience, can you imagine that one specific user would find the recipe valuable from beginning to end?
  • Take an iterative approach to the development of this cookbook.
  • Expect that over time, recipes will be split, merged, made uniform, deprecated, replaced, or deleted.

Example

For a "text to SQL" recipe, for instance, the recipe should:

  • State at a high level how a "text to SQL" capability could create "business value", and therefore be worth the investment.
  • Provide an example schema.
  • Provide example data.
  • Provide example natural language queries.
  • Show the expected resulting SQL.
  • Provide enough code to walk through the whole process.
  • Remind or show the reader how to obtain the schema and execute the query.
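To make the list above concrete, here is a minimal sketch of the data-handling parts of such a recipe, using Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its rows, the question, and the SQL are all invented for illustration. The model call that would turn the question into SQL is deliberately omitted, so `EXPECTED_SQL` is written by hand rather than generated.

```python
import sqlite3

# Example schema (hypothetical).
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    total REAL NOT NULL
);
"""

# Example data (hypothetical).
ROWS = [(1, "Ada", 120.0), (2, "Grace", 80.0), (3, "Ada", 40.0)]

# Example natural language query and the SQL we would expect for it.
QUESTION = "What is the total order value for each customer?"
EXPECTED_SQL = (
    "SELECT customer, SUM(total) FROM orders "
    "GROUP BY customer ORDER BY customer"
)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

# Obtain the schema back from the database, as it would be supplied in a prompt.
schema_text = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
).fetchone()[0]
print(schema_text)

# Execute the query and show the result.
results = conn.execute(EXPECTED_SQL).fetchall()
print(results)  # [('Ada', 160.0), ('Grace', 80.0)]
```

In a full recipe, the step between `schema_text` and `EXPECTED_SQL` is where the model would be prompted with the schema and the question, and the generated SQL would be compared against the expected statement.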