Our automated pipelines directory contains code shared by our automated pipelines, including REST, GraphQL, Webhooks, CodeQL CLI, and GitHub Apps.
An automated pipeline consumes data from an external source that is used to create content for docs.github.com. An automated pipeline does not automate documentation that is created by our content writing team. For example, if a writer creates a structured data file like YAML or JSON that lives in the `docs-internal` repo, using that data to create a page does not create an automated pipeline.
Automated pages allow for manually created content to be prepended to the automated content, but do not allow for manually created content to be appended or interspersed within automated content. Manually created content (that is prepended to automated content) lives in the Markdown file associated with the automated page, along with the article's frontmatter metadata.
We currently have two patterns that we use to create automated pipelines:
- REST, Webhooks, GitHub Apps, and GraphQL pipelines consume external structured data and transform that data into a JSON file that is used to create content for a page on docs.github.com. Typically, data files are a 1:1 mapping to a specific page on docs.github.com.
- The CodeQL CLI pipeline takes an unstructured reStructuredText file and transforms it directly into a Markdown file with frontmatter that uses the same authoring format as the rest of the docs.
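As a minimal sketch of the first pattern, the transform step might group source entries into one data object per page. This assumes a hypothetical source format in which each entry carries a `category` field; the function name `groupByPage` and all field names are illustrative and are not taken from any existing pipeline.

```javascript
// Hypothetical sketch: group source entries into one data object per page.
// `category`, `title`, `description`, and `groupByPage` are illustrative names only.
function groupByPage(sourceEntries) {
  const pages = {}
  for (const entry of sourceEntries) {
    // Each distinct category becomes one data file, and so one page.
    if (!pages[entry.category]) pages[entry.category] = []
    pages[entry.category].push({
      title: entry.title,
      description: entry.description ?? '',
    })
  }
  return pages
}

// Made-up sample data, for illustration only.
const source = [
  { category: 'issues', title: 'List issues', description: 'Lists issues.' },
  { category: 'issues', title: 'Create an issue' },
  { category: 'repos', title: 'Get a repository', description: 'Gets a repository.' },
]

const pages = groupByPage(source)
// Each key could then be serialized to its own JSON file,
// for example with JSON.stringify(pages.issues, null, 2).
console.log(Object.keys(pages))
```

Keeping the grouping in a pure function like this makes the 1:1 mapping between data files and pages easy to test without touching the file system.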
Each pipeline should be evaluated individually to determine the best architecture for simplicity, maintainability, and requirements. For example:
- Is the content being displayed basic Markdown content? For example, does the content avoid using complex tables and interactive elements? If so, then writing the Markdown content directly, and avoiding the need to create a structured data file that requires a React component, may be the best approach. This was the case for the CodeQL CLI pipeline. One caveat to consider before writing Markdown directly is whether the content will need Liquid versioning. The current pipeline that writes Markdown directly does not need Liquid versioning, which would increase the complexity considerably. All of the Markdown content in each article generated by the CodeQL CLI pipeline applies to all versions listed in the `versions` frontmatter property, which simplifies the Markdown generation process.
- Is the page interactive like the REST and Webhooks pages? If so, then the data will likely need to be structured data. In that case, a new React component may be needed to display the data.
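To illustrate the direct-to-Markdown approach from the first question above, here is a minimal sketch that writes an article with frontmatter and no Liquid conditionals. The function name `buildArticle`, the frontmatter keys, and the version values are assumptions made for illustration; a real pipeline would likely use a YAML library rather than string concatenation.

```javascript
// Hypothetical sketch: build a Markdown article with frontmatter from plain content.
// The frontmatter keys and version values shown here are illustrative assumptions.
function buildArticle({ title, versions, body }) {
  const versionLines = Object.entries(versions)
    .map(([plan, range]) => `  ${plan}: '${range}'`)
    .join('\n')
  const frontmatter = ['---', `title: ${title}`, 'versions:', versionLines, '---'].join('\n')
  // The body applies to every listed version, so no Liquid conditionals are needed.
  return `${frontmatter}\n\n${body.trim()}\n`
}

const article = buildArticle({
  title: 'example-command',
  versions: { fpt: '*', ghec: '*' },
  body: 'Describes what the command does.\n',
})
console.log(article)
```

Because every version in `versions` receives the same body, the generator never has to decide which content belongs to which version.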
When creating a new pipeline, the source data that is being consumed may not contain all of the data needed to create the page. Oftentimes, source data does not contain the descriptions and prose that our content writers have crafted to describe properties or concepts. In this case, it's common to need to scrape data from our docs and merge it into a new field in the structured data file that we intend to consume. You'll need to work with the team that owns the source data to create a plan for adding any additional properties and to agree on a format that will work best for both teams.
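The merge step described above could look something like the following sketch. It assumes the scraped prose has already been collected into an object keyed by item name; `mergeDescriptions`, `name`, and `descriptionHtml` are hypothetical names, and the sample data is made up.

```javascript
// Hypothetical sketch: merge writer-authored descriptions into source data.
// `name`, `descriptionHtml`, and `mergeDescriptions` are illustrative names only.
function mergeDescriptions(sourceItems, docsDescriptions) {
  return sourceItems.map((item) => ({
    ...item,
    // Store the scraped prose in a new field so the source fields stay untouched.
    descriptionHtml: docsDescriptions[item.name] ?? '',
  }))
}

// Made-up sample data, for illustration only.
const sourceItems = [{ name: 'push' }, { name: 'fork' }]
const docsDescriptions = { push: '<p>Example description written by the docs team.</p>' }

const merged = mergeDescriptions(sourceItems, docsDescriptions)
console.log(JSON.stringify(merged, null, 2))
```

Writing the prose into a separate field, rather than overwriting a source field, keeps it clear which values came from the source team and which came from the docs.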
- Create a new directory in the `src` directory with the name of the pipeline. For example, `src/codeql-cli`.
- Add a README.md file that describes the pipeline and how to use it. This should include any dependencies, how to run the pipeline, and any other information that is needed to use the pipeline. It's strongly recommended to include a diagram showing the overall flow of the pipeline.
- Each pipeline typically requires a workflow to allow scheduling or manually running the pipeline. The workflow should be placed in the `.github/workflows` directory and named `sync-<pipeline-name>.yml`. Each workflow typically requires adding a manual run option and an input parameter to specify the source repo's branch to use.
- Each pipeline will need a `scripts` directory with (at minimum) a `scripts/sync.js` file to run the pipeline.
- If the pipeline will contain structured data, you will need to add a `src/<pipeline-name>/data` directory. The files inside the `data` directory are typically organized by version (e.g., `src/webhooks/data/fpt/*`).
- Pipelines typically have tests specific to the pipeline that are placed in the `src/<pipeline-name>/tests` directory. There is no need to add tests that render the page because all autogenerated pages are tested in `src/automated-pipelines/tests/rendering.js`.
- If the pipeline uses a Next.js page component (e.g., `pages/**/*.tsx`), ensure there is a test that fails if that page component is moved or deleted.
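The last item in the list above, a test that fails when a page component is moved or deleted, could be built on a helper like the one sketched here. The helper name `findMissingComponents` and the example path are hypothetical, and the sketch uses CommonJS `require` only for brevity; adapt it to the module style and test runner the repository actually uses.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical sketch: report which expected page components are missing.
// The path list is an assumption; a real pipeline would list its own components.
function findMissingComponents(rootDir, relativePaths) {
  return relativePaths.filter((rel) => !fs.existsSync(path.join(rootDir, rel)))
}

// Example: a pipeline test could assert that this array is empty.
const missing = findMissingComponents(process.cwd(), ['pages/example/[page].tsx'])
if (missing.length > 0) {
  console.error(`Missing page components: ${missing.join(', ')}`)
}
```

Returning the list of missing paths, rather than a boolean, lets the failing test name the exact component that was moved or deleted.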
Slack: #docs-engineering
Repo: github/docs-engineering
If you have a question about automated pipelines, you can ask in the `#docs-engineering` Slack channel. If you notice a problem with one of the automated pipelines, you can open an issue in the `github/docs-engineering` repository.