Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite mirror configuration documentation #1669

Merged
merged 6 commits into from
Mar 3, 2024

Conversation

flyinghyrax
Copy link
Contributor

This is a rewrite of the documentation page docs/mirror_configuration.md in preparation for fixing the default value issue in #990. I set out to make sure all the options and defaults were documented and got a bit carried away. If this is too drastic of a change, I'm happy to backpedal to an update that only adds missing options and defaults.

Summary of Major Changes

  • Add reference for valid [mirror] options that were not listed on the mirror configuration page, e.g.:
    • storage-backend
    • digest_name
    • cleanup
    • download-mirror-no-fallback
    • keep_index_versions
  • Document the expected default value for every option.
    • This will temporarily continue to disagree with reality for master, global-timeout, timeout, workers, and hash-index when a user gives a config file that is missing any of those options.
  • Rewrite each options short description:
    • Move each option's "type" and default value out of the short description and into a field list, so these are displayed consistently across all options.
    • Trim short descriptions to be as concise as possible, moving additional details into separate paragraphs below the type/default.
    • Remove example code blocks that only show a single option by itself.
    • Make extensive (or perhaps excessive?) use of cross-references.
  • Create new example snippets that show related options being used together.
  • Add documentation for how different options affect the mirror directory structure
    • I personally would've found this useful when I first started using Bandersnatch, since it was hard for me to see what options controlled what directories and files in the output. But I'm also happy to move these elsewhere if you think this is too much detail for this page.
  • Embed the content of src/bandersnatch/default.conf for user reference
    • Since it's pulled directly from source it should always be up to date as long as the docs get re-built.

Page Organization

  1. Introduction
  2. Examples
  3. Option Reference
  4. Default Config File

The reference for each option is organized like this:

  1. Option name (header)
  2. Short description (preferably single sentence)
  3. Option type and default value
  4. Detailed description, notes, and references

I am hoping this makes the page more skimmable, but am unsure.

@flyinghyrax
Copy link
Contributor Author

flyinghyrax commented Feb 20, 2024

mdformat is breaking on MyST syntax. I tried changing the syntax plugin in pre-commit configuration from mdformat-gfm to mdformat-myst which fixed some problems, but continues to break admonition blocks per this issue: executablebooks/mdformat-myst#13. mdformat-myst doesn't support MyST's colon_fence extension, despite "myst-parser", "mdformat", and "mdformat-myst" all being maintained by the same organization, which seems... kinda weird.

I'm going to try using code fence syntax for directives and see if that's enough to smooth things over. (Writing an mdformat plugin feels excessive.)


Update: that worked, and mdformat seems happy now even without changing mdformat-gfm to mdformat-myst.

@flyinghyrax flyinghyrax force-pushed the 990-mirror-config-docs branch from c5692c1 to 3f7bfe5 Compare February 20, 2024 01:27
Copy link

codecov bot commented Feb 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.57%. Comparing base (997b18e) to head (592d36c).
Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1669   +/-   ##
=======================================
  Coverage   83.57%   83.57%           
=======================================
  Files          31       31           
  Lines        4328     4328           
  Branches      781      781           
=======================================
  Hits         3617     3617           
  Misses        524      524           
  Partials      187      187           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@cooperlees cooperlees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looks good to me so far. Just left some minor comments.

directory = /srv/pypi-filtered
simple-format = ALL
release-files = false
root_uri = https://files.pythonhosted.org/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR - but, If we need root_uri when release-files is false, maybe we should open an issue to fix that. Seems unneeded right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current behavior is that if release-files is false, root_uri will default to https://files.pythonhosted.org/, so not required per-se.

I'll have to check, but I suspect setting root_uri = and release-files = false would still generate the index files with relative URLs, so I guess it's a matter of what we think the most intuitive default behavior is for release-files?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, lets just document what happens in this PR and if we feel we can discuss changing it in a dedicated issue.

docs/mirror_configuration.md Outdated Show resolved Hide resolved

The mirror json setting is a boolean (true/false) setting that indicates that
the json packaging metadata should be mirrored in addition to the packages.
; per-coroutine time limit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget what this is, but is it just an overall maximum time each worker coroutine can run? If so, maybe we should explain that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the existing docs and default.conf I believe it's the maximum runtime for each aiohttp coroutine, but this would be a good one to track down in the source to confirm that and perhaps explain more clearly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only if you have time, won't block the PR on it.

docs/mirror_configuration.md Outdated Show resolved Hide resolved
docs/mirror_configuration.md Outdated Show resolved Hide resolved
docs/mirror_configuration.md Show resolved Hide resolved
docs/mirror_configuration.md Show resolved Hide resolved
Comment on lines 317 to 373
### `cleanup`

Example:
Enable cleanup of legacy simple directories with non-normalized names.

```ini
[mirror]
proxy=http://myproxy.com
:Type: boolean
:Default: false

Bandersnatch versions prior to 4.0 used directories with non-normalized package names for compatability with older versions of pip. Enabling this option checks for and removes these directories.

```{seealso}
[Python Packaging User Guide - Names and Normalization](https://packaging.python.org/en/latest/specifications/name-normalization/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should prob remove this one day.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to go ahead and indicate that it's deprecated in the docs? (Though I suppose technically that requires a version bump even if its just the documented status that changes?)

Copy link
Contributor

@cooperlees cooperlees Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, will not hurt to do so.

docs/mirror_configuration.md Outdated Show resolved Hide resolved
docs/storage_options.md Outdated Show resolved Hide resolved
@flyinghyrax
Copy link
Contributor Author

  1. I've updated this to match the current config implementation rather than the aspirational one. All options are marked as either required or optional, and I listed the currently required options at the top of the document.
  2. I added info about SOCKS support to the proxy option, but based on my reading of the source SOCKS is only supported when using environment variables for the proxy URL. It might be worth tweaking the code that checks for SOCKS URLs to check the proxy option in the config as well as the environment variables.
  3. I originally thought the output files created by the json option were basically unused, but it appears that the verify subcommand relies on the files being in <directory>/web/json, making enabling json a requirement for running verify. I will try to validate this experimentally.

I would still like to track down how global-timeout is used so I can describe that more accurately. I also plan to run each of the example configs to make sure they work, and verify that I've identified all the 'required' options.

@cooperlees
Copy link
Contributor

  1. Nice - Will review once you mark it as ready + CI passes (just approved it to run)
  2. Opened an issue - Thanks for hi-lighting that.
  3. The verify option should download the JSON if missing, but probably worth noting it's needed for faster verify.

I'm happy to merge as is and you can do smaller PRs addressing the global-timeout etc. etc. if you like - but happy to do it how ever you want :)

  • Just want to make it easier for you. This is already a large improvement as is. Thanks!

flyinghyrax and others added 6 commits February 27, 2024 19:24
- Add options that were missing from the documentation
- Document expected defaults for all options
- Group examples into larger snippets that show related options being used together
- Show the type and default value for each option in a consistent, non-prose format
- Rewrite option short descriptions to be more concise
- Make extensive (or excessive?) use of cross references
- Add examples of how different options affect mirror directory structure
- Embed the content of `src/bandersnatch/default.conf` for user reference
Apply suggested tweaks to option descriptions.

Co-authored-by: Cooper Lees <me@cooperlees.com>
- Remove explicit reference targets and use heading anchors instead
- Remove file name for links to headings in the same document
- Elaborate a little in the description for 'simple-format', & include
  a link to PEP-691
- Add note regarding environment variables & SOCKS support in the
  description of 'proxy'
  * the aiohttp client setup in Master checks the noted environment
    variables & adds aiohttp-socks to the client session if it sees a
    socks URL
  * if there wasn't a socks URL, 'trust_env' is set to True, which sets
    aiohttp to use urllib.request.getproxies to scan the environment for
    proxy configuration.
This changes the mirror config documentation to match reality, where
the current config reading implementation makes more options required
than originally intended / strictly necessary.

All options now have a `Required` field in addition to `Type` and
`Default`. Intended future defaults for required fields are moved
into the option description / recommendations.

Updates example configs to always include required options.

I also found that the output of `[mirror].json` appears to be
required for running `bandersnatch verify`.
- Checked the example mirror configuration snippets and corrected
  mistakes - they should all work with 'bandersnatch mirror' when
  combined with filter plugin config.
- Corrected the default value for 'diff-file' and tweak description
- Add entry to CHANGES
@flyinghyrax flyinghyrax force-pushed the 990-mirror-config-docs branch from 6117a75 to 592d36c Compare February 28, 2024 00:24
@flyinghyrax flyinghyrax marked this pull request as ready for review February 28, 2024 00:25
@flyinghyrax
Copy link
Contributor Author

flyinghyrax commented Feb 28, 2024

I tested the example configurations and did find some corrections, so I'm glad I did that. 😄 I also corrected the default value for diff-file and added an entry to CHANGES.md. Lastly I rebased on main. I feel it is ready for review. I'm happy to defer some of the smaller things like global-timeout to future PRs as you mentioned. 🙂

During the config testing I stumbled into a few things:

  • The mirror subcommand uses a default value of filesystem for mirror.storage-backend, but verify doesn't provide a default and crashes if storage-backend isn't explicitly set. That is something I can easily fix along with the rest of the configuration refactoring.
  • I couldn't get verify to work without web/json/ already being populated (i.e. mirror.json enabled). It crashes if the 'json' folder doesn't exist, and no-ops if the folder exists but is empty. I haven't yet checked if there are any open issues for this.
  • mirror.master and mirror.download-mirror both seem sensitive to trailing forward slashes (e.g. https://pypi.org vs https://pypi.org/. I don't know if that's a problem, but it was at least unexpected.
  • And one last weird one: running bandersnatch --debug -c ... mirror consistently failed if mirror.directory wasn't already populated. I didn't try to debug it (heh) and figured I would get together a minimal reproducible example or unit test to replicate it before opening an issue.

Copy link
Contributor

@cooperlees cooperlees left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for this PR and all the findings. Feel free to open Issues or just submit PRs if time permits!

@cooperlees cooperlees merged commit 9785a8a into pypa:main Mar 3, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants