Provide default values for mirror config options (#1740)
* Provide default values for mirror config options

This splits the existing 'default.conf' config file shipped with the package into two
similar files: "defaults.conf" and "example.conf". "example.conf" is an exact copy of the
previous "default.conf". The new "defaults.conf" is a stripped-down version containing
only default values for all mirror configuration options except "mirror.directory".

BandersnatchConfig is changed to *always* read defaults.conf, then read the user config
file if one is specified. This leaves the ConfigParser populated with default values for
any mirror options that aren't set by the user (except mirror.directory).
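The loading order described above can be sketched with the standard library's `configparser`. Read the packaged defaults first, then the user file: any option the user sets overwrites the default, and any option they omit keeps it. This is a minimal sketch, not Bandersnatch's code. The inline strings stand in for `defaults.conf` and a user config, `load_layered` is a hypothetical helper, and the values are an illustrative subset of the documented defaults.

```python
import configparser
from typing import Optional

# Illustrative stand-in for the packaged defaults.conf (values from the docs).
DEFAULTS_INI = """
[mirror]
master = https://pypi.org
workers = 3
timeout = 10
"""

# Illustrative stand-in for a user config file.
USER_INI = """
[mirror]
directory = /srv/pypi
workers = 5
"""


def load_layered(user_ini: Optional[str]) -> configparser.ConfigParser:
    """Hypothetical helper: read defaults first, then the user config on top."""
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULTS_INI)
    if user_ini is not None:
        # A second read merges into the existing [mirror] section,
        # overwriting any option that both sources define.
        parser.read_string(user_ini)
    return parser


config = load_layered(USER_INI)
print(config["mirror"]["workers"])    # 5: the user value wins
print(config["mirror"]["timeout"])    # 10: the default fills the gap
print(config["mirror"]["directory"])  # /srv/pypi: no default exists for this one
```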

Notable ripple effects for this include:
- It is no longer meaningful to check ConfigParser.has_option with the 'mirror' section.
  Instead, you have to check whether the option's value is empty or None.
- Specifying a default/fallback value when calling .get on the 'mirror' section will
  have no effect, because the option will already be present in the ConfigParser mappings.
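Both ripple effects follow from standard `configparser` behaviour once the defaults are loaded. The sketch below assumes, hypothetically, that an option with no meaningful default (here `diff-file`) appears in the defaults file as a key with an empty value; the inline strings are illustrative, not the contents of `defaults.conf`.

```python
import configparser

# Hypothetical defaults: every option is present, "unset" ones are left empty.
DEFAULTS_INI = """
[mirror]
timeout = 10
diff-file =
"""

config = configparser.ConfigParser()
config.read_string(DEFAULTS_INI)
# The user config only sets the directory.
config.read_string("[mirror]\ndirectory = /srv/pypi\n")

# has_option() is True even though the user never set diff-file...
print(config.has_option("mirror", "diff-file"))  # True

# ...so the meaningful check is whether the value is empty.
diff_file = config.get("mirror", "diff-file")
print(not diff_file)  # True: treat diff-file as unset

# A fallback passed to get()/getint() is ignored because the option exists.
print(config.getint("mirror", "timeout", fallback=30))  # 10, not 30
```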

As (mostly) an implementation detail, BandersnatchConfig is changed to be a subclass
of ConfigParser. The BandersnatchConfig singleton can be used anywhere a ConfigParser
instance is expected without having to use '.config' to access a nested ConfigParser.
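Because `ConfigParser` is an ordinary class, a subclass can be passed to any code that expects a parser. One way to pair that with a shared instance is sketched below. The class name `LayeredConfig`, the `_instance` attribute and the inline defaults are hypothetical and are not taken from Bandersnatch's source.

```python
import configparser
from typing import Optional

# Hypothetical inline defaults standing in for defaults.conf.
DEFAULTS_INI = "[mirror]\nworkers = 3\n"


class LayeredConfig(configparser.ConfigParser):
    """Hypothetical ConfigParser subclass shared as a single instance."""

    _instance: Optional["LayeredConfig"] = None

    def __new__(cls, *args, **kwargs):
        # Create the instance once and hand back the same object afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self) -> None:
        # __init__ runs on every LayeredConfig() call; only load defaults once.
        if getattr(self, "_initialized", False):
            return
        super().__init__()
        self.read_string(DEFAULTS_INI)
        self._initialized = True


def workers(parser: configparser.ConfigParser) -> int:
    # Code written against plain ConfigParser accepts the subclass unchanged.
    return parser.getint("mirror", "workers")


first = LayeredConfig()
second = LayeredConfig()
print(first is second)  # True
print(workers(first))   # 3
```

No `.config` indirection is needed: the singleton itself is the parser.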

Fixes #1702
Fixes #990

* Update mirror configuration documentation page

Add default values for options that are no longer required.

* Fix unnecessary concatenation in a string literal

Co-authored-by: Cooper Lees <me@cooperlees.com>

---------

Co-authored-by: Cooper Lees <me@cooperlees.com>
flyinghyrax and cooperlees authored May 29, 2024
1 parent 1561917 commit f405f48
Showing 18 changed files with 301 additions and 250 deletions.
1 change: 1 addition & 0 deletions CHANGES.md
@@ -17,6 +17,7 @@

- Fix config file value interpolation for the `diff-file` option `PR #1715`
- Fix diff-file being created when the option wasn't set `PR #1716`
+- Provide default values for most config options in the `[mirror]` section `PR #1740`

## Deprecation

78 changes: 37 additions & 41 deletions docs/mirror_configuration.md
@@ -5,20 +5,14 @@ The **\[mirror\]** section of the configuration file contains general options fo
The following options are currently _required_:

- [](#directory)
-- [](#master)
-- [](#workers)
-- [](#timeout)
-- [](#global-timeout)
-- [](#stop-on-error)
-- [](#hash-index)

## Examples

These examples only show `[mirror]` options; a complete configuration may include [mirror filtering plugins][filter-plugins] and/or options for a [storage backend][storage-backends].

### Minmal

-A basic configuration with reasonable defaults for the required options:
+A basic configuration showing some of the more common options:

```ini
[mirror]
@@ -39,9 +33,6 @@ global-timeout = 18000

; continue syncing when an error occurs
stop-on-error = false
-
-; use PyPI-compatible folder structure for index files
-hash-index = false
```

This will mirror index files and package release files from PyPI and store the mirror in `/srv/pypi`. Add configuration for [mirror filtering plugins][filter-plugins] to optionally filter what packages are mirrored in a variety of ways.
@@ -57,13 +48,6 @@ directory = /srv/pypi
master = https://pypi.org
; Package distribution artifacts downloaded from here if possible
download-mirror = https://pypi-mirror.example.com/
-
-; required options from basic config
-workers = 3
-timeout = 15
-global-timeout = 18000
-stop-on-error = false
-hash-index = false
```

This will download release files from `https://pypi-mirror.example.com` if possible and fall back to PyPI if a download fails. See [](#download-mirror). Add [](#download-mirror-no-fallback) to download release files exclusively from `download-mirror`.
@@ -79,13 +63,6 @@ master = https://pypi.org
simple-format = ALL
release-files = false
root_uri = https://files.pythonhosted.org/
-
-; required options from basic config
-workers = 3
-timeout = 15
-global-timeout = 18000
-stop-on-error = false
-hash-index = false
```

This will mirror index files for projects and versions allowed by your [mirror filters][filter-plugins], but will not download any package release files. File URLs in index files will use the configured `root_uri`. See [](#release-files) and [](#root_uri).
@@ -169,9 +146,9 @@ A base URL to generate absolute URLs for package release files.

:Type: URL
:Required: no
-:Default: `https://files.pythonhosted.org/`
+:Default: dynamic; see description

-Bandersnatch creates index files containing relative URLs by default. Setting this option generates index files with absolute URLs instead.
+Bandersnatch creates index files containing relative URLs by default. Setting this option generates index files with absolute URLs instead, using the specified string for the base URL.

If [](#release-files) is disabled _and_ this option is unset, Bandersnatch uses a default value of `https://files.pythonhosted.org/`.

@@ -185,9 +162,9 @@ File location to write a list of all new or changed files during a mirror operat

:Type: file or folder path
:Required: no
-:Default: `<mirror directory>/mirrored-files`
+:Default: none

-Bandersnatch creates a plain-text file at the specified location containing a list of all files created or updated during the last mirror/sync operation. The files are listed as absolute paths separated by blank lines.
+If set, Bandersnatch creates a plain-text file at the specified location containing a list of all files created or updated during the last mirror/sync operation. The files are listed as absolute paths, one per line.

This is useful when mirroring to an offline network where it is required to only transfer new files to the downstream mirror. The diff file can be used to copy new files to an external drive, sync the list of files to an SSH destination such as a diode, or send the files through some other mechanism to an offline system.

@@ -233,7 +210,8 @@ Will generate diff files with names like `/srv/pypi/new-files-1568129735`. This
Group generated project index folders by the first letter of their normalized project name.

:Type: boolean
-:Required: **yes**
+:Required: no
+:Default: false

Enabling this changes the way generated index files are organized. Project folders are grouped into subfolders alphabetically as shown here: [](#hash-index-index-files). This has the effect of splitting up a large `/web/simple` directory into smaller subfolders, each containing a subset of the index files. This can improve file system efficiency when mirroring a very large number of projects, but requires a web server capable of translating Simple Repository API URLs into file paths.

@@ -275,14 +253,17 @@ rewrite ^/simple/([^/])([^/]*)/([^/]+)$/ /simple/$1/$1$2/$3 last;
The URL of the Python package repository server to mirror.

:Type: URL
-:Required: **yes**
-
-Bandersnatch requests metadata for projects and packages from this repository server, and downloads package release files from the URLs specified in the received metadata.
+:Required: no
+:Default: `https://pypi.org`

-To mirror packages from PyPI, set this to `https://pypi.org`.
+Bandersnatch requests metadata for projects and packages from this repository server, and downloads package release files from the URLs specified in the received metadata. The default value mirrors packages from PyPI.

The URL _must_ use the `https:` protocol.

```{note}
The specified server must support [PyPI's JSON API](https://warehouse.pypa.io/api-reference/json.html) for Bandersnatch to mirror any projects.
```
+
+```{seealso}
+Bandersnatch can download package release files from an alternative source by configuring a [](#download-mirror).
+```
@@ -312,7 +293,8 @@ SOCKS proxies are not currently supported via the `mirror.proxy` config option.
The network request timeout to use for all connections, in seconds. This is the maximum allowed time for individual web requests.

:Type: number, in seconds
-:Required: **yes**
+:Required: no
+:Default: 10

```{note}
It is recommended to set this to a relatively low value, e.g. 10 - 30 seconds. This is so temporary problems will fail quickly and allow retrying, instead of having a process hang infinitely and leave TCP unable to catch up for a long time.
@@ -323,7 +305,8 @@ It is recommended to set this to a relatively low value, e.g. 10 - 30 seconds. T
The maximum runtime of individual aiohttp coroutines, in seconds.

:Type: number, in seconds
-:Required: **yes**
+:Required: no
+:Default: 1800

```{note}
It is recommended to set this to a relatively high value, e.g. 3,600 - 18,000 (1 - 5 hours). This supports coroutines mirroring large package files on slow connections.
@@ -378,7 +361,8 @@ Bandersnatch versions prior to 4.0 used directories with non-normalized package
The number of worker threads used for parallel downloads.

:Type: number, 1 ≤ N ≤ 10
-:Required: **yes**
+:Required: no
+:Default: 3

Use **1 - 3** workers to avoid overloading the PyPI master (and maybe your own internet connection). If you see timeouts and have a slow connection, try lowering this setting.

@@ -401,7 +385,8 @@ This option is used by the <project:#bandersnatch-verify> subcommand.
Stop mirror/sync operations immediately when an error occurs.

:Type: boolean
-:Required: **yes**
+:Required: no
+:Default: false

When disabled (`stop-on-error = false`), Bandersnatch continues syncing after an error occurs, but will mark the sync as unsuccessful. When enabled, Bandersnatch will stop all syncing as soon as possible if an error occurs. This can be helpful when debugging the cause of an unsuccessful sync.

@@ -421,6 +406,7 @@ The method used to compare existing files with upstream files.
The algorithm used to compute file hashes when [](#compare-method) is set to `hash`.

:Type: one of `sha256`, `md5`
+:Required: no
:Default: `sha256`

### `keep_index_versions`
@@ -607,13 +593,23 @@ The content of the index files themselves is unchanged.

## Default Configuration File

-Bandersnatch loads default values from a configuration file inside the package. You can use this file as a reference or as the basis for your own configuration.
+Bandersnatch loads default values from a configuration file inside the package.

+```{literalinclude} ../src/bandersnatch/defaults.conf
+---
+name: defaults.conf
+language: ini
+caption: Default configuration file from `src/bandersnatch/defaults.conf`
+---
+```
+
+An annotated example configuration is also included. You can use this file as a reference or as the basis for your own configuration.
+
-```{literalinclude} ../src/bandersnatch/default.conf
+```{literalinclude} ../src/bandersnatch/example.conf
---
-name: default.conf
+name: example.conf
language: ini
-caption: Default configuration file from `src/bandersnatch/default.conf`
+caption: Example configuration file from `src/bandersnatch/example.conf`
---
```

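The updated `root_uri` entry in `docs/mirror_configuration.md` documents a "dynamic" default: relative URLs normally, but `https://files.pythonhosted.org/` when release files are disabled and `root_uri` is unset. A minimal sketch of that rule follows, assuming (hypothetically) that an unset `root_uri` is an empty string once defaults are loaded and that `release-files` is always present as a boolean. `effective_root_uri` is an illustrative name, not a Bandersnatch function.

```python
import configparser
from typing import Optional

# PyPI's file host, the documented fallback when release files are not mirrored.
PYPI_FILES_URI = "https://files.pythonhosted.org/"


def effective_root_uri(config: configparser.ConfigParser) -> Optional[str]:
    """Hypothetical helper: base URL for release file links, or None for relative URLs."""
    # Assumes root_uri is always present (possibly empty) once defaults are loaded.
    root_uri = config.get("mirror", "root_uri")
    if root_uri:
        # An explicit setting always wins.
        return root_uri
    if not config.getboolean("mirror", "release-files"):
        # Release files are not stored locally, so link to PyPI's file host.
        return PYPI_FILES_URI
    # Otherwise index files keep relative URLs.
    return None


config = configparser.ConfigParser()
config.read_string("[mirror]\nrelease-files = false\nroot_uri =\n")
print(effective_root_uri(config))  # https://files.pythonhosted.org/
```

Testing `if root_uri:` instead of `has_option()` follows the commit's note that option presence is no longer meaningful in the `[mirror]` section.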
13 changes: 13 additions & 0 deletions src/bandersnatch/config/exceptions.py
@@ -0,0 +1,13 @@
+"""Exception subclasses for configuration file loading and validation."""
+
+
+class ConfigError(Exception):
+    """Base exception for configuration file exceptions."""
+
+    pass
+
+
+class ConfigFileNotFound(ConfigError):
+    """A specified configuration file is missing or unreadable."""
+
+    pass
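The new exception classes let a loader report a missing user file separately from an otherwise invalid configuration. The loader below is a hypothetical sketch (`load_mirror_config` is not Bandersnatch's API). It copies the two exception classes inline so it runs standalone; in the package they live in `bandersnatch.config.exceptions`. It relies on `ConfigParser.read()` returning the list of files it managed to parse.

```python
import configparser
import tempfile
from pathlib import Path
from typing import Optional


# Copies of the classes added in src/bandersnatch/config/exceptions.py.
class ConfigError(Exception):
    """Base exception for configuration file exceptions."""


class ConfigFileNotFound(ConfigError):
    """A specified configuration file is missing or unreadable."""


# Hypothetical stand-in for the packaged defaults (illustrative subset).
DEFAULTS_INI = "[mirror]\nworkers = 3\n"


def load_mirror_config(path: Optional[Path]) -> configparser.ConfigParser:
    """Hypothetical loader: packaged defaults first, then an optional user file."""
    parser = configparser.ConfigParser()
    parser.read_string(DEFAULTS_INI)
    if path is not None:
        # read() skips files it cannot open and returns the ones it parsed,
        # so an empty list means the user's file was missing or unreadable.
        if not parser.read(path):
            raise ConfigFileNotFound(f"cannot read config file: {path}")
    # mirror.directory has no default, so the user must supply it.
    if not parser.get("mirror", "directory", fallback=""):
        raise ConfigError("mirror.directory must be set")
    return parser


with tempfile.TemporaryDirectory() as tmp:
    user_file = Path(tmp) / "bandersnatch.conf"
    user_file.write_text("[mirror]\ndirectory = /srv/pypi\n")
    print(load_mirror_config(user_file).get("mirror", "directory"))
    try:
        load_mirror_config(Path(tmp) / "missing.conf")
    except ConfigFileNotFound as exc:
        print(f"missing: {exc}")
```

Because `ConfigFileNotFound` subclasses `ConfigError`, a caller can catch `ConfigError` alone to handle both failures. Loading defaults only (`path=None`) raises `ConfigError`, since `mirror.directory` is the one option without a default.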