python-connector-base-image: upgrade to python 3.9.19 + update setuptools and pip #38859

Merged
60 changes: 31 additions & 29 deletions airbyte-ci/connectors/base_images/README.md
Expand Up @@ -6,21 +6,22 @@ Our connector build pipeline ([`airbyte-ci`](https://github.com/airbytehq/airbyt
Our base images are declared in code, using the [Dagger Python SDK](https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/).

- [Python base image code declaration](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/base_images/base_images/python/bases.py)
- ~Java base image code declaration~ _TODO_
- ~Java base image code declaration~ *TODO*

## Where are the Dockerfiles?

## Where are the Dockerfiles?
Our base images are not declared using Dockerfiles.
They are declared in code using the [Dagger Python SDK](https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/).
We prefer this approach because it lets us treat base image containers as code: we can declare the base images in Python and use the full power of the language to build and test them.
However, we do artificially generate Dockerfiles for debugging and documentation purposes.
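
As a rough illustration of the idea described above (declare the image as a sequence of operations in code, then derive a Dockerfile from it for documentation), here is a minimal sketch. The `Step` class and `render_dockerfile` function are hypothetical names introduced for this example; they are not Dagger types and not the repository's `get_container_dockerfile` implementation, which inspects the Dagger container object instead.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    """One recorded container operation (hypothetical, not a Dagger type)."""

    kind: str  # one of "from", "exec" or "env"
    args: Tuple[str, ...]


def render_dockerfile(steps: List[Step]) -> str:
    """Translate recorded steps into Dockerfile instructions, one per line."""
    lines = []
    for step in steps:
        if step.kind == "from":
            lines.append(f"FROM {step.args[0]}")
        elif step.kind == "exec":
            lines.append("RUN " + " ".join(step.args))
        elif step.kind == "env":
            name, value = step.args
            lines.append(f"ENV {name}={value}")
        else:
            raise ValueError(f"Unsupported step kind: {step.kind}")
    return "\n".join(lines)


# Example steps taken from the generated Dockerfile shown in this README.
steps = [
    Step("from", ("docker.io/python:3.9.19-slim-bookworm",)),
    Step("exec", ("ln", "-snf", "/usr/share/zoneinfo/Etc/UTC", "/etc/localtime")),
    Step("env", ("POETRY_NO_INTERACTION", "1")),
]
print(render_dockerfile(steps))
```

The generated text is only a readable approximation of the build; the code declaration remains the source of truth.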

### Example for `airbyte/python-connector-base`:


### Example for `airbyte/python-connector-base`:
```dockerfile
FROM docker.io/python:3.9.18-slim-bookworm@sha256:44b7f161ed03f85e96d423b9916cdc8cb0509fb970fd643bdbc9896d49e1cad0
FROM docker.io/python:3.9.19-slim-bookworm@sha256:b92e6f45b58d9cafacc38563e946f8d249d850db862cbbd8befcf7f49eef8209
@wennergr (Contributor), Jun 3, 2024:

What are our options for changing to python:3.9.19-alpine?

Debian moves a bit slower and is a bit bigger. There is a pretty big difference in the number of vulnerabilities:


sh# grype python:3.9.19-slim-bookworm 
 ✔ Vulnerability DB                [no update available]  
 ✔ Pulled image                    
 ✔ Loaded image                                                                                                                                               python:3.9.19-slim-bookworm
 ✔ Parsed image                                                                                                   sha256:182cc99a2af6b2eb20b8da7e1a0d3661424b8fa7a0f0fbdac147e5e4ca8a3005
 ✔ Cataloged contents                                                                                                    34f145fff6b64d6199becdf2f8d67d2810b54df9d610a14f0c1989ab39d0c01e
   ├── ✔ Packages                        [118 packages]  
   ├── ✔ File digests                    [2,954 files]  
   ├── ✔ File metadata                   [2,954 locations]  
   └── ✔ Executables                     [809 executables]  
 ✔ Scanned for vulnerabilities     [118 vulnerability matches]  
   ├── by severity: 1 critical, 7 high, 25 medium, 3 low, 55 negligible (27 unknown)
   └── by status:   2 fixed, 116 not-fixed, 0 ignored 

vs

sh# grype python:3.9.19-alpine       
 ✔ Vulnerability DB                [no update available]  
 ✔ Pulled image                    
 ✔ Loaded image                                                                                                                                                      python:3.9.19-alpine
 ✔ Parsed image                                                                                                   sha256:a3003c79447cb9c6aa127216f7dd3fe0c746723003effce38674cff406bcee25
 ✔ Cataloged contents                                                                                                    d3a44294453f9c846b8a1d36120f54690ee52d77c376b8b7d8d4b6850408415e
   ├── ✔ Packages                        [47 packages]  
   ├── ✔ File digests                    [659 files]  
   ├── ✔ File metadata                   [659 locations]  
   └── ✔ Executables                     [141 executables]  
 ✔ Scanned for vulnerabilities     [11 vulnerability matches]  
   ├── by severity: 0 critical, 2 high, 8 medium, 0 low, 0 negligible (1 unknown)
   └── by status:   2 fixed, 9 not-fixed, 0 ignored 

Contributor Author:

@wennergr we originally went for bookworm because some Python packages, like Pandas, require system dependencies that are not available on Alpine.
It's feasible to add them to an Alpine base, but not straightforward...
https://gist.github.com/orenitamar/f29fb15db3b0d13178c1c4dd611adce2

So we picked bookworm for simplicity and compatibility...

I suggest cutting this new version (1.2.1) with bookworm, and then cutting a new one (2.0.0) with Alpine.

This would be a major version as it might not be usable by some connectors.

Our batch update connector flow will make this a best-effort migration:

  • All connectors incompatible with Alpine will fail their CI build, so they won't get updated.

RUN ln -snf /usr/share/zoneinfo/Etc/UTC /etc/localtime
RUN pip install --upgrade pip==23.2.1
RUN pip install --upgrade pip==24.0 setuptools==70.0.0
Contributor:

I don't like that we have the versions hardcoded here in code. But it's not blocking; it definitely works well for us now.

Contributor Author:

We hardcode versions for build reproducibility. If we don't, a rebuild without any change might end up with different versions, which can be surprising and lead to unexpected side effects.

ENV POETRY_VIRTUALENVS_CREATE=false
ENV POETRY_VIRTUALENVS_IN_PROJECT=false
ENV POETRY_NO_INTERACTION=1
Expand All @@ -30,56 +31,57 @@ RUN sh -c apt-get update && apt-get install -y tesseract-ocr=5.3.0-2 poppler-uti
RUN mkdir /usr/share/nltk_data
```
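
The review thread on the `pip install` line gives build reproducibility as the reason for hardcoding versions. A minimal sketch of how such pins could be verified is shown below. The `parse_pins` helper and its pattern are assumptions made for this example (they are not part of `base_images`), and the pattern only accepts plain dotted numeric versions.

```python
import re
from typing import Dict, List

# name==version, where version is a plain dotted number (simplifying assumption).
PIN_PATTERN = re.compile(r"^(?P<name>[A-Za-z0-9_.\-]+)==(?P<version>\d+(?:\.\d+)*)$")


def parse_pins(requirements: List[str]) -> Dict[str, str]:
    """Return {package: version}; raise if any requirement is not pinned exactly."""
    pins = {}
    for requirement in requirements:
        match = PIN_PATTERN.match(requirement)
        if match is None:
            raise ValueError(f"Requirement is not pinned to an exact version: {requirement!r}")
        pins[match["name"]] = match["version"]
    return pins


# The requirement arguments used by the 1.2.1 base image.
print(parse_pins(["pip==24.0", "setuptools==70.0.0"]))
```

A range specifier such as `pip>=24` would raise a `ValueError` here, which is the kind of drift that exact pins are meant to prevent.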



## Base images


### `airbyte/python-connector-base`

| Version | Published | Docker Image Address | Changelog |
| ------- | --------- | --------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| 1.2.0 | ✅ | docker.io/airbyte/python-connector-base:1.2.0@sha256:c22a9d97464b69d6ef01898edf3f8612dc11614f05a84984451dde195f337db9 | Add CDK system dependencies: nltk data, tesseract, poppler. |
| 1.1.0 | ✅ | docker.io/airbyte/python-connector-base:1.1.0@sha256:bd98f6505c6764b1b5f99d3aedc23dfc9e9af631a62533f60eb32b1d3dbab20c | Install socat |
| 1.0.0 | ✅ | docker.io/airbyte/python-connector-base:1.0.0@sha256:dd17e347fbda94f7c3abff539be298a65af2d7fc27a307d89297df1081a45c27 | Initial release: based on Python 3.9.18, on slim-bookworm system, with pip==23.2.1 and poetry==1.6.1 |
| Version | Published | Docker Image Address | Changelog |
| ---------- | --------- | -------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| 1.2.1 | ✅ | docker.io/airbyte/python-connector-base:1.2.1@sha256:4a4255e2bccab71fa5912487e42d9755cdecffae77273fed8be01a081cd6e795 | Upgrade to Python 3.9.19 + update pip and setuptools |
| 1.2.0 | ✅ | docker.io/airbyte/python-connector-base:1.2.0@sha256:c22a9d97464b69d6ef01898edf3f8612dc11614f05a84984451dde195f337db9 | Add CDK system dependencies: nltk data, tesseract, poppler. |
| 1.2.0-rc.1 | ✅ | docker.io/airbyte/python-connector-base:1.2.0-rc.1@sha256:f6467768b75fb09125f6e6b892b6b48c98d9fe085125f3ff4adc722afb1e5b30 | |
| 1.1.0 | ✅ | docker.io/airbyte/python-connector-base:1.1.0@sha256:bd98f6505c6764b1b5f99d3aedc23dfc9e9af631a62533f60eb32b1d3dbab20c | Install socat |
| 1.0.0 | ✅ | docker.io/airbyte/python-connector-base:1.0.0@sha256:dd17e347fbda94f7c3abff539be298a65af2d7fc27a307d89297df1081a45c27 | Initial release: based on Python 3.9.18, on slim-bookworm system, with pip==23.2.1 and poetry==1.6.1 |
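
Each address in the table has the form `registry/repository:tag@sha256:digest`, so the tag is human-readable while the digest pins the exact image content. The sketch below shows one way to split and rebuild such an address. `ImageReference` and `parse_address` are hypothetical names for this example; the field names only mirror the `PublishedImage(registry=..., repository=..., tag=..., sha=...)` declarations visible in this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageReference:
    """Hypothetical helper; fields mirror the PublishedImage entries in this PR."""

    registry: str
    repository: str
    tag: str
    sha: str

    @property
    def address(self) -> str:
        """Full address with both the tag and the content digest."""
        return f"{self.registry}/{self.repository}:{self.tag}@sha256:{self.sha}"


def parse_address(address: str) -> ImageReference:
    """Split an address of the form registry/repository:tag@sha256:digest."""
    name_and_tag, sha = address.split("@sha256:")
    registry, rest = name_and_tag.split("/", 1)
    repository, tag = rest.rsplit(":", 1)
    return ImageReference(registry, repository, tag, sha)


reference = parse_address(
    "docker.io/airbyte/python-connector-base:1.2.1"
    "@sha256:4a4255e2bccab71fa5912487e42d9755cdecffae77273fed8be01a081cd6e795"
)
print(reference.repository, reference.tag)
```

This sketch assumes the registry host carries no port number and that both a tag and a digest are always present, as they are in the table.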


## How to release a new base image version (example for Python)

### Requirements

- [Docker](https://docs.docker.com/get-docker/)
- [Poetry](https://python-poetry.org/docs/#installation)
- Dockerhub logins
* [Docker](https://docs.docker.com/get-docker/)
* [Poetry](https://python-poetry.org/docs/#installation)
* Dockerhub logins

### Steps

1. `poetry install`
2. Open `base_images/python/bases.py`.
2. Open `base_images/python/bases.py`.
3. Make changes to the `AirbytePythonConnectorBaseImage`, you're likely going to change the `get_container` method to change the base image.
4. Implement the `container` property which must return a `dagger.Container` object.
5. **Recommended**: Add new sanity checks to `run_sanity_check` to confirm that the new version is working as expected.
6. Cut a new base image version by running `poetry run generate-release`. You'll need your DockerHub credentials.

It will:

- Prompt you to pick which base image you'd like to publish.
- Prompt you for a major/minor/patch/pre-release version bump.
- Prompt you for a changelog message.
- Run the sanity checks on the new version.
- Optional: Publish the new version to DockerHub.
- Regenerate the docs and the registry json file.

- Prompt you to pick which base image you'd like to publish.
- Prompt you for a major/minor/patch/pre-release version bump.
- Prompt you for a changelog message.
- Run the sanity checks on the new version.
- Optional: Publish the new version to DockerHub.
- Regenerate the docs and the registry json file.
7. Commit and push your changes.
8. Create a PR and ask for a review from the Connector Operations team.

**Please note that if you don't publish your image while cutting the new version you can publish it later with `poetry run publish <repository> <version>`.**
No connector will use the new base image version until its metadata is updated to use it.
If you're not fully confident with the new base image version please:
- please publish it as a pre-release version
- try out the new version on a couple of connectors
- cut a new version with a major/minor/patch bump and publish it
- These steps can happen in different PRs.

- please publish it as a pre-release version
- try out the new version on a couple of connectors
- cut a new version with a major/minor/patch bump and publish it
- These steps can happen in different PRs.
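
The release flow above asks for a major/minor/patch/pre-release bump. The sketch below illustrates what each choice could produce, assuming semantic versions with `-rc.N` pre-release suffixes like the `1.2.0-rc.1` entry in the table. The `bump` function is hypothetical and its rules are assumptions; the actual behavior of `poetry run generate-release` may differ.

```python
import re

# MAJOR.MINOR.PATCH with an optional -rc.N suffix (assumed pre-release format).
VERSION_PATTERN = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$")


def bump(version: str, part: str) -> str:
    """Return the next version for the given bump type (illustrative rules only)."""
    match = VERSION_PATTERN.match(version)
    if match is None:
        raise ValueError(f"Unsupported version format: {version!r}")
    major, minor, patch = (int(group) for group in match.groups()[:3])
    release_candidate = match.group(4)

    if part == "pre-release":
        if release_candidate is not None:
            # Already a pre-release: only increment the rc counter.
            return f"{major}.{minor}.{patch}-rc.{int(release_candidate) + 1}"
        # Assumption: a first pre-release targets the next patch version.
        return f"{major}.{minor}.{patch + 1}-rc.1"

    if release_candidate is not None:
        # Promoting a pre-release is left out of this sketch.
        raise ValueError("This sketch only bumps final versions for major/minor/patch")
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown bump type: {part!r}")


for bump_type in ("major", "minor", "patch", "pre-release"):
    print(bump_type, "->", bump("1.2.0", bump_type))
```

Under these assumptions, trying a change as a pre-release first and then cutting the final version maps onto a `pre-release` bump followed by a separate release in a later PR.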

## Running tests locally

```bash
poetry run pytest
# Static typing checks
5 changes: 3 additions & 2 deletions airbyte-ci/connectors/base_images/base_images/hacks.py
Expand Up @@ -20,9 +20,10 @@ def get_container_dockerfile(container) -> str:
Returns:
str: The Dockerfile of the base image container.
"""

lineage = [
field for field in list(container._ctx.selections) if isinstance(field, dagger.api.base.Field) and field.type_name == "Container"
field
for field in list(container._ctx.selections)
if isinstance(field, dagger.client._core.Field) and field.type_name == "Container"
]
dockerfile = []

10 changes: 5 additions & 5 deletions airbyte-ci/connectors/base_images/base_images/python/bases.py
Expand Up @@ -10,12 +10,12 @@
from base_images import bases, published_image
from base_images import sanity_checks as base_sanity_checks
from base_images.python import sanity_checks as python_sanity_checks
from base_images.root_images import PYTHON_3_9_18
from base_images.root_images import PYTHON_3_9_19


class AirbytePythonConnectorBaseImage(bases.AirbyteConnectorBaseImage):

root_image: Final[published_image.PublishedImage] = PYTHON_3_9_18
root_image: Final[published_image.PublishedImage] = PYTHON_3_9_19
repository: Final[str] = "airbyte/python-connector-base"
pip_cache_name: Final[str] = "pip_cache"
nltk_data_path: Final[str] = "/usr/share/nltk_data"
Expand Down Expand Up @@ -94,7 +94,7 @@ def get_container(self, platform: dagger.Platform) -> dagger.Container:
# Set the timezone to UTC
.with_exec(["ln", "-snf", "/usr/share/zoneinfo/Etc/UTC", "/etc/localtime"])
# Upgrade pip to the expected version
.with_exec(["pip", "install", "--upgrade", "pip==23.2.1"])
.with_exec(["pip", "install", "--upgrade", "pip==24.0", "setuptools==70.0.0"])
# Declare poetry specific environment variables
.with_env_variable("POETRY_VIRTUALENVS_CREATE", "false")
.with_env_variable("POETRY_VIRTUALENVS_IN_PROJECT", "false")
Expand All @@ -117,8 +117,8 @@ async def run_sanity_checks(self, platform: dagger.Platform):
container = self.get_container(platform)
await base_sanity_checks.check_timezone_is_utc(container)
await base_sanity_checks.check_a_command_is_available_using_version_option(container, "bash")
await python_sanity_checks.check_python_version(container, "3.9.18")
await python_sanity_checks.check_pip_version(container, "23.2.1")
await python_sanity_checks.check_python_version(container, "3.9.19")
await python_sanity_checks.check_pip_version(container, "24.0")
await python_sanity_checks.check_poetry_version(container, "1.6.1")
await python_sanity_checks.check_python_image_has_expected_env_vars(container)
await base_sanity_checks.check_a_command_is_available_using_version_option(container, "socat", "-V")
Expand Up @@ -10,3 +10,10 @@
tag="3.9.18-slim-bookworm",
sha="44b7f161ed03f85e96d423b9916cdc8cb0509fb970fd643bdbc9896d49e1cad0",
)

PYTHON_3_9_19 = PublishedImage(
registry="docker.io",
repository="python",
tag="3.9.19-slim-bookworm",
sha="b92e6f45b58d9cafacc38563e946f8d249d850db862cbbd8befcf7f49eef8209",
)
@@ -1,4 +1,9 @@
[
{
"version": "1.2.1",
"changelog_entry": "Upgrade to Python 3.9.19 + update pip and setuptools",
"dockerfile_example": "FROM docker.io/python:3.9.19-slim-bookworm@sha256:b92e6f45b58d9cafacc38563e946f8d249d850db862cbbd8befcf7f49eef8209\nRUN ln -snf /usr/share/zoneinfo/Etc/UTC /etc/localtime\nRUN pip install --upgrade pip==24.0 setuptools==70.0.0\nENV POETRY_VIRTUALENVS_CREATE=false\nENV POETRY_VIRTUALENVS_IN_PROJECT=false\nENV POETRY_NO_INTERACTION=1\nRUN pip install poetry==1.6.1\nRUN sh -c apt update && apt-get install -y socat=1.7.4.4-2\nRUN sh -c apt-get update && apt-get install -y tesseract-ocr=5.3.0-2 poppler-utils=22.12.0-2+b1\nRUN mkdir /usr/share/nltk_data"
},
{
"version": "1.2.0",
"changelog_entry": "Add CDK system dependencies: nltk data, tesseract, poppler.",
Expand Up @@ -19,7 +19,7 @@ def dummy_version(self):

def test_class_attributes(self):
"""Spot any regression in the class attributes."""
assert bases.AirbytePythonConnectorBaseImage.root_image == root_images.PYTHON_3_9_18
assert bases.AirbytePythonConnectorBaseImage.root_image == root_images.PYTHON_3_9_19
assert bases.AirbytePythonConnectorBaseImage.repository == "airbyte/python-connector-base"
assert bases.AirbytePythonConnectorBaseImage.pip_cache_name == "pip_cache"
