Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move tools/ci_* projects to airbyte-ci, update to use Poetry, bump to python 3.10 #27957

Merged
merged 38 commits into from
Jul 26, 2023

Conversation

bnchrch
Copy link
Contributor

@bnchrch bnchrch commented Jul 4, 2023

What

  1. moves tools/ci_* projects to airbyte-ci
  2. Updates them to use poetry/pipx
  3. Updates these packages to python 3.10
  4. Improves the installation story
  5. Moves pipelines to its own package

Now to install airbyte-ci all you have to run is

pipx install airbyte-ci/connectors/pipelines

Notes for reviewer

@alafanechere I would love to get your take on this being the airbyte-ci tzar.

  • Is this a good idea?
  • Is this the direction you had wanted to go in?
  • What do you think about using pipx to isolate the install?
  • The folders ci and connectors at the top of airbyte-ci still feel awkward is there a better way?

Left TODO

  • Update ci_code_validator
  • Update GHA
  • Update docs
  • Update Dagger build
  • Test that everything still works

closes #25170
closes #27865

@bnchrch bnchrch changed the title WIP: Move tools/ci_* projects to airbyte-ci and update to use Poetry Move tools/ci_* projects to airbyte-ci, update to use Poetry, bump to python 3.10 Jul 5, 2023
@bnchrch bnchrch requested a review from alafanechere July 5, 2023 06:44
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pipelines/airbyte-ci are now their own package

Once pipx is installed, navigate to the root directory of the project, then run:

```bash
pipx install airbyte-ci/ci/ci_pipelines/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The installation is one line

packages = [{ include = "ci_credentials" }]

[tool.poetry.dependencies]
python = "^3.10"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python 3.10 all around!!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ci_credentials is installed on GHA workflows that are running 3.9. Can we keep compatibility of ci_credentials with 3.9 while suggesting use of 3.10 though .python-version files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to keep the version diff between local and ci? Could this be another source of behavior mismatches?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm, seems like we bumped most things as we went along, except .github/workflows/connector-performance-command.yml for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I think that was a miss on my part.

I ended up going with 3.10 everywhere as it seems that there were no issues encountered while running these tools as 3.10

requests = "^2.28.2"
click = "^8.1.3"
pyyaml = "^6.0"
ci_common_utils = { path = "../ci_common_utils", develop = true }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go ahead with this we will need to update the local dependency resolution right?

@alafanechere
Copy link
Contributor

Is this a good idea?

Yes it is!

Is this the direction you had wanted to go in?

Totally

What do you think about using pipx to isolate the install?

I'm not against it but I'm not familiar with it. Is it providing dependency isolation while not requiring the use of a python virtualenv? I'm 👍 , but if poetry has a builtin way to achieve the same I'd prefer to stick to poetry (I'm not too familiar with it too..)

The folders ci and connectors at the top of airbyte-ci still feel awkward is there a better way?

So in this ci_connector_ops / ci_credentials / ci_common_utils thing we have:

  • QA checks
  • airbyte-ci pipelines
  • ci_credentials to download GSm secrets
  • ci_common_utils which only has a GoogleAPI client class... Which I believe is only used by ci_credentials.

My suggestion would be to:

  • Merge ci_credentials and ci_common_utils into a single package : airbyte-ci/connectors/credentials ?
  • I'd love to have a package we can use as a lib that would have the Connector utility class. airbyte-ci/connectors/utils ? It could be imported in pipelines and qa_checks for instance.
  • The QA checks should be a light package in airbyte-ci/connectors/qa_checks

@bnchrch
Copy link
Contributor Author

bnchrch commented Jul 5, 2023

@alafanechere Thanks for the swift reply and glad to hear this is in the same direction you had in mind!

if poetry has a builtin way to achieve the same I'd prefer to stick to poetry

Poetry doesntly have a good way to "install" these cli tools.

So we'd have to run the airbyte-ci by

cd airbyte-ci/ci/ci_pipelines
poetry install
poetry run airbyte-ci

Note: we'd also have to update all the paths to work from this folder not root

However thats where pipx comes in

image

its purpose built to install cli tools in an isolated way. This should let us have our customers (connector devs) install airbyte-ci ci-credentials and any other tool completely isolated without having conflicts.

It also interfaces with pyproject.toml and the lockfiles perfectly!

My suggestion would be to:

  1. Merge ci_credentials and ci_common_utils into a single package : airbyte-ci/connectors/credentials ?
  2. I'd love to have a package we can use as a lib that would have the Connector utility class. airbyte-ci/connectors/utils ? It could be imported in pipelines and qa_checks for instance.
  3. The QA checks should be a light package in airbyte-ci/connectors/qa_checks

These all sound great. Would you mind if we moved the refactor of ci_common_utils and ci_credentials to later PR?

Just so I can get this in with minimal conflicts 😅

What I can do is rename these to remove ci and place them all under airbyte-ci/connectors/

e.g.
airbyte-ci/connectors/common_utils
airbyte-ci/connectors/pipelines
airbyte-ci/connectors/credentials

@alafanechere
Copy link
Contributor

Allright, let's use pipx!
And let's refactor ci_common_utils and ci_credentials in a different PR!

What I can do is rename these to remove ci and place them all under airbyte-ci/connectors/

👍

I'd love to have a package we can use as a lib that would have the Connector utility class. airbyte-ci/connectors/utils ? It could be imported in pipelines and qa_checks for instance.

Do you have an opinion about this?

@bnchrch
Copy link
Contributor Author

bnchrch commented Jul 5, 2023

I'd love to have a package we can use as a lib that would have the Connector utility class. airbyte-ci/connectors/utils ? It could be imported in pipelines and qa_checks for instance.

Do you have an opinion about this?

I LOVE this. Thats my opinion. I think were in a great spot to do that. Also it would be useful in the metadata_service

@bnchrch bnchrch marked this pull request as ready for review July 7, 2023 22:46
@bnchrch bnchrch requested a review from a team July 7, 2023 22:46
@octavia-squidington-iii
Copy link
Collaborator

source-faker test report (commit 26884cd0db) - ❌

⏲️ Total pipeline duration: 03mn30s

Step Result
Validate airbyte-integrations/connectors/source-faker/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-faker docker image for platform linux/x86_64
Unit tests
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-faker test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@airbytehq/connector-operations Gut check: We should remove this.

There is

  1. no use of ci_sonar_qube in code
  2. no use of ci_changes_detection in code
  3. the is one use of ci_sonar_qube in a github action: https://github.com/airbytehq/airbyte/pull/27957/files#diff-c46f6e02d0050829e4e9f4a4bd0eb29da35853e0786b7ddf348f665085c561da
  4. But that action is not used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard of this, if that helps anything!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnchrch
Copy link
Contributor Author

bnchrch commented Jul 25, 2023

@airbytehq/connector-operations This is ready for review!

Most of this is

  1. a folder restructure
  2. a tooling change
  3. a python version bump

The only file with net new logic is how we install pipx and use it in the pipelines here

...tor_ops/pipelines/actions/environments.py → ...pelines/pipelines/actions/environments.py

Please let me know what you think, I would like this to be a speedy merge to avoid conflicts!

Copy link
Contributor

@erohmensing erohmensing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installation was a breeze! Messed around with formatting a source, testing the orchestrator and metadata lib. I'm still not sure (via READMEs alone, and after a quick non-deep perusal of CI changes) how to run tests for the pipelines.

Some non-blocking feedback - I think we should merge this improvement before trying to tweak anything much more:

  • I was able to run the QA engine from where it lives with the instructions provided. I find it sort of weird it lives under airbyte ci but has a separate installation and run process
  • It looks like the README of airbyte-ci lives under airbyte-ci/connectors/pipelines - i'd expect it to live under airbyte-ci or maybe airbyte-ci/connectors. I guess this makes sense because of the install command being pipx install airbyte-ci/connectors/pipelines... this installation path confuses me since i expect airbyte-ci to be able to run the metadata service as well (and the qa engine as described above)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've never heard of this, if that helps anything!

Copy link
Contributor

@alafanechere alafanechere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock but could you please checkout the report path problem and the dev mode story? 😄


[tool.poetry.dependencies]
python = "^3.10"
dagger-io = "^0.6.4"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before installing airbyte-ci with pipx I've set my current Python version to Python 3.10.8 with pyenv:
pyenv shell 3.10.8.
In this python version I already had installed dagger-io in a different version: 0.5.4.
When I ran pipx install airbyte-ci/connectors/pipelines/ it did not update the dagger version.

When I retried without pyenv shell the install worked correctly and the correct version is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it. This is expected!

https://github.com/pypa/pipx#how-is-it-different-from-pip

pipx actually installs all dependencies for the specified package in a directory isolated from your current python environment.

The packages you have in you global env or pyenv will have no effect on the package dependencies install with pipx and vice versa

So it makes sense to me that the dagger-io version in your pyenv shell didnt change.

But can you tell me what you ran to get the version for dagger-io?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the pipx behavior is great when you're not using pyenv shell before:
When you use pyenv shell 3.10.8 I think that pyenv does also create/reuse a venv for the specific python version you use.
In my context, my pyenv venv had an old dagger-io version that I installed with pip install dagger-io
When I ran pipx install after running pyenv shell 3.10.8 the install works but when running airbyte-ci the "old" dagger version was used and not the one declared in airbyte-ci.

It's a strange behavior I can't really explain: my suggestion is just to keep in mind that the pyenv shell + pipx combo can lead to strange install and not suggest developer to use pyenv shell.
TLDR: don't change anything on the doc, the classic install path look stable, but in case of installation issue by developers we should just keep in mind to ask them how they installed and use Python 3.10.8.

@@ -16,7 +16,8 @@ CONNECTOR_DIR="$ROOT_DIR/airbyte-integrations/connectors/$CONNECTOR_NAME"

if [ -n "$FETCH_SECRETS" ]; then
cd $ROOT_DIR
VERSION=dev $ROOT_DIR/tools/.venv/bin/ci_credentials $CONNECTOR_NAME write-to-storage || true
pipx install airbyte-ci/connectors/ci_credentials
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this script install pipx directly to make it self contained?

@@ -1,30 +1,50 @@
# Airbyte CI CLI
# Airbyte CI CLI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a link in the airbyte-ci/README.md to this file?

```

This command installs `airbyte-ci` and makes it globally available in your terminal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think folks should open a new terminal or source their bash profile to get airbyte-ci in their path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate a bit more on the why and what an ideal install instruction looks like?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that after you run pipx install the airbyte-ci the CLI is not directly available in the PATH.
I suggest to add something like: Open a new terminal session or source your bash profile to make airbyte-ci available in your path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm it should be immediately available on your path 🤔

It wasnt for you?

Comment on lines 43 to 47
If you are developing on pipelines, we recommend installing airbyte-ci in editable mode:

```bash
pipx install airbyte-ci/connectors/pipelines/ --force --editable
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit skeptical about this approach for dev mode.
When developping on airbyte-ci I'm used to iterate in a virtual environment and have clear output of the dependencies installed when in run pip install. When using pipx install it's not obvious where is this environment located and how it would clash with an other non editable install.
I installed in editable mode, changed a dependency version, reinstall in editable mode: the dependency was not upgraded.
To check what dependencies are install I use pip freeze. In this context, as I'm not in a virtual environment the pip freeze outputs my globally installed packages, not the airbyte-ci specific one.

TLDR: in dev mode I'd like to explicitely switch to a python virtualenv and get an easy way to check what deps are installed.

The solution here is probably to use poetry instead of pipx for development:

pyenv shell 3.10.8 # I can't make poetry work with python 3.9, I face this issue https://github.com/python-poetry/poetry/issues/7695
poetry install # this creates a virtualenv (.venv) in the package folder
poetry shell # activates the virtualenv
cd ../../..
airbyte-ci

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thumbs up to using poetry for dev!

Seems a lot more approachable and intuitive :)

@@ -32,6 +32,6 @@
GRADLE_CACHE_PATH = "/root/.gradle/caches"
GRADLE_BUILD_CACHE_PATH = f"{GRADLE_CACHE_PATH}/build-cache-1"
GRADLE_READ_ONLY_DEPENDENCY_CACHE_PATH = "/root/gradle_dependency_cache"
LOCAL_REPORTS_PATH_ROOT = "tools/ci_connector_ops/pipeline_reports/"
LOCAL_REPORTS_PATH_ROOT = "airbyte-ci/connectors/pipelines/pipeline_reports/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This path looks good, but when I execute an empty pipeline_reports folder is created at:

airbyte-ci/airbyte-ci/connectors/pipelines/pipeline_reports

And the actual test reports are written a very nested location:

airbyte-ci/connectors/pipelinespipeline_reports/airbyte-ci/connectors/test/manual/bnchrch_ci_poetry-pipelines/1690300788/26884cd0dbe194727b5982e7b67c96e7b60573a6/dagger.log

Something is probably missing up with the path in reports.py or context.py:

        local_path = anyio.Path(f"{LOCAL_REPORTS_PATH_ROOT}/{self.report_output_prefix}/{filename}")
        await local_path.parents[0].mkdir(parents=True, exist_ok=True)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh good catch. Let me see whats going on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see whats going on

empty pipeline_reports folder created
This is an is issue with where we create the report directory which is only discovered if you run the airbyte-ci cmd not in the root.

Ill move this closer to us writing the log

And the actual test reports are written a very nested location:
It is very nested but expected

The log path airbyte-ci/connectors/pipelines/pipeline_reports/airbyte-ci/connectors/test/manual/bnchrch_ci_poetry-pipelines/1690300788/26884cd0dbe194727b5982e7b67c96e7b60573a6/dagger.log has three discrete parts

  1. The parent folder airbyte-ci/connectors/pipelines/
  2. The airbyte-ci defined path structure airbyte-ci/connectors/test/
  3. the git info manual/bnchrch_ci_poetry-pipelines/1690300788/26884cd0dbe194727b5982e7b67c96e7b60573a6

so you will note that #1 and #2 are nearly identical.

and that turns out is a coincidence.

The airbyte-ci defined path structure is derived from the click command hierarchy airbyte-ci connectors test --name=...

requests = "^2.28.2"
connector-ops = {path = "../connector_ops"}
toml = "^0.10.2"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some testing dependencies? (feel free to copy the one that were in the main setup.py

@octavia-squidington-iii
Copy link
Collaborator

source-faker test report (commit 68758f99ba) - ❌

⏲️ Total pipeline duration: 03mn22s

Step Result
Validate airbyte-integrations/connectors/source-faker/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-faker docker image for platform linux/x86_64
Unit tests
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-faker test

@bnchrch
Copy link
Contributor Author

bnchrch commented Jul 26, 2023

@alafanechere @pedroslopez

Just updated the branch to

  1. Have poetry based dev install instructions
  2. Have pipeline test deps declared
  3. Remove ci_code_validator folder and related action
  4. Upgrade .github/workflows/connector-performance-command.yml to 3.10
  5. Ran the python formatter on airbyte-ci
  6. Updated the logging code to not create the dagger.log parent folder until nessesary

Im going to run some test pipelines now to ensure everything works as planned

@octavia-squidington-iii
Copy link
Collaborator

source-faker test report (commit 4090391bac) - ❌

⏲️ Total pipeline duration: 03mn17s

Step Result
Validate airbyte-integrations/connectors/source-faker/metadata.yaml
Connector version semver check
Connector version increment check
QA checks
Code format checks
Connector package install
Build source-faker docker image for platform linux/x86_64
Unit tests
Acceptance tests

🔗 View the logs here

Please note that tests are only run on PR ready for review. Please set your PR to draft mode to not flood the CI engine and upstream service on following commits.
You can run the same pipeline locally on this branch with the airbyte-ci tool with the following command

airbyte-ci connectors --name=source-faker test

@bnchrch
Copy link
Contributor Author

bnchrch commented Jul 26, 2023

Test

prepublish

Edit: everything passed, moving onto reverting source-faker changes

@octavia-squidington-iii octavia-squidington-iii removed the area/connectors Connector related issues label Jul 26, 2023
@alafanechere
Copy link
Contributor

alafanechere commented Jul 26, 2023

@bnchrch I pushed a gitignore change to ignore the legacy pipeline reports paths. Otherwise a git add . will stage all the old report files a developer might have locally (like I did 😄 ).

@alafanechere
Copy link
Contributor

@bnchrch I updated the poetry lock following the test dependency declaration.

@bnchrch bnchrch enabled auto-merge (squash) July 26, 2023 15:28
@bnchrch bnchrch merged commit fb7258e into master Jul 26, 2023
@bnchrch bnchrch deleted the bnchrch/ci/poetry-pipelines branch July 26, 2023 15:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/documentation Improvements or additions to documentation CDK Connector Development Kit connectors/source/faker
Projects
None yet
5 participants