Updated workflows/actions to dynamically generate an .env file #552


Merged

Conversation

@validbeck (Collaborator) commented Nov 26, 2024

Internal Notes for Reviewers

sc-7505

Updated Miro board:

New Docs Infra Workflow

Workflow updates

.env file now created on the fly

Screenshot 2024-11-26 at 12 43 10 PM
  # If yes then create the .env file for use in execution step
  - name: Create .env file
    if: steps.filter.outputs.notebooks == 'true'
    id: create_env
    run: |
      touch .env
      echo VM_API_HOST=${{ secrets.PLATFORM_API_HOST }} >> .env 
      echo VM_API_KEY=${{ secrets.PLATFORM_API_KEY }} >> .env 
      echo VM_API_SECRET=${{ secrets.PLATFORM_API_SECRET }} >> .env 
      echo VM_API_MODEL=${{ secrets.PLATFORM_DEV_MODEL }} >> .env 
      cat .env 

(This step appears in each of the three updated workflow files.)

The execution step now passes the generated .env file into the composite actions

  # Only execute the demo notebook if .env file is created
  - name: Execute demo Intro for Model Developers notebook
    if: ${{ steps.create_env.outcome == 'success' }}
    uses: ./.github/actions/demo-notebook
    id: execute-demo-notebook
    with:
      env_file: .env

(Equivalent conditions gate the staging and prod notebook steps.)

Notebook filters now check against the incoming branch instead of against main when applicable

This is to get around the default condition that caused this push to prod to filter incorrectly: https://github.com/validmind/documentation/actions/runs/12039272835/job/33566716893

Actions updates

Actions now receive the generated .env file as an input

inputs:
  env_file:
    description: "Load the created .env file"
    required: true

Step to double-check .env is actually available within the composite action where expected

  - name: Ensure .env file is available
    shell: bash
    id: find_env
    run: |
      if [ ! -f "${{ inputs.env_file }}" ]; then
        echo "Error: .env file not found at ${{ inputs.env_file }}"
        exit 1
      fi

The notebook is only executed if .env is found, and the .env file is passed in for the notebook from the repository root

  - name: Execute ONLY the Intro for Model Developers notebook with heap development
    shell: bash
    if: ${{ steps.find_env.outcome == 'success' }}
    run: |
        cd site 
        source ../${{ inputs.env_file }}
        quarto render --profile exe-demo notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb &> render_errors.log || { 
        echo "Execute for intro_for_model_developers_EXECUTED.ipynb failed";
        cat render_errors.log;
        exit 1;
        }

(Matching steps run the notebook with the heap production and heap staging profiles.)

Makefile updates

New execute command

PROFILE := exe-demo
FILE_PATH := notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb 

# Execute a Jupyter Notebook  
# Will default to `exe-demo` profile & the `notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb` if no input provided
# To override: make execute PROFILE=select-profile FILE_PATH=notebooks/notebook-path-here.ipynb
execute:
	quarto render --profile $(PROFILE) $(FILE_PATH)

Screenshot 2024-11-26 at 2 35 07 PM

  • There is now a Make command (shown above) that lets you run the execution locally with the profile of your choice, as long as you have valid .env credentials and any notebook dependencies installed.

  • .env.example has been updated as well to accommodate local execution:
# API INFO TO RUN /notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb
VM_API_HOST=<api_host>
VM_API_KEY=<api_key>
VM_API_SECRET=<api_secret>
VM_API_MODEL=<model>

get-source makes a copy of intro_for_model_developers.ipynb

@echo "Duplicating notebooks/tutorials/intro_for_model_developers.ipynb for execution"
@cp notebooks/tutorials/intro_for_model_developers.ipynb notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb

Instead of reverting notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb to the copy on main, now it is made on the fly by copying the source and renaming the file, ELIMINATING OUR EDGE CASE UPKEEP IN THE SOURCE!!! 🎉

@echo "Duplicating notebooks/tutorials/intro_for_model_developers.ipynb for execution"

Docs updates

Store model credentials in .env files

Old: Screenshot 2024-11-26 at 3 21 26 PM
New: Screenshot 2024-11-26 at 2 11 33 PM

LIVE PREVIEW

  • I updated these instructions to actually be clear and easy to follow
  • Added some examples (see the sketch after this list)
  • And added an edge case for ongoing monitoring
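
For reference, the pattern those instructions describe looks roughly like this in a notebook cell. This is a minimal sketch only: it assumes python-dotenv is installed, and the vm.init() parameter names follow the ValidMind quickstart, so treat them as illustrative if your library version differs.

import os

from dotenv import load_dotenv  # provided by the python-dotenv package
import validmind as vm

# Load the VM_API_* values from a .env file in the current working directory
load_dotenv()

# Pass the loaded credentials to the ValidMind Library (parameter names assumed from the quickstart)
vm.init(
    api_host=os.environ["VM_API_HOST"],
    api_key=os.environ["VM_API_KEY"],
    api_secret=os.environ["VM_API_SECRET"],
    model=os.environ["VM_API_MODEL"],
)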

Enable monitoring

Old: Screenshot 2024-11-26 at 3 21 40 PM
New: Screenshot 2024-11-26 at 2 12 32 PM

LIVE PREVIEW

Added the .env edge case here as well.

Expected behaviour

PR > main

main > staging

  • After this PR is merged into main, the workflow merging into staging from main will trigger the notebook execution: WORKFLOW LINK PLACEHOLDER

staging > prod

  • The next time we publish to prod, the workflow merging into prod from staging will trigger the notebook execution: WORKFLOW LINK PLACEHOLDER

Workflow environment warnings

I couldn't figure out how to get rid of these without modifying the notebook source. Ugh. Anything I tried in the workflow actions to shift the dependencies or spawn method didn't work.

Polars deadlock

Using fork() can cause Polars to deadlock in the child process.
In addition, using fork() with Python in general is a recipe for mysterious
deadlocks and crashes.

The most likely reason you are seeing this error is because you are using the
multiprocessing module on Linux, which uses fork() by default. This will be
fixed in Python 3.14. Until then, you want to use the "spawn" context instead.

See https://docs.pola.rs/user-guide/misc/multiprocessing/ for details.

Affects "Run the model evaluation tests" & "Run the model evaluation tests" sections:

Run the model evaluation tests: Screenshot 2024-11-26 at 3 10 06 PM
Run the model evaluation tests: Screenshot 2024-11-26 at 3 10 33 PM
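
For context, the fix the warning points to is to opt into the "spawn" start method when creating processes with multiprocessing. A minimal, hypothetical sketch of that workaround follows; it is not something this PR applies, since it would mean editing the notebook source.

import multiprocessing as mp


def square(x):
    return x * x


if __name__ == "__main__":
    # Request "spawn" instead of the Linux default fork() to avoid the Polars deadlock
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))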

LangChain uses pydantic v2 internally

As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain_core.pydantic_v1 module was a compatibility shim for pydantic v1 and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like `from langchain_core.pydantic_v1 import BaseModel` with `from pydantic import BaseModel`, or use the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet: `from pydantic.v1 import BaseModel`.

The same warning appears for the langchain.pydantic_v1 shim:

As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain.pydantic_v1 module was a compatibility shim for pydantic v1 and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like `from langchain.pydantic_v1 import BaseModel` with `from pydantic import BaseModel`, or use the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet: `from pydantic.v1 import BaseModel`.

Affects the "Verify & preview the documentation template" section:

Screenshot 2024-11-26 at 3 09 33 PM

SHAPGlobalImportance error

Juan confirmed he is still looking into this; I passed him some more details today for sc-7456.

External Release Notes

ValidMind Academy

  • The ValidMind Academy Developer Fundamentals course has been updated with an improved version of the "ValidMind Introduction for Model Developers" Jupyter Notebook.
  • This embedded notebook is now executed live within the training, allowing you to interact with output cells that were previously omitted, such as when previewing the documentation template.
  • The training notebook serves as a reference guide for what users can expect when they first run the cells.

Jupyter Notebooks

  • We've now included details on how to initialize ValidMind with credentials stored in an .env file in our Jupyter Notebook samples.
  • Our documentation guides have also been updated to match this new experience.

@validbeck added the internal label Nov 26, 2024
@validbeck self-assigned this Nov 26, 2024
@validbeck added the documentation label and removed the internal label Nov 26, 2024
@validbeck requested a review from nrichers November 26, 2024 23:37

PR Summary

This pull request introduces several enhancements to the environment configuration and execution of Jupyter Notebooks within the project. Key changes include:

  1. Environment Configuration:

    • Updated the .env.example file to include new environment variables: VM_API_HOST, VM_API_KEY, VM_API_SECRET, and VM_API_MODEL.
    • Modified GitHub Actions (demo-notebook, prod-notebook, staging-notebook) to require an env_file input, ensuring that the .env file is present before executing notebooks.
    • Updated workflow files (deploy-docs-prod.yaml, deploy-docs-staging.yaml, validate-docs-site.yaml) to create a .env file dynamically using secrets before executing notebooks.
  2. Notebook Execution:

    • Added a new execute target in the Makefile to facilitate the execution of Jupyter Notebooks with specified profiles and file paths.
    • Updated the Makefile to duplicate notebooks for execution rather than checking out from the main branch.
  3. Documentation Updates:

    • Enhanced documentation in store-credentials-in-env-file.qmd to provide detailed instructions on storing model credentials in .env files and using them in notebooks.
    • Added examples and explanations for enabling monitoring when using .env files in enable-monitoring.qmd.

These changes improve the flexibility and security of environment management and streamline the process of executing and monitoring Jupyter Notebooks.

Test Suggestions

  • Verify that the .env file is correctly created and populated with secrets in all GitHub Actions workflows.
  • Test the execution of notebooks in demo, staging, and production environments to ensure they run successfully with the new .env configuration.
  • Check that the execute target in the Makefile correctly executes a Jupyter Notebook with the specified profile and file path.
  • Ensure that the documentation changes accurately reflect the new process for storing credentials and enabling monitoring.

@nrichers (Collaborator) left a comment

LGTM! Thank you for switching over to using .env files generated on the fly for the different workflows and fixing/improving a whole slew of other things as well — the make execute action is a great touch! 🚀🚀🚀

This PR actually opens up other possibilities for the future as well, such as executing more notebooks or surfacing code cells from notebooks that require access to the VM library to work.

Some general comments follow — these are NOT requests to make changes now, just some observations or suggestions we can look into for the future.

macOS runners

Related to the Polars deadlock specifically, we could switch our GitHub runners over to macOS, say, macos-latest. It currently includes Python 3.13.0.

We've already noticed some other issues in our docs where the behavior on Linux is subtly different compared to how content builds locally on macOS ... (Test descriptions, anyone?)

However, we might run into other subtle differences related to Python environments with this change, so it's definitely something we would want to test further before making a decision.

Disabling warnings in notebooks

Related to warnings in notebooks more generally, it looks like there is a built-in Python package called warnings that could be imported and then used to filter out warnings at the notebook level:

import warnings
warnings.filterwarnings('ignore')

More info in https://docs.python.org/3/library/warnings.html. I might experiment with this package the next time I record videos and report back.
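
For reference, the same module also supports narrower filters, so a notebook could hide only a specific category of warning instead of everything. An illustrative sketch, not something tried in this PR:

import warnings

# Hide only deprecation warnings globally
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Or scope the suppression to a single block of code
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    warnings.warn("this UserWarning is suppressed", UserWarning)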

What's the fallback when things fail?

Since this PR and the previous, related PR change how things work, we should figure out what the process is when rendering the docs site locally with the intent to deploy, e.g. to hot-fix a docs site issue where our GitHub workflow is not working or GitHub is down. It's unlikely we'd be doing this very often but it has happened.

I think we would run the following:

  1. make get-source
  2. quarto render (there's technically make docs-site, which includes step 1, but it currently uses the development profile)
  3. make execute
  4. make deploy-prod (disable the branch protection rules or run from the local prod branch after merging in the latest changes)

Does that match your expectation?

@validbeck (Collaborator, Author) commented

> Related to the Polars deadlock specifically, we could switch our GitHub runners over to macOS, say, macos-latest. It currently includes Python 3.13.0.

Yes, but the Linux environment was actually a lifesaver here. SO much smoother — on macOS you run into this edge case with the dependencies: https://www.notion.so/validmind/Documented-Issues-Troubleshooting-bc9a38fc9ab34afd88251a92265c5a6f?pvs=4

> Related to warnings in notebooks more generally, it looks like there is a built-in Python package called warnings that could be imported and then used to filter out warnings at the notebook level:

Interesting! I think they're OK in this one — sometimes the output being "imperfect" is sort of part of the experience, kind of like when product tour videos are a bit outdated. It has some organic/human element to it. In this case it still runs all the way through, so I will leave it for now but I will definitely play around with that package at some point just to see what it can do.

> Does that match your expectation?

Yes, I was going to log a Story for next Sprint to put the Miro board and some written internal process docs together for how our infra works today, and that workaround would go on it!

@nrichers Can you pretty please review/approve validmind/validmind-library#251 so I can call make get-source on this PR before I merge to trigger the workflows? 🙏🏻

@nrichers (Collaborator) commented

> @nrichers Can you pretty please review/approve validmind/validmind-library#251 so I can call make get-source on this PR before I merge to trigger the workflows? 🙏🏻

Taking a look now, @validbeck!


A PR preview is available: Preview URL

@validbeck merged commit fe39b6a into main Nov 28, 2024
5 checks passed
@validbeck deleted the beck/sc-7505/update-relevant-validmind-library-notebooks branch November 28, 2024 00:32
@validbeck mentioned this pull request Nov 28, 2024
@validbeck (Collaborator, Author) commented

@nrichers Not quite there... for example, when merging main into staging, the filter now returns false: https://github.com/validmind/documentation/actions/runs/12060349011/job/33630603928

I think I know why the comparison is failing — our branches essentially match between merges, so we actually have to compare the PR being merged to the base instead: #554
