Updated workflows/actions to dynamically generate an .env file #552


Merged

Conversation

@validbeck (Collaborator) commented Nov 26, 2024

Internal Notes for Reviewers

sc-7505

Updated Miro board:

New Docs Infra Workflow

Workflow updates

.env file now created on the fly

Screenshot 2024-11-26 at 12 43 10 PM
  # If yes then create the .env file for use in execution step
  - name: Create .env file
    if: steps.filter.outputs.notebooks == 'true'
    id: create_env
    run: |
      touch .env
      echo VM_API_HOST=${{ secrets.PLATFORM_API_HOST }} >> .env 
      echo VM_API_KEY=${{ secrets.PLATFORM_API_KEY }} >> .env 
      echo VM_API_SECRET=${{ secrets.PLATFORM_API_SECRET }} >> .env 
      echo VM_API_MODEL=${{ secrets.PLATFORM_DEV_MODEL }} >> .env 
      cat .env 

(This step appears in each of the three updated workflow files.)

The execution step now passes the generated .env file into the composite actions

  # Only execute the demo notebook if .env file is created
  - name: Execute demo Intro for Model Developers notebook
    if: ${{ steps.create_env.outcome == 'success' }}
    uses: ./.github/actions/demo-notebook
    id: execute-demo-notebook
    with:
      env_file: .env

(Equivalent conditions gate the staging and prod notebook steps.)

Notebook filters now check against the incoming branch instead of against main when applicable

This is to get around the default condition that caused this push to prod to filter incorrectly: https://github.com/validmind/documentation/actions/runs/12039272835/job/33566716893

Actions updates

Actions now receive the generated .env file as an input

inputs:
  env_file:
    description: "Load the created .env file"
    required: true

Step to double-check .env is actually available within the composite action where expected

  - name: Ensure .env file is available
    shell: bash
    id: find_env
    run: |
      if [ ! -f "${{ inputs.env_file }}" ]; then
        echo "Error: .env file not found at ${{ inputs.env_file }}"
        exit 1
      fi

The notebook is only executed if .env is found, and the .env file is passed in for the notebook from the repository root

  - name: Execute ONLY the Intro for Model Developers notebook with heap development
    shell: bash
    if: ${{ steps.find_env.outcome == 'success' }}
    run: |
        cd site 
        source ../${{ inputs.env_file }}
        quarto render --profile exe-demo notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb &> render_errors.log || { 
        echo "Execute for intro_for_model_developers_EXECUTED.ipynb failed";
        cat render_errors.log;
        exit 1;
        }

(Matching steps run the notebook with the heap production and heap staging profiles.)

Makefile updates

New execute command

PROFILE := exe-demo
FILE_PATH := notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb 

# Execute a Jupyter Notebook  
# Will default to `exe-demo` profile & the `notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb` if no input provided
# To override: make execute PROFILE=select-profile FILE_PATH=notebooks/notebook-path-here.ipynb
execute:
	quarto render --profile $(PROFILE) $(FILE_PATH)

Screenshot 2024-11-26 at 2 35 07 PM

  • There is now a Make command (shown above) that lets you run the execution locally with the profile of your choice, as long as you have valid .env credentials and any notebook dependencies installed.

  • .env.example has been updated as well to accommodate local execution:
# API INFO TO RUN /notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb
VM_API_HOST=<api_host>
VM_API_KEY=<api_key>
VM_API_SECRET=<api_secret>
VM_API_MODEL=<model>

get-source makes a copy of intro_for_model_developers.ipynb

@echo "Duplicating notebooks/tutorials/intro_for_model_developers.ipynb for execution"
@cp notebooks/tutorials/intro_for_model_developers.ipynb notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb

Instead of reverting notebooks/tutorials/intro_for_model_developers_EXECUTED.ipynb to the copy on main, now it is made on the fly by copying the source and renaming the file, ELIMINATING OUR EDGE CASE UPKEEP IN THE SOURCE!!! 🎉

@echo "Duplicating notebooks/tutorials/intro_for_model_developers.ipynb for execution"

Docs updates

Store model credentials in .env files

Old: Screenshot 2024-11-26 at 3 21 26 PM
New: Screenshot 2024-11-26 at 2 11 33 PM

LIVE PREVIEW

  • I updated these instructions to actually be clear and easy to follow
  • Added some examples (see the sketch after this list)
  • And added an edge case for ongoing monitoring
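
For reference, the pattern those instructions describe looks roughly like this in a notebook cell. This is a minimal sketch only: it assumes python-dotenv is installed, and the vm.init() parameter names follow the ValidMind quickstart, so treat them as illustrative if your library version differs.

import os

from dotenv import load_dotenv  # provided by the python-dotenv package
import validmind as vm

# Load the VM_API_* values from a .env file in the current working directory
load_dotenv()

# Pass the loaded credentials to the ValidMind Library (parameter names assumed from the quickstart)
vm.init(
    api_host=os.environ["VM_API_HOST"],
    api_key=os.environ["VM_API_KEY"],
    api_secret=os.environ["VM_API_SECRET"],
    model=os.environ["VM_API_MODEL"],
)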

Enable monitoring

Old: Screenshot 2024-11-26 at 3 21 40 PM
New: Screenshot 2024-11-26 at 2 12 32 PM

LIVE PREVIEW

Added the .env edge case here as well.

Expected behaviour

PR > main

main > staging

  • After this PR is merged into main, the workflow merging into staging from main will trigger the notebook execution: WORKFLOW LINK PLACEHOLDER

staging > prod

  • The next time we publish to prod, the workflow merging into prod from staging will trigger the notebook execution: WORKFLOW LINK PLACEHOLDER

Workflow environment warnings

I couldn't figure out how to get rid of these without modifying the notebook source. Ugh. Anything I tried in the workflow actions to shift the dependencies or spawn method didn't work.

Polars deadlock

Using fork() can cause Polars to deadlock in the child process.
In addition, using fork() with Python in general is a recipe for mysterious
deadlocks and crashes.

The most likely reason you are seeing this error is because you are using the
multiprocessing module on Linux, which uses fork() by default. This will be
fixed in Python 3.14. Until then, you want to use the "spawn" context instead.

See https://docs.pola.rs/user-guide/misc/multiprocessing/ for details.

Affects "Run the model evaluation tests" & "Run the model evaluation tests" sections:

Run the model evaluation tests: Screenshot 2024-11-26 at 3 10 06 PM
Run the model evaluation tests: Screenshot 2024-11-26 at 3 10 33 PM
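
For context, the fix the warning points to is to opt into the "spawn" start method when creating processes with multiprocessing. A minimal, hypothetical sketch of that workaround follows; it is not something this PR applies, since it would mean editing the notebook source.

import multiprocessing as mp


def square(x):
    return x * x


if __name__ == "__main__":
    # Request "spawn" instead of the Linux default fork() to avoid the Polars deadlock
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))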

LangChain uses pydantic v2 internally

As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain_core.pydantic_v1 module was a compatibility shim for pydantic v1 and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like `from langchain_core.pydantic_v1 import BaseModel` with `from pydantic import BaseModel`, or use the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet: `from pydantic.v1 import BaseModel`.

The same warning appears for the langchain.pydantic_v1 shim:

As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain.pydantic_v1 module was a compatibility shim for pydantic v1 and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like `from langchain.pydantic_v1 import BaseModel` with `from pydantic import BaseModel`, or use the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet: `from pydantic.v1 import BaseModel`.

Affects the "Verify & preview the documentation template" section:

Screenshot 2024-11-26 at 3 09 33 PM

SHAPGlobalImportance error

Juan confirmed he is still looking into this; I passed him some more details today for sc-7456.

External Release Notes

ValidMind Academy

  • The ValidMind Academy Developer Fundamentals course has been updated with an improved version of the "ValidMind Introduction for Model Developers" Jupyter Notebook.
  • This embedded notebook is now executed live within the training, allowing you to interact with output cells that were previously omitted, such as when previewing the documentation template.
  • The training notebook serves as a reference guide for what users can expect when they first run the cells.

Jupyter Notebooks

  • We've now included details on how to initialize ValidMind with credentials stored in an .env file in our Jupyter Notebook samples.
  • Our documentation guides have also been updated to match this new experience.

@validbeck added the internal label Nov 26, 2024
@validbeck self-assigned this Nov 26, 2024
@validbeck added the documentation label and removed the internal label Nov 26, 2024
@validbeck requested a review from nrichers November 26, 2024 23:37

PR Summary

This pull request introduces several enhancements to the environment configuration and execution of Jupyter Notebooks within the project. Key changes include:

  1. Environment Configuration:

    • Updated the .env.example file to include new environment variables: VM_API_HOST, VM_API_KEY, VM_API_SECRET, and VM_API_MODEL.
    • Modified GitHub Actions (demo-notebook, prod-notebook, staging-notebook) to require an env_file input, ensuring that the .env file is present before executing notebooks.
    • Updated workflow files (deploy-docs-prod.yaml, deploy-docs-staging.yaml, validate-docs-site.yaml) to create a .env file dynamically using secrets before executing notebooks.
  2. Notebook Execution:

    • Added a new execute target in the Makefile to facilitate the execution of Jupyter Notebooks with specified profiles and file paths.
    • Updated the Makefile to duplicate notebooks for execution rather than checking out from the main branch.
  3. Documentation Updates:

    • Enhanced documentation in store-credentials-in-env-file.qmd to provide detailed instructions on storing model credentials in .env files and using them in notebooks.
    • Added examples and explanations for enabling monitoring when using .env files in enable-monitoring.qmd.

These changes improve the flexibility and security of environment management and streamline the process of executing and monitoring Jupyter Notebooks.

Test Suggestions

  • Verify that the .env file is correctly created and populated with secrets in all GitHub Actions workflows.
  • Test the execution of notebooks in demo, staging, and production environments to ensure they run successfully with the new .env configuration.
  • Check that the execute target in the Makefile correctly executes a Jupyter Notebook with the specified profile and file path.
  • Ensure that the documentation changes accurately reflect the new process for storing credentials and enabling monitoring.

@nrichers (Collaborator) left a comment

LGTM! Thank you for switching over to using .env files generated on the fly for the different workflows and fixing/improving a whole slew of other things as well — the make execute action is a great touch! 🚀🚀🚀

This PR actually opens up other possibilities for the future as well, such as executing more notebooks or surfacing code cells from notebooks that require access to the VM library to work.

Some general comments follow — these are NOT requests to make changes now, just some observations or suggestions we can look into for the future.

macOS runners

Related to the Polars deadlock specifically, we could switch our GitHub runners over to macOS, say, macos-latest. It currently includes Python 3.13.0.

We've already noticed some other issues in our docs where the behavior on Linux is subtly different compared to how content builds locally on macOS ... (Test descriptions, anyone?)

However, we might run into other subtle differences related to Python environments with this change, so it's definitely something we would want to test further before making a decision.

Disabling warnings in notebooks

Related to warnings in notebooks more generally, it looks like there is a built-in Python package called warnings that could be imported and then used to filter out warnings at the notebook level:

import warnings
warnings.filterwarnings('ignore')

More info in https://docs.python.org/3/library/warnings.html. I might experiment with this package the next time I record videos and report back.
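
For reference, the same module also supports narrower filters, so a notebook could hide only a specific category of warning instead of everything. An illustrative sketch, not something tried in this PR:

import warnings

# Hide only deprecation warnings globally
warnings.filterwarnings("ignore", category=DeprecationWarning)

# Or scope the suppression to a single block of code
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    warnings.warn("this UserWarning is suppressed", UserWarning)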

What's the fallback when things fail?

Since this PR and the previous, related PR change how things work, we should figure out what the process is when rendering the docs site locally with the intent to deploy, e.g. to hot-fix a docs site issue where our GitHub workflow is not working or GitHub is down. It's unlikely we'd be doing this very often but it has happened.

I think we would run the following:

  1. make get-source
  2. quarto render (there's technically make docs-site, which includes step 1, but it currently uses the development profile)
  3. make execute
  4. make deploy-prod (disable the branch protection rules or run from the local prod branch after merging in the latest changes)

Does that match your expectation?

@validbeck (Collaborator, Author) commented

> Related to the Polars deadlock specifically, we could switch our GitHub runners over to macOS, say, macos-latest. It currently includes Python 3.13.0.

Yes, but the Linux environment was actually a lifesaver here. SO much smoother — on macOS you run into this edge case with the dependencies: https://www.notion.so/validmind/Documented-Issues-Troubleshooting-bc9a38fc9ab34afd88251a92265c5a6f?pvs=4

> Related to warnings in notebooks more generally, it looks like there is a built-in Python package called warnings that could be imported and then used to filter out warnings at the notebook level:

Interesting! I think they're OK in this one — sometimes the output being "imperfect" is sort of part of the experience, kind of like when product tour videos are a bit outdated. It has some organic/human element to it. In this case it still runs all the way through, so I will leave it for now but I will definitely play around with that package at some point just to see what it can do.

> Does that match your expectation?

Yes, I was going to log a Story for next Sprint to put the Miro board and some written internal process docs together for how our infra works today, and that workaround would go on it!

@nrichers Can you pretty please review/approve validmind/validmind-library#251 so I can call make get-source on this PR before I merge to trigger the workflows? 🙏🏻

@nrichers (Collaborator) commented

> @nrichers Can you pretty please review/approve validmind/validmind-library#251 so I can call make get-source on this PR before I merge to trigger the workflows? 🙏🏻

Taking a look now, @validbeck!


A PR preview is available: Preview URL

@validbeck merged commit fe39b6a into main Nov 28, 2024
5 checks passed
@validbeck deleted the beck/sc-7505/update-relevant-validmind-library-notebooks branch November 28, 2024 00:32
@validbeck mentioned this pull request Nov 28, 2024
@validbeck (Collaborator, Author) commented

@nrichers Not quite there... for example, when merging main into staging, the filter now returns false: https://github.com/validmind/documentation/actions/runs/12060349011/job/33630603928

I think I know why the comparison is failing — our branches essentially match between merges, so we actually have to compare the PR being merged to the base instead: #554
