Commit 89899bb

Load models from huggingface instead of blob storage (#22)
1 parent 45bd148 commit 89899bb

15 files changed, with 70 additions and 51 deletions
.github/actions/deps/action.yaml

Lines changed: 3 additions & 3 deletions
@@ -13,9 +13,9 @@ runs:
  python-version: ${{inputs.python-version}}
  - name: Setup pip
  shell: sh
- run: |
+ run: |
  python3 -m ensurepip
  python3 -m pip install --upgrade pip
- - name: Install project
+ - name: Install project
  shell: sh
- run: pip install ".[dev,train]"
+ run: pip install ".[dev,train]"

.github/pull_request_template.md

Lines changed: 4 additions & 4 deletions
@@ -1,17 +1,17 @@
- <!--
+ <!--
  Thank you for your contribution to the repo :)

  Pull Request (PR) Instructions:
  Provide a general summary of your changes in the Title above. Fill out each section of the template, and replace the space with an `x` in all the boxes that apply. If you're unsure about any of these, don't hesitate to ask. We're here to help! Once you are satisfied with the pull request, click the "Create pull request" button to submit it for review.

  Before submitting this PR, please ensure that your input and responses are entered in the designated space provided below each section to keep all project-related information organized and easily accessible.
-
+
  How to link to a PR:
- https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue
+ https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue
  -->

  ## Change Description
- <!---
+ <!---
  Describe your changes in detail. In your description, you should answer questions like "Why is this change required? What problem does it solve?".

  If it fixes an open issue, please link to the issue here. If this PR closes an issue, put the word 'closes' before the issue link to auto-close the issue when the PR is merged.

.github/workflows/pre-commit-ci.yml

Lines changed: 3 additions & 3 deletions
@@ -10,12 +10,12 @@ on:
  jobs:
  pre-commit-ci:
  runs-on: ubuntu-latest
- env:
+ env:
  SKIP: "check-lincc-frameworks-template-version,pytest-check,no-commit-to-branch,validate-pyproject,check-added-large-files,sphinx-build"
  steps:
  - uses: actions/checkout@v3
  with:
- fetch-depth: 0
+ fetch-depth: 0
  - name: Setup Dependencies
  uses: ./.github/actions/deps
  with:
@@ -24,4 +24,4 @@ jobs:
  with:
  extra_args: --from-ref ${{ github.event.pull_request.base.sha }} --to-ref ${{ github.event.pull_request.head.sha }}
  - uses: pre-commit-ci/lite-action@v1.0.1
- if: always()
+ if: always()

.github/workflows/publish-to-pypi.yml

Lines changed: 0 additions & 1 deletion
@@ -17,7 +17,6 @@ permissions:

  jobs:
  deploy:
-
  runs-on: ubuntu-latest
  permissions:
  id-token: write

.github/workflows/smoke-test.yml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ on:
  # Runs this workflow automatically
  schedule:
  - cron: 45 6 * * *
-
+
  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -147,4 +147,4 @@ _results/
  _html/

  # mlflow output
- mlruns/
+ mlruns/

.mypy.ini

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ explicit_package_bases = True
  ignore_missing_imports = True

  [mypy-mlflow.*]
- ignore_missing_imports = True
+ ignore_missing_imports = True

.pre-commit-config.yaml

Lines changed: 17 additions & 5 deletions
@@ -1,8 +1,20 @@
  fail_fast: true
  repos:

+ - repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v4.4.0
+ hooks:
+ - id: trailing-whitespace
+ - id: end-of-file-fixer
+ # - id: check-docstring-first
+ - id: check-json
+ - id: check-yaml
+ - id: pretty-format-json
+ exclude: \.ipy(n|nb)$
+ args: ["--autofix", "--indent=2", "--no-sort-keys"]
+
  # Compare the local template version to the latest remote template version
- # This hook should always pass. It will print a message if the local version
+ # This hook should always pass. It will print a message if the local version
  # is out of date.
  - repo: https://github.com/lincc-frameworks/pre-commit-hooks
  rev: v0.1.1
@@ -82,7 +94,7 @@ repos:


  # Run unit tests, verify that they pass. Note that coverage is run against
- # the ./src directory here because that is what will be committed. In the
+ # the ./src directory here because that is what will be committed. In the
  # github workflow script, the coverage is run against the installed package
  # and uploaded to Codecov by calling pytest like so:
  # `python -m pytest --cov=<package_name> --cov-report=xml`
@@ -95,9 +107,9 @@ repos:
  language: system
  pass_filenames: false
  always_run: true
- # Make sure Sphinx can build the documentation while explicitly omitting
- # notebooks from the docs, so users don't have to wait through the execution
- # of each notebook or each commit. By default, these will be checked in the
+ # Make sure Sphinx can build the documentation while explicitly omitting
+ # notebooks from the docs, so users don't have to wait through the execution
+ # of each notebook or each commit. By default, these will be checked in the
  # GitHub workflows.
  - repo: local
  hooks:
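
Several removed/added line pairs in this commit read identically in the rendered diff, which is consistent with the newly added `trailing-whitespace` and `end-of-file-fixer` hooks having rewritten those lines. As a rough illustration of the effect of those two hooks, here is a minimal Python sketch. It is an approximation written for this note, not the hooks' actual implementation, and both function names are hypothetical.

```python
def strip_trailing_whitespace(text: str) -> str:
    """Remove spaces and tabs at the end of every line (illustrative only)."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def fix_end_of_file(text: str) -> str:
    """Make non-empty text end with exactly one newline (illustrative only)."""
    stripped = text.rstrip("\n")
    return stripped + "\n" if stripped else ""


if __name__ == "__main__":
    # A YAML fragment with invisible trailing whitespace and no final newline.
    before = "run: |  \n  python3 -m ensurepip\t"
    after = fix_end_of_file(strip_trailing_whitespace(before))
    print(repr(before))
    print(repr(after))
```

Changes of this kind alter only invisible characters, so the old and new versions of a line look the same once the page is rendered as text.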

README.md

Lines changed: 17 additions & 13 deletions
@@ -11,8 +11,8 @@
  [![codecov](https://codecov.io/gh/AutoResearch/autodoc/branch/main/graph/badge.svg)](https://codecov.io/gh/AutoResearch/autodoc)
  <!-- [![Read the Docs](https://img.shields.io/readthedocs/autora-doc)](https://autora-doc.readthedocs.io/) -->

- This project was automatically generated using the LINCC-Frameworks
- [python-project-template](https://github.com/lincc-frameworks/python-project-template). For more information about the project template see the
+ This project was automatically generated using the LINCC-Frameworks
+ [python-project-template](https://github.com/lincc-frameworks/python-project-template). For more information about the project template see the
  [documentation](https://lincc-ppt.readthedocs.io/en/latest/).

  ## Dev Guide - Getting Started
@@ -31,24 +31,25 @@ Once you have created a new environment, you can install this project for local
  development using the following commands:

  ```
- >> pip install -e .'[dev]'
+ >> pip install -e .'[dev,train]'
  >> pre-commit install
  >> conda install pandoc
  ```

  Notes:
  1) The single quotes around `'[dev]'` may not be required for your operating system.
+ 3) Look at `pyproject.toml` for other optional dependencies, e.g. you can do `pip install -e ."[dev,train,cuda]"` if you want to use CUDA.
  2) `pre-commit install` will initialize pre-commit for this local repository, so
  that a set of tests will be run prior to completing a local commit. For more
- information, see the Python Project Template documentation on
+ information, see the Python Project Template documentation on
  [pre-commit](https://lincc-ppt.readthedocs.io/en/latest/practices/precommit.html)
  3) Install `pandoc` allows you to verify that automatic rendering of Jupyter notebooks
  into documentation for ReadTheDocs works as expected. For more information, see
  the Python Project Template documentation on
  [Sphinx and Python Notebooks](https://lincc-ppt.readthedocs.io/en/latest/practices/sphinx.html#python-notebooks)


- ## Running AzureML pipelines
+ ## Running AzureML pipelines

  This repo contains the evaluation and training pipelines for AutoDoc.

@@ -69,21 +70,24 @@ az account set --subscription "<your subscription name>"
  az configure --defaults workspace=<aml workspace> group=<resource group> location=<location, e.g. westus3>
  ```

- ### Uploading data
-
- Example:
- ```sh
- az storage blob upload --account-name <account> --container <container>> --file data/data.jsonl -n data/sweetpea/data.jsonl
- ```

  ### Running jobs

  Prediction
  ```sh
- az ml job create -f azureml/eval.yml --set display_name="Test prediction job" --web
+ az ml job create -f azureml/eval.yml --set display_name="Test prediction job" --set environment_variables.HF_TOKEN=<your huggingface token> --web
  ```

  Notes:
  - `--name` will set the mlflow run id
  - `--display_name` becomes the name in the experiment dashboard
- - `--web` argument will pop-up a browser window for tracking the job.
+ - `--web` argument will pop-up a browser window for tracking the job.
+ - The `HF_TOKEN` is required for gated repos, which need authentication
+
+
+ ### Uploading data
+
+ Example:
+ ```sh
+ az storage blob upload --account-name <account> --container <container>> --file data/data.jsonl -n data/sweetpea/data.jsonl
+ ```
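
The `HF_TOKEN` environment variable that the updated prediction command passes to the job would typically be read inside the job's Python code. A minimal sketch of that step, assuming the token is forwarded as a `token` keyword argument to a Hugging Face loader. The helper name `hf_auth_kwargs` is hypothetical and is not part of this repository.

```python
import os
from typing import Dict, Mapping, Optional


def hf_auth_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build optional auth kwargs from HF_TOKEN (hypothetical helper).

    Returns {"token": <value>} when HF_TOKEN is set and non-blank, else {}.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    return {"token": token} if token else {}


if __name__ == "__main__":
    # Report only whether a token is present, without printing the secret.
    print("token configured:", bool(hf_auth_kwargs()))
```

The resulting dictionary could then be unpacked into a loader call such as `AutoModelForCausalLM.from_pretrained(model_id, **hf_auth_kwargs())`. Recent `transformers` releases accept a `token` argument there, while older ones used `use_auth_token`, so it is worth checking the installed version. Returning an empty dictionary when no token is set keeps public (non-gated) models loadable without any credentials.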

azureml/conda.yml

Lines changed: 1 addition & 1 deletion
@@ -15,4 +15,4 @@ dependencies:
  - xformers
  - scipy
  # This works, while installing from pytorch and cuda from conda does not
- - torch==2.0.1
+ - torch==2.0.1
