README.md
Lines changed: 98 additions & 80 deletions
@@ -1,126 +1,150 @@
1
+
# GTO
2
+
1
3
[](https://github.com/iterative/gto/actions/workflows/check-test-release.yml)
Git Tag Ops. Turn your Git repository into an Artifact Registry:
6
8
7
-
Git Tag Ops. Turn your Git Repo into Artifact Registry:
8
-
* Register new versions of artifacts marking significant changes to them
9
-
* Promote versions to signal downstream systems to act
10
-
* Attach additional info about your artifact with Enrichments
11
-
* Act on new versions and promotions in CI
9
+
* Register new versions of artifacts marking releases/significant changes
10
+
* Promote versions to ordered, named stages to track their lifecycles
11
+
* GitOps: signal CI/CD automation or downstream systems to act upon these actions
12
+
* Maintain and query artifact metadata / additional info with Enrichments machinery
12
13
13
-
To turn your repo into an artifact registry, you only need to `pip install` this package. Versioning and promotion of artifacts are done by creation of special git tags. To use the artifact registry, you also need this package only.
14
+
GTO versions and promotes artifacts by creating annotated Git tags in a special format.
14
15
15
-
The tool is created to be used both in CLI and in Python. The README will cover CLI part, but for all commands there are Python API counterparts in `gto.api` module.
16
+
## Installation
16
17
17
-
## Versioning
18
-
19
-
To register new version of artifact, you can use `gto register` command. You usually use those to mark significant changes to the artifact. Running `gto register` creates a special git tag.
18
+
Install GTO with pip:
20
19
21
20
```
22
-
$ gto register simple-nn HEAD --version v1.0.0
21
+
$ pip install gto
23
22
```
24
23
25
-
This will create git tag `rf@v1.0.0`.
24
+
This installs both the Python package, whose API you can use, and the `gto` CLI entry point.
26
25
27
-
## Promoting
26
+
Installing this package is enough to start using any repo as an artifact registry: there is no need to set up other services or a database.
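As a quick sanity check after installation, a sketch like the following reports whether both pieces are visible from Python. It relies only on the standard library. The helper name `gto_install_status` is made up for this illustration, and it assumes the importable module is named `gto`.

```python
import importlib.util
import shutil


def gto_install_status() -> dict:
    """Report whether the `gto` package and the `gto` CLI are discoverable."""
    return {
        # True if `import gto` would find a module in the current environment.
        "python_package": importlib.util.find_spec("gto") is not None,
        # True if an executable named `gto` is found on PATH.
        "cli_entrypoint": shutil.which("gto") is not None,
    }


if __name__ == "__main__":
    print(gto_install_status())
```

If either value is `False`, the most likely cause is that `pip install gto` ran in a different environment than the one currently active.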
28
27
29
-
You could also promote a specific artifact version to Stage. You can use that to signal downstream systems to act - for example, redeploy a ML model (if your artifact is a model) or update the some special file on server (if your artifact is a file).
28
+
## Quick walkthrough
30
29
30
+
This README covers CLI usage, but every command has a Python API counterpart in the [`gto.api`](/iterative/gto/blob/main/gto/api.py) module. The examples use this repo: https://github.com/iterative/gto-example
To register a new version of an artifact, use the `gto register` command. You usually do this to mark significant changes to the artifact. Running `gto register` creates a special git tag.
41
+
42
+
```
$ gto register rf
Created git tag 'rf@v0.0.1' that registers a new version
```
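Since a registration is just a tag of the form `<artifact>@<version>`, downstream scripts can split it with plain string handling. The following is a minimal sketch, not GTO's own parser; `VersionTag` and `parse_version_tag` are hypothetical names introduced here, and the tag format is taken from the `rf@v0.0.1` output above.

```python
from typing import NamedTuple


class VersionTag(NamedTuple):
    artifact: str
    version: str


def parse_version_tag(tag: str) -> VersionTag:
    """Split a tag such as 'rf@v0.0.1' into artifact name and version."""
    artifact, sep, version = tag.partition("@")
    # Reject tags without '@' or with an empty side, e.g. 'rf' or '@v1'.
    if not sep or not artifact or not version:
        raise ValueError(f"not a version tag: {tag!r}")
    return VersionTag(artifact, version)


if __name__ == "__main__":
    print(parse_version_tag("rf@v0.0.1"))
```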
46
+
47
+
### Promoting
48
+
49
+
You can also promote a specific artifact version to a Stage. Stages are statuses of your artifact that specify its readiness to be used by downstream systems. You can use promotions to signal downstream systems to act via CI/CD or webhooks: for example, to redeploy an ML model (if your artifact is a model) or to update a special file on a server (if your artifact is a file).
50
+
51
+
```
$ gto promote rf prod
Created git tag 'rf#prod#1' that promotes 'v0.0.1'
```
36
55
37
56
There are two notations used for git tags in promotion:
38
57
- simple: `rf#prod`
39
58
- incremental: `rf#prod#N`
40
59
41
60
Incremental is the default one and we suggest you use it when possible. The benefit of using it is that you don't have to delete git tags (with simple notation you'll need to delete them because you can't have two tags with the same name). This will keep the history of your promotions.
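To make the two notations concrete, here is a small sketch that accepts both. It assumes incremental tags follow the `rf#prod#1` form printed by `gto promote` above; `PromotionTag` and `parse_promotion_tag` are hypothetical names used only for illustration, not part of GTO.

```python
from typing import NamedTuple, Optional


class PromotionTag(NamedTuple):
    artifact: str
    stage: str
    number: Optional[int]  # None for the simple notation


def parse_promotion_tag(tag: str) -> PromotionTag:
    """Parse 'rf#prod' (simple) or 'rf#prod#1' (incremental)."""
    parts = tag.split("#")
    if len(parts) == 2 and all(parts):
        return PromotionTag(parts[0], parts[1], None)
    if len(parts) == 3 and all(parts) and parts[2].isdecimal():
        return PromotionTag(parts[0], parts[1], int(parts[2]))
    raise ValueError(f"not a promotion tag: {tag!r}")


if __name__ == "__main__":
    print(parse_promotion_tag("rf#prod"))
    print(parse_promotion_tag("rf#prod#1"))
```

Because the counter is part of the tag name, every promotion gets a unique tag, which is what lets the incremental notation keep history without deleting anything.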
42
61
43
-
## Artifacts
62
+
### Artifacts
63
+
64
+
So far we've seen how to register versions and promote them, but we still haven't specified the `type` of the artifact (dataset, model, something else) or the `path` to it. For simple workflows with a single artifact, we can hardcode those in CI/CD or downstream systems. For more advanced cases we would like to codify them, and we can do that with an `artifacts.yaml` file.
44
65
45
-
So far we registered some artifacts, but we still didn't specify nowhere `type` of this artifact (dataset, model, something else) and `path` to it.
46
-
To add enrichment for artifact or remove the existing one, run `gto add` or `gto rm`:
66
+
To annotate an artifact, use `gto annotate`:
47
67
48
68
```
49
-
$ gto add model simple-nn models/neural-network.pkl --virtual
69
+
$ gto annotate rf --type model --path models/random-forest.pkl
50
70
```
51
71
52
72
You can also modify the `artifacts.yaml` file directly.
53
73
54
-
There are two types of artifacts in GTO:
74
+
There are two kinds of artifacts that GTO recognizes:
55
75
1. Files/folders committed to the repo. When you register a new version or promote it to a stage, Git guarantees that it's immutable. You can return to your repo a year later and get exactly the same artifact by providing the same version.
56
-
2.`Virtual` artifacts. This could be an external path, e.g. `s3://mybucket/myfile` or a local path if the file wasn't committed (as in case with DVC). In this case GTO can't pin the current physical state of the artifact and guarantee it's immutability. If `s3://mybucket/myfile` changes, you won't have any way neither retrieve, nor understand it's different now than it was before when you registered that artifact version.
76
+
2. `virtual` artifacts. This could be an external path, e.g. `s3://mybucket/myfile`, or a local path if the file wasn't committed (as is the case with DVC). In this case GTO can't pin the current physical state of the artifact and guarantee its immutability. If `s3://mybucket/myfile` changes, you won't have any way to retrieve the original, or even to tell that it differs from what it was when you registered that artifact version.
57
77
58
-
In future versions, we will add enrichments (useful information other tools like DVC and MLEM can provide about the artifacts). This will allow treating files versioned with DVC and DVC PL outputs as usual artifacts instead `virtual` ones.
78
+
By default GTO treats your artifact as a `virtual` one. To make sure it's not a virtual one, you can supply the `--must_exist` flag to `gto annotate`.
79
+
80
+
In future versions, we will add enrichments: useful information that other tools like DVC and MLEM can provide about the artifacts. This will allow treating files versioned with DVC and DVC PL outputs as usual artifacts instead of `virtual` ones.
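The distinction between the two kinds of artifacts can be approximated with a rough check like the one below. This is only an illustration of the idea, not how GTO decides: `classify_artifact_path` is a hypothetical helper, it treats any path containing `://` as external, and it checks the working tree rather than the Git index, so an existing but uncommitted file would be misreported.

```python
from pathlib import Path


def classify_artifact_path(path: str, repo_root: str = ".") -> str:
    """Roughly classify an artifact path as virtual or present locally."""
    # Anything with a URL-like scheme (s3://, gs://, https://) lives outside the repo.
    if "://" in path:
        return "virtual (external location)"
    # A local path that does not exist cannot be pinned by Git.
    if not (Path(repo_root) / path).exists():
        return "virtual (no such file in the working tree)"
    # Existence in the working tree is necessary but not sufficient for
    # "committed"; a stricter check would ask Git whether the file is tracked.
    return "present in the working tree"


if __name__ == "__main__":
    for candidate in ("s3://mybucket/myfile", "models/random-forest.pkl"):
        print(candidate, "->", classify_artifact_path(candidate))
```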
59
81
60
82
## Using the registry
61
83
62
-
Let's see what are the commands that help us use the registry. Let's clone the example repo first:
Use `--all-branches` or `--all-commits` to read `artifacts.yaml` from more commits than just HEAD.
101
+
Here we'll see both artifacts that have git tags created for them (i.e. artifacts with registered or promoted versions) and artifacts that were annotated in `artifacts.yaml`. Use `--all-branches` or `--all-commits` to read `artifacts.yaml` from more commits than just HEAD.
83
102
84
103
Add artifact name to print versions of that artifact:
`gto history` will print all registered versions of the artifact and all versions promoted to environments. This will help you to understand what was happening with the artifact.
117
+
`gto history` will print a journal of the events that happened to your artifact. This will help you understand what was happening and audit changes.
To act upon created git tags, you can create a simple CI workflow. With GH Actions it can look like this:
136
+
```
name: Act on git tags that register versions / promote "rf" artifact
on:
  push:
    tags:
      - "rf*"
```
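Inside a job triggered this way, GitHub Actions exposes the pushed tag in `GITHUB_REF` as `refs/tags/<tag>`. A step could recover the tag and tell registrations from promotions with a sketch like this one. The function names are hypothetical, and the classification simply looks for the `@` and `#` separators used in the tags shown earlier; it is not a replacement for GTO's own parsing.

```python
import os
from typing import Optional

TAG_PREFIX = "refs/tags/"


def tag_from_ref(ref: str) -> Optional[str]:
    """Return the tag name from a full Git ref, or None for non-tag refs."""
    if ref.startswith(TAG_PREFIX):
        return ref[len(TAG_PREFIX):]
    return None


def describe_event(tag: str) -> str:
    """Guess what a tag means from its separator."""
    if "@" in tag:
        return "version registration"
    if "#" in tag:
        return "promotion"
    return "unrelated tag"


if __name__ == "__main__":
    tag = tag_from_ref(os.environ.get("GITHUB_REF", ""))
    print(describe_event(tag) if tag else "not a tag push")
```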
117
143
118
-
When CI is triggered, you can use the triggering git reference to determine the version of the artifact that was registered or promoted. In GH Actions you can use the `GITHUB_REF` environment variable to determine the version (check out GH Actions workflow in the example repo). You can also do that locally:
144
+
When CI is triggered, you can use the git reference to determine the version of the artifact that was registered or promoted. In GH Actions, the `GITHUB_REF` environment variable holds that reference (check out the GH Actions workflow in the example repo). You can parse tags manually or use `gto check-ref`. To see how it works locally:
119
145
120
146
```
121
147
$ gto check-ref rf@v1.0.1
122
-
WARNING:root:Provided ref doesn't exist or it is not a tag that promotes to an environment
123
-
env: {}
124
148
version:
125
149
rf:
126
150
artifact: rf
```
@@ -138,58 +162,52 @@ To get the latest artifact version, it's path and git reference, run:
138
162
```
139
163
$ gto latest rf
140
164
v1.0.1
141
-
$ gto latest rf --path
142
-
models/random-forest.pkl
143
165
$ gto latest rf --ref
144
-
9fbb8664a4a48575ee5d422e177174f20e460b94
166
+
rf@v1.0.1
145
167
```
146
168
147
169
To get the version that is currently promoted to an environment, run:
148
170
149
171
```
150
172
$ gto which rf production
151
173
v1.0.0
152
-
$ gto which rf production --path
153
-
models/random-forest.pkl
154
174
$ gto which rf production --ref
155
-
5eaf15a9fbb8664a4a48575ee5d422e177174f20e460b94
175
+
rf#production#2
156
176
```
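When these lookups are needed from a script, one option is to shell out to the CLI. The sketch below only uses the commands and the `--ref` flag shown above; `gto_command` and `run_gto` are hypothetical wrapper names, and calling the `gto.api` module directly is likely the cleaner route where it is available.

```python
import subprocess
from typing import List


def gto_command(*args: str) -> List[str]:
    """Build the argv list for a gto invocation."""
    return ["gto", *args]


def run_gto(*args: str) -> str:
    """Run gto and return its trimmed stdout; raises if the command fails."""
    result = subprocess.run(
        gto_command(*args), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


# Example calls (they require gto to be installed and a repo with tags):
#   run_gto("latest", "rf", "--ref")
#   run_gto("which", "rf", "production", "--ref")
```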
157
177
158
-
To download artifacts that are stored with DVC or outside of repo, e.g. in `s3://` or in DVC cache, you'll need DVC or aws CLI.
178
+
To get details about those artifacts from `artifacts.yaml`, use `gto describe`:
179
+
```
$ gto describe rf
{
    "type": "model",
    "path": "models/random-forest.pkl",
    "virtual": false
}
```
159
187
160
188
## Configuration
161
189
162
190
You can write configuration in a `.gto` file in the root of your repo or use environment variables like this (note the `GTO_` prefix):
163
191
```shell
164
-
GTO_EMOJIS=true gto show
192
+
GTO_EMOJIS=false gto show
165
193
```
166
194
167
-
The default config written to `.gto` file will look like this (comments are there to help clarify the settings meaning and valid values):
195
+
An example config written to the `.gto` file could look like this:
168
196
```
169
-
type_allowed: [] # list of allowed types
170
-
stage_allowed: [] # list of allowed Stages to promote to
197
+
type_allowed: [model, dataset] # list of allowed types
198
+
stage_allowed: [dev, stage, prod] # list of allowed Stages
171
199
```
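One way to read these allow-lists is sketched below. This illustrates the intended effect and is not GTO's validation code: `is_allowed` is a hypothetical helper, the config is written as a plain dict rather than loaded from `.gto`, and treating an empty list as "everything is allowed" is an assumption.

```python
from typing import Sequence

# The example config from above, written as a plain dict for illustration.
config = {
    "type_allowed": ["model", "dataset"],
    "stage_allowed": ["dev", "stage", "prod"],
}


def is_allowed(value: str, allowed: Sequence[str]) -> bool:
    """Check a value against an allow-list; an empty list permits anything (assumed)."""
    return not allowed or value in allowed


if __name__ == "__main__":
    print(is_allowed("prod", config["stage_allowed"]))
    print(is_allowed("production", config["stage_allowed"]))
```

With the example config, a promotion to `prod` would pass while one to `production` would be rejected, which is the kind of typo these lists are meant to catch.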
172
200
173
-
If a list/dict should allow something but it's empty, that means that all values are allowed.