Skip to content

Commit b202317

Browse files
naufalandikaMuhammad Naufal Andika Natsir Putra
andauthored
feat(api): image building process improvement (#404)
* feat: adjust kaniko job flags * feat: implement getHashedModelDependenciesUrl * feat: create unit test for getHashedModelDependenciesUrl * feat: refactor app.Dockerfile to use hashed dependency * feat: call getHashedModelDependenciesUrl and pass to kanikojob * chore: artifactService dependency injection * chore: adjust imagebuilder_test * chore: delete unused attribute and reorder code * chore: fix appcontext_test * feat: refactor ensembler job app.Dockerfile * feat: remove python installation process on base Dockerfile, install python on the app.Dockerfile instead * feat: remove python installation from ensembling job base image * feat: update Makefile to publish turing pyfunc-ensembler-service to pypi * chore: add version when publishing pyfunc-ensembler-service package * chore: test app.Dockerfile changes * chore: try to create new conda env instead of updating the conda env on app.Dockerfile * chore: change turing_dep value for testing * chore: change turing_dep value for testing * feat: install turing-sdk via process_conda_env.sh * chore: change TURING_DEP value * feat: use 1 base image for every python version * chore: remove BaseImageRefTag * feat: create rule to build pyfunc-ensembler-job setup.py * feat: adjust app.Dockerfile and base Dockerfile for pyfunc-ensembler-job * chore: add yq installation step on pyfunc-ensembler-job base Dockerfile * chore: fix breaking appcontext_test.go * chore: refactor line with char > 120 * chore: remove unittest related to image tag * chore: fix golangci-lint related issues * chore: fix golangci-lint related issues (2) * chore: fix golangci-lint related issue (3) * fix: fix wrong arg name in ensembler job app.Dockerfile * chore: use pyfunc-ensembler-job package instead of dummy package * feat: add Publish pyfunc-ensembler-job package step on github CI * chore: adjust github ci for testing * fix: use importlib.util instead of imp (imp was removed in python 3.12) * chore: remove TWINE auth from makefile command, use env instead * chore: adjust pypi related env * feat: add publish pyfunc-ensembler-service package to github CI * chore: adjust ci rule for testing * feat: use importlib.util instead of imp for pyfunc-ensembler-service * fix: add secret to pyfunc-ensembler-service CI * chore: remove arg in make build * feat: add turing prefix to package name * chore: revert changes related to CI testing * feat: remove python-version matrix from CI * chore: separate build and publish rule in Makefile * feat: use python/ tag to determine pyfunc-ensembler-service and pyfunc-ensembler-job version * feat: adjust make setup and make test * chore: adjust github workflow for testing * chore: remove python version requirement * fix: fix wrong make setup command * chore: debug git tag * chore: remove -y from vertagen * chore: trigger ci from python/ tag push * chore: delete Pipfile * chore: bump up numpy version * chore: try to bump python version * chore: fix invalid python version * feat: use pipenv for make type-check * chore: fix wrong folder name in make test-type * chore: add types-PyYAML to dev req * feat: adjust github CI for pyfunc-ensembler-service * feat: only run pyfunc-ensembler-* CI for python/ tag release * feat: change release-rules for sdk, release if new python/ tag is released * chore: fix make rule called by pyfunc-ensembler-* workflow, add tags trigger for sdk workflow * fix: change regex to parse tag to dist version * feat: only run pyfunc-ensembler-* workflow after sdk workflow is successfully run * feat: remove other trigger for pyfunc-ensembler-* workflow * fix: change sdk app name param for vertagen * chore: remove --platform=linux/amd64 from pyfunc-ensembler-job base Dockerfile * feat: do pyfunc-ensembler-job unit testing for python 3.8 - 3.10 * feat: check error from initArtifactService * chore: remove BaseImageRef and adjust the unittest * feat: use artifactService.GetType() instead of artifactServiceType to get artifact service type * chore: reorder install yq command in Dockerfile * feat: adjust how to get version from version.py * feat: remove from requirements.txt * feat: use conda instead of pipenv for make setup * feat: use conda instead of pipenv for make setup --------- Co-authored-by: Muhammad Naufal Andika Natsir Putra <muhammad.putra@gojek.com>
1 parent 66651cf commit b202317

35 files changed

+662
-314
lines changed

.github/workflows/pyfunc-ensembler-job.yaml

Lines changed: 20 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -1,37 +1,19 @@
11
name: engines/pyfunc-ensembler-job
22

33
on:
4-
# Automatically run CI on Release and Pre-Release tags and main branch
5-
# (only if there are changes to relevant paths)
6-
push:
7-
tags:
8-
- "pyfunc-ensembler-job/v[0-9]+.[0-9]+.[0-9]+*"
9-
branches:
10-
- main
11-
paths:
12-
- ".github/workflows/pyfunc-ensembler-job.yaml"
13-
- "engines/pyfunc-ensembler-job/**"
14-
- "sdk/**"
15-
16-
# Automatically run CI on branches, that have active PR opened
17-
pull_request:
18-
branches:
19-
- main
20-
paths:
21-
- ".github/workflows/pyfunc-ensembler-job.yaml"
22-
- "engines/pyfunc-ensembler-job/**"
23-
- "sdk/**"
24-
25-
# To make it possible to trigger e2e CI workflow for any arbitrary git ref
26-
workflow_dispatch:
4+
# The package build by this job requires latest version of sdk,
5+
# so we can only run this workflow after sdk workflow is successfully run
6+
workflow_run:
7+
workflows: ["sdk"]
8+
types:
9+
- completed
2710

2811
jobs:
2912
test:
3013
runs-on: ubuntu-latest
3114
strategy:
3215
matrix:
3316
python-version: ["3.8", "3.9", "3.10"]
34-
3517
steps:
3618
- uses: actions/checkout@v4
3719

@@ -40,7 +22,6 @@ jobs:
4022
with:
4123
python-version: ${{ matrix.python-version }}
4224
cache-dependency-path: |
43-
engines/pyfunc-ensembler-job/env-${{ matrix.python-version }}.yaml
4425
engines/pyfunc-ensembler-job/requirements.txt
4526
engines/pyfunc-ensembler-job/requirements.dev.txt
4627
@@ -70,7 +51,7 @@ jobs:
7051
- id: release-rules
7152
uses: ./.github/actions/release-rules
7253
with:
73-
prefix: pyfunc-ensembler-job/
54+
prefix: python/
7455

7556
publish:
7657
# Automatically publish release and pre-release artifacts.
@@ -81,14 +62,12 @@ jobs:
8162
# Dev build can be released either from the 'main' branch or
8263
# by running this workflow manually with `workflow_dispatch` event.
8364
if: >-
84-
contains('release,pre-release', needs.release-rules.outputs.release-type)
65+
(contains('release,pre-release', needs.release-rules.outputs.release-type)
8566
|| ( github.event_name != 'pull_request' )
86-
|| ( github.event.pull_request.head.repo.full_name == github.repository )
67+
|| ( github.event.pull_request.head.repo.full_name == github.repository )) &&
68+
${{ github.event.workflow_run.conclusion == 'success' }}
8769
environment: ${{ needs.release-rules.outputs.release-type == 'dev' && 'manual' || '' }}
8870
runs-on: ubuntu-latest
89-
strategy:
90-
matrix:
91-
python-version: ["3.8", "3.9", "3.10"]
9271
needs:
9372
- release-rules
9473
- test
@@ -109,11 +88,20 @@ jobs:
10988
working-directory: engines/pyfunc-ensembler-job
11089
env:
11190
DOCKER_REGISTRY: ghcr.io/${{ github.repository }}
112-
PYTHON_VERSION: ${{ matrix.python-version }}
11391
run: |
11492
set -o pipefail
11593
make build-image | tee output.log
11694
echo "::set-output name=pyfunc-ensembler-job::$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)"
11795
11896
- name: Publish Pyfunc Ensembler Job Docker Image
11997
run: docker push ${{ steps.build.outputs.pyfunc-ensembler-job }}
98+
99+
- name: Publish pyfunc-ensembler-job package
100+
env:
101+
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
102+
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
103+
working-directory: engines/pyfunc-ensembler-job
104+
run: |
105+
set -o pipefail
106+
make build-and-publish | tee output.log
107+
echo "pyfunc-ensembler-job=$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)" >> $GITHUB_OUTPUT
Lines changed: 22 additions & 36 deletions
Original file line numberDiff line numberDiff line change
@@ -1,46 +1,25 @@
11
name: engines/pyfunc-ensembler-service
22

33
on:
4-
# Automatically run CI on Release and Pre-Release tags and main branch
5-
# (only if there are changes to relevant paths)
6-
push:
7-
tags:
8-
- "pyfunc-ensembler-service/v[0-9]+.[0-9]+.[0-9]+*"
9-
branches:
10-
- main
11-
paths:
12-
- ".github/workflows/pyfunc-ensembler-service.yaml"
13-
- "engines/pyfunc-ensembler-service/**"
14-
- "sdk/**"
15-
16-
# Automatically run CI on branches, that have active PR opened
17-
pull_request:
18-
branches:
19-
- main
20-
paths:
21-
- ".github/workflows/pyfunc-ensembler-service.yaml"
22-
- "engines/pyfunc-ensembler-service/**"
23-
- "sdk/**"
24-
25-
# To make it possible to trigger e2e CI workflow for any arbitrary git ref
26-
workflow_dispatch:
4+
# The package build by this job requires latest version of sdk,
5+
# so we can only run this workflow after sdk workflow is successfully run
6+
workflow_run:
7+
workflows: ["sdk"]
8+
types:
9+
- completed
2710

2811
jobs:
2912
test:
3013
runs-on: ubuntu-latest
31-
strategy:
32-
matrix:
33-
python-version: ["3.8", "3.9", "3.10"]
3414

3515
steps:
3616
- uses: actions/checkout@v4
3717

38-
- name: Setup Python ${{ matrix.python-version }}
18+
- name: Setup Python
3919
uses: actions/setup-python@v5
4020
with:
41-
python-version: ${{ matrix.python-version }}
21+
python-version: "3.10"
4222
cache-dependency-path: |
43-
engines/pyfunc-ensembler-service/env-${{ matrix.python-version }}.yaml
4423
engines/pyfunc-ensembler-service/requirements.txt
4524
engines/pyfunc-ensembler-service/requirements.dev.txt
4625
@@ -64,7 +43,7 @@ jobs:
6443
- id: release-rules
6544
uses: ./.github/actions/release-rules
6645
with:
67-
prefix: pyfunc-ensembler-service/
46+
prefix: python/
6847

6948
publish:
7049
# Automatically publish release and pre-release artifacts.
@@ -75,14 +54,12 @@ jobs:
7554
# Dev build can be released either from the 'main' branch or
7655
# by running this workflow manually with `workflow_dispatch` event.
7756
if: >-
78-
contains('release,pre-release', needs.release-rules.outputs.release-type)
57+
(contains('release,pre-release', needs.release-rules.outputs.release-type)
7958
|| ( github.event_name != 'pull_request' )
80-
|| ( github.event.pull_request.head.repo.full_name == github.repository )
59+
|| ( github.event.pull_request.head.repo.full_name == github.repository )) &&
60+
${{ github.event.workflow_run.conclusion == 'success' }}
8161
environment: ${{ needs.release-rules.outputs.release-type == 'dev' && 'manual' || '' }}
8262
runs-on: ubuntu-latest
83-
strategy:
84-
matrix:
85-
python-version: ["3.8", "3.9", "3.10"]
8663
needs:
8764
- release-rules
8865
- test
@@ -103,11 +80,20 @@ jobs:
10380
working-directory: engines/pyfunc-ensembler-service
10481
env:
10582
DOCKER_REGISTRY: ghcr.io/${{ github.repository }}
106-
PYTHON_VERSION: ${{ matrix.python-version }}
10783
run: |
10884
set -o pipefail
10985
make build-image | tee output.log
11086
echo "::set-output name=pyfunc-ensembler-service-image::$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)"
11187
11288
- name: Publish Pyfunc Ensembler Service Docker Image
11389
run: docker push ${{ steps.build.outputs.pyfunc-ensembler-service-image }}
90+
91+
- name: Publish pyfunc-ensembler-service package
92+
env:
93+
TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
94+
TWINE_PASSWORD: ${{ secrets.PYPI_API_TOKEN }}
95+
working-directory: engines/pyfunc-ensembler-service
96+
run: |
97+
set -o pipefail
98+
make build-and-publish | tee output.log
99+
echo "pyfunc-ensembler-service=$(sed -n 's%Building docker image: \(.*\)%\1%p' output.log)" >> $GITHUB_OUTPUT

.github/workflows/sdk.yaml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@ on:
66
push:
77
tags:
88
- "sdk/v[0-9]+.[0-9]+.[0-9]+*"
9+
- "python/v[0-9]+.[0-9]+.[0-9]+*"
910
branches:
1011
- main
1112
paths:
@@ -66,7 +67,7 @@ jobs:
6667
- id: release-rules
6768
uses: ./.github/actions/release-rules
6869
with:
69-
prefix: sdk/
70+
prefix: python/
7071

7172
publish:
7273
# Automatically publish release and pre-release artifacts.

api/turing/api/appcontext.go

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ import (
1010

1111
mlpcluster "github.com/caraml-dev/mlp/api/pkg/cluster"
1212

13+
"github.com/caraml-dev/mlp/api/pkg/artifact"
1314
batchensembling "github.com/caraml-dev/turing/api/turing/batch/ensembling"
1415
batchrunner "github.com/caraml-dev/turing/api/turing/batch/runner"
1516
"github.com/caraml-dev/turing/api/turing/cluster"
@@ -94,6 +95,11 @@ func NewAppContext(
9495
// Init ensemblers service
9596
ensemblersService := service.NewEnsemblersService(db)
9697

98+
artifactService, err := initArtifactService(cfg)
99+
if err != nil {
100+
return nil, errors.Wrapf(err, "Failed initializing artifact service")
101+
}
102+
97103
if cfg.BatchEnsemblingConfig.Enabled {
98104
if cfg.BatchEnsemblingConfig.JobConfig == nil {
99105
return nil, errors.Wrapf(err, "BatchEnsemblingConfig.JobConfig was not set")
@@ -124,7 +130,7 @@ func NewAppContext(
124130
ensemblingImageBuilder, err = imagebuilder.NewEnsemblerJobImageBuilder(
125131
imageBuildingController,
126132
*cfg.BatchEnsemblingConfig.ImageBuildingConfig,
127-
cfg.MlflowConfig.ArtifactServiceType,
133+
artifactService,
128134
)
129135
if err != nil {
130136
return nil, errors.Wrapf(err, "Failed initializing ensembling image builder")
@@ -159,7 +165,7 @@ func NewAppContext(
159165
ensemblerServiceImageBuilder, err := imagebuilder.NewEnsemblerServiceImageBuilder(
160166
clusterControllers[cfg.EnsemblerServiceBuilderConfig.ClusterName],
161167
*cfg.EnsemblerServiceBuilderConfig.ImageBuildingConfig,
162-
cfg.MlflowConfig.ArtifactServiceType,
168+
artifactService,
163169
)
164170
if err != nil {
165171
return nil, errors.Wrapf(err, "Failed initializing ensembler service builder")
@@ -236,3 +242,16 @@ func buildKubeconfigStore(mlpSvc service.MLPService, cfg *config.Config) (map[st
236242
}
237243
return k8sConfigStore, nil
238244
}
245+
246+
func initArtifactService(cfg *config.Config) (artifact.Service, error) {
247+
if cfg.MlflowConfig.ArtifactServiceType == "gcs" {
248+
return artifact.NewGcsArtifactClient()
249+
}
250+
if cfg.MlflowConfig.ArtifactServiceType == "s3" {
251+
return artifact.NewS3ArtifactClient()
252+
}
253+
if cfg.MlflowConfig.ArtifactServiceType == "nop" {
254+
return artifact.NewNopArtifactClient(), nil
255+
}
256+
return nil, fmt.Errorf("invalid artifact service type %s", cfg.MlflowConfig.ArtifactServiceType)
257+
}

api/turing/api/appcontext_test.go

Lines changed: 9 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -70,10 +70,8 @@ func TestNewAppContext(t *testing.T) {
7070
MaxRetryCount: 3,
7171
},
7272
ImageBuildingConfig: &config.ImageBuildingConfig{
73-
DestinationRegistry: "ghcr.io",
74-
BaseImageRef: map[string]string{
75-
"3.7.*": "ghcr.io/caraml-dev/turing/pyfunc-ensembler-job:0.0.0-build.1-98b071d",
76-
},
73+
DestinationRegistry: "ghcr.io",
74+
BaseImage: "ghcr.io/caraml-dev/turing/pyfunc-ensembler-job:0.0.0-build.1-98b071d",
7775
BuildNamespace: "default",
7876
BuildTimeoutDuration: 10 * time.Minute,
7977
KanikoConfig: config.KanikoConfig{
@@ -97,10 +95,8 @@ func TestNewAppContext(t *testing.T) {
9795
EnsemblerServiceBuilderConfig: config.EnsemblerServiceBuilderConfig{
9896
ClusterName: defaultEnvironment,
9997
ImageBuildingConfig: &config.ImageBuildingConfig{
100-
DestinationRegistry: "ghcr.io",
101-
BaseImageRef: map[string]string{
102-
"3.7.*": "ghcr.io/caraml-dev/turing/pyfunc-ensembler-service:0.0.0-build.1-98b071d",
103-
},
98+
DestinationRegistry: "ghcr.io",
99+
BaseImage: "ghcr.io/caraml-dev/turing/pyfunc-ensembler-service:0.0.0-build.1-98b071d",
104100
BuildNamespace: "default",
105101
BuildTimeoutDuration: 10 * time.Minute,
106102
KanikoConfig: config.KanikoConfig{
@@ -306,10 +302,13 @@ func TestNewAppContext(t *testing.T) {
306302
alertService, err := service.NewGitlabOpsAlertService(nil, *testCfg.AlertConfig)
307303
assert.NoError(t, err)
308304

305+
artifactService, err := initArtifactService(testCfg)
306+
assert.NoError(t, err)
307+
309308
ensemblingImageBuilder, err := imagebuilder.NewEnsemblerJobImageBuilder(
310309
nil,
311310
*testCfg.BatchEnsemblingConfig.ImageBuildingConfig,
312-
testCfg.MlflowConfig.ArtifactServiceType,
311+
artifactService,
313312
)
314313
assert.Nil(t, err)
315314

@@ -343,7 +342,7 @@ func TestNewAppContext(t *testing.T) {
343342
ensemblerImageBuilder, err := imagebuilder.NewEnsemblerServiceImageBuilder(
344343
nil,
345344
*testCfg.EnsemblerServiceBuilderConfig.ImageBuildingConfig,
346-
testCfg.MlflowConfig.ArtifactServiceType,
345+
artifactService,
347346
)
348347

349348
assert.NoError(t, err)

api/turing/batch/ensembling/runner.go

Lines changed: 1 addition & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -287,28 +287,9 @@ func (r *ensemblingJobRunner) processOneEnsemblingJob(ensemblingJob *models.Ense
287287
return
288288
}
289289

290-
// Get ensembler
291-
ensembler, err := r.ensemblersService.FindByID(
292-
ensemblingJob.EnsemblerID,
293-
service.EnsemblersFindByIDOptions{ProjectID: &ensemblingJob.ProjectID},
294-
)
295-
if err != nil {
296-
r.saveStatusOrTerminate(ensemblingJob, mlpProject, models.JobPending, err.Error(), true)
297-
return
298-
}
299-
300-
// Get base image tag
301-
var baseImageTag string
302-
if pyfuncEnsembler, ok := ensembler.(*models.PyFuncEnsembler); ok {
303-
baseImageTag = pyfuncEnsembler.PythonVersion
304-
} else {
305-
r.saveStatusOrTerminate(ensemblingJob, mlpProject, models.JobPending, err.Error(), true)
306-
return
307-
}
308-
309290
// Build Image
310291
labels := r.buildLabels(ensemblingJob, mlpProject)
311-
imageRef, imageBuildErr := r.buildImage(ensemblingJob, mlpProject, labels, baseImageTag)
292+
imageRef, imageBuildErr := r.buildImage(ensemblingJob, mlpProject, labels)
312293

313294
if imageBuildErr != nil {
314295
// Here unfortunately we have to wait till the image building process has
@@ -433,7 +414,6 @@ func (r *ensemblingJobRunner) buildImage(
433414
ensemblingJob *models.EnsemblingJob,
434415
mlpProject *mlp.Project,
435416
buildLabels map[string]string,
436-
baseImageTag string,
437417
) (string, error) {
438418
request := imagebuilder.BuildImageRequest{
439419
ProjectName: mlpProject.Name,
@@ -443,7 +423,6 @@ func (r *ensemblingJobRunner) buildImage(
443423
ArtifactURI: *ensemblingJob.InfraConfig.ArtifactUri,
444424
BuildLabels: buildLabels,
445425
EnsemblerFolder: service.EnsemblerFolder,
446-
BaseImageRefTag: baseImageTag,
447426
}
448427
return r.imageBuilder.BuildImage(request)
449428
}

api/turing/config/config.go

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -169,10 +169,6 @@ type ImageBuildingConfig struct {
169169
BuildTimeoutDuration time.Duration `validate:"required"`
170170
// DestinationRegistry is the registry of the newly built ensembler image.
171171
DestinationRegistry string `validate:"required"`
172-
// BaseImageRef is the image name of the base ensembler image built from the
173-
// engines/pyfunc-ensembler-*/Dockerfile. It's a map of image names, per
174-
// minor python version supported by the SDK.
175-
BaseImageRef map[string]string `validate:"required"`
176172
// KanikoConfig contains the configuration related to the kaniko executor image builder.
177173
KanikoConfig KanikoConfig `validate:"required"`
178174
// TolerationName allow the scheduler to schedule image building jobs with the matching name
@@ -181,6 +177,8 @@ type ImageBuildingConfig struct {
181177
NodeSelector map[string]string
182178
// Value for cluster-autoscaler.kubernetes.io/safe-to-evict annotation
183179
SafeToEvict bool
180+
// BaseImageRef is the image name of the base ensembler image built from the engines/pyfunc-ensembler-*/Dockerfile
181+
BaseImage string
184182
}
185183

186184
// Resource contains the Kubernetes resource request and limits

0 commit comments

Comments
 (0)