Skip to content

Conversation

@CYarros10
Copy link
Contributor

@CYarros10 CYarros10 commented Aug 27, 2024

This pull request adds the following:

SupervisedFineTuningHook: Hook for Google Cloud Vertex AI Supervised Fine Tuning APIs.
SupervisedFineTuningTrainOperator: Use the Google Cloud Supervised Fine Tuning API to create a tuning job.

About Model tuning: a crucial process in adapting Gemini to perform specific tasks with greater precision and accuracy. Model tuning works by providing a model with a training dataset that contains a set of examples of specific downstream tasks.

A sample DAG containing these operators could look like:
JSONL training data arrives in GCS >> GCSObjectExistenceSensor >> SupervisedFineTuningTrainOperator >> GenerativeModelGenerateContentOperator

@MaksYermak
Copy link
Contributor

@CYarros10 What do you think about adding Links to this Operator and, maybe, for the previous operators related to generative AI? It is an example of code how it looks for PipelineJob https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/vertex_ai/pipeline_job.py#L115 and https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/links/vertex_ai.py#L329

@CYarros10
Copy link
Contributor Author

CYarros10 commented Aug 29, 2024

@CYarros10 What do you think about adding Links to this Operator and, maybe, for the previous operators related to generative AI? It is an example of code how it looks for PipelineJob https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/vertex_ai/pipeline_job.py#L115 and https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/links/vertex_ai.py#L329

I think this is a great idea, I would love to do this but want to prioritize CountTokensAPI and EvaluationAPI as these could be part of a broader LLM pipeline with SupervisedTuningFineTrainOperator and GenerativeModelGenerateContentOperator. Will work on Links when I've completed those first!

@CYarros10
Copy link
Contributor Author

CYarros10 commented Aug 29, 2024

Refactored the code in this PR to be included in airflow/airflow/providers/google/cloud/operators/vertex_ai/generative_model.py - to keep all generative AI / generative model operations in one place. as well as hooks, tests, docs, etc. Feel free to add thoughts @MaksYermak - thank you for the review!

Copy link
Contributor

@MaksYermak MaksYermak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one

@potiuk potiuk merged commit 35ce2f1 into apache:main Aug 30, 2024
@CYarros10 CYarros10 deleted the sft-tuning-operator branch August 30, 2024 16:15
@CYarros10 CYarros10 restored the sft-tuning-operator branch August 30, 2024 16:16
@CYarros10 CYarros10 deleted the sft-tuning-operator branch August 30, 2024 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants