This series of notebooks introduces Vertex AI AutoML with a focus on classification methods for tabular data.
Vertex AI AutoML accelerates the workflow of creating an ML model by preprocessing the data and choosing model architectures for you. It tests multiple architectures and can create ensembles to arrive at the best model. AutoML is available for ML models on text, image, video, and tabular data.
Prerequisites
AutoML is a service on Vertex AI that creates custom models from a user's data. The data source for AutoML jobs is a Vertex AI managed dataset. These managed datasets are links to the actual data locations in GCS or BigQuery. The data is not imported, so each training job that uses a dataset reads the current version of the data source. When creating a dataset, the location selected needs to match the location of the linked data (us-central1, for example). AutoML training jobs use these datasets as inputs. Review AutoML service availability by region to make sure the service is available in the data location: feature availability.
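The dataset-then-training-job flow described above can be sketched with the Vertex AI Python SDK (`google-cloud-aiplatform`). This is a minimal illustration and not necessarily how the notebooks in this series do it: the function names `bq_uri` and `create_dataset_and_train`, the default display name, and the one node hour budget are all assumptions chosen for the example.

```python
def bq_uri(project: str, dataset: str, table: str) -> str:
    """Build the bq:// URI that Vertex AI accepts for a BigQuery source table."""
    return f"bq://{project}.{dataset}.{table}"


def create_dataset_and_train(
    project: str,
    region: str,
    bq_dataset: str,
    bq_table: str,
    target_column: str,
    display_name: str = "automl-example",  # hypothetical name for illustration
):
    """Link a BigQuery table as a managed dataset, then train an AutoML classifier."""
    # Imported lazily so bq_uri can be used without the SDK installed.
    from google.cloud import aiplatform

    # The region should match the location of the linked data.
    aiplatform.init(project=project, location=region)

    # The managed dataset is a link to the BigQuery table, not a copy of it.
    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        bq_source=bq_uri(project, bq_dataset, bq_table),
    )

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=display_name,
        optimization_prediction_type="classification",
    )

    # 1000 milli node hours = 1 node hour; an assumed budget for this sketch.
    model = job.run(
        dataset=dataset,
        target_column=target_column,
        budget_milli_node_hours=1000,
        model_display_name=display_name,
    )
    return dataset, model
```

Calling `create_dataset_and_train` requires an authenticated Google Cloud environment and starts a billable training job, so it is worth confirming the region and budget before running it.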
When using AutoML from BigQuery ML, a Vertex AI managed dataset is not required. Instead, check that AutoML is available in the BigQuery location of the data by using this table of BigQuery ML resource locations.
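As a rough sketch of the BigQuery ML route, the snippet below builds a `CREATE MODEL` statement with `model_type = 'AUTOML_CLASSIFIER'` and submits it with the BigQuery Python client. The helper names `automl_create_model_sql` and `train_with_bqml`, the `SELECT *` training query, and the default budget of one hour are assumptions made for illustration; the BQML notebook in this series may structure this differently.

```python
def automl_create_model_sql(
    model: str,
    source_table: str,
    label_column: str,
    budget_hours: float = 1.0,  # assumed budget for this sketch
) -> str:
    """Return a BigQuery ML statement that trains an AutoML classifier."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (\n"
        f"  model_type = 'AUTOML_CLASSIFIER',\n"
        f"  input_label_cols = ['{label_column}'],\n"
        f"  budget_hours = {budget_hours}\n"
        f")\n"
        f"AS SELECT * FROM `{source_table}`"
    )


def train_with_bqml(project: str, sql: str):
    """Submit the statement to BigQuery and wait for training to finish."""
    # Imported lazily so the SQL builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job = client.query(sql)
    job.result()  # blocks until the training job completes
    return job
```

No Vertex AI managed dataset appears here: the training data is read directly from the BigQuery table named in the query.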
This list is in the suggested order of review for anyone getting an overview of and learning about Vertex AI AutoML. It is also fine to pick a particular notebook of interest. If it depends on prior notebooks, they will be listed in the prerequisites section at the top of the notebook.
The notebooks are designed to be edited so they can be tried with other data sources. The same parameter names are used across the notebooks, which also helps when trying multiple methods on a custom data source.
- 02a - Vertex AI - AutoML in GCP Console (no code).ipynb
- 02b - Vertex AI - AutoML with clients (code).ipynb
- 02c - Vertex AI > Pipelines - AutoML with clients (code) in automated pipeline.ipynb
- BQML AutoML: Using AutoML directly from BigQuery ML