@jortiz16 jortiz16 commented Apr 2, 2024

The README.md includes instructions to run the scripts. There are four scripts in total. Two are for object tables and the other two are for structured tables.

There are two scripts, depending on whether the input data is in an object table or a native BigQuery table.

## Object table script
The object table script creates a target table to store successful annotations. To do this, it calls the inference in a loop. In the first iteration, a small LIMIT is set on the inference call to quickly create a table with the desired schema. The number of rows to process for each inference call can be modified through the batch_size parameter.

It's not just for annotations, but for the result of the ML operation.

nit: Use "`" to quote the batch_size to make it clear it's a code parameter.
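
For readers following the thread, here is a rough sketch of the control flow the README paragraph describes: a small first call to establish the target, then repeated calls of `batch_size` rows until nothing is left. It is a plain-Python simulation, not the SQL script from this PR. `run_in_batches`, `fake_inference`, `first_limit`, `max_iterations`, and the `uri`/`status` row keys are all hypothetical names chosen for illustration, and treating an empty `status` as success is an assumption.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def run_in_batches(
    source: List[Row],
    infer: Callable[[List[Row]], List[Row]],
    batch_size: int,
    first_limit: int = 2,
    max_iterations: int = 100,
) -> List[Row]:
    """Simulate the loop: a small first call, then batch_size rows per call."""
    target: List[Row] = []  # stands in for the target table
    done = set()            # uris already stored in the target
    limit = first_limit     # small LIMIT on the first iteration only

    for _ in range(max_iterations):  # safety stop (assumption, not from the PR)
        pending = [row for row in source if row["uri"] not in done]
        if not pending:
            break
        results = infer(pending[:limit])
        # Keep only successful results; an empty status means success here.
        successes = [r for r in results if r["status"] == ""]
        target.extend(successes)
        done.update(r["uri"] for r in successes)
        limit = batch_size  # later iterations use the full batch size
    return target


def fake_inference(batch: List[Row]) -> List[Row]:
    """Hypothetical stand-in for the ML call; every row succeeds."""
    return [{"uri": r["uri"], "result": "ok", "status": ""} for r in batch]


if __name__ == "__main__":
    rows = [{"uri": f"gs://bucket/img{i}.png"} for i in range(7)]
    stored = run_in_batches(rows, fake_inference, batch_size=3)
    print(len(stored))
```

Rows whose inference fails stay pending and are retried on the next pass, which is why the sketch caps the number of iterations.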

This script applies to the following models:
- ML.ANNOTATE_IMAGE
- ML.PROCESS_DOCUMENT
- ML.TRANSCRIBE

This also applies to ML.GENERATE_TEXT with the vision model.

ml_query DEFAULT FORMAT(
  "SELECT %s, text AS content FROM `%s`",
  ARRAY_TO_STRING(key_columns, ','), source_table);

For the default, probably simpler to just have it as:

DECLARE ml_query DEFAULT "SELECT *, /* ML operation dependent field */ FROM `" || source_table || "`";
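
To make the difference between the two defaults concrete, here is a small Python sketch that renders both query strings. It only mimics the string building; it is not BigQuery code. The function names and the sample table and column values are hypothetical, and `text AS content` is borrowed from the PR's version to stand in for the "ML operation dependent field" placeholder.

```python
from typing import List


def query_with_keys(key_columns: List[str], source_table: str) -> str:
    """Mirror the PR's FORMAT(...) default: list the key columns explicitly."""
    return "SELECT %s, text AS content FROM `%s`" % (
        ",".join(key_columns),
        source_table,
    )


def query_select_all(source_table: str, content_expr: str = "text AS content") -> str:
    """Mirror the suggested default: select every column via concatenation."""
    return "SELECT *, " + content_expr + " FROM `" + source_table + "`"


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(query_with_keys(["id", "title"], "my_dataset.my_table"))
    print(query_select_all("my_dataset.my_table"))
```

The suggested form drops the dependency on `key_columns` when building the default, so the default no longer needs to change if the key columns do.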
