Here you can find additional components for running the Gaarf tool in Google Cloud.
The following components are provided:
- Cloud Functions (in functions folder) - two CFs that you can use for running gaarf with scripts located on GCS
- Cloud Workflow (in workflow folder) - a Cloud Workflow that orchestrates enumerating scripts on GCS and calling CFs with each of them
- Interactive generator that creates all required components and shell scripts for you
To use the Cloud Workflow you need to deploy both the workflow and the cloud functions, but the cloud functions can also be used independently if needed.
The easiest way to initialize the Google Cloud infrastructure for automated running of your queries is to use our interactive generator create-gaarf-wf:
npm init gaarf-wf
To better understand what the generator creates for you, we recommend reading the following sections.
NOTE: if you're installing on Google Cloud Shell then it's worth cleaning npm cache first to make sure you're using the latest versions:
rm -rf ~/.npm/
We provide several cloud functions that correlate to the cli tools: gaarf and gaarf-bq. The default name of the main function is 'gaarf', and the other functions' names are based on it with suffixes. You can change the base name during deployment by supplying the -n parameter to deploy.sh and/or setup.sh.
To deploy the cloud functions execute setup.sh, which enables the required APIs, sets up permissions and performs the actual deployment of the functions using deploy.sh. Later on, if something changes in the functions' code or your Ads API settings, you can redeploy by executing deploy.sh alone.
Additionally you can customize the region (-r) and memory (-m).
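For example, deploying with a custom base name, region and memory limit might look like the sketch below (the flag values are illustrative, and the exact memory format depends on how deploy.sh passes it to gcloud):
# run from the functions folder
./setup.sh -n myproject -r europe-west1 -m 512MB
# later, to redeploy only the functions' code:
./deploy.sh -n myproject -r europe-west1 -m 512MB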
The following functions are provided:
- gaarf-getcids - fetches and returns customer account ids (CIDs) for further processing
- gaarf - the main function for executing Ads queries, correlates to the gaarf cli tool
- gaarf-bq-view - function for creating unified views for all customer-based tables in BQ with data from the Ads API
- gaarf-bq - function for executing post-processing queries in BigQuery, correlates to the gaarf-bq cli tool
Please note that you need to copy your google-ads.yaml somewhere the functions (gaarf and gaarf-getcids) can find it, preferably onto Cloud Storage. Then you should provide the ads_config_path query argument or the ADS_CONFIG environment variable with a path to the Ads config (e.g. gs://myproject/path/to/google-ads.yaml). As a last resort the function will search for google-ads.yaml locally, so you can just copy your config into the function's folder before deployment.
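For instance (the bucket and path below are placeholders):
gsutil cp google-ads.yaml gs://myproject/gaarf/google-ads.yaml
# then pass ads_config_path=gs://myproject/gaarf/google-ads.yaml as a query argument,
# or set ADS_CONFIG to that path as an environment variable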
Alternatively you can provide all configuration values for the Ads API via environment variables (you can supply env vars via the --env-vars-file env.yaml cli option of the gcloud tool - you'll need to adapt deploy.sh for this):
- DEVELOPER_TOKEN
- LOGIN_CUSTOMER_ID
- CLIENT_ID
- CLIENT_SECRET
- REFRESH_TOKEN
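As an illustration, an env.yaml could look like the sketch below (all values are placeholders); deploy.sh would need to be adapted to pass --env-vars-file env.yaml to gcloud functions deploy:
# env.yaml - placeholder values
DEVELOPER_TOKEN: "YOUR_DEVELOPER_TOKEN"
LOGIN_CUSTOMER_ID: "1234567890"
CLIENT_ID: "YOUR_CLIENT_ID.apps.googleusercontent.com"
CLIENT_SECRET: "YOUR_CLIENT_SECRET"
REFRESH_TOKEN: "YOUR_REFRESH_TOKEN"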
Normally your functions are called by the Cloud Workflow and you don't need to worry about their parameters' format.
Note that the actual function names are chosen at deployment time; here they are referred to by their default names.
The gaarf function (executing Ads queries) accepts the following.
Body:
- script - an object with two fields, query and name, where query contains an Ads query text (as an alternative to providing it via the script_path query argument) and name is a name (normally the base file name would be used) for the target table with data returned by the script
- macro - an object with macros for Ads queries
Query string:
- ads_config_path - a path to the Ads config, e.g. gs://bucket/path/to/google-ads.yaml; if not passed then a local file will be tried (it should be deployed with the function)
- script_path - a path to an Ads query file; currently only GCS paths are supported, e.g. gs://bucket/path/to/file; mandatory if script/name were not provided via the body
- single_customer - true/false, pass true to prevent fetching child accounts for the given customer id (customer_id)
- customer_id - customer id (CID), without '-'; can be specified in google-ads.yaml as well, in which case it can be omitted
- bq_project_id - BigQuery project id for output
- bq_dataset - BigQuery dataset id for output
- bq_dataset_location - BigQuery dataset location
Returns:
- A map of CID (customer account id) to row counts.
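As a rough illustration, a direct HTTP call to the gaarf function could look like the sketch below; the URL format assumes a 1st-gen Cloud Function in us-central1, the paths and customer id are placeholders, and your deployment may require different authentication:
FUNCTION_URL="https://us-central1-$PROJECT_ID.cloudfunctions.net/gaarf"
curl -X POST "$FUNCTION_URL?ads_config_path=gs://$PROJECT_ID/gaarf/google-ads.yaml&script_path=gs://$PROJECT_ID/gaarf/ads-queries/campaign.sql&customer_id=1234567890&bq_project_id=$PROJECT_ID&bq_dataset=gaarf_ads" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"macro": {"start_date": "2022-01-01", "end_date": ":YYYYMMDD"}}'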
The gaarf-bq function (executing post-processing queries in BigQuery) accepts the following.
Body:
- script - an object with two fields, query and name, where query contains a query text (as an alternative to providing it via the script_path query argument) and name is a name (normally the base file name would be used) for the target table with data returned by the script
- macro - an object with macros for queries
- sql - an object with sql parameters for queries
Query string:
- project_id - a GCP project id
- dataset_location - BigQuery dataset location
Returns:
- an object with a rowCount field containing the row count if the script returned data; otherwise (the script is a DDL like create view) the response is empty
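For example, a hypothetical call that passes a query inline in the body (same URL-format assumption as above, all values are placeholders):
curl -X POST "https://us-central1-$PROJECT_ID.cloudfunctions.net/gaarf-bq?project_id=$PROJECT_ID&dataset_location=us" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"script": {"query": "SELECT 1 AS one", "name": "test"}, "macro": {}, "sql": {}}'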
The gaarf-getcids function (fetching customer account ids) accepts the following.
Query string:
- ads_config_path - a path to the Ads config, same as for gaarf
- customer_id - customer id (CID), without '-'; can be specified in google-ads.yaml as well, in which case it can be omitted
- customer_ids_query - a custom Ads query to filter the customer accounts expanded from customer_id, same as the same-name argument of the gaarf cli tool; the query's first column should be a customer id (CID)
Body:
- customer_ids_query - same as the query string argument, as an alternative
Returns:
- Array of CIDs (customer account ids) for a given account
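For instance (same URL-format assumption as above, all values are placeholders):
curl "https://us-central1-$PROJECT_ID.cloudfunctions.net/gaarf-getcids?ads_config_path=gs://$PROJECT_ID/gaarf/google-ads.yaml&customer_id=1234567890" \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)"
# expected to return a JSON array of CIDs, e.g. ["1112223334","5556667778"]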
The gaarf-bq-view function (creating unified views in BigQuery) accepts the following.
Query string:
- project_id - a GCP project id
- dataset - BigQuery dataset id
- dataset_location - BigQuery dataset location
- script_path - a path to an Ads query file from which to derive the table id (the file base name will be used)
- table - table id, not needed if script_path is specified
To deploy the workflow for gaarf you can use the gcp/workflow/setup.sh script. It accepts the following parameters:
- -n or --name - name of the workflow, by default it will be gaarf-wf
- -l or --location - region for the workflow, by default it will be us-central1
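For example (the location value is illustrative):
./setup.sh -n gaarf-wf -l europe-west1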
As soon as you have deployed your workflow (you can use the same workflow for many projects, by the way) you can call it directly (via gcloud workflows run) or create a Scheduler job for calling it regularly.
For both methods we need to provide parameters for the workflow.
Executing workflow directly:
gcloud workflows run gaarf-wf \
--data='{
"cloud_function":"gaarf",
...
}'
Create a Scheduler job:
PROJECT_ID=$(gcloud config get-value project 2> /dev/null)
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="csv(projectNumber)" | tail -n 1)
# by default Cloud Workflows run under the Compute Engine default service account:
SERVICE_ACCOUNT=$PROJECT_NUMBER-compute@developer.gserviceaccount.com
REGION=us-central1
WORKFLOW_NAME=gaarf-wf
JOB_NAME=$WORKFLOW_NAME
data='{
\"cloud_function\":\"gaarf\",
...
}'
# you might need to delete the job first:
#gcloud scheduler jobs delete $JOB_NAME --location $REGION
# daily at 8:00
gcloud scheduler jobs create http $JOB_NAME \
--schedule="0 8 * * *" \
--uri="https://workflowexecutions.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/workflows/$WORKFLOW_NAME/executions" \
--location=$REGION \
--message-body="{\"argument\": \"$data\"}" \
--oauth-service-account-email="$SERVICE_ACCOUNT"
Please note the escaping of quotes in the job's argument.
The workflow accepts the following parameters:
- cloud_function - name of the gaarf cloud function (gaarf by default, but it could be customized during deployment)
- gcs_bucket - GCS bucket name where queries are stored, by default your GCP project id
- ads_queries_path - relative GCS path for Ads queries, e.g. "gaarf/ads-queries/" (the workflow will fetch all files from gs://your_bucket/gaarf/ads-queries/*) (required)
- bq_queries_path - relative GCS path for BigQuery queries, e.g. "gaarf/bq-queries" (required)
- dataset - BigQuery dataset id for writing results of Ads queries (required)
- bq_dataset_location - BigQuery dataset location, e.g. "europe", by default "us"
- cid - Ads customer id, can be either an MCC or a child account, without dashes (required)
- ads_config_path - a full GCS path to your google-ads.yaml config, e.g. "gs://MYPROJECT/path/to/google-ads.yaml" (required)
- ads_macro - an object with macros for Ads queries, see the root README
- bq_macro - an object with macros for BigQuery queries, see the root README
- bq_sql - an object with sql parameters for BigQuery queries
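Putting this together, the argument for a direct run could look like the sketch below (all values are placeholders):
gcloud workflows run gaarf-wf \
--data='{
"cloud_function": "gaarf",
"gcs_bucket": "myproject",
"ads_queries_path": "gaarf/ads-queries/",
"bq_queries_path": "gaarf/bq-queries/",
"dataset": "gaarf_ads",
"bq_dataset_location": "us",
"cid": "1234567890",
"ads_config_path": "gs://myproject/gaarf/google-ads.yaml",
"ads_macro": {"start_date": "2022-01-01", "end_date": ":YYYYMMDD"},
"bq_macro": {"ads_ds": "gaarf_ads", "ds_dst": "gaarf"},
"bq_sql": {}
}'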
For deployment we recommend using our interactive generator create-gaarf-wf, which greatly simplifies the process. As a result of running it, a set of shell scripts is created; they are reviewed here.
Common deployment scripts are:
- deploy-scripts.sh - copies queries and google-ads.yaml to GCS
- deploy-wf.sh - deploys the cloud functions and the cloud workflow
- schedule-wf.sh - creates a Scheduler job with parameters
For deployment of all components you can use the setup.sh script with the name of your project as an argument.
Provided you supplied the name "myproject", the script will deploy functions with the base name "myproject" ("myproject-bq", "myproject-getcids", "myproject-bq-view") and a workflow "myproject-wf". You can always customize names and other settings for the components by adjusting the setup.sh scripts in the components' folders.
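For example:
# deploys functions with base name "myproject" and a workflow "myproject-wf"
./setup.sh myproject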
deploy-scripts.sh:
PROJECT_ID=$(gcloud config get-value project 2> /dev/null)
if [[ "$PROJECT_ID" = '' ]]
then
echo -e "${RED}There is no active project. Activate one with gcloud config set project${NC}"
exit
fi
GCS_BUCKET=gs://${PROJECT_ID}
GCS_BASE_PATH=$GCS_BUCKET/gaarf
# create a GCS bucket (one time):
#gsutil mb -b on $GCS_BUCKET
gsutil -m cp google-ads.yaml $GCS_BASE_PATH/google-ads.yaml
gsutil rm -r $GCS_BASE_PATH/ads-queries
gsutil -m cp -R ./ads-queries/* $GCS_BASE_PATH/ads-queries/
gsutil rm -r $GCS_BASE_PATH/bq-queries
gsutil -m cp -R ./bq-queries/* $GCS_BASE_PATH/bq-queries/
deploy-wf.sh:
cd ./ads-api-fetcher/gcp/functions
./setup.sh -n gaarf
cd ../workflow
./setup.sh -n gaarf-wf
schedule-wf.sh:
PROJECT_ID=$(gcloud config get-value project 2> /dev/null)
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="csv(projectNumber)" | tail -n 1)
# by default Cloud Workflows run under the Compute Engine default service account:
SERVICE_ACCOUNT=$PROJECT_NUMBER-compute@developer.gserviceaccount.com
REGION=us-central1
WORKFLOW_NAME=gaarf-wf
JOB_NAME=$WORKFLOW_NAME
data='{
\"cloud_function\":\"gaarf\",
\"cloud_function_bq\":\"gaarf-bq\",
\"gcs_bucket\":\"'"$PROJECT_ID"'\",
\"ads_queries_path\":\"gaarf/ads-queries/\",
\"bq_queries_path\":\"gaarf/bq-queries/\",
\"dataset\":\"gaarf_ads\",
\"cid\":\"3717154796\",
\"ads_config_path\":\"gs://'"$PROJECT_ID"'/gaarf/google-ads.yaml\",
\"bq_dataset_location\": \"europe\",
\"ads_macro\":{ \"start_date\": \"2022-01-01\", \"end_date\": \":YYYYMMDD\" },
\"bq_macro\": {\"ads_ds\": \"gaarf_ads\", \"ds_dst\": \"gaarf\"},
\"bq_sql\": {}
}'
gcloud scheduler jobs delete $JOB_NAME --location $REGION
# daily at midnight
gcloud scheduler jobs create http $JOB_NAME \
--schedule="0 0 * * *" \
--uri="https://workflowexecutions.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/workflows/$WORKFLOW_NAME/executions" \
--location=$REGION \
--message-body="{\"argument\": \"$data\"}" \
--oauth-service-account-email="$SERVICE_ACCOUNT"
# --time-zone="TIME_ZONE"
# for time zone use names from: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
# e.g. "Etc/GMT+3"