Adding Anti-pattern Recognition tool to the Optimization Scripts (GoogleCloudPlatform#397)

* adding anti pattern recognition step to optimization scripts

* using viewable_queries_grouped_by_hash for anti pattern processing

* moving anti pattern recognition tool steps to separate script

* fixing bug in column names

* fixing bug in column names

* adding anti pattern script, accounting for null hash

* adding anti pattern script, supporting multiple executions

* adding anti pattern script, addressing duplicate hashes

* adding anti pattern script, addressing duplicate hashes
franklinWhaite authored Mar 25, 2024
1 parent 4bdddd7 commit c5c3c14
Showing 5 changed files with 121 additions and 0 deletions.
7 changes: 7 additions & 0 deletions scripts/optimization/README.md
@@ -12,6 +12,11 @@ gcloud auth login &&
bash run_all_scripts.sh
```

Run the [Anti-pattern Recognition Tool](https://github.com/GoogleCloudPlatform/bigquery-antipattern-recognition/tree/main):
```bash
bash run_anti_pattern_tool.sh
```

The scripts are described in more detail in the following sections.

---
@@ -322,6 +327,8 @@ SELECT * FROM my_table WHERE date = '2020-01-02';
SELECT * FROM my_table WHERE date = '2020-01-03';
```

Running the `run_anti_pattern_tool.sh` script builds and runs the Anti-pattern Recognition Tool and writes its results to the `recommendation` column of the `viewable_queries_grouped_by_hash` table. The tool identifies SQL syntax patterns that are known to frequently cause performance issues.
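
The recommendations can be inspected by unnesting the `recommendation` array. The following is a minimal sketch and not part of the scripts in this directory: the helper name `recommendation_sql` is hypothetical, and the query assumes the `Query_Hash` and `Total_Slot_Hours` columns of `viewable_queries_grouped_by_hash` together with the `name` and `description` fields of `recommendation`.

```bash
# Hypothetical helper: prints a query that lists one row per detected
# anti-pattern, ordered by the slot hours of the query it was found in.
recommendation_sql() {
  cat <<'SQL'
SELECT
  Query_Hash,
  Total_Slot_Hours,
  r.name AS anti_pattern,
  r.description
FROM optimization_workshop.viewable_queries_grouped_by_hash,
  UNNEST(recommendation) AS r
ORDER BY Total_Slot_Hours DESC
LIMIT 100
SQL
}

# Requires the bq CLI and an authenticated gcloud session:
# recommendation_sql | bq query --nouse_legacy_sql
```

Because the comma join with `UNNEST` drops rows whose array is empty or NULL, only queries with at least one recommendation appear in the result.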

### Examples of querying script results

* Top 100 queries with the highest bytes processed
41 changes: 41 additions & 0 deletions scripts/optimization/anti_pattern_recoginition_tool_tables.sql
@@ -0,0 +1,41 @@
/*
* Copyright 2023 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

CREATE OR REPLACE TABLE optimization_workshop.antipattern_output_table (
  job_id STRING,
  user_email STRING,
  query STRING,
  recommendation ARRAY<STRUCT<name STRING, description STRING>>,
  slot_hours FLOAT64,
  optimized_sql STRING,
  process_timestamp TIMESTAMP
);

CREATE OR REPLACE VIEW optimization_workshop.antipattern_tool_input_view AS
SELECT
  Query_Hash AS id,
  ANY_VALUE(Query_Raw_Sample) AS query,
FROM
  optimization_workshop.viewable_queries_grouped_by_hash
WHERE
  Query_Hash IS NOT NULL
GROUP BY
  Query_Hash
ORDER BY
  ANY_VALUE(Total_Slot_Hours) DESC
LIMIT
  1000
;
1 change: 1 addition & 0 deletions scripts/optimization/run_all_scripts.sh
@@ -39,3 +39,4 @@ done
# actively_read_tables_with_partitioning_clustering_info.sql
bq query ${bq_flags} <table_read_patterns.sql
bq query ${bq_flags} <actively_read_tables_with_partitioning_clustering_info.sql &

49 changes: 49 additions & 0 deletions scripts/optimization/run_anti_pattern_tool.sh
@@ -0,0 +1,49 @@
#!/bin/bash

# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Exit immediately if a command exits with a non-zero status.
set -e
# Set the following flags for the bq command:
# --quiet: suppress status updates while jobs are running
# --nouse_legacy_sql: use standard SQL syntax
# --nouse_cache: do not use cached results
bq_flags="--quiet --nouse_legacy_sql --nouse_cache"


# Run setup for anti pattern recognition tool
bq query ${bq_flags} <anti_pattern_recoginition_tool_tables.sql

{ # try

## build anti-pattern recognition tool locally
## (steps are chained with && because set -e is ignored on the left side of ||)
git clone https://github.com/GoogleCloudPlatform/bigquery-antipattern-recognition.git &&
(cd bigquery-antipattern-recognition && mvn clean package jib:dockerBuild -DskipTests) &&

## run anti-pattern recognition tool
export PROJECT_ID=$(gcloud config get-value project) &&
docker run -i bigquery-antipattern-recognition \
  --input_bq_table ${PROJECT_ID}.optimization_workshop.antipattern_tool_input_view \
  --output_table ${PROJECT_ID}.optimization_workshop.antipattern_output_table &&

# write anti pattern output to queries by hash table
bq query ${bq_flags} <update_queries_by_hash_w_anti_patterns.sql

} || { # catch
echo 'Error: could not run Anti-pattern Recognition Tool. Try using GCP Cloud Shell https://cloud.google.com/shell/docs/launching-cloud-shell'
}

# Clean up anti pattern recognition tool
rm -rf bigquery-antipattern-recognition
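
A note on the `{ # try ... } || { # catch ... }` idiom used in `run_anti_pattern_tool.sh`: bash ignores `set -e` for every command on the left-hand side of `||`, so a failing step inside the try group only reaches the catch branch if the steps are chained with `&&` or the failing step is the last one. The sketch below illustrates the difference. The function names `step_fails`, `step_ok`, `unchained` and `chained` are hypothetical stand-ins, not part of the commit.

```bash
set -e

# Hypothetical stand-ins for the real build and run steps.
step_fails() { return 1; }
step_ok() { echo "ran: $1"; }

# Unchained: set -e is ignored on the left-hand side of ||, so the group
# keeps going after a failure and its status is that of the last command.
unchained() {
  { step_fails; step_ok "later step"; } || { echo "caught"; }
}

# Chained with &&: the first failure short-circuits the group and
# control passes to the catch branch.
chained() {
  { step_fails && step_ok "later step"; } || { echo "caught"; }
}

unchained   # prints "ran: later step"
chained     # prints "caught"
```

In the unchained form the later step still runs after the failure and the error message is never printed, which is why chaining matters when a later step depends on an earlier one.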
23 changes: 23 additions & 0 deletions scripts/optimization/update_queries_by_hash_w_anti_patterns.sql
@@ -0,0 +1,23 @@
/*
* Copyright 2023 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

ALTER TABLE optimization_workshop.viewable_queries_grouped_by_hash
ADD COLUMN IF NOT EXISTS recommendation ARRAY<STRUCT<name STRING, description STRING>>;

UPDATE optimization_workshop.viewable_queries_grouped_by_hash t1
SET t1.recommendation = t2.recommendation
FROM optimization_workshop.antipattern_output_table t2
WHERE t1.Query_Hash = t2.job_id;
