Merge branch 'master' into ericfe-master
ericfe committed Jun 12, 2019
2 parents af28d14 + d6d2123 commit 36d37ba
Showing 122 changed files with 140,517 additions and 832 deletions.
Binary file modified .DS_Store
Binary file not shown.
3 changes: 2 additions & 1 deletion Dockerfile
@@ -9,6 +9,7 @@ RUN find /usr/src/app -name "*.py"|xargs chmod +x && find /usr/src/app -name "*.

ENV PATH="/usr/src/app/AnalyzeVacuumUtility:/usr/src/app/ColumnEncodingUtility:/usr/src/app/UnloadCopyUtility:${PATH}"

-RUN pip install -r /usr/src/app/requirements.txt
+RUN pip install -r /usr/src/app/requirements.txt && \
+    pip install -r /usr/src/app/UnloadCopyUtility/requirements.txt

ENTRYPOINT ["/usr/src/app/bin/entrypoint.sh"]
4 changes: 4 additions & 0 deletions README.md
@@ -46,6 +46,10 @@ By turning on/off '--analyze-flag' and '--vacuum-flag' parameters, you can run
or 'analyze-only' utility. This script can be scheduled to run VACUUM and ANALYZE as part of
regular maintenance/housekeeping activities, when there is less database activity (quiet period).

+# Cloud Data Warehousing Benchmark
+
+The [Cloud DW Benchmark](src/CloudDataWarehouseBenchmark) consists of a set of workloads used to characterize and study the performance of Redshift running a variety of analytic queries. It provides the DDL to set up the databases, COPY commands to load the data from a public S3 location, and the queries for both single-user and multi-user throughput testing.
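
As an illustration of what the provided setup involves, a benchmark-style load pairs a CREATE TABLE with a COPY from S3. The table, columns, bucket path, and IAM role below are hypothetical placeholders, not the benchmark's actual objects:

```sql
-- Hypothetical sketch of a benchmark-style load; the table, S3 path,
-- and IAM role are placeholders, not the benchmark's real names.
CREATE TABLE lineitem_example (
    l_orderkey BIGINT,
    l_quantity DECIMAL(12,2),
    l_shipdate DATE
);

COPY lineitem_example
FROM 's3://example-public-bucket/cloud-dw-benchmark/lineitem/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
DELIMITER '|' GZIP;
```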

# Unload/Copy Utility

The [Redshift Unload/Copy Utility](src/UnloadCopyUtility) helps you migrate data between Redshift clusters or databases. It exports data from a source cluster to a location on S3, encrypting all data with AWS Key Management Service. It then automatically imports the data into the configured Redshift cluster and cleans up S3 if required. This utility is intended to be used as part of an ongoing scheduled activity, for instance run as part of a Data Pipeline Shell Activity (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html).
Binary file removed Redshift_DBA_Commands.pptx
Binary file not shown.
Binary file modified src/.DS_Store
Binary file not shown.
7 changes: 4 additions & 3 deletions src/AdminScripts/top_queries.sql
@@ -10,7 +10,7 @@ total: Total execution time of all occurrences
max_query_id: Largest query id of the query occurrence
last_run: Last day the query ran
aborted: 0 if query ran to completion, 1 if it was canceled.
-alert: Alert event related to the query
+alerts: Alert events related to the query
Notes:
There is a commented filter of the query to filter for only Select statements (otherwise it includes all statements like insert, update, COPY)
@@ -22,7 +22,8 @@ History:
**********************************************************************************************/
-- query runtimes
select trim(database) as DB, count(query) as n_qry, max(substring (qrytext,1,80)) as qrytext, min(run_seconds) as "min" , max(run_seconds) as "max", avg(run_seconds) as "avg", sum(run_seconds) as total, max(query) as max_query_id,
-max(starttime)::date as last_run, aborted, event
+max(starttime)::date as last_run, aborted,
+listagg(event, ', ') within group (order by query) as events
from (
select userid, label, stl_query.query, trim(database) as database, trim(querytxt) as qrytext, md5(trim(querytxt)) as qry_md5, starttime, endtime, datediff(seconds, starttime,endtime)::numeric(12,2) as run_seconds,
aborted, decode(alrt.event,'Very selective query filter','Filter','Scanned a large number of deleted rows','Deleted','Nested Loop Join in the query plan','Nested Loop','Distributed a large number of rows across the network','Distributed','Broadcasted a large number of rows across the network','Broadcast','Missing query planner statistics','Stats',alrt.event) as event
@@ -33,5 +34,5 @@ where userid <> 1
-- and database = ''
and starttime >= dateadd(day, -7, current_Date)
)
-group by database, label, qry_md5, aborted, event
+group by database, label, qry_md5, aborted
order by total desc limit 50;
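
The change above replaces grouping by `event` with a `LISTAGG` aggregate, so each query signature now yields a single row listing all of its alert events rather than one row per event. A standalone sketch of that semantics (the table name is hypothetical):

```sql
-- Standalone LISTAGG sketch; my_alert_events is a hypothetical table.
-- Each qry_md5 group collapses to one row whose events column joins
-- all of that query's alert events into a comma-separated string.
SELECT qry_md5,
       LISTAGG(event, ', ') WITHIN GROUP (ORDER BY query) AS events
FROM my_alert_events
GROUP BY qry_md5;
```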
4 changes: 3 additions & 1 deletion src/AdminScripts/wlm_apex.sql
@@ -18,9 +18,11 @@ Notes:
- Best run after period of heaviest query activity
History:
2015-08-31 chriz-bigdata created
+2018-12-10 zach-data improved performance by switching to stl_scan with 5 second granularity
**********************************************************************************************/
WITH
-generate_dt_series AS (select sysdate - (n * interval '1 second') as dt from (select row_number() over () as n from svl_query_report limit 604800)),
+generate_dt_series AS (select sysdate - (n * interval '5 second') as dt from (select row_number() over () as n from stl_scan limit 120960)),
+-- For 1 second granularity use the below CTE for generate_dt_series scanning any table with more than 604800 rows
+-- generate_dt_series AS (select sysdate - (n * interval '1 second') as dt from (select row_number() over () as n from [table_with_604800_rows] limit 604800)),
apex AS (SELECT iq.dt, iq.service_class, iq.num_query_tasks, count(iq.slot_count) as service_class_queries, sum(iq.slot_count) as service_class_slots
FROM
3 changes: 2 additions & 1 deletion src/AdminViews/README.md
@@ -1,5 +1,5 @@
# Redshift Admin Views
-Views objective is to help on Administration of Redshift
+These views are intended to help with administration of Redshift.
All views assume you have a schema called admin.

| View | Purpose |
@@ -11,6 +11,7 @@ All views assume you have a schema called admin.
| v\_check\_wlm\_query\_trend\_hourly.sql | View to get WLM Query Count, Queue Wait Time , Execution Time and Total Time by Hour |
| v\_constraint\_dependency.sql | View to get the foreign key constraints between tables |
| v\_extended\_table\_info.sql | View to get extended table information for permanent database tables. |
+| v\_fragmentation\_info.sql | View to list all fragmented tables in the database |
| v\_generate\_cancel\_query.sql | View to get cancel query |
| v\_generate\_database\_ddl.sql | View to get the DDL for a database |
| v\_generate\_group\_ddl.sql | View to get the DDL for a group. |
5 changes: 5 additions & 0 deletions src/AdminViews/v_check_data_distribution.sql
@@ -12,6 +12,8 @@ SELECT
,pgn.nspname AS schemaname
,id AS tbl_oid
,name AS tablename
+,stv.diststyle AS diststyle
+,stv.sortkey1 AS sortkey
,rows AS rowcount_on_slice
,SUM(rows) OVER (PARTITION BY name, id) AS total_rowcount
,CASE
@@ -34,6 +36,9 @@ INNER JOIN
INNER JOIN
pg_namespace AS pgn
ON pgn.oid = pgc.relnamespace
+INNER JOIN
+svv_table_info AS stv
+ON stv.schema = pgn.nspname AND stv."table" = name
WHERE slice < 3201
AND pgc.relowner > 1
;
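
A possible way to use the new `diststyle` and `sortkey` columns is to rank tables by slice skew. This is a hedged usage sketch, assuming the view is created in the `admin` schema per the repo convention; column names are taken from the diff above:

```sql
-- Hedged usage sketch: surfaces tables whose rows are unevenly
-- distributed across slices, using the columns added in this commit.
SELECT tablename,
       diststyle,
       MAX(rowcount_on_slice) - MIN(rowcount_on_slice) AS slice_skew,
       MAX(total_rowcount) AS total_rows
FROM admin.v_check_data_distribution
GROUP BY tablename, diststyle
ORDER BY slice_skew DESC
LIMIT 20;
```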
2 changes: 1 addition & 1 deletion src/AdminViews/v_connection_summary.sql
@@ -16,7 +16,7 @@ trim(a.dbname) as dbname,
trim(c.application_name) as app_name,
trim(b.authmethod) as authmethod,
case when d.duration > 0 then (d.duration/1000000)/86400||' days '||((d.duration/1000000)%86400)/3600||'hrs '
-||((d.duration/1000000)%3600)/60||'mins '||(d.duration/1000000%60)||'secs' else datediff(s,a.recordtime,getdate())/86400||' days '||(datediff(s,a.recordtime,getdate())%86400)/3600||'hrs '
+||((d.duration/1000000)%3600)/60||'mins '||(d.duration/1000000%60)||'secs' when f.process is null then null else datediff(s,a.recordtime,getdate())/86400||' days '||(datediff(s,a.recordtime,getdate())%86400)/3600||'hrs '
||(datediff(s,a.recordtime,getdate())%3600)/60||'mins '||(datediff(s,a.recordtime,getdate())%60)||'secs' end as duration,
b.mtu,
trim(b.sslversion) as sslversion,
12 changes: 6 additions & 6 deletions src/AdminViews/v_extended_table_info.sql
@@ -151,16 +151,16 @@ SELECT ti.database,
ti.size || '/' || CASE
WHEN stp.sum_r = stp.sum_sr OR stp.sum_sr = 0 THEN
CASE
-WHEN "diststyle" = 'EVEN' THEN (stp.pop_slices*(colenc.cols + 3))
+WHEN ("diststyle" = 'EVEN' OR "diststyle"='AUTO(EVEN)') THEN (stp.pop_slices*(colenc.cols + 3))
WHEN SUBSTRING("diststyle",1,3) = 'KEY' THEN (stp.pop_slices*(colenc.cols + 3))
-WHEN "diststyle" = 'ALL' THEN (cluster_info.node_count*(colenc.cols + 3))
-END
+WHEN ("diststyle" = 'ALL' OR "diststyle"='AUTO(ALL)') THEN (cluster_info.node_count*(colenc.cols + 3))
+END
ELSE
CASE
-WHEN "diststyle" = 'EVEN' THEN (stp.pop_slices*(colenc.cols + 3)*2)
+WHEN ( "diststyle" = 'EVEN' OR "diststyle"='AUTO(EVEN)') THEN (stp.pop_slices*(colenc.cols + 3)*2)
WHEN SUBSTRING("diststyle",1,3) = 'KEY' THEN (stp.pop_slices*(colenc.cols + 3)*2)
-WHEN "diststyle" = 'ALL' THEN (cluster_info.node_count*(colenc.cols + 3)*2)
-END
+WHEN ( "diststyle" = 'ALL' OR "diststyle"='AUTO(ALL)') THEN (cluster_info.node_count*(colenc.cols + 3)*2)
+END
END|| ' (' || ti.pct_used || ')' AS size,
ti.tbl_rows,
ti.unsorted,
23 changes: 18 additions & 5 deletions src/AdminViews/v_find_dropuser_objs.sql
@@ -1,25 +1,26 @@
/**********************************************************************************************
Purpose: View to help find all objects owned by the user to be dropped
Columns -
objtype: Type of object user has privilege on. Object types are Function,Schema,
Table or View, Database, Language or Default ACL
objowner: Object owner
userid: Owner user id
schemaname: Schema for the object
objname: Name of the object
ddl: Generate DDL string to transfer object ownership to new user
Notes:
History:
2017-03-27 adedotua created
2017-04-06 adedotua improvements
2018-01-06 adedotua added ddl column to generate ddl for transferring object ownership
2018-01-15 pvbouwel Add QUOTE_IDENT for identifiers
2018-05-29 adedotua added filter to skip temp tables
2018-08-03 alexlsts added table pg_library with custom message in ddl column
**********************************************************************************************/


CREATE OR REPLACE VIEW admin.v_find_dropuser_objs as
SELECT owner.objtype,
owner.objowner,
@@ -72,6 +73,18 @@ FROM pg_class pgc,
pg_namespace nc
WHERE pgc.relnamespace = nc.oid
AND pgc.relkind IN ('r','v')
-AND pgu.usesysid = pgc.relowner) OWNER ("objtype","objowner","userid","schemaname","objname","ddl")
+AND pgu.usesysid = pgc.relowner
+AND nc.nspname NOT ILIKE 'pg\_temp\_%'
+UNION ALL
+-- Python libraries owned by the user
+SELECT 'Library',
+pgu.usename,
+pgu.usesysid,
+'',
+pgl.name,
+'No DDL available for a Python library. You should DROP OR REPLACE the Python library'
+FROM pg_library pgl,
+pg_user pgu
+WHERE pgl.owner = pgu.usesysid) OWNER ("objtype","objowner","userid","schemaname","objname","ddl")
WHERE owner.userid > 1;

src/AdminViews/v_fragmentation_info.sql
@@ -12,7 +12,7 @@ History:
2018-02-06 adedotua refactored the script to use rowid column for estimation
**********************************************************************************************/

-CREATE OR REPLACE VIEW v_fragmentation_info
+CREATE OR REPLACE VIEW admin.v_fragmentation_info
AS
select tbl,tablename,dbname,sum(t_excess_blks) est_space_gain
from
2 changes: 1 addition & 1 deletion src/AdminViews/v_generate_cursor_query.sql
@@ -28,4 +28,4 @@ FROM STV_ACTIVE_CURSORS cur
AND util_text.text != 'begin;'
JOIN PG_USER usr
ON usr.usesysid = cur.userid
-GROUP BY cur.userid, cur.xid, cur.pid, usr.usename
+GROUP BY cur.userid, cur.xid, cur.pid, usr.usename;
23 changes: 20 additions & 3 deletions src/AdminViews/v_generate_schema_ddl.sql
@@ -4,10 +4,27 @@ Purpose: View to get the DDL for schemas.
History:
2014-02-11 jjschmit Created
2018-01-15 pvbouwel Add QUOTE_IDENT for namespace literal
+2018-03-30 burck1 Add logic to add AUTHORIZATION clause
Notes:
+If you receive the error
+[Amazon](500310) Invalid operation: cannot change data type of view column "ddl";
+then you must drop the view and re-create it using
+DROP VIEW admin.v_generate_schema_ddl;
**********************************************************************************************/
CREATE OR REPLACE VIEW admin.v_generate_schema_ddl
AS
-SELECT nspname AS schemaname, 'CREATE SCHEMA ' + QUOTE_IDENT(nspname) + ';' AS ddl FROM pg_catalog.pg_namespace WHERE nspowner >= 100 ORDER BY nspname
+SELECT
+nspname AS schemaname,
+'CREATE SCHEMA ' + QUOTE_IDENT(nspname) +
+CASE
+WHEN nspowner > 100
+THEN ' AUTHORIZATION ' + QUOTE_IDENT(pg_user.usename)
+ELSE ''
+END
++ ';' AS ddl
+FROM pg_catalog.pg_namespace as pg_namespace
+LEFT OUTER JOIN pg_catalog.pg_user pg_user
+ON pg_namespace.nspowner=pg_user.usesysid
+WHERE nspowner >= 100
+ORDER BY nspname
+;
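
With the updated view, regenerating schema DDL (now carrying an AUTHORIZATION clause for the owner) is a plain select. The schema name below is a hypothetical example, not one from the repo:

```sql
-- Usage sketch: emit the CREATE SCHEMA statement for one schema.
-- For a schema owned by a regular user, the ddl column carries a
-- CREATE SCHEMA ... AUTHORIZATION ... statement built by the view.
SELECT ddl
FROM admin.v_generate_schema_ddl
WHERE schemaname = 'reporting';
```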


