Commit 05d5ea2

Query tuning / Advanced Query Tuning
1 parent c08eddf commit 05d5ea2

1 file changed

+180
-6
lines changed

03.postgreSQL.md

Lines changed: 180 additions & 6 deletions
@@ -3,12 +3,12 @@
33
---
44

55
## Table of contents
6-
[1. What is postgreSQL](#what-is-postgresql)
7-
[2. installation](#installation)
6+
+ [1. What is postgreSQL](#what-is-postgresql)
7+
+ [2. installation](#installation)
88

99
---
1010

11-
#### What is PostgreSQL
11+
## What is PostgreSQL
1212

1313
* Installing postgreSQL installs pgadmin.
1414
* Pgadmin is a web-based tool to manage and inspect a postgres database.
@@ -18,7 +18,7 @@
1818
* view table definition by clicking on database -> expanding schemas -> tables
1919
(right click on table) -> view/edit data -> all rows
2020

21-
#### Installation
21+
## Installation
2222

2323
* installation https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
2424
* during install -> uncheck stack builder
@@ -27,7 +27,9 @@
2727
* clicking on servers -> requested password is password for localmachine.
2828
* have to refresh table to see updated info.
2929

30-
#### creating database
30+
---
31+
32+
## creating database
3133

3234
1. servers -> localhost -> databases -> (right-click create)
3335
2. open query tool by right clicking on specific database
@@ -184,6 +186,10 @@ filename -> ... -> *(format -> all files or .sql) -> *(select file to restore fr
184186
4. disable -> trigger -> yes
185187
5. miscellaneous / behavior -> verbose messages -> yes
186188

189+
---
190+
191+
## Understanding the Internals of PostgreSQL
192+
187193
#### where postgreSQL stores data
188194
```SQL
189195
show data_directory;
@@ -280,4 +286,172 @@ to calculate length of data. click on 3rd byte of 4 the bytes. look at int16 val
280286
length of first item is 172.
281287

282288
68.6.1 Table row layout - An item is further made up of a fixed-size header (23 bytes), followed by optional data, and then the actual user data.
283-
the pointer to the first value -> data inspector -> Int16 -> eg. 203 is the actual data is the id of the data in the postgreSQL table.
289+
the pointer to the first value -> data inspector -> Int16 -> e.g. 203 is the actual data: the id of the row in the postgreSQL table.
290+
291+
---
292+
293+
## Indexes
294+
Indexes help with performance
295+
* without an index, searching for data means loading the table's data from disk into memory (a large performance cost)
296+
* then we search through all the data for what we are looking for (iterating over the rows), also known as a 'full table scan'
297+
* an index is a data structure that efficiently tells us which block and index (position within the block) a record is stored at in a heap file on the hard drive.
298+
299+
#### how an index works
300+
* an index is built on a column we want fast lookups on.
301+
* you can have an index on multiple columns.
302+
* for every row in the heap file, we extract that column's value and record it together with the row's block and index.
303+
* we then sort these entries in some meaningful way, e.g. alphabetically.
304+
* put this sorted list into a tree structure (PostgreSQL's default index type is a B-tree, not a plain binary tree)
305+
* starting at the root, each node holds conditions that tell a search which child to follow, e.g. left or right, so the wrong branches are eliminated
306+
* we cut down the number of blocks we have to read data from.
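
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not how PostgreSQL implements its indexes: the heap file is a hypothetical list of blocks, the rows (other than the `Emil30` username used in these notes) are made up, and a sorted list with binary search stands in for the tree.

```python
from bisect import bisect_left

# Hypothetical heap file: a list of blocks, each block a list of rows.
heap_file = [
    [{"id": 1, "username": "Riann"}, {"id": 2, "username": "Alf"}],
    [{"id": 3, "username": "Nancy"}, {"id": 4, "username": "Emil30"}],
    [{"id": 5, "username": "Zed"}, {"id": 6, "username": "Gia"}],
]


def build_index(heap, column):
    """Record (value, block, index) for every row, sorted by value."""
    entries = [
        (row[column], block_no, row_no)
        for block_no, block in enumerate(heap)
        for row_no, row in enumerate(block)
    ]
    return sorted(entries)


def index_lookup(index, heap, value):
    """Binary-search the sorted entries, then read only the matching block."""
    # (value,) sorts just before any (value, block, index) entry.
    pos = bisect_left(index, (value,))
    if pos == len(index) or index[pos][0] != value:
        return None
    _, block_no, row_no = index[pos]
    return heap[block_no][row_no]


def full_table_scan(heap, column, value):
    """Visit every row of every block until a match is found."""
    for block in heap:
        for row in block:
            if row[column] == value:
                return row
    return None


username_index = build_index(heap_file, "username")
found = index_lookup(username_index, heap_file, "Emil30")
print(found)
```

Both lookups return the same row; the difference is that the index lookup touches a single block, while the full table scan may have to walk through all of them.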
307+
308+
#### creating an index
309+
* creating an index on users table -> username column.
310+
* view all indexes in pgadmin: database -> schemas -> tables -> users -> indexes -> *(refresh)
311+
* naming convention (used by postgres when no name is given): table_column_idx, e.g. users_username_idx
312+
313+
```SQL
-- No index name is given, so PostgreSQL generates one (users_username_idx).
CREATE INDEX ON users (username);
```
316+
317+
#### deleting index
318+
```SQL
DROP INDEX users_username_idx;
```
321+
322+
#### Benchmarking queries with indexes
323+
Keyword: put 'EXPLAIN ANALYZE' in front of the SELECT statement.
324+
In this benchmark the query ran about 16x faster with the index than without it (the speedup will vary by table and query).
325+
326+
```SQL
EXPLAIN ANALYZE SELECT *
FROM users
WHERE username = 'Emil30';
```
331+
332+
#### downside of indexes
333+
Indexes are a performance benefit, but they can sometimes slow the database down.
334+
Indexes take up storage space.
335+
336+
```SQL
-- Find out how much space an index uses.
SELECT pg_size_pretty(pg_relation_size('users_username_idx'));
```
340+
341+
Indexes slow down insert/update/delete because the index has to be updated as well (especially on tables that get updated frequently).
342+
343+
An index might not actually get used by the planner.
344+
345+
#### PostgreSQL autogenerated indexes
346+
postgres creates and manages some indexes on its own.
347+
Indexes are created automatically by postgresql for the following columns (so you never have to create your own indexes for these):
348+
349+
* primary keys
350+
* columns that have a UNIQUE constraint
351+
352+
Query to see which indexes (relkind = 'i') actually exist in the database:
353+
354+
```SQL
SELECT relname, relkind
FROM pg_class
WHERE relkind = 'i'; -- type: index
```
359+
360+
---
361+
362+
## Query tuning
363+
When postgres receives a query:
364+
1. goes into parser: splits up the query and figures out what each SQL keyword means
365+
2. after evaluation, it builds a query tree (a programmatic representation of the query)
366+
3. rewriter takes tree and makes adjustments to it.
367+
4. planner takes a look at the query tree and works out possible strategies for getting that information.
368+
5. planner picks a strategy
369+
6. executor executes the strategy
370+
371+
372+
keywords:
373+
374+
* EXPLAIN - build query plan and display info about it. (plan)
375+
376+
* EXPLAIN ANALYZE - build query plan, RUN IT!, and display info about it. (plan + run)
377+
378+
both keywords are only used for benchmarking and performance analysis; don't use them in production application queries.
379+
pgadmin has an explain analyze button (make sure all options are checked).
380+
How to read explain analyze output:
381+
go to the deepest level of the query plan (the most indented node); we can imagine that each node passes its rows up to its nearest parent node.
382+
383+
```SQL
SELECT *
FROM pg_stats
WHERE tablename = 'users';
```
388+
389+
postgres is able to make assumptions about (rows, width) in the output of EXPLAIN ANALYZE because it keeps statistics about the data stored in each table (visible in pg_stats).
390+
391+
##### Cost
392+
EXPLAIN ANALYZE -> cost
393+
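an estimate of the effort needed to execute a part of a query. It is measured in arbitrary units relative to fetching one page sequentially, not in actual time.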
394+
* query plan has something like (cost=9.40..1233.11)
395+
- 9.40 is the startup cost: the cost to produce the first row
396+
- 1233.11 is the total cost: the cost to produce all rows
397+
* the cost of a parent node in the query plan includes the costs of all its child nodes, plus the parent's own work.
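
As a rough illustration of reading these numbers, the sketch below pulls the startup cost, total cost, rows and width out of a single plan line. The helper name `parse_plan_estimates` and the regular expression are my own and only cover the simple `(cost=a..b rows=n width=w)` shape; the sample line is the Seq Scan on the comments table from these notes.

```python
import re

# Matches the "cost=<startup>..<total> rows=<n> width=<n>" part of a plan line.
COST_PATTERN = re.compile(
    r"cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)"
)


def parse_plan_estimates(plan_line):
    """Return the planner's estimates from one plan line, or None if absent."""
    match = COST_PATTERN.search(plan_line)
    if match is None:
        return None
    return {
        "startup_cost": float(match["startup"]),  # cost to produce the first row
        "total_cost": float(match["total"]),      # cost to produce all rows
        "rows": int(match["rows"]),               # estimated number of rows
        "width": int(match["width"]),             # estimated average row size in bytes
    }


line = "Seq Scan on comments  (cost=0.00..1589.10 rows=60410 width=72)"
estimates = parse_plan_estimates(line)
print(estimates)
```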
398+
399+
---
400+
401+
## Advanced Query tuning
402+
403+
using an index vs loading everything into memory and reading sequentially:
404+
405+
* index: assume that jumping to a specific block (a random page read) costs 4x as much as a sequential page read, e.g. 2 pages loaded = 4 x 2 = 8 units
406+
407+
* sequential read: assume a base cost of 1x per page; loading all 110 pages of the file to search through them = 110 x 1 = 110 units
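
The comparison above as a small calculation. The page counts (2 and 110) and the 4x / 1x weights are the assumptions stated in the bullets; the function names are only for illustration.

```python
RANDOM_PAGE_COST = 4.0  # assumed: a random page read costs 4x a sequential one
SEQ_PAGE_COST = 1.0     # assumed: base cost of reading one page in order


def index_read_cost(pages_fetched):
    """Cost of jumping to specific pages through an index."""
    return pages_fetched * RANDOM_PAGE_COST


def sequential_read_cost(pages_in_file):
    """Cost of reading every page of the heap file in order."""
    return pages_in_file * SEQ_PAGE_COST


index_units = index_read_cost(2)              # 4 x 2 = 8 units
sequential_units = sequential_read_cost(110)  # 110 x 1 = 110 units
print(index_units, sequential_units)
```

Each random read is more expensive, but far fewer pages are read, so the index wins in this scenario.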
408+
409+
e.g. EXPLAIN ANALYZE output:
410+
411+
Seq Scan on comments (cost=0.00..1589.10 rows=60410 width=72) (actual time=0.008..14.29). 60410 rows, 985 pages.
412+
413+
loading a page into memory is more expensive than processing a row from a page that is already loaded
414+
attempting formula:
415+
416+
1.0 (assume loading one page sequentially costs 1.0; this is the base that all other estimated costs are judged against)
417+
0.01 (assume processing a row costs 1% of loading a page)
418+
419+
(# pages) * 1.0 + (# rows) * (0.01)
420+
421+
(985) * 1.0 + (60410) * (0.01) = 1589.1 (same as the cost estimate in the EXPLAIN ANALYZE output)
422+
423+
##### actual formula for cost for any step of query plan (EXPLAIN ANALYZE)
424+
425+
the link below specifies the default values of the cost constants used in these calculations. the sequential page cost (seq_page_cost) is the base, and all other costs are relative to it.
426+
427+
[runtime config query - postgresql.org/docs/current/runtime-config-query.html](http://postgresql.org/docs/current/runtime-config-query.html)
428+
19.7 Query Planning -> 19.7.2 Planner cost constants
429+
430+
random_page_cost -> 4.0 (fetching a page at random is 4x as expensive as fetching a page in order (seq_page_cost))
431+
432+
seq_page_cost -> 1.0 (default base that all other costs are relative to)
433+
434+
cpu_tuple_cost -> 0.01 (processing a single tuple (row) is 1% as expensive as fetching a page in order (seq_page_cost))
435+
436+
cpu_index_tuple_cost -> 0.005 (processing a tuple from an index is 50% as expensive as processing a real row (cpu_tuple_cost))
437+
438+
cpu_operator_cost -> 0.0025 (running an operator or function is 50% as expensive as processing an index tuple (cpu_index_tuple_cost))
439+
440+
##### Cost for a step of the query plan =
441+
(# pages read sequentially) * seq_page_cost
442+
+
443+
(# pages read at random) * random_page_cost
444+
+
445+
(# rows scanned) * cpu_tuple_cost
446+
+
447+
(# index entries scanned) * cpu_index_tuple_cost
448+
+
449+
(# times function/operator evaluated) * cpu_operator_cost
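
A minimal sketch of this formula in Python, assuming the documented default values of the planner cost constants (seq_page_cost 1.0, random_page_cost 4.0, cpu_tuple_cost 0.01, cpu_index_tuple_cost 0.005, cpu_operator_cost 0.0025). The names `CostConstants` and `step_cost` are illustrative only, and the real planner applies further adjustments for many node types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostConstants:
    """Default planner cost constants, all relative to seq_page_cost."""
    seq_page_cost: float = 1.0
    random_page_cost: float = 4.0
    cpu_tuple_cost: float = 0.01
    cpu_index_tuple_cost: float = 0.005
    cpu_operator_cost: float = 0.0025


def step_cost(seq_pages=0, random_pages=0, rows=0, index_entries=0,
              operator_calls=0, constants=CostConstants()):
    """Weighted sum of the work done by one step of a query plan."""
    return (
        seq_pages * constants.seq_page_cost
        + random_pages * constants.random_page_cost
        + rows * constants.cpu_tuple_cost
        + index_entries * constants.cpu_index_tuple_cost
        + operator_calls * constants.cpu_operator_cost
    )


# Seq Scan on comments: 985 pages read in order, 60410 rows processed.
seq_scan_cost = step_cost(seq_pages=985, rows=60410)
print(round(seq_scan_cost, 2))  # about 1589.1, matching the plan's total cost
```

For a plain sequential scan only the first and third terms are non-zero, which is why the estimate reduces to pages * seq_page_cost + rows * cpu_tuple_cost.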
450+
451+
452+
Cost for sequential read =
453+
(# pages read sequentially) * seq_page_cost
454+
+
455+
(# rows scanned) * cpu_tuple_cost
456+
457+
---
