|
3 | 3 | ---
|
4 | 4 |
|
5 | 5 | ## Table of contents
|
6 |
| - [1. What is postgreSQL](#what-is-postgresql) |
7 |
| - [2. installation](#installation) |
| 6 | + + [1. What is PostgreSQL](#what-is-postgresql) |
| 7 | + + [2. Installation](#installation) |
8 | 8 |
|
9 | 9 | ---
|
10 | 10 |
|
11 |
| -#### What is PostgreSQL |
| 11 | +## What is PostgreSQL |
12 | 12 |
|
13 | 13 | * Installing PostgreSQL also installs pgAdmin.
|
14 | 14 | * pgAdmin is a web-based tool to manage and inspect a PostgreSQL database.
|
|
18 | 18 | * view table definition by clicking on database -> expanding schemas -> tables
|
19 | 19 | (right click on table) -> view/edit data -> all rows
|
20 | 20 |
|
21 |
| -#### Installation |
| 21 | +## Installation |
22 | 22 |
|
23 | 23 | * installation https://www.enterprisedb.com/downloads/postgres-postgresql-downloads
|
24 | 24 | * during install -> uncheck stack builder
|
|
27 | 27 | * clicking on servers -> the requested password is the password for the local machine.
|
28 | 28 | * have to refresh table to see updated info.
|
29 | 29 |
|
30 |
| -#### creating database |
| 30 | +--- |
| 31 | + |
| 32 | +## creating database |
31 | 33 |
|
32 | 34 | 1. servers -> localhost -> databases -> (right-click create)
|
33 | 35 | 2. open query tool by right clicking on specific database
|
@@ -184,6 +186,10 @@ filename -> ... -> *(format -> all files or .sql) -> *(select file to restore fr
|
184 | 186 | 4. disable -> trigger -> yes
|
185 | 187 | 5. miscellaneous / behavior -> verbose messages -> yes
|
186 | 188 |
|
| 189 | +--- |
| 190 | + |
| 191 | +## Understanding the Internals of PostgreSQL |
| 192 | + |
187 | 193 | #### where postgreSQL stores data
|
188 | 194 | ```SQL
|
189 | 195 | show data_directory;
|
@@ -280,4 +286,172 @@ to calculate length of data. click on 3rd byte of 4 the bytes. look at int16 val
|
280 | 286 | length of first item is 172.
|
281 | 287 |
|
282 | 288 | 68.6.1 Table row layout - An item is made up of a fixed-size header (23 bytes), followed by optional data, then the actual user data.
|
283 |
| -the pointer to the first value -> data inspector -> Int16 -> eg. 203 is the actual data is the id of the data in the postgreSQL table. |
| 289 | +the pointer to the first value -> data inspector -> Int16 -> e.g. 203 is the actual data: the id of the row in the PostgreSQL table. |
| 290 | + |
| 291 | +--- |
| 292 | + |
| 293 | +## Indexes |
| 294 | +Indexes help with performance: |
| 295 | +* without an index, searching requires loading the table's data from disk into memory (a large performance cost) |
| 296 | +* then every row is checked one by one for what we are looking for, also known as a 'full table scan' |
| 297 | +* an index is a data structure that efficiently tells us which block, and which position within that block, a record is stored at in the heap file on the hard drive. |
| 298 | + |
| 299 | +#### how an index works |
| 300 | +* we pick a column that we want fast lookups on. |
| 301 | +* an index can also cover multiple columns. |
| 302 | +* for every row in the heap file, we extract the value of that column and record it together with the row's block and index. |
| 303 | +* we then sort these entries in some meaningful way, e.g. alphabetically. |
| 304 | +* the sorted entries are organized into a tree structure (PostgreSQL's default index type is a B-tree) |
| 305 | +* during a search, each node tells us which child to follow next, e.g. left or right, so the wrong branches are eliminated |
| 306 | +* this cuts down the number of blocks we have to read data from. |
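
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PostgreSQL implements its indexes: the `heap` dictionary, the `build_index` and `lookup` helpers, and all usernames except `Emil30` are hypothetical, and a sorted list with binary search stands in for the tree.

```python
from bisect import bisect_left

# Hypothetical heap file: block number -> rows stored in that block.
heap = {
    0: [{"id": 1, "username": "Riann"}, {"id": 2, "username": "Alf"}],
    1: [{"id": 3, "username": "Nancy"}, {"id": 4, "username": "Emil30"}],
}


def build_index(heap, column):
    """Record (value, block, position) for every row, sorted by value."""
    entries = [
        (row[column], block, pos)
        for block, rows in heap.items()
        for pos, row in enumerate(rows)
    ]
    return sorted(entries)


def lookup(index, heap, value):
    """Binary-search the sorted entries, then read only the matching block."""
    i = bisect_left(index, (value,))
    if i < len(index) and index[i][0] == value:
        _, block, pos = index[i]
        return heap[block][pos]
    return None


username_idx = build_index(heap, "username")
match = lookup(username_idx, heap, "Emil30")
```

The lookup never touches block 0: the sorted entries point straight at block 1, position 1, which is the saving an index is meant to provide.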
| 307 | + |
| 308 | +#### creating an index |
| 309 | +* creating an index on users table -> username column. |
| 310 | +* view all indexes for a database -> schemas -> table -> users -> indexes -> *(refresh) |
| 311 | +* naming convention: <table>_<column>_idx, e.g. users_username_idx (the name PostgreSQL generates when none is given) |
| 312 | + |
| 313 | +```SQL |
| 314 | +CREATE INDEX ON users (username); |
| 315 | +``` |
| 316 | + |
| 317 | +#### deleting index |
| 318 | +```SQL |
| 319 | +DROP INDEX users_username_idx; |
| 320 | +``` |
| 321 | + |
| 322 | +#### Benchmarking queries with indexes |
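| 323 | +Keyword: put 'EXPLAIN ANALYZE' in front of a SELECT statement. |
| 324 | +For the example query here, the index made the query run roughly 16x faster; the actual speedup depends on the table and the query. |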
| 325 | + |
| 326 | +```SQL |
| 327 | +EXPLAIN ANALYZE SELECT * |
| 328 | +FROM users |
| 329 | +WHERE username = 'Emil30'; |
| 330 | +``` |
| 331 | + |
| 332 | +#### downside of indexes |
| 333 | +Indexes are a performance benefit, however they can sometimes slow down a database. |
| 334 | +Indexes take up storage space. |
| 335 | + |
| 336 | +```SQL |
| 337 | +-- Find out how much space an index uses. |
| 338 | +SELECT pg_size_pretty(pg_relation_size('users_username_idx')); |
| 339 | +``` |
| 340 | + |
| 341 | +Indexes slow down insert/update/delete because the index has to be updated as well (especially noticeable on tables that are written to frequently). |
| 342 | + |
| 343 | +An index might not actually get used by the planner. |
| 344 | + |
| 345 | +#### PostgreSQL autogenerated indexes |
| 346 | +PostgreSQL manages some indexes on its own. |
| 347 | +Indexes are created automatically by PostgreSQL for the following columns (so you do not need to create your own indexes for these): |
| 348 | + |
| 349 | +* primary keys |
| 350 | +* column that has UNIQUE constraint |
| 351 | + |
| 352 | +Query to see which indexes (relkind = 'i') actually exist in the database: |
| 353 | + |
| 354 | +```SQL |
| 355 | +SELECT relname, relkind |
| 356 | +FROM pg_class |
| 357 | +WHERE relkind = 'i'; -- type: index |
| 358 | +``` |
| 359 | + |
| 360 | +--- |
| 361 | + |
| 362 | +## Query tuning |
| 363 | +When postgres receives a query: |
| 364 | +1. it goes into the parser, which splits up the query and figures out what each SQL keyword means |
| 365 | +2. after evaluation, the parser builds a query tree (a programmatic representation of the query) |
| 366 | +3. the rewriter takes the tree and makes adjustments to it. |
| 367 | +4. the planner looks at the query tree and works out possible strategies for fetching that information. |
| 368 | +5. the planner picks a strategy |
| 369 | +6. the executor executes the strategy |
| 370 | + |
| 371 | + |
| 372 | +keywords: |
| 373 | + |
| 374 | +* EXPLAIN - build query plan and display info about it. (plan) |
| 375 | + |
| 376 | +* EXPLAIN ANALYZE - build query plan, RUN IT!, and display info about it. (plan + run) |
| 377 | + |
| 378 | +both keywords are meant for benchmarking and performance optimization, not for use in production queries. |
| 379 | +pgAdmin has an explain analyze button (make sure all options are checked). |
| 380 | +How to read explain analyze output: |
| 381 | +start at the deepest (most indented) node of the query plan; each node passes its rows up to its nearest parent node. |
| 382 | + |
| 383 | +```SQL |
| 384 | +SELECT * |
| 385 | +FROM pg_stats |
| 386 | +WHERE tablename = 'users'; |
| 387 | +``` |
| 388 | + |
| 389 | +postgres is able to estimate (rows, width) in the output of EXPLAIN ANALYZE because it keeps statistics about the contents of each table (visible in pg_stats). |
| 390 | + |
| 391 | +##### Cost |
| 392 | +EXPLAIN ANALYZE -> cost |
| 393 | +the planner's estimate of how much work a part of a query takes, in arbitrary units (not seconds or milliseconds). |
| 394 | +* query plan has something like (cost=9.4..1233.11) |
| 395 | + - 9.4 is the startup cost: the cost to produce the first row |
| 396 | + - 1233.11 is the total cost: the cost to produce all rows |
| 397 | +* the cost of a parent node in the query plan includes the costs of all its child nodes. |
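
To make the pieces of a plan line concrete, here is a small Python sketch that pulls the startup cost, total cost, row estimate and width out of one line of EXPLAIN output. The `parse_estimates` helper and the sample line are hypothetical: the cost figures reuse the example above, while the node type, `rows` and `width` values are made up for illustration.

```python
import re

# Matches the estimate part of a plan line, e.g. "(cost=0.00..12.50 rows=10 width=8)".
PLAN_RE = re.compile(
    r"cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)"
)


def parse_estimates(plan_line):
    """Return the planner's estimates from one plan line, or None if absent."""
    m = PLAN_RE.search(plan_line)
    if m is None:
        return None
    return {
        "startup_cost": float(m["startup"]),  # cost to produce the first row
        "total_cost": float(m["total"]),      # cost to produce all rows
        "rows": int(m["rows"]),               # estimated number of rows
        "width": int(m["width"]),             # estimated average row size in bytes
    }


# Illustrative plan line (rows and width are invented values).
sample = "Hash Join  (cost=9.40..1233.11 rows=500 width=64)"
estimates = parse_estimates(sample)
```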
| 398 | + |
| 399 | +--- |
| 400 | + |
| 401 | +## Advanced Query tuning |
| 402 | + |
| 403 | +using an index vs loading everything into memory and reading sequentially: |
| 404 | + |
| 405 | +* assume fetching a page at random (jumping to a specific block) costs 4x as much as fetching a page in order; an index lookup that loads 2 pages = 4 x 2 = 8 units |
| 406 | + |
| 407 | +* compared to sequential reading at 1x (the base value): loading every page of the file, e.g. 110 pages, to search through it = 110 x 1 = 110 units |
| 408 | + |
| 409 | +eg. EXPLAIN ANALYZE - |
| 410 | + |
| 411 | +Seq Scan on comments (cost=0.00..1589.10 rows=60410 width=72) (actual time=0.008..14.29). 60410 rows, 985 pages. |
| 412 | + |
| 413 | +loading a page into memory is more expensive than processing the rows on it |
| 414 | +attempting formula: |
| 415 | + |
| 416 | +1.0 (cost of loading one page; the base value that every other estimated cost is judged against) |
| 417 | +0.01 (assume processing one row costs 1% of loading one page) |
| 418 | + |
| 419 | +(# pages) * 1.0 + (# rows) * (0.01) |
| 420 | + |
| 421 | +(985) * 1.0 + (60410) * (0.01) = 1589.1 (same as the total cost estimated in the EXPLAIN ANALYZE output) |
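
The same arithmetic as a quick Python check. The names `PAGE_COST`, `ROW_COST` and `seq_scan_estimate` are made up for this sketch; the weights are the two assumed above and the page and row counts are those of the `comments` table.

```python
PAGE_COST = 1.0   # assumed cost of loading one page (the base unit)
ROW_COST = 0.01   # assumed cost of processing one row (1% of a page load)


def seq_scan_estimate(pages, rows, page_cost=PAGE_COST, row_cost=ROW_COST):
    """Estimated cost of reading every page and processing every row."""
    return pages * page_cost + rows * row_cost


# comments table from the example: 985 pages, 60410 rows.
estimate = seq_scan_estimate(pages=985, rows=60410)
print(round(estimate, 2))
```

Rounding is only there to hide floating-point noise; the value matches the 1589.10 total cost in the plan.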
| 422 | + |
| 423 | +##### actual formula for cost for any step of query plan (EXPLAIN ANALYZE) |
| 424 | + |
| 425 | +The link below lists the default values of the cost constants used in these calculations. The sequential page cost (seq_page_cost) is the base, and all other costs are relative to it. |
| 426 | + |
| 427 | +[runtime config query - postgresql.org/docs/current/runtime-config-query.html](http://postgresql.org/docs/current/runtime-config-query.html) |
| 428 | +19.7 Query Planning -> 19.7.2 Planner cost constants |
| 429 | + |
| 430 | +random_page_cost -> 4.0 (fetching a page at random is 4x as expensive as fetching a page in order (seq_page_cost)) |
| 431 | + |
| 432 | +seq_page_cost -> 1.0 (default base that all other costs are relative to) |
| 433 | + |
| 434 | +cpu_tuple_cost -> 0.01 (processing a single tuple (row) is 1% as expensive as fetching a page in order (seq_page_cost)) |
| 435 | + |
| 436 | +cpu_index_tuple_cost -> 0.005 (processing a tuple from an index is 50% as expensive as processing a real row (cpu_tuple_cost)) |
| 437 | + |
| 438 | +cpu_operator_cost -> 0.0025 (running an operator or function is 50% as expensive as processing an index tuple (cpu_index_tuple_cost)) |
| 439 | + |
| 440 | +##### Cost for steps of query plan = |
| 441 | +(# pages read sequentially) * seq_page_cost |
| 442 | ++ |
| 443 | +(# pages read at random) * random_page_cost |
| 444 | ++ |
| 445 | +(# rows scanned) * cpu_tuple_cost |
| 446 | ++ |
| 447 | +(# index entries scanned) * cpu_index_tuple_cost |
| 448 | ++ |
| 449 | +(# times function/operator evaluated) * cpu_operator_cost |
| 450 | + |
| 451 | + |
| 452 | +Cost for sequential read = |
| 453 | +(# pages read sequentially) * seq_page_cost |
| 454 | ++ |
| 455 | +(# rows scanned) * cpu_tuple_cost |
| 456 | + |
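
A minimal Python sketch of the general formula, assuming PostgreSQL's default planner cost constants. The `CostConstants` class and `step_cost` function are names invented for this example, and the real planner's per-node cost models are more involved than this single sum. The two calls at the end reproduce the earlier comparison of 2 random page reads against 110 sequential page reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostConstants:
    """Planner cost constants, set to PostgreSQL's documented defaults."""
    seq_page_cost: float = 1.0
    random_page_cost: float = 4.0
    cpu_tuple_cost: float = 0.01
    cpu_index_tuple_cost: float = 0.005
    cpu_operator_cost: float = 0.0025


def step_cost(c, seq_pages=0, random_pages=0, rows=0,
              index_entries=0, operator_calls=0):
    """Sum the five weighted terms of the cost formula for one plan step."""
    return (
        seq_pages * c.seq_page_cost
        + random_pages * c.random_page_cost
        + rows * c.cpu_tuple_cost
        + index_entries * c.cpu_index_tuple_cost
        + operator_calls * c.cpu_operator_cost
    )


defaults = CostConstants()
random_io = step_cost(defaults, random_pages=2)      # 2 pages fetched at random
sequential_io = step_cost(defaults, seq_pages=110)   # 110 pages read in order
```

With the defaults, `random_io` comes to 8 units and `sequential_io` to 110 units, which is why the planner tends to prefer an index when only a few pages are needed.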
| 457 | +--- |