GSG final steps - Style edits and move to top level (github#1073)
* Style edits for GSG

* Move GSG to top level

* Fix build

* Apply suggestions from code review

Co-authored-by: Charis <26616127+charislam@users.noreply.github.com>

* Edits per feedback

* Apply suggestions from code review

Committing Miranda's suggestions. I added one more that someone will need to look at.

Co-authored-by: mirandaauhl <82287545+mirandaauhl@users.noreply.github.com>

* Update getting-started/create-cagg/create-cagg-basics.md

Co-authored-by: Charis Lam <26616127+charislam@users.noreply.github.com>
Co-authored-by: Ryan Booz <ryan@timescale.com>
Co-authored-by: mirandaauhl <82287545+mirandaauhl@users.noreply.github.com>
4 people authored May 17, 2022
1 parent 245ab7b commit 4019ae2
Showing 21 changed files with 948 additions and 1,179 deletions.
@@ -1,87 +1,92 @@
# Add time-series data

To explore TimescaleDB's features, you need some sample data. This tutorial
provides real-time stock trade data, also known as tick data, from
[Twelve Data][twelve-data].

## About the dataset
The dataset contains second-by-second stock-trade data for the top 100
most-traded symbols, in a hypertable named `stocks_real_time`. It also includes
a separate table of company symbols and company names, in a regular PostgreSQL
table named `company`.

The dataset is updated on a nightly basis and contains data from the last four
weeks, typically ~8 million rows of data. Stock trades are recorded in real time
Monday through Friday, typically during normal trading hours of the New York Stock
Exchange (9:30&nbsp;AM - 4:00&nbsp;PM EST).

### Table details

`stocks_real_time`: contains stock data, including stock price quotes at every
second during trading hours.

|Field|Type|Description|
|-|-|-|
|time|timestamptz|Timestamp column incrementing second by second|
|symbol|text|Symbols representing a company, mapped to company names in the `company` table|
|price|double precision|Stock quote price for a company at the given timestamp|
|day_volume|int|Number of shares traded each day; NULL values indicate the market is closed|

`company`: contains a mapping of company symbols to company names.


|Field|Type|Description|
|-|-|-|
|symbol|text|The symbol representing a company name|
|name|text|Corresponding company name|
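
For reference, here is a minimal sketch of the schema these field tables imply.
It is illustrative only: you created the actual tables in an earlier section,
and the `NOT NULL` constraints here are assumptions:

```sql
-- Hypothetical DDL matching the field tables above
CREATE TABLE stocks_real_time (
  time       TIMESTAMPTZ NOT NULL,
  symbol     TEXT NOT NULL,
  price      DOUBLE PRECISION,
  day_volume INT
);

-- Turn the regular table into a hypertable, partitioned on the time column
SELECT create_hypertable('stocks_real_time', 'time');

CREATE TABLE company (
  symbol TEXT NOT NULL,
  name   TEXT NOT NULL
);
```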

## Ingest the dataset
To ingest data into the tables that you created, you need to download the
dataset and copy the data to your database.

<procedure>

### Ingesting the dataset

1. Download the `real_time_stock_data.zip` file. The file contains two `.csv`
files: one with company information, and one with real-time stock trades for
the past month. Download:
<tag
type="download">[real_time_stock_data.zip](https://assets.timescale.com/docs/downloads/get-started/real_time_stock_data.zip)
</tag>

1. In a new terminal window, run this command to unzip the `.csv` files:

```bash
unzip real_time_stock_data.zip
```

1. At the `psql` prompt, use the `COPY` command to transfer data into your
TimescaleDB instance. If the `.csv` files aren't in your current directory,
specify the file paths in the following commands:
```sql
\COPY stocks_real_time from './tutorial_sample_tick.csv' DELIMITER ',' CSV HEADER;
```
```sql
\COPY company from './tutorial_sample_company.csv' DELIMITER ',' CSV HEADER;
```
Because there are millions of rows of data, the `COPY` process may take a few
minutes depending on your internet connection and local client resources.
<highlight type="note">
If you're using a Docker container, add the data files to your container before
copying them into your database.

To add files to your container:

```bash
docker cp tutorial_sample_tick.csv timescaledb:/tutorial_sample_tick.csv
docker cp tutorial_sample_company.csv timescaledb:/tutorial_sample_company.csv
```

</highlight>

</procedure>
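
Once the `COPY` commands complete, a quick sanity check is to count the rows
you ingested. With about four weeks of data, expect on the order of 8 million
rows, though exact counts vary because the dataset is refreshed nightly:

```sql
SELECT count(*) FROM stocks_real_time;
```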

## Next steps
Now that you have data in your TimescaleDB instance, learn how to [query the
data][query-data].


[twelve-data]: https://twelvedata.com/
[query-data]: /query-data/
162 changes: 162 additions & 0 deletions getting-started/compress-data.md
@@ -0,0 +1,162 @@
# Compression
TimescaleDB includes native compression capabilities that enable you to
analyze and query massive amounts of historical time-series data inside a
database while also saving on storage costs. Additionally, all PostgreSQL data
types can be used in compression.

Compressing time-series data in a hypertable is a two-step process. First,
enable compression on the hypertable by telling TimescaleDB how to order and
group the data as it is compressed. Once compression is enabled, the data can
be compressed in one of two ways:

* Using an automatic policy
* Manually compressing chunks

## Enable TimescaleDB compression on the hypertable

To enable compression, you need to [`ALTER`][alter-table-compression] the `stocks_real_time` hypertable. There
are three parameters you can specify when enabling compression:

* `timescaledb.compress` (required): enable TimescaleDB compression on the
hypertable
* `timescaledb.compress_orderby` (optional): columns used to order compressed data
* `timescaledb.compress_segmentby` (optional): columns used to group compressed
data

If you do not specify `compress_orderby` or `compress_segmentby` columns, the
compressed data is automatically ordered by the hypertable time column.

<procedure>

### Enabling compression on a hypertable

1. Use this SQL command to enable compression on the `stocks_real_time`
hypertable:

```sql
ALTER TABLE stocks_real_time SET (
timescaledb.compress,
timescaledb.compress_orderby = 'time DESC',
timescaledb.compress_segmentby = 'symbol'
);
```

1. View and verify the compression settings for your hypertables by using the
`compression_settings` informational view, which returns information about
each compression option and its `orderby` and `segmentby` attributes:

```sql
SELECT * FROM timescaledb_information.compression_settings;
```

1. The results look like this:

```bash
hypertable_schema|hypertable_name |attname|segmentby_column_index|orderby_column_index|orderby_asc|orderby_nullsfirst|
-----------------+----------------+-------+----------------------+--------------------+-----------+------------------+
public |stocks_real_time|symbol | 1| | | |
public |stocks_real_time|time | | 1|false |true |
```

</procedure>

<highlight type="note"> To learn more about the `segmentby` and `orderby`
options for compression in TimescaleDB and how to pick the right columns, see
this detailed explanation in the
[TimescaleDB compression docs](/timescaledb/latest/how-to-guides/compression/).
</highlight>

## Automatic compression
When you have enabled compression, you can schedule a
policy to [automatically compress][compress-automatic] data according to the
settings you defined earlier.

For example, if you want to compress data on your hypertable that is older than
two weeks, run this SQL:

```sql
SELECT add_compression_policy('stocks_real_time', INTERVAL '2 weeks');
```

Similar to continuous aggregate and retention policies, when you run this SQL,
all chunks that contain data that is at least two weeks old are compressed in
`stocks_real_time`, and a recurring compression policy is created.

It is important that you don't try to compress all your data. Although you can
insert new data into compressed chunks, compressed rows can't be updated or
deleted. Therefore, it is best to only compress data after it has aged, once
data is less likely to require updating.
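
If you do need to update or delete rows that are already compressed, one
approach is to decompress the affected chunks first. As a sketch, assuming the
rows you want to change live in chunks newer than three weeks (adjust the
interval to match your data):

```sql
-- if_compressed => true skips chunks that are not compressed,
-- instead of raising an error
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('stocks_real_time', newer_than => INTERVAL '3 weeks') c;
```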

Just like with automated policies for continuous aggregates, you can view
information and statistics about your compression background job in these two
informational views:

Policy details:

```sql
SELECT * FROM timescaledb_information.jobs;
```

Policy job statistics:

```sql
SELECT * FROM timescaledb_information.job_stats;
```
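
If you have several background jobs, you can narrow these views down to the
compression policy. For example, filtering on the procedure name (a sketch;
compression policy jobs run the `policy_compression` procedure):

```sql
SELECT job_id, schedule_interval, config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
```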

## Manual compression
While it is usually best to use compression policies to compress data
automatically, there might be situations where you need to
[manually compress chunks][compress-manual].

Use this query to manually compress chunks that consist of data older than two
weeks. If you manually compress hypertable chunks, consider adding
`if_not_compressed=>true` to the `compress_chunk()` function. Otherwise,
TimescaleDB shows an error when it tries to compress a chunk that is already
compressed:

```sql
SELECT compress_chunk(i, if_not_compressed=>true)
FROM show_chunks('stocks_real_time', older_than => INTERVAL '2 weeks') i;
```

## Verify your compression

You can check the overall compression rate of your hypertables using this query
to view the size of your compressed chunks before and after applying compression:

```sql
SELECT pg_size_pretty(before_compression_total_bytes) as "before compression",
pg_size_pretty(after_compression_total_bytes) as "after compression"
FROM hypertable_compression_stats('stocks_real_time');
```

**Sample results:**

```bash
|before compression|after compression|
|------------------|-----------------|
|326 MB |29 MB |
```
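
For a per-chunk breakdown rather than hypertable-wide totals, you can use the
`chunk_compression_stats` function. A sketch:

```sql
-- Uncompressed chunks report NULL sizes and a compression_status
-- of 'Uncompressed'
SELECT chunk_name,
       pg_size_pretty(before_compression_total_bytes) AS "before",
       pg_size_pretty(after_compression_total_bytes) AS "after"
FROM chunk_compression_stats('stocks_real_time');
```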

## Next steps
Your overview of TimescaleDB is almost complete. The final thing to explore is [data retention][data-retention],
which allows you to drop older raw data from a hypertable quickly without
deleting data from the precalculated continuous aggregate.

## Learn more about compression
For more information on how native compression in TimescaleDB works,
as well as the compression algorithms involved, see this in-depth blog post on
the topic:
[Building columnar compression in a row-oriented database][columnar-compression].

For an introduction to compression algorithms, see this blog post:
[Time-series compression algorithms, explained][compression-algorithms].

For more information, see the [compression docs][compression-docs].

[data-retention]: /data-retention/
[columnar-compression]: https://blog.timescale.com/blog/building-columnar-compression-in-a-row-oriented-database/
[compression-algorithms]: https://blog.timescale.com/blog/time-series-compression-algorithms-explained/
[compression-docs]: /timescaledb/:currentVersion:/how-to-guides/compression
[alter-table-compression]: /api/:currentVersion:/compression/alter_table_compression/
[compress-automatic]: /api/:currentVersion:/compression/add_compression_policy/
[compress-manual]: /api/:currentVersion:/compression/compress_chunk/
