GSG final steps - Style edits and move to top level (github#1073)
* Style edits for GSG
* Move GSG to top level
* Fix build
* Apply suggestions from code review
* Edits per feedback
* Apply suggestions from code review: committing Miranda's suggestions. I added one more that someone will need to look at.
* Update getting-started/create-cagg/create-cagg-basics.md

Co-authored-by: Charis Lam <26616127+charislam@users.noreply.github.com>
Co-authored-by: mirandaauhl <82287545+mirandaauhl@users.noreply.github.com>
Co-authored-by: Ryan Booz <ryan@timescale.com>
1 parent 245ab7b · commit 4019ae2

Showing 21 changed files with 948 additions and 1,179 deletions.
85 changes: 45 additions & 40 deletions · timescaledb/getting-started/add-data.md → getting-started/add-data.md
@@ -1,87 +1,92 @@

# Add time-series data

To explore TimescaleDB's features, you need some sample data. This tutorial
provides real-time stock trade data, also known as tick data, from
[Twelve Data][twelve-data].
## About the dataset

The dataset contains second-by-second stock-trade data for the top 100
most-traded symbols, in a hypertable named `stocks_real_time`. It also includes
a separate table of company symbols and company names, in a regular PostgreSQL
table named `company`.

The dataset is updated nightly and contains data from the last four weeks,
typically ~8 million rows. Stock trades are recorded in real time Monday
through Friday, typically during normal trading hours of the New York Stock
Exchange (9:30 AM - 4:00 PM EST).
### Table details

`stocks_real_time`: contains stock data, including stock price quotes at every
second during trading hours.

|Field|Type|Description|
|-|-|-|
|time|timestamptz|Timestamp column incrementing second by second|
|symbol|text|Symbols representing a company, mapped to company names in the `company` table|
|price|double precision|Stock quote price for a company at the given timestamp|
|day_volume|int|Number of shares traded each day; NULL values indicate the market is closed|
`company`: contains a mapping of symbols to company names.

|Field|Type|Description|
|-|-|-|
|symbol|text|The symbol representing a company name|
|name|text|Corresponding company name|
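If you are recreating the tables yourself, here is a minimal sketch of a schema
matching the columns above. The exact DDL comes from the earlier setup steps,
so treat this as illustrative; `create_hypertable` is TimescaleDB's standard
function for converting a plain table into a hypertable:

```sql
-- Illustrative schema matching the table descriptions above
CREATE TABLE stocks_real_time (
  time       TIMESTAMPTZ NOT NULL,
  symbol     TEXT,
  price      DOUBLE PRECISION,
  day_volume INT
);

-- Partition the table on its time column
SELECT create_hypertable('stocks_real_time', 'time');

CREATE TABLE company (
  symbol TEXT,
  name   TEXT
);
```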
## Ingest the dataset

To ingest data into the tables that you created, you need to download the
dataset and copy the data to your database.

<procedure>
### Ingesting the dataset

1. Download the `real_time_stock_data.zip` file. The file contains two `.csv`
   files: one with company information, and one with real-time stock trades for
   the past month. Download:
   <tag type="download">[real_time_stock_data.zip](https://assets.timescale.com/docs/downloads/get-started/real_time_stock_data.zip)</tag>
1. In a new terminal window, run this command to unzip the `.csv` files:

   ```bash
   unzip real_time_stock_data.zip
   ```
1. At the `psql` prompt, use the `COPY` command to transfer data into your
   TimescaleDB instance. If the `.csv` files aren't in your current directory,
   specify the file paths in the following commands:

   ```sql
   \COPY stocks_real_time from './tutorial_sample_tick.csv' DELIMITER ',' CSV HEADER;
   ```

   ```sql
   \COPY company from './tutorial_sample_company.csv' DELIMITER ',' CSV HEADER;
   ```

   Because there are millions of rows of data, the `COPY` process may take a few
   minutes depending on your internet connection and local client resources.
   <highlight type="note">
   If you're using a Docker container, add the data files to your container
   before copying them into your database.

   To add files to your container:

   ```bash
   docker cp tutorial_sample_tick.csv timescaledb:/tutorial_sample_tick.csv
   docker cp tutorial_sample_company.csv timescaledb:/tutorial_sample_company.csv
   ```

   </highlight>
</procedure>

## Next steps

Now that you have data in your TimescaleDB instance, learn how to [query the
data][query-data].

[twelve-data]: https://twelvedata.com/
[query-data]: /query-data/
@@ -0,0 +1,162 @@
# Compression

TimescaleDB includes native compression capabilities which enable you to
analyze and query massive amounts of historical time-series data inside a
database while also saving on storage costs. Additionally, all PostgreSQL data
types can be used in compression.

Compressing time-series data in a hypertable is a two-step process. First, you
need to enable compression on a hypertable by telling TimescaleDB how to
compress and order the data as it is compressed. Once compression is enabled,
the data can then be compressed in one of two ways:

* Using an automatic policy
* Manually compressing chunks
## Enable TimescaleDB compression on the hypertable

To enable compression, you need to [`ALTER`][alter-table-compression] the
`stocks_real_time` hypertable. There are three parameters you can specify when
enabling compression:

* `timescaledb.compress` (required): enable TimescaleDB compression on the
  hypertable
* `timescaledb.compress_orderby` (optional): columns used to order compressed
  data
* `timescaledb.compress_segmentby` (optional): columns used to group compressed
  data

If you do not specify `compress_orderby` or `compress_segmentby` columns, the
compressed data is automatically ordered by the hypertable time column.
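For example, a minimal form that accepts those defaults looks like this (the
full tutorial settings follow in the procedure below):

```sql
-- Enable compression with default ordering by the hypertable time column
ALTER TABLE stocks_real_time SET (timescaledb.compress);
```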
<procedure>

### Enabling compression on a hypertable

1. Use this SQL command to enable compression on the `stocks_real_time`
   hypertable:

   ```sql
   ALTER TABLE stocks_real_time SET (
     timescaledb.compress,
     timescaledb.compress_orderby = 'time DESC',
     timescaledb.compress_segmentby = 'symbol'
   );
   ```
1. View and verify the compression settings for your hypertables by using the
   `compression_settings` informational view, which returns information about
   each compression option and its `orderby` and `segmentby` attributes:

   ```sql
   SELECT * FROM timescaledb_information.compression_settings;
   ```

1. The results look like this:

   ```bash
   hypertable_schema|hypertable_name |attname|segmentby_column_index|orderby_column_index|orderby_asc|orderby_nullsfirst|
   -----------------+----------------+-------+----------------------+--------------------+-----------+------------------+
   public           |stocks_real_time|symbol |                     1|                    |           |                  |
   public           |stocks_real_time|time   |                      |                   1|false      |true              |
   ```

</procedure>
<highlight type="note">
To learn more about the `segmentby` and `orderby` options for compression in
TimescaleDB and how to pick the right columns, see this detailed explanation in
the [TimescaleDB compression docs](/timescaledb/latest/how-to-guides/compression/).
</highlight>
## Automatic compression

When you have enabled compression, you can schedule a policy to
[automatically compress][compress-automatic] data according to the settings you
defined earlier.

For example, if you want to compress data on your hypertable that is older than
two weeks, run this SQL:

```sql
SELECT add_compression_policy('stocks_real_time', INTERVAL '2 weeks');
```

Similar to continuous aggregate and retention policies, when you run this SQL,
all chunks that contain data at least two weeks old are compressed in
`stocks_real_time`, and a recurring compression policy is created.
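The policy can also be removed and re-added if you later want a different age
boundary; a sketch using the corresponding API functions (the four-week
interval is illustrative):

```sql
-- Drop the existing policy, then schedule a new one with a wider boundary
SELECT remove_compression_policy('stocks_real_time');
SELECT add_compression_policy('stocks_real_time', INTERVAL '4 weeks');
```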
It is important that you don't try to compress all your data. Although you can
insert new data into compressed chunks, compressed rows can't be updated or
deleted. Therefore, it is best to compress data only after it has aged, once it
is less likely to require updating.

Just like automated policies for continuous aggregates, you can view
information and statistics about your compression background job in these two
informational views:
Policy details:

```sql
SELECT * FROM timescaledb_information.jobs;
```

Policy job statistics:

```sql
SELECT * FROM timescaledb_information.job_stats;
```
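If you have several background jobs running, you can narrow the first view to
compression policies. This sketch assumes the `policy_compression` proc name
that TimescaleDB's compression policy jobs register under:

```sql
-- Show only compression policy jobs
SELECT job_id, hypertable_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';
```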
## Manual compression

While it is usually best to use compression policies to compress data
automatically, there might be situations where you need to
[manually compress chunks][compress-manual].

Use this query to manually compress chunks that consist of data older than two
weeks. If you manually compress hypertable chunks, consider adding
`if_not_compressed=>true` to the `compress_chunk()` function. Otherwise,
TimescaleDB shows an error when it tries to compress a chunk that is already
compressed:

```sql
SELECT compress_chunk(i, if_not_compressed => true)
FROM show_chunks('stocks_real_time', older_than => INTERVAL '2 weeks') i;
```
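Manual compression also has a counterpart. Because compressed rows can't be
updated or deleted, you first decompress a chunk if you need to modify its
data. A sketch mirroring the query above with `decompress_chunk()` (the chunk
selection is illustrative):

```sql
-- Decompress any compressed chunks older than two weeks
SELECT decompress_chunk(i, if_compressed => true)
FROM show_chunks('stocks_real_time', older_than => INTERVAL '2 weeks') i;
```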
## Verify your compression

You can check the overall compression rate of your hypertables using this query
to view the size of your compressed chunks before and after applying compression:

```sql
SELECT pg_size_pretty(before_compression_total_bytes) AS "before compression",
       pg_size_pretty(after_compression_total_bytes) AS "after compression"
FROM hypertable_compression_stats('stocks_real_time');
```
**Sample results:**

```bash
|before compression|after compression|
|------------------|-----------------|
|326 MB            |29 MB            |
```
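To express the same numbers as a single savings percentage, a variation on the
query above works too (assuming at least one compressed chunk, so the byte
counts are non-NULL and nonzero):

```sql
-- Roughly 91% saved for the sample results above
SELECT round(
    100.0 * (1 - after_compression_total_bytes::numeric
                 / before_compression_total_bytes), 1
) AS percent_saved
FROM hypertable_compression_stats('stocks_real_time');
```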
## Next steps

Your overview of TimescaleDB is almost complete. The final thing to explore is
[data retention][data-retention], which allows you to quickly drop older raw
data from a hypertable without deleting data from the precalculated continuous
aggregate.

## Learn more about compression

For more information on how native compression in TimescaleDB works, as well as
the compression algorithms involved, see this in-depth blog post on the topic:
[Building columnar compression in a row-oriented database][columnar-compression].

For an introduction to compression algorithms, see this blog post:
[Time-series compression algorithms, explained][compression-algorithms].

For more information, see the [compression docs][compression-docs].

[data-retention]: /data-retention/
[columnar-compression]: https://blog.timescale.com/blog/building-columnar-compression-in-a-row-oriented-database/
[compression-algorithms]: https://blog.timescale.com/blog/time-series-compression-algorithms-explained/
[compression-docs]: /timescaledb/:currentVersion:/how-to-guides/compression
[alter-table-compression]: /api/:currentVersion:/compression/alter_table_compression/
[compress-automatic]: /api/:currentVersion:/compression/add_compression_policy/
[compress-manual]: /api/:currentVersion:/compression/compress_chunk/