|
| 1 | +# Inspecting & Validating Data |
| 2 | + |
| 3 | +The quality of your backtesting results depends directly on the quality of your historical data. Before running a strategy, inspect your downloaded data to confirm it is complete and consistent. Gaps, duplicates, or other errors can produce misleading backtest results. |
| 4 | + |
| 5 | +Stochastix provides a dedicated command-line tool, `stochastix:data:info`, to help you with this process. |
| 6 | + |
| 7 | +## The `data:info` Command |
| 8 | + |
| 9 | +This command reads a `.stchx` binary data file and displays its metadata and a sample of its content. It can also perform a full consistency validation on the data. |
| 10 | + |
| 11 | +### Command Signature |
| 12 | + |
| 13 | +```bash |
| 14 | +make sf c="stochastix:data:info <file-path> [options]" |
| 15 | +``` |
| 16 | + |
| 17 | +### Argument |
| 18 | + |
| 19 | +* **`file-path`**: The full path to the `.stchx` file you want to inspect. |
| 20 | + |
| 21 | +### Example |
| 22 | + |
| 23 | +```bash |
| 24 | +make sf c="stochastix:data:info data/market/binance/ETH_USDT/1d.stchx" |
| 25 | +``` |
| 26 | + |
| 27 | +## Inspecting File Contents |
| 28 | + |
| 29 | +When run without any options, the command provides a quick overview of the file: |
| 30 | + |
| 31 | +1. **Header Metadata**: It displays the key information from the file's header, such as the `Symbol`, `Timeframe`, and the total `Number of Records` contained within the file. |
| 32 | +2. **Data Sample**: It shows the first 5 and last 5 records from the file. This is useful for a quick sanity check to ensure the timestamps and price ranges look correct. |
| 33 | + |
| 34 | +```bash |
| 35 | +📊 Stochastix STCHXBF1 File Information 📊 |
| 36 | +========================================== |
| 37 | + |
| 38 | + File: /app/data/market/okx/ETH_USDT/1d.stchx |
| 39 | + Size: 17,584 bytes |
| 40 | + |
| 41 | +Header Metadata |
| 42 | +--------------- |
| 43 | + |
| 44 | + ------------------- ---------- |
| 45 | + Magic Number STCHXBF1 |
| 46 | + Format Version 1 |
| 47 | + Header Length 64 |
| 48 | + Record Length 48 |
| 49 | + Timestamp Format 1 |
| 50 | + OHLCV Format 1 |
| 51 | + Symbol ETH/USDT |
| 52 | + Timeframe 1d |
| 53 | + Number of Records 365 |
| 54 | + ------------------- ---------- |
| 55 | + |
| 56 | +Data Sample (Head & Tail) |
| 57 | +------------------------- |
| 58 | + |
| 59 | + ------------ --------------------- ------------- ------------- ------------- ------------- ------------ |
| 60 | + Timestamp Date (UTC) Open High Low Close Volume |
| 61 | + ------------ --------------------- ------------- ------------- ------------- ------------- ------------ |
| 62 | + 1672531200 2023-01-01 00:00:00 1,196.39000 1,204.70000 1,191.27000 1,200.43000 26,631.66 |
| 63 | + 1672617600 2023-01-02 00:00:00 1,200.27000 1,224.64000 1,192.90000 1,214.00000 75,316.11 |
| 64 | + 1672704000 2023-01-03 00:00:00 1,214.00000 1,220.00000 1,204.98000 1,214.51000 37,567.06 |
| 65 | + 1672790400 2023-01-04 00:00:00 1,214.51000 1,273.55000 1,212.73000 1,256.73000 175,177.68 |
| 66 | + 1672876800 2023-01-05 00:00:00 1,256.74000 1,259.98000 1,243.00000 1,251.34000 58,564.63 |
| 67 | + ... ... ... ... ... ... ... |
| 68 | + 1703635200 2023-12-27 00:00:00 2,230.68000 2,392.94000 2,212.01000 2,378.35000 196,149.91 |
| 69 | + 1703721600 2023-12-28 00:00:00 2,378.36000 2,445.80000 2,335.27000 2,344.17000 223,327.62 |
| 70 | + 1703808000 2023-12-29 00:00:00 2,344.18000 2,385.27000 2,255.01000 2,299.15000 213,180.88 |
| 71 | + 1703894400 2023-12-30 00:00:00 2,299.14000 2,322.69000 2,267.72000 2,291.65000 97,952.85 |
| 72 | + 1703980800 2023-12-31 00:00:00 2,291.73000 2,321.39000 2,256.01000 2,282.13000 90,254.81 |
| 73 | + ------------ --------------------- ------------- ------------- ------------- ------------- ------------ |
| 74 | +``` |
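The header values in this sample can be cross-checked against the reported file size. The following is a minimal sketch, assuming the file consists of exactly one header followed by fixed-length records. That assumption is consistent with the numbers above but is not a statement of the full file format, and `expected_record_count` is a hypothetical helper, not part of Stochastix.

```python
def expected_record_count(file_size: int, header_length: int, record_length: int) -> int:
    """Return how many fixed-length records fit after the header.

    Assumes the layout is one header followed by whole records only.
    This is an assumption for illustration, not a documented guarantee.
    """
    payload = file_size - header_length
    if payload < 0 or payload % record_length != 0:
        raise ValueError("file size is not header + a whole number of records")
    return payload // record_length


# Values taken from the sample output: 17,584 bytes, 64-byte header, 48-byte records.
count = expected_record_count(file_size=17_584, header_length=64, record_length=48)
print(count)  # 365, matching "Number of Records" in the header
```

If the computed count and the header's `Number of Records` disagree, the file may be truncated, or the layout assumption above may not hold for that file.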
| 75 | + |
| 76 | +## Validating Data Consistency |
| 77 | + |
| 78 | +The most important feature of the `data:info` command is the `--validate` flag. When this option is added, the tool iterates through every record in the file to check for common data quality issues. |
| 79 | + |
| 80 | +```bash |
| 81 | +make sf c="stochastix:data:info data/market/binance/ETH_USDT/1d.stchx --validate" |
| 82 | +``` |
| 83 | + |
| 84 | +The validation checks for three types of problems: |
| 85 | + |
| 86 | +1. **Gaps**: The time difference between each pair of consecutive records is checked. If it doesn't match the file's timeframe (e.g., 86,400 seconds for a `1d` file), it is flagged as a gap. |
| 87 | +2. **Duplicates**: The tool checks for any record that has exactly the same timestamp as the one before it. |
| 88 | +3. **Out of Order**: The tool ensures that timestamps are always increasing. Any timestamp that is less than the previous one is flagged. |
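The three checks above can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's actual implementation: the names `Issue` and `find_issues` are hypothetical, and two details are assumptions (each pair of records is reported under a single category, and the reported index is that of the later record in the pair).

```python
from __future__ import annotations

from typing import NamedTuple


class Issue(NamedTuple):
    kind: str   # "gap", "duplicate", or "out_of_order"
    index: int  # index of the later record in the offending pair (assumption)
    diff: int   # seconds between this record and the previous one


def find_issues(timestamps: list[int], expected_step: int) -> list[Issue]:
    """Compare each timestamp with its predecessor and classify mismatches."""
    issues = []
    for i in range(1, len(timestamps)):
        diff = timestamps[i] - timestamps[i - 1]
        if diff == 0:
            issues.append(Issue("duplicate", i, diff))
        elif diff < 0:
            issues.append(Issue("out_of_order", i, diff))
        elif diff != expected_step:
            issues.append(Issue("gap", i, diff))
    return issues


# Daily data with a skipped day, a repeated record, and a record that goes backwards.
sample = [1672531200, 1672617600, 1672790400, 1672790400, 1672704000]
for issue in find_issues(sample, expected_step=86_400):
    print(issue)
```

Running the sketch on `sample` reports a gap at index 2 (172,800 seconds instead of 86,400), a duplicate at index 3, and an out-of-order record at index 4.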
| 89 | + |
| 90 | +### Interpreting the Validation Output |
| 91 | + |
| 92 | +* **If the data is clean**, you will see an `[OK]` status: |
| 93 | + |
| 94 | + ```bash |
| 95 | + 🔍 Data Consistency Validation |
| 96 | + ----------------------------- |
| 97 | + |
| 98 | + [OK] Data appears consistent. |
| 99 | + ``` |
| 100 | + |
| 101 | +* **If problems are found**, you will see an `[ERROR]` status followed by a detailed list of every issue, including the index of the problematic record: |
| 102 | + |
| 103 | + ```bash |
| 104 | + 🔍 Data Consistency Validation |
| 105 | + ----------------------------- |
| 106 | + |
| 107 | + [ERROR] Found 2 issue(s). |
| 108 | + |
| 109 | + Gaps: |
| 110 | + ----- |
| 111 | + ! [WARNING] - At index 452: Diff: 172800s, Expected: 86400s |
| 112 | + |
| 113 | + Duplicates: |
| 114 | + ----------- |
| 115 | + ! [WARNING] - At index 788: Timestamp 1698883200 |
| 116 | + ``` |
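The `Diff` and `Expected` values in a gap warning indicate how many candles are missing. A small sketch, assuming the gap is a whole multiple of the timeframe; `missing_candles` is a hypothetical helper used only for illustration:

```python
def missing_candles(diff_seconds: int, expected_seconds: int) -> int:
    """Return the number of candles missing inside a reported gap.

    Assumes the gap is larger than one interval and is a whole multiple
    of the timeframe; anything else is rejected rather than guessed at.
    """
    if diff_seconds <= expected_seconds or diff_seconds % expected_seconds != 0:
        raise ValueError("diff is not a whole multiple larger than the expected interval")
    return diff_seconds // expected_seconds - 1


# The warning in the example: "Diff: 172800s, Expected: 86400s".
print(missing_candles(172_800, 86_400))  # 1, so a single daily candle is missing
```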
| 117 | + |
| 118 | +::: tip Best Practice |
| 119 | +Always run your downloaded data through the `--validate` check before using it for backtesting. Clean data is the bedrock of trustworthy results. If you find errors, it's best to re-download the data from the exchange or another source. |
| 120 | +::: |