Comparison - Compressing a high volume of grouped data into a database
This project provides a Python script that compares storing records in a standard SQLite database against storing them in an SQLite database whose records are compressed before insertion. It shows how compressing data before writing can yield significant storage savings and evaluates the trade-offs in write and read performance.
- Data Compression: Uses `zlib` to compress records grouped by date.
- Parallel Processing: Leverages `multiprocessing` to perform compression and decompression in parallel, utilizing multiple CPU cores to speed up the process (see the sketch after this list).
- Performance Comparison: Provides a clear summary table comparing storage size, write time, and read time between the normal and compressed database approaches.
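The grouped, compressed, parallel write path can be sketched roughly as follows. The table schema, record serialization, and function names here are illustrative assumptions, not the script's exact code:

```python
import pickle
import sqlite3
import zlib
from multiprocessing import Pool

def compress_group(item):
    """Compress one date group's records into a single blob (illustrative)."""
    group_date, records = item
    payload = pickle.dumps(records)             # serialize the group's rows
    return group_date, zlib.compress(payload)   # zlib-compress the serialized bytes

def write_compressed(db_path, grouped_records):
    """Write one compressed blob per date group; schema is an assumption."""
    # Compress groups on multiple cores. On platforms that spawn workers,
    # the caller should run this under an `if __name__ == "__main__":` guard.
    with Pool() as pool:
        blobs = pool.map(compress_group, grouped_records.items())

    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS records (group_date TEXT PRIMARY KEY, data BLOB)")
    con.executemany("INSERT OR REPLACE INTO records VALUES (?, ?)", blobs)
    con.commit()
    con.close()
```

Packing one blob per group amortizes the compression overhead across many similar rows, which is where the storage savings in the summary table come from.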
- Ensure you have Python 3 installed.
- Clone the repository.
- Run the script from your terminal: `python database_insert.py`
The script will:
- Clean up any old database files.
- Generate a configurable number of sample records.
- Write the records to a normal SQLite database (`example.db`).
- Group, compress, and write the records to a compressed SQLite database (`example-compressed.db`).
- Read the records from both databases to measure read performance.
- Verify that the data in both databases is identical.
- Print a summary table with the final performance metrics.
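The read and verification steps for the compressed database could look roughly like the following; the table and column names mirror the write sketch above and are assumptions, and `read_normal` is a hypothetical counterpart for the uncompressed database:

```python
import pickle
import sqlite3
import zlib

def read_compressed(db_path):
    """Decompress each group's blob and rebuild the flat list of records (illustrative)."""
    con = sqlite3.connect(db_path)
    rows = con.execute("SELECT group_date, data FROM records").fetchall()
    con.close()

    records = []
    for _group_date, blob in rows:
        records.extend(pickle.loads(zlib.decompress(blob)))  # reverse of the write path
    return records

# Verification can then be a straight comparison of both result sets, e.g.:
# assert sorted(read_normal("example.db")) == sorted(read_compressed("example-compressed.db"))
```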
Example Output:
--- Final Summary ---
Results for 10,000,000 records and 1,000 unique groups.
---------------------------------------------------------------------------------------------
| Metric | Normal | Compressed | Variation |
|---------------------------|-----------------|-----------------|---------------------------|
| Storage Size (KB) | 395184.00 | 103024.00 | 3.84x smaller (-73.93%) |
| Write Time (s) | 14.0467 | 9.3553 | -33.40% |
| Read Time (s) | 14.2523 | 7.1820 | -49.61% |
---------------------------------------------------------------------------------------------
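The Variation column can be reproduced directly from the Normal and Compressed figures; a quick check in Python using the numbers above:

```python
# Storage: size ratio and relative change
normal_kb, compressed_kb = 395184.00, 103024.00
print(f"{normal_kb / compressed_kb:.2f}x smaller ({(compressed_kb - normal_kb) / normal_kb:+.2%})")
# -> 3.84x smaller (-73.93%)

# Timings: relative change of compressed vs. normal
for name, normal, compressed in [("Write Time", 14.0467, 9.3553), ("Read Time", 14.2523, 7.1820)]:
    print(f"{name}: {(compressed - normal) / normal:+.2%}")
# -> Write Time: -33.40%
# -> Read Time: -49.61%
```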