Skip to content

💾 Implementation of heap file storage, extendible hashing index, and external merge-sort for DB systems.

License

Notifications You must be signed in to change notification settings

VassTs/heapfile-hashing-mergesort

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Heap File, Extendible Hashing & External Merge-Sort Implementation

Overview

This project implements three core database components: heap file storage for sequential data management, extendible hashing for dynamic indexing, and an external merge‑sort algorithm specifically designed for heap files.

Repository Information

This is a three-part project for the “K18: Database Systems Implementation” course of the Department of Informatics & Telecommunications of the National and Kapodistrian University of Athens. Each assignment’s statement is in the assignment-instructions/ folder.

Further assumptions and implementation details are collected in the per-assignment READMEs under detailed-readmes/.

Example usages of the implemented functions can be found in the examples/ directory.

Detailed Description

We note that all functions are implemented on top of a block-management layer provided by the bf.h library. Both file types store records of identical structure (struct Record).

Heap File Management (HP_* functions)

The heap file consists of BF_Blocks (bf.h). Each BF_Block contains user records followed by a struct HP_block_info. The first BF_Block is special: it contains only the file header (struct HP_info) without records.

Implemented functions:

  • HP_CreateFile: Creates a heap file with the specified name and initializes the first block with metadata (struct HP_info).
  • HP_OpenFile: Opens the specified file and pins its first block in memory.
  • HP_CloseFile: Closes the file and unpins its first block from memory.
  • HP_InsertEntry: Inserts a record into the last non-full block, creating a new block when necessary.
  • HP_GetAllEntries: Retrieves all records matching a given ID by scanning blocks sequentially.

Example usage:

make hp && ./build/hp_main

Extendible Hashing (HT_* functions)

The hash file consists of BF_Blocks. The first BF_Block contains metadata (struct HT_info). Subsequent blocks store either records (these blocks are called buckets) or index keys (these blocks are called index blocks). Each bucket ends with a struct HT_block_info.

Implemented functions:

  • HT_Init: Initializes necessary data structures.
  • HT_CreateIndex: Creates an empty hash file with metadata (struct HT_info).
  • HT_OpenIndex: Opens the specified hash file.
  • HT_CloseIndex: Closes the hash file.
  • HT_InsertEntry: Inserts records using hashing, handling bucket splits and index doubling when full.
  • HT_PrintAllEntries: Prints all records matching a given ID.
  • HashStatistics: Outputs file statistics.

Example usage:

make ht && ./build/runner

External Merge Sort on Heap Files

Implements the external merge-sort algorithm using ChatGPT 3.5 for assistance. The algorithm consists of:

  1. Sorting pass: In-memory sorting of chunks
  2. Merge passes: Combining sorted runs

Key components:

  • Sorting stage (prefix sort_)
  • Merging stage (prefix merge_)
  • Run handling (prefix CHUNK_)

Implemented functions:

  • shouldSwap: Determines record swap order during sorting.
  • sort_FileInChunks: Sorts file contents in fixed-size chunks.
  • sort_Chunk: Sorts records within a chunk (by name/surname).
  • merge: Merges chunks into sorted runs.
  • CHUNK_CreateIterator: Creates chunk iterator.
  • CHUNK_GetNext: Retrieves next chunk.
  • CHUNK_GetIthRecordInChunk: Gets specific record from chunk.
  • CHUNK_UpdateIthRecord: Updates records in chunks.

Example usage:

make sort && ./build/sort_main

Authors

Chara Marinaki
Vasiliki Tsantila

License

This project is licensed under the MIT License. See LICENSE for details.

About

💾 Implementation of heap file storage, extendible hashing index, and external merge-sort for DB systems.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published