Parallel File System

Project Overview

This is a simple proof-of-concept for a parallel file system interface. It takes a file, splits it into parts, stores those parts using multiple processes, and reassembles the file on request. It aims for basic concurrency and best-effort data integrity: corrupted parts are detected, not repaired. Three operations are implemented: put, get, and delete.

If you need to move big files fast, this is where you start.

⚙️ How It Works (The Bare Minimum)

We use Python's built-in tools to handle the heavy lifting. The core principle is "divide and conquer."

Key Mechanisms

File Partitioning: When you run put, the file gets split into fixed-size chunks (parts).
Parallel Processing: We use a multiprocessing pool (one worker per CPU core) to handle the parts concurrently. Each part is processed and stored independently.
Concurrency Control: We use a threading condition (memory_usage_lock) to enforce a Max Memory Limit, so loading parts into memory can't exhaust the machine. Threads wait on the condition until enough memory has been freed.
Data Integrity: Each file part gets a unique ID, and we track its MD5 hash during storage. If the hash doesn't match on retrieval, we know something went wrong, and we skip the bad part.
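
The four mechanisms above can be sketched together as follows. This is a minimal illustration, not the project's actual code: the names MemoryBudget, process_part, and put_stream are hypothetical, and the sketch assumes the part size never exceeds the memory limit. The main thread reserves memory before reading each part, hands the part to a worker process, and gets the memory back through the pool's callback once the worker is done.

```python
import hashlib
import io
import multiprocessing
import threading
import uuid


class MemoryBudget:
    """Hypothetical stand-in for the memory_usage_lock idea:
    callers block while the in-flight byte count would exceed the limit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.cond = threading.Condition()

    def acquire(self, size):
        if size > self.max_bytes:
            raise ValueError("part size exceeds the memory limit")
        with self.cond:
            # Sleep until this part fits under the limit.
            self.cond.wait_for(lambda: self.used + size <= self.max_bytes)
            self.used += size

    def release(self, size):
        with self.cond:
            self.used -= size
            self.cond.notify_all()


def process_part(chunk):
    """Runs in a worker process: assign a unique ID and compute the MD5 hash."""
    return str(uuid.uuid4()), hashlib.md5(chunk).hexdigest(), len(chunk)


def put_stream(stream, part_size, budget, pool):
    """Split a binary stream into fixed-size parts and process them in parallel."""
    pending = []
    while True:
        budget.acquire(part_size)  # reserve memory before loading the part
        chunk = stream.read(part_size)
        if not chunk:
            budget.release(part_size)
            break
        pending.append(pool.apply_async(
            process_part, (chunk,),
            # Callbacks run in the parent process, so they can free the budget.
            callback=lambda _result: budget.release(part_size),
            error_callback=lambda _exc: budget.release(part_size),
        ))
    return [result.get() for result in pending]


if __name__ == "__main__":
    budget = MemoryBudget(max_bytes=4096)
    # Pool() with no argument starts one worker per CPU core.
    with multiprocessing.Pool() as pool:
        parts = put_stream(io.BytesIO(b"x" * 10_000), 1024, budget, pool)
    for part_id, md5, size in parts:
        print(part_id, md5, size)
```

On retrieval, the same MD5 would be recomputed over the stored bytes and compared with the recorded value; a mismatch marks the part as bad.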

Registry

We use two simple registry classes (FileRegistry and PartRegistry) to track everything, since we can't just trust the file paths alone:

File Registry: Tracks the file name, its status, and the list of part IDs that belong to it.
Part Registry: Tracks the location, status, and MD5 hash for every individual file part.
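
A rough sketch of what the two registries might look like is shown here. Only the class names FileRegistry and PartRegistry come from the project; the record types, field names, status strings, sequential integer file IDs, and the part_is_intact helper are assumptions made for illustration.

```python
import hashlib
import threading
from dataclasses import dataclass, field


@dataclass
class PartRecord:
    location: str   # where the part's bytes are stored
    status: str     # e.g. "stored" (status values are assumed)
    md5: str        # hex digest recorded at storage time


@dataclass
class FileRecord:
    name: str
    status: str
    part_ids: list = field(default_factory=list)


class PartRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._parts = {}

    def add(self, part_id, record):
        with self._lock:
            self._parts[part_id] = record

    def get(self, part_id):
        with self._lock:
            return self._parts[part_id]


class FileRegistry:
    def __init__(self):
        self._lock = threading.Lock()
        self._files = {}
        self._next_id = 1  # assumed: sequential integer file IDs

    def add(self, name):
        with self._lock:
            file_id = self._next_id
            self._next_id += 1
            self._files[file_id] = FileRecord(name=name, status="storing")
            return file_id

    def get(self, file_id):
        with self._lock:
            return self._files[file_id]


def part_is_intact(record, data):
    """Compare the stored MD5 with a fresh hash of the retrieved bytes."""
    return hashlib.md5(data).hexdigest() == record.md5
```

The locks are there because put, get, and delete each run in their own thread, so several operations may touch a registry at once.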

🚀 Usage

You interact with the system via a basic command-line loop.

Commands

Command	Action	Notes
`put <file_path>`	Stores a file in the system. Runs in a new thread.	Breaks file into parts, processes them in parallel.
`get <file_id>`	Retrieves a file by its ID. Runs in a new thread.	Reconstructs the file from its parts in parallel.
`delete <file_id>`	Removes a file and all its parts. Runs in a new thread.	Deletes all associated parts in parallel.
`list`	Shows all files currently tracked by the system.	Prints ID, name, and current status.
`exit`	Shuts down the system and waits for all active operations to finish.	May block until in-flight operations complete.
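
The command loop could be organized roughly like this. It is a sketch under assumptions: the dispatch function and the handlers mapping are hypothetical names, and the real program may parse input differently. Each put, get, or delete starts a new thread, list runs inline, and exit joins every thread that was started.

```python
import threading


def dispatch(line, handlers, active):
    """Handle one command line. Returns False when the loop should stop."""
    tokens = line.strip().split(maxsplit=1)
    if not tokens:
        return True
    name = tokens[0]
    arg = tokens[1] if len(tokens) > 1 else None

    if name == "exit":
        # Wait for every operation that is still running.
        for thread in active:
            thread.join()
        return False

    handler = handlers.get(name)
    if handler is None:
        print(f"unknown command: {name}")
        return True

    if name == "list":
        handler()
        return True

    if arg is None:
        print(f"usage: {name} <argument>")
        return True

    # put, get, and delete each run in their own thread.
    thread = threading.Thread(target=handler, args=(arg,))
    thread.start()
    active.append(thread)
    return True


# An interactive loop would look something like:
#     active = []
#     while dispatch(input("> "), handlers, active):
#         pass
```

Here handlers would map "put", "get", "delete", and "list" to the functions that do the actual work.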

Example

put /path/to/my/big/data.file
list
get 1

📦 Setup & Dependencies

It's all Python. The imports fall into two groups: standard library modules, which ship with Python, and project-local modules (process, storage.*, and loader), which must be present as files alongside the main script.

os, threading, multiprocessing (Standard Python)

zlib, hashlib, uuid (Standard Python)

Local modules: process.py, storage/fileregistry.py, storage/partregistry.py, loader.py

This project is not production-ready; it's just a sandbox to understand the parallel processing concepts. Don't use it for anything important.

Vujavujavuja/Parallel_file_reg_python
