This project implements a distributed file system with RAID 5 fault tolerance. The system provides data protection through distributed parity, allowing the file system to continue operating even when one server fails. This is a Computer System Design project that demonstrates RAID 5 concepts in practice.
- RAID 5 Distributed Parity: Parity blocks rotate across all servers for fault tolerance
- Fault Tolerance: Can survive single server failure with automatic data recovery
- Data Recovery: Automatic reconstruction of data using parity and remaining servers
- Consistency Verification: Built-in tools to verify RAID 5 integrity
- Interactive Shell: Command-line interface for file operations and RAID management
- Multi-Server Architecture: Distributed storage across multiple block servers
- Distributed Parity: Parity blocks rotate across servers for each stripe
- Stripe Distribution: For N servers, each stripe contains (N-1) data blocks + 1 parity block
- Fault Tolerance: Can handle single server failure with automatic recovery
- Space Efficiency: 25% storage overhead for 4-server configuration
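The space-efficiency figure follows directly from the stripe layout: one block in every stripe of N holds parity, so the parity fraction is 1/N. A minimal sketch of that arithmetic:

```python
def raid5_overhead(num_servers):
    """Fraction of raw capacity spent on parity: one parity block
    per stripe of num_servers blocks, i.e. 1/N."""
    return 1.0 / num_servers

print(raid5_overhead(4))   # 0.25 -> 25% overhead for the default 4-server setup
```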
- Block Servers: Individual storage servers (default: 4 servers)
- File System: Main file system with RAID 5 layer
- Interactive Shell: Command-line interface for operations
- RAID Controller: Handles parity calculation and data distribution
- Python 3.6 or higher
- Network connectivity between servers (localhost for testing)
- Clone the repository:
  git clone <your-repository-url>
  cd project
- Verify that all files are present:
  ls -la
  You should see block.py, blockserver.py, fsmain.py, fsconfig.py, and shell.py.
Open 4 separate terminal windows and run these commands:
Terminal 1 (Server 0):
python blockserver.py -nb 256 -bs 128 -port 8000
Terminal 2 (Server 1):
python blockserver.py -nb 256 -bs 128 -port 8001
Terminal 3 (Server 2):
python blockserver.py -nb 256 -bs 128 -port 8002
Terminal 4 (Server 3):
python blockserver.py -nb 256 -bs 128 -port 8003
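If juggling four terminals is inconvenient, the servers can also be started from a single helper script. The sketch below is a hypothetical convenience launcher (not part of the project) and assumes blockserver.py accepts exactly the flags shown above.

```python
# launch_servers.py -- hypothetical helper script, not included in the project.
# Starts 4 block servers on ports 8000-8003 with the flags used above.
import subprocess
import sys

NUM_SERVERS = 4
procs = []
for i in range(NUM_SERVERS):
    port = 8000 + i
    procs.append(subprocess.Popen(
        [sys.executable, "blockserver.py",
         "-nb", "256", "-bs", "128", "-port", str(port)]))
    print("Started server %d on port %d" % (i, port))

try:
    for p in procs:
        p.wait()              # keep the launcher alive until the servers exit
except KeyboardInterrupt:
    for p in procs:
        p.terminate()         # Ctrl+C stops all four servers
```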
Open a 5th terminal and run:
python fsmain.py -nb 256 -bs 128 -ni 16 -is 16 -cid 0 -port 8000 -startport 8000 -ns 4
You'll see a prompt like: [cwd=/]%
Try these commands:
create myfile
append myfile "Hello RAID 5!"
cat myfile
ls
- `create <filename>`: Create a new file
- `append <filename> <content>`: Append content to a file
- `cat <filename>`: Display file contents
- `ls`: List files in the current directory
- `cd <directory>`: Change directory
- `mkdir <directory>`: Create a directory
- `rm <filename>`: Remove a file
- `verify <block_number>`: Verify RAID 5 consistency for a block
- `repair <server_id>`: Repair a failed server using parity reconstruction
- `showblock <block_number>`: Display block contents
- `showfsconfig`: Show the file system configuration
- `save <filename>`: Save the file system state
- `load <filename>`: Load a file system state
- `exit`: Exit the file system
- Create and write to a file:
  create testfile
  append testfile "Testing RAID 5 functionality"
  cat testfile
- Verify RAID 5 consistency:
  verify 0
  verify 1
- Create a file with data
- Stop one of the block servers (Ctrl+C in its terminal)
- Try to read the file - it should still work using parity recovery
- Restart the stopped server
- Use `repair <server_id>` to reconstruct any missing data
Run the test suite:
python test_raid5.py
- `-nb <number>`: Total number of blocks (default: 256)
- `-bs <number>`: Block size in bytes (default: 128)
- `-ni <number>`: Maximum number of inodes (default: 16)
- `-is <number>`: Inode size in bytes (default: 16)
- `-cid <number>`: Client ID (default: 0)
- `-port <number>`: Main port (default: 8000)
- `-startport <number>`: Starting port for the block servers (default: 8000)
- `-ns <number>`: Number of servers (default: 4)
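The actual option handling lives in fsconfig.py and fsmain.py; the snippet below is only a sketch of how these flags could be parsed with argparse, using the defaults listed above (the attribute names are assumptions, not the project's real variable names).

```python
import argparse

parser = argparse.ArgumentParser(description="RAID 5 file system options (sketch)")
parser.add_argument("-nb", type=int, default=256, help="total number of blocks")
parser.add_argument("-bs", type=int, default=128, help="block size in bytes")
parser.add_argument("-ni", type=int, default=16, help="maximum number of inodes")
parser.add_argument("-is", dest="inode_size", type=int, default=16,
                    help="inode size in bytes")   # dest renamed: 'is' is a Python keyword
parser.add_argument("-cid", type=int, default=0, help="client ID")
parser.add_argument("-port", type=int, default=8000, help="main port")
parser.add_argument("-startport", type=int, default=8000,
                    help="starting port for block servers")
parser.add_argument("-ns", type=int, default=4, help="number of servers")

args = parser.parse_args()
print(args.nb, args.bs, args.ni, args.inode_size,
      args.cid, args.port, args.startport, args.ns)
```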
- Data Distribution: Data blocks are distributed across (N-1) servers
- Parity Calculation: Parity is calculated using XOR operations
- Rotating Parity: Parity location rotates across servers for each stripe
- Fault Tolerance: Can recover data if one server fails
Stripe 0: Data[Server0, Server1, Server2] Parity[Server3]
Stripe 1: Data[Server0, Server1, Server3] Parity[Server2]
Stripe 2: Data[Server0, Server2, Server3] Parity[Server1]
Stripe 3: Data[Server1, Server2, Server3] Parity[Server0]
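The rotation above reduces to simple modular arithmetic plus an XOR over the data blocks. The sketch below reproduces the layout shown here; it is illustrative only, and the project's actual mapping in block.py may differ in detail.

```python
from functools import reduce

NUM_SERVERS = 4
BLOCK_SIZE = 128

def parity_server(stripe, num_servers=NUM_SERVERS):
    """Which server holds the parity block of a stripe.
    Matches the layout above: stripe 0 -> server 3, stripe 1 -> server 2, ..."""
    return (num_servers - 1 - stripe) % num_servers

def compute_parity(data_blocks):
    """Parity block = bytewise XOR of the (N-1) data blocks in the stripe."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), data_blocks)

# Example: a stripe of three 128-byte data blocks and its parity block.
data = [bytes([i]) * BLOCK_SIZE for i in range(1, 4)]
parity = compute_parity(data)
print([parity_server(s) for s in range(4)])   # [3, 2, 1, 0]
```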
When a server fails:
- System detects the failure during read/write
- Uses parity data from the parity server
- XORs with data from remaining servers
- Reconstructs the missing data
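Because the parity is the XOR of the data blocks, the missing block is simply the XOR of the parity block with the surviving data blocks. A minimal sketch of that reconstruction step (illustrative; the project's real recovery path is in block.py):

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

def reconstruct_missing(surviving_data, parity):
    """Rebuild the block stored on the failed server by XOR-ing the
    parity block with the data blocks read from the remaining servers."""
    return xor_blocks(surviving_data + [parity])

# Example: stripe with data blocks d0, d1, d2 and parity p = d0 ^ d1 ^ d2.
d0, d1, d2 = b"\x01" * 4, b"\x02" * 4, b"\x04" * 4
p = xor_blocks([d0, d1, d2])
assert reconstruct_missing([d0, d2], p) == d1   # d1 recovered after its server fails
```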
"SERVER_DISCONNECTED" messages
- Ensure all 4 block servers are running
- Check that ports 8000-8003 are available
- Verify no firewall blocking connections
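A quick way to confirm the servers are reachable is a short socket probe; this is just a diagnostic sketch and is independent of the project's own code.

```python
import socket

# Probe the default block-server ports on localhost.
for port in range(8000, 8004):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        state = "open" if s.connect_ex(("localhost", port)) == 0 else "unreachable"
        print("port %d: %s" % (port, state))
```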
"CORRUPTED_BLOCK" messages
- Run `verify <block_number>` to check consistency
- Use `repair <server_id>` to reconstruct the data
- Check the server logs for corruption events
File system won't start
- Make sure all block servers are running first
- Check command line arguments
- Verify Python version (3.6+)
Enable detailed logging by modifying fsmain.py:
logging.basicConfig(filename='memoryfs.log', filemode='w', level=logging.DEBUG)
- Write Performance: Roughly 4x slower than a single server, due to the extra reads and writes needed to keep parity up to date (see the sketch below)
- Read Performance: Same as single server (normal case)
- Recovery Performance: Slower during server failure (requires multiple reads)
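The write penalty comes from keeping parity consistent. One standard way to do this without reading the whole stripe is the read-modify-write identity new_parity = old_parity XOR old_data XOR new_data, which costs about four block I/Os per small write; whether block.py uses this shortcut or recomputes parity from the full stripe is an implementation detail. A sketch of the identity:

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def updated_parity(old_parity, old_data, new_data):
    """Read-modify-write parity update:
    new_parity = old_parity XOR old_data XOR new_data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

# Worked example with 4-byte blocks d0, d1, d2 and parity = d0 ^ d1 ^ d2.
d0, d1, d2 = b"\x01" * 4, b"\x02" * 4, b"\x04" * 4
parity = xor_bytes(xor_bytes(d0, d1), d2)
new_d1 = b"\x08" * 4
assert updated_parity(parity, d1, new_d1) == xor_bytes(xor_bytes(d0, new_d1), d2)
```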
project/
├── block.py # RAID 5 implementation and block layer
├── blockserver.py # Individual block storage servers
├── fsmain.py # Main file system entry point
├── fsconfig.py # Configuration and constants
├── shell.py # Interactive command shell
├── test_raid5.py # Automated test suite
├── RAID5_README.md # Detailed RAID 5 documentation
└── README.md # This file
This project demonstrates:
- RAID 5 Concepts: Distributed parity, fault tolerance, data recovery
- Distributed Systems: Multi-server architecture, network communication
- File System Design: Block storage, inodes, directory structure
- Error Handling: Graceful failure recovery, consistency checking
- System Programming: Low-level storage operations, XOR calculations
This is an educational project. Feel free to:
- Report bugs or issues
- Suggest improvements
- Add new features
- Improve documentation
This project is for educational purposes as part of a Computer System Design course.
- Based on file system design principles
- Implements RAID 5 fault tolerance concepts
- Educational project for learning distributed systems
Note: This is a simplified implementation for educational purposes. Production RAID systems have additional features like hot spares, multiple failure handling, and advanced error correction.