This repository was archived by the owner on Oct 10, 2025. It is now read-only.
Merged
Changes from 1 commit
33 commits
8a33e42
feat: add local KVM/libvirt testing infrastructure with automated fixes
josecelano Jul 1, 2025
34750e1
refactor: reorganize repo into infrastructure and application components
josecelano Jul 1, 2025
832fc68
docs: update makefile validation checklist with comprehensive test re…
josecelano Jul 1, 2025
091029f
fix: resolve Docker Compose compatibility and firewall SSH blocking i…
josecelano Jul 2, 2025
a43d130
chore: [#10] remove obsolete MAKEFILE_TESTING_TODO.md file
josecelano Jul 3, 2025
70639c5
fix: [#10] remove undefined service dependencies from Docker Compose
josecelano Jul 3, 2025
9fbf7bd
fix: [#10] correct typo in install script message
josecelano Jul 3, 2025
a786666
docs: [#10] add explicit git permission requirements to AI Assistant …
josecelano Jul 3, 2025
b9e6606
feat: [#10] add Torrust Tracker dependencies for future source compil…
josecelano Jul 3, 2025
374d763
feat: [#10] enhance development workflow and SSH debugging
josecelano Jul 4, 2025
94d01d4
feat: update infrastructure to use Ubuntu 24.04 and fix cloud-init co…
josecelano Jul 4, 2025
d7d9ddf
feat: add VM console access commands and documentation
josecelano Jul 4, 2025
7f3251a
docs: enforce GPG commit signing requirement in copilot instructions
josecelano Jul 4, 2025
cf61dfd
docs: add preferred working methodology to copilot instructions
josecelano Jul 4, 2025
fdf1a95
fix: resolve YAML line length in user-data.yaml.tpl
josecelano Jul 4, 2025
5b70235
feat: [#10] implement comprehensive linting infrastructure
josecelano Jul 4, 2025
a4a5e5f
refactor: [#10] rename workflow from infrastructure to testing
josecelano Jul 4, 2025
3d5c1ee
refactor: [#10] simplify lint.sh to use tools' built-in file discovery
josecelano Jul 4, 2025
9dc6b00
docs: [#10] add mandatory linting requirement to copilot instructions
josecelano Jul 4, 2025
53b7591
docs: [#10] add nullglob to project dictionary
josecelano Jul 4, 2025
c292adb
fix: [#10] resolve SSH authentication failure in cloud-init configura…
josecelano Jul 4, 2025
b272f1b
docs: organize SSH bug documentation into structured archive
josecelano Jul 4, 2025
3a3746c
docs: [#10] add DHCP lease behavior explanation to libvirt setup guide
josecelano Jul 4, 2025
e4833aa
docs: [#10] update all Ubuntu version references from 22.04 to 24.04
josecelano Jul 7, 2025
ed1bcb0
fix: modernize cloud-init user password configuration
josecelano Jul 7, 2025
e5f29a2
security: disable password authentication by default
josecelano Jul 7, 2025
22ee5f3
docs: [#10] add twelve-factor app refactoring plan and guides
josecelano Jul 7, 2025
6203f29
feat: upgrade Docker installation to use official Docker repository
josecelano Jul 7, 2025
4c0edc0
feat: [#10] add Rust installation to cloud-init configuration
josecelano Jul 7, 2025
a2e0554
docs: [#10] add troubleshooting for VM IP detection issue
josecelano Jul 7, 2025
58c7294
docs: [#10] add ADR-002 documenting Docker for all services decision
josecelano Jul 7, 2025
75df631
refactor: comment out Rust dependencies for Docker-only deployment
josecelano Jul 7, 2025
8fac056
fix: [#10] add X-Forwarded-For header to nginx HTTP config
josecelano Jul 7, 2025
docs: organize SSH bug documentation into structured archive
- Create infrastructure/docs/bugs/ directory for systematic bug documentation
- Move SSH authentication failure documentation to 001-ssh-authentication-failure/
- Organize content into logical structure:
  - README.md: Bug overview and quick reference
  - SSH_BUG_ANALYSIS.md: Initial investigation and analysis
  - SSH_BUG_SUMMARY.md: Complete timeline and resolution
  - test-configs/: All 17 test configurations used during debugging

- Add comprehensive README.md for bugs directory explaining:
  - Purpose and scope of bug documentation archive
  - Directory structure and naming conventions
  - Content guidelines and quality standards
  - Usage examples for contributors and maintainers

- Fix markdown linting issues in all documentation files
- Add markdownlint disable for technical content with long lines

This establishes a systematic approach for documenting infrastructure bugs
with complete investigation trails, test artifacts, and lessons learned.
Future bugs can follow this template for consistent documentation quality.
josecelano committed Jul 4, 2025
commit b272f1b90cbd585db85a1206c0ddf2a3f8f48418
104 changes: 104 additions & 0 deletions infrastructure/docs/bugs/001-ssh-authentication-failure/README.md
@@ -0,0 +1,104 @@
# SSH Authentication Failure Bug - #001

**Date Resolved:** July 4, 2025
**Status:** ✅ Resolved
**Impact:** High - Blocked VM access completely
**Root Cause:** YAML document start marker (`---`) breaking cloud-init parsing

## Problem Summary

The full cloud-init configuration (`user-data.yaml.tpl`) for the Torrust Tracker
Demo VM caused both SSH key and password authentication to fail, preventing
users from accessing deployed VMs.

## Root Cause

The issue was caused by using the YAML document start marker (`---`) at the
beginning of the cloud-init configuration file instead of the required
`#cloud-config` header. This caused cloud-init to misprocess the entire
configuration, resulting in:

- Empty SSH authorized_keys (SSH key variable not templated)
- Broken password authentication setup
- Schema validation errors in cloud-init

## The Fix

**Simple but Critical Change:**

```yaml
# BEFORE (BROKEN):
---
# cloud-config

# AFTER (FIXED):
#cloud-config
```

**File Changed:** `infrastructure/cloud-init/user-data.yaml.tpl`
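
The header requirement can be checked mechanically before deploying. The following is a minimal sketch, not part of this repository: the helper name `has_cloud_config_header` and the sample strings are illustrative assumptions. It only inspects the first line of the user-data text.

```python
def has_cloud_config_header(user_data: str) -> bool:
    """Return True if the first line of the user-data text is '#cloud-config'."""
    # Only the first line matters; strip trailing whitespace such as '\r'.
    first_line = user_data.split("\n", 1)[0].rstrip()
    return first_line == "#cloud-config"


# Hypothetical sample inputs mirroring the BEFORE/AFTER snippet above.
broken = "---\n# cloud-config\nusers: []\n"
fixed = "#cloud-config\nusers: []\n"

print(has_cloud_config_header(broken))
print(has_cloud_config_header(fixed))
```

A check like this could run against the rendered template in CI, so a reintroduced `---` first line is caught before a VM is created.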

## Investigation Process

This bug was resolved through systematic incremental testing:

1. **Incremental Testing**: Created 15+ test configurations, adding features one by one
2. **Root Cause Isolation**: Compared working vs. broken configurations using diff analysis
3. **Hypothesis Formation**: Identified YAML header as the key difference
4. **Validation**: Deployed fresh VM with corrected header and confirmed fix
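
The comparison in step 2 can be reproduced with a unified diff between a working and a broken configuration. This is a minimal sketch using Python's standard `difflib`; the two configuration strings are invented for illustration and are not the actual test configurations from `test-configs/`.

```python
import difflib


def diff_configs(working: str, broken: str) -> list[str]:
    """Return unified-diff lines describing how `broken` differs from `working`."""
    return list(
        difflib.unified_diff(
            working.splitlines(),
            broken.splitlines(),
            fromfile="working",
            tofile="broken",
            lineterm="",
        )
    )


# Hypothetical configurations that differ only in the first line.
working = "#cloud-config\npackages:\n  - docker.io\n"
broken = "---\n#cloud-config\npackages:\n  - docker.io\n"

for line in diff_configs(working, broken):
    print(line)
```

When two configurations differ in many places, adding one feature per test configuration (step 1) keeps each diff small enough to read.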

## Validation Results

After applying the fix:

- ✅ SSH Key Authentication: Works perfectly
- ✅ Password Authentication: Works perfectly
- ✅ All Cloud-Init Features: Docker, UFW, packages, etc. - ALL WORKING
- ✅ Integration Tests: Complete test suite passes
- ✅ Make Commands: Standard workflow (`make init`, `make plan`, `make apply`) works

## Files in This Directory

### Core Documentation

- `SSH_BUG_ANALYSIS.md` - Initial analysis and hypothesis formation
- `SSH_BUG_SUMMARY.md` - Complete investigation summary with detailed timeline

### Test Artifacts

- `test-configs/` - All 16 test configurations used during incremental testing
  - `user-data-test-1.1.yaml.tpl` through `user-data-test-15.1.yaml.tpl`
  - `user-data-test-header.yaml.tpl` - Final test that confirmed the fix

### Validation

- `validation/` - (Currently empty, reserved for future validation scripts)

## Lessons Learned

1. **Cloud-init requires a specific header**: The first line must be `#cloud-config`; a leading `---` is not accepted in its place
2. **Incremental testing is powerful**: Systematic approach isolated the issue effectively
3. **Template variable validation**: Always verify that template variables are being substituted correctly
4. **Integration testing is crucial**: End-to-end testing revealed the full scope of the issue

## Prevention

To prevent similar issues:

- Always use `#cloud-config` as the first line in cloud-init files
- Test template variable substitution in terraform plans
- Run integration tests after any cloud-init configuration changes
- Use the documented make workflow for deployments
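
One way to test template variable substitution is to scan the rendered user-data for leftover `${...}` placeholders. This is a minimal sketch under stated assumptions: the helper name `unresolved_placeholders` and the variable name `ssh_public_key` are hypothetical, and the check will also flag legitimate shell-style `${VAR}` expansions inside embedded scripts, so treat matches as hints rather than errors.

```python
import re

# Matches Terraform-style interpolation placeholders such as ${name}.
PLACEHOLDER = re.compile(r"\$\{[^}]+\}")


def unresolved_placeholders(rendered: str) -> list[str]:
    """Return every ${...} placeholder still present in the rendered text."""
    return PLACEHOLDER.findall(rendered)


# Hypothetical rendered output in which the SSH key was never substituted.
rendered = "#cloud-config\nssh_authorized_keys:\n  - ${ssh_public_key}\n"

print(unresolved_placeholders(rendered))
```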

## Related Issues

This fix resolves SSH access problems that were preventing users from following
the integration testing guide and deploying the Torrust Tracker Demo
successfully.

## Technical Details

For complete technical details, debugging methodology, and step-by-step
investigation process, see:

- [SSH_BUG_ANALYSIS.md](SSH_BUG_ANALYSIS.md) - Initial investigation
- [SSH_BUG_SUMMARY.md](SSH_BUG_SUMMARY.md) - Comprehensive analysis with timeline
@@ -1,3 +1,5 @@
<!-- markdownlint-disable MD013 -->

# SSH Authentication Bug Analysis - Cloud-Init Configuration

## Problem Summary
@@ -1,3 +1,5 @@
<!-- markdownlint-disable MD013 -->

# SSH Authentication Bug Analysis Summary

**Date:** July 4, 2025
133 changes: 133 additions & 0 deletions infrastructure/docs/bugs/README.md
@@ -0,0 +1,133 @@
# Bug Documentation Archive

This directory contains comprehensive documentation for bugs that have been
investigated and resolved in the Torrust Tracker Demo infrastructure project.

## Purpose

The purpose of this archive is to:

- **Preserve Investigation Process**: Document the complete debugging methodology
and thought process used to identify and resolve infrastructure issues
- **Enable Knowledge Transfer**: Provide detailed reference material for future
contributors who encounter similar problems
- **Improve Debugging Skills**: Demonstrate systematic approaches to
infrastructure troubleshooting
- **Prevent Regression**: Maintain test cases and validation procedures to
ensure fixes remain effective

## Structure

Each bug is documented in its own numbered directory following this convention:

```text
infrastructure/docs/bugs/
├── README.md                           # This file
├── 001-ssh-authentication-failure/     # First documented bug
│   ├── README.md                       # Bug overview and summary
│   ├── SSH_BUG_ANALYSIS.md             # Initial analysis and hypothesis
│   ├── SSH_BUG_SUMMARY.md              # Complete investigation summary
│   ├── test-configs/                   # Test configurations used
│   │   ├── user-data-test-1.1.yaml.tpl
│   │   ├── user-data-test-2.1.yaml.tpl
│   │   └── ...
│   └── validation/                     # Final validation artifacts
└── 002-next-bug/                       # Future bug documentation
    └── ...
```

## Documentation Standards

When documenting a new bug, create a new numbered directory and include:

### Required Files

1. **README.md** - Bug overview with:

   - Problem description
   - Root cause summary
   - Fix applied
   - Validation results
   - References to related files

2. **Analysis Documentation** - Detailed investigation process:

   - Initial symptoms and error messages
   - Hypothesis formation and testing
   - Step-by-step debugging methodology
   - Dead ends and lessons learned

3. **Test Artifacts** - Evidence and test cases:
   - Configuration files used during testing
   - Test scripts and validation procedures
   - Before/after comparisons
   - Reproducible test cases

### Naming Conventions

- **Directories**: Use format `NNN-short-description` (e.g., `001-ssh-authentication-failure`)
- **Files**: Use descriptive names with consistent prefixes:
  - `ANALYSIS_` for investigation documentation
  - `SUMMARY_` for comprehensive overviews
  - `test-` for test configurations
  - `validation-` for final verification artifacts
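
The directory format can be validated with a short pattern check. This is a minimal sketch, not an existing project script: it assumes that `NNN` means exactly three digits and that the description is lowercase kebab-case, which matches the one example above but is not stated as a rule.

```python
import re

# Assumed reading of NNN-short-description: three digits, then lowercase
# words separated by single hyphens.
BUG_DIR = re.compile(r"\d{3}-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_bug_dir(name: str) -> bool:
    """Return True if `name` follows the assumed NNN-short-description format."""
    return BUG_DIR.fullmatch(name) is not None


print(is_valid_bug_dir("001-ssh-authentication-failure"))
print(is_valid_bug_dir("ssh-authentication-failure"))
```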

### Content Guidelines

- **Be Comprehensive**: Include all relevant information, even failed attempts
- **Document Process**: Explain the reasoning behind each debugging step
- **Include Context**: Provide enough background for newcomers to understand
- **Show Evidence**: Include relevant log outputs, error messages, and test results
- **Explain the Fix**: Detail exactly what was changed and why it works
- **Provide Validation**: Include steps to verify the fix and prevent regression

## Usage Examples

### For Contributors Encountering Similar Issues

1. **Search by Symptoms**: Look through bug directories for similar error messages
or behavior patterns
2. **Review Methodology**: Study the debugging approach used in similar cases
3. **Adapt Test Procedures**: Use existing test configurations as templates
4. **Apply Lessons Learned**: Benefit from documented pitfalls and solutions

### For Maintainers

1. **Validate Fixes**: Use documented test cases to ensure fixes remain effective
2. **Onboard New Contributors**: Point to relevant bug documentation for learning
3. **Improve Infrastructure**: Identify patterns in bugs to prevent future issues
4. **Review Process**: Use documented methodologies to improve debugging practices

## Quality Standards

All bug documentation should:

- ✅ Be reproducible by following the documented steps
- ✅ Include complete context and background information
- ✅ Demonstrate systematic debugging methodology
- ✅ Provide clear validation procedures
- ✅ Explain both what worked and what didn't work
- ✅ Include timing information and performance impacts
- ✅ Reference related infrastructure components

## Contributing

When adding new bug documentation:

1. **Create New Directory**: Use next available number with descriptive name
2. **Follow Standards**: Use the structure and naming conventions above
3. **Include All Artifacts**: Don't leave out "failed" attempts or test files
4. **Write for Others**: Assume the reader is unfamiliar with the specific issue
5. **Validate Documentation**: Ensure someone else can follow your steps
6. **Update This README**: Add any new patterns or insights to these guidelines
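
Picking the next available number in step 1 can be automated. This is a minimal sketch, assuming "next available" means one more than the highest existing three-digit prefix; the helper name `next_bug_id` is hypothetical and not part of this repository.

```python
import re


def next_bug_id(existing_entries: list[str]) -> str:
    """Return the next zero-padded bug number given existing directory entries."""
    numbers = [
        int(match.group(1))
        for entry in existing_entries
        if (match := re.match(r"(\d{3})-", entry))
    ]
    # Entries without a numeric prefix (for example README.md) are ignored.
    return f"{max(numbers, default=0) + 1:03d}"


print(next_bug_id(["README.md", "001-ssh-authentication-failure"]))
```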

## Index of Documented Bugs

| Bug ID | Description | Status | Impact | Date Resolved |
| ------ | -------------------------- | ----------- | ------------------------ | ------------- |
| 001 | SSH Authentication Failure | ✅ Resolved | High - Blocked VM access | 2025-07-04 |

---

_This archive serves as a knowledge base for infrastructure debugging and should
be maintained as a valuable resource for the Torrust community._
1 change: 1 addition & 0 deletions project-words.txt
@@ -28,6 +28,7 @@ logpath
mailcatcher
Makefiles
maxretry
misprocess
mkisofs
netdev
newgrp