201 changes: 201 additions & 0 deletions .claude/plans/cloudwatch-logging-standardization.md
@@ -0,0 +1,201 @@
# CloudWatch Logging Standardization Plan

## Task Tracker

| Phase | Task | Status |
|-------|------|-----------|
| 1 | Fix logrotate ACL persistence | ✅ Done |
| 2 | Remove btmp/wtmp from jumphost config | ✅ Done |
| 2 | Remove utmp group from cwagent user | ✅ Done |
| 2 | Remove duplicate DiskSpaceUsed metric | ✅ Done |
| 2 | Propagate Phase 2 to sandbox environment | ✅ Done |
| 2 | Propagate Phase 2 to development environment | ✅ Done |
| 3 | Create shared base class `profile::cloudwatch_agent` | ⬜ Pending |
| 3 | Create shared ACL scripts | ⬜ Pending |
| 4 | Upgrade OpenVPN CloudWatch manifest | ⬜ Pending |
| 4 | Upgrade OpenVPN CloudWatch template | ⬜ Pending |
| 5 | Update jumphost to use shared base | ⬜ Pending |
| 5 | Delete old jumphost ACL scripts | ⬜ Pending |
| 6 | Create OpenVPN auditd profile | ⬜ Pending |
| 6 | Include auditd in openvpn_server.pp | ⬜ Pending |
| - | Run puppet-lint validation | ⬜ Pending |

## Overview

Standardize CloudWatch logging across EC2 services (jumphost, openvpn_server) using consistent, secure
patterns that are compliant with SOC 2 and ISO 27001.

## Issues to Fix


### 1. Audit Log Permission Denied (Critical) ✅ FIXED
- **Problem**: Logrotate creates the new `audit.log` with `create 0640 root root`, which carries no ACLs
- **Impact**: The CloudWatch agent loses read access after every log rotation
- **Fix**: Reapply the ACLs in a `postrotate` script in the logrotate config
- **Status**: Fixed in commit `43fc19a` (PR #215)

### 2. Binary Log Files (btmp/wtmp)
- **Problem**: `/var/log/btmp` and `/var/log/wtmp` are binary files that the CloudWatch agent can't parse
- **Impact**: The `auth/successful-logins` stream is missing and `auth/failed-logins` contains garbled data
- **Fix**: Remove both files from the CloudWatch config and remove the `utmp` group from the cwagent user
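
To see why these two files don't fit a text log pipeline, a quick check along these lines may help. It is only a sketch: `is_text_log` is a hypothetical helper, not something in this repository, and it relies on GNU `grep -I`, which treats files containing binary data as non-matching.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): succeed only if the
# file is readable and looks like plain text.
# grep -I treats binary files as if they contained no matches; -q is quiet.
# Note: an empty file also fails, because there is nothing to match.
is_text_log() {
    local path="$1"
    [ -r "$path" ] && grep -Iq . "$path"
}
```

On a host, `is_text_log /var/log/auth.log` would be expected to succeed, while `is_text_log /var/log/wtmp` would be expected to fail. The binary login records remain readable with `last -f /var/log/wtmp` and `sudo lastb`.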

### 3. Inconsistent Implementations
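- **Problem**: OpenVPN's CloudWatch setup is weaker than the jumphost's: it has no `cwagent` user, a `0644` config file, no audit log ACLs, and no health check (see Phase 4)
- **Fix**: Upgrade OpenVPN to match the jumphost pattern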

## Implementation Plan

### Phase 1: Fix Logrotate ACL Persistence

**File**: `modules/profile/templates/auditd/logrotate.erb`

Add postrotate ACL reapplication:
```erb
    postrotate
        /usr/sbin/service auditd rotate
        # Reapply ACLs for CloudWatch agent access
        if [ -x /usr/local/bin/set-audit-acl ]; then
            /usr/local/bin/set-audit-acl
        fi
    endscript
```

### Phase 2: Remove Binary Log Files

**File**: `modules/profile/templates/jumphost/amazon-cloudwatch-agent.json.erb`

Remove entries for:
- `/var/log/btmp` (failed-logins)
- `/var/log/wtmp` (successful-logins)

**File**: `modules/profile/manifests/jumphost/cloudwatch_agent.pp`

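Change cwagent groups from `['adm', 'utmp']` to `['adm']`. Because the `user` resource sets `membership => minimum`, Puppet only adds missing groups; on hosts that were already provisioned, `cwagent` stays in `utmp` until it is removed manually (for example with `gpasswd -d cwagent utmp`) or `membership` is switched to `inclusive`.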

### Phase 3: Create Shared Base Class

**New File**: `modules/profile/manifests/cloudwatch_agent.pp`

Shared resources:
- Package `acl`
- Script `/usr/local/bin/set-audit-acl`
- Script `/usr/local/bin/check-audit-acl`

**New Files**:
- `modules/profile/templates/cloudwatch_agent/set-audit-acl.sh.erb`
- `modules/profile/templates/cloudwatch_agent/check-audit-acl.sh.erb`
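
The plan does not show the contents of the two shared scripts. A minimal sketch of what they might do, assuming the `acl` package's `setfacl`/`getfacl` tools and the `cwagent` user, is given here. The function names and the `AUDIT_LOG`/`CW_USER` variables are illustrative assumptions, not the actual templates.

```bash
#!/usr/bin/env bash
# Sketch only: the real set-audit-acl / check-audit-acl templates are not
# shown in this plan. Names and variables below are assumptions.
AUDIT_LOG="${AUDIT_LOG:-/var/log/audit/audit.log}"
CW_USER="${CW_USER:-cwagent}"

# Grant the agent traverse access to the directory and read access to the log.
set_audit_acl() {
    setfacl -m "u:${CW_USER}:rx" "$(dirname "$AUDIT_LOG")"
    setfacl -m "u:${CW_USER}:r" "$AUDIT_LOG"
}

# Read getfacl output on stdin; succeed if the named user has a read entry.
acl_grants_read() {
    grep -Eq "^user:${1}:r"
}

# Succeed if the audit log currently carries a read ACL for the agent user.
check_audit_acl() {
    getfacl -c "$AUDIT_LOG" 2>/dev/null | acl_grants_read "$CW_USER"
}
```

Keeping the parsing in `acl_grants_read` separate from the `getfacl` call makes the check easy to exercise with canned ACL listings, without root access or a real audit log.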

### Phase 4: Standardize OpenVPN CloudWatch Agent

**File**: `modules/profile/manifests/openvpn_server/cloudwatch_agent.pp`

Changes:
1. Include `profile::cloudwatch_agent` base class
2. Add cwagent user with `['adm']` group
3. Fix config file permissions: `0644` -> `0640`
4. Add ACL exec for audit log access
5. Add health check cron (every 5 min)
6. Add monitoring script `/usr/local/bin/check-cloudwatch-agent`
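
The monitoring script itself is not part of this plan. One possible shape, sketched under the assumption that `amazon-cloudwatch-agent-ctl -a status` prints JSON containing a `"status"` field, is shown here. The function names are hypothetical, and the exact status output should be confirmed on a real host.

```bash
#!/usr/bin/env bash
# Sketch of a possible /usr/local/bin/check-cloudwatch-agent; the real
# script is not shown in this plan. The status JSON shape is an assumption.
CTL=/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl

# Read the agent status JSON on stdin; succeed if it reports "running".
status_is_running() {
    grep -Eq '"status"[[:space:]]*:[[:space:]]*"running"'
}

# Print a one-line result and return non-zero when the agent is not running.
check_cloudwatch_agent() {
    if "$CTL" -a status 2>/dev/null | status_is_running; then
        echo "OK: amazon-cloudwatch-agent is running"
    else
        echo "CRITICAL: amazon-cloudwatch-agent is not running" >&2
        return 1
    fi
}
```

For the five-minute schedule, an `/etc/cron.d` entry such as `*/5 * * * * root /usr/local/bin/check-cloudwatch-agent` would match the plan; in Puppet this would more likely be expressed as a `cron` resource.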

**File**: `modules/profile/templates/openvpn_server/amazon-cloudwatch-agent.json.erb`

Changes:
1. Add agent section (`run_as_user: cwagent`, `buffer_time: 10000`)
2. Add `timezone: UTC` to all log entries
3. Standardize log stream naming (hierarchical: `{instance_id}/category/type`)
4. Add audit.log collection
5. Add dpkg.log for package tracking
6. Add metrics section (CPU, disk, memory, swap, procstat for openvpn)
7. Add dimensions (Hostname, Environment)
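
To make the naming convention concrete, the sketch below prints one log entry in the hierarchical `{instance_id}/category/type` form with `timezone` set to UTC. `log_entry` is a hypothetical helper for illustration only, and the category and type used for `audit.log` in the usage note are assumptions; the plan does not fix them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one CloudWatch agent log entry as JSON.
# {instance_id} is left literal; the agent expands it at runtime.
# Arguments: file path, log group name, stream category, stream type.
log_entry() {
    local path="$1" group="$2" category="$3" type="$4"
    printf '{"file_path": "%s", "log_group_name": "%s", "log_stream_name": "{instance_id}/%s/%s", "timezone": "UTC"}\n' \
        "$path" "$group" "$category" "$type"
}
```

For example, `log_entry /var/log/audit/audit.log my-log-group security audit` yields an entry whose stream name is `{instance_id}/security/audit`, following the same pattern as the existing `{instance_id}/security/fail2ban` stream.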

### Phase 5: Update Jumphost to Use Shared Base

**File**: `modules/profile/manifests/jumphost/cloudwatch_agent.pp`

Changes:
1. Include `profile::cloudwatch_agent` base class
2. Remove duplicate ACL package/script resources
3. Update ACL exec to depend on shared class

**Files to Delete**:
- `modules/profile/templates/jumphost/set-audit-acl.sh.erb`
- `modules/profile/templates/jumphost/check-audit-acl.sh.erb`

### Phase 6: Add OpenVPN Auditd Profile

**New File**: `modules/profile/manifests/openvpn_server/auditd.pp`

Include base `profile::auditd` and optionally add OpenVPN-specific rules.

**File**: `modules/profile/manifests/openvpn_server.pp`

Add: `include 'profile::openvpn_server::auditd'`

## File Summary

### Create:
| File | Purpose |
|------|---------|
| `modules/profile/manifests/cloudwatch_agent.pp` | Shared base class |
| `modules/profile/templates/cloudwatch_agent/set-audit-acl.sh.erb` | Shared ACL script |
| `modules/profile/templates/cloudwatch_agent/check-audit-acl.sh.erb` | Shared ACL check |
| `modules/profile/manifests/openvpn_server/auditd.pp` | OpenVPN auditd config |

### Modify:
| File | Changes |
|------|---------|
| `modules/profile/templates/auditd/logrotate.erb` | ✅ Already done (PR #215) |
| `modules/profile/templates/jumphost/amazon-cloudwatch-agent.json.erb` | Remove btmp/wtmp |
| `modules/profile/manifests/jumphost/cloudwatch_agent.pp` | Use shared base, remove utmp |
| `modules/profile/manifests/openvpn_server/cloudwatch_agent.pp` | Full upgrade |
| `modules/profile/templates/openvpn_server/amazon-cloudwatch-agent.json.erb` | Add agent/metrics/timezone |
| `modules/profile/manifests/openvpn_server.pp` | Include auditd |

### Delete:
| File | Reason |
|------|---------|
| `modules/profile/templates/jumphost/set-audit-acl.sh.erb` | Moved to shared |
| `modules/profile/templates/jumphost/check-audit-acl.sh.erb` | Moved to shared |

## Environment Propagation

After changes to `modules/profile/`:
1. Copy to `environments/sandbox/modules/profile/`
2. Copy to `environments/development/modules/profile/`
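
The two copy steps could be scripted roughly as follows. This is a sketch that assumes it runs against the repository root; `propagate_profile` is a hypothetical name. Plain `cp` never deletes anything, so files removed from `modules/profile/` (such as the Phase 5 deletions) still have to be removed from each environment by hand.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy modules/profile into the sandbox and
# development environments. Does not delete files that no longer exist
# in the source tree.
propagate_profile() {
    local repo_root="${1:-.}"
    local src="${repo_root}/modules/profile"
    local env dest
    for env in sandbox development; do
        dest="${repo_root}/environments/${env}/modules/profile"
        mkdir -p "$dest"
        # The trailing "/." copies the directory contents, not the directory itself.
        cp -a "${src}/." "$dest/"
    done
}
```

Running `propagate_profile .` from the repository root and then reviewing `git status` shows exactly which environment files changed.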

## Testing

```bash
# Puppet lint
puppet-lint --fail-on-warnings modules/profile

# Verify ACL persistence
sudo logrotate -f /etc/logrotate.d/audit
getfacl /var/log/audit/audit.log
sudo -u cwagent head /var/log/audit/audit.log

# Verify CloudWatch agent
/usr/local/bin/check-cloudwatch-agent
```

## Namespace Defaults

If Terraform doesn't provide the `cloudwatch_namespace` fact, Puppet falls back to a default:

**In manifest** (`openvpn_server/cloudwatch_agent.pp`):
```puppet
$cloudwatch_namespace = pick($facts['openvpn']['cloudwatch_namespace'], 'InfraHouse/OpenVPN')
```

**In manifest** (`jumphost/cloudwatch_agent.pp`):
```puppet
$cloudwatch_namespace = pick($facts['jumphost']['cloudwatch_namespace'], 'InfraHouse/Jumphost')
```

This way metrics are always published under a valid namespace, and Terraform can still override the default.

## Rollout

1. **Sandbox** - Deploy, test for 48-72 hours
2. **Development** - Deploy, test for 48-72 hours
3. **Production** - Deploy during maintenance window
28 changes: 14 additions & 14 deletions .claude/plans/compliance-logging-rollout.md
@@ -18,24 +18,24 @@
- [x] Create `jumphost/cloudwatch_agent.pp`
- [x] Create CloudWatch agent config template
- [x] Fix CloudWatch ACL permissions for audit logs
-- [ ] Deploy to dev jumphost
-- [ ] Validate audit rules: `sudo auditctl -l`
-- [ ] Test SSH session logging
-- [ ] Verify logs in `/var/log/audit/audit.log`
-- [ ] **Release**: `puppet-jumphost-auditd-dev-v1.0.0`
+- [x] Deploy to dev jumphost
+- [x] Validate audit rules: `sudo auditctl -l`
+- [x] Test SSH session logging
+- [x] Verify logs in `/var/log/audit/audit.log`
+- [x] **Release**: `puppet-jumphost-auditd-dev-v1.0.0`

#### 1.2 Sandbox Environment
-- [ ] Copy configuration to sandbox
-- [ ] Deploy to sandbox jumphost
-- [ ] Run compliance validation
-- [ ] Performance impact assessment
-- [ ] **Release**: `puppet-jumphost-auditd-sandbox-v1.0.0`
+- [x] Copy configuration to sandbox
+- [x] Deploy to sandbox jumphost
+- [x] Run compliance validation
+- [x] Performance impact assessment
+- [x] **Release**: `puppet-jumphost-auditd-sandbox-v1.0.0`

#### 1.3 Global Modules (Production)
-- [ ] Move to global modules
-- [ ] Remove environment-specific versions
-- [ ] Deploy to production jumphost
-- [ ] **Release**: `puppet-jumphost-auditd-prod-v1.0.0`
+- [x] Move to global modules
+- [x] Remove environment-specific versions
+- [x] Deploy to production jumphost
+- [x] **Release**: `puppet-jumphost-auditd-prod-v1.0.0`

### Phase 2: Terraformer
#### 2.1 Development Environment
14 changes: 13 additions & 1 deletion debian/changelog
@@ -1,3 +1,15 @@
+puppet-code (0.1.0-1build261) noble; urgency=medium
+
+* commit event. see changes history in git log
+
+-- root <packager@infrahouse.com> Tue, 23 Dec 2025 13:44:24 +0000
+
+puppet-code (0.1.0-1build260) noble; urgency=medium
+
+* commit event. see changes history in git log
+
+-- root <packager@infrahouse.com> Tue, 23 Dec 2025 02:24:17 +0000

puppet-code (0.1.0-1build259) noble; urgency=medium

* commit event. see changes history in git log
@@ -8,7 +20,7 @@ puppet-code (0.1.0-1build258) noble; urgency=medium

* commit event. see changes history in git log

--- root <packager@infrahouse.com> Tue, 23 Dec 2025 00:27:36 +0000
+-- root <packager@infrahouse.com> Tue, 23 Dec 2025 02:22:09 +0000

puppet-code (0.1.0-1build257) noble; urgency=medium

@@ -31,10 +31,9 @@

# Add cwagent user to groups needed to read log files
# adm: for /var/log/syslog, /var/log/auth.log, /var/log/kern.log
-# utmp: for /var/log/btmp, /var/log/wtmp
user { 'cwagent':
ensure => present,
-groups => ['adm', 'utmp'],
+groups => ['adm'],
membership => minimum,
require => Package['amazon-cloudwatch-agent'],
notify => Service['amazon-cloudwatch-agent'],
@@ -98,12 +97,13 @@
# Configure and start CloudWatch agent
exec { 'configure-cloudwatch-agent-jumphost':
command => "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
--a fetch-config -m ec2 -s -c file:${config_file}",
+-a fetch-config -m ec2 -c file:${config_file}",
refreshonly => true,
require => [
File[$config_file],
User['cwagent'],
],
+notify => Service['amazon-cloudwatch-agent'],
}

# Ensure CloudWatch agent service is running
@@ -45,18 +45,6 @@
"log_stream_name": "{instance_id}/security/fail2ban",
"timezone": "UTC"
},
-{
-"file_path": "/var/log/btmp",
-"log_group_name": "<%= @cloudwatch_log_group %>",
-"log_stream_name": "{instance_id}/auth/failed-logins",
-"timezone": "UTC"
-},
-{
-"file_path": "/var/log/wtmp",
-"log_group_name": "<%= @cloudwatch_log_group %>",
-"log_stream_name": "{instance_id}/auth/successful-logins",
-"timezone": "UTC"
-},
{
"file_path": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
"log_group_name": "<%= @cloudwatch_log_group %>",
@@ -113,11 +101,6 @@
"rename": "DISK_USED_PERCENT",
"unit": "Percent"
},
-{
-"name": "used_percent",
-"rename": "DiskSpaceUsed",
-"unit": "Percent"
-},
{
"name": "inodes_free",
"rename": "DISK_INODES_FREE",
@@ -220,4 +203,4 @@
"Environment": "<%= @environment %>"
}
}
-}
+}
@@ -31,10 +31,9 @@

# Add cwagent user to groups needed to read log files
# adm: for /var/log/syslog, /var/log/auth.log, /var/log/kern.log
-# utmp: for /var/log/btmp, /var/log/wtmp
user { 'cwagent':
ensure => present,
-groups => ['adm', 'utmp'],
+groups => ['adm'],
membership => minimum,
require => Package['amazon-cloudwatch-agent'],
notify => Service['amazon-cloudwatch-agent'],
@@ -98,12 +97,13 @@
# Configure and start CloudWatch agent
exec { 'configure-cloudwatch-agent-jumphost':
command => "/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
--a fetch-config -m ec2 -s -c file:${config_file}",
+-a fetch-config -m ec2 -c file:${config_file}",
refreshonly => true,
require => [
File[$config_file],
User['cwagent'],
],
+notify => Service['amazon-cloudwatch-agent'],
}

# Ensure CloudWatch agent service is running
@@ -45,18 +45,6 @@
"log_stream_name": "{instance_id}/security/fail2ban",
"timezone": "UTC"
},
-{
-"file_path": "/var/log/btmp",
-"log_group_name": "<%= @cloudwatch_log_group %>",
-"log_stream_name": "{instance_id}/auth/failed-logins",
-"timezone": "UTC"
-},
-{
-"file_path": "/var/log/wtmp",
-"log_group_name": "<%= @cloudwatch_log_group %>",
-"log_stream_name": "{instance_id}/auth/successful-logins",
-"timezone": "UTC"
-},
{
"file_path": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
"log_group_name": "<%= @cloudwatch_log_group %>",
@@ -113,11 +101,6 @@
"rename": "DISK_USED_PERCENT",
"unit": "Percent"
},
-{
-"name": "used_percent",
-"rename": "DiskSpaceUsed",
-"unit": "Percent"
-},
{
"name": "inodes_free",
"rename": "DISK_INODES_FREE",
@@ -220,4 +203,4 @@
"Environment": "<%= @environment %>"
}
}
-}
+}