[Feature] Alert Preview & "Would Have Fired" Analysis #91

@Polliog

Description

Feature Description

Before enabling an alert rule, show users how many times it would have triggered over the past 7 days (or a custom time window). This "alert preview" feature helps users tune thresholds, avoid alert fatigue, and build confidence that the alert will actually be useful.

Problem/Use Case

Current problem:

  • Users create alert rules blindly, hoping the threshold is "about right"
  • Alert goes live and either:
    • Fires constantly (alert fatigue) → gets disabled
    • Never fires (threshold too high) → misses real issues
  • No way to know if an alert is tuned correctly without trial-and-error
  • Takes weeks to realize an alert is poorly configured
  • Teams lose trust in alerting systems due to false positives

Real-world scenario:

A DevOps engineer creates an alert: "Trigger when error rate > 100/min"

Possibilities:
❌ Too sensitive: Fires 50 times/day → ignored → real issue missed
❌ Too loose: Never fires → critical outage goes unnoticed for hours
✓ Just right: Fires 2-3 times/week for real issues

Problem: No way to know which scenario you're in until it's live!

User frustration:

"I set up an alert for high error rates, but I have no idea if 100 errors/min is a good threshold. Should it be 50? 500? I'm just guessing."

Proposed Solution

Add "Alert Preview" feature that analyzes historical data:

UI/UX Flow:

Step 1: Create alert rule (as usual)

Alert Name: High Error Rate
Condition: level:error
Threshold: rate > 100/min for 5 minutes

Step 2: Click "Preview Alert" button

Step 3: See analysis

┌─────────────────────────────────────────────────────┐
│ 📊 Alert Preview (Last 7 Days)                     │
├─────────────────────────────────────────────────────┤
│                                                     │
│ This alert would have fired 23 times                │
│                                                     │
│ Breakdown:                                          │
│ • 15 times on weekdays (during business hours)     │
│ • 8 times on weekends                              │
│                                                     │
│ Average duration: 3.2 minutes                       │
│ Longest incident: 47 minutes (Jan 12, 14:32)       │
│                                                     │
│ Most recent trigger:                                │
│ • Yesterday at 14:32 (347 errors/min, 12min)       │
│ • Jan 13 at 09:15 (156 errors/min, 4min)           │
│ • Jan 12 at 14:32 (523 errors/min, 47min) ← worst  │
│                                                     │
│ ⚠️ Suggestion:                                      │
│ This alert may be too sensitive. Consider:         │
│ • Increasing threshold to 150/min                  │
│ • Adding time-of-day filters (weekdays only)       │
│ • Requiring 10min duration instead of 5min         │
│                                                     │
│ [Adjust Threshold] [Enable Alert] [Cancel]         │
└─────────────────────────────────────────────────────┘

Step 4: Adjust and re-preview

User changes threshold: 100/min → 150/min
Clicks "Preview" again
New result: "Would have fired 7 times" ← much better!

Step 5: Enable with confidence

User clicks "Enable Alert"
→ Alert goes live with tuned threshold
→ Minimal false positives
→ Team trusts the alert system

Alternatives Considered

  1. Manual backtesting

    • User must manually query logs and count matches
    • ✗ Time-consuming, error-prone
    • ✗ Doesn't show timeline or suggestions
  2. "Dry run" mode for alerts

    • Alert runs but doesn't notify, logs what would have fired
    • ✗ Must wait days/weeks to gather data
    • ✗ Still trial-and-error
    • ✓ Could complement preview feature
  3. AI-suggested thresholds

    • ML analyzes patterns and suggests optimal values
    • ✗ Black box, users don't understand why
    • ✗ Requires ML infrastructure
    • ✗ Overkill for most cases
  4. Show only aggregated stats (no preview)

    • Display "avg errors/min: 45" in UI
    • ✗ User still has to mentally calculate
    • ✗ Doesn't show actual trigger events

Chosen approach: Historical simulation with visual timeline + actionable suggestions

Implementation Details (Optional)

Technical approach:

1. Backend: Alert simulation engine

interface AlertPreview {
  totalTriggers: number;
  incidents: AlertIncident[];
  suggestions: AlertSuggestion[];
  statistics: {
    avgDuration: number;
    maxDuration: number;
    byDayOfWeek: Record<string, number>;
    byHourOfDay: Record<number, number>;
  };
}

interface AlertIncident {
  startTime: Date;
  endTime: Date;
  duration: number; // minutes
  peakValue: number;
  sampleLogs: LogEntry[];
}
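
The AlertRule, AlertSuggestion, AlertStatistics, and LogEntry types are referenced but not spelled out in this issue. Hypothetical shapes, with field names chosen only to match how they are used in the code below (the inline statistics object in AlertPreview corresponds to AlertStatistics):

// Hypothetical supporting types; field names are assumptions based on usage below.
interface AlertRule {
  name: string;                   // e.g. "High Error Rate"
  query: string;                  // log filter, e.g. "level:error"
  aggregation: 'count' | 'rate';  // how matching logs are reduced per window
  threshold: number;              // e.g. 100 (errors/min)
  duration: number;               // minutes the condition must hold, e.g. 5
}

interface LogEntry {
  timestamp: Date;
  level: string;
  message: string;
}

interface AlertStatistics {
  avgDuration: number;
  maxDuration: number;
  byDayOfWeek: Record<string, number>;
  byHourOfDay: Record<number, number>;
}

interface AlertSuggestion {
  type: string;
  message: string;
  action: {
    type: string;
    currentValue?: number;
    suggestedValue?: number;
    suggestedFilter?: string;
    reason: string;
  };
}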

async function previewAlert(
  rule: AlertRule,
  timeWindow: { start: Date; end: Date }
): Promise<AlertPreview> {
  // 1. Execute alert query against historical logs
  const results = await queryHistoricalLogs(rule.query, timeWindow);
  
  // 2. Apply threshold logic with sliding window
  const incidents: AlertIncident[] = [];
  let currentIncident: AlertIncident | null = null;

  const closeIncident = (incident: AlertIncident) => {
    incident.duration =
      (incident.endTime.getTime() - incident.startTime.getTime()) / 60000;
    incidents.push(incident);
  };

  for (const window of slidingWindows(results, rule.duration)) {
    const value = aggregateWindow(window, rule.aggregation); // count, rate, etc.

    if (evaluateThreshold(value, rule.threshold)) {
      if (!currentIncident) {
        currentIncident = {
          startTime: window.start,
          endTime: window.end,
          duration: 0, // filled in when the incident closes
          peakValue: value,
          sampleLogs: window.logs.slice(0, 5),
        };
      } else {
        // Extend current incident
        currentIncident.endTime = window.end;
        currentIncident.peakValue = Math.max(currentIncident.peakValue, value);
      }
    } else if (currentIncident) {
      // Incident ended
      closeIncident(currentIncident);
      currentIncident = null;
    }
  }

  // Close an incident that is still open at the end of the preview window
  if (currentIncident) {
    closeIncident(currentIncident);
  }
  
  // 3. Generate statistics
  const statistics = calculateStatistics(incidents);
  
  // 4. Generate suggestions
  const suggestions = generateSuggestions(rule, incidents, statistics);
  
  return {
    totalTriggers: incidents.length,
    incidents,
    suggestions,
    statistics,
  };
}
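
The helpers referenced above (slidingWindows, aggregateWindow, evaluateThreshold) are left undefined in this sketch. One possible shape for them, assuming queryHistoricalLogs returns per-minute buckets; the bucket and window types are assumptions:

// Minimal sketches of the helpers used above.
interface MinuteBucket {
  minute: Date;        // start of the 1-minute bucket
  count: number;       // number of matching log lines in that minute
  logs?: LogEntry[];   // optional sample of matching logs
}

interface SlidingWindow {
  start: Date;
  end: Date;
  buckets: MinuteBucket[];
  logs: LogEntry[];
}

// Slide a window of `durationMinutes` consecutive buckets across the series.
function* slidingWindows(buckets: MinuteBucket[], durationMinutes: number): Generator<SlidingWindow> {
  for (let i = 0; i + durationMinutes <= buckets.length; i++) {
    const slice = buckets.slice(i, i + durationMinutes);
    yield {
      start: slice[0].minute,
      end: new Date(slice[slice.length - 1].minute.getTime() + 60_000),
      buckets: slice,
      logs: slice.flatMap((b) => b.logs ?? []),
    };
  }
}

// Reduce a window to a single value that the threshold is compared against.
function aggregateWindow(window: SlidingWindow, aggregation: 'count' | 'rate'): number {
  const total = window.buckets.reduce((sum, b) => sum + b.count, 0);
  return aggregation === 'count' ? total : total / window.buckets.length; // rate = average per minute
}

// Only the "greater than" comparison from the examples is sketched here.
function evaluateThreshold(value: number, threshold: number): boolean {
  return value > threshold;
}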

2. Suggestion engine

function generateSuggestions(
  rule: AlertRule,
  incidents: AlertIncident[],
  stats: AlertStatistics
): AlertSuggestion[] {
  const suggestions: AlertSuggestion[] = [];
  
  // Too many triggers?
  if (incidents.length > 20) {
    suggestions.push({
      type: 'threshold_too_low',
      message: `Alert may be too sensitive (${incidents.length} triggers in the preview window)`,
      action: {
        type: 'adjust_threshold',
        currentValue: rule.threshold,
        suggestedValue: calculateOptimalThreshold(incidents, 0.3), // 30th percentile
        reason: 'Would reduce triggers to ~7/week',
      },
    });
  }
  
  // Too few triggers?
  if (incidents.length === 0) {
    suggestions.push({
      type: 'threshold_too_high',
      message: 'Alert would never have fired',
      action: {
        type: 'adjust_threshold',
        currentValue: rule.threshold,
        // NOTE: with zero incidents this would need the raw historical values, not the empty incidents list
        suggestedValue: calculateOptimalThreshold(incidents, 0.95),
        reason: 'Would catch 95th percentile spikes',
      },
    });
  }
  
  // Noisy during specific times? (example check for the 2am batch-job case;
  // a full implementation would scan every hour of the day)
  if (incidents.length > 0 && (stats.byHourOfDay[2] ?? 0) > incidents.length * 0.3) {
    const share = Math.round(((stats.byHourOfDay[2] ?? 0) / incidents.length) * 100);
    suggestions.push({
      type: 'time_filter',
      message: `${share}% of triggers happen at 2am (likely batch jobs)`,
      action: {
        type: 'add_time_filter',
        suggestedFilter: 'hour >= 6 AND hour <= 22', // Only 6am-10pm
        reason: 'Exclude scheduled maintenance windows',
      },
    });
  }
  
  return suggestions;
}
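
calculateOptimalThreshold is also left undefined in this issue. A hypothetical interpretation that picks a percentile of the observed peak values:

// Hypothetical: suggest the threshold at the given percentile of observed peak values.
function calculateOptimalThreshold(incidents: AlertIncident[], percentile: number): number {
  // With zero incidents there is nothing to rank; a real implementation would
  // fall back to percentiles of the raw per-minute values instead.
  if (incidents.length === 0) return 0;
  const peaks = incidents.map((i) => i.peakValue).sort((a, b) => a - b);
  const index = Math.min(peaks.length - 1, Math.floor(percentile * peaks.length));
  return Math.round(peaks[index]);
}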

3. Frontend UI

import { useState, useEffect } from 'react';

// Alert preview component
function AlertPreviewModal({
  rule,
  onClose,
  onApply,
}: {
  rule: AlertRule;
  onClose: () => void;
  onApply: (rule: AlertRule) => void;
}) {
  const [preview, setPreview] = useState<AlertPreview | null>(null);
  const [loading, setLoading] = useState(true);
  const [timeWindow, setTimeWindow] = useState('7d');

  useEffect(() => {
    loadPreview();
  }, [rule, timeWindow]);

  async function loadPreview() {
    setLoading(true);
    const result = await api.previewAlert(rule, timeWindow); // api: thin client over the preview endpoint
    setPreview(result);
    setLoading(false);
  }

  function applySuggestion(suggestion: AlertSuggestion) {
    // Update rule with suggested changes
    // Re-run preview with new values
  }

  return (
    <Modal>
      <h2>Alert Preview: {rule.name}</h2>

      {loading || !preview ? <Spinner /> : (
        <>
          <StatsSummary preview={preview} />
          <IncidentTimeline incidents={preview.incidents} />
          <Suggestions
            suggestions={preview.suggestions}
            onApply={applySuggestion}
          />

          <Button onClick={() => onApply(rule)}>
            Enable Alert
          </Button>
        </>
      )}
    </Modal>
  );
}
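
The api.previewAlert call in the component implies a preview endpoint on the backend. A minimal sketch, assuming Express; the route path and the parseTimeWindow helper are illustrative assumptions, not part of the proposal:

import express from 'express';

const app = express();
app.use(express.json());

// Convert shorthand like "7d" into the { start, end } window that previewAlert expects.
function parseTimeWindow(shorthand: string): { start: Date; end: Date } {
  const days = parseInt(shorthand, 10) || 7;
  const end = new Date();
  const start = new Date(end.getTime() - days * 24 * 60 * 60 * 1000);
  return { start, end };
}

app.post('/api/alerts/preview', async (req, res) => {
  try {
    const { rule, timeWindow } = req.body as { rule: AlertRule; timeWindow: string };
    const preview = await previewAlert(rule, parseTimeWindow(timeWindow));
    res.json(preview);
  } catch (err) {
    res.status(500).json({ error: 'Preview failed' });
  }
});

The frontend's api.previewAlert would then be a thin wrapper that POSTs the rule and the selected window to this route.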

4. Database optimization

-- Preview queries need to be fast
-- Ensure indexes support common alert patterns

-- Note: partial-index predicates must be immutable, so NOW() cannot be used here.
-- Either index the full table, or recreate the index periodically with a literal cutoff.
CREATE INDEX idx_logs_preview
ON logs (source_id, timestamp, level);

-- For rate-based alerts (time_bucket requires TimescaleDB)
CREATE INDEX idx_logs_time_bucket
ON logs (source_id, time_bucket('1 minute', timestamp));
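
With those indexes, queryHistoricalLogs (used by previewAlert above) can let the database pre-aggregate logs into per-minute buckets instead of streaming raw rows. A sketch assuming node-postgres and TimescaleDB's time_bucket; the table and column names match the index definitions above, everything else is an assumption:

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the standard PG* environment variables

// Aggregate matching logs into per-minute buckets.
// Simplification: the rule query is assumed to be a "level:<value>" filter;
// sample logs for each incident would be fetched in a separate query.
async function queryHistoricalLogs(
  query: string,
  timeWindow: { start: Date; end: Date }
): Promise<{ minute: Date; count: number }[]> {
  const level = query.replace(/^level:/, '');
  const { rows } = await pool.query(
    `SELECT time_bucket('1 minute', timestamp) AS minute, COUNT(*) AS count
       FROM logs
      WHERE level = $1
        AND timestamp BETWEEN $2 AND $3
      GROUP BY minute
      ORDER BY minute`,
    [level, timeWindow.start, timeWindow.end]
  );
  return rows.map((r) => ({ minute: r.minute, count: Number(r.count) }));
}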

Performance considerations:

  • Cache preview results (invalidate on new logs; see the caching sketch after this list)
  • Limit preview window to max 30 days
  • Sample data for very high-volume sources
  • Run preview queries asynchronously (show progress bar)
  • Pre-aggregate common metrics (errors/min, etc.)
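
A minimal sketch of the caching bullet above, using an in-process map keyed by a hash of the rule plus window. The TTL stands in for invalidation on new log ingest, and a shared cache (e.g. Redis) would be needed across instances; names and values are assumptions:

import { createHash } from 'crypto';

const previewCache = new Map<string, { result: AlertPreview; computedAt: number }>();
const CACHE_TTL_MS = 5 * 60 * 1000; // assumption: a few minutes of staleness is fine while tuning

function cacheKey(rule: AlertRule, timeWindow: string): string {
  return createHash('sha256').update(JSON.stringify({ rule, timeWindow })).digest('hex');
}

async function previewAlertCached(rule: AlertRule, timeWindow: string): Promise<AlertPreview> {
  const key = cacheKey(rule, timeWindow);
  const hit = previewCache.get(key);
  if (hit && Date.now() - hit.computedAt < CACHE_TTL_MS) {
    return hit.result; // served from cache
  }
  const result = await previewAlert(rule, parseTimeWindow(timeWindow)); // parseTimeWindow from the endpoint sketch above
  previewCache.set(key, { result, computedAt: Date.now() });
  return result;
}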

Priority

  • Critical - Blocking my usage of LogTide
  • High - Would significantly improve my workflow
  • Medium - Nice to have
  • Low - Minor enhancement

Rationale: This feature dramatically reduces alert fatigue and makes Logtide's alerting system actually usable for production teams. It's the difference between "alerts I trust" and "alerts I ignore."

Target Users

  • DevOps Engineers (primary: responsible for alerting)
  • Developers (configure alerts for their services)
  • Security/SIEM Users (tune security alerts)
  • System Administrators
  • All Users

Primary benefit: Anyone who creates alerts and wants them to be useful, not noisy.

Additional Context

Why this is important:

1. Alert fatigue is a real problem:

Gartner study: "50% of alerts are ignored due to false positives"
PagerDuty: "Average team receives 200+ alerts/week, only 30 are actionable"

With preview:
→ User sees "would fire 200 times/week"
→ Adjusts threshold
→ New preview: "would fire 8 times/week"
→ Enables alert with confidence

2. Competitive differentiation:

  • Datadog: No preview feature (just trial-and-error)
  • PagerDuty: Has "alert testing" but requires live traffic
  • Grafana: No built-in preview
  • Splunk: Has backtesting but it's complex
  • Logtide advantage: Built-in, visual, actionable

3. Trust-building:
Users trust Logtide more when it helps them avoid mistakes before making them.

Real user scenario:

Without preview:

Day 1: Create alert "error rate > 50/min"
Day 2: Alert fires 30 times
Day 3: Increase to 100/min, still fires 20 times
Day 4: Increase to 200/min, never fires
Day 5: Miss critical outage because threshold too high
Week 2: Disable alert entirely, go back to manual monitoring

With preview:

Day 1: Create alert "error rate > 50/min"
Day 1: Preview shows "would fire 89 times in last week"
Day 1: Adjust to 120/min, preview shows "would fire 6 times"
Day 1: Enable alert with confidence
Week 2: Alert fires twice for real issues, team responds

Marketing angles:

"Stop guessing. Start knowing. Preview exactly how your alerts will behave before enabling them."

"Logtide's Alert Preview helps you tune thresholds in seconds, not weeks."

Future enhancements:

  • Compare multiple threshold values side-by-side
  • Export preview report for team review
  • "Seasonal" preview (compare same day last month)
  • Integration with detection packs (preview entire pack)
  • A/B testing for alert rules

Educational content opportunity:

Blog post: "The Alert Tuning Problem (And How We Solved It)"
- Explain alert fatigue
- Show preview feature
- Include best practices
- Position Logtide as thoughtfully designed

Implementation phases:

MVP (v1):

  • Basic preview: trigger count, last 7 days
  • Simple timeline of incidents
  • No suggestions (yet)

v2:

  • Add suggestions engine
  • Time-of-day analysis
  • Duration statistics

v3:

  • Multiple threshold comparison
  • Seasonal analysis
  • Team sharing

Contribution

  • I would like to work on implementing this feature
