Feature Description
Enable users to click any identifier (request_id, trace_id, user_id, order_id, etc.) in a log entry and instantly see all related logs across services, time ranges, and log sources. This creates a narrative timeline of events related to a specific transaction, user action, or request.
Problem/Use Case
Current problem:
- Debugging distributed systems requires manually searching for request IDs across multiple log sources
- Users must copy-paste IDs and run multiple searches to piece together what happened
- No visual timeline showing the sequence of events across services
- Difficult to understand causal relationships between log entries
- Time-consuming to debug incidents involving multiple microservices
Real-world scenario:
1. User reports: "My payment failed at 14:32"
2. A DevOps engineer finds an error log with request_id: req_abc123
3. Must search each log source manually:
   - API gateway logs (initial request)
   - Auth service logs (token validation)
   - Payment service logs (charge attempt)
   - Database logs (transaction records)
   - Queue logs (webhook processing)
4. Must manually correlate timestamps and piece together the story
5. Takes 20-30 minutes to reconstruct what happened
With event correlation:
1. Click request_id: req_abc123
2. See complete timeline:
   14:32:01.234 [API Gateway] Request received
   14:32:01.456 [Auth] Token validated
   14:32:02.123 [Payment] Stripe API called
   14:32:03.789 [Payment] ERROR: Card declined ← root cause
   14:32:04.012 [Database] Transaction rolled back
   14:32:04.156 [Queue] Webhook retry scheduled
3. Problem identified in 30 seconds
Proposed Solution
Core feature: Auto-detect and link common identifiers
Phase 1: ID Detection & Linking
- Auto-detect common ID patterns in logs:
  - UUID format (8-4-4-4-12)
  - Request IDs (req_, request-, etc.)
  - Trace IDs (trace_, span_, etc.)
  - User IDs (user_, uid_, etc.)
  - Transaction IDs (txn_, order_, etc.)
  - Correlation IDs (correlation_*, x-correlation-id)
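One way the detection could be sketched, assuming log lines arrive as plain strings. The pattern list here is illustrative, not the final set; each regex captures the ID value after a known key or prefix:

```typescript
// Illustrative detection patterns; the real list would be user-extensible.
const ID_PATTERNS: Record<string, RegExp> = {
  uuid: /\b([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\b/gi,
  request_id: /\b(?:request_id|req_id|requestId)[:=\s]+([A-Za-z0-9_-]+)/g,
  trace_id: /\b(?:trace_id|traceId|x-trace-id)[:=\s]+([a-f0-9-]+)/g,
  user_id: /\b(?:user_id|userId|uid)[:=\s]+([A-Za-z0-9_-]+)/g,
};

// Scan a raw log line and return every detected identifier, grouped by type.
function detectIDs(line: string): Record<string, string[]> {
  const found: Record<string, string[]> = {};
  for (const [type, pattern] of Object.entries(ID_PATTERNS)) {
    const values = [...line.matchAll(pattern)].map((m) => m[1]);
    if (values.length > 0) found[type] = values;
  }
  return found;
}
```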
Phase 2: UI/UX
Log entry view:
┌─────────────────────────────────────────────────────┐
│ 2025-01-15 14:32:03 ERROR Payment service           │
│ Card declined for user_789                          │
│                                                     │
│ request_id: req_abc123  ← clickable, highlighted    │
│ user_id: user_789       ← clickable, highlighted    │
│ transaction_id: txn_xyz ← clickable, highlighted    │
└─────────────────────────────────────────────────────┘
On click → Opens correlation view:
┌─────────────────────────────────────────────────────┐
│ Timeline for request_id: req_abc123                 │
│                                                     │
│ ▼ 14:32:01.234 [API Gateway]                        │
│   POST /api/payment received                        │
│                                                     │
│ ▼ 14:32:01.456 [Auth Service]                       │
│   Token validated for user_789                      │
│                                                     │
│ ▼ 14:32:02.123 [Payment Service]                    │
│   Stripe API called                                 │
│                                                     │
│ ⚠ 14:32:03.789 [Payment Service] ERROR              │
│   Card declined: insufficient_funds                 │
│                                                     │
│ ▼ 14:32:04.012 [Database]                           │
│   Transaction rolled back                           │
└─────────────────────────────────────────────────────┘
Phase 3: Advanced Correlation
- Correlation across multiple IDs (e.g., same user_id + different request_ids)
- Waterfall view showing service dependencies
- Automatic time range expansion (search ±5 minutes from clicked log)
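The time-range expansion could be as simple as clamping a requested window around the clicked log's timestamp; a sketch, where the constants mirror the ±5 min default and ±1 h cap proposed in this issue:

```typescript
const DEFAULT_WINDOW_MS = 5 * 60 * 1000; // ±5 minutes (proposed default)
const MAX_WINDOW_MS = 60 * 60 * 1000;    // ±1 hour (proposed hard cap)

// Given the clicked log's timestamp, compute the bounded search window
// used for the correlation query.
function correlationWindow(
  clickedAt: Date,
  requestedMs: number = DEFAULT_WINDOW_MS,
): { from: Date; to: Date } {
  const half = Math.min(requestedMs, MAX_WINDOW_MS);
  return {
    from: new Date(clickedAt.getTime() - half),
    to: new Date(clickedAt.getTime() + half),
  };
}
```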
Alternatives Considered
- Manual search only - Current state, too time-consuming
- OpenTelemetry traces required - Too heavyweight, not all apps use OTLP
- Pre-defined correlation rules - Too rigid, doesn't handle custom IDs
- Graph database for relationships - Over-engineered, adds complexity
- Regex-based correlation - Part of solution, but needs smart defaults
Chosen approach: Smart auto-detection with user-configurable patterns
Implementation Details (Optional)
Technical approach:
1. ID Extraction at Ingestion
```typescript
// During log ingestion, extract structured IDs

interface LogEntry {
  message: string;
  fields?: Record<string, string>; // structured fields from JSON logs
}

interface ExtractedIDs {
  request_id?: string;
  trace_id?: string;
  span_id?: string;
  user_id?: string;
  transaction_id?: string;
  custom_ids: Record<string, string>;
}

function extractIDs(logEntry: LogEntry): ExtractedIDs {
  const patterns: Record<string, RegExp> = {
    request_id: /(?:request_id|req_id|requestId)[:\s=]+([a-zA-Z0-9_-]+)/,
    trace_id: /(?:trace_id|traceId|x-trace-id)[:\s=]+([a-f0-9-]+)/,
    user_id: /(?:user_id|userId|uid)[:\s=]+([a-zA-Z0-9_-]+)/,
    // ... more patterns
  };

  const found: Record<string, string> = {};
  for (const [key, pattern] of Object.entries(patterns)) {
    const match = logEntry.message.match(pattern);
    if (match) found[key] = match[1];
  }
  // Also check structured fields (JSON logs)
  // Also check OTLP attributes
  return { ...found, custom_ids: {} } as ExtractedIDs;
}
```

2. Database Schema
```sql
-- Store extracted IDs for fast correlation queries
CREATE TABLE log_identifiers (
  log_id UUID REFERENCES logs(id),
  identifier_type VARCHAR(50),   -- 'request_id', 'user_id', etc.
  identifier_value VARCHAR(255),
  timestamp TIMESTAMPTZ,
  PRIMARY KEY (log_id, identifier_type)
);

-- PostgreSQL requires a separate CREATE INDEX (inline INDEX is MySQL syntax)
CREATE INDEX idx_identifier_lookup
  ON log_identifiers (identifier_type, identifier_value, timestamp);
```
```sql
-- Query for correlation:
SELECT l.*
FROM logs l
JOIN log_identifiers li ON l.id = li.log_id
WHERE li.identifier_type = 'request_id'
  AND li.identifier_value = 'req_abc123'
ORDER BY l.timestamp;
```

3. UI Implementation
```typescript
// Make IDs clickable in the log viewer
function renderLogMessage(message: string, extractedIDs: ExtractedIDs): string {
  let rendered = message;
  for (const [type, value] of Object.entries(extractedIDs)) {
    // Skip the custom_ids map and any undefined fields
    if (type === 'custom_ids' || typeof value !== 'string') continue;
    // Note: values must be HTML-escaped in a real implementation
    rendered = rendered.replace(
      value,
      `<a class="correlation-link" data-type="${type}" data-value="${value}">${value}</a>`,
    );
  }
  return rendered;
}

// Handle click
function onCorrelationLinkClick(type: string, value: string): void {
  // Open correlation view modal or sidebar
  showCorrelationTimeline(type, value);
}
```

4. Configuration UI
```yaml
# User-configurable correlation patterns
correlation_patterns:
  - name: "Request ID"
    pattern: '(?:request_id|req)[:\s=]+([a-zA-Z0-9_-]+)'
    enabled: true
  - name: "Order ID"
    pattern: '(?:order_id|order)[:\s=]+([a-zA-Z0-9_-]+)'
    enabled: true
  - name: "Custom Job ID"
    pattern: 'job_([a-f0-9-]+)'
    enabled: true
```

Performance considerations:
- Index the log_identifiers table heavily
- Limit correlation queries to reasonable time windows (default ±5 min, max ±1 hour)
- Cache common correlation queries
- Lazy-load timeline entries (load more as user scrolls)
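The user-configured patterns (as in the YAML above) would also need defensive compilation so one bad regex can't break ingestion; a sketch, where the `CorrelationPattern` shape is an assumption mirroring the config format:

```typescript
// Assumed shape of one entry from the correlation_patterns config.
interface CorrelationPattern {
  name: string;
  pattern: string;
  enabled: boolean;
}

// Compile enabled patterns into matchers; skip disabled or invalid entries
// rather than failing the whole pipeline on one bad user regex.
function compilePatterns(
  configured: CorrelationPattern[],
): { name: string; regex: RegExp }[] {
  const compiled: { name: string; regex: RegExp }[] = [];
  for (const p of configured) {
    if (!p.enabled) continue;
    try {
      compiled.push({ name: p.name, regex: new RegExp(p.pattern, "g") });
    } catch {
      // Invalid user-supplied regex: log and skip in a real implementation.
    }
  }
  return compiled;
}
```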
Priority
- Critical - Blocking my usage of LogTide
- High - Would significantly improve my workflow
- Medium - Nice to have
- Low - Minor enhancement
Rationale: This is a core differentiator that transforms Logtide from "log search" to "incident investigation tool". It's the kind of feature that's genuinely hard to replicate and provides massive time savings during critical debugging sessions.
Target Users
- DevOps Engineers (primary: incident response)
- Developers (primary: debugging distributed systems)
- Security/SIEM Users (secondary benefit)
- System Administrators
- All Users
Primary audience: Teams running microservices, distributed systems, or any architecture where a single user action involves multiple services/components.
Additional Context
Why this is a moat:
- Grep can't do this across services
- Basic log search tools don't provide timeline visualization
- Implementing well requires deep understanding of distributed tracing concepts
- The UX matters enormously (auto-detection vs manual configuration)
Competitive analysis:
- Datadog APM: Has this via distributed tracing, but requires APM agent installation ($$$)
- Elastic APM: Similar to Datadog, heavyweight setup
- Grafana Loki: No built-in correlation, requires LogQL wizardry
- Splunk: Has transaction correlation, but enterprise-only and complex
- Logtide advantage: Works with plain logs, no agents required
User testimonial (hypothetical):
"Before Logtide correlation: 20 minutes to debug a payment failure across 5 services. After: 30 seconds. I click the request_id and see the entire story."
Marketing message:
"Click any ID, see the whole story. Logtide automatically correlates logs across your entire system. No agents, no distributed tracing setup required."
Future enhancements:
- Service dependency graph (automatically discover which services talk to each other)
- Anomaly highlighting (automatically flag unusual patterns in correlated logs)
- Export timeline as shareable link for incident reports
- Integration with alerting (when alert fires, show correlated timeline)
Example workflows:
Debugging production incident:
1. Alert fires: "Payment service errors spiking"
2. Click alert → See recent error logs
3. Click request_id on any error
4. Timeline shows:
- Frontend made request
- API validated input
- Payment service called Stripe
- Stripe returned timeout
- Database transaction rolled back
- Queue scheduled retry
5. Root cause identified: Stripe API degradation
Understanding user journey:
1. Support ticket: "User says checkout is broken"
2. Search for user_id: user_12345
3. See all actions for that user:
- Viewed product page
- Added to cart
- Started checkout
- ERROR: Tax calculation failed
- Abandoned cart
4. Fix tax calculation bug
Contribution
- I would like to work on implementing this feature