Incident Documentation Guidelines

This file provides specific guidance for writing incident reports in the docs/incidents/ directory.

Content Guidelines

Essential Sections

  • Hook - A compelling opening that makes readers curious to learn what happened in the incident
  • Incident Summary - Include date, cluster/environment, status
  • Impact - Focus on user and business impact
  • Timeline - Use structured format with clear phases
  • Root Cause Analysis - Both immediate and systemic causes
  • What Could Be Improved - Focus on prevention and detection
  • Next Steps - Actionable items with checkboxes
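
The section list above lends itself to an automated check. The following is a minimal sketch, assuming reports are Markdown files in which each section appears as a `#`-style heading with exactly the names listed; that layout, the `missing_sections` helper, and the sample text are illustrative assumptions, not something this guide prescribes. The Hook is left out of the check because it is an opening passage rather than a named heading.

```python
import re

# Section names copied from the list above. Treating them as Markdown
# headings is an assumption made for this sketch.
REQUIRED_SECTIONS = [
    "Incident Summary",
    "Impact",
    "Timeline",
    "Root Cause Analysis",
    "What Could Be Improved",
    "Next Steps",
]


def missing_sections(report_text: str) -> list[str]:
    """Return the required section names that have no matching heading."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", report_text, flags=re.MULTILINE
        )
    }
    return [name for name in REQUIRED_SECTIONS if name.lower() not in headings]


# Hypothetical draft that only has the first three sections so far.
draft = "# Title\n\n## Incident Summary\n\n## Impact\n\n## Timeline\n"
print(missing_sections(draft))
```

A check like this could run before review so that a draft missing Root Cause Analysis or Next Steps is caught early; it says nothing about the quality of what each section contains.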

Timeline Structure Best Practices

When writing incident timelines, apply these practices:

  1. Chronological Flow - Order events in a clear sequence from incident start to resolution
  2. Clear Phases - Separate the timeline into distinct stages (Start, Detection, Investigation, Resolution, End)
  3. Time Ranges - Give time windows for multi-step activities
  4. Detection Delay Highlighted - Make the gap between incident start and detection clearly visible
  5. Summary Metrics - State total duration and detection delay at the end

These practices make incident timelines easier to follow and help identify systemic issues such as monitoring gaps.
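
The summary metrics are simple differences between phase timestamps. As a rough sketch, assuming timestamps are recorded as `YYYY-MM-DD HH:MM` strings in a single time zone (the format, the `timeline_metrics` helper, and the sample times below are all hypothetical), they could be computed like this:

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed format, not mandated by this guide


def timeline_metrics(start: str, detected: str, resolved: str) -> dict[str, timedelta]:
    """Compute detection delay and total duration from three phase timestamps."""
    t_start = datetime.strptime(start, TIMESTAMP_FORMAT)
    t_detected = datetime.strptime(detected, TIMESTAMP_FORMAT)
    t_resolved = datetime.strptime(resolved, TIMESTAMP_FORMAT)

    # A timeline that is not chronological is an error in the report itself.
    if not (t_start <= t_detected <= t_resolved):
        raise ValueError("timestamps must be in order: start, detected, resolved")

    return {
        "detection_delay": t_detected - t_start,
        "total_duration": t_resolved - t_start,
    }


# Made-up example times, for illustration only.
metrics = timeline_metrics("2024-01-15 09:00", "2024-01-15 09:47", "2024-01-15 11:30")
print("Detection delay:", metrics["detection_delay"])  # 0:47:00
print("Total duration:", metrics["total_duration"])    # 2:30:00
```

Deriving both numbers from the recorded timestamps, rather than typing them by hand, keeps the summary consistent with the timeline and exposes ordering mistakes.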

Writing Style

  • Use specific timestamps and durations
  • Include actual log snippets when helpful
  • Focus on learning and prevention
  • Keep it concise but complete
  • Use bullet points for clarity