Incident Response Plan

#war-room
Last updated: Jul 16, 2023
Contributors: Sid Dabral
#war-room channel: [redacted]

  • Determining if there’s an incident
  • Responding to an incident
    • Response mechanics
      • Where
      • Who
    • Communication to users
      • When
      • How
  • Closing an incident

Determining if there’s an incident

The severity of any bug is determined by its business impact, and severity determines how quickly we respond. In terms of typical bug severity levels, expect the following:

  • P0 = drop everything and go to the #war-room
  • P1 = discuss at tomorrow's standup
  • P2 = add to a Cycle
  • P3 = add to backlog

A P0 incident is typically triggered by a breakage in a user-facing feature that most users depend on to use our product.

Examples:

  1. The EDP won’t load
  2. Login/logout is broken
  3. Attendee data on the EDP is wrong

Other things may be P0-worthy even if they don’t affect how users use the vendelux.com app right now. These are likely to include:

  1. Security issues
  2. Backend issues that could cause significant data loss
  3. Payment-related issues (less likely for us, given the nature of our contracts)

Err on the side of declaring too many P0s and marshaling people and information too early:

  • If it smells like a P0, treat it as a P0.
  • It’s easy to downgrade a P0 and close the war room, and we’d rather pay the occasional cost of interrupting folks for 15 minutes than delay our response to a critical incident.
  • If you’re unsure, talk to any of Alex, Stefan, Mike, or Dan, or ping them all in #war-room.

Responding to an incident

Response mechanics

Where

  • Gather in Slack in the #war-room channel
  • Start a new private Slack channel named #incident-YYYY-MM-DD-description
  • Invite the folks from the “Who” section
  • Leave behind a link to the new channel in #war-room, including:
    • The channel name
    • A short description
    • The name of the private channel’s owner, so that others can ask to join

The goal is to keep #war-room as a publicly known meeting place while restricting sensitive information (e.g., in the case of a security incident) to a small set of people.

Who

[Redacted]

Communication to users

When

It’s especially likely we’ll need external comms if any of the following are true:

  • Key functionality has been or will be down for an extended period of time
  • Something security-related happened that we need to disclose

How

Typically, communication will go through users’ Customer Success managers.

Closing an incident

Before the war room clears out, consider assigning a post-mortem wrangler, creating a post-mortem doc, and scheduling a post-mortem meeting.