Major Incident Communication Playbook

Use this process to communicate with customers when a product incident or issue affects some or all of the customer base.

There are two primary ways iDonate may identify an incident:

  • Internal monitoring alerts of outage (Engineering)
  • Multiple concurrent customer reports (Customer Support or Success)

Workflow

  • Notify teams in #support_tech Slack channel
  • Support and Success - Create a ticket for each customer report and use the templates below to update the customer as the issue progresses
  • File JIRA ticket including the following:
    • Customer(s) impacted
    • Start time of incident
    • Steps taken prior to issue/error (steps to reproduce)
    • Symptoms and/or errors
  • Link all related customer tickets to JIRA ticket
  • Support - Update every ticket regularly on the status of the issue, and notify the customer as soon as Engineering confirms the issue is resolved. Treat responses to these tickets as top priority.
  • Engineering - What expectations should we set for timing on RCAs?
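
The JIRA ticket fields listed in the workflow above can be captured in a small checklist structure so that nothing is missed before filing. The following is a minimal sketch only: the class name `IncidentReport`, its field names, and the `missing_fields` helper are hypothetical and are not part of any iDonate or JIRA tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class IncidentReport:
    """Hypothetical container for the JIRA ticket fields listed in the workflow."""
    customers_impacted: List[str] = field(default_factory=list)
    start_time: Optional[datetime] = None
    steps_to_reproduce: str = ""
    symptoms: str = ""
    # Related customer tickets to link to the JIRA ticket (optional at filing time).
    linked_customer_tickets: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "customers_impacted": self.customers_impacted,
            "start_time": self.start_time,
            "steps_to_reproduce": self.steps_to_reproduce.strip(),
            "symptoms": self.symptoms.strip(),
        }
        return [name for name, value in required.items() if not value]


# Placeholder values for illustration only.
report = IncidentReport(
    customers_impacted=["Example Org"],
    symptoms="Donation form returns an error on submit",
)
print(report.missing_fields())
```

Here the report still lacks a start time and reproduction steps, so those two field names are returned as a reminder to collect them before the ticket is filed.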

Templates for Pendo

-------------------------------------------------------
:rotating_light: Service Status Update: Issue Under Investigation :rotating_light:
We're currently investigating an issue that may be impacting some of our services. Our team is working diligently to identify the cause and resolve it as quickly as possible.
Estimated Resolution Time: Unknown
Symptoms: <Insert error(s) or other>
We apologize for any inconvenience this may cause and appreciate your patience as we work to address this issue.

----------------------------------------------------------
:rotating_light: Service Status Update: Issue Identified - Fix in Progress :rotating_light:
We've identified the issue that may be impacting your experience with iDonate. Our team is actively working on implementing the fix to resolve this issue as quickly as possible.
Estimated Resolution Time: <Insert ETA>
Symptoms: <Insert error(s) or other>
We apologize for any inconvenience this may cause and appreciate your patience and understanding.

-------------------------------------------------------------

NOTE: This final notification should have an expiration time set for 1 hour from posting

:rotating_light: Service Status Update: Issue Resolved :rotating_light:
Great news! The issue has been resolved, and our services are back to normal. Thank you for your patience and understanding while we worked to address this issue. If you have any further questions or concerns, please don't hesitate to reach out to our support team.

-------------------------------------------------------------
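
The templates above rely on two manual steps that are easy to miss: replacing the angle-bracket placeholders and setting the 1-hour expiration on the final notice. The sketch below shows one way to check both. It is an illustration only: the function names `fill_template` and `resolved_notice_expiry` are hypothetical, and it does not call Pendo or any other real API.

```python
from datetime import datetime, timedelta
from typing import Optional

# Text copied from the "Issue Identified - Fix in Progress" template.
FIX_IN_PROGRESS = (
    ":rotating_light: Service Status Update: Issue Identified - Fix in Progress :rotating_light:\n"
    "We've identified the issue that may be impacting your experience with iDonate. "
    "Our team is actively working on implementing the fix to resolve this issue "
    "as quickly as possible.\n"
    "Estimated Resolution Time: <Insert ETA>\n"
    "Symptoms: <Insert error(s) or other>\n"
    "We apologize for any inconvenience this may cause and appreciate your "
    "patience and understanding."
)


def fill_template(template: str, symptoms: str, eta: Optional[str] = None) -> str:
    """Replace the angle-bracket placeholders; fail loudly if any are left."""
    message = template.replace("<Insert error(s) or other>", symptoms)
    if eta is not None:
        message = message.replace("<Insert ETA>", eta)
    if "<Insert" in message:
        raise ValueError("Template still contains an unfilled placeholder")
    return message


def resolved_notice_expiry(posted_at: datetime) -> datetime:
    """Expiration for the final 'Issue Resolved' notice: 1 hour after posting."""
    return posted_at + timedelta(hours=1)


# Placeholder values for illustration only.
message = fill_template(FIX_IN_PROGRESS, symptoms="Example error text", eta="2 hours")
print(message)
print(resolved_notice_expiry(datetime(2024, 8, 30, 14, 0)))
```

Raising an error on a leftover `<Insert ...>` marker is a deliberate choice in this sketch: a notice posted with an unfilled placeholder is more visible to customers than a short delay in posting.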

Example Internal Notification Email

Incident Overview

Date: 8/30/24

  • Incident Description: Credit Card Testing
    • Fraudsters are exploiting nonprofit donation forms to validate stolen credit card numbers. They typically make small, low-dollar "donations" to test if the card is active. If the transaction is approved, they can use the card for larger fraudulent purchases elsewhere.
  • Impact Summary:
    • Customers
      • This fraudulent activity leads to multiple unauthorized transactions, increasing chargebacks, financial losses, and potentially damaging the nonprofit's reputation.
    • iDonate
      • Success – Previously "Green" customers have shifted to "Yellow" status due to decreased confidence in our ability to prevent this activity, increasing the risk of churn.
      • Support – Routine product issues and inquiries are receiving less attention as the team deals with a surge in card testing incidents. Addressing these issues requires collaboration with CardConnect to hold/void batches, customer outreach and explanation, and transaction cancellations in GMS2.
      • Sales/Mktg – There is a decline in the number of customers willing to serve as references, with more accounts now in "Yellow" health status compared to previous months.
      • Finance – There is a continued rise in overall transaction activity within Spreedly and SIFT, leading to increased operational costs.
      • Engineering – Resources are being diverted from other bug fixes and feature development to focus on investigating and mitigating fraud activity.
  • Current Status: Ongoing

Impact

  • How many orgs impacted in last 24 hours? 29
  • How many failed attempts? 2547
  • How many successful transactions? 30 (13 orgs)
    • Successful transactions since 8/13: 617 (53 orgs)
    • 12 orgs have had successful transactions across 5+ days
  • What trends are we seeing in the attacks?
    • Transactions are being created through API calls rather than through the embed
    • City = Miami
    • Most transactions are for $5.43
  • What are our current steps to mitigate with ETAs?