
Cloud service emergency response planning
Keeping customers prepared, informed, and connected before, during and after major cloud service incidents
Project date: August 2022
Duration: 2.5 months
My Role: Lead UX Design & Supporting Research
Cloud service emergency response plans help customers identify the subscriptions and resources critical to their workloads and prepare for, respond to, and mitigate risks from impactful service incidents, such as unplanned outages. This entails identifying the best-suited people in the customer’s organization to be alerted to incidents impacting each subscription and ensuring those people are configured to receive alerts via their preferred channel for every condition deemed necessary for that workload. A plan also ensures these individuals have instructions to follow in response to various incident types.
In the past, customer account managers led the creation and onboarding of emergency response plans for customers. For this project, we were asked to design the digital version of response plans in the customer-facing portal of a large enterprise software company.
The challenge
I worked on this project with a team of 11: five program managers (PMs), three designers (including me), two engineers, and one researcher.
I was the lead designer and also played a supporting role in research.
My role
The timeline
Kickoff: August 1, 2022
To begin, I analyzed business specifications, requirements, and strategy documents and participated in reviews to understand the project OKRs and which KPIs design would be accountable for. I also analyzed artifacts related to the legacy emergency planning process and shared these insights in stakeholder reviews.
Next, we met regularly to analyze tasks in the legacy process, model workflows for the new digital process, and review design prototypes based on the chosen workflows.
Afterward, we reached out to the customer account managers of high-priority customers to gather feedback and ask for access to customer research participants.
Lastly, we conducted research with customer participants to validate and test our preliminary design and gain insight for future iterations.
The approach
Discovery
Spec review
Artifact review
Stakeholder interviews
Exploration
Task analysis
Workflow modeling
Prototyping & Design Review
Testing
Participant recruiting
Semi-structured interview
Cognitive walkthrough
Listening
Persona creation
Feedback review
How I contributed
Service incidents, as defined by the large enterprise software company, are events that typically cause multiple customers to have a degraded experience with one or more of their services.
The problem
DISCOVERY
But when the software company says, “This isn’t a major incident,” its customers don’t always agree.
Regardless of whether their customers agree, major incidents impacting a single customer warrant a thorough response and communication plan, whether it’s driven by the software company or the customer organization.
The legacy response plan process is based on static PPT documents, which are difficult to read and understand. They contain high-level concepts, terms and definitions, protocols, and information about support channels and the expected frequency of incident updates. A different PPT document is used for each of the company’s three cloud portals.
Response plan PPTs are specifically tailored for individual customers by the account managers. These PPTs can be as many as 95 pages long and often take hours to prepare and deliver, which adds up to a significant cost for the company over time.
Importantly, the document contains forms where customers are instructed to start gathering key information about their response plan, who is part of it, and what their roles and responsibilities will be.
Plans created in last 6 months: 4,144
Rate per hour: x $250
Time to create a plan: x 3 hours
Total cost of creating plans in the last 6 months: = $3,108,000
The goal
DISCOVERY
Target increase in customer adoption: 20%
Target reduction in plan creation time: 83.3%
Target cost savings over next 6 months: $2,590,000
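As a rough check on the figures above, the baseline cost and the savings target can be reproduced with a short calculation. This is only a sketch, and it rests on one assumption that the source does not state: that the 83.3% reduction is five-sixths rounded, which would bring plan creation time from 3 hours down to about 30 minutes. Under that assumption, the savings target equals the baseline cost multiplied by the time reduction, with plan volume held constant (the 20% adoption target is not factored in). All names in the code are illustrative.

```python
from fractions import Fraction

# Figures taken from the case study
PLANS_LAST_6_MONTHS = 4_144
HOURS_PER_PLAN = 3
RATE_PER_HOUR_USD = 250


def plan_creation_cost(plans: int, hours_per_plan: int, rate_per_hour: int) -> int:
    """Total labor cost of creating response plans, in USD."""
    return plans * hours_per_plan * rate_per_hour


baseline_cost = plan_creation_cost(
    PLANS_LAST_6_MONTHS, HOURS_PER_PLAN, RATE_PER_HOUR_USD
)

# Assumption (not stated in the source): 83.3% is 5/6 rounded.
time_reduction = Fraction(5, 6)

# Savings if the same number of plans is created in the next 6 months.
target_savings = baseline_cost * time_reduction
target_hours_per_plan = HOURS_PER_PLAN * (1 - time_reduction)

print(f"Baseline cost:         ${baseline_cost:,}")
print(f"Target savings:        ${int(target_savings):,}")
print(f"Target hours per plan: {float(target_hours_per_plan)}")
```

Using an exact fraction rather than 0.833 avoids a rounding gap: 0.833 of the baseline would give a slightly smaller figure than the stated savings target.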
Additional UX Goals
Reduce friction and inefficiencies
Benchmark plan CSAT & incident readiness
EXPLORATION
Task analysis & Workflow modeling
After analyzing the existing PPT plans for all three cloud portals, I identified eight primary categories and 15 subprocesses within them. Of these, the team identified five subprocesses across two categories as critical to a minimum viable experience, or P0 design requirements:
Definitions
What is a cloud service emergency?
What is a cloud service emergency response plan?
Prepare
Identify key resources (subscriptions)
Identify key contacts
Configure alerts and notifications
EXPLORATION
Design review & Prototyping
TESTING
Semi-structured interviews
Next, I conducted three semi-structured interviews, each with at least one customer account manager representing a high-priority enterprise customer. During these interviews, I demonstrated the proposed design via an interactive Figma prototype, asked questions, and recorded feedback for analysis.
Research objectives
Ensure main features are discoverable, understandable, and usable
Identify opportunities for improvement
Gain access to customer research participants
Participants
Five customer account managers representing three high-priority enterprise customers
TESTING
Semi-structured interviews: Key findings
Customers need a way to communicate known issues internally and escalate them to other members of their emergency response teams, even if the software company does not deem them a major incident.
People
Customers need to designate and display at least one email distribution list address with their response plan to enable communication with the internal customer response team members who provide their organization with around-the-clock (24/7) emergency coverage.
Repeated exposure to multiple formats of response plan training material is usually needed before customers fully understand response planning.
Platform
Customers need to be able to reference all of the categories and subprocesses in the legacy response plans in the digital experience as well.
Process
Cognitive walkthrough
TESTING
Additionally, I helped design, run, and observe one cognitive walkthrough with two customer account managers from a fourth high-priority customer. In this study, we learned about participants’ roles and duties, how often they interact with customers, and their current level of satisfaction with response planning. We also posed the following tasks for participants to perform by telling us how they would interact with the prototype in each scenario:
Create a workload with a response plan
Take action on a workload response plan that needs review
Take action on a workload with a response plan that is actively being impacted by a service incident
Research objectives
Benchmark current user experience for comparison against P0 designs
Ensure main features are discoverable, understandable, and usable
Identify opportunities for improvement
Assess customers’ expectations and behaviors when choosing specific workload classifications
Identify customers’ needs for role-based access in response planning
Participants
Two customer account managers representing one high-priority customer
TESTING
Cognitive walkthrough: Key findings
Customer account managers expect a specific individual to be the owner of the response planning process for the customer. They expect this individual customer role to be responsible for communicating updates regarding service incidents both internally, to response team members, and externally, to the software company’s incident management department.
People
The customer role that owns response planning is not the same customer role that would own workload creation and management.
Customer account managers update and review response plans with their customers every time the customer adds an application to their workload. As a result, the PPT documents quickly become outdated.
Platform
Customers don’t understand the importance and criticality of the response plan document.
Process
Internal proto-personas
LISTENING
Customer proto-personas
LISTENING
TESTING
Recommendations
Make response plan onboarding and training material more organized and suitable for customer con
People
Make response plan training material comprehensive, including all related processes and subprocesses.
Allow customer users to gather and display all necessary customer contacts and distribution list addresses in the response plan UI.
Make response plans dynamically update in response to customer workload subscriptions, so that customer account managers don’t have to update anything when a customer removes or adds an application to their workload.
Process

Feedback review
LISTENING
Lessons learned
Rally project partners early
Benchmark the legacy customer experience
Map customer and internal user journeys
Advocate for better customer experience
Be on the lookout for teams like this one:
Growth mindset oriented
Empathetic
Passionate
Proactive
Brilliant!