Cloud service emergency response planning

Keeping customers prepared, informed, and connected before, during, and after major cloud service incidents

Project date: August 2022
Duration: 2.5 months
My Role: Lead UX Design & Supporting Research

Cloud service emergency response plans are designed to help customers identify the subscriptions and resources critical to their workloads and to prepare for, respond to, and mitigate risks from impactful service incidents, such as unplanned outages. This entails identifying the best-suited people in the customer’s organization to be alerted of incidents impacting each subscription and ensuring those people are configured to receive alerts via their preferred channel for every condition deemed necessary for that workload. It also ensures these individuals have instructions available to follow in response to various incident types.

In the past, customer account managers led the creation and onboarding of emergency response plans for customers. For this project, we were asked to design the digital version of response plans in the customer-facing portal for a large enterprise software company.

The challenge

I worked on this project with a team of 11: five program managers (PMs), three designers (including me), two engineers, and one researcher.

I served as lead designer and also played a supporting role in research.

My role

The timeline

Kickoff: August 1, 2022

To begin, I analyzed business specifications, requirements, and strategy documents and participated in reviews to understand the project OKRs and which KPIs design would be accountable for. I also analyzed artifacts related to the legacy emergency planning process and contributed this insight in stakeholder reviews.

Next, we met regularly to analyze tasks in the legacy process, model workflows for the new digital process, and review design prototypes based on chosen workflows.

Afterward, we reached out to the customer account managers of high-priority customers to gather feedback and ask for access to customer research participants.

Lastly, we conducted research with customer participants to validate and test our preliminary design and gain insight for future iterations.

The approach

  • Discovery

    1. Spec review

    2. Artifact review

    3. Stakeholder interviews

  • Exploration

    1. Task analysis

    2. Workflow modeling

    3. Prototyping & Design Review

  • Testing

    1. Participant recruiting

    2. Semi-structured interview

    3. Cognitive walkthrough

  • Listening

    1. Persona creation

    2. Feedback review

How I contributed

Service incidents, as defined by the large enterprise software company, are events that typically cause multiple customers to have a degraded experience with one or more of their services.

The problem

DISCOVERY

But when the software company says, “this isn’t a major incident,” its customers don’t always agree.

Regardless of whether their customers agree, major incidents impacting a single customer warrant a thorough response and communication plan, whether it’s driven by the software company or the customer organization.

The legacy response plan process is based on static PPT documents, which are difficult to read and understand. They contain high-level concepts, terms and definitions, protocols, and information about support channels and the expected frequency of incident updates. A different PPT document is used for each of the company’s three cloud portals.

Response plan PPTs are specifically tailored for individual customers by the account managers. These PPTs can be up to 95 pages long and often take hours to prepare and deliver, which ultimately costs the company a large amount of money over time.

Importantly, the document contains forms where customers are instructed to start gathering key information about their response plan, who is part of it, and what their roles and responsibilities will be.

Plans created in last 6 months

4,144

Rate per hour

x $250

Time to create a plan

x 3 hours

Total cost of creating plans over the last 6 months

= $3,108,000
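The cost figure above can be reproduced with a few lines of arithmetic. This is only a sketch for checking the numbers: the values come from the slides, while the variable names are my own.

```python
# Figures from the case study; variable names are illustrative only.
plans_created = 4_144    # plans created in the last 6 months
hourly_rate_usd = 250    # rate per hour
hours_per_plan = 3       # time to create a plan

# Total cost = number of plans x hourly rate x hours per plan
total_cost_usd = plans_created * hourly_rate_usd * hours_per_plan

print(f"${total_cost_usd:,}")  # prints "$3,108,000"
```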

The goal

DISCOVERY

Target increase in customer adoption

20%

Target reduction in plan creation time

83.3%

Target cost savings over next 6 months

$2,590,000
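The savings target follows from the reduction target if two things are assumed: that 83.3% is five-sixths rounded, and that plan volume and hourly rate stay the same as in the previous six months (the 20% adoption target is not factored in). Under those assumptions, which are mine rather than stated in the project documents, the arithmetic works out as follows.

```python
from fractions import Fraction

# Baseline from the previous six months: plans x hourly rate x hours per plan.
baseline_cost_usd = 4_144 * 250 * 3

# Assumption: the 83.3% target is 5/6 rounded to one decimal place.
time_reduction = Fraction(5, 6)

# Assumption: plan volume and hourly rate are unchanged over the next six months.
target_hours_per_plan = 3 * (1 - time_reduction)       # 1/2 hour, down from 3 hours
target_savings_usd = baseline_cost_usd * time_reduction

print(float(target_hours_per_plan))     # 0.5
print(f"${int(target_savings_usd):,}")  # $2,590,000
```

In other words, the target amounts to cutting plan creation from about three hours to about thirty minutes.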

Additional UX Goals

  • Reduce friction and inefficiencies

  • Benchmark plan CSAT & incident readiness

EXPLORATION

Task analysis & Workflow modeling

After analyzing the existing PPT plans for all three cloud portals, I identified eight primary categories and 15 subprocesses within them. Of these, the team identified five subprocesses across two categories as critical to a minimum viable experience, or P0 design requirements:

Definitions

  • What is a cloud service emergency?

  • What is a cloud service emergency response plan?

Prepare

  • Identify key resources (subscriptions)

  • Identify key contacts

  • Configure alerts and notifications
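To make the three Prepare subprocesses concrete, here is a minimal sketch of how a response plan could be modeled as data: a subscription, its key contacts, and the alert conditions each contact is notified about. Every class, field, and value below is hypothetical and is not taken from the actual portal or its API.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    preferred_channel: str  # hypothetical channel names, e.g. "email" or "sms"


@dataclass
class AlertRule:
    condition: str            # hypothetical condition label, e.g. "unplanned_outage"
    contacts: list[Contact]   # who is alerted when the condition occurs


@dataclass
class ResponsePlan:
    subscription_id: str      # the key resource this plan covers
    alert_rules: list[AlertRule] = field(default_factory=list)

    def contacts_for(self, condition: str) -> list[Contact]:
        """Return every contact configured to be alerted for a condition."""
        return [
            contact
            for rule in self.alert_rules
            if rule.condition == condition
            for contact in rule.contacts
        ]


# Example: one subscription, one contact, one alert condition.
plan = ResponsePlan(
    subscription_id="sub-001",
    alert_rules=[AlertRule("unplanned_outage", [Contact("On-call lead", "email")])],
)
notified = plan.contacts_for("unplanned_outage")
print([(c.name, c.preferred_channel) for c in notified])
```

Keeping contacts attached to alert rules, rather than to the plan as a whole, mirrors the idea that different people may need to hear about different conditions for the same subscription.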

EXPLORATION

Design review & Prototyping

TESTING

Semi-structured interviews

Next, I conducted three semi-structured interviews, each with at least one customer account manager representing a high-priority enterprise customer. During these interviews, I demonstrated the proposed design via an interactive Figma prototype, asked questions, and recorded feedback for analysis.

Research objectives

  • Ensure main features are discoverable, understandable, and usable

  • Identify opportunities for improvement

  • Gain access to customer research participants

Participants

  • Five customer account managers representing three high-priority enterprise customers

TESTING

Semi-structured interviews: Key findings

Customers need a way to communicate known issues internally and escalate them to other members of their emergency response teams, even if the software company does not deem them a major incident.

People

Customers need to designate and display at least one email distribution list address with their response plan to enable communication with the internal customer response team members who provide their organization with around-the-clock (24/7) emergency coverage.

Repeated exposure to multiple formats of response plan training material is usually needed before customers fully understand response planning.

Platform

Customers need to be able to reference all of the categories and subprocesses from the legacy response plans in the digital experience.

Process

Cognitive walkthrough

TESTING

Additionally, I helped design, run, and observe one cognitive walkthrough with two customer account managers from a fourth high-priority customer. In this study, we learned about participants’ roles and duties, how often they interact with customers, and their current level of satisfaction with response planning. We also posed the following tasks for participants to perform by telling us how they would interact with the prototype in each scenario:

  1. Create a workload with a response plan

  2. Take action on a workload response plan that needs review

  3. Take action on a workload with a response plan that is actively being impacted by a service incident

Research objectives

  • Benchmark current user experience for comparison against P0 designs

  • Ensure main features are discoverable, understandable, and usable

  • Identify opportunities for improvement

  • Assess customers’ expectations and behaviors when choosing specific workload classifications

  • Identify customers’ needs for role-based access in response planning

Participants

  • Two customer account managers representing one high-priority customer

TESTING

Cognitive walkthrough: Key findings

Customer account managers expect a specific individual to be the owner of the response planning process for the customer. They expect this individual customer role to be responsible for communicating updates regarding service incidents both internally, to response team members, and externally, to the software company’s incident management department.

People

The customer role that owns response planning is not the same customer role that would own workload creation and management.

Customer account managers update and review response plans with their customers every time the customer adds an application to their workload. Because every such change requires a manual update, the PPT documents quickly become outdated.

Platform

Customers don’t understand the importance and criticality of the response plan document.

Process

Internal proto-personas

LISTENING

Customer proto-personas

LISTENING

TESTING

Recommendations

Make response plan onboarding and training material more organized and suitable for customer con

People

Make response plan training material comprehensive, covering all related processes and subprocesses.

Allow customer users to gather and display all necessary customer contacts and distribution list addresses in the response plan UI.

Make response plans dynamically update in response to customer workload subscriptions, so that customer account managers don’t have to update anything when a customer removes or adds an application to their workload.

Process

Feedback review

LISTENING

Lessons learned

  • Rally project partners early

  • Benchmark the legacy customer experience

  • Map customer and internal user journeys

  • Advocate for better customer experience

  • Be on the lookout for teams like this one:

    • Growth mindset oriented

    • Empathetic

    • Passionate

    • Proactive

    • Brilliant!