Fudge Sunday - Cloud in Public: DevCommsOps
by Jay Cuthrell
This week we continue to take a look at public things for a public cloud.
☁️✅⚠️🛑
This issue is part 3 of a 5-part series:
- Fudge Sunday - Cloud in Public: Status Dashboards
- Fudge Sunday - Cloud in Public: Engineering SLO
- Fudge Sunday - Cloud in Public: DevCommsOps
- Fudge Sunday - Cloud in Public: Mean Time To RCA
- Fudge Sunday - Cloud in Public: Impact Mapping
When I wrote about The Perfect Team, I summarized its roles as: do it, write it down, and think ahead. We now have a historical perspective and definitions for status dashboards and the Engineering SLO. Next, let’s talk about how “write it down” can be expressed as various forms of communication in DevOps cultures.
DevCommsOps is best described as the purposeful insertion of change management communications into a DevOps culture, and the conspicuous expression of those communications. To unpack that neologism a bit, imagine the things we want (need?) to know about change: what is planned, achieved, deferred, or failed, and what outcome resulted.
Recall that Error Budgets, Uptime, and SLOs are simply ways to describe the operational objective of staying up and running, balanced against the innovation demands of developing new features, functionality, and availability for services. As such, DevCommsOps provides a consistent and conspicuous account of the changes, whether planned, in progress, or completed, that draw against Error Budgets.
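To make “draw against Error Budgets” concrete, here is a minimal sketch in Python. It assumes an availability SLO measured over a 30-day window and a hypothetical `ChangeNotice` record whose `downtime_minutes` field is the downtime attributed to a change. The names, fields, and sample notices are illustrative assumptions, not any provider’s actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeStatus(Enum):
    # The change states named in the text above.
    PLANNED = "planned"
    ACHIEVED = "achieved"
    DEFERRED = "deferred"
    FAILED = "failed"


@dataclass
class ChangeNotice:
    # Hypothetical record for one published change communication.
    summary: str
    status: ChangeStatus
    downtime_minutes: float = 0.0  # assumed: downtime attributed to this change


def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    return (1.0 - slo) * window_days * 24 * 60


def remaining_budget(slo: float, notices: list[ChangeNotice],
                     window_days: int = 30) -> float:
    """Error budget left after subtracting downtime attributed to changes."""
    spent = sum(n.downtime_minutes for n in notices)
    return error_budget_minutes(slo, window_days) - spent


# Made-up sample notices, for illustration only.
notices = [
    ChangeNotice("Roll out new API version", ChangeStatus.ACHIEVED, 5.0),
    ChangeNotice("Database failover drill", ChangeStatus.FAILED, 12.5),
    ChangeNotice("Region expansion", ChangeStatus.PLANNED),
]

# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
print(round(remaining_budget(0.999, notices), 1))  # 25.7
```

In this sketch, planned and deferred changes carry no downtime yet, so they appear in the account without drawing on the budget until they actually happen.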
Is DevCommsOps a word soup for Changelog, Release Notes, and Error Budget tracking? Perhaps! In practice, much like the growing depth of status dashboards, a single Changelog is more symbolic than practical as a single page for following all change.
Is DevCommsOps a word soup for a post-ChatOps world within the context of Error Budget economic policy? Perhaps! However, definitions of ChatOps are likely to vary from vendor to vendor and among practitioner pioneers.
Luckily, there’s always a cat meme ready to help us better understand.
Vive La ChatOps!
Rare image of a reproduction of the Chatops pyramid https://t.co/KLEYyqyTkL
DevCommsOps in practice
1. Who do cloud companies “write it down” for? The public? Personalized audiences?
2. What do cloud companies “write down”?
3. Where do cloud companies “write it down”?
4. When do cloud companies “write it down”?
5. Why do cloud companies “write it down”?
Let’s take questions 1-3 in this issue and leave 4-5 for the following issues in the series.
To provide examples, let’s examine where DevCommsOps is found within the hyperscale cloud service providers today using a basic search for “Release Notes,” “Changelog,” “Notices / Maintenance / Announcements,” and “Root Cause Analyses (RCAs) / Incidents.” The list is ordered only by name length, shortest to longest, with no other weighting.
IBM Cloud
- “release notes” = 224 hits
- “changelog” = 81 hits
- “maintenance” = 9 hits
- “announcement” = 25 hits (round robin) since July 23, 2021
- “incidents” = 9 hits (as PDFs)
Alibaba Cloud
- “release notes” = 51 hits
- “changelog” = 9 hits
- “notices” = 420 hits
Microsoft Azure
- “release notes” = 726 hits
- “changelog” = 46 hits
- “RCAs” = 7 pages going back to November 20, 2019
Amazon Web Services
- “release notes” = ??? (100s? 1000s?)
- “changelog” = ??? (100s? 1000s?)
- “post event summaries” = 14 RCAs for major service events
Google Cloud Platform
- “release notes” = ??? (100s? 1000s?)
- “changelog” = ??? (100s? 1000s?)
- “incidents” = 140 over a 12-month period (round robin)
Oracle Cloud Infrastructure
- “release notes” = 1118 hits
- “changelog” = 6 hits
- “incidents” = 34 over a 3-month period (paginated)
Notes:
- As of this brief exercise, the only hyperscale cloud service provider that appears to have a “single page” approach to Release Notes and Changelog is Oracle Cloud Infrastructure.
- Compared to AWS’s focus on “major” service events, Google Cloud Platform “incidents,” Oracle Cloud Infrastructure “incidents,” and Microsoft Azure RCAs are more granular and historically accessible IMHO.
- OCI Status appears to be using Atlassian Statuspage.
- IBM Cloud publishes incident reports as PDFs.
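For anyone who wants to repeat this exercise later and compare, the search counts above can be kept as plain data. The sketch below is only an illustration: it covers just the two terms searched at every provider, uses `None` for the “???” entries, and the `rank_by` helper is a hypothetical name.

```python
from typing import Optional

# Counts copied from the lists above; None marks the "???" entries.
SEARCH_HITS: dict[str, dict[str, Optional[int]]] = {
    "IBM Cloud": {"release notes": 224, "changelog": 81},
    "Alibaba Cloud": {"release notes": 51, "changelog": 9},
    "Microsoft Azure": {"release notes": 726, "changelog": 46},
    "Amazon Web Services": {"release notes": None, "changelog": None},
    "Google Cloud Platform": {"release notes": None, "changelog": None},
    "Oracle Cloud Infrastructure": {"release notes": 1118, "changelog": 6},
}


def rank_by(term: str) -> tuple[list[str], list[str]]:
    """Return (providers sorted by hit count, providers with unknown counts)."""
    known = {p: hits[term] for p, hits in SEARCH_HITS.items()
             if hits[term] is not None}
    unknown = [p for p, hits in SEARCH_HITS.items() if hits[term] is None]
    ranked = sorted(known, key=known.get, reverse=True)
    return ranked, unknown


ranked, unknown = rank_by("release notes")
print("Ranked:", ranked)
print("Unknown:", unknown)
```

Keeping the unknowns separate, rather than treating them as zero, avoids implying that AWS or Google Cloud Platform publish fewer release notes than the others.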
While there are variations amongst the hyperscalers in expressing DevCommsOps, it is essential to remember that personalized communications are less visible from a public perspective. Personalization falls outside the examples above because personalized views are not public representations.
At the same time, personalized views are unique to each customer’s experience, which is a topic for our next issue on time to published communications and dependency mapping.
At this point, we have established definitions for status dashboards and the Engineering SLO, set against the backdrop of DevOps culture communications in the form of DevCommsOps. We now have a baseline for comparing timing and dependencies.
In the remaining two issues of the series, we will examine the time involved in publishing “Root Cause Analyses (RCAs) / Incidents” and the value of dependency mapping. We will also look at the increasing importance of dependency mapping for the future. The answers to “When and Why” from questions 4-5 above are coming soon.
Stay tuned!
Disclosure
I am linking to my disclosure.