Overview
Neel talks with Laduram about SRE, the evolution of system visibility over our careers, and how to actually quantify toil.
What We Talk About
- Defining SRE as “Ops with Boundaries”
- WhatsUp Gold
- Has System Visibility Increased Over Time?
- Anti-Pattern: SRE as Keepers of Observability
- SRE Education at the School of Hard Knocks
- Reflecting on 15 Years of On-Call
- Holidays On-Call
- Tracking and Quantifying Toil
- Should Engineering Managers Be On-Call?
LD’s Recommended Reads
“So you have sources of toil that are hitting your team, hitting your engineers, sapping all their time and productivity. What do you do in the beginning? As with anything in DevOps and SRE, we need the data to back up our decision making. So we need to track our time. I’m not saying that we track our time writing code, but everything that is toilsome, we’re tracking our time on it.”
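As a rough sketch of what “tracking our time” on toil could look like, assume a hypothetical log in which each entry records an engineer, a toil category, and the minutes spent. The Python below totals toil per category, which surfaces the biggest time sinks first, and computes what share of one engineer’s working time went to toil. ToilEntry, toil_by_category, and toil_share are illustrative names only, not part of any particular tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToilEntry:
    """One logged block of toilsome work (hypothetical log format)."""
    engineer: str
    category: str  # e.g. "manual deploy", "ticket triage"
    minutes: int


def toil_by_category(entries):
    """Total logged toil minutes per category, largest first."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry.category] += entry.minutes
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


def toil_share(entries, engineer, working_minutes):
    """Fraction of one engineer's working time spent on logged toil."""
    if working_minutes <= 0:
        return 0.0  # avoid dividing by zero for an empty period
    spent = sum(e.minutes for e in entries if e.engineer == engineer)
    return spent / working_minutes


if __name__ == "__main__":
    log = [
        ToilEntry("alice", "manual deploy", 90),
        ToilEntry("alice", "ticket triage", 150),
        ToilEntry("bob", "manual deploy", 120),
        ToilEntry("bob", "cert renewal", 30),
    ]
    print(toil_by_category(log))
    # [('manual deploy', 210), ('ticket triage', 150), ('cert renewal', 30)]
    # Share of a hypothetical 40-hour week (2400 minutes) spent on toil.
    print(f"{toil_share(log, 'alice', 40 * 60):.1%}")
```

Ranking categories by total minutes is one simple way to decide which toil to automate first; a real team might also track frequency or who is interrupted, which this sketch leaves out.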
Transcript
Paige Cruz: Hi there, listeners. Before we dive into today’s episode, a quick note. Due to some technical hiccups with our previous recording platform, this is actually the third take of this amazing conversation with my guest today, Amin. So you might hear us referencing earlier attempts or past conversations, and now you know why.
Thanks for sticking with us, and let’s jump in.
Today’s topic is near and dear to my heart: Site Reliability Engineering, or SRE. I am so delighted to be joined by Amin Astaneh, who runs Certo Modo, an SRE and DevOps consultancy. Hello, hello! How’s it going today?
Amin Astaneh: It’s good to be here, Paige. Yeah, thanks for having me on. Try number three.
Defining SRE as “Ops with Boundaries”
Paige Cruz: I want to start us off with the basics. What exactly is SRE? Earlier, you mentioned a phrase that has been rolling around in my head ever since: “SRE is Ops with Boundaries.” What does that mean to you?
Amin Astaneh: So let’s go all the way back to the beginning of the literature. SRE is what happens when you ask a software engineer to design an operations function.


