Overview

Aniket Kanade has 20+ years of experience in the telco industry and has worked closely with cloud-native tools and system architecture. He talks with Sri Krishna about his experience in tech, especially around the observability space.

What We Talk About

  • Aniket’s journey from the early internet era to the present day.
  • Evolution of the technology sector, from bare metal servers to AGI.
  • Focus on the telecom industry’s transformation and adoption of cloud-native technologies.
  • The importance and challenges of observability in modern tech.
  • Future trends in observability, including AI and OpenTelemetry.

Transcript

Sri: Hello everyone and welcome to another podcast from Olly Talks.

Today we have with us Aniket, who is a senior DevOps technologist at HP Enterprise and has been a serious technology professional for the last two decades, with several amazing achievements in this area. He is, of course, also an active member of the CNCF community and has achieved several things across the facets of the cloud-native industry and Kubernetes.

So welcome to the podcast Aniket, how are you?

Aniket: I'm good. Thanks, Sri Krishna. Thanks for the introduction, and glad to be here.

Sri: Absolutely, Aniket, absolutely.

So I was just going through your profile, and you know, you started out extremely early, probably during the internet boom in the early 2000s. Could you walk us through your journey, and how you've seen the technology sector overall go from what it was when the internet was just becoming a big thing, to what it is today, when we are heading down a path towards, someday, AGI?

Aniket: Yeah, sure.

So as you mentioned, I go back a long way. When I started, in the first decade of this millennium, around 2001-2002, things were completely different from what we see today.

If you rewind from where we are today, the Dockerized, containerized world: before Docker there was the virtualized world, and before virtualization there were bare metal servers and supercomputers.

So when I started, you know, we had all our products on bare metal. I have been a technical telecom professional across these two decades, so that has been the mainstay.

I have worked on different telecom technologies. When I started working, it was the beginning of 3G, and we started hearing about GPRS. You probably don't even know EDGE and GPRS; everybody was happy just that there was internet on your phone.

So that's when I started, and I started with a bare metal server called HP NonStop, another word for which is Tandem. A lot of real-time systems at that time were hosted on Tandem, like stock markets and telecom, where you need real-time information: the latency needs to be really low, the availability needs to be very high, and the reliability needs to be there.

So at that time our products were on HP NonStop. Then, around the end of 2008 and 2009, we started seeing the emergence of the virtualized world. Instead of having things on bare metal, we now had them on VMs. That was also the time when the entire world converged on Linux as the operating system for the back end; until then there was everything, and NonStop had its own operating system, but that was when we moved to Linux and to virtualized deployment of our products. Then towards 2015: we all know that Kubernetes came into existence in 2014, and Docker maybe a few years before that.

So everybody had started thinking about containerizing their workloads, and maybe around 2016, 2017, 2018, in my company and in our product as well, the discussions and work started towards hosting things on Docker.

I would say that the impetus for going to a containerized platform was Kubernetes more than Docker.

When there was only Docker, the discussions did not really gain momentum; it wasn't "yeah, let's put everything in containers."

It happened with Kubernetes, because Kubernetes provided the framework: the availability framework, the reliability framework, the things that industry wants. Just one container doesn't work.

So with the orchestration that Kubernetes provided, everybody leaped on it, and that's when we also deployed our products as cloud native. And I'll tell you why this transformation happened, especially in telecom: for us, the world was moving from 4G to 5G, and 4G to 5G is not just a change of spectrum in telecom. 4G to 5G was a change in philosophy, in the way things are deployed.

So even 3GPP, which is the standards committee that comes up with telecom standards, came up with this CNF terminology, containerized network functions, and with the idea of different components fitting together. The framework they developed was perfect for something to be deployed on a Kubernetes type of cloud native platform.

So even from the standards point of view, the committee wanted all the telecom operators to go towards cloud native. And that's when we developed our products, especially the 5G products, exclusively on Kubernetes and cloud native, whereas our 4G products are still in transition. Because they were developed earlier, they still work as virtualized as well as containerized workloads, but our 5G products are fully containerized workloads.

So I personally started working on cloud native platforms around 2018-19, five years ago. Right now I'm working as a delivery lead, more on the operations side, for our telecom customer in the US.

Sri: Awesome, that was a good trip down memory lane, to a point where I was still in my schooling.

Aniket: Actually, when I started, that was a time when you could actually see the servers running your workload. Right now, in the virtualized and containerized world, you don't see where your code is running, but we could see it. That's when I started.

Sri: That kind of makes me curious. Nowadays, you know, DevOps, or getting into data science, ever since a lot of these roles got that massive hype around them, I can see why a lot of the younger generation has such a keen interest in getting into this, considering it's well paid and all. But back in the early 2000s, what inspired you to get into this line of work, especially within an industry like telecom?

Aniket: I don't think I gave a lot of thought to it at that time, basically. And I did not start with telecom. When I started working, I started at the Bombay Stock Exchange. That's when I got introduced to the idea of servers, and that's where I got introduced to Tandem, the NonStop platform that I told you about. The stock exchange used to run everything on NonStop at that time, and when we worked there we could see the servers; the server room was behind us while we were working on the consoles.

So at that time I think I got fascinated by the idea of Tandem supercomputers. HP was developing things on Tandem and had its own telecom products, so then I moved to HP, and I think I just continued there, building my expertise in telecom. But definitely the world was very different at that time.

Sri: Right, right. And as of today, what would you say comes under your areas of responsibility?

Aniket: So today, as I told you, we work for a customer in the US, and that customer has our products installed on several clusters. The way it works there, because of security reasons (you know, big enterprises are always scared of security breaches), they operate everything in disconnected mode. Disconnected mode means no connection to the internet, so you have to do offline installs.

So my role is mainly to manage the platform. All the development happens in R&D centers; my role is to deploy it on different platforms in the lab, test it in the lab, make sure that everything happens smoothly in the lab, and then it gets deployed in production. If something goes wrong in production, I handle it. So I have an overall supervisory role here in deployments and delivery, as a delivery lead for the project.

Sri: Right, right. And within your role, what would you say are some of the most challenging scenarios or projects that you have come across, and how exactly did you address them?

Aniket: So we saw a lot of transformation even within the cloud native space. I did not start as a leader; I joined the project when someone else was leading the engagement, and at that time we had on-prem Kubernetes, so we used to install our own Kubernetes and manage it. But then we realized that managing Kubernetes on our own, especially in offline mode where you have to keep it up to date with newer versions across all the clusters, means you become responsible for anything that goes wrong at the cluster level.

We as an organization decided to go to a managed Kubernetes service, Red Hat OpenShift. That was our first transition, from vanilla self-managed Kubernetes to OpenShift. But then we had a different set of challenges, because of how OpenShift is designed.

I would say it is designed to work in online mode, where you have connectivity to the Red Hat registry: it downloads all the images to your registry and installs from there. In disconnected mode you need to mirror everything. OpenShift has its own opinionated framework for everything that is needed; OpenShift is also the one that came up with the operator concept for the first time, and everything in OpenShift used to be CRDs and CRs. So having to download gigabytes of images onto your on-prem hard drive, then creating a registry out of them, then installing them on different clusters and managing it all, that was our first challenge. But we successfully migrated from self-managed Kubernetes to OpenShift. Then at a later point, after everything had stabilized on OpenShift for maybe two, two and a half years, and we had got used to doing things in OpenShift in an offline manner (mirror the components that you need and install them), the customer started an initiative of moving to a hybrid cloud model, and that was done on Azure.
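To make the mirroring idea concrete, here is a minimal sketch (plain Python, not the actual `oc` tooling, and the registry hostnames are hypothetical) of the reference-rewriting step in an air-gapped install: every public image reference gets re-pointed at the on-prem registry while the repository path and tag are kept.

```python
def mirror_ref(image_ref: str, local_registry: str) -> str:
    """Rewrite a public image reference so it points at an on-prem
    mirror registry, keeping the repository path and tag/digest."""
    # Split off the original registry host (the part before the first '/').
    host, _, repo = image_ref.partition("/")
    return f"{local_registry}/{repo}"

# Images normally pulled from public registries get re-pointed at the
# disconnected registry inside the customer's data center.
images = [
    "registry.redhat.io/openshift4/ose-oauth-proxy:v4.14",
    "quay.io/prometheus/prometheus:v2.48.0",
]
mirrored = [mirror_ref(i, "registry.onprem.example:5000") for i in images]
for ref in mirrored:
    print(ref)
```

The real workflow (mirroring the image blobs themselves, serving them from a local registry, and telling the cluster about the source-to-mirror mapping) is much more involved, but this is the core idea of why "no internet" turns a routine install into a logistics exercise.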

So the way Azure hybrid cloud works is that you have the data centers on-prem, but they are managed through a link, a connection between the big public Azure cloud and the on-prem data center. In public Azure, what you do is go to portal.azure.com, create the cluster, download the kubeconfig file, and manage it. Here, similarly, there is the portal.azure.com linkage, except that the cluster you are connecting to is not hosted in a Microsoft data center; it is hosted on-prem at the customer's side. So that was a compromise our customer made in terms of their philosophy of not wanting any internet connectivity. Then the challenge was that we had to abandon everything on OpenShift, because OpenShift has its own framework and way of doing things that we could not port over: Azure was giving us a Kubernetes cluster which is completely different from the Kubernetes cluster that OpenShift gave us. So we had to reinvent the wheel and do things differently for Azure. It's called Azure Operator Nexus, AON, and I was leading that venture when we moved from OpenShift to AON, and I continue to lead it.

Sri: Interesting, Aniket. And now, getting to the crux of this conversation: where do you see the whole concept of observability fitting into your line of work, or your industry as a whole? We can start with your industry in general and then trickle down to your specific client projects.

Aniket: So I think observability matters a lot for telecom. Actually, even before the term observability existed, or Prometheus existed, telecom operators used to have something called a NOC, a network operations center. I have visited the NOC at Reliance Communications, and it's huge: a huge room with a lot of screens on the wall, and people watching everything. Because as a telecom operator you have several parts; if you take the example of Reliance Communications, there would be various regions of India, various components deployed across those regions, and they want a central view of everything that's happening in the network across the country. So those screens used to display pain points, or the latency, or the number of calls, all sorts of things that matter to a telecom. That was the earliest version of observability I had known. Now we want to take all of this and pack it into one screen in the cloud native world of observability.

I mean, it's not going to be easy, but observability matters a lot for every telecom operator: they need to be able to see whether the network is functioning properly in every corner of their deployment, sitting centrally somewhere, and also potentially forecast whether problems are likely to happen somewhere. That's the crux of observability for telecom as an industry, and I think it applies similarly to my customer and where we are working. Although I would say that this philosophy of wanting everything disconnected, or air-gapped from the internet, becomes an impediment, a roadblock, to having good observability, because cloud observability and the things you can do when you put your data on a big cloud are different from what you can do on-prem. That's my personal opinion.

Sri: Right. More on that area: what would you say are perhaps the biggest challenges to achieving observability within your line of work? What are the biggest roadblocks you have seen over all these years on the path towards full stack observability?

Aniket: I would say the challenge for us in the project has always been the changing landscape of the products we use in observability. Because the cloud native space is evolving so fast, you don't have a fixed set of products and tools that you use for observability: we start using something, and in some time it goes outdated, or gets superseded, or the industry moves on to something else.

I'll give you an example. When we started, as a logging framework I think everybody used Elasticsearch and Kibana. We started with Elasticsearch and Kibana and deployed things on them. Then suddenly Elasticsearch is, I think, no longer open source, Kibana is not favored by anybody, and we are moving to Grafana Loki. So that way, when you have such an evolving landscape, you have to keep adapting to the different products that come up. Hopefully Grafana Loki stays stable for a while, but that's the current migration we are working on: moving from Kibana to a Grafana Loki kind of framework.
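One common way to cushion this kind of churn (a generic pattern, not something described in the interview; the class and label names are illustrative) is to put a thin, backend-neutral interface between application code and the logging backend, so a Kibana-to-Loki style migration means swapping one adapter rather than touching every service:

```python
from abc import ABC, abstractmethod

class LogSink(ABC):
    """Backend-neutral interface the application logs against."""
    @abstractmethod
    def emit(self, message: str, labels: dict) -> None: ...

class ElasticsearchSink(LogSink):
    def emit(self, message, labels):
        # Real code would index a JSON document into Elasticsearch.
        print(f"[es] {labels} {message}")

class LokiSink(LogSink):
    def emit(self, message, labels):
        # Real code would push to Loki's HTTP API as a labeled stream.
        print(f"[loki] {labels} {message}")

def handle_call(sink: LogSink):
    # Application code never names the backend, so migrating from
    # Elasticsearch/Kibana to Grafana Loki is a one-line swap.
    sink.emit("call setup complete", {"region": "us-east", "app": "5g-core"})

handle_call(ElasticsearchSink())
handle_call(LokiSink())  # migration: swap the adapter, not the app
```

This is also, in miniature, the argument for OpenTelemetry that comes up next: standardize the interface so the products behind it can change.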

Even OpenTelemetry, I would say, this whole emergence of OpenTelemetry: maybe somebody in the industry also felt the need for standardizing things, instead of people going from product to product and having to change everything. We have started moving to OpenTelemetry, so once we go to OpenTelemetry, maybe things will stabilize there.

Sri: Right, right. The whole idea of migrating from Kibana to Grafana brings up a really good question that I often see leaders breaking their heads over: would you rather choose a host of tools, each good at one specific use case, or go with one proper enterprise tool that can do it all? What would be an ideal approach, depending on the circumstances?

Aniket: I don't think one enterprise tool that does it all would fit with the cloud native philosophy, even if it would be a good thing to have. One tool that does it all gives you vendor lock-in with the vendor that developed it, and when vendor lock-in comes in, you don't have the ability to control what features you want; you are driven by the vendor rather than you driving the vendor.

So having different tools for different things should still be fine, provided we achieve a level of stability. That's why I think a framework like OpenTelemetry is the best idea: it is not a tool, but a layer that distributes things in a more uniform way, kind of defining a protocol for observability.

So I don't think one tool should do everything for an enterprise. If that tool works on top of OpenTelemetry, it's fine, but if it has very customized interfaces, that might not be such a great idea.

Sri: Right, right. And in your line of expertise, what would you say are some of the best practices for somebody who's just starting out on their observability journey right now and buying into the hype of the whole thing? Where should they start?

Aniket: I think observability is so huge. I'm sure that whoever wants to start with observability should start with OpenTelemetry right now, learning all the pieces and nuts and bolts of OpenTelemetry and the way it works. OpenTelemetry especially, because OpenTelemetry has adopted the Prometheus framework for metrics; Prometheus is so pervasive that you cannot have something else for metrics instead of Prometheus. And OpenTelemetry, frankly speaking, right now is mainly for tracing. So I guess that is a good place to start: learning, if you are interested in tracing, how spans work and how the different layers of spans fit together in OpenTelemetry tracing, then how metrics are stored in Prometheus and how they're visualized in Grafana. These are the standard places to start: Grafana, Prometheus, Jaeger, and tracing.
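To make the span idea concrete for a newcomer, here is a minimal toy sketch (plain Python, deliberately not the real OpenTelemetry SDK) of what tracing captures: each span records its parent and its duration, and together the finished spans let a backend like Jaeger rebuild the call tree of a request.

```python
import time
from contextlib import contextmanager

# A trace is just a collection of finished spans; each span remembers
# its parent, so a backend can reconstruct the nesting.
finished_spans = []
_span_stack = []

@contextmanager
def span(name):
    parent = _span_stack[-1] if _span_stack else None
    record = {"name": name, "parent": parent, "start": time.time()}
    _span_stack.append(name)
    try:
        yield record
    finally:
        _span_stack.pop()
        record["duration_s"] = time.time() - record["start"]
        finished_spans.append(record)

# One incoming request fans out into two nested operations.
with span("handle_request"):
    with span("db_query"):
        time.sleep(0.01)
    with span("render_response"):
        time.sleep(0.01)

for s in finished_spans:
    print(s["name"], "parent:", s["parent"])
```

The real SDK adds trace and span IDs, context propagation across process boundaries, and exporters, but the parent/child/duration structure above is the core of what "how spans work" means.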

Sri: Right, right. And how should leaders go about this? Say they have a specific use case and they're not new to the concept of observability, but every now and then a newer buzzword keeps cropping up, even within observability itself: there was LLM observability, and now AI-powered everything. When leaders try to venture into all this, how should they go about analyzing whether this is a solution they actually need, or a kind of hype they're buying into out of a FOMO effect?

Aniket: Honestly, I don't think AI observability is just hype; it's probably real, and observability is something where AI has a lot of application, if you can analyze the data and train a model on it. Coming back to the telecom questions: telecom is about information flow. You switch on the phone, a message gets sent to an entity called the AMF that registers the phone into the network, there is the radio RAN (AI-RAN is also a buzzword in telecom now). So if you could take all the telecom data collected over the years and feed it to a model, the ability of AI to forecast things, or problems before they happen, is going to be really useful for telecom operators, and that applies to any industry in general. So AI in observability, I don't think, is hype. It will evolve, it will mature over time, but definitely observability is going the AI way, and the two fit together very well.
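As a trivial illustration of the forecasting idea (a hand-rolled statistical baseline, not an LLM or any production telecom model; the latency numbers are made up), here is a sketch that flags a metric sample as anomalous when it drifts far from the rolling average of recent samples:

```python
from statistics import mean, stdev

def anomalies(series, window=5, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the rolling mean of the previous `window` samples."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Call-latency samples (ms): stable around 20, with one spike.
latency = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
print(anomalies(latency))  # the spike at index 7 is flagged
```

A model trained on years of network data would look nothing like this, but the shape of the problem is the same: learn what normal looks like, then surface deviations before they become outages.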

Sri: Right, but the question I was getting at, Aniket, is this: as things stand, most enterprise leaders want to hop onto the next cool thing coming into the market, and observability, if you go the enterprise route, is very expensive. As per a recent report, people are shelling out somewhere between 500k and one million dollars annually on observability. While that is not a cup of tea for startups, who probably take the open source route or dabble with smaller players, the bigger players are spending a lot. So how should they go about analyzing: fine, there are newer platforms that are expensive and offer AI observability, but do I need it in the first place?

Aniket: I think we are making a good point here. For every enterprise there are two questions: do you need it in the first place, and has it evolved enough to be useful to you right now? Answering those questions is crucial for leaders. Even if you think as a company that you need AI-based observability, you still need to evaluate whether it has evolved to a point where you can actually make use of it, or whether you would just have a line in your annual report saying that you have implemented AI observability while it gives you no insights, or you don't know how to use the insights it does give. So an organization needs to take a call on whether they need AI observability and whether they are ready for it, then see how it fits into their cost model, and decide. But jumping on the train just because everyone else says so definitely does not make sense, certainly not for small and medium scale enterprises. And even large enterprises, where the need is clear and they have the budget, may still not have the manpower or the skill sets to make use of those AI observability capabilities. So they need to work on re-skilling, or on hiring external agencies who can make use of that AI observability and give them meaningful insights.

Sri: Right. And in the grand scheme of things, when you converse with industry leaders or peers, have you come across any common misconceptions people often have about observability? For instance, like what you just mentioned about having the manpower: a lot of observability tools out there tell you, we make your developers' lives easier, we help them debug faster, troubleshoot faster, we'll help you with resource utilization or reallocation, and things like that. But that doesn't undercut the fact that you need a lot of developers to start things off, and neither does it mean that once you have it, developers become replaceable, or that you don't need them to look into this and the machine can do something on its own. So are there any other common misconceptions you've come across, something people get terribly wrong while implementing observability?

Aniket: Not that type of misconception, but I definitely feel that a lot of enterprise customers don't like things in open source. They have this idea that open source means if something goes wrong there is nobody to look after it, which is not true, especially after the evolution of cloud native. You have to use open source products; you can build things on top of them, but you cannot say "I don't want to use open source products because they are not supported" and then run around to find somebody to support an open source component, which is sometimes not necessary. Some open source projects are very well supported by the community. And I understand that the concern comes from vulnerabilities, from the security point of view: if CVEs are found, you need a fix for them. But I have seen that in most open source products which are quite active, CVEs get patched very efficiently. In fact, if the same open source component is supported by some company, and instead of taking the supported version you take the open source version, the open source version often comes up with the CVE fix faster than the supported version. So I think that misconception in large enterprises, that everything in open source needs to be supported, is probably costing them a lot and also stopping them from innovating faster. And that applies generally, not just to observability. In observability, probably everybody is doing this: you always use some open source components underneath and do things on top of them, and you expect those open source things to work flawlessly, because they do.

Sri: Absolutely. And there is another trend that I see growing too. While the common options are either the open source route or the managed route, now you have this new idea of build-your-own observability, where people are trying to develop the capabilities in-house. What are your thoughts on that?

Aniket: So build-your-own observability, I am not so sure it is the right approach for everybody, because it's a very involved domain, and if you want to build everything by yourself, then you will be doing that instead of your main business, which is something else. It probably works for those who are core technology companies, but for others who are in some other domain, somebody else has to build the observability, and they must be able to use it for their needs.

Sri: But what would be your solution if somebody is considering all three options? What is the best route that they should take?

Aniket: It depends, I think, on what type of company it is and what type of skill set and manpower they have. But I would still not count on build-your-own unless you're an observability company; build-your-own observability has more likelihood of failure in my books than success.

Sri: Right, right. And now, looking into 2025 and beyond, where do you see the industry going? The use cases for observability within telco, the overall observability industry, where do you see it headed? AI is definitely one of the things, but apart from that, what else do you see emerging or becoming a new trend?

Aniket: Well, even at this year's KubeCon, I was looking at the stalls and many of them were about observability. I think a lot of work is happening around observability, because everybody wants to see what's going on in their clusters and in their software: how things are helping them, what the pain areas are, and everything. So AI-based observability, and forecasting based on LLMs, is, I think, definitely the next big thing for observability and for the industry in general. And another trend which is quite evident is that the adoption of OpenTelemetry will grow more and more, even though it already has a lot of things.

I probably don't know how rich the OpenTelemetry SDK already is, but at least in cloud native observability, OpenTelemetry has a long way to go in terms of having more framework support. Maybe everybody uses it for tracing right now, but there should be adoption for logging and metrics going forward. So yes, I guess that's definitely a trend: the adoption of OpenTelemetry, the products working on top of it, and the evolution of the overall ecosystem around OpenTelemetry.

Sri: Awesome. And I think with that, I have everything that I need. Thank you so much for your time today; it's been lovely talking to you and taking this journey with you.

Aniket: Thank you, Sri Krishna. It was a pleasure to be here talking to you.

Sri: Absolutely, thank you. And thank you all for joining in on this podcast. For more, please do subscribe to all the observability talks.