Observability in Mega-Scale Banking with Greg Parker

Released Thursday, 21st February 2019

About the Guest

Greg established and leads the Enterprise Monitoring Services team at Standard Chartered Bank. Together with his team, he wrote and implemented a strategy to effectively monitor and leverage data from over 1,000 applications, 30,000 servers, 15,000 network devices, public and private cloud, mainframe, Tandem, and multiple other technologies in a sustainable and scalable way. Applying Agile and DevOps techniques to the build, engineering, and support of the monitoring ecosystem at Standard Chartered, the team brought together tools across the technology stack and advocated techniques such as monitoring as code to improve monitoring quality and make monitoring a mandatory part of the deployment pipeline.
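As a rough illustration of what "monitoring as code" enforced by a deployment pipeline could look like, here is a minimal Python sketch. The spec format, the required check names (`availability`, `latency`, `error_rate`), and the `deployment_gate` function are all hypothetical and are not taken from Standard Chartered's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical: the check names every service must define before it can deploy.
REQUIRED_CHECKS = {"availability", "latency", "error_rate"}


@dataclass
class MonitoringSpec:
    """Hypothetical monitoring definition stored alongside the application code."""
    app_name: str
    owner_team: str
    # check name -> alert threshold (units depend on the check)
    checks: dict = field(default_factory=dict)


def validate_spec(spec: MonitoringSpec) -> list:
    """Return a list of problems; an empty list means the spec is acceptable."""
    problems = []
    if not spec.owner_team:
        problems.append("no owner team: alerts would have nowhere to go")
    for name in sorted(REQUIRED_CHECKS - set(spec.checks)):
        problems.append(f"missing required check: {name}")
    return problems


def deployment_gate(spec: MonitoringSpec) -> bool:
    """Pipeline step: print any problems and return False to block the deploy."""
    problems = validate_spec(spec)
    for problem in problems:
        print(f"[{spec.app_name}] {problem}")
    return not problems
```

In a CI job, a `False` result would fail the build, so a service could not ship without its monitoring defined and owned.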

Prior to that, he worked at Barclays Capital in Singapore and Goldman Sachs in Tokyo, Japan, in various infrastructure and engineering roles.

Links Referenced:

Connect with Greg on LinkedIn

Transcript

Mike Julian: Running infrastructure at scale is hard, it's messy, it's complicated, and it has a tendency to go sideways in the middle of the night. Rather than talk about the idealized versions of things, we're going to talk about the rough edges. We're going to talk about what it's really like running infrastructure at scale. Welcome to the Real World DevOps podcast. I'm your host, Mike Julian, editor and analyst for Monitoring Weekly and author of O’Reilly's Practical Monitoring.

Mike Julian: This episode is sponsored by the lovely folks at InfluxData. If you're listening to this podcast, you're probably also interested in better monitoring tools, and that's where Influx comes in. Personally, I'm a huge fan of their products, and I often recommend them to my own clients. You're probably familiar with their time series database, InfluxDB, but you may not be as familiar with their other tools: Telegraf for metrics collection from systems, Chronograf for visualization, and Kapacitor for real-time streaming. All of this is available as open source, and they have a hosted commercial version too. You can check all of this out at influxdata.com.

Mike Julian: Hi folks. Welcome to the Real World DevOps podcast. I'm here with Greg Parker, head of enterprise monitoring services at Standard Chartered Bank, way out in Singapore. Welcome to the show, Greg.

Greg Parker: Thanks, Mike. I'm doing well. How are you doing?

Mike Julian: I'm doing fantastic. So Standard Chartered Bank, like what is this? It sounds like just a bank, but I've been talking to you about it and it sounds like it's a whole lot bigger than I imagined.

Greg Parker: Well, Standard Chartered operates across 70 countries. There are more than 1,200 branches and 90,000 employees, and it's just a sprawling financial institution. It primarily operates in Africa, the Middle East, and a lot of emerging markets, and the headquarters for IT is in Singapore, though the bank is headquartered in London. So out of Singapore we drive the technology strategy across all of our markets in over 70 countries. And we get a lot of diversity in our environment because of the different strategies that we have in each country. Before this, I worked for Goldman Sachs for about ten years, where IT was very tightly controlled from the center, from New York, where the word came down from the heavens: this is how you're going to do everything. Then I went to Barclays, which was a similar model except the word came down from London. At Standard Chartered, it was really Singapore saying, this is what we should be doing and this is how we operate for our group-owned applications, but there were 70 other countries saying, this is how we have to do it in Nigeria, this is how we have to do it in Kenya, this is how we have to do it in Pakistan. So all of those issues creep up when you're working across emerging markets, especially in a financial institution.

Mike Julian: What's your role in Standard Chartered?

Greg Parker: So my role at Standard Chartered is to run enterprise monitoring. It wasn't my original role. I came in to drive some large infrastructure projects, and when I got there, I saw that monitoring was essentially chaos. There was really no central strategy around how we were going to do it. So I worked with some people there and we established a central enterprise monitoring organization for Standard Chartered. The problem was that there was no central strategy or tool set, or group of tools that we were using for monitoring, and there were multiple vendor deals, negotiated at different prices at different times with different countries. So there were a lot of inefficiencies contributing to massive MTTRs, which meant that when an issue occurred, a thousand different teams got an alert, nobody knew whose fault it was, and it took all this time to work out the root cause and how we were going to resolve it. And I think a lot of that comes down to the fact that monitoring wasn't precise.
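For readers less familiar with the metrics mentioned here, a minimal Python sketch of how mean time to detect (MTTD) and mean time to resolve (MTTR) might be computed from incident timestamps. The `Incident` record is hypothetical, and measuring MTTR from fault start (rather than from detection) is an assumption; organizations define the start point differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    started: datetime   # when the fault actually began
    detected: datetime  # when an alert (or a person) first flagged it
    resolved: datetime  # when service was restored


def _mean_minutes(deltas):
    """Average an iterable of timedeltas, expressed in minutes."""
    return mean(d.total_seconds() / 60 for d in deltas)


def mean_time_to_detect(incidents):
    return _mean_minutes(i.detected - i.started for i in incidents)


def mean_time_to_resolve(incidents):
    # Assumption: measured from fault start to resolution; some teams
    # measure from detection instead.
    return _mean_minutes(i.resolved - i.started for i in incidents)
```

With two incidents detected after 10 and 30 minutes and resolved after 70 and 150 minutes, MTTD is 20 minutes and MTTR is 110 minutes. Imprecise alerting that pages many teams tends to inflate the gap between detection and resolution.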

Mike Julian: And I'm sure that's due in no small part to countries not being able to talk to each other.

Greg Parker: Certain countries couldn't talk to each other and other countries just didn't know to talk to us. And so there was a lot of people working in silos.

Mike Julian: What does your strategy even look like when you have all these different entities doing their own thing, and culturally you're not able to say, “This is what we're going to do”? So what's your approach instead?

Greg Parker: Well, we do have the authority to dictate if we want to. That's one of the things that came along with establishing this central organization, which is backed by the CTO and the head of technology services, and which says our mandate is to go out and fix monitoring for SCB. But at the same time, just handing people a mandate isn't something that we want to do. And I've been saying this the whole time: our strategy is not perfect and it's never going to be perfect. Our strategy is focused on corralling all the data that's out there, translating it, enriching it, normalizing it, and then exposing it through APIs. That's really the crux of it. But it's never gonna be perfect, and our focus was to just implement a working framework and help improve monitoring so that we can reduce mean time to resolve and mean time to detect, and generally give a better sense of observability for the company, so that we actually know what's going on.
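The strategy described here (corral the data, translate and normalize it, enrich it, then expose it through APIs) can be sketched as a small Python pipeline. Everything below is hypothetical: the two source payload shapes (`tool_a`, `tool_b`), the severity mappings, the `INVENTORY` table standing in for a CMDB, and the `events_api` function are invented for illustration and do not reflect any real vendor's format or Standard Chartered's actual system.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass
class NormalizedEvent:
    source: str
    host: str
    severity: str           # normalized to "critical" | "warning" | "info"
    message: str
    app: str = "unknown"    # filled in during enrichment
    owner: str = "unknown"  # filled in during enrichment


# Hypothetical per-source severity vocabularies mapped onto one scale.
SEVERITY_MAP = {
    "tool_a": {1: "critical", 2: "warning", 3: "info"},
    "tool_b": {"CRIT": "critical", "WARN": "warning", "OK": "info"},
}

# Hypothetical inventory: host -> (application, owning team).
INVENTORY = {"pay-db-01": ("payments", "payments-sre")}


def normalize(source: str, raw: dict) -> NormalizedEvent:
    """Translate a source-specific payload into the common event schema."""
    if source == "tool_a":
        return NormalizedEvent(source, raw["hostname"], SEVERITY_MAP[source][raw["prio"]], raw["text"])
    if source == "tool_b":
        return NormalizedEvent(source, raw["node"], SEVERITY_MAP[source][raw["state"]], raw["summary"])
    raise ValueError(f"unknown source: {source}")


def enrich(event: NormalizedEvent) -> NormalizedEvent:
    """Attach application and owning team so the alert can be routed to one team."""
    app, owner = INVENTORY.get(event.host, ("unknown", "unknown"))
    return replace(event, app=app, owner=owner)


def events_api(events, owner=None) -> str:
    """Stand-in for an API endpoint: events as JSON, optionally filtered by owner."""
    return json.dumps([asdict(e) for e in events if owner is None or e.owner == owner])
```

The design choice worth noting is that enrichment, not the source tool, supplies ownership, so an alert can be routed to the one team responsible rather than broadcast to everyone.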

Mike Julian: Why don't we talk more about the strategy behind what you're doing. Like what does it actually look like? How did you come to this strategy? How are you implementing it? What is it even?

Greg Parker: So we started with multiple teams that had implemented their own monitoring with the tools they wanted to work with. We have BMC deployed across all of the infrastructure. A lot of the application teams had purchased ITRS. There were other tools like AppDynamics and Dynatrace, and some open source tools out there like Grafana and Elastic. And so my first thought was, we're not going to standardize everybody on one tool, and there's not one tool that's going to be this pan...
