
Forecasting AI Demand at Microsoft | Sajay Suresh | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We'd love to see you!

We were recently joined by Sajay Suresh, Senior Director of Data and Applied Science at Microsoft, to chat about data center supply chain planning, forecasting AI demand, and navigating data science careers. In this Hangout, we explored how the emergence of technologies like LLMs changed projections for data center demand. Sajay discussed how forecasting something with little historical data, like AI demand, requires drawing analogies from the past, such as comparing the training/inferencing model to the iPhone and its App Store. A major complexity in current supply chain planning is the lack of fungibility: modern GPUs require specific infrastructure like liquid cooling, meaning data centers designed for GPUs cannot easily be repurposed for traditional compute/storage workloads, which increases investment risk if demand comes in lower than planned.

Resources mentioned in the video and zoom chat:
LLM Workflow Demo with Joe Cheng → https://pages.posit.co/05-28WorkflowDemo.html
Posit::conf 2025 Virtual Registration → https://posit.co/blog/posit-conf-2025-virtual-experience-registration/
Sajay Suresh on LinkedIn → https://www.linkedin.com/in/sajay-suresh-12687631/
Find mentors on ADPList → https://adplist.org/
Officeverse R packages for Office documents → https://ardata-fr.github.io/officeverse/
Microsoft team meetup video on capacity planning → https://www.youtube.com/live/07j22d4B_hA?feature=shared
Seattle Data And AI Security community → https://www.linkedin.com/posts/seattle-data-and-ai-security_microsoft-fabric-tour-seattle-data-ai-security-6891902675280633856-xLw3?utm_source=share&utm_medium=member_desktop
Quarto Gallery → https://quarto.org/docs/gallery/
Quarto Guide → https://quarto.org/docs/guide/

If you didn't join live, one great discussion you missed from the zoom chat was about communities and meetups recommended for networking and learning in data science. Participants shared groups like R-Ladies, Data Book Club, local tech meetups, and specific conference recommendations like Shiny Conf and DataConf.ai NYC. What's your favorite data community?

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us! Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you are not joining us live, you're missing out on the amazing chat that goes on. So find the link in the description where you can add our call to your calendar, and come hang out with the most supportive, friendly, and funny data community you'll ever experience.
I would love to introduce our featured leader today, Sajay Suresh, Senior Director of Data and Applied Science at Microsoft. Sajay, how are you today? Can you tell us a little bit about you, what you do, and what you like to do for fun?
Absolutely. Thank you, Libby, and thank you, Libby, Rachel, and Posit for having me on this forum. Can I say one thing before I get into it? I love the community vibe over here. It's different from most talks or conversations I've attended. I feel like people are a lot more comfortable over here, so congratulations on building an amazing community.
Yeah, so a little bit about me. I'm at Microsoft. I run an Applied Science function at Microsoft. My team essentially runs cloud infrastructure planning. So we are a bunch of data scientists and software engineers who create forecasting models for cloud and AI and figure out how to get the data center infrastructure for that in place. So I'm hoping it's touched your lives in some way or the other. If you have ever used a Microsoft product, be it Xbox or Windows or Azure, AI, whatever you have used, hopefully my team has had some role to play in it.
Background and career journey
You know, my career started around 15 years ago, right? It was at a company called Mu Sigma, which was trying something called data science at the time. Data science wasn't really obvious as an industry. It might be surprising for a lot of folks over here, especially those of you earlier in your career, but it wasn't clear back then whether data science was going to be a niche industry or a really big industry that every company needs. So it started at Mu Sigma 15 years ago. I spent two to three years with them, where I myself got introduced to the idea of data science and how decisions are made using data.
After that I moved to BCG, and BCG was interesting because, as most of you may know, the Boston Consulting Group is a big management consulting firm, but data science was interesting for them, right? It's not the most obvious place for a data scientist to be 10 years ago, but that is exactly the reason I joined them: they were trying to build out their data science practice. What they had was data science as a small service, but the idea was to bring in folks who were well versed in data science and consulting specifically to build out that practice. So it was an amazing experience at BCG to not just, you know, work with Fortune 500 companies trying to solve strategy problems, but also have a startup within a large company and try to build it out.
So now, as a lot of you folks may be aware, it's called BCG X. I think it rebranded itself from BCG GAMMA to BCG X, and it is a separate entity within BCG, separate from the business management consulting. And that's when I consulted for Microsoft and saw that there was an interesting role in tech, which was around the data center supply chain. So seven years ago, I switched over to Microsoft.
As data scientists, we were a two-member team trying to figure out how to even plan for data center supply chains. Just to give you folks a sense of what a data center supply chain is: imagine, say, I'm in Seattle, and I want a new data center in Seattle, because that's what I need if I want to deliver cloud and AI. It would take me three to four years to get that data center live. So my team is essentially trying to predict what the world will look like three to four years down the line, so that we can make the investments today and Microsoft can capitalize on opportunities for their customers at that point in time. So when you see AI being big today, you can bet I had to catch it three years ago. Otherwise, we wouldn't stand a chance of serving it. And as with most things in forecasting, I kind of caught it and kind of didn't catch it. That's what happens in forecasting.
Evolving as a data scientist at Microsoft
The one thing I will say: it's tougher to be an early-stage data scientist at Microsoft now than it was when I joined. And that is not just specific to Microsoft. I just think there's been increased maturity in the expectations of a data scientist over the last few years. That means I think it's tougher to get into Microsoft, but that's true of the whole industry.
The one tip I would give myself if I joined Microsoft again: move from consulting to product very quickly within a tech company. When you're in a tech company, we love software. We love systems. We love data products. We like consultants, don't get me wrong. But what tech companies are used to are systems set up to run processes; they like to bring in consultants from the outside. So for me, and for my team, I'd say it took us a couple of years to figure out that we may be a talented set of data scientists, but for us to consistently add value to Microsoft, we needed software and systems, and we needed to make it a data product that people can rely on and can plug into other processes.
The one tip I would give myself if I joined Microsoft again was move from consulting to product very quickly within a tech company.
Tech stack and deployment
You know, finally, when I started off coding, it was SAS. Maybe a lot of you folks don't use SAS right now, but that was the prevalent technology. And then I spent a lot of time in R, right? I love R. I'm a big fan. So I try to find creative ways to do things in R, even though they can be done better in Python. I'm like, nah, you know, I can do this better. But I know, in my heart of hearts, there are some processes that Python is just better at. So I'd say R and Python are my primary go-tos.
We have a lot of statisticians, and they're big on R, so we are an R shop too. Python is, of course, something you just have to have as a data science shop, right, with all the packages available there. So Python's huge. C# is another engine that we use pretty heavily for the optimization part of our algorithms. It does exceedingly well when you really know what kind of algorithm you want and you want to write custom algorithms. I think Python is really good when you want to use off-the-shelf algorithms. But if you have a custom algorithm that you know is right for your business and is different from those traditional packages, think about a C# engine because of how much flexibility it gives you in setting up the algorithm.
One of the biggest selling points for me for Posit, just using Posit Connect on my team, is that my data scientists have now become developers, because they can deploy code and I don't need a separate dev team to go productionalize it. One of the problems we had maybe four or five years ago, when we had separate dev teams, was that a data scientist, including me at the time, would develop the models. We would then send them to the dev team to go to production, which would take another two to three months, and by that time the model we wanted had changed. So that's where I think Posit Connect is a big enabler: it converts my data scientists into developers and saves me a lot of dev time and adds value that way.
Forecasting AI demand and data center supply chains
Great question. I'll tell you, I think data centers were always talked about before AI, but they're talked about way more right now as the enabler in the world of AI, because of a couple of things. One, more power requirements, absolutely. But also, you know, if you think about large training sites for these LLMs, they need to be contiguous. They need to be in one place so that all the GPUs can learn together, run together, train the model, and spin down together.
So to take you back in time, around 2021, we did have something called OpenAI in our systems already, right, training their models. But we had no idea how big it would be. It was a very closely guarded project within Microsoft; even my team didn't know what it was other than that it was a business project. When we saw AI and what potential it had, I think one of the first applications was the image application they had, right, DALL·E, which came out and was publicly available. And I saw that and I'm like, wow, okay, this is a game changer, right?
And it was really difficult. Now imagine three years ago. Today, at least, with AI agents, you can see the applications. Two years ago, when ChatGPT had just come into this world, the optimist in you was like, oh, the world has changed. The pessimist in you was like, oh, you cracked natural language processing. Okay, yes, people have been working on it, you just cracked it. Okay, fine. But that's where we were, trying to forecast AI demand.
And we didn't have much data to forecast with, right? As data scientists, we love history: you use models, you learn from history. We don't have history for AI. We spent a lot of time researching the right way to forecast it. And you know, funnily, the way we ended up forecasting it at that time was through historical analogy. What do you do when you have a new disease, for instance, that you don't have data on? You look for similar diseases in the past and try to figure out how those diseases spread, similar to comparing the Spanish flu and COVID. The closest analogy we could come up with at the time, and this is now two to three years ago, right? In hindsight it may look like, oh, what's the big deal, because today it's obvious. It was iPhone apps and the App Store.
So if you think of the iPhone: breathtaking, amazing new technology back in 2010. But the real economy was not the iPhone; it was the App Store, which added value to a lot of people's lives and created that whole economy around it. And that's how we thought about training and inferencing. We said, think of the iPhone as your training model, setting up the baseline for you. But your real value add for customers across the board is going to come from inferencing, and there are going to be applications like the apps, and places like the App Store, which will host applications that add a lot of value to people.
So if you think of the iPhone: breathtaking, amazing new technology back in 2010. But the real economy was not the iPhone; it was the App Store, which added value to a lot of people's lives and created that whole economy around it.
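The analogy-based approach described above — borrowing the shape of a known adoption curve and rescaling it to a new market — can be sketched in a few lines. This is a toy illustration, not Microsoft's actual model: the logistic shape, the midpoint and rate parameters, and the 1000-unit demand ceiling are all made-up assumptions for the example.

```python
import math

# Toy analog-based forecast: borrow the *shape* of a past adoption curve
# (a logistic S-curve, loosely "App Store"-like) and rescale it to the new
# market's assumed ceiling. All numbers here are hypothetical.
def logistic(t, ceiling, midpoint, rate):
    return ceiling / (1 + math.exp(-rate * (t - midpoint)))

# Shape parameters "learned" from the historical analog (assumed values).
analog_midpoint, analog_rate = 4.0, 1.2  # years to inflection, steepness

# Rescale to the new technology's assumed total addressable market.
new_ceiling = 1000.0  # e.g. units of AI inference demand (an assumption)
forecast = [logistic(t, new_ceiling, analog_midpoint, analog_rate)
            for t in range(0, 9)]

for year, demand in enumerate(forecast):
    print(f"year {year}: {demand:7.1f}")  # demand halfway to ceiling at year 4
```

The point is that only the shape transfers from the analog; the ceiling and timing for the new market are judgment calls, which is where the forecasting uncertainty lives.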
One of the biggest complications we need to deal with is lack of fungibility. If you think of supply chains, when you're forecasting a supply chain, fungibility is amazing, right? We used to have that fungibility pre-GPUs, where a data center is a data center: it can host most workloads, and compute and storage can host most things. But GPUs are much more power intensive, and the future generation of GPUs needs a different form of cooling in data centers, which is called liquid cooling. That means you suddenly don't have that fungibility. So if I plan a data center for GPUs only, that's all that can go there. And if GPU demand comes in lower, I'm done. I'm sitting on an investment which cannot be monetized.
So that is the biggest complexity we are dealing with today. It is, of course, the uncertainty and demand volatility of AI, which will inherently exist in a new technology, but also the lack of fungibility that it leads to for most cloud players. This is the same story for Amazon, Microsoft, Google, or any cloud player.
Take any opportunity you get to aggregate demand up higher. If you are told to forecast the S&P 500 and you want a small cone of uncertainty, do not forecast each of those 500 stocks individually. Forecast the S&P 500 as an aggregate; you'll have much less volatility that way. So demand aggregation is a huge construct, and very helpful in our thinking.
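The S&P 500 point can be demonstrated with a quick simulation. This is a minimal sketch with made-up numbers, assuming each series' noise is independent — which is exactly what makes the aggregate so much smoother than its parts:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 500 independent "stocks": each a noisy series around its own mean.
n_series, n_periods = 500, 260
means = rng.uniform(50, 150, size=n_series)
noise = rng.normal(0, 20, size=(n_series, n_periods))
series = means[:, None] + noise  # shape (500, 260)

# Volatility relative to level (coefficient of variation), per series...
cv_individual = (series.std(axis=1) / series.mean(axis=1)).mean()

# ...versus for the aggregate index (sum across all 500 series per period).
aggregate = series.sum(axis=0)
cv_aggregate = aggregate.std() / aggregate.mean()

print(f"mean individual CV: {cv_individual:.3f}")
print(f"aggregate CV:       {cv_aggregate:.3f}")  # far smaller: noise cancels
```

With independent noise, the aggregate's relative volatility shrinks roughly like 1/√N, which is why forecasting the index gives a much tighter cone of uncertainty than forecasting each component.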
Differentiating as a data scientist in the age of AI
And maybe, Sam, I'm going to take the opportunity to just fast forward. Imagine a data scientist in 12 months. For a moment, let's work back from, I like to think about, maybe, what's the north star state, right? In 12 months from now, I'd expect most data scientists to be able to build AI agent apps. It shouldn't be new to them in 12 months. That is a skill set, because we have all seen it, right, Sam? I think in the hackathon, we saw some of that together.
What AI agents can do is fundamentally changing the world and it will change the world. The technology is there right now. Tough tasks have become easy. The only truly tough task is what you thought impossible earlier. That's where the world is heading, right, with AI agents. So data scientists 12 months down the line, I'd expect them to know AI agents.
One of the things, I know it sounds cliche, but one of the things I look for even in a data scientist today, if I were to hire, would be their ability to learn it all rather than know it all. What I know today is irrelevant in six months. I need to know something new. So what I'm looking for is someone who is able to show through their profile and the conversation that they have switched multiple subject matter areas, have learned new things and constantly evolved. My favorite question is your biggest failure. I want to know your biggest failure and what you took away from that. Because that tells me whether this person will be able to learn with me.
Staying current and the value of community
I think I have learned more in communities like this about what I don't know and what I need to learn than I've ever learned elsewhere. You could always try reading everything out there, but I'm telling you it's impossible. What you'd rather do is be part of communities like these, where there are folks who are going to go read, and you aggregate intelligence. And then figure out, oh, what does the industry really need? Oh, this is interesting. This is something I'm interested in. So I would suggest thinking in terms of just being part of more and more communities.
And in every community, look for a community that's smarter than you. If that community is discussing everything you already know, you can contribute to it, but you're not going to learn. So look for communities where half the things discussed on the call are things you don't know about and want to go learn.
Macroeconomic modeling and handling shocks
So, for instance, when I'm doing three to 10-year forecasting of a market size, trying to figure out how big this market is going to be in three to 10 years, I don't care about short-term shocks. Tariff shocks may come, they are cycles, they'll go away, there could be a recession that comes in three to 10 years, it'll go away. But I'm trying to understand in three to 10 years what is the market potential of this, so I'm not worried about short-term shocks.
Now let me give you a different example: COVID. So, four years ago, around 2020, right? The lockdowns had just hit us. No one had any idea what COVID was going to be like, but we were having conversations about an economic depression, because economic activity had come to a standstill with the lockdowns. And we were in charge of thinking about, hey, what will cloud demand look like?
And we did new fundamental research to figure this out, right? We said, okay, let's break the question down. One: what do pandemics do to technology adoption curves? The only pandemic we knew to go back to was the Spanish flu, and we studied electricity production. We saw that while electricity production was hit in the short term during the Spanish flu, in the long term it actually accelerated. So pandemics cause a shock in the short term, yes, but in the long term they actually accelerate technology. Then we tried to figure out why. What made electricity production accelerate? And there's an interesting research paper which talks about how, right after a recession, companies are much more willing to spend money to grow with technology. Because in a recession, sadly, they would have had to let a lot of people go, and they will be smaller companies. So when they grow back, they can grow with the latest technology.
So, putting that together, we came up with a hypothesis based on a macroeconomic study that, hey, you know, COVID for cloud may actually be a digital acceleration play that comes through in two to three years. And it turned out to be true. It could have been false, but it turned out to be true, thankfully. I hope that gives you a sense of how we use macroeconomic data; it's not a cookie-cutter approach, right? It's: what is my business goal? How can I use macroeconomic data and existing research? And actually, I'm going to stress that even more. Data is one aspect, but there is a lot of existing research out there which you should do a lit review of before you go off on your own project, because a lot of these questions have already been analyzed and you can leverage information from them.
Optimization models and stakeholder communication
You know what I've learned about optimization models in my experience? Technically, the models are generally fine. The kinds of constraints we specify are generally fine. We may make mistakes, and we learn from them by running simulations and figuring out the right constraints to put in. I don't think the challenge in optimization models is technical in nature. I think the challenge is making the business connect. Some of the things that we may think are hard constraints aren't really hard constraints, or could be hard constraints but not super valuable.
So the question I always ask myself in the optimization model space is: how can I get the minimal number of business constraints, and get them in very cleanly, without any guesswork around them? I love to keep my constraints to a minimum. I want more solutions, because that gives me a range of possibilities, especially in an MVP, to go run against my business stakeholders. So if there's any advice on optimization algorithms I'd give, it's this: spend more time identifying, from your set of business constraints (not the technical constraints you're going to place on your model), which of them are truly essential and which are merely good to have, because that helps you find a much more reasonable, feasible solution.
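To make the "essential vs. good-to-have" point concrete, here's a toy capacity-planning example. Everything is hypothetical (the rack values, power and space limits, and the brute-force search standing in for a real solver), but it shows the mechanism: a non-essential constraint shrinks the feasible region, and dropping it lets the optimizer find a better plan.

```python
# Toy capacity plan: pick GPU racks (g) and CPU racks (c) to maximize value,
# brute-forcing small integer counts instead of using a real solver.
def best_plan(extra_constraints=()):
    best = (0, (0, 0))
    for g in range(0, 41):
        for c in range(0, 41):
            # Essential business constraints: power budget and floor space.
            if 4 * g + 2 * c > 100 or g + c > 40:
                continue
            # Optional "good to have" constraints, passed in as predicates.
            if any(not ok(g, c) for ok in extra_constraints):
                continue
            value = 5 * g + 3 * c
            if value > best[0]:
                best = (value, (g, c))
    return best

# Essential constraints only: the search has room to find the best trade-off.
print(best_plan())                                          # (140, (10, 30))
# A "good to have" cap (GPU racks <= 5) shrinks the feasible region and value.
print(best_plan(extra_constraints=[lambda g, c: g <= 5]))   # (130, (5, 35))
```

Every extra constraint can only reduce (or leave unchanged) the attainable objective, which is why auditing which business rules are truly hard pays off before you encode them all.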
My business is absolutely stakeholder facing, and that is the reason we succeed, I feel. Just to give you a sense: once my team signs off on a data center plan, it's billions of dollars and it gets executed on. It's amazing impact, but great responsibility, and business communication is absolutely fundamental to us succeeding. And the tip I'd give you is, more than the type of visualization, you know, the standard visualization charts are great, right? Time series trends always win. Just go with time series trends, because it's easy to visualize history and see the forecast, and everything is much easier from a cognitive standpoint.
But for a moment, be the stakeholder. Be the decision maker and think: if I am the decision maker, what data points do I need to see to make the decision? Don't make the decision for people, right? Human decision making works well when they make it with you. So be the enabler of that. This is a question my manager actually asked me, because I had produced a demand forecast for cloud with uncertainty bands and said, hey, here's my forecast. And my manager asked, "So Sajay, what would you do?" He's like, no, no, no, I'm asking a simple question: what would you do if you were the head of infrastructure with this data point? And that made me fundamentally change how I think about it.
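The kind of chart described here — history, a forecast, and uncertainty bands on a single time series — is straightforward to sketch. This example uses matplotlib with made-up demand numbers and an assumed band width that widens with the forecast horizon:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script runs anywhere
import matplotlib.pyplot as plt

# History plus a forecast with uncertainty bands (toy numbers).
history = np.array([100, 108, 118, 125, 140, 152])
forecast = np.array([160, 172, 185, 200])
t_hist = np.arange(len(history))
t_fcst = np.arange(len(history), len(history) + len(forecast))

# Widening bands: uncertainty grows the further out you forecast.
spread = 8 * np.sqrt(np.arange(1, len(forecast) + 1))
lower, upper = forecast - spread, forecast + spread

fig, ax = plt.subplots()
ax.plot(t_hist, history, label="history")
ax.plot(t_fcst, forecast, "--", label="forecast")
ax.fill_between(t_fcst, lower, upper, alpha=0.3, label="uncertainty band")
ax.set_xlabel("quarter")
ax.set_ylabel("demand")
ax.legend()
fig.savefig("forecast.png")
```

Showing the band, not just the point forecast, is what lets the stakeholder make the decision with you rather than take a single number on faith.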
Career growth and operating at multiple levels
No, that's a great point. I think "strategic" is the key word. What I've learned, especially through my one-on-ones: early in my career, my one-on-ones used to be very much, hey, here is what I did last week, this is what I plan to do next week, what do you think? It used to get very tactical about that problem statement and narrow down my manager's focus. What I've learned over time is to ask a different question: hey, what are the top three biggest things that are worrying you? What is keeping you up? And then you'll hear much more strategic problems. You hear things like, I think we get lost in the data, or, we are not analyzing this big business area because we don't have bandwidth, or, here is a trend that is coming but we are not catching it. You hear the strategic problems people have, and you'll be surprised how open people are to having that conversation if you untie it from a deliverable due this week or next week.
Talk about the bigger picture. Say, hey, what are the bigger things you're worrying about? Tell me the top three things your leader is worrying about, and you will get a human sense of the things that should worry you. So you want to get out of the tactical problems and get the high-level picture, so that you focus on the right things in your analysis.
She said something very interesting. She said the way she thinks about success is: can I operate one level higher and one level lower in my role, seamlessly? If I am able to do that without getting stressed, without having to stretch, that tells me I'm doing really well in that role, because I can go help my team members and help my boss as and when needed. It means I'm ready to grow if I can help my boss, and I can help my team members if I have to go one level lower and do the detailed work with them. That was one of the best pieces of advice I've ever received on how to think about success in a role.
She said something very interesting. She said the way she thinks about success is: can I operate one level higher and one level lower in my role, seamlessly? If I am able to do that without getting stressed, without having to stretch, that tells me I'm doing really well in that role, because I can go help my team members and help my boss as and when needed.
Well, we have two minutes left, so I am not going to go any further. Sajay, thank you so much for hanging out with us; you have been a fantastic guest. Can I give a call-out to ADPList? If it would help any of you to have a one-on-one chat, from a mentoring perspective or just to connect, I'm on ADPList talking with folks, so feel free to schedule time with me over there.
And next week I would like everybody to come hang out with Pallas Horwitz, analytics consultant and professional development coach. If you would like to have more conversations about professional development, in the management space or in the data science space, Pallas is a great resource for that. So hang out with us next week to meet her and ask her all kinds of questions. Thank you, Sajay. Thank you, everyone. I appreciate you spending your time over here.

