
Yu-Hung Chang, Phillip Lear & Brendan Scully | R in Manufacturing & Consumer Products | RStudio

Presentations from:
Pratt & Whitney - Brendan Scully
Kellogg Company - Phillip Lear
AGCO - Dr. Yu-Hung Chang

Abstract #1: In this talk, Brendan will showcase two of the tools developed at Pratt using R and hosted on RStudio Connect: a forecasting model for repair shop performance and a real-time dashboard for work management on the repair shop floor. In addition to these tools, he will talk about how to approach some common development and implementation challenges for decision analytics tools at large companies like Pratt.

Bio: Brendan Scully is the Decision Analytics Strategy Lead for Pratt & Whitney's aftermarket operations business. In this role, he is focused on how decision analytics tools can be used to improve the maintenance, repair, and overhaul of aircraft engines. Before this, Brendan helped other companies make the most of their data as an actuarial consultant at Milliman and Deloitte, and as a data scientist at the African Leadership University.

Abstract #2: Phillip will discuss his organization's journey in transitioning from a set of mixed analytic technologies to a standardized analytic content platform. This talk will offer real-life lessons learned in overcoming organizational barriers, getting stakeholders on board, and running a successful POC.

Bio: Phillip is a Principal Data Scientist at the Kellogg Company working with global sales, marketing, and ecommerce teams. He also teaches R to economics students at Grand Valley State University and sits on the Advisory Board for the Economics department. He lives in Kalamazoo (yes, there really is a Kalamazoo!) with his wife Jamie and their two corgis, Penny and Winston.

Abstract #3: Dr. Yu-Hung Chang is an Advanced Analytics Specialist at AGCO. She will showcase an interactive app developed at AGCO using R and hosted on RStudio Connect, which provides comprehensive insights for global manufacturing quality analysis. The real-time dashboard provides not only failure-rate forecasts at different part levels, but also a variety of views of cost and claims. In this talk, Yu-Hung will share the challenges of developing analytics tools and how to establish common development practices in the agricultural manufacturing world.

Bio: Dr. Yu-Hung Chang is an Advanced Analytics Specialist working for AGCO's Global Field Quality group. Her multi-disciplinary background combines technical fundamentals with leading-edge research and big data applications, using the most advanced technologies. She has authored numerous articles, published in inter-disciplinary journals, covering topics from statistics and computational physics to aeronautics and automotive.

Jul 6, 2021
1h 12min


Transcript

This transcript was generated automatically and may contain errors.

Everybody, thanks for joining. I am super excited and happy to have this great turnout.

So I don't wanna eat too much into our speakers' time because I think what they have to say is gonna be really exciting and interesting. I'm on the customer success team at RStudio, which means that I work with our enterprise customers to help them leverage their RStudio tooling and make the most out of their investment.

I've been working predominantly with teams in the last year and a half in the manufacturing and consumer products space. And I think it's an interesting space to be in. What I've noticed is that whether you're creating large machinery or airplanes, or you're creating little widgets, the problems, the challenges, and the use cases are all similar in this space.

And so what I'm hoping today is that you'll hear from these three champions in their organizations, and that the types of problems they're solving and the challenges they're facing will resonate with what you might be encountering in your day-to-day. And I'm hoping that we can continue this dialogue on the community site, have some great questions and discussion following today, and make this an ongoing, regular group where we can connect, network, and ask questions of one another.

So with that, I want to first kick off over to Brendan Scully from Pratt & Whitney. Brendan is the decision analytics strategy lead for Pratt & Whitney's aftermarket business operations. I'm excited to have him start us off.

Brendan Scully — Pratt & Whitney

Awesome. Thanks for the intro, Katie. I just shared my screen. Can I just get one confirmation that it's coming through? Looks good. Awesome.

Then, yeah, I'll start with an introduction of myself. I've been at Pratt & Whitney for a relatively short time, about one and a half years now. I was hired initially as a contractor, so through a contracting agency, to develop a forecasting model, and from there transitioned to a role on the strategy team where my current focus is on the strategy side of decision analytics at Pratt & Whitney.

In this presentation, I'm going to provide an overview of two of the tools that I developed at Pratt using R and RStudio Connect, and then go into a few of the lessons learned in my strategy role that apply, of course, directly to Pratt, but I'm sure apply broadly to other large companies as well.

Engine repair forecasting model

So starting off, the first tool I'm going to talk about is an engine repair forecasting model. This is the first tool that I worked on at Pratt, and a brief background of Pratt & Whitney for those of you not familiar. According to Wikipedia, Pratt & Whitney is an aerospace manufacturer with global service operations, and, you know, what that means is we manufacture and repair military and commercial engines, and my role throughout my whole time at Pratt has been focused on the commercial aftermarket side of the business, which includes maintenance, repair, and overhaul of all of our engines.

So one of the biggest questions that we focus on on this side of the company is how many engines can our global network of repair shops repair in the upcoming year?

This question is important because the repair shops are given annual and monthly repair goals that their performance is evaluated against, and throughout this planning process, any shortfalls that are identified then lead to a lot of the work that goes on throughout the year, from investing in new repair technology and capacity to making decisions to reallocate engines across the network.

This question is also a complex question. Repair performance and the number of engines that our entire network can repair are influenced by many interacting variables, such as the type of repair being performed, the turnaround time of each individual repair, the capacity not only at the shops themselves but within the different sections of each shop, and existing work in progress and backlogs. So not an easy question to answer.

The current state, I guess before I move on to the solution, the current state that was in place to answer this question was largely done by taking the previous year's numbers and increasing or decreasing that amount based on knowledge of changes in the business and conversations with the general managers at all the repair shops. What was lacking was a rigorous, quantified justification behind these numbers.

And so what Pratt wanted was to make the answer to this question more robust and defensible by giving it a quantified justification.

So the solution that I developed that helps answer this question is a simulation-based forecasting model using R that has been hosted on RStudio Connect. So what this model does is it takes our global network of repair shops and models the flow of engines from the wing of the airplane through each stage of the repair journey all the way through the delivery to customer using a Markov decision process.

So the diagram on the right, I know the text is small and it's not really meant to be read, but it shows how the process is mapped out at a high level where each stage of the repair journey is given a state and then the flow of the engines from state to state is then governed by rules.

And those rules are all dependent on a large number of inputs, from the turnaround times and the capacity of the shops to the work in progress, the induction plan, and the engines currently sitting outside the shop in backlog. And so one thing this model allows Pratt to do now is, instead of talking about it as a whole and doing all that thinking in your head, we can have focused conversations on individual inputs: the capacity of this section of the shop, has that increased or decreased? The turnaround times, are we expecting similar turnaround times on these repairs this year as opposed to last year? And then roll all of those inputs and conversations up to the model, which will then output a final number for engines inducted and returned.
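To make that idea concrete, here is a minimal R sketch of engines flowing through a chain of repair states. It uses a plain per-engine Markov chain as a simplification of the Markov decision process described above; the state names, monthly advance probabilities, horizon, and fleet size are all invented for illustration and are not Pratt & Whitney's actual inputs.

```r
# Minimal sketch of a per-engine Markov-chain repair-flow simulation.
# All states, probabilities, and sizes below are illustrative assumptions.

set.seed(42)

states <- c("backlog", "induction", "disassembly", "repair", "assembly", "delivered")

# Probability that an engine advances to the next state in a given month,
# standing in for rules driven by turnaround times and shop capacity.
advance_prob <- c(backlog = 0.6, induction = 0.8, disassembly = 0.7,
                  repair = 0.5, assembly = 0.9)

simulate_engine <- function(months = 24) {
  state <- 1
  for (m in seq_len(months)) {
    if (state == length(states)) return(m)              # delivered
    if (runif(1) < advance_prob[state]) state <- state + 1
  }
  NA_integer_                                           # still in the network
}

# Simulate 500 engines and look at when they are delivered.
delivery_month <- vapply(seq_len(500), function(i) simulate_engine(), integer(1))
table(delivery_month, useNA = "ifany")   # deliveries per month
mean(!is.na(delivery_month))             # share delivered within the horizon
```

In a real model, the advance probabilities would themselves be functions of capacity, work in progress, and the induction plan, which is what makes the focused conversations about individual inputs possible.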

So the main benefit is it offers a quantified justification for annual and monthly repair goals. And then it also allows for sensitivity analysis and scenario testing.

This has taken the shape of questions like: how does increasing the capacity of this shop actually impact the number of engines we can repair as a whole? Or scenario testing, which came into play last year as COVID hit the airline industry. We had to rapidly change our goals and adapt to a new environment, new demand, and new capacity as a lot of our workforce was reduced. And so this model quickly allowed us to run many different scenarios in a short period of time, which is something we weren't able to do before.

The last benefit is having the graphical user interface on RStudio Connect, so that ultimately I'm not the end user of this model; the end user, who doesn't have a strong background in R, can operate and use the model through RStudio Connect and the graphical user interface.
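As a rough illustration of what such a front end can look like, here is a minimal Shiny sketch of the pattern. The hypothetical simulate_network() function stands in for the real model, and the inputs are placeholders rather than the actual parameters of Pratt's tool.

```r
# Hypothetical Shiny front end for a Connect-hosted forecasting model.
# simulate_network() is a toy stand-in for the real simulation.
library(shiny)

simulate_network <- function(capacity, tat) {
  # Toy throughput: monthly completions limited by capacity and by how many
  # turnaround cycles fit in a month (purely illustrative).
  monthly_rate <- capacity * min(1, 30 / tat)
  cumsum(rpois(12, monthly_rate))
}

ui <- fluidPage(
  titlePanel("Repair network forecast (sketch)"),
  sidebarLayout(
    sidebarPanel(
      numericInput("capacity", "Shop capacity (engines/month)", value = 20, min = 1),
      numericInput("tat", "Average turnaround time (days)", value = 90, min = 1),
      actionButton("run", "Run forecast")
    ),
    mainPanel(plotOutput("forecast"))
  )
)

server <- function(input, output) {
  results <- eventReactive(input$run, {
    simulate_network(input$capacity, input$tat)
  })
  output$forecast <- renderPlot(
    plot(results(), type = "l", xlab = "Month",
         ylab = "Engines delivered (cumulative)")
  )
}

shinyApp(ui, server)
```

The value of this pattern is that the simulation code stays on the server, while the end user only sees inputs and a plot in the browser.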

WIP management tool

The next tool that I'm gonna talk about is a WIP management tool. WIP stands for work in progress and it's the term we use to describe the engines currently in our repair shop.

So this project came about from conversations with one of our specific repair shops. When an engine is received, the repair shop signs a contract with a promise date by which the engine will be returned to the customer. And the biggest issue at this shop was late deliveries. Their number one customer complaint was that promise dates couldn't be trusted, and this impacted their brand. And so their question, their problem to me, was: how can the number of late deliveries, that is, repairs taking longer than promised to the customer, be reduced?

I talked a little bit about the importance of this, but when an engine is not delivered to the customer on the date they were promised, it's difficult for the customers to plan their business operations. And they've told this shop many times: listen, it's okay if the promise date is later, but we just need to know a date so we can plan our operations. So that was one of the big important parts of this question.

The other important part of this question is the issue that the shop faced was inefficient prioritization of their work in progress. Because with the backlog that they had, they found themselves focusing on parts and repairs for customers that simply screamed the loudest, which was described as a chaotic business process.

And so the fundamental issue here was a lack of visibility. A complete view or understanding of what was in the shop at any given time required a manual pull from different databases and a manual report build. And because of the labor- and time-intensive nature of this process, it was done once a day, and there still wasn't full visibility into which parts were on hold with the engineering team without going around and investigating on an engine-by-engine basis.

The other issue that led to this problem was that there was no measure of a repair's expected delivery date aside from intuition and judgment, knowing where in the shop each part was. There was no quantified metric for a repair's expected delivery date.

So the solution developed here was a WIP overview dashboard and a forecasted repair completion date model. What this dashboard does is take the real-time repair progress and hold data, aggregate it, and display it on the dashboard. So fundamentally, this is just a dashboard that displays the current state of the shop.

It has one analytical layer to it that takes historical repair data and uses that to estimate turnaround times and likelihoods of on-time delivery for current repairs. So one example here: a fan blade comes into the shop. Well, that fan blade with that type of damage has been in the shop 50 times before, and this chart shows the actual historical turnaround times of part and repair number A25343, a repair that's been to the shop many times, so we have a good idea of what the range of turnaround times is. The model uses that distribution to give an expected turnaround time, an expected delivery date, and then the likelihood that it'll be delivered on time.
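A rough sketch of that analytical layer in R, assuming a small table of historical turnaround times; the values, dates, and column names are invented for illustration:

```r
# Sketch of the analytical layer: estimate expected turnaround time and
# on-time likelihood from historical repairs of the same part/repair type.
# The data frame contents and the dates are toy values, not real shop data.

history <- data.frame(
  part_repair = "A25343",
  tat_days    = c(38, 42, 45, 47, 50, 52, 55, 58, 61, 70)
)

received_date <- as.Date("2021-06-01")
promise_date  <- as.Date("2021-07-30")

expected_tat      <- mean(history$tat_days)
expected_delivery <- received_date + round(expected_tat)

# Empirical probability that the repair finishes by the promise date.
days_available <- as.numeric(promise_date - received_date)
p_on_time <- mean(history$tat_days <= days_available)

expected_delivery
p_on_time
```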

Decision analytics hierarchy of needs

So moving on to my role on the strategy team: I've taken a step back from the individual tools to take a higher-level view and work on ways that Pratt can prioritize decision analytics projects and make the best use of them. And one framework that's helped me think about these projects is the decision analytics hierarchy of needs, based loosely on Maslow's hierarchy of needs.

The idea here being that each level of the pyramid needs to be addressed before the next one can even be worked on. And the way I've broken it down, and the way I see a lot of projects at Pratt, is that data storage and organization is the first and most important step that needs to be addressed. Data relevant to your project or your business need has to be stored and organized in an effective and thoughtful way.

However, storage and organization are not enough. There then needs to be data accessibility in place. So what's required to make the data accessible to stakeholders? That could be literal access: access approval, having the credentials or tools you need to access the database. But it also means asking whether it's stored in a way that is accessible. For example, there's a team here that has a lot of repair information stored in PDFs. Well, it's stored, but it's not accessible in a meaningful way.

The last step, once you have successful data storage and accessibility, is building the analytics tools and then implementing data-driven decisions. So how can the data then be used to serve our needs and make better decisions? A lot of times there's a lot of talk and focus on the top of the pyramid without full understanding of and focus on data accessibility and data storage. So this is a framework I find useful for level-setting expectations for a project and directing the right resources to the right problems before starting.

Common challenges and lessons learned

And then this last slide here shows some common challenges that I've faced during my time at Pratt and that I'm sure are not specific to Pratt. I'll also talk about some of the approaches we've taken to address these issues. They're by no means the only right answers, or even complete answers to the problems, but they are steps that we've taken that have helped us.

So one of the biggest common challenges is lack of engagement from all stakeholders. And I see I'm running a bit short on time, so I will rattle through this somewhat quickly, but there are two main cases that we've identified, or that I've noticed, here. The first is analytics tools developed without enough input from the end user. This usually results in the business need not being addressed by the tool directly, and ultimate ownership and integration of the tool into business processes becomes unclear and difficult.

The other type of issue with lack of engagement from all stakeholders is analytics tools being developed by the business users, in our case the repair shops, without involvement of the central technology team. This usually leads to fragile and fragmented architecture, manual data transfer, copying of Excel and CSV files, and an inefficient use of resources.

There are two ways we've addressed, and are working to address, these issues. For both, you need to approach these situations with the intention of bringing the stakeholders together. The first is to involve the business user more: the design needs to be centered around the business need, and then effort needs to be put in to develop and implement a minimum viable product, iterating and increasing investment over time while incorporating feedback to meet the business need.

The second is to encourage the business users to involve the digital technology team: regular communications to understand business needs and capabilities should be established. We did this in the form of phone calls to all our repair sites to understand their needs and talk about our capabilities. And the big thing here is organic involvement of the digital technology team through a series of successes: have the business users come to the digital technology team organically because you've delivered value in the past.

The last issue is one that I'm sure many organizations face, and that's inefficient allocation of limited resources. There's limited funding and limited hours in the day. The obvious answer here, which has existed but isn't always used here at Pratt, is developing prioritization methodologies: taking a step back, clearly defining strategic organizational goals, and then designing a resource allocation process that stimulates and promotes conversation on how projects align to those strategic goals.


Phillip Lear — Kellogg Company

So like she said, I'm the Principal Data Scientist at the Kellogg Company. I assume most people are fairly familiar with Kellogg, but just as a brief history, we are a 115-year-old food company.

The slide shows the timeline. What I'd like you to get out of this is that on the left, you see these great brands and the historic development of sales and marketing techniques and all that. That is the first part of Kellogg's history. Most of this probably took place before there were IT departments, or certainly analytics departments. But in the middle, you start to see where IT starts to play an important role.

So in the seventies, we acquired a frozen waffle company. It's very different from shelf-stable cereal. In 2000, we acquired Kashi. This is an entirely different supply chain because it's natural and organic, with a new set of customers. In 2001, we acquired Keebler, which is a giant snacks company in its own right. They actually use something called direct store delivery as their distribution model, which means the drivers are employees, the trucks are company assets, and the logistics are managed in-house. So the amount of data generated by a Keebler cookie from the factory to the store shelf is significantly more than a box of Frosted Flakes would have.

We actually had to adopt Keebler's ERP system because what Kellogg had at the time couldn't handle it. Pringles is a massive global supply chain. In 2015, we expanded into Africa with major brands in Egypt. In 2017, RXBAR was acquired, which brought a new set of customers, primarily through e-commerce. And then in 2019, we spun off the cookie, fruit snack, and ingredients businesses of Keebler and sold them to Ferrero.

My hypothesis would be that the right side is enabled by IT and analytics, but it's still a 115-year-old company, and there were still a number of challenges we've had. I started in 2010; for reference, Tik Tok by Kesha was the number one song. I honestly don't think I know any of these songs. I was surprised when I looked them up.

But our ecosystem at Kellogg for analytics, and keep in mind we have SAP and all these other things, was this: the reporting at the company was well established, but analytics was niche. Everything was segmented. IT had its own thing. Analytics was primarily in the business, either marketing or sales.

We had our own network of servers, by which I mean we had desktops that sat under our desks with RAID drives attached to them. And if a hard drive or a fan failed on one of these RAID drives, there was more than one occasion where I actually went to Best Buy to buy a new hard drive. We had a little toolbox, and we'd unscrew the plates and pull the drives out. A very, very guerrilla analytics team at the time.

We had desktop SAS licenses, and desktop R and Python, obviously. We could not touch SAP at all, so any data we needed to get from SAP traditionally had to go into a transfer server, and then it was sort of this Rube Goldberg machine of how to get data from one place to the next. And then probably the worst part to me, coming out of grad school, was that every deliverable was effectively in Excel or PowerPoint. And I was an econ grad; I hadn't really ever used either one of those, to be completely honest.

So in the first few years of my time at Kellogg, I think I wrote more VBA code than most people would ever write in a lifetime, because they'd want simulations and some of these other things. The downside of all that was that if you built a tool in Excel and gave it to somebody, there was zero chance of ever updating the data or the methods or anything in those tools. So over the next few years, we really focused on how we got data and how we delivered these data products to people.

If anyone I work with is watching this, they may disagree with how nice and neat these lines look, but it was certainly an improvement over what we had when we started. We'd moved to a SAS server, which effectively let people share code and compute time across the network. We had SharePoint to deliver PowerPoint and Excel files to people versus sending mass emails. This helped a little in that we could replace a version on SharePoint, but obviously, if someone takes a copy, there's no way to update it.

We got a Tableau server, which I think reduced some of the PowerPoints, but people were stretching the limits of what Tableau could do. So we set up some .NET servers where people could build custom applications and tie them in, so stored procedures could act on databases or you could run a SAS procedure. It's not that these things are hard to do, but there's a fair amount of bureaucracy you'd have to go through.

So in 2018, there was a push; "cloud first, mobile first, API first" was sort of an IT rallying cry. The majority of the analytics people had been moved into IT by this time. We had multiple servers. We stood up a number of database management systems that people could access depending on their skill level and what they needed access to. And in 2018, I was sent to endless conferences as we approached building the strategy.

Just by happenstance, in 2018 there was this conference called EARL, the Enterprise Applications of the R Language conference. Apparently it's not normally in the US, so this was a special occasion, and I grabbed a couple of people on the team and we went, and I loved it. It was awesome to hear people talking about using R and open source in a production environment, an enterprise environment. RStudio had a small table there, and it was Thomas Mock actually, I remember this. He and whoever he was with got grilled for about two hours by me, an engineer I brought, and one of the other people on our team, on what Connect did. Because we were effectively working through the idea of what we were building to deliver this cloud-based architecture for analytics.

A couple of months later, the RStudio conference was in Austin. We went to that as well. And again, Thomas Mock was there, and I accosted him and talked to him. I teach an honors class at a university for econ students that effectively teaches them how to use R in real-world applications for econometrics, and I told him I was skipping class to attend. So my students got a week off, but I was going to come back and share all this with them. And he told me about RStudio Cloud, which was in beta at the time.

Building the analytics platform at Kellogg

So the first thing we did was set up these test environments. Like I said, we tried multiple environments; I think we somehow provisioned some servers and started setting all these little tests up. Because as we were having conversations with IT, and a fair amount of the IT people are finance managers, program managers, everyone was very, very interested in switching to open source, which was very odd to me. When I started and talked about R and Python, most of the time people thought there were security risks in using open source software.

But somewhere along the line, there was a change of heart. I came to find out this was mostly because they thought open source meant free; they thought that switching to R would mean the total cost of ownership would go down substantially. And it was very difficult to explain to people why we would need this architecture, why we would want a product like RStudio Connect. So we just built our own version, as close to RStudio Connect as we could get. We stood up RStudio Server Pro. We built a hosting environment for Shiny applications and R Markdown reports. And we started converting some of the existing projects we had into R and deploying them through Shiny.

We set up a GitLab instance on one of the servers and really built the entire ecosystem to show people how it worked and to prove out our vision of what this cloud-based analytics system would look like. And it was pretty successful.

Now, unfortunately, what we didn't do was plan for the long-term management of this. So we stood up the RStudio server and got people accounts. We had the POC, we had project managers, we had all these things, right? And once it was a success, everyone said, yeah, this is successful, and then they sort of handed the keys over to our team.

And while it was certainly interesting, because me and another person on my team started actually approving the requests for licenses, setting them up on the server, and writing back-end code to handle updates and all these things, that really wasn't the best use of our people's time. So now we've hired some people who are full-time managers of the system. If the compiler needs to be updated, they can update the compiler; if we need to expand or buy more space, whatever, they're managing this stuff.

The second thing we did that worked really well was we went and found all of the analytics people throughout the company, not just data scientists, not just people in IT: people in supply chain, people in sales, people in the regional finance offices, the people who were doing amazing things, often in something like Excel or Tableau. And we just gave them an account. We set up the connections they needed, gave them a brief overview of how to get started, and let them go.

When we went talking to a lot of the IT people with this idea, they weren't terribly into it. There are always concerns about security and managing access. But I think we knew it was the right thing to do to just get people using the tool. I attached myself to a project building a rather complex application in Shiny, sort of a CRUD tool that interacts with our promotional data.

Another member of our team actually built, at the beginning of COVID, the entire COVID tracking system for our factories: anytime an employee contracted COVID, or reported that they'd been in contact with someone who had, we needed to track this. Now, if you think back a year ago to the confusion around all of this: they built the entire tool, built some security for it, and did all of this in Shiny. And I think this really sold everyone that this was the route to go.

Now, unfortunately, we didn't have any user groups or any connections between all these separate people. So we quickly had a lot of Google searches on how to do things and some inconsistency in the libraries people were using. And since we were managing the library centrally, we'd get all these requests for deprecated libraries and have to manage all of them. So what we're trying to do now is actually have a group of people who meet to share what they do and standardize some processes. Not that we don't want people to be innovative, but there was just a lot of work managing people doing very similar things in different ways.

And then the final thing is we really, really pushed the idea of having a centralized analytics platform that was cloud-based: if you wanted to deliver content to someone, there was one place they could go. It was web-based, so everyone could access it. If you remember, there was the push for cloud, mobile, API. Mobile sort of dropped off the radar last year, I assume because everyone was working at home, but the idea of a centralized platform on a cloud-based architecture really enabled us to go through 2020 without any real delays in the work we were doing, because we had this stood up right at the beginning.

Obviously we work in the CPG business, and I don't know if anyone else here is in CPG, but if you were shopping at a grocery store, you noticed there were massive supply constraints. Most likely a lot of promotions that were planned were pulled. The way we allocated our production was really to fill shelves versus anything else. All of this was built in RStudio and delivered through Connect. So everyone was on board at this point.

Unfortunately, the people we didn't bring into this initial cohort were being asked to build things in R without really knowing anything about R. So I think this is another case of success causing a problem. We're trying to rectify this now. Katie will know, I routinely emailed her throughout last year saying I really wanted to make sure we got an education program going. It's entirely my fault that it never really took off. Now we're very, very committed to doing this. We want to upskill tons of people, we want to give people these tools, we want to see what people can do.

Even today we're going through an architecture change. I was talking to some consultants who are managing the migration of some things from SAS to R, and they asked about scheduling, like batch jobs. And I took them through the idea of building an R Markdown file that runs on a schedule, generates a report, and emails people the changes made in the system. I think when people started to see this, they really saw the power; now it's up to us to make sure they have the capability.
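A minimal sketch of that pattern, assuming the blastula package for the email step; on RStudio Connect itself, the schedule and email integration are configured in the dashboard, but a standalone version might look like this, with the file names, addresses, and credentials as placeholders:

```r
# Sketch of a scheduled batch job: render a report, then email a
# notification. All file names, addresses, and credentials below are
# placeholders, not a real configuration.
library(rmarkdown)
library(blastula)

# Render the report (run daily via Connect's scheduler or cron).
render("batch_report.Rmd", output_file = "batch_report.html")

# Compose and send a notification email; SMTP settings are assumed to be
# stored in a credentials file created with creds_file().
email <- compose_email(
  body = md("The nightly batch report has finished. See the attached summary.")
)
smtp_send(
  email,
  to = "team@example.com",
  from = "reports@example.com",
  subject = "Nightly batch report",
  credentials = creds_file("smtp_creds")
)
```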

Dr. Yu-Hung Chang — AGCO

So my name is Yu-Hung Chang. My background is in aerospace engineering and statistics. After graduation, I had a great opportunity to join a fantastic team at AGCO, and now I'm an Advanced Analytics Specialist in Global Field Quality at AGCO.

So today I will quickly give you an overview of our company, because that will help you understand the challenges and difficulties we are facing right now. I believe probably everyone has had a Kellogg's product, their food, or been on a flight, either Airbus or Boeing, and both use Pratt & Whitney engines, so you have some customer experience with their products. But I believe probably no one here has ever used our products before; neither have I. Still, I believe many of you have had food either planted or harvested by our products.

So we are a global leader in the design, manufacture, and distribution of agricultural solutions, and we are also leaders and experts in global quality solutions. These are the four big leading brands we have for different types of agricultural vehicles, including tractors, RoGators, TerraGators, and anything you can name.

But we are also the only full-line product company in the agriculture industry. Why do I say that? Because we also have a very comprehensive grain and protein business, which owns these five brands. You probably know GSI, and also Cimbria and the other brands as well. So our values and mission stand on providing sustainability and good-quality products to farmers, and also improving farm income.

So just a quick overview of our history. As you can see, AGCO is a pretty young company; last year was our 30-year anniversary. We officially started in 1990, and in the past 30 years we have acquired over 40 companies. That's one of the ways we grew this company so fast and became so big. And with each acquisition, you also have to acquire that company's data vault. It means we kept gathering other people's data vaults into our own system, and that makes our data actually very dirty.

I mean, more complex than the CFD turbine flow data I've ever handled before. And that actually causes a lot of challenges for us, especially in agriculture. Even if we just talk about tractors: seeing different types of tractors doesn't mean we can really compare them to each other, because we have tractors designed for common uses, and we also have tractors designed for vineyards, for fruit, and so on. So how do we standardize our analysis tools to make sure that when we analyze our data and our product quality, we won't make mistakes like comparing apples and oranges?

And we can provide nice KPIs for everyone, especially now that AGCO produces 133,000 vehicles per year and has 43 manufacturing locations. In the past, some of the manufacturing locations joined through acquisition, so each factory has its own habits and know-how. We have to understand what they have and what they are trying to do, and also make sure that when we standardize this system, we can balance everything, making not only our engineers happy, but also our customers happy.

AGCO's data science challenges and the Analyticus tool

And now let's talk about what we do with R, why we chose R, and what our challenges are.