Data Science Hangout | Alec Campanini, Walmart | Using Shiny to make business decisions
We were joined by Alec Campanini, Senior Manager, Merchandising Operations Omnichannel Business Analytics at Walmart. Alec loves to standardize and scale data in a targeted and meaningful way without sacrificing speed of development.

(5:12) - Alec shared an awesome example of the way their team is using Shiny at Walmart:
- Their team's main scope is “rest of market” - trying to figure out every item that exists in the rest of the market, no matter if it’s in the store, online, through TikTok trends, social listening, whatever it may be.
- We have a lot of market share data, sales information, pricing over time, and bestseller trends that are popping up. We do a lot to find out where merchants would be lacking inside of a brand.
- Today I own a Shiny app (which has been alive for about a year and a half now) that is pushed to all merchandising. We have about 12,000 users on it today across all of the merchants and merch-ops.

What does it do?
- Starting off it was insights on 28-30 different retailers in the e-commerce space. With so many items, there are a lot of different contracted data sets coming in. The biggest thing is normalizing all of those to the Walmart hierarchy. We need to have some way to tell a merchant that something from the rest of the market on a different website makes sense for them to look at.
- Our Shiny app and workflow allows us to quickly identify what percentage we have of a certain brand and drill down into things like:
1. What are the top items being sold?
2. Do I want stable items or trending items? Do I want something that I’ve never seen before?
3. Do I already have the item but need to increase what I’m giving the customer?
4. Do we need to lower the price?
5. Is it shipping too slow?
6. Maybe Amazon has 10 images for the item and we only have 5?
An example of this Shiny app helping make business decisions:
- We released a line of new golf tech products for Father’s Day that came from the Shiny app.
- We had a lot of the sports & fitness merchants find out that we sell a lot more golf technology rather than golf balls online. These are merchants that used to be in the store, but they got into the e-commerce space as a new merchant.
- They think the trends are going to be the same in-store as online, but they found a different trend inside of our app and were able to launch a whole slew of things this past week, which was really cool.

Other timestamps:
- When making the case for code-first data visualizations over BI tools (25:58)

Packages shared in the chat:
- bs4Dash: https://lnkd.in/gPUBDW72
- rhandsontable: https://lnkd.in/gtYedxzY
- bigrquery: https://lnkd.in/gtPM6Pn2
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the Data Science Hangout. If you're joining for the first time, I'm Rachel. It's great to meet you. This is an open space for the whole data science community to connect and chat about data science leadership and questions you're facing and really just what's going on in the world of data science.
The sessions are recorded and shared to YouTube as well, um, as the RStudio Data Science Hangout site. So you can always go back and rewatch them and find helpful resources. We do also have a LinkedIn group too, for the Hangout. So if you ever want to continue a certain discussion or meet people there or see a brief summary of a topic you brought up each week, you can do so.
I do want to just say, um, I can't believe it's almost July and wanted to let everyone know that we will be taking a bit of a break from the Hangouts in July, just in July only, um, as we prepare for the RStudio conference and some big announcements. Um, but we'll be back on Thursday, August 4th as well.
I realized I hadn't been mentioning that to people, so just want to let everybody know, um, anyways, we are excited to see all here and always want to create spaces where everybody can participate. So as you know, the Hangouts are all audience led. So there's three ways that you can ask questions today.
You can jump in by raising your hand on Zoom. You can put questions in the Zoom chat and just put a little star next to your question, if you want me to read it out loud instead, or else I could just call on you to jump in. Um, and then lastly, we also have a Slido link where you can ask questions anonymously too.
We love to hear from everyone, no matter your level of experience or area of work or industry. Um, but with all of that, welcome. I'm excited to be joined by my co-host for today, Alec Campanini and Alec, you have a very long title here, so hopefully I don't mess it up, but Senior Manager, um, Omni Merch Ops Innovation, Assortment and Space Analytics at Walmart. So you're definitely going to have to describe that for us too, but Alec, I'd love to kick it off with having you introduce yourself and share a little bit about the work that you do.
Hey everybody. I'm Alec Campanini. Um, like she said, I'm, I'm working with Walmart. I've been at the store side of Walmart for like three years, uh, back when I was in college, uh, went off, did some master's stuff, and then now I'm back at Walmart for the past, I think two and a half years now. Um, I'm working inside of the merchandising area specifically. I'm looking at optimizing our assortment and making really large bulk changes and strategy for our assortment.
Um, this tends to happen, uh, it's Omnichannel, so every single channel you can think of, even drone delivery right now, uh, we've started trying to figure out how we can place assortment in certain areas, what all's available, what's going to make the most money, et cetera, um, as well as like this digs into like supplier. So supply chain comes into it. Um, but most of it is like it's strategy based around innovation, merch products. We build a lot of products for 3,500 merchants, uh, in order to get those bulk changes made, um, as well as reviewed. Um, and so we have a really big kind of like feedback community for the type of work that I do.
What's exciting in data science ahead
Um, and I'd love to kick off with this question here. Um, but what's something that you're most excited about in data science in the year ahead?
Um, I've really enjoyed the flexibility of things that are coming out. Um, I mean, RStudio has done a good job of enabling a lot of that within sort of Walmart. Uh, so we actually have the, uh, the Team suite. So we, we utilize that to make sure that we can actually share a lot of our stuff across, um, different teams, uh, as well as have an easy, you know, one-for-one deployment method, uh, for all of our Shiny apps. Um, so I guess in like the next year, what I'd really like to see, uh, at least at Walmart anyways, it's just how my mind's turning, is, um, a lot of flexibility around data engineering, especially as we scale and change channels, uh, as well as being able to actually model that in a meaningful way. Um, so it's, for us, it's gonna be flexibility in the next year or two.
The rest of market Shiny app
I know I see quite a bit on LinkedIn about some of the Shiny apps that you create at Walmart and would love to just hear a little bit about some of those and how they help impact the business just while we're waiting for some questions to come in from everyone else too.
Yeah. I think this will probably spur some questions, uh, especially since it has to deal with other retailers. Um, my main scope right now is called rest of market. And so our, what we're trying to do is find out every item that exists in the rest of the market, no matter if it's in the store, online, through TikTok trends, social listening, whatever it may be. And, um, it's really cool that I got it started out as my first project because the scope is pretty unlimited.
Um, cause you've got like Walmart as its business, and then I get to take on like rest of market as kind of its own business. Um, so today I own a Shiny application that is pushed to all merchandising. Um, we have, I think a little bit over 12,000 users on it today, um, across all of the merchants and merch ops. Um, and what it does is, um, starting with e-comm, which was our first version, uh, it was insights on about 28 to 30 different retailers in the e-commerce space. You can guess as to who those may be.
Um, with many items at the item level, um, a lot of different contracted data sets that are coming in, uh, so they all come in different grains and forms. Uh, and then the biggest thing is just to normalize all of those to the Walmart hierarchy. So we've got to have some way to tell a merchant that something from the rest of the market on a different website makes sense for them to look at. Uh, so we, we do a lot of data normalization. Uh, we do a lot of trend analytics for them. Uh, and then a lot of that afterwards comes to how much can we scrape and acquire, uh, to enhance our data.
So we have a lot of, um, we have a lot of market share data, a lot of sales information, um, pricing over time. Um, and then a lot of the bestseller trends that are popping up. Um, so we, we do a lot of things to find out where I guess merchants would be lacking inside of a brand, especially national brands, and then we allow a workflow to quickly identify, oh, we're only at 60% of this brand, uh, drill in. What are the top items being sold? Do I want stable items? Do I want trending items? Do I want something else that I've never seen before?
Or is it just that I have the item and I just need to increase, you know, what I'm giving to the customer? Do I need to lower my price? Is it shipping too slow for somebody? Did we mess up the website? Amazon maybe has 10 images and we have five for whatever reason. Um, so just, it just keeps going and going. Uh, but that's my first Shiny application. It's, it's been alive for about a year and a half now.
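The "we're only at 60% of this brand, drill in" workflow can be pictured with a small Python sketch. This is only an illustration under assumptions: the item IDs, the ranking, and the function names `brand_coverage` and `missing_top_items` are hypothetical and not taken from the Walmart app.

```python
def brand_coverage(market_items, carried_items):
    """Share of a brand's market items that the retailer already carries."""
    market = set(market_items)
    if not market:
        return 0.0
    return len(market & set(carried_items)) / len(market)


def missing_top_items(ranked_market_items, carried_items, n=3):
    """First n items, in market rank order, that are not carried yet."""
    carried = set(carried_items)
    return [item for item in ranked_market_items if item not in carried][:n]


# Hypothetical brand: five items seen in the rest of the market, best seller first.
market_ranked = ["item-a", "item-b", "item-c", "item-d", "item-e"]
carried = ["item-a", "item-c", "item-e"]

coverage = brand_coverage(market_ranked, carried)
gaps = missing_top_items(market_ranked, carried)
print(f"coverage: {coverage:.0%}, top gaps: {gaps}")
```

A real version would sit on top of the normalized hierarchy described earlier, since coverage only means something once external items are mapped to the same brand and category definitions.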
What's an example of something that's like a TikTok trend that influenced a brand?
So, so TikTok is pretty new for us. We actually haven't engineered that fully into the Shiny app today. That's, that's social listening that we're wanting to, uh, to bring in. Um, we're thinking about how we want to utilize that. It's, it's obviously gonna be based on hashtags that are trending throughout wherever. But we have to try to parse out a lot of that for what is product based versus just what else is actually just floating out inside of TikTok.
Um, but I think the coolest trend is, uh, on Father's Day. Uh, we released a line of new golf tech products. Um, and all of that actually did come from the, uh, from my app. So we had a lot of the sports and fitness merchants, uh, find out that online we actually sell a lot more technology rather than golf balls. So these are merchants that used to be in the store. They're setting a mod once a year. We're always going to have golf balls on the shelf because we sell all of them all the time. It's a physical asset, but they get into the e-commerce space as a new merchant for like Omni and think that it's the same type of trend. They find a different trend inside of our app. And then I think they launched a whole slew of things this past week. So that was really cool.
Audience Q&A: data quality and reviews
Um, Libby, I see you asked a question in the chat. Do you want to jump in?
Um, yeah, I was wondering if when you're looking at the value of a product within a SKU catalog for a brand, like do you incorporate reviews, the number of reviews, the quality of reviews, the percentage rating, stuff like that, when you're trying to figure out what's worth pulling over?
Yeah. Sadly, we haven't gotten into like sentiment analysis on those reviews, just because of the scale of it online. Um, I think Amazon sells like 800 million products. And so we're just looking to acquire those in the first place, much less the reviews. Um, but we do bring in counts of ratings and reviews. Um, and we try to track those over time. Um, so it depends on, we, we really scrape like what we find in the market is the most important. So we do have sales performance on a lot of these items for the past four years. Um, some of that's acquired, some of that's paid for. Um, but we, we try to track as much as we can for those that trend.
Hey, hello everyone. Actually, I wanted to ask, like, these days on Walmart and Amazon, people post false reviews too, like paid reviews. So how do you segregate between the paid reviews and the genuine reviews you are getting from the customer? Like, what are your strategies for that?
Yeah, that's the, um, well, that's one thing we haven't talked about. There's the sentiment analysis, but that does affect our counts. Um, another thing that affects counts just like that, uh, is duplicate listings online. Um, so Walmart, uh, does a bad job of taking marketplace sellers and making them put it all underneath one item space. And so we even have duplicate listings of like how big our assortment is. Uh, same thing happens on Amazon. Same thing happens on reviews, ratings counts. Um, and that's, that's kind of the cool part of rest of market: it's not even our internal data and we have to figure out some way to de-duplicate it.
Um, so we have different methods for doing that internally and then we have different methods for doing that externally. Um, a lot of it is kind of based on different data sets we have. So if we see that one data set is an outlier for a brand's depth for Amazon, uh, then we may treat the other three that we have engineered as more of the source of truth. Um, but it just becomes weighting at that point. We also take, like I said, a lot of merchant feedback. So if we feel like, uh, there are misguided, um, outputs being shown to a merchant, we allow them to edit a category. If we predicted it to their space and it's incorrect, we allow them to edit like items that we have mapped for it, which helps us understand if we're meeting customer needs. Uh, almost all of these metrics are edits and we'll save and push it back.
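As a rough picture of treating one data set as an outlier and leaning on the others, here is a minimal Python sketch. It is an assumption-laden illustration, not the team's actual method: the source names, the counts, the 50% relative tolerance, and the choice of the median as the consensus are all hypothetical.

```python
from statistics import median


def flag_outlier_sources(counts, tolerance=0.5):
    """Names of sources whose count is more than `tolerance` (relative) away from the median."""
    mid = median(counts.values())
    if mid == 0:
        return []
    return sorted(
        name for name, value in counts.items()
        if abs(value - mid) / mid > tolerance
    )


def consensus_count(counts, tolerance=0.5):
    """Median of the sources that were not flagged; falls back to the overall median."""
    outliers = set(flag_outlier_sources(counts, tolerance))
    trusted = [value for name, value in counts.items() if name not in outliers]
    return median(trusted) if trusted else median(counts.values())


# Hypothetical item counts for one brand on one retailer, as reported by four data sets.
brand_depth = {"source_a": 1200, "source_b": 1150, "source_c": 1250, "source_d": 4800}

print(flag_outlier_sources(brand_depth))  # the source reporting 4800 is flagged
print(consensus_count(brand_depth))
```

A hard in-or-out rule like this is the simplest form of weighting; a softer version would down-weight a suspicious source rather than drop it.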
Handling data at Walmart's scale
I think 800 million products is really hard to imagine. And I'm curious, how do you handle all that data or what does that look like at Walmart?
Uh, it comes down to prioritization. So the, like I said, there's, there's really only probably five to 10 million that we actually care about at Amazon. Um, same thing happens at Walmart. We, we have 250 million, I think, and you don't even sell all of those. So we're really looking for sales samples and paid things within the market. Uh, we'll, we'll actually have receipt-based samples come in from different households across the market. We'll pay for that, figure out what's actually making sense. We may miss some items that way, but we typically know what's in the head and the torso pretty well.
Um, it doesn't tell us like total market value of an item. And that's where we start doing rest of market estimates at the item level. Uh, we'll, we'll predict what the market demand is, and then we're working on predicting what the Walmart demand is, even if, you know, it's not always the same, not the same customer base.
So I see a few people asked questions on Slido anonymously. And one was, has Walmart's relatively new subscription service impacted your work?
Yes. So, uh, we actually break out sales samples for first party and third party subscription and non-subscription sales. And so this, this helps mainly inside of the pets area. Uh, obviously there's a lot of, a lot of different models that you can offer for, uh, renewable things for life. So that makes sense. Um, but yeah, there's, we're, we're trying to identify, uh, trending subscriptions, downtrending subscriptions, um, most paid for most stable over time, anything you can think of about subscriptions. So, um, another thing that plays into that just to piggyback off of it is services. Um, so Amazon offers a lot of installation services and stuff, right? Whenever you buy a product, Walmart has a few of those, but we're also trying to match and compete on services.
Um, Seth, I see you put a question in the chat. Do you want to jump in?
Yeah, sure. Hello everybody. Long time no see. Um, Alec, thanks for, uh, for taking the time. So, uh, you mentioned kind of like these dual listings and things like that. And I'm sure all of us have experienced this thing where, especially when you're trying to do analysis on like products or whatever, there's a lot of coordination that is required from like, let's say the front, front end development or back end development that makes maybe the data ingestion of data analysis much easier. And I'm wondering if communication on like, Hey, like maybe if we didn't orient these things to be two items that would make our lives a little bit easier and kind of improve our, our analytical capabilities, does that kind of communication exist?
Yeah, it's starting to, it's starting up a little bit more nowadays. Um, we have good documentation on the approaches that we've tried, um, with deduplication. Um, so the, the issue with Walmart is always scale and the different types of businesses within that scale, like on the store side, you wish everything would operate sort of the same. Like if you're, if you're Ace Hardware, you know, you've got hardware, but we've got hardware and pets and everything all inside the same place. And so typically for, um, you know, some type of flexible approach, deciles are, are typically good or percentiles, um, scrape everything out from the bottom one percentile, and that's typically your outliers.
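The percentile idea can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the department names and unit counts are made up, the nearest-rank percentile definition is one of several possible choices, and each group is assumed to be non-empty.

```python
import math


def nearest_rank_cutoff(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def trim_bottom_percentile(values_by_group, pct=1.0):
    """Drop each group's bottom `pct` percent, using a cutoff computed within that group."""
    trimmed = {}
    for group, values in values_by_group.items():
        cutoff = nearest_rank_cutoff(values, pct)
        trimmed[group] = [v for v in values if v > cutoff]
    return trimmed


# Hypothetical weekly unit sales for two very different departments.
weekly_units = {
    "hardware": list(range(1, 201)),           # 200 items selling 1..200 units
    "pets": [u * 10 for u in range(1, 101)],   # 100 items selling 10..1000 units
}

result = trim_bottom_percentile(weekly_units)
print({group: (len(kept), min(kept)) for group, kept in result.items()})
```

Computing the cutoff inside each group is what makes the approach flexible across businesses with very different scales: a fixed unit threshold that suits hardware would be meaningless for pets.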
Uh, but to actually, I think, move that throughout the organization, you know, we're doing this for ourselves right now in assortment. Uh, my strategy now that this has been used so much, and there's the business impact already there, is we push our datasets up. Um, we're wanting to actually get the rest of market items de-duplicated into our catalog, like our catalog. And so even if we have, uh, unmanaged records that aren't fully complete, like we don't have a unique identifier for it, or we're still missing size or dimensions, um, we'll have a record there, but it'll be a much quicker setup the next time the item actually comes up for the merchant to put in our space. Um, that'll, that should help with some deduplication. There's still a lot of data science and guessing that goes on behind the scenes, but, uh, we're, we're trying to make sure that it gets standardized for everybody.
Tracking competitor prices
I see someone else had asked on Slido going back to, and you're talking about tracking prices. How do you actually track prices on Amazon or from other competitors?
Yeah, there's, there's a lot of different ways. Um, there's batch loads that we'll get monthly, like I said, from different data contracts, um, so we can actually see the receipt-based price. So the thing that somebody actually paid for at that time. Um, you won't know it daily; that's a price over the month. And so we do have to take stats from it, uh, normally the median, just in case pricing goes wonky on something. Um, so we get some type of batch load. It's a little bit lagged, but we also do a lot of the, um, you know, Amazon posts their bestsellers. Um, we post our bestsellers. Uh, you really just want to focus in on like what the top items are trending on their page at that time.
Um, and they refresh it. So, I mean, we, we refresh ours, I think two to three times a day. They do theirs three to four times a day, I believe. And you can consistently, if it's still on that same list, check pricing as it goes throughout the day. Um, one issue is zip code related pricing. Um, so if you go to Amazon, I can go check it now. One of y'all can go check it. We see different pricing for shipping or whatever it may be, different ship speed. Um, and that's a whole different ballgame that we still have to get into. Um, how could we, uh, get all this data, uh, not even just at scale of 800 million, but 800 million by zip code and region. Um, and that's, that's something that I actually haven't even dug into yet.
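Taking the median of receipt-based prices over a month, so that one wonky price does not distort the figure, might look like the Python sketch below. The record layout, item ID, and prices are hypothetical; the actual batch loads are not described in that detail.

```python
from collections import defaultdict
from statistics import median


def monthly_median_prices(receipts):
    """Median paid price per (item_id, 'YYYY-MM') from (item_id, 'YYYY-MM-DD', price) rows."""
    buckets = defaultdict(list)
    for item_id, date, price in receipts:
        buckets[(item_id, date[:7])].append(price)
    return {key: median(prices) for key, prices in buckets.items()}


# Hypothetical receipt rows for one item; the 19.99 row stands in for a pricing glitch.
receipts = [
    ("golf-rangefinder", "2022-05-03", 199.99),
    ("golf-rangefinder", "2022-05-11", 189.99),
    ("golf-rangefinder", "2022-05-20", 19.99),
    ("golf-rangefinder", "2022-06-02", 199.99),
]

prices = monthly_median_prices(receipts)
print(prices[("golf-rangefinder", "2022-05")])
```

A mean over the same May rows would be pulled far down by the single glitch, which is the usual reason to prefer the median for lagged, sampled prices.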
That's cool. When you say like the receipt scans, is that sometimes like those apps that give you like money back for scanning your receipt or is that actually...

Yeah, I actually don't know the whole process for it. I don't, I don't think our vendors are going to let us know the complete story behind it. Um, but in essence, all I really know is that there's millions of households. It's not really just like people based. It's household based. Um, and they, they would pay them X amount of money to do this service for them for a certain amount of time. So they'll sign a contract to be part of the household review. Um, I think for a year or two years, whatever it may be. Um, and that's just so that we keep it a steady sample, customer shoppers and insights kind of stay, stay relatively the same, it's spread out everywhere, where we want inside of the region, and then we just kind of aggregate and predict from there.
Vetting external data and showing ROI
Um, Libby, I see you had another question in the chat. Do you want to jump in?
Yeah. It looks like Brittany and I have a similar question. Um, when you are looking at all market stuff, I feel like that's a really big struggle, you have to use external data and it's something that everybody who has a product or service in a market deals with, so is there a vetting process that you go through to like vet external data, do you get a sample of it and get to use it and then decide whether or not you want to purchase it, do you validate reliability of the data, stuff like that?
So luckily enough, we actually have another team that takes on a lot of our data acquisition, uh, as a service. Um, so they'll, y'all have heard of Nielsen, um, you've heard of probably 1010data or NPD or all of these main data contractors. Um, some of those are, are really easy and reliable, uh, like Nielsen, it's kind of vetted throughout the market already, uh, but once it gets into smaller players, we actually do have a team that goes out. Um, since we already have three to four data sources already standardized into the Walmart hierarchy, we can kind of guess numbers. You get a new one, you check it up against those numbers. Hey, something was wonky. Go back and forth with the vendor, see what the differences are. Maybe their assumptions make sense. And there's a reason why something doesn't, like, line up perfectly. Um, after that, we just kind of use it. Um, you, you just have to see with a merchant, um, if there's any issues for their area. Are they missing data? Does it seem like something is being assigned incorrectly? Um, it's a long process. Um, but luckily since we do have a team that does that before it reaches me, um, I can at least use it in the terms that I know that merchants are happy with it.
So do you have, do you have any as a follow-up, do you have any scope into like ROI for those purchases? Like, is that something that you guys have to deliver on? Like we bought all this external data. Here's what we did with it and here's why it was worth it.
Oh, for sure. Yeah. Um, that's, that's been an ongoing conversation the moment I said the word Shiny app. Um, so because whenever, what happened first is, um, there's a, there's a guy named Peter Cross, um, he's, he's left Walmart, but he was great. He stood up our entire infrastructure for RStudio within Walmart. And luckily enough, I got to piggyback on that, even though I wasn't on his team. Um, so he gave me a license. I got to deploy some of my work, started doing some testing and as a POC put rest of market out there.
Um, and then whenever it was announced that Peter was leaving, RStudio was, uh, kind of like falling by the wayside for any of the teams. So we picked it up. The moment I said I needed to pick it up for funding was the moment we needed to do ROI analysis and figure out what it was actually going to do. And even at that time, that was only when we used one data set for e-commerce. Um, so yeah, I wanted to make sure that people knew that there was impact there. And so every, I think it's about every three months now, we bring all the merchants that have had success stories and they demo their success stories for the other merchants. Uh, so this will go out to the VPs of their leadership. There's thousands of people on the call, um, or at least thousands invited, and we see who shows up. Uh, but the, the merchants will get information from other merchants that makes them trust the source more. Um, and maybe dig into their e-commerce space a little bit more.
Uh, but my next release will be a lot larger. So, um, we're thinking as we add things on, we want to keep showing them what the benefit is. Um, it's a little bit hard to track perfectly in numbers because there's a lot going on. Like it's not, it's not even just about the business impact. It's really that we're also doing all these standardization and modeling practices that were never there in the first place. Um, it's, so it's, it's valuing each piece of that.
Shiny vs. BI tools
I know that post was around, like, how do you show the value of Shiny over business intelligence tools as well, when your team is already paying for multiple things? Um, so yeah, I'd love to hear like what you learned through that, but also open it up to other people who have maybe had a similar conversation.
Uh, yeah, at least, yes, if anybody has anything, step in. But at least in my experience with the BI tool conversation, um, we've at least made our, our point on maybe eight to 10 things about why we would have to pick Shiny over a different BI tool. Um, the first one really is just that we have a lot of people on our team that use R. So the skillset's there, uh, that's a pretty good use case.
Uh, the second one is just that, uh, drag and drop is pretty hard to document, um, steps between each other. Uh, so if something breaks and somebody else needs to fix your visualization, you can't just read through some ggplot steps or plotly or whatever it may be and recreate it. Um, you're typically Googling in a forum somewhere and then hoping that somebody put a screenshot of every button they clicked.
And then number three, our scale. Uh, most data models, uh, especially like Power BI data models, don't typically handle our scale. Um, I know there's some data warehouse and data mart stuff that they're doing recently, haven't checked it out yet. Uh, Tableau, um, you know, it can connect to BigQuery or something, which is cool. Uh, but the processing and then Tableau data extracts are just, it's just not enough, uh, for the amount of assortment we deal with. So, um, I mean, when we think about historical assortment at Walmart, it's 250 million today for, if you need two years of analysis, you know, by day. And that's just for the items; you haven't joined in performance metrics yet. So, um, typically we have to deal with, with staging batch analysis, put it inside of BigQuery, expose it in Shiny. And then go forward.
Looking ahead: social listening and new data sources
Um, I guess, uh, there's no question. Um, what are, I mean, this was great in terms of understanding a lot of the like process of like how Walmart works, but what are the things, I guess, that you guys are looking towards in the future to kind of, that excites you? Um, both, both technically or otherwise.
Yeah. So the, um, I think the social listening piece is going to be really cool. Um, uh, I'd like to, I'd like to get into more data sources. So one thing that we've learned is that, you know, Google search and Google trends are normally a, a pre indicator of sales inside of the market. Um, it's, it's a really good indicator. Um, and so the first thing that we've done is like at a brand level, um, you can drill in from our app into Google trends. And so you can click up to five trends. That's their limitation, but it allows you to find a lot of the search results that are trending in the market, the geography of that, and then what the top results are over time, as well as like, you know, check brands just against each other.
Um, and so that's, that was impactful already. Um, and that was already enough for a merchant because they could see, oh, Timberland is trending up, what's going on? And there was some collaboration of Timberland and some designer. Uh, and then maybe they want to try to get in on marketing on that or whatever it may be. Um, and so that was, that was full enough, much less getting into, like, uh, TikTok, uh, Like To Know It, um, all these different apps that nobody's really, like, we've never sourced it at Walmart before. Um, or if we have, it's for specific one-off things that we may do in marketing. And so now this is, this is really where our group ties into marketing. Um, and I don't know, that's, that's, we're here kind of siloed. We're giving everybody cool stuff, but I want to be with catalog, marketing, data science, data engineering, and then push it all back out. Um, so that our systems can utilize a little bit more. I think that's, that's probably the biggest excitement at Walmart right now.
Connecting Shiny to internal databases
Um, I see there's a question on Slido too that was, how are you able to connect the Shiny apps in R with your internal database information systems?
Yeah. Yeah. So I, uh, I just use, um, ODBC drivers, uh, through RStudio. They have professional drivers for BigQuery. Um, they don't have one, I think, for Presto, which is where we have a lot of our layers kind of thrown out there from GCP. Um, so we, we have our own Presto, uh, connector that we use. I think there's the RPresto, uh, R package that you can use for connecting. Um, but most of it's ODBC. We used to use JDBC, but we recently converted over. Um, and then I'll use like the database, like, pool package to make sure that connections stay on and revolving, uh, through BigQuery, cause there is timeout stuff. Um, but yeah, I'll just connect through BigQuery in that. I'll use dbplyr, uh, either inside of the backend of Shiny or for development of a SQL query, um, just to get a flow down, and then me and my coworker usually work together if we need to use like Python's API to optimize it.
Impact on merchants and workflow
Um, I was just curious. So I know this is a tool that you provide to the merchants as well. Is this something that's like greatly improved their experience of working with Walmart, or I'm just trying to like put myself in the shoes of who's using the Shiny app.
Honestly, I think, I think at this point, it doesn't help them process wise, but that's, that's the whole point about connecting to these systems is to actually improve their processes with the data now, now that we actually know what they need and what, what insights they're going for and where else we can build, um, right now it's, it's honestly probably more work. Because they, they come to our app, they find insights and then they've got to actually execute and do something to it.
Um, and in the e-commerce space, that's typically always at all, unless we're putting aside of something specific in our fulfillment center. Um, so it's, it's probably harder actually for them right now because they, they need to drill in to understand what their gap is. Um, there's predictions to their category, so it could be incorrect. So they're giving us feedback. Um, we have items mapped to it that are anywhere from 90 to a hundred percent, uh, similar, but that doesn't mean it's always right. It could be a short item description or something. And so we could be giving them a little bit of false information they have to sift through.
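To show why a "90 to 100 percent similar" mapping can still be wrong or missing for short descriptions, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. It is purely illustrative: the product names are invented, character-level similarity is an assumed stand-in for whatever matching the team actually uses, and the catalog is assumed to be non-empty.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive character-level similarity between two descriptions, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(description, catalog, threshold=0.9):
    """Return (catalog_name, score) for the closest entry, or (None, score) below threshold."""
    score, name = max((similarity(description, name), name) for name in catalog)
    return (name, score) if score >= threshold else (None, score)


# Hypothetical internal catalog entries and two external item descriptions.
catalog = ["Acme Golf GPS Watch 2.0 Black", "Acme Golf Balls 12 Pack"]

full_name, full_score = best_match("Acme Golf GPS Watch 2.0 - Black", catalog)
short_name, short_score = best_match("Golf", catalog)
print(full_name, round(full_score, 2))
print(short_name, round(short_score, 2))
```

The long description clears the threshold, while the short one gives the matcher too little to work with and comes back unmapped. A score that does clear the threshold can still point at the wrong item, which is where the merchant feedback loop comes in.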
So yeah, I mean, it's four to five, maybe six more steps than they're actually used to for the e-commerce space, but it's data they've never had before, ever, and so it kind of outweighs it currently. But that's why we really need to push to get this data inside of our item catalog, so that when they come into Shiny we can say, "Hey, give us two things and we can set up the top item in your market with the supplier," and that's it. Instead of, "Oh, here, you found your item: go to this tool, set it up in the item process; go to this tool, add the supplier to our area; go to this tool, order some inventory for it." Now it's becoming a little bit mundane.
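The earlier point about item matches being 90 to 100 percent similar yet still sometimes wrong, for example when the description is short, suggests a simple triage step before a match reaches a merchant. The sketch below is an assumption-heavy illustration rather than Walmart's method: the `Match` class, the `needs_review` function, the thresholds, and the sample rows are all made up.

```python
from dataclasses import dataclass


@dataclass
class Match:
    walmart_item: str
    market_description: str
    similarity: float  # 0.0 to 1.0, from some upstream matching model


def needs_review(match, min_similarity=0.90, min_words=4):
    """Flag a match for manual review when its score is low or when the
    market description is too short for the score to be trusted.
    Both thresholds are hypothetical."""
    too_short = len(match.market_description.split()) < min_words
    too_weak = match.similarity < min_similarity
    return too_short or too_weak


# Made-up example rows, for illustration only.
matches = [
    Match("Golf rangefinder with slope", "Laser golf rangefinder with slope mode", 0.96),
    Match("Golf balls 12-pack", "Golf balls", 0.93),
    Match("GPS golf watch", "GPS golf watch with preloaded courses", 0.99),
]

flagged = [m.market_description for m in matches if needs_review(m)]
print(flagged)
```

Here only the two-word description is flagged, even though its similarity score is above the threshold, which mirrors the short-description case described above.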
Challenges at Walmart's scale
I know we have people on the Data Science Hangout from so many different sizes of companies and different industries, and I was just curious: what's the biggest challenge that you've faced working at a company as large as Walmart, and how do you overcome that?
There's a lot. The scale of our organization, just in merchandising, means that you can't even send a total-merchandising communication email. So you need to get the word out, and it's not through that. What else are you going to use? It comes down to a lot of word of mouth. It comes down to a lot of trust coming from the merchants. Even if you send out a mass communication email, you could be in the wrong quarter of planning for them to even look at your e-commerce space. So they don't care at that point, and then they forget about it, and then you need to send something else out again.
So a lot of it does come down to departmental strategy. We'll go, "Hey, we've got 55 departments." It's a lot of meetings, but if we can group some of them together, we can make it a joint session. And then we'll do maybe an hour, hour-and-a-half demo of how you'd actually use the tool and what's coming up next, and then a feedback session. Just some type of deep dive with them. And then every quarter, like I said, there's the monthly demo from the merchant side. That's the larger-format one where all departments are invited to come. They're not forced to come. But yeah, it's just calendar management. Being able to say no to certain tasks, because you're going to have 300 people ask you about something because you're an SME. And then communications, and communications is almost always the hardest part, especially if somebody gets ahold of your data asset and starts making somewhat the same tool as you. So then you have to fight for merchants using one tool or the other.
A follow-up question that came through about that is: how do you decide what to say no to?
Business prioritization. Since I work on the business side, I don't really operate like tech. We kind of prioritize our own impact based on current versus long-term strategy. If we just decided that we were going to move inventory from all of our fulfillment centers to the store, for some reason, then we would need to focus on that first. The business impact is there. If we feel like we don't have the bandwidth right now, we have to turn it down and try to see if somebody else can support it. Otherwise, if we think that it trumps something that we're currently doing, we'll put one or two people on it, kick it out as fast as possible, and then go back to focusing on something else. But it always just comes down to what we think is the most important at that time, which is a very generic answer.
It's not even based on dollar impact. Maybe you have a data asset that you think is going to be more valuable in the long term. You've got to work on it now, because if you don't have it by next month, it's going to be a year before the merchants even come back to it again. So it really does come down to timeline planning, especially on the store side.
Collaborative coding and team structure
I just wanted to ask: all these apps you create seem really great, and honestly, it seems like a lot of work too. A lot of us have more of a research background, and we're starting to do a little bit more collaborative coding, but I have found that process to be pretty challenging, even with GitHub. So I'm curious: for a lot of this work you're doing, are you working with a team of people on programming? And if so, how do you manage the different pieces of putting together these applications?
Yeah, I think that's a really good question. So the first app I ever built at Walmart was actually in the tax group, and it was full lone wolf. There was no one else on that project. I talked to somebody else on the business side about the current Access database that they were using, just so I could port it all over and strip it from the ground up. So that was full lone wolf. I loved building it, just because I like the whole process of every part of data.
Now we're working more collaboratively. I actually had the privilege of getting my old classmate hired onto our team directly. Me and him just worked side by side. We both did our master's. We built our first Shiny app together in our master's program. He's a lot more backend focused. He's a lot better at optimization. I'm a lot better at, I guess, design, UX, and reactivity inside of Shiny, as well as a little bit more on the business side.
And so we just try to divvy up things that we feel like we can split out. I'll build one Shiny module and then he'll work on the backend. By the time he's done with the backend, I can pull the table in and modify just the things that I need inside of Shiny. And then that's the full process. We don't do a lot of collaborative things where he's going into my module specifically while I'm working on it, because then you have to deal with the GitHub merge and make sure everything actually lines up. So we try to dedicate: this page is mine, I'm going to do that. You can give me insights on this page, but I'm going to own this one, and then that's always going to be my module. And then we'll document all of the responsibilities of the data flow inside of Confluence, which is what we use. From there, we just always know who's working on and owning what.
Yeah, I will say on the data science side, we used to have a merchant data science group specific to merchandising that is now upstream. They're no longer a pillar. They're horizontal across all of tech, technically. But working with them was also a different thing. They handle all of the modeling and some of the data engineering process, but in the end, all I really needed was their stage table. So they give me a stage table. I put it inside the Shiny app. I collect feedback for their predictions. I send it back to another stage table for them so they can retrain on it inside of BigQuery. And then once again, we keep most of our stuff separated. I have some testing scripts to make sure that