Resources

Posit Meetup | Shatrunjai Singh, Aetna | R In Insurance

Operationalizing algorithms using Shiny and Flask

Presentation by Shatrunjai Singh

Abstract: An algorithm is only as valuable as its adoption. Speed to value, repeatability, and low-cost solutions can dramatically reduce software and services budgets and free up valuable dollars for other activities. Open-source tools such as Shiny (R) and Flask (Python) have made the creation and deployment of data science-based web applications convenient and manageable. In the healthcare data science world, we routinely wrap sophisticated statistical code into such web-based point-and-click solutions. In this talk, you will learn about real-life examples of how one can rapidly operationalize intricate algorithms using web app frameworks.

Bio: Shatrunjai 'Jai' Singh is a Lead Data Scientist at Aetna, a CVS Health company, and specializes in data mining, predictive modeling, and data visualization. His work has received several awards, including from the American Heart Association and the Epilepsy Foundation. He won the Tableau Chart-Champion in 2016 and was included in the '40 under 40' for innovation by LIMRA International.

Q&A here: https://community.rstudio.com/t/meetup-recording-operationalizing-algorithms-using-shiny-and-flask/102463

Jun 28, 2021
55 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Thank you very much, Jai, for your time today. I really appreciate it. And I'll turn it over to you.

Thank you, Rachel. Let me quickly share my screen.

All right, perfect. So my name is Jai Singh. I work at CVS Aetna. Aetna is a big health insurance company here in the US, and we were acquired by CVS, which is a big pharmacy company, also in the US. I work in data science. I've been in insurance for the last six years, and I've been in data science for more than a decade, based out of Boston. Today I will be talking to you about how we have been using Shiny to build these rapid prototyping frameworks, which help us take data science and operationalize it. And I'll be showing you some of the apps that we have been using in our day-to-day work.

But before I begin, as always, a quick advertisement: if you are looking for a new job, or if you are interested in opportunities in data science, please do consider us. We are a Fortune 5 company, one of the five biggest companies here in the US, with great benefits and work-life balance. The other thing is that our executives have a very strong data focus; they love making decisions based on data, so there's a lot of visibility for data science. And we are hiring very, very rapidly all across the US, for all different roles, from analysts all the way up to senior directors. If you're interested, you can always Google CVS data science and look at the opportunities, but a better way is to just email me. I can shortcut you through most of the process and get your foot in the door pretty quickly with interviews.

Rapid prototyping with web apps

Alright, so rapid prototyping is essentially taking whatever analysis you're doing, and if you feel like you will do it more than twice, then you build a web app out of it. This tends to help you in a lot of different ways. One of the biggest things it helps with is that now you can share the minimum viable product with a bunch of different people and they can iterate through it, and that will help you get the most efficient and optimized analysis. It also makes it very repeatable. You can make a GitHub page, but if you make a Shiny app, it's easier for people who are not R, Python, or SAS users to play with your analysis and quickly get to the endpoint faster.

So that's why you see rapid prototyping as a very common theme across different industry paradigms. People in statistics will be familiar with CRISP-DM, the cross-industry standard process for data mining, where you start with the data, you perform data pre-processing, you build a model, you evaluate how well the model is working, and you deploy it. Then usually you go back to the business and they tell you everything you've done is completely wrong, and you start from scratch. So CRISP-DM, design thinking, agile: you'll see they're all very circular. Essentially, what they're trying to do is get to the minimum viable product faster, and then test and learn.

And to do this, you can use all these different frameworks. In data science, you can build web applications. If you look at the history of web applications, it started with HTML in the 1990s; probably some lonely grad student in some computer science department made this up (probably not). Over the years it has evolved, and close to 2000 we got these web applications made in Java. In 2014, we got R Shiny, which changed a lot of how web apps are made. It was developed by Joe Cheng. Essentially, it removes the need for expertise in HTML, CSS, and JavaScript, and you can build applications which are data science heavy and give them an interface which is very, very pretty: a GUI which looks good and is easy to navigate.

Now, over the past few years, you have a bunch of different web app frameworks like Flask, Dash, and Streamlit, but I still prefer Shiny, and I'll tell you why. There are trade-offs, so you can consider building your web applications in different frameworks. You can use Flask, which is Python based, and you can mix Flask with JavaScript to make a web application. This is used throughout a lot of different companies; a lot of what you see on websites is built this way, often with Django. Dash is pretty popular. Tableau is more for dashboarding.

Then you can compare all of these different frameworks on all of these different measures. The three that I find most important are: first, Stack Overflow support. If you are a data scientist, you know how important it is to have support and be able to find help online, and I find that Shiny tends to have a lot of help online just because it was first to market; a lot of people have failed doing the same things that you are trying to do, and that tends to help. Second, it is popular with data scientists, just because R is one of the two biggest languages for data science; you will speak the same language, and somebody who comes after you will be able to maintain the apps that you have created. And finally, it's free of cost, which is pretty important. You don't want to pay $3,000 for a software package which, in the end, if it doesn't add to the free tools that are out there, has essentially cost the company 3,000 bucks.

Now, if you're not familiar with when to use web apps, which I highly doubt if you're in this meetup: you would use web apps when you want to revisit the same code for different projects. If you're trying to do something similar across different projects, you can make a web app out of it. Also if you want to introduce a new technique; I've seen this work pretty well. One of the apps that I'll show you today is the comorbidity analysis, a technique I had published a paper on in academia but that wasn't used in industry too often, and making a web app tends to help with that. It leads to faster adoption of whatever you're trying to sell.

And finally, the most critical one: if there is an analysis that you do that has multiple steps, and that analysis is pretty common, so multiple groups within your organization are using a similar framework and then getting different answers, or different variations of the same answer, and presenting them to the business, this leads to a lot of confusion. If you standardize your whole data science process and you build a web app that everybody can use, that tends to shorten the time to analysis, and it also helps with standardization overall. Usually, web apps are more useful when you need slightly more than just dashboarding. If you just need dashboarding, then the best things are Power BI and Tableau.

Build vs. buy

So what do you do when you're trying to decide whether you should build a web application in house or outsource it to somebody else? There are seven different things you can consider. Cost: if you're trying to buy it, that is, get somebody else to build it for you from outside the org, it will be more costly. Customizations: you might be able to get some customizations, but additional customizations will cost you even more. The biggest one is knowledge gain: you will not have the knowledge gain, somebody else will. Buying will save you on man hours, build time, and support. If you build it in house, which Shiny allows you to do very quickly with these rapid prototyping frameworks, it will be cheap, essentially free; the only cost is the man hours that the data scientist requires. It will be very customized, and you will have the knowledge gain. It will take some time for your department to build these apps, and you are the support if you build the app.

The clustering and profiling app

All right. Having said that, today I will show you some of the apps that I've built for CVS and how they have been used. The first one is called the clustering and profiling app. It's a web application that does market segmentation, which is pretty standard: you're trying to find groups of people that are similar to each other in a diverse population. I'll show you a comorbidity analysis app, which is useful for finding comorbidities. In health insurance or health care, a comorbidity is when you have more than one health condition; suppose you have diabetes and hypertension, then you have two conditions, so it's a comorbidity. I'll also show you a propensity score matching app. This comes from causal inference: if you do not have a randomized control trial, so it's not randomized, you have to use quasi-experimental methods, like matching.

Essentially, what you're trying to do is create a synthetic control group, and to find controls who are similar to your test group, you have to match them on different criteria, the different features. To do that, you can use this app called the propensity score matching app; this is more causal inference. I'll also quickly show you the EQUAL tool that I built for a competition within my company. The competition was called ABC, and they judged you on how innovative your app was, how much money it will save the company, and how good looking you are, so obviously I was in the top five. EQUAL stands for eliminating quantitative unfairness in algorithms. Now, you might have heard of how predictive models might be propagating bias: because algorithms or models are built on real-life data, if there are historical biases in the data set, a model will not only propagate them, it will accentuate them.

So the first app, the clustering and profiling app. The big problem was that multiple teams were performing customer segmentation, and they were all using slightly different methodologies, slightly different variations. They were getting similar results, but not quite, and then they were presenting them to different business units, which was causing confusion. This comes from the fact that people have been trained in market segmentation, but everybody uses a slightly different route and presents their results slightly differently. People are either R users, Python users, or SAS users, and they use different little things like imputations or capping that are slightly different, and the end result tends to be different. This confuses everybody.

Now, the clustering and profiling app helps you get to clustering insights faster. It's a web application developed in R Shiny. At CVS Health, we have an RStudio environment where you can publish these apps, and we also have support for Python and SAS. The app works in four steps: it performs exploratory data analysis, it does the actual clustering, it profiles the clusters, and it gives you insight generation, so it'll look at your results and tell you what the groups are and what they look like. Again, it's an agile framework: you can rapidly iterate through multiple things and change whatever you like.

Alright, so this is how the app looks. You can come in and take a quick tour, which will introduce you to the different parts of the app. Essentially, you can import your data and explore it: get summaries, missing values, correlations. You can find clusters in your data using different clustering methods; we prefer k-prototypes for mixed data. You can then profile the clusters you found. So, for example, if you found three different clusters in your data set, you can profile them and quickly find that these clusters differ in age and their risk.

And then you can also get more detailed insights: you can find out what the key variables are that triage this risk. Say the data consists of people who are either below 50 or above 50. If they are below 50, they go in the first cluster; if not, they can be further bifurcated by risk and go into clusters two and three. So you can use decision trees to form these cluster rules.
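Tree-derived rules like the ones just described can be captured in a tiny function. Here is a minimal sketch in Python (the app itself is built in R Shiny); the cutoffs and cluster numbers are illustrative assumptions, not the actual values from the CVS app:

```python
def assign_cluster(age, risk_score, risk_cutoff=0.5):
    """Toy version of the decision-tree cluster rules described in the talk.

    NOTE: the age-50 split and risk_cutoff are hypothetical values
    chosen for illustration only.
    """
    if age < 50:
        return 1          # members below 50 fall in the first cluster
    # members 50 and above are further bifurcated by risk
    return 2 if risk_score >= risk_cutoff else 3

print(assign_cluster(age=35, risk_score=0.9))  # young member -> 1
print(assign_cluster(age=62, risk_score=0.8))  # older, high risk -> 2
print(assign_cluster(age=62, risk_score=0.1))  # older, low risk -> 3
```

The point of surfacing rules like this is that a business user can read them directly, without understanding the clustering algorithm that produced them.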

One thing that I've learned by making these apps and launching them to a small audience is: the more explanations you can give, and the more intuitive it is, the better. That's why using JavaScript here for all of these little helper boxes tends to be very useful, and just giving directions up top is very useful for people who are not familiar with your app. Also, uploading a toy data set is highly recommended. If you just directly ask people to upload their own data, they get confused, so I put in three or four different toy data sets from different sources that you can use.

I'll show you clinical data, since I work for Aetna. So let's look at this data set, which is fake: it is a data set of members with different diseases and different characteristics. You can pick the different variables that you want to include in your analysis. Up here, you can quickly explore the data and get summaries: what percentage of zeros exists within your data set, how many levels each factor has, what kind of variable each one is, and it'll give you missing data summaries as well. This is pretty powerful: you can quickly get summaries and download all of this in whichever format you want, so if you want a PDF, you can quickly download it as a PDF.

It'll show you correlations very quickly, so you can find which variables are highly correlated with each other. If you spend days in an inpatient facility, you will obviously have very high medical costs; that's why you see high correlations there. You can view whatever raw data you have uploaded. You might also want to remove variables which are very highly correlated; you can remove those variables right here, and going forward they'll be excluded from the rest of the analysis.

Then you can go into the clustering analysis, where you decide the number of clusters, a very common clustering 101 technique, using the elbow method. Then you can choose whatever method you want for your clustering algorithm and just quickly click to cluster. For example, if you use three clusters, it gives you this result. You can also change the number of clusters according to business needs: say the business says three is too little, we need something between four and six. If you've done this before, you know that market segmentation is not a pure data science exercise; it's a mix between what the business wants and what you can give them. So you can very quickly iterate on what you want.
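The elbow method mentioned above can be illustrated with a minimal, pure-Python 1-D k-means. This is a toy sketch, not the app's actual clustering code (the app prefers k-prototypes for mixed data); the data values are made up so the elbow is obvious:

```python
def kmeans_1d(xs, k, iters=20):
    """Minimal 1-D k-means (Lloyd's algorithm) with a deterministic,
    evenly spaced initialization. For illustration only."""
    xs_sorted = sorted(xs)
    if k == 1:
        centers = [sum(xs) / len(xs)]
    else:
        centers = [xs_sorted[round(i * (len(xs) - 1) / (k - 1))]
                   for i in range(k)]
    for _ in range(iters):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # recompute centers (keep the old center if a group empties)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    wss = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, wss

# three well-separated groups of fake "member" values
data = [1, 2, 3, 20, 21, 22, 40, 41, 42]
for k in (1, 2, 3, 4):
    _, wss = kmeans_1d(data, k)
    print(f"k={k}  within-cluster SS={wss:.1f}")
```

Plotting the within-cluster sum of squares against k, it drops steeply until k=3 and then flattens: that bend is the "elbow" the app shows you when you pick the number of clusters.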

So now you have your four clusters, you've done everything, and now you want to understand your clusters. This is where the decision tree, the CART algorithm, comes in. What we've done here is build a decision tree trying to predict the different clusters and how you get to them. The different clusters are shown in different colors: you have cluster number one, cluster number two, cluster number three, and cluster number four here. If you have an inpatient score of greater than 35 and age less than 59, then you will fall in cluster number four. So now you've got the rules as well: you can identify that there are four clusters in my data, and these are the rules that get me to the population which will be cluster number four.

This tends to be very powerful. Finally, when you're done with your analysis, it downloads an Excel file with all the results and a formatted PowerPoint. So not only have you completed your analysis, it gives you a formatted Aetna PowerPoint with Aetna custom fonts, charts, everything. Now you can just take this analysis and show it to your business unit, and they will know what you are talking about. You've completed a segmentation exercise in minutes, which usually takes our business units, working alone with their own code and iterating through the different steps, weeks to do.

We've used this for different projects within the org; some of the examples are here. The one project that I will talk about is a meditation app that we did a project with. This meditation app was sent out to a bunch of different people, and some of them got engaged, that is, they downloaded the app and signed up. We wanted to see what different segments exist within the people who downloaded this meditation app. Using the app, we found that there are four different segments, and using the insights part of the app, we found that you can use four variables to understand how these segments are made.

The first variable is gender: the first three of the segments are 100% female, the last one is male. Then it's risk. In health insurance, you define risk as how sick a member is, how risky they are in the next one year. Within the females, you can bifurcate on risk: one of the segments is high risk, two are low risk. Within the female low-risk segments, you can bifurcate on income: one of them is higher income and one of them is lower income. So now you understand that the people who are downloading this app fall into these four buckets, and in the next iteration of the app, you can send out emails which are more customized. You can target females who are older and lower income differently from males who are low risk and low income, and send out these creatives accordingly, and hopefully you'll get a higher engagement rate.

So this is how this app quickly led to us getting a more engaged population, more people meditating in general, and you do see impact.

Now, the app was a hit, and the way we can call it a hit is that we can measure it on three things. First, what users tell us: things like how many people liked it on our internal social media page, or what the customer satisfaction score was, and so on. Second, things the customers do not tell us, but we know: we put little trackers within the app which count the number of people who viewed the app, uploaded their data set, ran their analysis, and downloaded the end result. We can use that as a metric to say, hey, this app is actually making a lot of impact, because 81 people have downloaded their entire project results and used them. And then there are the unknown unknowns: people might have used this app and gotten inspiration for other projects, and quantifying that impact is a little harder.

Comorbidity analysis app

So the next app I want to show you is a comorbidity, or multimorbidity, analysis app. As I said, a comorbidity is two diseases: if you have two diseases, you have a comorbidity; if you have more than two diseases, it's multimorbidity. The figure on the left shows that if you have a heart condition, or if you have cancer, then your life expectancy tends to decrease with the number of additional health conditions you have. So if you only had heart disease, your life expectancy after you first got heart disease was, say, 20 years. Every additional disease, say diabetes along with it, starts decreasing your life expectancy, and on average your life expectancy decreases by two years for every additional condition you have.

Now, the analysis I'll show you uses something called market basket analysis, or association rules. People in retail must be familiar with this analysis. Essentially, imagine you're sitting at the cashier's register at Walmart and you see people make these different transactions. In the first transaction, they bought an apple, a beer, sugar (or something, I'm not sure), and meat. Given all the different transactions, you can calculate three essential metrics. The first is support: four out of the eight transactions contained an apple, so the support is 0.5. Then you can define confidence: the people who bought an apple and a beer, among the total number of people who bought an apple. So here you would find how many people who bought an apple also bought a beer; that is the confidence. Finally, lift is just how much more likely you are to buy a beer if you bought an apple. If you have a lift greater than one, that means there is a positive association.
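Those three metrics are easy to compute directly. A small Python sketch with made-up basket data (the item names and counts are illustrative); here support is reported for the item pair, confidence is P(beer | apple), and lift is that confidence over the baseline rate of beer:

```python
def rule_metrics(transactions, a, b):
    """Support, confidence, and lift for the association rule {a} -> {b}."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    n_b = sum(1 for t in transactions if b in t)
    support = n_ab / n             # P(a and b)
    confidence = n_ab / n_a        # P(b | a)
    lift = confidence / (n_b / n)  # P(b | a) / P(b)
    return support, confidence, lift

# toy basket data, loosely echoing the talk's example
baskets = [
    {"apple", "beer", "sugar", "meat"},
    {"apple", "beer"},
    {"apple", "milk"},
    {"apple", "beer", "diapers"},
    {"milk", "bread"},
    {"beer", "diapers"},
    {"bread", "meat"},
    {"milk", "sugar"},
]
s, c, l = rule_metrics(baskets, "apple", "beer")
print(f"support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

With this fabricated data, apple appears in 4 of 8 baskets (item support 0.5, as in the talk), and the apple-to-beer rule comes out with lift above 1: buying an apple makes buying a beer more likely than the baseline.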

You might have heard of stories where they've used association rules, like diapers now being kept with beer. It's because association rules showed that on, I think, Tuesday or Wednesday nights, people tend to go out to Walmart and buy beer with diapers: these are parents who are just trying to get diapers and think, okay, I need a beer too. And these rules are used either to put the things together, or to put them on opposite ends of the store. So if Walmart found out that you buy beer and diapers together, they'll put beer at the front of the store and diapers at the very back, so you walk through the whole store and buy more things. That's how they use association rules in retail.

Now, you can employ the same logic in healthcare as well. You can define support as the number of people who have disease one and disease two, over the whole population: if you have 100 patients, how many of them have hypertension and diabetes? Confidence is the number of people who have the two diseases divided by the number of people who have the disease of your interest: of the people who are diabetic, how many of them also have hypertension? And the lift is how much more likely you are to have hypertension if you already had diabetes: if your lift is more than one, you're more likely; if it is less than one, you're less likely.
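The same metrics apply to a binary disease matrix like the one the app's toy data uses. A sketch with fabricated rows (not real member data, and not the app's code); a lift above 1 means the two conditions co-occur more often than chance:

```python
# each row: one member's binary disease indicators, as in the app's toy data
members = [
    {"diabetes": 1, "hypertension": 1, "hyperlipidemia": 1},
    {"diabetes": 1, "hypertension": 1, "hyperlipidemia": 0},
    {"diabetes": 0, "hypertension": 1, "hyperlipidemia": 1},
    {"diabetes": 0, "hypertension": 0, "hyperlipidemia": 0},
    {"diabetes": 1, "hypertension": 1, "hyperlipidemia": 0},
    {"diabetes": 0, "hypertension": 1, "hyperlipidemia": 1},
]

def lift(rows, d1, d2):
    """Lift of d2 given d1: P(both) / (P(d1) * P(d2))."""
    n = len(rows)
    p_both = sum(r[d1] and r[d2] for r in rows) / n
    p_d1 = sum(r[d1] for r in rows) / n
    p_d2 = sum(r[d2] for r in rows) / n
    return p_both / (p_d1 * p_d2)

print(round(lift(members, "diabetes", "hypertension"), 2))  # -> 1.2
```

A value of 1.2 here says that, in this toy population, diabetics are 20% more likely than average to also have hypertension.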

So, again, in the app, we put in our generic exploratory data analysis part, where you look at missing data, correlations, things like that. But specifically, the use cases are that you can answer questions like: what are the most common multimorbidities in the population? What are the most expensive multimorbidities in the population? And so on. So let me quickly show you the app.

Alright, so here's the app. Again, it's generic: you can use this app for all of these different parts of comorbidity analysis. You can also find literature here on PubMed, and if you have any questions, you can send them to me. I uploaded a toy data set again; this toy data set has thousands of members, along with all the different diseases that they have. So if you have, say, HIV/AIDS, then your row will say one, otherwise it will be zero. It's just binary data for all the different members.

Then you can very quickly find things like: what are the most common comorbidities in my data? You just sort by support, and you see hypertension with hyperlipidemia. So high cholesterol together with high blood pressure tends to be the most common; in fact, we see this combination in 58% of members. That's a combination of two. What if you're interested in a combination of three or more conditions? Again, you can very quickly sort on this, and you see that diabetes with hyperlipidemia and hypertension is the most common combination of three diseases, and so forth.

Another question you might want to use it for is: who are our most expensive members? By most expensive, we mean the top quartile of all medical costs in the next one year. So if you are in the top 25% of cost in the next one year, what conditions would you have? Suppose we want to see conditions from two all the way up to five: very quickly you can find that if you have ischemic heart disease with hyperlipidemia and hypertension, you will be very costly for us in the coming one year.

So here you'd use confidence. The most costly conditions are heart failure and renal failure, in general. With renal failure, you have to go on dialysis, so it's more expensive; and if you've had heart failure, you're more likely to have certain heart procedures, which are very expensive. That's why they come out on top.

I do want to show you one cool thing. I do have some documentation, but the cool thing that I added here is a chatbot. People often have simple questions that you have to answer again and again, and a way around that is to use a chatbot function. I use the Dialogflow API; Dialogflow has been acquired by Google. It uses natural language processing: you can specify certain web pages that it can scrape for information, and then it looks at keywords and gives you replies from those web pages. You can also input your own answers. So you can actually chat with it.

So you can ask the simple questions that you had, for example: what's the maximum size of file that I can upload? What do I do if I get this error? Things like that. Making a chatbot is easier than having a document that people have to scroll through. And then, as I said, you can add admin tracking here: I can track who has been using it and for how long.

Propensity score matching app

Alright, so the propensity score matching app. As I said, propensity score matching is a causal inference technique. If you do not have a randomized control trial, and you only have a treatment group, and you want to estimate the average treatment effect, then you have to find people who can be a synthetic control. For every member in the treatment group, you have to find a matching member who is very similar and can be used as a control.

You can use this analysis to do different things; you can find baseline characteristics. I apologize if this sounds a little foreign to you, but this is for experimental analysis; it's very useful when you're trying to run quasi-experiments. The usual process of running these experiments is: first, you find baseline characteristics, that is, at baseline, how different is my treatment group from the control group? Then you match them: you find people who are similar to the treatment group as matching controls. And after you've found them, you try to quantify whether you have been able to remove all the differences that were there before matching.
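The matching step can be sketched as greedy 1:1 nearest-neighbour matching on propensity scores. This is an illustrative simplification, not the app's actual algorithm; the scores and IDs below are made up, and in practice the scores would come from a logistic regression of treatment on the baseline covariates:

```python
def greedy_match(treated, controls, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on pre-computed propensity
    scores, without replacement.

    `treated` / `controls` map member id -> propensity score.
    `caliper` is the maximum allowed score difference for a match.
    """
    available = dict(controls)
    matches = {}
    # match the hardest cases (highest propensity scores) first
    for tid, ps in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        cid = min(available, key=lambda c: abs(available[c] - ps))
        if abs(available[cid] - ps) <= caliper:  # respect the caliper
            matches[tid] = cid
            del available[cid]                   # without replacement
    return matches

treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.78, "c2": 0.52, "c3": 0.10, "c4": 0.33}
print(greedy_match(treated, controls))
```

After matching, you would compare the covariate balance between the matched groups, which is the "quantify the remaining differences" step the talk describes.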

Yep, and this is the app I created for propensity score matching. You can perform exploratory analysis, look at covariate distributions, perform the propensity score matching, iterate the analysis, and download formatted results. Again, you can upload your own data or use a toy data set, find missing data, and so on. And then you can finally do the propensity score matching, which runs the whole analysis, and download your final results. I also have a propensity score matching chatbot, very similar to the one I showed you before.

And in the last five minutes, I do want to talk about one other tool, which is EQUAL, which...

sorry to interrupt, but just because you were showing that chatbot right now, I know there were a few questions on that. And one of the questions was, can we have real time customer service through this tool? Or is it just based on NLP?

Sure. So the first part of the question: whether we can have a live customer rep behind this chatbot. I have not done it, just because I was trying to build this app and hoping to automate all of the questions that people have. But in Dialogflow from Google, you can add a third-party integration where, if a question the customer asks does not match any of the pre-listed things the chatbot can answer, it can ping you at a certain address and let you know: hey, transfer this person to a live rep. So you can do that in Dialogflow, and a lot of companies actually use that functionality. The second part was: can I share the code? For these apps, I cannot share the code, because it belongs to the company now. But I have a lot of other Shiny apps which use very similar methodology. If you go to my GitHub (Shatrunjai), you will find all my apps, the different Tableau dashboards that I built, and some of the Flask apps that I built as well. Feel free to message me for anything that you're looking for.

The EQUAL bias detection app

Alright, so EQUAL, which stands for eliminating quantitative bias in algorithms. This is an app that I built for an internal competition, and what it does is try to remove bias from predictive models. At CVS, it's now our number one priority to make sure none of our models have any sort of bias in them; all our models go through this check, and we make sure that all the models are fair and equitable. The first step when you come in is to answer three questions, which help us find out what metrics you should be looking at for fairness. If you've heard of fairness metrics, there are things like disparate impact, parity metrics, and so on, and answering these three simple questions helps you identify which metrics are important for you.

The app will pull all the data on the members you're interested in. There are 11 protected attributes here in the US: you cannot be discriminated against on age, gender, national origin, religion, handicap status, pregnancy, medical and behavioral health conditions, and so on. It'll pull all of this data and then check all the different models we can build; predictive models can be regression or classification. It will check all the different steps, pre-processing, during the model build, and post-processing, for bias, and then it will save your results at a specific place and give you an HTML or a Word document. Again, EQUAL is built as a Shiny app with JavaScript, so it requires zero coding and is language agnostic: it can work with both R and Python models that have been uploaded.
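One of the fairness metrics mentioned, the disparate-impact ratio, is straightforward to compute. A minimal sketch with hypothetical model decisions (this is not the EQUAL implementation); the 0.8 threshold is the common "four-fifths rule" of thumb:

```python
def disparate_impact(outcomes):
    """Disparate-impact ratio: selection rate of the least-favoured group
    divided by that of the most-favoured group.

    `outcomes` maps group name -> list of binary decisions (1 = favourable).
    """
    rates = {g: sum(d) / len(d) for g, d in outcomes.items()}
    return min(rates.values()) / max(rates.values())

# hypothetical model decisions for two groups (fabricated for illustration)
decisions = {
    "group_a": [1, 1, 1, 0, 1, 1, 0, 1],  # selection rate 0.75
    "group_b": [1, 0, 0, 1, 0, 0, 1, 0],  # selection rate 0.375
}
ratio = disparate_impact(decisions)
print(round(ratio, 2))  # -> 0.5, below the 0.8 rule of thumb: flag for review
```

A check of this shape would run at each stage (pre-processing, model build, post-processing), which is the kind of per-step bias audit the app performs.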

Alright, while this loads, maybe I can just switch to the last slide, which again is: thank you, and I'll take any questions you have. And again, if you are interested in working in data science with us, email me at my work email, singhs22@aetna.com. And if you have any other questions on code, any suggestions for the apps or dashboards you're trying to build, or if you want to collaborate on something, please email me at my personal email, which is just my first name.

Alright, so this is how EQUAL looks. You can come in and take, again, a very quick tour of the different parts of the app and what they do. You can explore your data, evaluate the bias in your data set, and find out whether your input data or your models are biased. And then you can also eliminate this bias: it uses techniques in pre-processing, post-processing, and the model build, things like adversarial debiasing, to remove bias from your data set. It follows some of the same steps that I've shown previously, where you can explore your data, customize your metrics, look at the different bias metrics, mitigate them, and download all your results as well.

Q&A

So I'll stop there and take some of the questions that we have.

Awesome, thank you so much, Jai, that was great, and so cool to see all the applications too. I know people are still putting in some questions in Slido, and if anything goes unanswered in the time we have today, I can definitely send those over to Jai and get the questions answered too. But the top question so far, which had been upvoted a few times, was: what kind of process do you use for updating the application? Do you have any sort of defined deployment process?

So the deployment process that we use is through RStudio Connect. We have a prod version, the production version, and a dev version. The process we use is: first we build an MVP, a very simple version of the app which just does the basic end-to-end analysis, and we upload it to the development server. We give it to a small number of people and perform test-and-learn: we ask them to use it the way they traditionally would and give us hints on where they got stuck and what we can improve. This usually takes one month. Then we take all the feedback from them, try to put the most critical elements into the beta version, and launch it to a slightly bigger population. If there are no big hiccups, we move it from the dev version to the prod version, where we launch it throughout the company. We recalibrate all our analysis every one and a half months early on, and after the first six months we use a six-month iterative calibration cycle. So every six months we revisit it: see how the data changed, how the analysis changed, whether packages have changed, and whether the app is still working as it was intended to work in the original version.

Another one of the questions that was upvoted is: how do you actually quickly set up that tour of the Shiny app? I think it was in the first Shiny app that you showed.

So that's through a package called shinyBS; you can use its tooltips and these introduction walkthroughs. There's another one called introJS. IntroJS is a JavaScript library, and a lot of these JavaScript libraries you will find wrapped as R packages, so you don't have to write JavaScript code; the package will add it to your UI. So you can Google things like "shinyBS" and "introJS R package" and you will find the packages you can use to set up the quick-tour functionality.

Another question, Jai, was: how do you position Shiny versus Tableau? Like, what would you use Tableau for, and when do you decide that it would have to be in Shiny?

So actually, that's a great question. Just a little background: I was a big Tableau user. I actually won the Tableau Massachusetts competition three years ago. But Tableau has limitations. The one limitation is that you cannot do advanced analytics in Tableau. You can do visualizations, you can slice and dice data, but the most advanced data science you can do in Tableau right now is regression: you can fit a simple line to your data set, and that is the limit of what you can do. But if you want to use some of the algorithms, for example the segmentation or the comorbidity association rules that I've used here, Tableau cannot do that. It can make your data look pretty, but it cannot do that analysis.

And to do that analysis, you will have to use either R or Python with Flask. Within R and Python, I feel like most of the research community, people like me who were in research, tend to publish all of our research as R packages. So you would usually see things coming out of research going into R first, and if they're very useful for industry, there will be a Python package that copies that R package. So with Shiny or Flask, you can use the latest and greatest advanced analytics along with visualization. The visualization might not look as neat as Tableau's, just because Tableau has been designed by professionals, but it will still be more advanced: you can do these more complicated analyses in R.


Another question, and I know we have just a few more minutes here, but an attendee asked: hey, Jai, have you encountered scenarios where you want to run clustering on big data sets, upwards of 10 million rows, from within Shiny?

So the one thing I would say is that most of the Shiny apps that I've shown you tend to run on smaller data sets. For bigger data sets, with the limited knowledge and use that I have had, scalability tends to be an issue later on. Now, recently, there have been new packages that you can use to
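The recording cuts off before the speaker names the packages, so the following is only one plausible approach, not his recommendation: on the Python side, scikit-learn's MiniBatchKMeans processes data in fixed-size batches, which keeps memory bounded even for tens of millions of rows. The data below is randomly generated purely as a stand-in.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Stand-in for a much larger table; mini-batch k-means fits on small
# batches at a time instead of holding one full k-means pass in memory.
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 4))

km = MiniBatchKMeans(n_clusters=5, batch_size=10_000, n_init=3, random_state=0)
labels = km.fit_predict(X)
print(labels.shape)  # one cluster id per row
```

In a Shiny or Flask app, one would typically run a job like this offline or in a background process and have the app read the precomputed cluster assignments, rather than clustering 10 million rows inside a request handler.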