
End-to-End Data Science Workflow with the Posit Team Snowflake Native App
Posit Product Manager, Chetan Thapar, demonstrates how the Posit Team Native App for Snowflake delivers an end-to-end workflow—exploration, iteration, and deployment—in minutes, not weeks. Built directly inside the Snowflake security perimeter, the app gives data teams instant access to governed data, managed infrastructure, and familiar tools like Posit Workbench, Connect, and Shiny for Python. Watch how AI tools like DataBot and Positron Assistant accelerate EDA, streamline coding, and help developers build an interactive, LLM-powered dashboard with ease. With one-click deployment and automatic Snowflake governance applied to every user, this demo shows what modern data science looks like when speed, security, and productivity work together. Learn more about the Posit partnership with Snowflake: https://posit.co/use-cases/snowflake/ Get the Posit Team Native App: https://pos.it/Team-Native-App-Snowflake
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
The goal of data science is to turn data into value fast. But in traditional enterprise data science, there is friction between the agility data scientists need and the complexity of the enterprise, which leads to a start-stop-wait pattern that kills momentum. This could be waiting for procurement, security reviews, infrastructure, or deployment, and perhaps always being stuck on outdated tools.
The Posit Team Native App is designed to solve this. Firstly, it provides instant setup, so you get running in minutes, not weeks. Second, it removes the ops headaches. It's a fully managed service, so data scientists get the latest and greatest features with automatic upgrades, and platform teams don't have to worry about managing complex infrastructure or software. It is secure by default by running inside the Snowflake security perimeter, and it inherits and extends Snowflake's data governance. Finally, it delivers improved productivity with AI tools and a seamless development-to-deployment workflow.
Demo overview
So let's see Posit Team in action. We'll do a demo today, and our job to be done is to deliver an interactive LLM-powered dashboard for our business stakeholders. We'll follow five steps, accessing the platform and data, exploring the data, iterating on our data product, deploying the artifact, and interacting with it as a business user. Let's get started.
Accessing the platform and data
The first step for me is to access the platform. Fortunately, with the Posit Team Native App, my start is almost instant. I can install the application from the Snowflake Marketplace, and once it's installed, I can activate Workbench, which is my development environment, and Connect, which is my deployment environment, all within the same Posit Team Native App.
So let's start with the development side and log into Workbench. I'm going to start a new session, and I can choose any of these managed IDEs. For this demo, I'm going to use Positron, which is our next-gen polyglot IDE optimized for data science, where both Python and R are first-class citizens. One thing to note here is my session credentials: these are all of the Snowflake roles I have access to, and here I'm logging in with the SolEng role into my Positron IDE.
One thing to note as this is coming up is that Workbench is automatically and securely inheriting my credentials, my OAuth token, and my role. We can see this if I go into the terminal and look at my environment variables: you can see the SolEng role as well as SNOWFLAKE_HOME, which is where my OAuth token is stored. What this means is that when I run code such as this, connecting to the Lending Club database, it should just work. I don't have to expose any of my credentials.
We can see here that the connection to Snowflake has been created, and the Lending Club database is an Ibis expression. Again, notice what I did not do here: I didn't write any boilerplate connection code, manage API keys, or expose any of my secrets. Workbench managed all of that, and I was able to connect with just one line of code.
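The pattern described above can be sketched in plain Python. This is a hypothetical illustration, not Workbench's actual mechanism: the environment variable names and the token file location below are assumptions, and the point is simply that connection code reads everything from the session environment rather than embedding secrets.

```python
import os

# Illustrative sketch (assumed variable names, not Workbench's real ones):
# the role and OAuth token location come from the session environment,
# so connection code never hard-codes a secret.
def snowflake_connection_params(database: str) -> dict:
    return {
        "role": os.environ.get("SNOWFLAKE_ROLE", "SOLENG"),
        "authenticator": "oauth",
        # the token is read from a file under $SNOWFLAKE_HOME, not from code
        "token_path": os.path.join(
            os.environ.get("SNOWFLAKE_HOME", os.path.expanduser("~/.snowflake")),
            "token",
        ),
        "database": database,
    }

params = snowflake_connection_params("LENDING_CLUB")
```

With this shape, the one line the data scientist writes only names the database; everything sensitive is injected by the platform.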
Exploratory data analysis with DataBot
So now as I'm connected to this data, I would love to understand this more. This is where data scientists spend the majority of their time. And to accelerate this exploratory data analysis, we have introduced DataBot. DataBot is our AI agent that's designed to accelerate exploratory data analysis. So if you give DataBot a high-level instruction, it will write and execute code, analyze the output, and suggest next steps, keeping you in the loop the entire time.
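The instruction-to-suggestion loop described above can be sketched as a small function. This is a hedged illustration of the general plan-execute-observe pattern, not DataBot's actual implementation; every name below is made up for the sketch.

```python
# Illustrative sketch of an EDA agent loop (assumed structure, not
# DataBot's real code): generate code for an instruction, run it,
# analyze the output, and propose next steps for human review.
def eda_step(instruction, generate_code, execute, summarize):
    code = generate_code(instruction)        # write code for the instruction
    output = execute(code)                   # run it in the live session
    summary, next_steps = summarize(output)  # analyze results, suggest next moves
    return code, summary, next_steps         # the human stays in the loop

# Stubbed usage: a real tool plugs an LLM and a live session in here.
code, summary, next_steps = eda_step(
    "explore the lending club data",
    generate_code=lambda instr: "lending_club.count()",
    execute=lambda c: "2,250,000 rows",
    summarize=lambda out: (f"Found {out}", ["profile key columns"]),
)
```

The key design point is the last return value: the loop always hands control back to the data scientist before the next iteration.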
So I'm going to start by asking it to explore the Lending Club database that we just connected to. Right off the bat, you can see that it understood that we are connected to Snowflake and that the Lending Club data is an Ibis expression. The reason it can do this is that DataBot works on top of Positron, so it has context on my session variables, the plots I generate, and the connections I have. Essentially, that means DataBot's output is contextually aware, and the EDA workflow is tailored to your environment.
Great, so DataBot did some initial analysis. It found there are about two and a quarter million loans in this database and provided some key characteristics. After that, it gives us a few hypotheses that might be worth exploring further. I can choose any one of these, or I can pick something I want to explore myself. Here I'm interested in the relationship between employment length and loan default rates, so let's provide that query to DataBot, and it's off to the races.
So as DataBot is doing this analysis, let's jump into our Snowflake account, go into our query history, and see what's happening. You can see these are the queries DataBot is generating on my behalf to understand the data. These are not simple SELECT * queries that download all of the data into your session memory; they're involved SQL queries, so the computation is pushed to the scalable Snowflake warehouses rather than done in memory. And we should be able to see this here: if you look at the data table, only about 50 to 60 rows have been imported from a database of about two and a half million rows.
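The pushdown idea above can be demonstrated with a toy stand-in for the warehouse. This sketch uses SQLite from the standard library purely as an illustration (the schema and values are made up): the aggregation runs inside the database engine, so only one row per group travels back to the session.

```python
import sqlite3

# Toy stand-in for a warehouse: a few fake loans with a defaulted flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (state TEXT, defaulted INTEGER)")
conn.executemany("INSERT INTO loans VALUES (?, ?)", [
    ("CA", 0), ("CA", 1), ("NY", 0), ("NY", 0), ("WA", 1),
])

# The GROUP BY runs inside the engine; the client receives only the
# per-state aggregates, not every underlying row.
rows = conn.execute(
    "SELECT state, AVG(defaulted) AS default_rate "
    "FROM loans GROUP BY state ORDER BY state"
).fetchall()
```

Against a real warehouse, the same principle means a two-and-a-half-million-row table comes back as a few dozen aggregate rows.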
So we see here that DataBot found some insights: some missing employment data and some interesting patterns. And this is interesting. It seems there is an inverse U-shaped curve in the relationship between employment length and default rate, and it provides some business implications. We can go ahead and dive deeper into any one of these.
But another thing I could do here is re-branch my analysis. This is a hypothesis I'm not so interested in exploring further, so I just go back and ask it to, for instance, analyze geographical patterns in loan performance. It starts this analysis as a different branch, and I can switch between branches and follow the one that's most relevant. It's almost like a git branch for analysis: see which one is most promising and pursue it. So let's go back to the geographical pattern analysis.
So DataBot has done some analysis of the regional patterns, found some economic correlations, and surfaced key insights around clustering and so on. Of the two branches of analysis, I feel the geographical pattern analysis is the more promising one. At this stage, I want to make sure it's reproducible, whether I share it with a peer or just keep it for posterity.
That's where I can use something like the /report command, which creates a reproducible Quarto file containing all of my code. Before it does that, it provides a report outline, so I can make sure the plan makes sense before the actual output is generated. I'm going to say yes, this outline looks fine, but you can certainly edit it if there are parts of the analysis you want to highlight. You can see the Quarto document has started: there is a callout that it was created by AI, and there is a place for human review along with the name, the role, and so on. This will now create a clean file that is fully reproducible.
Once this is done, it's available in our Explorer; you can see the geographical analysis .qmd file. Again, this is what DataBot is focused on: accelerating the data scientist's productivity with a code-first ethos, while ensuring the data scientist always stays in the loop and has reproducible artifacts. We're really excited about this innovation.
Building the Shiny for Python app
Now that my EDA process is complete, I can go to the next step. I've already used my insights to build this Shiny for Python app, which queries the Lending Club database. Before I go further in the workflow, I want to show a couple of important open source packages. The first is querychat, which lets me interact with my data using natural language and provides the UI scaffolding for that. The second is chatlas, which abstracts away the complexity of connecting to LLMs. Here you can see I'm connecting to Snowflake Cortex and using the Claude Sonnet model.
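The "abstracts away the LLM connection" idea can be sketched with a minimal class. This is an illustration of the pattern only; the class name, method name, and model string below are assumptions for the sketch, not chatlas's real API.

```python
# Minimal sketch of a provider-abstraction layer (assumed names, not
# chatlas's actual interface): swapping the backend changes only the
# constructor, not the calling code.
class CortexChat:
    def __init__(self, model: str):
        self.model = model

    def ask(self, prompt: str) -> str:
        # A real implementation would call Snowflake Cortex here;
        # this stub just echoes so the sketch is runnable.
        return f"[{self.model}] would answer: {prompt}"

chat = CortexChat(model="claude-sonnet")
reply = chat.ask("summarize West Coast loan performance")
```

The app code only ever sees `chat.ask(...)`, so the choice of LLM backend stays a one-line configuration detail.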
All right, let's see how this Shiny for Python app is doing. We'll run it within Positron. This is what querychat allows me to do: ask natural language questions of this dashboard. I can ask something like "show states only on the West Coast." At the back end, it leverages Cortex to do the text-to-SQL and then updates my dashboard, which pulls data from the Lending Club database automatically. I see the dashboard has been updated, and if I go into the state analysis, I see just those three states associated with the West Coast. So this is good.
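The text-to-SQL step can be illustrated with a toy mapping. In the real flow, the question goes to the LLM, which emits SQL against the live schema; the hard-coded rule and the `addr_state` column name below are assumptions made purely for this sketch.

```python
# Toy illustration of text-to-SQL (not how querychat/Cortex actually
# works): map one demo question to a filter query. The column name
# addr_state is an assumed part of the Lending Club schema.
def text_to_sql(question: str) -> str:
    if "West Coast" in question:
        return "SELECT * FROM lending_club WHERE addr_state IN ('CA', 'OR', 'WA')"
    return "SELECT * FROM lending_club"

sql = text_to_sql("show states only on the West Coast")
```

The generated SQL then drives the dashboard's data pull, which is why only the three West Coast states remain after the question is asked.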
Now, let's say a stakeholder asks me to add a simple table of default rate by state to the dashboard. We don't have that in the state analysis, so I can either write that code in my code base directly or use Positron Assistant. Positron Assistant is the LLM tool within the Positron IDE that helps with code generation, explanation, and debugging. I'll ask it to calculate the default rate by state and add it to the dashboard. Let's see how Assistant does.
Okay, Assistant has made some changes to our code. We can see it added some aggregation logic and indicated that this is also added to the UI. Let's see how our state analysis looks now. There is a default rate by state, and a visual has been added to show it. There are some details I could still refine, but I'm happy with it at this stage, so I'll keep the changes Assistant recommended. I think we're ready to deploy this application into production.
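The kind of aggregation logic described above can be sketched in a few lines. The column names (`addr_state`, `loan_status`) and the "Charged Off" status value are assumptions about the Lending Club schema, and in the real app this calculation would be pushed to Snowflake as SQL rather than run in Python.

```python
from collections import defaultdict

# Sketch of a default-rate-by-state aggregation (assumed column names
# and status values, not the app's actual generated code).
def default_rate_by_state(rows):
    totals, defaults = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["addr_state"]] += 1
        if row["loan_status"] == "Charged Off":
            defaults[row["addr_state"]] += 1
    return {state: defaults[state] / totals[state] for state in totals}

rates = default_rate_by_state([
    {"addr_state": "CA", "loan_status": "Charged Off"},
    {"addr_state": "CA", "loan_status": "Fully Paid"},
    {"addr_state": "NY", "loan_status": "Fully Paid"},
])
```

A per-state dictionary like this is exactly the shape a dashboard table or bar chart of default rates would consume.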
One-click deployment
The actual process of deploying an application is dead simple in the Posit Team Native App. Because Workbench and Connect are in the same Native App, the networking and authentication challenges often found in connecting separate development and deployment environments are eliminated. All I need to do is go to my publisher and click Deploy. Connect handles managing all of my dependencies and ensuring my code runs successfully as a deployed artifact, and within a few seconds, we have already published to Connect.
So let me go back to Posit Team and open Posit Connect. Here we see the Shiny for Python app we were building, which I published just a few seconds ago.
Interacting as a business user and data governance
Now let's see how our users interact with this app. On the left, I'm logged in as a business manager. I'll ask our app: what is the average employment length of loans originating in New York? The Cortex-powered app understands the question, builds a SQL query, queries the data, and returns the answer: about six years, based on 175,000 loans. Perfect.
On the right, a junior analyst with a different Snowflake role asks the exact same question, and they get a different answer. The LLM tells them the data is masked. Why? Because their underlying Snowflake role has a data masking policy applied. So we wrote zero lines of security code in our app, and the Posit Team Native App inherited and extended Snowflake's data governance.
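The effect of a masking policy can be illustrated with a toy function. In Snowflake, the warehouse enforces the policy on the data itself, so no application code like this exists; the role names and mask string below are made up to show the behavior.

```python
# Toy illustration of role-based masking behavior (illustrative only;
# Snowflake enforces this in the warehouse, not in app code).
def apply_masking(value, role, allowed_roles=("BUSINESS_MANAGER",)):
    return value if role in allowed_roles else "***MASKED***"

manager_view = apply_masking(6.1, "BUSINESS_MANAGER")   # sees the real value
analyst_view = apply_masking(6.1, "JUNIOR_ANALYST")     # sees a masked value
```

Because the policy lives with the data, the same deployed app serves both users correctly with zero security code.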
Recap
To recap today: we got instant, secure access to Snowflake data with managed credentials on Posit Workbench. We went from raw data to an EDA report in minutes with DataBot while pushing the analysis to Snowflake. We used Positron Assistant to iterate instantly on our production data product. We saw a frictionless one-click deployment flow to Connect. And finally, we delivered a fully governed, secure data product that inherits Snowflake's data governance. This is the end-to-end data science workflow: no start-stop-wait pattern, just flow, all inside the Posit Team Native App on Snowflake.

