Resources

The Power of Snowflake and Posit Workbench (Jonathan Regenstein, Snowflake) | posit::conf(2025)

The Power of Snowflake and Posit Workbench: Macroeconomic Data Exploration in the Cloud

Speaker(s): Jonathan Regenstein

Abstract: In this talk, we will utilize the Posit Workbench Native App to demonstrate how macroeconomic research can be run in the Snowflake cloud, powered by R & RStudio. Starting with data sourced from the Snowflake marketplace, we will import, transform, visualize, and, finally, model data using the orbital framework to push tidymodels down to the cloud. This is full-stack, R-driven macroeconomic research in the cloud.

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

This talk is really themed around data science and macroeconomic data, but it's really here to talk about the partnership between Snowflake and Posit, the power of the new native integration between Posit and Snowflake. So hopefully we're going to solve a lot of those problems that Mark was just talking about.

We're going to look at a quick demo on using external data, internal data, some modeling in Positron natively in Snowflake, but this is really a very general example of what we want to do.

Commercial framework for data science in financial services

Before I kind of kick off into what's going on with Snowflake and Posit, I start every talk I give, whether it's a conference talk or a whiteboarding session, with the commercial framework around data science that I'm seeing today across the financial services industry. I get to talk to a lot of different asset managers, banks, insurance companies, payments companies, kind of the whole spectrum of financial services, and this is how we start every single talk.

It's very relevant to the partnership between Snowflake and Posit, and I'll explain why. We always start on the right-hand side there, the commercial outcome that we're driving towards. I think Mark articulated it really well, like why are we doing this? Why are we taking on this big data science project? Why are we doing machine learning? Do we need artificial intelligence? Then we kind of work backwards to the data state that we need, right? Is it all structured data? Is it unstructured data? Is it both? And then what are kind of the tools that we want to bring to bear to kind of accomplish that?

What Snowflake is and why it matters

So I'm going to talk about the partnership, of course, between Posit and Snowflake, but even though Mark just mentioned Snowflake quite a few times, which I'm appreciative of, I just wanted to mention what Snowflake is, why it's so powerful. I'll talk about why it's become so prominent in the world of financial services, but this is kind of a complicated snapshot of what Snowflake is.

You can really think of it as a cloud platform with really meaningful separation of storage and compute. So a lot of other platforms now kind of say they have separation of storage and compute. If you run your jobs in Snowflake, though, you'll see that there's really no competition between resources for those two things. So that's why we're the most kind of performant data cloud that's out there right now. And then we've added on to that foundation the ability to ingest all sorts of different types of data, structured data, unstructured data.

We have a lot of tooling to kind of make visualizations better, make data engineering better. But fundamentally what Snowflake is, is it's that platform, that separation of storage and compute. And that becomes really, really important when you want kind of a single source of truth, right? You want your data all living in one place, you want your compute all living in one place, and then with Posit now you can have all your machine learning tools living in that same place. So one kind of governed architecture for this entire workflow.

Data gravity and the Snowflake marketplace

So I want to talk a little bit about why the partnership between Snowflake and Posit is super, super powerful. This is a big reason right here. So Snowflake, like what do we invest in as a company? We invest in the ability to bring more data into Snowflake. So this is all the kind of different data sources you can stream natively into Snowflake right now. A lot of this goes through something we call Snowflake OpenFlow, but this is where we're spending a lot of our native investment dollars right now.

Not pictured here is a recent acquisition we made of something called Crunchy Data, a Postgres database. So we're really, really focused on bringing data gravity into Snowflake. So this, you can kind of think of this if you're working at like a big bank, a big insurance company, this is how your internal data flows into Snowflake. So this can be like interactions with customers, internal research reports that you've written. If it's an insurance company, you know, insurance companies have people sending them all these images of like car accidents and house damage. All this data that you have to like ingest into your own environment.

So Snowflake's doing a ton of investment to make it easy for that data to flow from whatever its source is coming from into Snowflake in like a governed, secure way. On the other side of that, and this is really where things have really exploded for Snowflake, is we've built a data gravity just around the financial services ecosystem. And a lot of this is driven by our marketplace. And that will become very relevant to what we're doing with Posit as well.

These are all external sources of data. So these are sources of data that insurance companies, asset managers, banks, they don't produce these themselves, they buy these from other sources, and they consume them via our marketplace. So we kind of have data flowing in from internal data sources through OpenFlow, and then all this external data that's arriving from providers like Bloomberg and FactSet and Refinitiv, all the kind of big market providers. There's all these little niche providers across like the insurance ecosystem, the banking ecosystem as well. Long story short, this is how the kind of center of data gravity is all moving into Snowflake right now.

Posit Workbench as a native Snowflake app

Also on that marketplace though is the ability to spin up applications. So it's not just a data marketplace, that's how it started many, many years ago. It's developed into much more than that though. So now it's really an applications marketplace as well. And one of those new applications that you can access on our marketplace is Posit Workbench. And I'm going to show you all how it works in real time through my Snowsight login in just a second.

But you can kind of think of this as like we have an app store that used to be very, very data centric. So a lot of data providers moved on to our app store, you know, LSEG and Bloomberg and S&P, they were providing data. Then we expanded this to include just data applications. One of those applications is now Posit, so you can log into Snowflake, you can go to our marketplace and say I want to pull down data from Bloomberg, data from FactSet, and I want to spin up Posit Workbench as well, and now I can be working with that data using Posit Workbench all in one governed environment.

Security, governance, and how it works under the hood

One big question I get is, you know, why is this important? Why can't I just do this on my desktop if I can just get a really, really beefy desktop? Isn't that adequate for what I want to be doing? I guess if it's beefy enough, yes, if you have enough RAM and enough compute, you could. But that kind of second and third line there, robust security and governance: from the perspective of a large financial institution, which is really the lens that I'm coming from, the security and governance can be the most important thing that you're doing.

So, I mean, I talked about commercial outcomes and how that's super, super important to all the work that our data scientists do. So like when a project is getting spun up, the first call is to the line of business, why are we doing this? What's the commercial value? Second call is always to InfoSec. Do we have access to this data? Can I get this data? Can I work with this data? There's a lot of data vendors out there who have very specific clauses that say you cannot do X, Y, and Z to this data, right? All that governance and security is handled in Snowflake.

So when you kind of spin up the Posit native app, it inherits all of that governance and all of that security, right, because you can have the smartest data scientists in the world who have tons and tons of skills, they need access to the data in order to do their work. So security and governance, super, super important, and the cool thing is that you inherit all of this from your Snowflake policies, right, so that's really, really important.

How is all of this actually working under the hood? I'm going to show you guys how it kind of looks to the end user, but I do get this question quite frequently. So you can see it says Posit and Snowflake full SPCS. SPCS is really our container services, Snowpark Container Services. So you can kind of think of this as Posit Workbench, and in private preview, so coming very soon, also Posit Connect, just spinning up in a container inside Snowflake, right?

So that's what allows us to kind of push down all of the compute that you're doing with that Docker container, right? So you've spun up a container inside Snowflake. It's sitting next to your data, so you get a lot of efficiencies with how you're passing data back and forth. I'll show you guys another cool efficiency that you get with some of the modeling you can do, but you get that efficiency. That's also why your security perimeter is never breached, it's never broken. If you have a security and governance framework set up, you won't get dinged by InfoSec, you won't get shut down because you're inheriting everything from your Snowflake infrastructure.

So you've built your data gravity, you've kind of got your data scientists ready to go, now you're giving them access to a tool that sits right next to their data. So it's going to be efficient, it's going to be secure, it's going to be well governed. You need all of that, especially if you're at like a big bank, if you want to get any value from your data.

Demo: RStudio and Positron inside Snowflake

Another question I get is what about my packages, what about my R packages? You can also use Posit Package Manager, and I believe we're already there, if not coming very soon. It's going to look and feel very similar to how you're doing things on your desktop if you're used to that. It's just when you spin this up, it's going to be running in the Snowflake cloud. So I'll show you guys what I mean by that right now.
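As a concrete sketch of what that can look like, the public hosted instance of Posit Package Manager can be set as your CRAN repository from within R; an internal Workbench deployment would typically point at a company-hosted URL instead:

```r
# Point R at Posit Package Manager; this is the public hosted
# instance, an internal deployment would use its own URL
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/latest"))

# Installs now resolve against Package Manager just as they would
# against CRAN
install.packages("dplyr")
```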

So I'm going to take you ahead here to the demo portion of this. So this is kind of what my research account looks like at Snowflake. So this is my data that I have access to, you see I have a database here called Posit Workbench. I have a database called JR_DB. If I click on this little catalog button here, it has my ability to explore my database, the ability to explore the internal marketplace, which I'll mention at the end of this talk, and the ability to explore my apps. So Posit Workbench is one of the apps that I have in here.

So once I kind of pull this app off the marketplace, it's now living natively inside of my Snowflake account. So I've already done all the login, so you guys don't have to sit here and watch me kind of type in my passwords. I'm going to stick to R for what I'm going to show you all here, but you could just as easily spin up JupyterLab through Posit Workbench. Anything you can do on Posit Workbench, you can kind of do natively in Snowflake now.

This is what RStudio Server Pro looks like inside of Snowflake. Just to kind of highlight what's going on here: I'm not going to do any work in this, I just want to show you all what it looks like. You can see that this looks and feels kind of like your desktop instantiation of this or what you might be used to spinning up on an EC2 instance, but instead it's running on Snowflake compute. So any work that we run here, any R code that we run, it's just going to get pushed down into Snowflake and it's going to kind of have easy access to your data back and forth.
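As a rough sketch of that pushdown pattern in code (the DSN, warehouse, and table names here are illustrative assumptions, not the actual configuration from the demo), a session inside Workbench can reach Snowflake through DBI, with dplyr verbs translated to SQL and executed in the warehouse:

```r
library(DBI)
library(dplyr)

# Connect via an ODBC DSN; "snowflake", the warehouse, and the
# table name below are illustrative placeholders
con <- dbConnect(
  odbc::odbc(),
  dsn       = "snowflake",
  warehouse = "ANALYTICS_WH",
  database  = "JR_DB"
)

# tbl() creates a lazy reference; nothing is pulled into R yet
loans <- tbl(con, "LOAN_DATA")

# This count() is translated to SQL and runs in Snowflake;
# collect() brings back only the small summary result
loans |>
  count(LOAN_STATUS) |>
  collect()
```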

So this is the RStudio version of this, but I wanted to show you guys what the Positron version looks like, because I found some really cool things around Positron. So first of all, Positron is the new IDE, I think there's been a lot of talks about it here, so I won't belabor what's going on here. This is what Positron looks like inside of Snowflake. So same thing, I'm kind of doing everything inside of my Snowflake computing environment.

Since Positron is built off of VS Code, you also get to take advantage of all the VS Code tooling that we've built at Snowflake. So it's kind of like this little bonus. This thing on the side that I'm toggling over here, this is our Snowflake VS Code extension. So you kind of get that for free with Positron, because we designed it to work with VS Code. That lets us kind of peek in here to what's going on inside my Snowflake account. This is some of the data that I've kind of pulled up here, this loan data that's going to be the example here, I can make that a little bigger for you. So you can kind of peek into your Snowflake, again, you can take advantage of anything we build for VS Code, you get access to that now inside of Positron.

Macroeconomic data exploration and modeling with tidymodels

And from here, I'm not going to go too much through this, this is just an example of kind of working with your data, all running in R. I'm importing some macroeconomic data that I pulled off of our marketplace, so that marketplace where all that financial data comes in. We have tons of free data on there, especially free macroeconomic data, so if you want to replicate this, you can kind of pull down that free macroeconomic data.

Once I've pulled it into Positron, frankly, hopefully this is going to look really familiar to you all. I can just work as I normally would in R. So I'm going to do some really light wrangling on this. I get to take advantage, of course, of all of R's visualization, so I get to use ggplot for all my plots. Almost everyone I know still uses ggplot for all their visualizations, so we've always wanted to be able to run this natively in Snowflake.
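A minimal sketch of that import-then-plot pattern, assuming an existing DBI connection `con` to Snowflake; the table and column names are illustrative, not the actual marketplace schema from the demo:

```r
library(dplyr)
library(ggplot2)

# The filter runs in Snowflake; only the reduced series is
# collected into the R session (table/column names are assumptions)
macro <- tbl(con, "MACRO_INDICATORS") |>
  filter(VARIABLE == "UNRATE") |>
  select(DATE, VALUE) |>
  arrange(DATE) |>
  collect()

# From here it's ordinary ggplot2, exactly as on the desktop
ggplot(macro, aes(x = DATE, y = VALUE)) +
  geom_line() +
  labs(title = "Unemployment rate", x = NULL, y = "Percent")
```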

So that's kind of some light wrangling. From here, we can add some scale, some lag, some rolling features, and then what I'm going to do is I'm going to join this up with other data that I have in Snowflake, this loan data here. So this is from a Kaggle competition on, I guess, some credit data, you could say. I'm going to kind of marry that up with some of that macro data, do some light wrangling to it, do some visualizations, but what's really, really neat here, and there's a whole blog post on this that came out not long ago, is now I'm going to kind of pass that data that I've joined up to recipes via tidymodels.
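A sketch of those scale, lag, and rolling steps plus the join; every table and column name here is an assumption for illustration (a `macro` data frame of monthly values and a `loans` data frame with an origination date):

```r
library(dplyr)
library(slider)     # rolling-window helpers
library(lubridate)  # date utilities

# Scaled, lagged, and rolling features on the macro series
macro_feats <- macro |>
  arrange(DATE) |>
  mutate(
    value_scaled = as.numeric(scale(VALUE)),
    value_lag3   = lag(VALUE, 3),
    value_roll6  = slide_dbl(VALUE, mean, .before = 5, .complete = TRUE)
  )

# Marry the macro features up with the loan-level data by month
model_data <- loans |>
  mutate(month = floor_date(ORIGINATION_DATE, "month")) |>
  left_join(macro_feats, by = c("month" = "DATE"))
```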

So I'm going to use recipes for some feature engineering, and then we're going to fit a model to this, a linear regression model. So nothing too fancy there. The really, really cool thing is that once we do all of this, we can use an R package called orbital to turn this into a Snowflake object, and that's going to let us take that model that we just fit and basically store it up in Snowflake and then take advantage of the Snowflake compute engine if we want to do inference or prediction on that model. So super, super powerful.
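A condensed sketch of that pipeline; the outcome column and table names are illustrative, and the orbital calls follow that package's documented pattern of distilling a fitted tidymodels workflow into SQL-translatable expressions:

```r
library(tidymodels)
library(orbital)

# Feature engineering with recipes, then a plain linear regression
rec <- recipe(default_flag ~ ., data = model_data) |>
  step_normalize(all_numeric_predictors())

wf_fit <- workflow() |>
  add_recipe(rec) |>
  add_model(linear_reg()) |>
  fit(data = model_data)

# orbital() converts the fitted recipe + model into mutate()-style
# expressions that dbplyr can translate to SQL
orb <- orbital(wf_fit)

# Prediction then runs inside Snowflake against a lazy table
# reference, rather than pulling rows back into the R session
predict(orb, tbl(con, "NEW_LOANS"))
```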

Again, you're doing all your work in R, everything in RStudio or Positron as you're used to. You can use the entire tidymodels framework, including recipes, and you can kind of push it back to Snowflake for production work.

Getting R work into production

So I'll kind of leave you guys with this, and I'll put this into a GitHub repo, and you can kind of take a look at how this all works. This is very much for the data science persona who's doing their work, doing their analytical work. What I want to talk about, though, is there's a very important next step after this, and it's something that, I mean, I've been working in the R world for a really long time. We've heard for a long time, R in production, does it work, does it not work, should we just refactor everything into Python?

What this partnership does is it makes it a very smooth glide path for getting your work in R into, quote-unquote, production. I think production means a lot of different things. I bucket being something in production as is it affecting the decision-making of your company in some way? So is it driving outcomes, is it driving decisions, right? There's a few different ways that this work can, from that definition, get into production of what you're doing.

So one we looked at earlier, which is Snowflake has this concept of the internal marketplace. So I talked to you all about the external marketplace where data providers can come share data, places like Bloomberg and S&P Global can share data across what they're doing. We have that exact same functionality inside Snowflake, but internally. So if you want to share any data products that you've built internally with other teams, you can do that via our marketplace. This is massively popular and important inside of asset management right now, because there are so many data teams doing so much work, and they literally ship each other CSV files and Excel files, and it's an absolute nightmare. If you use the internal marketplace on Snowflake, all the work that you've done in Posit, you can share back to other teams.

Second way this enables your work getting into production is we have in Snowflake a native feature store. So that's what I'm kind of highlighting here. So if you've just done feature engineering, you can take the orbital package, which translates what you've done into SQL code, and take advantage of our feature store. So that's like big thing number two, because as I'm sure you all know, feature engineering is really the most important step in machine learning. So via Snowflake, all that work you've done in R, all the cool work you've done with recipes and any wrangling you may have done, if you want to turn that into production-level features, you can use our native feature store.
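One hedged sketch of how that hand-off can look, assuming a fitted orbital object `orb` and a connection `con` like those in the demo; `compute()` asks dbplyr to run the generated SQL in Snowflake and persist the result as a table, which could then back a feature store entity (the feature store registration itself happens on the Snowflake side and isn't shown):

```r
library(dplyr)

# Run the orbital-translated feature/model SQL in Snowflake and
# materialize the result as a table other teams can consume
# (table names are illustrative)
predict(orb, tbl(con, "LOANS")) |>
  compute(name = "LOAN_FEATURES", temporary = FALSE)
```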

The third way, in private preview right now but coming very, very soon, is an integration with Posit Connect. So if you want to use all that work you've done to drive dashboards that are going to be kind of seamlessly updated as data gets updated inside Snowflake, that's all going to be possible very soon. If you can get access to the private preview feature, it's already possible. But that'll obviously be GA very, very soon.

So that's my talk. That's about the partnership between Posit and Snowflake. It's super exciting. To me, Posit's the best data science tooling in the world, Snowflake's the best data warehouse in the world. You can put them both together now, get endless compute with all of your data. So thank you. I think I might have time for one or two questions.

Q&A

Thank you, Jonathan. So yeah, we have time for a few questions. So first question is, in the solution architecture, your slide states Posit Connect with Snowflake access. What about business users who don't have access to Snowflake but want access to the Shiny application?

So you would need to give them access to Snowflake is the answer right now. I think coming on the roadmap is the ability to use probably OAuth to do that, but for now you have to give them access to Snowflake.

Another question, is the main benefit of Snowflake for big data? What is the main benefit for, say, something like survey market research, smaller, and where each survey is unique?

So I think in general, the benefit of Snowflake is having a unified data platform, right? So big data, small data, you still get those benefits. And if you're just working with small data, it's not going to cost you nearly as much to use Snowflake, so you get all the same benefits at a much lower cost. So I think that's actually a good example of using Snowflake.

One more question. It looks like Snowflake has its own data viewer inside Positron shown at the bottom of your demo. Does the native Positron data viewer also work with Snowflake data?

Yes, it does. So if I was to kind of make a connection through this pane up here, then yes, you would get that up on the right-hand side. So yeah.