How You Get Value as a 1-Person Posit Connect Team - posit::conf(2023)

Presented by Sean Nguyen

Sean, a sole Posit Connect developer, shares his experience in delivering business impact. He narrates his transition from crafting one-off reports to developing and deploying robust data science web applications using Python and R with Posit Connect. Despite its common association with large enterprise teams, Sean demonstrates how Posit Connect can be effectively utilized in smaller settings. He presents his work on creating and deploying end-to-end machine learning pipelines in Python, hosting them as APIs, and seamlessly integrating with Shiny apps via Posit Connect. This talk imparts practical strategies and techniques to foster user and executive adoption of Posit Connect within lean (and large) organizations.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Getting %$!@ done: productive workflows for data science.
Session Code: TALK-1093

Transcript

This transcript was generated automatically and may contain errors.

Hi everyone, my name is Sean. So the goal of my talk today is to basically talk about my journey through getting Posit Connect at my firm, because I'm the sole user for Connect.

And so I'd like to start off with an imaginary story where your manager tasks you with running an analysis and you have multiple different data sources. And you know, the BI tools just simply aren't cutting it, right? And so you have this idea, okay, you know what? I'm going to create a Shiny dashboard. And then you stay up late at night, it's a picture of me staying up late at night. I have proprietary data that I know I can't share out publicly, so I can't publish it to the web via GitHub or things like that.

So then I decide that, you know, what I'm going to do is I'm going to share this with my manager as an HTML file. So that way it's secure, I email it to them, and then they'll open it up in the morning, and then they're going to be amazed by this fantastic analysis that I have, right? But then I get greeted with, thanks for this, but I don't want to download an HTML file. This simply just doesn't work for me.

And so, you know, when we're working with data, you know, sometimes the friction is trying to share our work with the stakeholders in our business. And so sharing shouldn't be hard. And you know, when you deploy things, there's a myriad of things that you need to consider in terms of making sure that you're not sharing out sensitive information, and then you're complying within your IT infrastructure. And oftentimes, IT, their job is to kind of keep things secure, but then you want to share information to your business units, so then everyone's happy. But then, you know, you're kind of limited in your options, right?

Discovering Posit Connect

So fortunately, Posit has an enterprise solution called Posit Connect, and then it's a way for you to be able to deploy applications kind of with a one-button click inside your ecosystem so that you can kind of share information effectively to your stakeholders. However, one of the things that you have to consider is that this is not a free product. It's an enterprise product you have to pay for, and then you actually need to host it yourself. So this is sometimes a limitation in the sense that it's not a software-as-a-service like Posit Cloud, where it's fully hosted, you just pay for it, and then you can kind of use it. You actually have to pay for it and then implement it yourself on your own servers.

And then at least for me, when I learned about Posit Connect, I typically associated it with large teams, right? And so for me, we have a firm of about 45 people, and I was like, well, maybe this is overkill for us. And so today I'd like to share with you the journey that I had in terms of navigating and actually ultimately getting Posit Connect and how it's actually impacted my work.

And so just a little bit about me. I work at S2G Ventures. We're a venture capital firm based in Chicago, and we invest in seed, venture, and growth stage companies, hence the name S2G. We invest in oceans companies, food and agriculture, and clean energy. And I was the first data science hire in 2021, and I'm actually the sole Python and R developer at the firm. Most of the firm are ex-investment bankers, private equity. So they love Excel. It's kind of interesting to chat with them. And I'm definitely not a Linux admin.

The three-step journey

And so the process that I had was about three steps. The first one was the trial phase, and then the second one was actually getting an executive sponsor and having a champion within your organization to kind of get the Posit Connect product. And the last part was the implementation. So after you sign the documentation, now you have to actually get everything connected and working.

And so during the trial phase, what I did was I contacted Posit. I was asking to get like a 30-day trial of the Posit product that's fully hosted so you can kind of see if it works for you or not. And I talked with my manager to get a POC. And what this is, is I sat down with her, and we were going over like a dashboard that she would like. She would want to see the meetings that she had last week, meetings that she has this week, and then kind of action items that you can actually interact with and type in information and log it back to your data warehouse. So this is something you couldn't necessarily do with the BI dashboard.

So then what I did was I was able to use the different applications. Like I built a Shiny dashboard for them, and then we kind of got the buy-in that way.

And so the second step in my journey of getting Posit was actually finding your champion, or the person in the organization that has the ability to sign on the line for the actual contract. So find the person that can make the decisions in your organization, and then make sure you identify what they value and then create that value for them so it's a no-brainer. And then the next part is to also identify your non-advocates, because inevitably, whenever you're trying to introduce a new product in the system, they're going to be like, you know, Sean, we already have Power BI or Tableau or whatever, right? And so you want to alleviate any concerns or reservations that they have about yet another shiny tool, right?

And then what I was able to do in terms of my prototype was I leveraged the Blastula package to generate programmatic emails. And so this is kind of an example here is where you have a report, R Markdown report of like a fictitious cookie cutter company where you can send emails of cookie inventory daily, weekly, hourly, whatever, how often you like. And then you can have conditional logic such that if the inventories are high, there's no email sent because everything is working fine. But then whenever a threshold is met where you have low inventory, you can trigger with Posit Connect an email to send out to whoever you want to alert them, hey, something's wrong, fix it.
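The conditional logic described here can be sketched in a few lines. The talk used the Blastula package in R; the version below is only a rough Python analog using the standard library, and the threshold value, function name, and addresses are hypothetical. It builds the alert message without sending it, so a scheduled run would stay silent whenever inventory is healthy.

```python
from email.message import EmailMessage
from typing import Optional

LOW_INVENTORY_THRESHOLD = 50  # hypothetical cutoff, not a figure from the talk


def build_inventory_alert(
    cookies_in_stock: int,
    recipient: str,
    threshold: int = LOW_INVENTORY_THRESHOLD,
) -> Optional[EmailMessage]:
    """Return an alert email when stock is low, or None when no email is needed."""
    if cookies_in_stock >= threshold:
        # Inventory is healthy, so this run produces no email at all.
        return None

    message = EmailMessage()
    message["To"] = recipient
    message["Subject"] = f"Low cookie inventory: {cookies_in_stock} left"
    message.set_content(
        f"Inventory has dropped to {cookies_in_stock}, "
        f"below the threshold of {threshold}. Please restock."
    )
    return message


# Example: one healthy reading and one low reading.
for stock in (120, 12):
    alert = build_inventory_alert(stock, "manager@example.com")
    print(stock, "->", "no email" if alert is None else alert["Subject"])
```

Returning `None` instead of an empty message keeps the "send nothing" case explicit, which is the behavior the scheduled report relies on.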

And so what I did was I took this and I kind of showed the art of the possible to the leadership in my organization. And this is an example of a Posit alert. So what I did was I created a Markdown report that was able to scan our database to figure out if our third-party vendors were tracking our internal emails. We didn't want to track those in our data warehouse. We only wanted to track third-party emails. So us emailing someone else, but not internal S2G to S2G emails.

And so what this alert did was this alerted my manager and the higher up management that this third-party vendor was actually tracking emails. And so then it alerted them and allowed them a mechanism to actually act upon this new information. And so this proved quite valuable to them. And then what that ended up doing is it kind of sealed the deal so that the COO was able to sign off on it. And then I was able to outline a plan of action for the first 90 days.
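The check behind that alert can be illustrated with a small sketch. This is not the report from the talk: the record shape, the function names, and the `example.com` domain are all assumptions chosen for illustration. The idea is simply to flag any tracked email whose sender and recipients all belong to the internal domain.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

INTERNAL_DOMAIN = "example.com"  # hypothetical stand-in for the firm's own domain


@dataclass(frozen=True)
class TrackedEmail:
    """One email record as a vendor might have logged it (assumed shape)."""
    sender: str
    recipients: Tuple[str, ...]


def domain_of(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def is_internal_only(email: TrackedEmail, internal_domain: str = INTERNAL_DOMAIN) -> bool:
    """True when the sender and every recipient share the internal domain."""
    addresses = (email.sender, *email.recipients)
    return all(domain_of(address) == internal_domain for address in addresses)


def find_internal_emails(emails: Iterable[TrackedEmail]) -> List[TrackedEmail]:
    """Collect the records that should not have been tracked."""
    return [email for email in emails if is_internal_only(email)]
```

A scheduled report could run `find_internal_emails` over the latest records and send an alert only when the returned list is non-empty.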

Implementation challenges

And then it was time to celebrate because it was like, yes, I finally did it. It was like a two, three-month process. But then that's actually not exactly the case at all. You still have to implement it.

And so the implementation was rather tricky because I didn't have a full-fledged data team with me. So as I was kind of figuring everything out, I had to figure out, am I going to create a VM? How am I going to create the VM? Install Posit Connect? And then make sure the IT consultants were signing off on all the things that I was doing. And so it felt like I was in this island of despair. It's like, literally, it's just me. I'm trying to do my stuff, and now I have to spin up this new thing.

But I actually think of it as like an island of empowerment. Because when you're kind of solo like that, you can move quickly. You can prototype stuff. But then you can also iterate quickly as well. And then if you make mistakes, you can pivot to try to fix it. So it is a disadvantage being a solo practitioner, but sometimes it can actually help you out. But it is a struggle. There's no doubt about that.

And so one of the fortunate things I was able to do was pick cloud platforms. Because we didn't have servers on-prem, so I wanted a cloud-first solution. And then you have different cloud providers, but I wouldn't get necessarily bogged down on the specific ones, like Google Cloud, AWS, or Azure. We use G Suite, and we have BigQuery. So I was like, OK, let's just use Google Cloud. And just pick whatever tool that you like. And then I try to encourage folks to use cloud providers, if possible. Because it's easier to maintain, it's very secure, and then you can kind of pay as you go.

And then for me, I'm not a Linux admin, right? So it felt like I was just like, oh, into the belly of the beast with this Linux. And there's no Undo button necessarily. So sometimes I've made mistakes where I'm like, wow, I wish I never did that. And then you're having to navigate the file system if you're not too familiar with it. I was OK, but just handling all these things and resolving dependencies for different Linux distributions and things like that. And then, for me, I just basically had to memorize commands. And you get used to it after a while, but by no means is it simple.

How Connect is used at work

And so how I use Connect at work: right here is a depiction of our data warehouse. And then this represents Posit Connect. And so I have R Markdown or Quarto documents that run and take the raw data and transform it. And then I'll actually save this into a pin. So I'll have CSV files or RDS files that are saved as model data layers. So I can have this run every two hours, every day, multiple times a day.
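The transform-and-save step can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline from the talk: the column names and cleaning rules are made up, and a local CSV file stands in for the pin that the scheduled document would write on Connect.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List


def transform(raw_rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Clean raw records: skip rows without an amount, tidy names, cast amounts."""
    cleaned = []
    for row in raw_rows:
        amount = row.get("amount", "").strip()
        if not amount:
            continue  # incomplete record, leave it out of the data layer
        cleaned.append(
            {"company": row["company"].strip().title(), "amount": float(amount)}
        )
    return cleaned


def save_data_layer(rows: List[Dict[str, object]], path: Path) -> Path:
    """Write the cleaned rows to a CSV file (a stand-in for saving a pin)."""
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["company", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return path


# Example run with made-up raw data, written to a temporary directory.
raw = [
    {"company": " acme foods ", "amount": "1500"},
    {"company": "blue ocean", "amount": ""},
]
output = save_data_layer(transform(raw), Path(tempfile.mkdtemp()) / "data_layer.csv")
print(output.read_text())
```

Keeping the transform separate from the save step means the same cleaned rows could be written to a different destination without touching the cleaning logic.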

And then I can actually connect this to a FastAPI and host it in Connect. And then from here, I'm able to deploy Streamlit applications and Shiny applications. The great thing about Connect is that you can control how often something will run. And then you can actually provision or deprovision access to different applications so that only the investments team can look at the Streamlit app. The marketing team can only look at the Shiny app. And it's a fantastic way for you to control all these things.
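As a conceptual model only, the per-team access described above amounts to a mapping from each application to the groups allowed to view it. The sketch below is not Connect's API or configuration format (Connect manages access through its own settings); the app and group names are hypothetical.

```python
from typing import Dict, Set

# Hypothetical app and group names, loosely following the example in the talk.
APP_VIEWERS: Dict[str, Set[str]] = {
    "streamlit-app": {"investments"},
    "shiny-app": {"marketing"},
}


def can_view(app: str, user_groups: Set[str]) -> bool:
    """True when the user belongs to at least one group allowed to view the app."""
    allowed_groups = APP_VIEWERS.get(app, set())
    return bool(allowed_groups & user_groups)


# Example: an investments analyst can open the Streamlit app but not the Shiny app.
print(can_view("streamlit-app", {"investments"}))
print(can_view("shiny-app", {"investments"}))
```

Apps missing from the mapping are treated as viewable by no one, which is the safer default for sensitive data.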

And because of that, I felt like I could do all these different things just being that sole developer. So I had data pipelines running, automated reporting. And you can securely deploy these applications like Shiny, Streamlit. And then I also used Quarto to create a website for technical documentation and for hosting our intranet, to have a landing page for all these Shiny and Streamlit applications.

Troubleshooting tips and lessons learned

And some tips for troubleshooting that I had. So sometimes, inevitably, when I was using Blastula, issues would arise. So I was using Google or G Suite for our email. And Google actually no longer supported the less secure apps option to send out emails. And so I was sleuthing, trying to figure out why this wasn't working. And so I found out someone had posted an issue. And then I was able to find that the Blastula documentation actually had a guide that suggests using an app password within Google. So if you're ever running into issues, just search on, obviously, Stack Overflow and Google. But then sometimes, you can look at the GitHub issues. And you actually might be able to find the answer there.
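For readers sending mail from Python rather than Blastula, the app-password approach looks roughly like the sketch below. It uses the standard library's `smtplib` against Gmail's SMTP server on the STARTTLS port; the environment variable names `ALERT_SENDER` and `ALERT_APP_PASSWORD` are hypothetical, and this is an analog of the fix, not the Blastula setup from the talk.

```python
import os
import smtplib
from email.message import EmailMessage
from typing import Mapping, Tuple

SMTP_HOST = "smtp.gmail.com"
SMTP_PORT = 587  # port used with STARTTLS


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the sender address and app password from the environment.

    The variable names are hypothetical; the point is to keep the app
    password out of the report's source code.
    """
    try:
        return env["ALERT_SENDER"], env["ALERT_APP_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def send_with_app_password(message: EmailMessage) -> None:
    """Send a message through Gmail, logging in with an app password."""
    sender, app_password = load_credentials()
    if "From" not in message:
        message["From"] = sender
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(sender, app_password)
        server.send_message(message)
```

The app password is generated in the Google account's security settings and takes the place of the regular account password in the `login` call.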

And some lessons learned that I had: always create a backup image. And I try to increase the cadence. I used to do it once a quarter, but now I do it almost once a month. And then just using Git, as we learned from Colin today. And always document as you go, because your future self will thank you.

And so this is an actual screenshot of our VM instance in Google Cloud. So I'm actually running my production Connect server from a backup, because I went to update Linux, and Connect just totally did not work. And so I was like, OK, whatever. Let's just go back. And I thanked myself for that.

And so the main takeaway message that I have for everyone is just to try to do a trial, see if it works for you, and then identify your executive sponsor within your organization, and then kind of implement it. And then you're going to have struggles. But hopefully, I made some mistakes that you don't have to. And sometimes your greatest skeptics can be your greatest advocates, right? And so I invite the audience to try to collaborate with your different stakeholders, identify what their pain points are, and kind of earn their trust. And then just continuously deploy things and provide value.

Q&A

So one question for you. How hard was it to convince decision makers to buy into Posit Connect? Did Connect provide tools to make that easier, or did you wish they did more?

I think for me, it was kind of, I had to get Connect and kind of show them the value. It's almost like, what is that mantra? It's better to beg for forgiveness than to ask for permission. So I had gotten the trial version of it, and I implemented all these different reports and things like that, such that they got really happy and were like, oh, how did you do this? And then I was like, oh, I used this thing called Posit Connect. Because you can't always explain it, right? It's like show, don't tell. And so I think by creating the value for them, they kind of saw how it was useful.

A very quick question. Someone is wondering about the art in your presentation, where you got that from.

Yeah, so I used Midjourney for all these slides. And I just did the styling equals zero. So for this one, it was like a man walking down a mountain path or something like that. And then you say, s equals zero, version five. So stylization. And then I said, in the style of illustrations or something like that.

And since we have some additional time, there's time to unpack one more question. Do you prefer R or Python for data science?

I prefer R personally, like for data wrangling and everything like that. A lot of the machine learning stuff I do in Python. But I think Streamlit is great. I love Joe Cheng's comment: Streamlit's fantastic until it's not. And it's just like the whole, because you have the cache things, right? So I'm hoping to port my stuff over to Shiny for Python. But I mean, I think I like them both. That's the whole nice thing about Connect, is that you can use both Python and R, and Quarto, right? Both of them simultaneously.