Resources

Sep Dadsetan - CONNECTing with our clients

Leveraging Posit Connect, our company transforms client engagement by providing direct support, extensive documentation (built with Quarto), and no-code applications for exploring and analyzing real-world oncology data. This strategy gives our subject matter experts the flexibility to deliver client value, provide client assistance, enhance self-service learning, and lower the technical barrier to data insights. Our commitment to client success and innovation is evidenced by our use of Posit Connect, which provides tools for a competitive edge and a data-driven culture.

Talk by Sep Dadsetan

Slides: https://drive.google.com/file/d/1let_qEC94x3GS5E_hjLkqp0F4GrqLkwe/view?usp=sharing

Oct 31, 2024
19 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

I'm really excited to be speaking to you today. It's my first time speaking at posit::conf. I've been a longtime participant, and I'm also really humbled to be speaking on behalf of ConcertAI.

ConcertAI is a real-world data company. We collect and analyze patient-level oncology data from a variety of sources, such as electronic medical records, and then generate real-world evidence to support drug development and offerings like that. But don't worry, the talk is not about oncology or real-world data. The hope is that it's agnostic enough that there's something you can take away from it.

Gaps in the data science process

So before we dive in, I want to cover a couple of things that I think are gaps in the data science process. Hopefully that will be informative for where we're going.

So I'd love a show of hands: how many people have performed an analysis for a stakeholder, delivered that analysis, and then had the stakeholder come back with revisions of some sort? Yeah, exactly. Perfectly common, right?

But there are some simple bottlenecks within this process. One is that we can't really anticipate what questions are going to come about from that particular delivery, so there will be additional questions. That elongates the time it takes to deliver the project, and it becomes difficult to generate a timeline. So timelines slip, and that can have knock-on effects on broader project milestones. It's more costly because it costs more time.

That's the first piece. The second piece is how we package that information. It could be a Word document, a Jupyter notebook, a variety of things. This seems innocuous, but there's actually a lot of variability within a company, and even between companies, in how that information is delivered. Often we converge on the lowest common denominator, say emailing a Word document to somebody, and it goes over the wall in that regard.

Now, data science is a science. It's wholly expected that there will be questions: we answer questions, and we get more questions. There's nothing unique about that, but there are some improvements we can make. For example, your team creates a document, you've done all this analysis, you email or file-share it, and your collaborators review it and provide feedback: "It's garbage, do it on this dataset," or "I love it, just change this color." They come back with questions, and we go around this cycle.

But especially if you approach it with this lowest common denominator of emailing, there's a lot you lose. Are the right people viewing your document? Are they even viewing the right version of it? Over time, as projects grow, people join or leave, and anything you've emailed is effectively lost. It becomes work for your team, or another team, to make sure all of this is managed appropriately. So all the time you spent collecting the data and performing the analysis can get stale and get lost.

Addressing the business impacts

So how do we address these business impacts? For reducing some of the back and forth, self-service is actually an option. It's not the only option, but it's a pretty good one. If I build the product in a way that lets the user change, say, the parameters of their output, I'm going to diminish some of the requests that would otherwise come back my way or my team's way, so you get a little bit of resource back. We just have to engineer these things carefully.

And thankfully we're at posit::conf, so we're very familiar with R Markdown and Quarto; we've seen a variety of talks of that nature, so this is perfectly feasible. However, providing that interactivity requires the ability to host the content, because it's a web-based product. You can't just email it. You could, but it's not recommended. So web-based development is the easiest way to add interactivity to a document.

It's not the only way, but for all of our sakes it's probably the easiest, given the nature of the tools that are available. But here's where we have that other problem. You either have to go knocking on IT's door to set up some server, or you have to have server skills or cloud-ops skills yourself, and not everybody has those. There's already so much knowledge we have to carry for our day-to-day work.

And so we went about looking at the ecosystem of solutions out there, this whole build versus buy. There are actually a bunch of solutions; these are just a handful of them. But we had some requirements that any solution needed to meet. We're an oncology data company with patient data, so privacy is really important. We have both R and Python users, and we need to bolster those skills, so we needed the solution to let us be multilingual and continue that.

We wanted support for a variety of content: not just publishing Quarto documents, but other things too. Cost, which is obviously subjective, was an important consideration we wanted to take into account. It should be configurable: can we turn things on and off? Maybe there are features we don't want enabled, or things we want to add. And it should be scalable: it shouldn't fall over if two people are viewing something, and it should also scale in terms of expansion.

These are a couple of the things we wanted, and since we're at posit::conf, I'm not surprising anybody with this: for us, Posit Connect was able to resolve a lot of those things. But not everyone is necessarily familiar with Posit Connect. So what is it? Posit Connect is part of Posit's enterprise solutions. It's basically a publishing platform that lets subject matter experts craft a variety of solutions in different ways and publish them to a place where others can leverage them.

In our case, where we have both non-technical and technical users, as well as different use cases within those verticals, the flexibility of publishing a variety of content types is really helpful, because we can't always anticipate what the use case or technical need might be. It enables faster deployment when we need it, real-time decision-making, things of that nature, and that drives a faster iteration cycle.

Client-facing Posit Connect deployment

Now, Posit Connect is often used internally at institutions, and it's a very handy tool that way. But we want to enable our clients to make decisions, so the question is: what would a client-facing Posit Connect deployment look like? We had plenty of experience running Posit Connect internally, with great documentation and great support for that, but we wanted this external-facing situation. It's not entirely straightforward, and I'm hoping to guide you through some of it today.

So for our deployment and configuration, with the help of Atorus Research as well as our CloudOps team, we began sketching out what this might look like, and we were able to leverage infrastructure as code. For those who may not be familiar, infrastructure as code effectively uses a YAML file, a human-readable but also programmatic way of defining resources. We specify those conditions in our YAML file, which we can then put in version control (very important to use version control).

Then we use a CI/CD (continuous integration, continuous deployment) tool, in this case Jenkins, to set up the parameters, along with Terraform and Ansible, both very handy infrastructure-as-code tools. For those who may not be familiar, Terraform is helpful for provisioning consistent resources in the cloud, whether that's AWS or Google, and Ansible is helpful for configuring the software that goes onto those resources.

And so with that, from a YAML file we can very quickly deploy an EC2 instance that has all of the software, the networking, the security considerations, the monitoring, the database connections, the service accounts, everything in a reproducible pipeline. We can do this multiple times, and each instance is identical to the next.
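As a rough sketch of the idea, an instance definition in such a YAML file might look something like the fragment below. Every key and value here is hypothetical, invented for illustration; it is not ConcertAI's actual configuration, just the shape of a spec that a Terraform/Ansible pipeline could consume.

```yaml
# Hypothetical spec consumed by the provisioning pipeline (illustrative only)
connect_instances:
  - name: client-facing-01
    instance_type: t3.xlarge        # EC2 size Terraform would provision
    connect_version: "2024.06.0"    # Posit Connect release Ansible would install
    git_backed_only: true           # disallow direct publishing to this server
    monitoring: enabled
    database: postgres              # Connect metadata store
```

Because the spec lives in version control, redeploying from the same file yields an identical instance every time.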

And honestly, this is probably one of the coolest things. It wasn't something I had a lot of experience with, but being able to make some quick configuration changes, deploy the pipeline, and have control over this fleet of servers is a really cool feeling, so I thought I'd share that.


So that's great. This infrastructure is really important; it's the underlying architecture for how everything else gets done. What does this mean operationally?

Well, operationally, this is what a typical internal Connect instance looks like. The purple outlines represent ConcertAI-focused internal resources. We have our data, in this case Redshift, but pick your data source of choice, it doesn't really matter. We're using Posit Workbench, of course, because we've purchased the Posit Team bundle. Our users can work in Python or RStudio, whatever they like, build whatever tools they need to share internally, and then publish to our internal Connect server.

But as I showed you previously, we now have these external, client-facing instances. So how do we do that? It's kind of similar. You still perform your work on the left-hand side, but now we enforce that GitHub, version control, be used. The distinguishing feature of these external instances is that we use Git-backed deployment on the Connect servers. That means you can't publish directly to a particular server; you have to deploy through version control. And this is actually really, really cool.

And I think a nice tidbit to take away is that, one, it enforces development practices. Not everybody, admittedly, and I've seen this at multiple institutions, is familiar with or uses version control. Two, and this is a nice cherry on top, it enforces governance: content and code review are now required, because this is going to client-facing instances and we want the quality checks in place. And three, to appease the IT gods, it's a nice security checkpoint: material being published can go through either automated or manual security review. This has been very, very helpful.

The added benefit is that because this content is version controlled, it can be deployed to multiple instances, assuming it's the same content, and we can go even further and make programmatic changes to create different variants of content for an individual client. So it's a really nice way for us to have a system internally, and for the clients to engage with it.

Outputs and showcases

So now that I've shown the architecture and a little bit of how it comes together operationally, I want to showcase some of the outputs that have come of it. As noted in the keynote earlier, documentation with Quarto, for example, isn't necessarily the sexiest thing, but in my opinion it's actually one of the coolest aspects of this platform.

Big shout-out to Conrad Svitek on our team, who has almost single-handedly put this together. Our documentation process involves 30 to 40 people from different groups, like epidemiology and informatics, and a variety of other teams. It's a big effort: a lot of expertise is required for real-world data, because it involves clinical data, genomic data, claims data, and you have experts coming to the table from all these different groups. Putting documentation together seems like a simple thing, but it actually requires a lot of process.

What Conrad has been able to do is create a process behind the scenes that takes advantage of Git and Git-flow techniques, allowing these different teams to participate and bring their voices to the table. We can also build profiles, so that if a client subscribes to a particular dataset, their documentation represents their data, out of all the data we have available, and it's all driven by an automated pipeline. It has saved weeks of work, greatly improved our ability to determine who changed what and when, and given us much better provenance.
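The profile idea can be sketched in a few lines: given the full catalogue of dataset documentation and a client's subscriptions, emit only the chapters that client's build should include. The dataset names, file names, and function here are all invented for illustration; in practice something like Quarto's project profiles could drive the selection at render time.

```python
# Hypothetical mapping from dataset products to Quarto documentation chapters.
FULL_CATALOGUE = {
    "clinical": ["intake.qmd", "clinical-variables.qmd"],
    "genomic": ["genomic-panels.qmd", "biomarkers.qmd"],
    "claims": ["claims-coverage.qmd"],
}

def chapters_for(subscriptions: list[str]) -> list[str]:
    """Chapters a client's documentation profile should render, in catalogue order."""
    return [ch for dataset, chapters in FULL_CATALOGUE.items()
            if dataset in subscriptions
            for ch in chapters]
```

Because the selection is data-driven, adding a new client profile is a configuration change rather than a copy of the documentation.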

This is all built on top of that infrastructure and benefits the clients. Clients come back to us saying, this is great: it's a single source, there's no worry about which version I'm looking at, just as we were talking about earlier. That was part of our hypothesis. They can go to that resource, find it, use the search functionality, all the things you'd expect from the web. So that's been a really cool win.

The second showcase is a data browser. One of the teams that works specifically on our genomics product wanted a way to surface some of the counts they find in the dataset, so that clients and others can change some of the parameters and pull out what they want. The reason I like to highlight this one is that one of our original hypotheses was that if we build something like this, if we provide that infrastructure, we can reduce the time it takes subject matter experts to get to an output.

So now we have subject matter experts working on this, developing with technologies such as Quarto. In a matter of about two weeks, without any prior experience with Quarto, R Markdown, or even JavaScript, they were able to build this, and it's multilingual in the sense that parts of it are built with R and parts with Python. I thought that was a really cool highlight to make. Shout-out to Prita Ghosh on our team, who built this out.
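To make the data-browser idea concrete, here is a toy version of parameterized counts. The records, field names, and function are invented for illustration (this is not ConcertAI data): the user picks filter parameters and gets the matching cohort size back, which is the essence of what such a browser exposes.

```python
# Toy patient-level records; in the real product these would come from the warehouse.
RECORDS = [
    {"biomarker": "EGFR", "stage": "III"},
    {"biomarker": "EGFR", "stage": "IV"},
    {"biomarker": "KRAS", "stage": "IV"},
]

def cohort_count(records, **filters):
    """Count records matching every supplied field=value filter."""
    return sum(all(r.get(k) == v for k, v in filters.items()) for r in records)
```

In the hosted browser, the `**filters` would come from UI controls (dropdowns, sliders) rather than function arguments, but the logic is the same.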

The third one is a favorite of mine, because Connect allows hosting of APIs. We can now provide a utility, a stepping stone, for individuals: infrastructure that lets other people interact with the data programmatically, so they can go build additional capabilities. It gives us a lot of flexibility. Similar to how an R package or a Python library encapsulates functions, we can build this in a language-agnostic manner for people to consume the information. It's an extremely helpful utility, now enabled by the platform. Shout-out to Tyler Lifke and Conrad, who helped build this.
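Connect can host APIs written with frameworks like Plumber (R) or Flask and FastAPI (Python). As a framework-free sketch of the endpoint logic (the route, data, and function names are hypothetical), a counts endpoint might look like this; on Connect, this function would sit behind an HTTP route so that any client language can call it.

```python
import json

# Hypothetical precomputed counts; in practice computed from the dataset.
COUNTS = {"EGFR": 2, "KRAS": 1}

def handle_counts(query: dict) -> str:
    """Return a JSON body for a hypothetical GET /counts?biomarker=... endpoint."""
    biomarker = query.get("biomarker")
    if biomarker is None:
        return json.dumps({"counts": COUNTS})       # no filter: return everything
    return json.dumps({"biomarker": biomarker, "count": COUNTS.get(biomarker, 0)})
```

This is what "language-agnostic" buys you: an R script, a Python notebook, or a plain `curl` call all consume the same JSON over HTTP.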

Now, the fourth one is kind of experimental, and I don't have a screenshot because it's still early. Connect, being this hosting platform, obviously supports a lot of the things I mentioned earlier, but we wanted to push the boundaries and see what Connect can actually do. We have a desire to build web applications, but we didn't necessarily want to build them in Shiny: Shiny has certain limitations that were preventing us from going forward. So can you build a web app in a more traditional framework, say React, and publish that to Connect? I've seen examples of other companies doing it.

We don't have JavaScript developers in-house to do that, but we came across a framework called Reflex, a pure-Python framework that is basically a wrapper around Next.js, which is a React framework. With a little bit of help from Posit Solutions Engineering, the team was able to come up with an MVP and actually deploy it on Connect so people could engage with that content, which I thought was amazing. It really expands what's possible. Is it a little unstable, in the sense that it isn't officially supported? Perhaps, but I wanted to showcase it.

Feedback and benefits

So, some of the feedback. Drawbacks: configuration is a little hard to wrestle with in the beginning, managing content across all of these systems was a little tricky (solvable, but still), and not all content types are supported. So it's not all puppies and rainbows, as I like to say, but we were able to get there, and there are a lot of benefits.

Deployment and server management, because we use infrastructure as code, are fantastic. We've excited our internal team members, because they can very quickly see outputs from what they're building; they get excited, and it elevates the company as a whole. We've improved the way our clients consume information, improved the speed at which we can deliver and make updates, and gotten better business insights: we can see activity, whether people are even looking at the documentation, something you couldn't see if you just emailed it. And it has opened up possibilities, as I mentioned on the last slide.


So a big thank-you to the team at ConcertAI. There are a lot of people, but these are the core people who helped out on the project. Eli from Atorus has been a fantastic resource. And obviously, thanks to Posit for accepting my proposal to speak here. Thank you.