Resources

Live Q&A following Workflow Demo - January 29th!

This is the live Q&A session for our Workflow Demo on January 29th, Model cards with vetiver for transparent, responsible reporting with Julia Silge. Join us for the demo first with Julia Silge on Jan 29th at 11am ET to learn:

1️⃣ How to get started with your first model card
2️⃣ How a model card fits in with model monitoring
3️⃣ How to use Posit Team to author and publish your model card

The demo will be here starting at 11am ET on January 29th: https://youtu.be/iNtgunGg86o
GitHub Repo: https://github.com/juliasilge/model-card-workflow-demo

Jan 30, 2025
30 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hey everybody, thank you for joining us. We're going to give people about one more minute to jump over to the Q&A. Everybody if you've just jumped over here now we're going to give about 30 more seconds here for people to join us in the Q&A room.

Awesome okay, I can see people are starting to make their way over. Let me pull Julia over on stage here as well. Hey Julia! Hello! How are you? Good! How are you doing? Good!

Thank you so much for leading an awesome demo for us. Thank you for having me. It was really great to get to put that together. And thank you all so much for joining us today.

Okay, I can see 30 of us have jumped over into the Q&A room, so let me get started, and people can join us as they do. This demo today actually came as a suggestion from a customer, so I want to say thank you to them as well. They had just deployed their first machine learning model with Vetiver and created a whole end-to-end workflow in five weeks. When they presented it internally, they were immediately asked: how are you going to monitor it?

And so they thought model cards were a great idea, but wanted to learn more and understand where those should be published. So thank you so much, Julia, for walking us through that. It's always really exciting hearing people's real experiences using this in their real use cases, like how they actually put these things into practice in their organizations.

And I do think it's quite interesting to see what questions come up, the sort of immediate questions. Not some theoretical thing, but in practice: what are the things that you do next? Absolutely.

And I also wanted to share that because if anybody else has suggestions for workflow demos or topics that you haven't seen from us yet, you'd like to see, let us know. So you can let us know in the chat here. You're always welcome to reach out to me directly on LinkedIn as well. I guess I should introduce myself here too. I'm Rachel Dempsey. I lead customer marketing here at Posit.

And so I host a variety of different community events, like this monthly workflow demo that we host the last Wednesday of every month. There have been over 22 different workflow demos now, I believe, so I'll share the whole playlist on YouTube with you in the chat in just a second as well. But I also host our Data Science Hangout, which we have every single Thursday at noon Eastern time. We'd love to have you join us there as well.

Julia, I know you introduced yourself at the beginning of the demo, but it might be good to introduce yourself here too. Yeah, yeah. So my name is Julia Silge and I work here at Posit as an engineering manager. I've worked on a couple of different kinds of projects, and if I think about the connection between them, I would say they're about the really applied data science process: what does it take for people to be effective in their real use cases?

So this was a fun demo to do, because what I focused on for a couple of years was really getting Vetiver off the ground, building support for people getting started versioning, deploying, and monitoring their models. What I've been working on more recently, in the past year and a half or so, is Positron, which is the new IDE I showed using in this demo. So that is what I do here at Posit. I am still the maintainer of the R package for Vetiver, and I work with Isabel, who's the maintainer of the Python package. We do releases and maintenance, new features and bug fixes, but the bulk of my time these days is focused on Positron.

Thank you. I was actually going to ask you that, because I noticed you weren't using the RStudio IDE in the demo and thought it'd be good to call that out as well. Yeah. So I was using Positron. Positron is a new data science IDE that we are building here at Posit, and there are a couple of things you might want to know about it. One thing is that it is currently available for beta testing. It is an early stage project, so it might not be the best fit for you today, depending on your particular use cases or needs.

The other big thing to know about Positron is that it is an IDE built for data science in general, not just for using R or just for using Python. So it can be a great choice if you are someone who uses more than one language, or if you collaborate with people who do. We're calling it a multilingual or polyglot IDE: an IDE that can be used with different data science languages.

Q&A: model card questions

Well, thank you so much, everybody, for starting to add your questions into the chat. I saw there were a few questions asked during the demo as well, so I thought it might be good to get started with those. One that came in from Gustavo was: would you include the educational level as a factor after looking at those accuracy results from the model? It was really out of scope for this demo, but it's the kind of question that might be raised by someone going through the model card.

Yeah. So this is a question about the process of developing the model, and I think it's a really interesting one to at least briefly talk about. Educational level was not one of the predictors: how much education one of these employees had was not used to predict attrition, whether they were going to leave or not. But then after analyzing the model, you notice: oh, we do a better job predicting attrition for the high levels of education and a worse job for the low ones.

It turns out that in situations like this, if you try to put that characteristic into the model, it often doesn't help you predict any better. It does not improve your ability to predict attrition for the people with lower levels of education. Why does it happen that a model performs worse for people with certain demographic characteristics? Usually, the most common reason is that there's less data for those people. So putting that characteristic in as a predictor doesn't actually help you, and it may actually make the model perform worse overall for everyone.

The thing that you might want to do is to try different kinds of models and see which one does the most even job, let's say, across the characteristics. And that's exactly where fairness metrics come in. So Rachel, I am going to drop you a link for anyone who wants to learn more about this. If you are in the process of developing a model, and you notice that you have this kind of differential across categories, and you want to ask, can I minimize that differential so that my model performs fairly across different characteristics? There's support for comparing models with fairness metrics, and that's exactly what that gets you.
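The differential Julia describes, a model doing better for one subgroup than another, is just per-group accuracy. Here is a minimal, stdlib-only illustration of that idea (the data is made up; this is not the fairness-metric tooling she links to, just the underlying comparison):

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Compute accuracy separately for each subgroup (e.g. education level)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical attrition predictions, split by education level
y_true = ["stay", "leave", "stay", "stay", "leave", "stay"]
y_pred = ["stay", "leave", "stay", "leave", "stay", "stay"]
groups = ["high", "high", "high", "low", "low", "low"]

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)  # accuracy is higher for "high" than for "low" in this toy data
```

Comparing several candidate models on a table like this, rather than on overall accuracy alone, is the practical shape of the model-comparison workflow Julia mentions.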

But often it does not help the situation at all to put the demographic characteristic in as a predictor. It's very common for it not to help; that's typically not a solution to the problem. So great question. Really great question.

Publishing dates and parametrizing model cards

So I'm going to jump over to one question that was on Slido and I'll use this as a chance to remind people, if you want to ask anything anonymously, you can use Slido as well. There is a question that was, would love to hear any thoughts or discourse on publishing dates on the model card. Would it be possible to parametrize the date to capture data changes?

Yeah. So you may notice that a model card is human documentation about how the model is performing at a certain time. It's about the model that you trained; it's less about the model's performance over time. Although of course, like we talked about with the dashboard, you can present those in a combined way with a fair amount of clarity.

I typically 100% parametrize the date, so that when people look at it, they know how old it is: when was the last time this model was looked at? If you click through to the GitHub repo, you can see that in the Quarto file I used for the model card, the date is defined as last-modified. And that means that when it is published, I don't have to manually think about that at all; I get that information automatically.
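In Quarto, that automatic date is a one-line front-matter setting. A minimal sketch of model-card front matter (the title here is illustrative, not from the demo repo):

```yaml
---
title: "Model card: employee attrition model"
date: last-modified   # rendered as the file's last modification date at publish time
---
```

Because the date comes from the file itself, republishing an updated card never requires editing the date by hand.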

The other thing that I think is really nice along these lines is that in the model details section, we automatically read off not only the version from the metadata, but also the date that the model was published. So we get all that quite automatically. And I think it's interesting, because when it comes to documenting models, some things can be automated: we can get that information, so let's automate it and make it easier. But some things do require human evaluation, human thought, human writing. We can set ourselves up for success there, but be realistic that it can't be automated.

Who writes the model card?

This question was: who contributes to writing a model card? The roles of the data scientists, stakeholders, privacy team? Yes. Oh, I love this question. I observe that the most common person to write the model card is the person who developed the model, and I think that's the healthiest arrangement. I think it turns out that way because that's the person who has the most context for it.

The person who developed the model knows what went into the process of deciding on one type of model versus another, has done the exploratory data analysis on the data available at training time, and is then able to make substantive, realistic claims about the training data and about the data that has been used to evaluate the model. The primary responsibility for the model card lies with the person or team who developed the model.

I do think that you're writing for an audience, right? Some stakeholder who is going to use the model to do something, whether that is a software engineer collaborator who needs to know how to integrate the model into an IT system, or maybe a business stakeholder who's making decisions based on it. Those stakeholders probably are not writing the model card, but they are the ones the model card needs to be useful to. And so bringing those stakeholders in to contribute to what needs to be in the model card is really valuable.

I think Justin's question actually ties in really well here as well: how would you manage those task assignments, track progress, and ensure smooth collaboration among the team members throughout the process? And even, like you just said, making sure that the stakeholders can provide feedback on that model card too.

Yeah, yeah. I love this question, because these zones of responsibility or zones of influence often sit at the boundaries of the kinds of tools people use. For example, the person who developed the model and is writing the model card is often using code-based, code-first tools, right? And I think one of the real challenges, one of the real tensions in doing data work, is when your stakeholders are not people who are hands-on with the data, not people who are writing code themselves. How do you elicit feedback? How do you integrate feedback? This is for sure not a solved problem at all.

My answer right here is about to be super practical. One thing I have found useful in the past is publishing things to Posit Connect, sharing them, and getting feedback that way. The other thing I have done is publish drafts to something more like Google Drive or Google Docs, or the Microsoft equivalents, and then literally ask people for feedback on an initial draft before publishing a final version. There is some real tension at this intersection: people who are hands-on with the data, and how to get feedback from the people who are stakeholders in the results.

Evaluating third-party models

So, Brendan had asked a question earlier during the demo, and it was, are there plans to extend model cards to be able to evaluate third-party models? One framework to evaluate homegrown and third-party models could help the build versus buy decision that plagues many orgs.

Yeah. So it depends; I'm going to make some assumptions about what we mean here by third-party models. For example, someone else has trained a model, and you are going to use it as a black box: you send your data in and you get a prediction out. These could be predictive models, like models as a service, and then there are the more generative AI type models, where it's very common to be in a build versus buy situation and say, we are not going to train that ourselves, right?

I am going to say that the person who is responsible for the model card is the person who knows about the training data, the person who knows what went into getting that model to the point that it can make predictions. As a customer who doesn't have that transparency, I don't even know how you would write a model card, because you don't have that information. But I think you can use model cards as a framework for asking, do I know what I need? If you're someone who has read the paper, maybe written a model card yourself, and you're in this build versus buy situation, then looking for either an explicit model card you can read, or equivalent information being shared, can really help you make a call about whether you trust that model for whatever business use case it is.

I'm going to reiterate: I think the person who is responsible for the model card is the person or team that made the model and has access to the training data. And you can use those model cards to evaluate whether a model is appropriate for your use case.

Model cards for Shiny apps and galleries

So, George asked: for data science applications like Shiny, how would you recommend going about drafting a model card for the app? Oh, yeah. I really like this idea. It's very equivalent to that dashboard I showed you at the end. There are different ways of serving predictions from models, right? If the model is meant to be integrated into the rest of an IT system, you probably want an API or something that behaves similarly. In that case, where does the model card go? Somewhere the engineers can see it and understand how to use it, right?

If you are making a model that you literally want a person to interact with, like a Shiny app where they can use sliders and dropdowns to enter the predictors and the app outputs the prediction, I would say: put the model card in the app itself. As another tab, maybe as the first tab, that they have to read before they go to use the app. You would end up with an experience that looks kind of like that dashboard I showed, where you have some model card information before you get into the other presentation of information around the model.

So, I was wondering about this one as well. Model cards are often rare gems to see in practice; curious about any sort of gallery of model cards or resources that might be out there? Yeah. Okay. So actually, where I have seen the most of them is on Hugging Face. Hugging Face is a data science platform in general, but very machine learning oriented, with lots of pre-trained models available. And they have support for model cards on Hugging Face.

And I think it's kind of interesting. You can click through and see the kind of information that people include and how they tend to talk about it. And you can also evaluate a little bit of how often people are writing them, and how often people are looking at them. What I like about Hugging Face as an example here is that these models are meant for people to use. You're supposed to use these pre-trained models, right? And so they have the support for model cards, and then you can see how and in what ways people are using them.

Positron vs RStudio

Going back to Positron for just a second, there was a question: is Positron going to replace RStudio? It's always top of mind. If you're an RStudio user or lover, you're like, wait a minute, is this going to replace it? So the short answer is no. They are quite different, quite different in some real ways.

We at Posit are committed to new features, bug fixes, and maintenance for RStudio for a very long time, right? RStudio has been around for 10, 15 years, and there is not a world or future in which we are not doing maintenance for RStudio, within the scope of any kind of plans that we are making. In fact, RStudio may well be a better choice of an IDE for many of you today, and for a little while into the future. That's because if you only write R, RStudio is likely a very good, perhaps the best, choice for you, because RStudio is built specifically for an R user.

If you want to ask, hey, is Positron for me, should I try it out? If you are an RStudio user, I think there are a couple of things that might say, yeah, maybe I should give Positron a try. One is that you are a maximizer when it comes to customization. You love to majorly bling out your IDE, you love setting up keyboard shortcuts and really customizing things. You're kind of a power user.

Positron, because of the infrastructure it's built on, gives you a much higher ability to customize it and make it exactly how you want than RStudio ever has or ever will, because of the architecture each is built on. So, number one: you are a power user, maximizer type person. Number two: you don't only use R. Maybe you build R packages that include Rust code, or you build really complex Shiny apps with a lot of custom JavaScript that you integrate into the app, or you use both R and Python for your data science projects. If you are not an only-R user, then Positron is a really great fit for you.

These are the kinds of things that I would say, in the short to medium term, would be reasons to try Positron. Whereas if you are a happy RStudio user, you should feel no pressure to change, and no stress that your IDE is going away.

Rapid-fire questions

How do we facilitate the creation of the model card? This is great. I am going to drop a quick link to you, Rachel, about what Vetiver has support for. So, the answer is no, it's not tidymodels only. To use Vetiver in R, you can have a tidymodels workflow, or a caret model, or mlr3, or raw XGBoost, ranger, etc. And in Python, there's a set of four things that we support: scikit-learn, PyTorch, XGBoost, and statsmodels. So Vetiver has support for quite a wide variety of types of models. It's not tidymodels only and it's not R only; it's pretty broad support. And that means you can make a model card with any of those. You can use our Vetiver template for a wide variety of models across R and Python.
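The template idea can be sketched framework-agnostically: one model-card skeleton, filled in from whatever metadata a given model carries. This stdlib-only sketch is purely illustrative; it is not the actual Vetiver template (which is a Quarto document), and all field names and values here are hypothetical:

```python
from string import Template

# Hypothetical model-card skeleton; real model cards carry much richer sections
CARD = Template(
    "# Model card: $name\n"
    "- Framework: $framework\n"
    "- Version: $version\n"
    "- Published: $published\n"
)

def render_card(metadata: dict) -> str:
    """Fill the card skeleton from a model's metadata dictionary."""
    return CARD.substitute(metadata)

card = render_card({
    "name": "attrition-model",
    "framework": "scikit-learn",   # could equally be xgboost, statsmodels, ...
    "version": "20250129T110000Z",
    "published": "2025-01-29",
})
print(card)
```

The point of the design is that the automatable fields (version, publish date) are read from metadata, while the human-written sections stay in the template for the author to fill in.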

One was: would it be crazy to have multiple model cards for a single model, thinking about tailoring them to different audiences? That is not crazy, actually. For example, in the demo I talked about checking something into GitHub and publishing something to Connect or to Confluence or wherever. Those are different audiences, right? Whoever comes to GitHub and reads your readme is probably a software engineer collaborator who needs to know how to integrate the output from the model API. That person needs different information than the person coming to Confluence to understand the model from a less hands-on mode. So I do not think that's crazy. In fact, I think it shows a maturity and understanding of how our data work impacts different people in our organization.

Absolutely. So, this next one might be hard to answer quickly. I do also want to give a little shout out to say that questions like these, how to deal with certain stakeholders when you have conflicting business decisions, are something we talk about a lot at our Data Science Hangout. We have that every Thursday from 12 to 1 Eastern time, and this week Joe Cheng, our CTO at Posit, is actually joining us as the featured leader for the week. So we'd love to have you join us for that, too.

But Julia, if you want to take a quick... A quick stab. Yeah. This is a really big and tough one. It is about how data people function in an organization. I think a very common mode for data people is a consultant-type role: it's very common for data practitioners not to be building the product itself, but to be consultants on how it's going building the product, who is doing well, how our customers are doing.

Let's be a little concrete with this sort of imaginary example: in an HR department, there's a data scientist in a kind of consultant role, and the people they are serving are the HR department. How can we help the HR department do well? What if there are multiple people in that HR department with conflicting ideas about what to do? What we do in these situations is very related to the kind of role we have, and how we manage that particular kind of role. So to get an answer, we have to have clarity: am I building a data product for my company to sell? Am I a consultant to make my organization work well? And then, how do I move forward from there? Tough question, kind of a big discussion. I think something that helps us get there is to be clear-minded about the kind of role we have as data practitioners in our organization.

Absolutely. Well, thank you so much, Julia. I was trying to cram in as many questions there as I could; I know we're a little bit over. I did just want to add: if there's anything we didn't get to cover today, please feel free to reach out to me directly on LinkedIn. I also just put a quick link in the chat if you ever want to schedule time to chat further with our team. Maybe you're just curious, do people at my company already use Posit, and I don't know that? You can always use that link to schedule more time with us as well. I put it in the chat, and I'll put it in the YouTube description, too. But thank you so much, Julia. I really appreciate you taking the time to join us. Thank you so much for having me. And thank you to everyone for your really thoughtful questions and reflections on these complex ideas.