Resources

Live Q&A: March 26th Workflow Demo with Max Patterson @ Suffolk Construction

Join us for a live Q&A immediately following the Workflow Demo on March 26th at 11am ET with Max Patterson of Suffolk Construction. The demo, Leveraging Databricks & Posit to Assess Risk & Enhance Safety, will stream first from 11:00am to 11:30am ET at this link: https://youtu.be/yavHEWpgrCQ?feature=shared For questions, you can use the YouTube chat or ask anonymously via Slido: https://pos.it/demo-questions

Mar 27, 2025
29 min

Transcript

This transcript was generated automatically and may contain errors.

Awesome. Well, while everybody is jumping over here, I think it'd be helpful to go around and just do some quick introductions. I'm Rachel Dempsey. I lead customer marketing here at Posit and host a few different community events. You'll see Libby has recently taken over the Data Science Hangout, which happens every Thursday, but we also host these monthly workflow demos, which happen the last Wednesday of every month. Max, I know you introduced yourself in the beginning of the demo, but if you wouldn't mind doing so again, too.

Sure. Hi, everyone. My name is Max Patterson. I'm a data scientist at Suffolk Construction. I've been at Suffolk for about three years now and work on our advanced analytics and data science team here. Awesome. And Blake, would you want to go next?

Sure. Blake Abnanti, also from Suffolk. I've been here a little less time than Max, just about two years, and I'm a director of analytics and data science.

Hi everybody, I'm Ryan, and I'm a data science advisor here at Posit. I typically like to hop in and answer any questions about Posit stuff, so that's why I'm here today.

And I forgot to add where we're based. So if you all want to share where you're calling in from, I'm usually in the Boston area, but I'm joining from Connecticut today. I know Max and Blake are also in the Boston area. And Ryan, you'll be visiting Boston soon.

Yep, I'll be in Boston next month, almost a month from now, but currently reside in Lima, Peru.

How to ask questions

Awesome. Well, thank you all so much for joining us. I want to remind everybody how you can ask questions today. You can put questions into the YouTube chat by typing right here, or you can use the Slido link if you want to ask anything anonymously. Hannah is helping out behind the scenes putting some helpful links into the chat, but let me pull the Slido link up on screen for you as well.

Databricks value proposition

I see some questions that came in right away from YouTube. One was: what unique value proposition does Databricks bring for the workflow? They noted the row-level permissions but were wondering if that was unique to Databricks.

Yeah, in terms of Posit itself, row-level permissions are available when connecting to both Databricks and Snowflake, so I guess that part isn't unique to Databricks. In terms of what value prop Databricks brings for the workflow: it's just our data warehouse, and that's where we access our data from. So in terms of a unique value proposition, I mean, that's just where our data is.

Why R instead of Python?

Thank you. And I see Chitu had another question, which was: why did you build the workflow in R instead of Python? And maybe you could talk a little bit about some of the ways your team uses both languages, too.

Sure. Yeah. We use both Python and R on our team. There are some people who are more partial to one language or the other, and a couple who like to dabble between the two. I consider myself in that in-between group. I like both Python and R; they each have their pros and cons and their benefits.

In terms of why I built the workflow in R instead of Python: at the time, I was pretty comfortable in Shiny for R, and I knew I was going to be building out the dashboard in that language anyway, so I just doubled down and built the model workflow that way as well. But also, I just like the R model workflow. Once you get the hang of the tidymodels workflow, it's very comfortable, and it's very easy to go from training to deployment to inference. I think it all makes a lot of sense.
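For readers who want to see the shape of that flow, here is a minimal sketch of a tidymodels-plus-vetiver workflow going from training to deployment to inference. The dataset, pin names, and server URL are hypothetical; this illustrates the general pattern, not Suffolk's actual code.

```r
# A minimal sketch of training a model with tidymodels, versioning it on
# Posit Connect with vetiver, and calling it for inference.
# `incidents_df`, the pin names, and the endpoint URL are hypothetical.
library(tidymodels)
library(vetiver)
library(pins)

# Train: a simple classification workflow
split <- initial_split(incidents_df, strata = had_incident)
wf <- workflow() |>
  add_formula(had_incident ~ .) |>
  add_model(logistic_reg(engine = "glm"))
model_fit <- fit(wf, training(split))

# Deploy: pin the versioned model to Connect and serve it as an API
v <- vetiver_model(model_fit, "safety-risk-model")
board <- board_connect()
vetiver_pin_write(board, v)  # each retrain writes a new version
vetiver_deploy_rsconnect(board, "max/safety-risk-model")

# Inference: call the deployed endpoint like any REST API
endpoint <- vetiver_endpoint("https://connect.example.com/safety/predict")
predict(endpoint, testing(split))
```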

Other business problems the team works on

Awesome. Thank you. Well, I know you did a deep dive on the safety model in this workflow demo, but I was wondering if you could give us all here in the Q&A a little more context around some of the other types of business problems the team is working on, too.

Sure. Yeah. So, as I said, I work on the advanced analytics and data science team here under Blake. A lot of the direction we're given comes from our executive committee, based on the key initiatives that they want accomplished over the course of our fiscal year, and we tailor our work to those initiatives. In some cases, it might be a standard analysis, which could be a financial analysis or, in this example, a safety analysis. And it could end up being a tool as well, whether it's a dashboard or just a report. It really depends on the ask.

Basically, we operate off of the key company initiatives and then divvy up our work based on them. We've done work in safety, around insurance, and around planning and schedule impact for projects. So we move around the company in terms of what we do.

Multi-year process: milestones and challenges

Thank you. I see Amelia had asked a question over on Slido. And it was, could you share more about the multi-year process for building this out, things like key milestones or unexpected challenges and how you dealt with them?

Yeah, I think this was an interesting project because we inherited it. As I mentioned in the presentation, it was a tool previously built out by a third party, and we wanted to replicate it in-house. We replicated it very early on, when I had just started at Suffolk, and we had a different tool suite then; we didn't have Posit at that point, actually. So we built it out using a different tool set that, admittedly, was not as replicable. That was our version one, built on what I'd call an old data stack relative to what we have now.

And I would say the key challenges were mostly about the distribution of our findings. That was certainly an unexpected challenge, in that the way we presented it came off a little more combative than we anticipated. We weren't trying to call project teams out for their risk, but it was being interpreted that way, and I could absolutely understand why.

So for us, the challenge in moving to the newer version was mostly around the presentation and how we can provide as much value as possible, so that project teams see us as a resource for them rather than as someone pointing a finger at them. I would say that was the unexpected challenge, and we've tailored the presentation of the data to reflect that.

Working with stakeholders

Yeah, that comes up a lot in some of the Data Science Hangouts, too: it's often the human-centric problems and working together with stakeholders that come up in these workflows. Would you have any recommendations or tips for somebody who's doing something similar and rolling out a new project to business stakeholders?

Well, I would say it's important to understand their point of view as much as possible, to have a partner stakeholder who is as invested in the end goal as possible, and to get their opinion as quickly as possible. The biggest problem you'd probably run into is if you build something out, invest a lot of time in it, and it ends up being a lot different than they would expect, or presents a different message than they would want, and then you have to go back and iterate from the beginning. Whereas if you start out with something basic and show a generic template of what you're doing, you can get feedback as quickly as possible. And it's about being open and humble to the feedback that you get and being comfortable with any criticism or any information that they can give you.

Absolutely. Blake, do you have anything you wanted to add to that?

I think Max hit on one of our biggest challenges. Another side of that, particular to the safety model: if you think about it, we're trying to predict something that we're actively trying to prevent. Traditional metrics of success don't quite apply; we're not going to run an A/B test on a safety model and withhold from a project team the knowledge that they're at risk. So it does present challenges, in that the predictions we make hopefully all ultimately turn out to be wrong, in the sense that there are no incidents on a project. So how we actually evaluate the success of this is, as Max alluded to, less about the success of the model and more about acceptance by the broader Suffolk team of safety as a cultural pillar.

Yeah, that's a great point. That's something I struggle with sometimes when writing customer stories: trying to get to the ROI metric I can highlight in the story. And it sounds like, with it being cultural, it's a little harder to measure in this example.

Yeah, that's right. You have to buy in from the start that incidents are a risk and that safety is something we're going to be focused on regardless of whether we have a model or not. This is just a tool to help us concentrate where we put most of our effort.

Buy-versus-build decisions

I see an anonymous question that came in, which was: have your buy-versus-build decisions influenced anything else within the company? Is there greater reception to doing more in-house than relying on external tools?

I don't think I've really seen it influence things too much. My higher-ups are very in tune with what products are out on the market for our given use cases; I think Suffolk is very knowledgeable about the construction-related tools that are out there. So it more comes down to: does it meet all of our needs? And if it doesn't, is it something we can replicate with a small-to-medium amount of effort, where we think we can do a good job with it?

Learning to integrate R with Databricks

I see Talon had asked a question over in YouTube, and it was: my company is starting to expand its data ecosystem with Databricks, and I'm new to it myself. Do you guys have any recommended books or courses for learning how to integrate R with Databricks?

I feel like a lot of the work we've done is mostly just trial by fire, trying to see what we can make work. The most important thing is obviously getting access to the data. As long as you have a reasonable way to access it, you can really just utilize the Posit suite of tools independent of what's available in Databricks. So we normally use it just as a standard SQL database and do a lot of our analysis on the Posit side.

And then we use GitHub for our repositories and make sure that we save the analyses we do in Posit there. So in terms of getting up to speed with Databricks: if you plan on using Posit for your suite of tools, I would just focus on getting access to the data and then doing your analysis in Posit.

Yeah, the Connect cookbooks have good examples of using the OAuth integration with Connect. And the specific Posit libraries, like sparklyr, have some examples, I think even in their GitHub repos, of connecting to Databricks without the OAuth connection and how you can leverage it through either ODBC or the Databricks connector.
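As a concrete illustration of the "standard SQL database" approach, here is a minimal sketch of querying Databricks from R with DBI, odbc, and dbplyr. The warehouse path, catalog, and table names are hypothetical; `odbc::databricks()` picks up credentials from the standard `DATABRICKS_HOST`/`DATABRICKS_TOKEN` environment variables (or a managed OAuth integration, depending on your setup).

```r
# A minimal sketch of using Databricks as a plain SQL source from R.
# The httpPath, catalog, schema, and table below are hypothetical.
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(
  odbc::databricks(),
  httpPath = "/sql/1.0/warehouses/abc123"  # your SQL warehouse path
)

# dplyr verbs are translated to SQL and pushed down to Databricks
incidents <- tbl(con, in_catalog("main", "safety", "incidents")) |>
  filter(region == "northeast") |>
  select(project_id, observed_at, incident_type) |>
  collect()  # pull results into R only at the end

dbDisconnect(con)
```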

Model maintenance and version management

Okay, one other question, let me copy it over, is around the maintenance for managing updates. Amelia had asked, what's the maintenance like for managing your updates and version upgrades?

Yeah, so I assume this is with respect to the model. Our plan is to retrain the model on a quarterly or semi-annual basis, once we have a little more data to feed in. The nice part is that we prepared for that: if you retrain and redeploy the model workflow, it has the version of the model associated with it in Posit Connect. We've made sure that the pin boards are versioned, so you keep every version of the model. And when we actually run inference on the data, we store the version of the model associated with each prediction, so that over a period of time we can evaluate whether the model is actually performing better or worse than the previous version. So that's how we're anticipating doing it.
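Here is a hypothetical sketch of that versioning pattern: versioned pins for the model, and the model version recorded next to each prediction so successive versions can be compared later. The board, pin, and column names are invented for illustration.

```r
# Sketch: each retrain writes a new pin version; inference records which
# version produced each prediction. All names are hypothetical.
library(pins)
library(vetiver)
library(dplyr)

board <- board_connect()  # Connect pins are versioned by default

# Retraining: writing the vetiver model again creates a new version
vetiver_pin_write(board, v)  # `v` created with vetiver_model() at training

# Inference: look up the most recent version...
current_version <- pin_versions(board, "max/safety-risk-model") |>
  arrange(desc(created)) |>
  slice(1) |>
  pull(version)

# ...and store it alongside the predictions
scored <- new_data |>
  bind_cols(predict(v$model, new_data)) |>
  mutate(model_version = current_version,
         scored_at = Sys.time())
```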

I remember in the demo, you talked about a point in time where you did have to go make a bunch of changes to the model. I was wondering if you could talk a little about what the tipping point was to go and adjust it.

Yeah, we had a few other ideas for features that we wanted to add in, so it seemed like a reasonable time. Previously, we had written it in Databricks, and since then we've adopted Posit a lot more than we had at that point in time. So it felt like a natural time to move the model over and start using Connect as our model registry. Those were the primary reasons: new features, and also wanting to be a lot more consistent in how we were storing models and making sure we were doing it all in one location.

Monitoring models and model drift

Great, thank you. I see John had asked, and maybe you just touched on part of this, but: how do you monitor models, including measuring model drift? Which parts of the Posit stack are useful for this?

Yeah, right now that's certainly an area we can improve by utilizing the Posit stack a little better. We have stakeholders who are very tuned in to the performance of the model so far, so at a very granular level, we're really looking at it on a week-to-week basis. We're also trying to expand on that: at the monthly level, at the quarterly level, how are we doing from that perspective?

The one part that we did leave out was model cards, which are something the Posit team has talked a lot about and something that we definitely want to implement. My understanding is that model cards are good for that purpose: helping you analyze model drift and everything like that. That's certainly functionality we anticipate adding in the future.
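For context, here is a hypothetical sketch of what week-over-week monitoring could look like using vetiver's monitoring helpers, which compute metrics by time period, pin them, and plot them. (vetiver also ships an R Markdown model card template, which is likely what's referenced above.) The data and column names are invented.

```r
# Sketch: track model performance over time with vetiver's monitoring
# helpers. `scored_incidents` (predictions joined back to observed
# outcomes) and all column names are hypothetical.
library(vetiver)
library(pins)

weekly_metrics <- vetiver_compute_metrics(
  scored_incidents,
  date_var = scored_at,       # when each prediction was made
  period = "week",            # aggregate metrics week by week
  truth = had_incident,       # observed outcome
  estimate = predicted_class  # model prediction
)

board <- board_connect()
vetiver_pin_metrics(board, weekly_metrics, "max/safety-model-metrics")
vetiver_plot_metrics(weekly_metrics)  # visualize drift across weeks
```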

User feedback and dashboard design

Okay, somebody else just asked an anonymous question, and it was: did you conduct surveys with users of the dashboard about their expectations or use of data and useful features? What questions gave more insight, and how did it influence the work?

So I guess the one benefit is that a safety dashboard already existed; it predates Blake's and my time at Suffolk. So luckily we had people who were invested in that dashboard and gave us some insight into what they liked and what they didn't like about it. Did we conduct a real survey? I would say no, but we did connect with our stakeholders about how they interpreted that dashboard and what they hoped to achieve.

A lot of it was around removing the idea that we were ranking projects in terms of their risk and pairing them against each other, whereas this is really a project-level risk assessment, and we wanted to build something that reflected that. I think that was a very fair callout in terms of how they felt about it, and we've made an effort to reduce that feeling and give people more data they can rely on, specific to their project.

Max, do you want to touch on, in that vein, the retraining of the model, particularly through the lens of actionable insights for the safety team and feature engineering?

Sure. Yeah. One part we did add was trying to give our project teams more insight into the features that were driving their increased risk rating. We did that by translating the features into actual insights that they could use. For example, if we had low coverage of our Suffolk team relative to the number of trade partners on site, and that was a big driver, we started translating that into plain English: what it actually meant and what they should be doing on the project site, whether that's increasing the number of staff, which is a possibility but not always realistic, or trying to increase your observations across more project teams and being more consistent with that.
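As a purely hypothetical illustration of that translation step, one approach is a lookup table that maps each high-contribution feature to a plain-English recommendation. Every name and message below is invented.

```r
# Hypothetical sketch: map top model features to actionable, plain-English
# recommendations for a project team. All names and wording are invented.
library(dplyr)
library(tibble)

insight_lookup <- tribble(
  ~feature,               ~recommendation,
  "staff_coverage_ratio", paste("Suffolk staffing is low relative to trade",
                                "partners on site; add staff if feasible or",
                                "increase observation frequency."),
  "observation_cadence",  paste("Safety observations have been inconsistent;",
                                "schedule regular walkthroughs across all",
                                "trade partners.")
)

# `top_drivers` would hold each project's highest-contribution features
top_drivers |>
  inner_join(insight_lookup, by = "feature") |>
  select(project_id, feature, recommendation)
```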

Personal experience exploring Posit tools

I think there's one other question I hadn't covered yet. And it was an anonymous one on Slido that said, from a less experienced data scientist who recognizes the value of Posit workflows, what did your personal experience look like initially exploring these tools?

Yeah, my initial data science experience was more in Python than in R, and certainly since Blake came on, and a little before then, I've gotten more comfortable with R over time. My initial experience was more in the Python realm, the scikit-learn side of things. Looking at the R suite, specifically the tidymodels flow, it all had components that really complemented each other: not just building out a model, but how to go about finding the best one and all the hyperparameter tuning that goes along with that, and then, as it applies to Posit, how to actually deploy it and do that more easily. To me, it seemed a lot easier and simpler to go from a starting point, build a model, deploy it, and then have a process for inference that made it as replicable as possible without as much error, while reducing the effort later on to actually feed data into the model.
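To make that tuning step concrete, here is a minimal, hypothetical sketch of cross-validated grid search with tidymodels' tune package before finalizing and fitting the best candidate; the data and model spec are invented.

```r
# Sketch: hyperparameter tuning with tidymodels.
# `training_data` and the model choice are hypothetical.
library(tidymodels)

spec <- logistic_reg(penalty = tune(), mixture = tune(), engine = "glmnet")
wf <- workflow(had_incident ~ ., spec)

folds <- vfold_cv(training_data, v = 5, strata = had_incident)
results <- tune_grid(wf, resamples = folds, grid = 20)

# Pick the best hyperparameters and refit on the full training set
best <- select_best(results, metric = "roc_auc")
final_fit <- finalize_workflow(wf, best) |>
  fit(training_data)
```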

Handling disputes from project teams

Thank you. I think there might be just one more question we hadn't gotten to. Yumito asked: did you get any disputes from the project site team based on the model's project rating prediction?

All the time. Yeah. They know their projects so much better than we do, and I think it would be faulty of us to say that we know more about their project than they do. That said, we also look at the project from a different perspective than they do. They're very focused on what's happening on a day-to-day basis, and they are aware of their project risks. So in their mind, they're sometimes already implementing strategies to mitigate a risk, whereas we're merely calling out the fact that the risk exists. So sometimes there's a mistranslation between what we're trying to show and what they think we're trying to show.

But yeah, to answer the question: yes. We're trying to be as understanding of that as possible, to take that feedback and hopefully continue to evolve this and make it as useful as possible for them.

Just one other comment on that, which touches on the previous question. I think that feedback helped drive some of the changes Max made. In the first version, it was all projects measured objectively against one another, whereas the real value is a project measured against itself. Relative risk compared to how the project had been previously is really the value we're trying to drive. Some projects are just inherently high risk, so they'll always rank higher than other projects. Really understanding how a project relates to itself over time is the value we think we can add with the model.

posit::conf workshop and closing

Thank you. On the question somebody asked about learning a bit more about Databricks: I know there's a workshop coming up at posit::conf. Ryan, would you want to share a little more about that?

Yeah, absolutely. In September this year, we're going to have our conference in Atlanta, and the first day of the conference is actually a series of workshops. It was added a little later than most of the workshops, but our colleague James Blair, one of our product managers for all of our cloud offerings here at Posit, just announced a new workshop he's offering. It's called Modern Data Platforms with Posit for R and Python Users. It helps educate data practitioners, R and Python users, who are using not just Databricks but things like Snowflake and DuckDB, and who are hoping to improve their workflows with these cloud-based tools. So if you're looking to get more hands-on practice and education using these tools and their integrations with our open source and professional tools, definitely check out that workshop on the first day of our conference.

Awesome. Thank you. Well, I want to say thank you so much, Max, for this amazing demo. And thank you to Blake for jumping on here as well. I've really enjoyed getting to learn more about the work that you do at Suffolk this past month.

Maybe one ending question for both of you, Blake and Max. I'm curious, what are you most excited to work on in the next year ahead?

Interesting. I think there are a lot of parts of construction that we haven't touched on a lot yet as a team. I think understanding a little bit more about the project schedule and maybe ways that we can improve our impact there, that certainly seems like some meat on the bone for us. So I would say around the project schedule and how maybe we can do a better job of impacting that process.

I would echo what Max said. I touched a little on this in the Data Science Hangout: we've started to dip our toes into how to leverage LLMs to come up with other ways to interact with data, moving away from static dashboards toward more dynamic interactions and interpretations of data. And again, not to plug too much Posit stuff, but the ellmer and chatlas packages have been really great for us to start dipping our toes in that area. So I think there's a lot of opportunity there over the coming year.
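For the curious, getting started with ellmer takes only a few lines. Here is a minimal sketch using one of its chat constructors; the system prompt and question are placeholders, ellmer supports several providers (including a Databricks one), and chatlas offers a similar interface for Python.

```r
# Minimal sketch of chatting with an LLM from R via ellmer.
# The system prompt and question are placeholders.
library(ellmer)

chat <- chat_openai(
  system_prompt = "You are an assistant for construction analytics."
)
chat$chat("Suggest three ways to summarize weekly safety observations.")
```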

Thanks for the plug for ellmer and chatlas. Check those out, too.

Well, thank you all so much for taking the time to join us today. I do want to remind everybody the recordings of the demo and the Q&A will be made available immediately after. But if you ever have suggestions for different workflows you'd like to see, let us know in the comments. We'd love to hear your feedback. Have a great rest of the day, everybody.