Joe Cheng @ Posit | You have to be able to reason about it | Data Science Hangout

We were recently joined by Joe Cheng, CTO at Posit PBC, to chat about all things Shiny (a web framework for data scientists), career journeys, being vulnerable, and so much more.

At 39:53, Joe shared what it means when he says, "you have to be able to reason about it."

When you write software or do an analysis or whatever, there's a level of “I got it to work” and then there's the level of “it works, and I can reason about it.” Complex pieces of software are among the most complicated things that humankind has ever devised. What other human-made constructs can have hundreds of millions of pieces and yet they're expected to all fit and work together so precisely that, if one token is off, rockets explode? I don't know what the limit is, but even the smartest humans can only hold some small number of variables and operations in their head at any one time. When we work on software that's non-trivial, we work on software that in its totality is more than any one human mind can hold at any one given time.

The main challenge in software engineering is about, how do you take all this complexity and break it down into smaller pieces, each of which you can reason about, each of which you can hold in your head, each of which you can look at and say, “Yeah, I can fully ingest this entire function definition. I can read it, line by line, and prove to myself, this is definitely correct if the functions that it's calling don't have bugs and if it's called in the right way.” So with those caveats, if the things that this is calling are correct and are called correctly, then the result will be correct because the logic here is correct. Software engineering, at all but the most beginner level, is a lot about this. How do you break up inherently complicated things that we're trying to do into small pieces that are individually easy to reason about? That's half the battle right there.

The other half of the battle is, how do we combine them in ways that can be reliable and also easy to reason about? So it's these two pieces, small pieces reliably composed. If you can achieve that, that's what I'm talking about. That's software that you can reason about.

This has implications for data science as well. With data science, you're doing some kind of analysis on some data, and it starts out as, oh, I'm just doing these simple things. I'm doing this manipulation and then I'm doing this visualization. But then as you get deeper and deeper into it, it grows and grows and grows. You're at the point where you're at the end and you don't remember where these variables came from. You don't remember what's the difference between this data frame and this data frame. And you go back and hopefully start breaking it into functions or somehow dividing it into smaller pieces that each focus on a thing. Then you join those pieces together in your overall script. That's this principle of small pieces individually able to be reasoned about.

When you think about other rules that you might have heard about software engineering, we know when you're writing functions, using global variables is bad. That's another one of these things where that hurts your ability to hold the entire function in your head and to prove that it'll work correctly. Because who knows who is setting that global variable to what value? You can't prove to yourself this function is definitely correct.

I think if there's anything that you're working on that needs to be correct and you do care that the answer is right, I try never to stop when I have an answer but I can't reason about the code. I always try to go back and do the refactoring that's necessary just so I can prove to myself that the answer is right. The other big benefit to this is that those individual pieces, if it does turn out that there's a mistake somewhere, you can individually debug, test, unit test those individual pieces. When there are problems, you'll much more easily be able to find them.

Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software
Twitter: https://twitter.com/posit_pbc

To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)

Come hang out with us! The Data Science Hangout is a gathering place for the whole data science community, including current and future data science leaders, to chat about data science leadership and the questions you're all facing. It happens every Thursday at 12 ET.

Aug 4, 2023
59 min

Transcript

This transcript was generated automatically and may contain errors.

So happy Thursday, everybody. Welcome to the Data Science Hangout. Hope everyone's having a great week. If we haven't met before, I'm Rachel Dempsey. I lead our pro community at Posit. If this is your first time ever joining us at a Data Science Hangout, welcome. This is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across many different industries and companies.

And so we're here every Thursday, except for July, at the same time, same place. So if you're watching this recording on YouTube later, you can also add these events to your own calendar using the link below. But together, we're all dedicated to making this a welcoming environment for everyone. So we'd love to hear from everybody, no matter your level of experience, the area of work, or industry. It is also totally okay to just listen in and just hang out here with us.

There's also three ways you can jump in and either ask questions or provide your own perspective on certain topics too. So you can always jump in by raising your hand on Zoom and I'll keep an eye out. You can put questions in the Zoom chat. And feel free to just put a little star next to it if you want me to read it out loud instead, if you're maybe in a busy coffee shop or something. And then third, we also have a Slido link where you can ask questions anonymously.

But I am so excited to have my colleague, Joe Cheng, joining us here today as our featured leader. Joe is the CTO and first employee at Posit (well, back then, RStudio), where he helped create the RStudio IDE and Shiny web framework along with countless complementary tools and packages. And Joe, I'd love to get things kicked off here by having you also introduce yourself and just kind of share a little bit about your role and what it means to be CTO at Posit.

Yeah, thanks for having me. CTO is actually an honorary title. There are a lot of different kinds of CTOs in the industry. And one of them is basically an engineer that joined very early and it's an honorary title. And that's certainly what it started out as for me. It started out as purely an acknowledgement that there are strategic conversations that the board and the executive team would like me to be part of. But my day-to-day was still 100% software engineer and then team lead.

And these days, I think probably 90 plus percent of my time is as a team lead on a Shiny team. It's a team of about 11. And every once in a while I will do more CTO stuff, which is having conversations that are more sort of at the strategic level with the other execs to talk about the direction that we're taking the company in. So for my day-to-day, I try to do, you know, 50% coding and 50% leading the team. That usually is more like 15% coding and 85% leading the team. There's always emergencies and, you know, last-minute conversations that come up.

So I do a lot of code reviews. I do a ton of sort of helping to decide feature directions. I do some talking with customers. I do some support. Just really anything that comes up. And we try to run our team that way for the most part. Most of our roles are relatively loose and flexible. So we really value having everyone on the team, or certainly all the software engineers, to be really thoughtful about what kind of features would be interesting to add to Shiny, to be able to design their own features to implement and to be responsible for the quality and support of their own features.

So yes, for me, it's Shiny all the time. Shiny day and night. Shiny for R. Shiny for Python. Packages that complement Shiny. HTML widgets. For the last month, I've been working on sort of the equivalent of the DT package for R, which is a way to view data frames and data tables very quickly with a rich HTML interface. I've been working on that for Shiny for Python, where we have not had that capability in the past.

But yeah, for sure, most of my time is in meetings and helping other team members achieve what they need to.

Thanks, Joe. And I forgot to ask you this, but I usually ask everybody, what's something that you like to do outside of work too?

For me, that's cycling. I don't do it nearly as much as I'd like to, but that's my favorite way to blow off some steam after a long day.

What is Shiny?

Thank you. Okay, so while we're waiting for some questions to come in from everybody here, thank you so much for the intro. And I'm wondering, if somebody had never heard about Shiny before, how would you describe it?

Yeah. So to me, Shiny is fundamentally about letting data scientists create interactive web applications to take their analysis that they're already doing on their own machine and taking the insights that they are gaining from their data on their own machine and being able to expose that to usually other people, sometimes to themselves, but usually to expose that to other people in a really dynamic and interactive way.

In the past before Shiny, I think you would see, I mean, it's still very common to issue a static report and send it to someone. And if there were lots of variables in your analysis, if there were lots of different ways you might want to slice and dice the data, you would either just add more and more pages to that PDF or HTML page. And we had a lot of people tell us that they would exchange PDFs with thousands of pages and good luck trying to find the page that you want. Or you would do this back and forth of like, okay, here's my best guess at what you want. And then you'd have this endless email back and forth of, how about this? How about this? How about this? Until they find the needle in the haystack that they're looking for.

So that's just one example of when it might be useful to take insights that you can easily gain for yourself on your own console and your own notebook, and to be able to turn that into something that empowers other people in your organization or that you're collaborating with, or just that you think might find your data interesting, to be able to interact with that data in a more dynamic and modern way.

That's how it started, that and teaching statistics. That was the other big thing that we were sort of very focused on in the beginning. And boy, it's been interesting what people have taken Shiny and used it for. So in terms of what it's for, everything from BI to machine learning and large language models and games and fantasy football draft analyzers, all sorts of interesting things people have done with it.

But one of the most important things for us is that it is accessible for anyone who calls themselves a data scientist. So you do need to code. That is very important to us. That's one of the core tenets of our company, is that we really believe in code first. But you don't have to be able to code in web development technologies. You don't need to be a web developer to create Shiny apps. You just need to be conversant in the language of data science.

Shiny vs. BI tools

Thank you. And I see an anonymous question that came in a little bit ago, which was, what's the elevator pitch? And then maybe further discussion points for investing in Shiny rather than using point-and-click BI tools.

Yeah, absolutely. That's a good question. And it's interesting. I had one thing in mind when I created Shiny, and then that has sort of evolved as I've gotten feedback from people who have spent a lot of time using both BI tools and Shiny.

My elevator pitch in the beginning was that BI tools are great, and you should use Shiny when those run out of steam. In the beginning, that was really about advanced analytics. So if you want to just draw some scatter plots and line charts, I mean, BI tools can do a lot. But if what you're trying to do is completely within the wheelhouse of BI tools, great. Like, no problem. Those are awesome tools. And if you find yourself wanting to use a predictive model that they don't support, and there are a lot that they don't support, or you want to pull from some data source that is more dynamic than a BI tool is designed for, then use Shiny.

And I actually had a number of people tell me that it was actually not like that for them. That, in fact, numerous times I've heard people have to deploy to BI tools within their organization, or if they're consultants and they're at a client, that the only deployment target they can feasibly use is, say, Tableau. They would actually build it in Shiny first and then port it to Tableau. And I was like, you're going to have to explain to me why you're doing that. That is very surprising.

And what they said to me was that a number of the advantages of just using code in the first place were just such big advantages that they didn't want to give them up when they were using Tableau. So first of all, it was very common, they would say, for them to work really hard and come up with this very sophisticated Tableau application, or Tableau dashboard. And then the stakeholder would say, oh, that looks great. Please just change this or that thing. And this or that thing, because of the way these point and click tools work, you would have to start the whole thing over. Whereas for a Shiny app, you could just replace a single line to use a different data source.

Or sometimes they would be like, that's great, and we want the same dashboard now for these 10 other departments, these 10 other scenarios. And again, you'd be in this position where the reuse was really a lot harder than when you're dealing with code. So that was one. The second thing people told me was that the extensibility was really frustrating for them. To build a BI dashboard that serves you well for a while, but then to know that there is always the possibility of some new requirement arriving that they would not be able to satisfy, whereas Shiny having a ceiling that's essentially, by comparison, infinite, was very comforting to them.

And the third piece of feedback I received was that if you know how to code, doing a Shiny dashboard in code is just more fun than the point and click. Especially when you know what you want to do and you know how you would code it, it's frustrating to try to figure out how the point and click interface wants you to express that.

And then finally, just being pragmatic, people told me that it's like a better resume builder to have built a bunch of Shiny apps than to have built a bunch of Tableau point and click dashboards, and that it led them to better career opportunities. I'm just being real with you guys. That's not something like we put on the website, but I've heard that too many times to think that that's not a real motivation for some people.

Shiny for Python

Ethan, I see you had asked a question in the Zoom chat. Do you want to jump in here?

Yeah, sure. Hi, Joe. I actually haven't tried Shiny in Python yet, but just want to get some high-level thoughts from you on developing Shiny for Python. Is there anything that you've learned from developing it for R first? And in terms of feature parity, is it an asset or a liability for the engineering team when developing Shiny for Python?

Yeah, very good questions. I'd say one of the biggest things we learned, I'll say there's two main things that come to mind when thinking about the process of developing Shiny for Python. One, well, I'd say this isn't something I learned, but definitely some other people on the team learned. Jumping to Python was not as scary as it sounded. I think for a lot of people that are on my team, R and JavaScript, but mostly R, was the language that they learned to program in and have invested many, many, many years of their lives and careers in. And there was a lot of assumption in the beginning that we are going to have to hire a bunch of Python experts.

I don't want to oversimplify because I think it is not true that you can just pick up a book and learn Python really well in just a few days. You really need to spend time with a language to learn beyond just the syntax, learn the true Pythonic way to do things, and that takes some time. But when you are working full-time on a project with a bunch of team members that are going through the same thing, it actually did not take very long for our devs to be quite productive in Python. So that was one thing, is that on the technical side, it was not really a very difficult transition for the team to become bilingual.

And I think the second thing is that in Python, this might not be that interesting for data science, but for software engineering, Python in recent versions has gotten better and better at static type-checking, so the ability to mark up your variables and function arguments with not only the name of a parameter, but the type that you expect it to be, and then having tools that will analyze your code and make sure that every function is being called in an appropriate way. Those tools, I think, are mostly seen these days as a liability in data science, that they slow you down because you have to write this extra code. But when you are building something like Shiny, it's a huge help.

And that's one of the things that I think coming away from shipping the first versions of Shiny for Python, it's like, it would have been really nice if we could have these same tools on the R side to help us maintain Shiny for R.
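As a small illustration of the kind of type annotation described above, here is a minimal sketch in Python. The function `scale_values` and its parameters are hypothetical and are not part of Shiny; the point is only that a static checker such as mypy or pyright can flag a bad call before the code runs.

```python
from typing import List, Sequence


def scale_values(values: Sequence[float], factor: float) -> List[float]:
    """Multiply every value by factor (hypothetical helper, not a Shiny API)."""
    return [v * factor for v in values]


# A correct call: the argument types match the annotations.
scaled = scale_values([1.0, 2.5, 4.0], 2.0)
print(scaled)

# A static type checker would reject the call below without running it,
# because a str is passed where a float is expected:
# scale_values([1.0, 2.5], "2")
```

The annotations are the "extra code" mentioned above: they cost a little typing up front, and in exchange every call site can be checked mechanically.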

Yeah, yeah. So do you mean feature parity between R and Python in general, or trying to achieve feature parity between Shiny for R and Shiny for Python?

Shiny in specific.

Oh, okay, yeah. So I think it is definitely a double-edged sword. It's super helpful to have built all this once before, because so many problems we not only have a solution for, but we thought about multiple solutions before we decided on one. And we can go back and reanalyze very quickly without having to invent too much stuff from whole cloth.

But the tough thing, I think, is that sometimes we try not to, but we feel a little bit stuck with the decisions that we've made in the past. So the biggest one that we've been discussing is Shiny modules. That is a feature of Shiny for R where, if your app is getting too complex, if there's too much stuff going on in one server function or one UI, you can break it up into modules and those modules can be reused. They're namespaced. It really helps you reason about your app as a collection of smaller pieces. In Python, we have that feature as well. It's also called Shiny modules. Well, Python modules are already a thing in Python, and for us to move away from the name "modules" for Python would mean that now we have two names: there are Shiny modules for R, and then there are components for Shiny for Python. And that is super annoying for all of us.

Career reflections

Libby, I see you had a question in the chat. Do you want to jump in here? Yeah, my question is if you could go back in time to like 15 years ago, 20 years ago and tell yourself what you're doing right now, like what your career looks like, what your life looks like, what do you think would surprise your younger self the most?

Everything. Everything. I've been unbelievably fortunate in not only my career, but in my family life, in my relationship with my extended family. I mean, my life has, yeah, it's been an incredible run. And I think there were a set of things that I really wanted to accomplish by the end of my career. And I don't know, I sort of find myself in my mid forties, like needing to cast a new vision because all the things that I wanted to do, I have just had the great privilege and been so fortunate as to have those things sort of happen to me in my career.

And I think in particular, 15 years ago in my career, I felt like I had stumbled into this incredible industry, just talking about tech in general, that back then had a very different feel, like it felt like we were, it was not quite as mainstream. And I felt like I had stumbled into this, I mean, so fun, like such a fun way to make a living, to be writing programs all day with smart and interesting people. And to feel like I had like all my natural talents aligned with the skills and interests you need to be in software.

And the thing that scared me 15 years ago was, what if all of my career contributions through luck or happenstance add up to not very much? That was my fear, was that I have the education for this, I have the natural talent for this, I have the opportunity, but there's so much luck involved, will I get the opportunity to actually do something that is interesting and meaningful? And when I was working on RStudio, I was already feeling like, oh, wow, this could really be it. And then Shiny felt like, okay, this is overdoing it a little bit.

But I mean, it was like, I never had, I never pictured myself coming up with like a truly, I mean, I was going to say truly original idea. I mean, the idea is like 99% building on the shoulders of reactive work that was done in JavaScript. But even like the insight to apply it in this new way on a server side to data science, I never aspired to have an idea like that. I just aspired for somebody else to have an idea and I would be able to help and for it to be like really impactful.

So in our little corner of the world that is data science, Shiny has had a bigger impact than I was ever hoping for out of my career. So I think that would be the most comforting thing for me 15 years ago to find out it's not all going to add up to nothing. Like there is going to be some good that comes out of the stuff that you do that these ideas are going to impact people's work and impact the design of other packages that have been created both in R and in Python. So I'm just super, super grateful for that because that was like it was really something that I was very, very anxious about that it would all add up to nothing. So I just got a little bit real.

On being vulnerable on stage

Thank you, Joe. And thank you for the question, Libby. I see Lisa said, please answer this without making us tear up again like you did at conf. And it was just reminding me, like, I love listening to you present, Joe, and how you tell stories and how you can be vulnerable and put emotion into your presentations. And I was just wondering, how do you learn that? How did you learn how to get up on stage and make us feel all those emotions that we all did at the RStudio conference?

You know, it's funny, I can tell you exactly how it happened. I mean, I'd given a lot of presentations before that one, but that was the first one that I was like, what if I made this like 90% about the technology, but 10%, you know, share something that cost me a little bit.

And it was actually because I attended a local church. And there was this pastor who got up, he and his wife, and they, in front of all these people and being live streamed, talked about how he had cheated on her in the first year of their marriage. And I mean, it had been, I don't know, 20, 30 years, but still, that the two of them would get up and talk about, like, the most painful thing, the most painful betrayal that he ever inflicted on her, that they would both get up and talk about it. I was like, holy crap, like, the courage that it takes to do that. And I just got so much value out of that, because who shares that kind of stuff, you know, live in person.

So I'm not trying to compare me saying, oh, I was feeling a little bit bad about my career to that. But I was thinking, like, this energy is so, like, it was so meaningful, you know, like, it was such a beautiful moment. And there was just no, like, pure head insight that could compare. And it, like, made me feel close to them that they were sharing something that was so close to their, you know, I don't know, so vulnerable.

So yeah, I brought it up with my speaking coach. She was a little bit like, wait, why are you gonna do this? And I was like, I don't know, like, I just really love this community. And I feel like, you know, whenever we like the best parts of this community are not, it's not about the technical exchange. It's when we do have those opportunities to get real with each other. Like when people talk about imposter syndrome, from the stage, it, those are some of the most powerful moments. And I felt like, like, what if I did that from a keynote, because that story is true, you know, like that, that, that was not something I had to conjure up. I think about that moment all the time.

So yeah, I thought, oh, and actually, I have to give credit to Jesse Mostipak. I told that story to her. And she was like, get out of town. Like, as soon as you said, like, that was supposed to be my last day at RStudio, I was like, tell me the rest of this story. So I have to give her, I think, most of the credit for recognizing that that story could be one of those moments.

Debugging Shiny apps

Dan, I see you put a question there about how you're starting to develop in Shiny coming from Python. Do you want to jump in here with that?

Yeah, sure. Sure, Rachel. Hi, Joe. I am, you know, more of a Python developer, although I started years and years ago, many blue moons ago, with R and really fell in love with it, and I'm kind of coming back to it a little bit and trying to develop with Shiny. One frustration I have is just around debugging. And when you've got, you know, all these panes and windows and UI widgets, can you share your thoughts around or best practices, tips, whatever, on the best way to kind of go through the debugging process?

Yeah, absolutely. And can you just clarify, when you say debugging, you mean in a more general sense, not just using an interactive debugger?

Yeah, correct. Like, I do everything in RStudio at the moment. And I run the app, all of a sudden, it just opens up and then closes. Or I run something, it spews an error. And, you know, it's not always obvious where the error actually is. It could be in the output side, it could be on the server side. And just maybe the pattern for debugging in R Shiny is just so different from what I'm used to.

That could be it too. Yeah, it absolutely is. Well, especially if your work in Python, was it around, you know, like notebooks and scripts?

Yeah, you got it. Yeah. Jupyter notebooks, scripts, that kind of thing.

Yeah, absolutely. So I think for those of you who did not use Shiny, especially, when you are doing sort of normal analysis with a notebook or a script on your desktop, your code generally executes from beginning to end, right? Like you might have some functions, but those functions are called when you call them. And you have loops, but they loop when you loop. It's very transparent, like where the interpreter is at any given moment.

With Shiny, and almost every other form of interactive framework, you provide code to the framework, and the framework decides when it executes. So it makes it much more, much less intuitive to debug, because like, okay, this code is executing. Why is it executing? Like, how did it get here? It's much less transparent when code is executing, why it's executing. So there's a couple things that I'll tell you from a debugging perspective.
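The difference between script-style and framework-style execution can be sketched with a toy example. The tiny "framework" below (`on_change`, `set_input`) is hypothetical and far simpler than Shiny's reactive system; it only illustrates that registered code runs when the framework decides, not when the interpreter reaches the line where it was written.

```python
from typing import Callable, Dict, List

# Toy "framework" (hypothetical, not Shiny): it stores the functions you
# hand it and decides later when to call them.
_handlers: Dict[str, List[Callable[[int], None]]] = {}


def on_change(input_name: str, handler: Callable[[int], None]) -> None:
    """Register handler to run whenever the named input changes."""
    _handlers.setdefault(input_name, []).append(handler)


def set_input(input_name: str, value: int) -> None:
    """Simulate a user changing an input; the framework invokes the handlers."""
    for handler in _handlers.get(input_name, []):
        handler(value)


log: List[str] = []

# Script style: this line runs exactly when the interpreter reaches it.
log.append("script: start")

# Framework style: we only register code here; nothing runs yet.
on_change("slider", lambda value: log.append(f"handler: slider={value}"))
log.append("script: end")

# Later, the framework calls our handler in response to an event.
set_input("slider", 7)
print(log)
```

The handler's entry lands in the log after "script: end", even though the handler was written before that line. That gap between where code is written and when it runs is what makes framework code harder to follow.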

Number one, I think that Hadley's book, Mastering Shiny, has a chapter on this. And we definitely have at least a couple of video resources where we've given talks about this. So I'd highly recommend those. But I have a couple of sort of principles that I always go back to as well.

Number one, and maybe this less applies to you if you're coming from Python, but stack traces are very important. So when you... Often when you get an error in Shiny, you get a big bunch of lines spit out at the console that say, you know, from this file, this function name, from this file, this function name, this file, this function name. And they often are like multiple lines. And often people find that scary, because a lot of the function names that are in there are function names that they might not recognize, because it's not code that they wrote that's executing. It's code that's inside of a package that's inside of a package that's, you know, calling base R, you know.

So it's very important to not be afraid of that stack trace. It is very important to take the time to learn how to read those and understand exactly what's going on. And what the stack trace is telling you is, at the moment that this error occurred, what were the functions, in order, that had been called? So if you think about, you know, at any given moment when code is executing for R, that code is probably in a function. And that function was called by another function. And that function was called by another function. And so on and so forth. And it could be 20 levels deep. That is called your call stack. And when an error occurs, one of the most useful things to know is what was the call stack. And when that's printed to the console, now it's called a stack trace.

So you can take that stack trace and you can read up it and just ignore the lines that you don't recognize and find the line that you do recognize that says this was the line of code in your app.R file. It was line 49. That was the thing that was executing when this error happened. And I think that, you know, for the easy cases where, let's say, I don't know, you use the wrong function name or you indexed into a data frame using an invalid variable or something like that, just understanding the stack trace, that will be enough for you to fix the problem.
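To make the idea of a call stack concrete, here is a minimal sketch in Python rather than R, using the standard `traceback` module. The functions `load_column` and `summarize` and the sample table are hypothetical; the sketch captures an error and lists the functions that were on the call stack when it happened.

```python
import traceback
from typing import List


def load_column(table: dict, name: str) -> List[float]:
    # Raises KeyError when the column name is not in the table.
    return table[name]


def summarize(table: dict) -> float:
    values = load_column(table, "price")  # wrong name: the table has "cost"
    return sum(values) / len(values)


def call_stack_at_error(exc: BaseException) -> List[str]:
    """Return function names from the outermost call to the one that failed."""
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]


try:
    summarize({"cost": [1.0, 2.0, 3.0]})
except KeyError as err:
    stack = call_stack_at_error(err)
    # The last entry is where the error was raised; earlier entries are callers.
    print(" -> ".join(stack))
```

Reading the list from the end backwards is the same exercise as reading up a stack trace: the last name is where the error was raised, and the first name you recognize as your own code is usually where to start looking.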

So that's one. The second thing you can do that's super useful is, especially in Shiny, never forget that there is a JavaScript console also that might be printing errors. And this is, like, I feel really bad about this. We should do things in Shiny to surface this more prominently. But when you're in your browser looking at your Shiny app, if things are not behaving the way you expect and you don't see any errors in the R console, you can show the JavaScript console in your browser. And if you see a bunch of errors there, that's a big clue for you as well.

Often that means there's some kind of bug in a component that you're using, whether it's in Shiny or some kind of third-party component. But, you know, if you're raising a GitHub issue or whatever, that's some of the most useful information you can provide.

The third, and I'll move on after this, is that Shiny does demand that you write your code in a certain way. We have reactive outputs and reactive expressions, and you have to put your code in a certain place to get it to execute at the right time. That doesn't mean that your sort of data science logic, your data analysis, your data manipulation, and your visualization, has to live there. You can write functions that live off to the side. They can even live in a separate file. They can even live in a package if you want.

So you can write functions that are not Shiny. They're just R functions that perform the tasks that you want, and you call them from Shiny. You call them from your Shiny outputs. The advantage of doing it this way is that if something goes wrong, you can test these pieces in isolation from the console. So you can make sure that each of these functions that you create, that do your data manipulation, that do your data visualization, you can ensure that they are working correctly. And then your Shiny app collapses to just like a very small number of lines of code, and then it's much easier to reason about when things are being called and why.
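A minimal sketch of this separation, written in Python for illustration and deliberately not using any Shiny API. The function names and the row layout (dictionaries with `species` and `weight` keys) are assumptions made up for this example. The data logic lives in plain functions that can be called from the console, and the part an app output would call shrinks to a single line.

```python
from statistics import mean


def filter_by_species(rows, species):
    # Pure data manipulation: no UI or reactive code involved.
    return [row for row in rows if row["species"] == species]


def mean_weight(rows):
    # Returns None for an empty selection instead of raising an error.
    if not rows:
        return None
    return mean(row["weight"] for row in rows)


def summary_text(rows, species):
    # The only function an app output would need to call.
    avg = mean_weight(filter_by_species(rows, species))
    if avg is None:
        return f"No rows for {species}"
    return f"{species}: mean weight {avg:.1f}"


if __name__ == "__main__":
    data = [
        {"species": "a", "weight": 2.0},
        {"species": "a", "weight": 4.0},
        {"species": "b", "weight": 10.0},
    ]
    print(summary_text(data, "a"))
```

Each function can be checked on its own with a few small inputs before it is ever wired into an app.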

You have to be able to reason about it

Yeah, the browser and breakpoints are super helpful when you need them, but if you, like on my team, I'm constantly using the phrase, can you reason about it? I mean, they must be so sick of me saying that, but like when you write code, it is not enough that it works. If it's important for it to be right, it's not enough that it works. You have to be able to reason about it, and I find like browser is the most helpful when you have lost the ability to reason about your code. So definitely use it, but also think to yourself, like if this was really the only way for me to figure out what was going on, maybe it's time to refactor my code a little bit also.

To go back to the question right before this, there was a follow-up question about, like, could you say a little bit more about what you mean by can you reason about it?

Yeah. It's my favorite topic. Yeah. So I think just to sort of reiterate, when you write software or do an analysis or whatever, there's a level of I got it to work. And then there's the level of, like, it works and I can reason about it. And what that means is, like, software, complex pieces of software are among the most complicated things that humankind has ever devised. You know, like, what other human-made construct can have, you know, hundreds of millions of pieces and yet they're expected to all fit and work together so precisely that if one token is off, you know, rockets explode and, you know, you get the wrong answers.

And I don't know what the limit is, but, like, you know, even the smartest humans, there is some small number of variables you can hold in your head. There's a small number of operations that you can hold in your head at any one time. So when we work on software that's non-trivial, when we work on software that in its totality is more than the human mind, more than any one human mind can hold at any one given time, the main challenge in software engineering is about how do you take all this complexity and break it down into smaller pieces, each of which you can reason about, each of which you can hold in your head, each of which you can look at and be like, yeah, I can fully ingest this entire function definition.

I can read it, you know, line by line and prove to myself this is definitely correct if the functions that it's calling don't have bugs and if it's called in the right way. So with those caveats, like, if the things that this thing is calling are correct and I'm called correctly, then the result will be correct because the logic here is correct.

So software engineering at all but the most beginner level is a lot about this. How do you break up inherently complicated things that we're trying to do into small pieces that are individually easy to reason about? And that's half the battle right there. The other half of the battle is how do we combine them in ways that can be reliable and also easy to reason about? So it's these two pieces, small pieces reliably composed. If you can achieve that, that's what I'm talking about. Like, that's software that you can reason about.

So this has implications for data science as well, right? Like, when you're doing data science, you're doing some kind of analysis on some data and it starts out as, oh, I'm just doing these simple things. I'm, like, doing this manipulation and then I'm doing this visualization. But then as you get deeper and deeper into it, it grows and grows and grows and grows. And you're at the point where you're at the end and you don't remember where these variables came from. You don't remember, like, what's the difference between this data frame and this data frame. And you go back and hopefully start breaking it into functions, or somehow dividing it into smaller pieces that each focus on a thing, and then you join those pieces together in your overall script. So that's this principle of small pieces individually able to be reasoned about.

And when you think about, like, other rules that you might have heard about software engineering, like, we know when you're writing functions, using global variables is bad. That's another one of these things where that hurts your ability to hold the entire function in your head and to prove that it'll work correctly because who knows who's setting that global variable to what value. So you can't prove to yourself, like, this function is definitely correct.
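To make the point about global variables concrete, here is a small Python sketch. The names `THRESHOLD`, `count_large_global`, and `count_large` are hypothetical. The first function can only be judged correct if you know what every other piece of code does to `THRESHOLD`; the second can be read and checked entirely on its own.

```python
THRESHOLD = 10  # module-level state that any other code could reassign


def count_large_global(values):
    # The result depends on whatever THRESHOLD happens to be at call time,
    # so the function cannot be verified just by reading its body.
    return sum(1 for v in values if v > THRESHOLD)


def count_large(values, threshold):
    # Everything the result depends on is in the argument list.
    return sum(1 for v in values if v > threshold)


if __name__ == "__main__":
    print(count_large([5, 15, 25], 10))
    print(count_large([5, 15, 25], 20))
```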

So I think I, like, if there's anything that you're working on that needs to be correct, like, if you're just, like, zipping off some analysis and, like, it really doesn't matter, you just want a pretty picture or something like that, whatever. But if you do care that the answer is right, I try never to stop when I have an answer, but I can't reason about the code. I always try to go back and do the refactoring that's necessary just so I can prove to myself that the answer is right. Oh, and the other big benefit to this is that those individual pieces, if it does turn out that there's a mistake somewhere, you can individually debug, test, unit test those individual pieces. And when there are problems, you'll much more easily be able to find them.

Golem and Rhino frameworks

Sam, I see you had a question in the chat earlier. Do you want to jump in here?

Yeah, sure. Okay, so I just had a question regarding the various development frameworks, Golem and Rhino, for production-grade development, and what your thoughts are. Has Posit ever considered releasing its own framework, even before these were released?

Yeah, the question is, if you guys are not familiar, there are a couple of packages that have been created to sort of provide more opinionated structure around how Shiny apps are written. So Golem and Rhino are probably the two, by far the two most popular. I have to say, I'm a little embarrassed, like this is a very important topic and I know people have like really strong opinions about this and in particular, I think Golem has huge proponents among people that I know and respect. I personally have not spent that much time with these frameworks.

I think it's just the tradition that I come from, my background in software engineering. And, sorry, this is not the only reason people use these frameworks, but one thing that I've noticed in a lot of parts of data science, especially if you didn't set out to become a programmer but accidentally became one, is that a lot of people feel like if they're doing what it says in the tutorial, if they're doing what it says in an article that Hadley wrote, then they're doing it right. And if they're doing something that they sort of invented or a pattern that they just came up with, then like that's a hack or that's a workaround or whatever.

So I've had a lot of people ask me things, you know, like on the Shiny issues or Shiny forums and say like, hey, how do I solve this problem? Or it would be great if you had this feature. I solved it this way, but it would be nice to have it for real. And it's like, the way you did it is perfect. Like what about this solution does not satisfy you? And they're like, yeah, well, but I did it. So I would really like a grownup to do it, you know?

And I think in the beginning, it felt like a lot of people were looking for a lot more guidance for how to structure their Shiny apps. And Golem provided a set of opinions that people could look to, you know, Vincent and Colin and say, those people are clearly experts. If I use Golem, then I'm like, I'm like allowed to do that. Whereas I had much more of an attitude of like, it's R, so you should feel free to package these things however you want. And you don't need anybody to tell you that that's, you know, okay or not. I do totally reserve the right to judge you, but you should do it anyway.

So I think over the years, though, more and more people have told me that it is not just about that. It's not just a set of opinions that are sort of nice to have. They actually do try to provide a lower friction environment for you to do development, and they have been very thoughtful about how you run and reload your app. And I totally believe that it has value. And I just, I have to admit, I have underinvested in spending time with those.

Entering the Python community

Alan, I see you had a question in the chat earlier. Do you want to jump in here?

Yeah, sure. Thanks. Hi, Joe. I'm really curious about, you mentioned earlier that you and the team had to think a lot about, and you were sort of nervous about, like, how do we learn Python? How do we learn what we need to know here to do Shiny for Python? I'm really curious, now that you're trying to gain momentum with it, trying to establish it, et cetera. Like, have you had to come up with a different way to approach the Python community than maybe you would have taken for granted in the R community? Like, we know that community has changed so much over 15, 20 years. Like, what's it like to come into the Python community now? And how did that feel? What was successful? Was it hard? Curious how you come into that space.

It is 1,000% different. It is completely and utterly different. And, I mean, it's hard to enumerate how many differences that there are because there are so many. But the biggest by far is that, I mean, Python is just such a noisy space. Well, that's a really uncharitable word. It's a very vibrant and active space. I think in the early days of RStudio, the company, there was a lot of interest and activity around R. But it was not like Python is today. I mean, Python, just, you can't think of any niche and not find 15 packages that do that thing. And there are two more venture-backed ones coming out next week. I mean, it's just, the level of activity is on such a different level.

And it's quite, it was quite a wake-up call to come into that. You know, going to conferences and things like that and realizing, like, okay, not only are we not a well-known entity in the Python world, especially after we rebranded the company, but, like, the amount that you would have to scream to get over this noise is just incredible. So, that was one big takeaway, is that we really don't have the luxury of being heard or taking for granted that we're going to be heard with any particular message that we have. And that makes it so much more important that when we tell our story, that we know exactly who we're telling it to and that we know exactly what's going to resonate with them.

Because I think in the R community, when Shiny came out, people were willing to spend the time to find out, like, does this really make sense for me? And they were willing to do some work. And for Python, it's like, there's, like, 15 other options. So, like, why would you? And there are so many very, I think, because it's such a big community and there's so much, at least it's perceived that there's so much money to be made being content creators for this community or whatever other reasons people have for doing content creation. I've heard that there is sort of like a content treadmill that people get on that adds to the noise where it's like, you know, in order for the algorithm to take me seriously, I have to create a new, you know, Python package shootout every, you know, twice a week. And you can only do so much research, you know, in three days. So, you end up with just this incredible volume of, like, very shallow noise. And that's just not something I ever remember being a big part of the R community.

And I know it's, for Python users, it's both a blessing and a curse, right? Like, it's awesome to know that whatever it is that you want to solve, somebody is out there with either a free or paid solution for you. But now the work is figuring out which makes sense. And a lot of the analysis out there is not going to be too helpful.

I will say, though, that in terms of the people that I've met in the community, there are, like, two really big takeaways that I have. Number one, that almost everybody that I've met has been very kind and generous to talk with us and, you know, pretty nice. And I don't know, like, I had heard people in the R community say, like, you know, I have been in the Python community, and boy, it's so refreshing to be here and whatever. And at least the particular conferences that I've been to, and, you know, being 2022, 2023, I don't know how it was, you know, 10 years ago. But, yeah, I really saw a lot of the same things that I like in the R community happening in the Python community, maybe without the sort of slightly more tightly knit feeling that comes from, you know, being a smaller community for so many years.

But the other thing is that it's such a big community that I feel like it is very, very hard to generalize what people, like, what a Python data scientist even is. It is hard to even divide it into four or five groups. There are just, there's so much diversity. There are people who are, like, so much more commercial than the average R user is, who are so much more focused on, like, you know, this is a purely commercial sort of interest for them, to people who are so much more extreme free software zealots than I have ever met. And that's coming from someone who obviously, like, really, really believes in open source software.

I had, like, a couple people ask us at PyData Seattle, okay, so, like, you have all this open source stuff, but you also have, like, these complementary, you know, commercial offerings. Like, to me, you don't count as open source. It's like, what? Or people who say, like, because your governance model is not open and you don't have, like, a democratically elected board, you're not open source. I'm like, cool that you have that position. Totally respect that. But to say that open source is the word that means the thing that I like, I'm like, wow, this is not something that I have ever encountered in the R world. So, again, the whole spectrum has been represented at each of the conferences I've been to. And I guess I was a little bit surprised that it's not just such a big community, but so incredibly, incredibly, incredibly diverse.

Security and pen testing for Shiny apps

I see there was an anonymous question, actually from towards the beginning, that I want to make sure I get to as well. And it was, I get questions from IT and leadership about data security with Shiny apps. What training or tools do you recommend for testing our Shiny apps? For example, like pen test?

Yeah, that is a really good question. I used to have good answers for this because we would receive the pen test reports from people. And I will say two things about those pen tests, I think. One, I was surprised how well those pen test tools worked against Shiny in the sense that, like, the tools didn't break, like they did actually run their tests and treat Shiny as no different than any other web app that they would test. So that was cool to see. And there were, you know, results that, like, once in a blue moon would actually be relevant.

But on the other hand, in my experience, the ratio of false positives to true security issues is really surprisingly noisy. And noisy in a scary way, because these reports come back and they sound very, very plausible. But, like, nine times out of 10, they're just using heuristics and don't really understand what the parameters they are looking at are and what value they have.

So I would say if your organization says to use them, I would expect whatever pen test, whatever web-based pen test tool that you're using to work. But if you have red flags come up, expect to have to go through our support and talk it through with them. In terms of actually improving your security, I think a higher value thing is to read the OWASP list. I think OWASP is the acronym, O-W-A-S-P, the Open Web Application Security Project or something like that. And they have an ever-evolving list of, like, the top ten vulnerabilities, the classes of vulnerabilities you need to worry about when you're building your web applications. And I found those to be consistently excellent advice. Like, every one of those vulnerabilities is one you should at least be aware of.

And if you're a more experienced Shiny developer, hopefully you should at least be able to connect the dots between how they're expressing the problem and how that might affect your Shiny app. Like, you know, code injection is one example where you take a parameter from a user, and the classic one is you just take whatever name they typed in and you just, like, glom that onto the end of a SQL query string that you have, right, and put, like, quotes around it. But that input is coming from the user. They could put, you know, a single quote in their input, and that would finish your string, and now their input is being interpreted as SQL code. There's a famous XKCD about this. If you Google for Bobby Tables, you'll find it. So that's an example of, like, a SQL injection. That's totally something that if you weren't paying attention, you could introduce into a Shiny app, right? You could form a SQL query that you're about to send to, you know, Oracle or Postgres and add user input to the end. So there are things that you need to do to not do that. So for sure, run the pen test if your organization does that. But I think more important, read through the OWASP list of vulnerabilities.
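The SQL injection scenario described above can be sketched with Python's standard `sqlite3` module and an in-memory database. The `students` table, its contents, and the function names are made up for this example, and SQLite stands in for Oracle or Postgres. The first lookup glues user input onto the query string; the second passes it as a bound parameter, which is one common way to avoid the problem.

```python
import sqlite3


def make_db():
    # Small in-memory database used only for this demonstration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [("Alice", 90), ("Bob", 75)],
    )
    return conn


def find_unsafe(conn, name):
    # Anti-pattern: user input is concatenated into the SQL text,
    # so a single quote in `name` can end the string and inject SQL.
    query = "SELECT name, grade FROM students WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_safe(conn, name):
    # Parameterized query: the driver treats `name` strictly as data.
    query = "SELECT name, grade FROM students WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    malicious = "nobody' OR '1'='1"
    print(len(find_unsafe(conn, malicious)))  # every row leaks
    print(len(find_safe(conn, malicious)))    # no student has this name
```

With the malicious input, the concatenated query's `WHERE` clause becomes always true and returns the whole table, while the parameterized version simply finds no match.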

Looking ahead

Thank you, Joe. I see that we have gotten to the top of the hour very quickly. This might be a record for how fast this time is going by. But I see a lot of people saying thank you so much, Joe, and Javier just said, Shiny single-handedly changed the trajectory of my career. So thank you so much. That's so nice. Something I wanted to ask you before we go is, what's something that you're most excited about as you think about the year ahead?

I think there's a lot of things to be excited about for Posit, the company. Oh, actually, okay. I'm going to start with my team, but it applies to the company as well. I'm really excited about some of the people that we have added to the team over the last few months. And I know, like, anytime you hire someone, it's, at least for me, I really feel like that's a, you never know how it's going to go, and it can really improve or hurt the team. And I