Tom Schenk & Bejan Sadeghian | Making Microservices Part of Your Data Team
Making microservices a part of your data science team, led by Tom Schenk & Bejan Sadeghian at KPMG

Timestamps:
• 1:09 - Start of presentation
• 4:00 - Challenges and trade-offs of a growing team (how I stopped worrying about hiring)
• 8:14 - What are microservices? (help separate out the different layers of an app)
• 9:36 - Hosting other web technologies on RStudio Connect (e.g., React)
• 12:25 - Simple Hello World example of microservices
• 16:00 - Reason to separate out logging
• 17:15 - How to design & plan microservices (moving from a monolithic Shiny app)
• 17:51 - Challenges to getting started with microservices
• 21:02 - How do you address getting started? (domain-driven design)
• 23:17 - Applying cloud design patterns
• 25:37 - Separation of development duties
• 27:22 - Addressing any risks that come with microservices
• 29:17 - Considering costs and benefits
• 30:11 - Microservices in action: demo of the KODA app (making changes to the organization)
• 36:21 - PowerBI interacting with the same microservice from Connect
• 38:30 - Growing teams face a trade-off of complexity and simplicity (KPMG's path)

Questions:
• 43:28 - Can you use a Shiny front end together with a microservice backend?
• 44:09 - Do you hire separately for back-end data science development and front-end Shiny UI development?
• 46:00 - Are all microservices managed by a centralized unit?
• 47:02 - Who can access RStudio Connect in your organization?
• 48:29 - When you decided to go the microservice route, what was your first step?
• 50:47 - What roles are you hiring for?
• 52:20 - Might you suggest some web service servers that host R-based or Python services?
• 53:33 - Are apps built with microservices as responsive as those that adopt a monolithic architecture, or do microservices introduce a lag?
• 55:09 - Can you show the back-end response data through developer tools?
• 56:45 - Can you speak more about the logging microservice? Did you build it from the ground up or did you adopt an off-the-shelf package or app?
Abstract: Whether or not you’ve heard of microservices architecture, you may want to know how microservices can help you scale R-based applications across an enterprise. As data science teams—and their applications—grow larger, teams can experience growing pains that make applications complex, difficult to customize, or challenging to collaborate on across large teams. This meetup will discuss what microservices are, how they compare to Shiny, how they can help a data science team, and how you can deploy microservices using your RStudio Connect environment.

This meetup will help you understand several key items:
• The basic concept of microservices and their benefits, such as making your code modular, enabling domain-driven design, reducing the complexity of application development, and facilitating larger development teams.
• How to use the Plumber package to deploy APIs as part of a microservices architecture.
• How you can work with front-end development teams using their preferred framework (e.g., React, Angular, Vue) on RStudio Connect.

We will show a widely used application built using a microservices architecture and hosted in RStudio Connect, including before-and-after comparisons to show how a microservices framework leads to a better-looking and better-functioning application. Our team will discuss the journey and growth that led to this new approach, which makes development easier within a quickly growing group.

Speaker Bios:
Tom Schenk Jr. is a researcher and author on applying technology, data, and analytics to make better decisions. He is currently a managing director at KPMG. He previously served as Chief Data Officer for the City of Chicago.
Bejan Sadeghian is a director of analytics at KPMG and leads data science development, which spans from advanced analytics to machine learning engineering.

For upcoming events: rstd.io/community-events-calendar
Info on RStudio Connect: https://www.rstudio.com/products/connect/
To chat with RStudio: rstd.io/chat-with-rstudio
Transcript
This transcript was generated automatically and may contain errors.
Hi, everybody. Thank you so much for joining. Welcome to the RStudio Enterprise Community Meetup. I'm Rachel Dempsey. I'm sure I've met many of you before at meetups like this. Thank you for joining again. But for today's meetup, we will learn how the team at KPMG is scaling their data science applications across the enterprise and working with front-end development teams using microservices and RStudio Connect. So Tom will be kicking things off first. Tom is a researcher and author on applying technology, data, and analytics to make better decisions. He's currently a managing director at KPMG and previously served as the chief data officer for the city of Chicago. Tom will get us going here and then turn it over to Bejan. Bejan Sadeghian is a director of analytics at KPMG and leads data science development, which spans from advanced analytics to machine learning engineering. I'm so excited to turn it over to both of you. Thank you so much for joining us today.
Thank you very much. Thank you for having us here. It is exciting to talk about R in production. I'm a longtime R user and developer. It goes back to my time in grad school when Hadley Wickham and I were both at Iowa State in grad school. I was using early versions of Reshape and ggplot, well before ggplot2. And so it goes all the way back to then where it was a statistical language and now seeing the language mature to talk about being used in production and going to talk about how we use R in production at KPMG.
So I just want to be clear. KPMG is a consulting firm, but this isn't about what we've done for clients or a use case that we're advocating for. This is about how we are using R within our own environment, particularly about using microservices to be able to scale out data science across the enterprise, and the reasons why we opted to go down the microservices route. There have been meetups here, and conversations elsewhere, about using Shiny and how to scale Shiny across the enterprise, and the tactics and techniques to do that, which is absolutely a potential way of doing it. We've taken a different path by making microservices part of the data science team.
And we're going to talk about why we went down that path, the benefits it's provided us, and some of the challenges that you should anticipate when using microservices, and absolutely give you at least a couple of demos today to show how this actually works in production. So I'll hop right into it, and then Bejan will take over for a good chunk of the conversation, and hopefully we'll have a good opportunity for Q&A at the end of this presentation.
Challenges and trade-offs of a growing team
So as I mentioned, we're going to talk about a few things, six things in particular. What are the challenges of growing a data science team? I've built a number of data science teams at this point, in the commercial sector as I do today, and in the public sector when I served as chief data officer for the city of Chicago. So we'll talk about the challenges and trade-offs of what it looks like to grow a data science team and why microservices start to enter the equation. Second, we'll talk about microservices. What are they exactly? There's a lot written on the topic of microservices; I've got a couple of my favorite textbooks here that I have read around the concept. And then we're going to dive into a demo, a hello world, a very simple example of a microservice. Then we'll talk about how to plan and design for microservices, take a look at an application demo, and then finally have time for Q&A and a recap.
So, the challenges and trade-offs of growing a data science team, or what I call "how I stopped worrying about hiring," because as the team is growing, you start running into these concerns about how you scale your applications and how you scale your team to work across one or many different applications. As with any good presentation, we're going to summarize this as a graph. There's always this trade-off in our work as data scientists or software developers between complexity and what I just call being hackish in terms of trying to implement something. And we've probably all felt it. We're working on something more complex, but we're trying to do it without hacking around, without being too clever in the implementation, without reverse-engineering something just to make it work, because that creates long-term liability.
You might be the only person who knows how to deal with a particular solution that you've built. So in that upper left-hand quadrant, you see that sort of programmer's lure, where you handle a lot of complexity without doing it in a very hackish way. In the lower right-hand corner is the Rube Goldberg zone: doing something that's not very complex, but doing it in an absolutely hackish way. And that 45-degree line is really the balance: recognizing, okay, how do we do more complex things without being too hackish in the solution? This is something that Bejan and I and our team think about quite a bit. How do we do our work well is essentially what this summarizes to. So we're going to talk about how microservices help you handle greater complexity, or more complex needs, without going too far into the hackish zone.
So this is something that we often see in growing an analytics tool set. As you see in those big bubbles, we talk about the progression of user needs over time. When you build a data science application or some sort of solution, immediately you make folks happy. You make your data scientists happy, you make your developers happy, because you say, hey, somebody needed something, I got a version one out there, and everybody's happy. Then the user needs increase. Sometimes that's just, okay, we need a few more graphs. Then there's more complexity, more interactivity that's needed; other things that need to be polished, nicer-looking buttons. Then there are things that start to kind of grate at you. Like, "I really like these graphs. Can you have them export to PowerPoint?" Okay, we can. We don't want to do that, but sure, if that's what you need.
And then after you get all that done, a user might come back to you and say, actually, no, my priorities were different; I need something completely different. This is the progression where you're trying to balance your data scientists' happiness and your developer team's happiness, so they can stay engaged and continue to get satisfaction working on projects, while trying to avoid some of those grating aspects of progressing and changing your applications and solutions over time. And when you do this in Shiny, a number of different issues pop up. One of which is that it becomes more and more difficult to have multiple developers working on the same piece of code, because of the way Shiny applications and their application structure tend to work: you tend to work in one or two individual files. Now, there are workarounds to this. Certainly you can source other files, you can do other tricks, but it gets complicated over time.
So you might have two coders who are trying to work on the same segments of code, or somebody working on a very large segment of code while other people are waiting on that segment to be done. And oftentimes conflicts start to arise. We're going to talk about a project today that was getting to 15,000 lines of code in Shiny, and it was creating a large number of conflicts as the development team was trying to work on new features.
What are microservices?
So that has led us to microservices. If we take a look at those different challenges and those balancing needs, we want to start talking about microservices. We're going to describe this at a high level and then dive into it quite a bit. And I will say there will be a forthcoming blog post that dives into the technical details a lot more, so we're going to address this at a cursory level for now. In short, microservices help separate out the different layers of an application. There can be a web or user interface that allows somebody to navigate information, but that's separate from the underlying logic of the application. In the case of Shiny, by contrast, those things get mashed together into the same, or what we call a monolithic, code structure. Microservices separate that out: underneath the interface, you have a series of APIs or web services, RESTful APIs and the like, that control that interactivity.
So when somebody is clicking on something in a web browser, it's communicating via APIs behind it. And behind those APIs is a data storage or database technology that allows you to query and bring in data. And that separation between the user interface and APIs and web services helps simplify your code structure and also allows different people to work on the same bits of code. And the reason why we're talking about this here today is because that web interface and those APIs and web services can be completely hosted on RStudio Connect. RStudio Connect basically is a web server and you can host web files on there. If you upload index.html, it will render that in addition to Shiny and everything else. So it allows us to host things such as a React or Angular or other web technology within the RStudio Connect environment.
And for us, we use the React technology. Again, that's hosted on top of APIs and web services, which could be done either in Plumber, which we have done in our team, or using Flask in the Python language. So what does a simple example of a microservice look like? Let's say we have a bunch of time series data and we want to forecast it. You're actually going to see a demo of this here in a moment. But to explain the architecture: you have data holding historical time series information, all the historical entries for the time series. Then you have a model, let's say an ARIMA model, that can do the forecasting and is written by data scientists. And then on top of that, you deploy APIs and web services that communicate back to that analytical model, which in turn communicates back to that database and actually does the forecast. The forecasted data is then presented up to the website or dashboard that allows the user to see it in a visual format.
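To make that architecture concrete, here is a minimal sketch of such a forecast service, written in Flask (the Python option mentioned above; the demo itself uses Plumber). The `/forecast` route, the naive "repeat the mean" model, and the in-memory sample data are hypothetical stand-ins for a real trained ARIMA model and a real database layer.

```python
# Minimal sketch of a forecast microservice (assumptions: Flask instead of
# Plumber, a toy model instead of ARIMA, a list instead of a database).
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in for the database layer: historical time series values.
HISTORY = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]

def naive_forecast(history, periods):
    """Toy model: forecast every future period as the historical mean."""
    mean = sum(history) / len(history)
    return [mean] * periods

@app.route("/forecast")
def forecast():
    # The front end asks for a horizon, e.g. GET /forecast?periods=48
    periods = int(request.args.get("periods", 48))
    return jsonify({"periods": periods,
                    "forecast": naive_forecast(HISTORY, periods)})
```

A React front end, or any HTTP client, would call `GET /forecast?periods=48` and render the JSON it gets back; the model and data layers never have to know what sits in front of them.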
And the reason we separate that apart is that the front end, that user interface, can then be developed by an independent web developer or a full-stack developer, and you can have web designers working on it. So you don't need to hire that Shiny developer anymore and focus on that very particular skill set; you can bring in web development technologies that are more broadly used in other environments as well. Meanwhile, the web services can be programmed by your data scientists, because they can be done in R or Python, as well as by a data engineer, backend developer, or other full-stack developers. Again, this is in contrast to a Shiny application architecture, where the front end and all that logic are often buried in the Shiny piece of the application, which restricts the number of individuals who can really work on that piece. And the takeaway of why this is so beneficial is that it allows the front-end development technology to flourish, because you can tap into the entire ecosystem of development technologies that your web developers can bring to bear.
Hello world demo of microservices
So I talked about a time series example. It'd be great to share an example and actually take a look at a code base: what do microservices look like within an application, using a very simple hello world example? And for that, Bejan, is it okay if I turn it over to you?
Absolutely. Yeah, I will take it on from here. Like Tom mentioned, this is a very simple demonstration of a microservice that would do some prediction. I'll get into a little bit of the details of the back end in a second. But this took not more than a day to put together. The front end is in a React application and the back end is hosted on RStudio Connect using Plumber. This demonstration is pretty simple. A user would come in, they would say, I want to see a forecast of some time series data over the next 48 periods. We'd hit submit. What's happening now is the client side is making a request to RStudio Connect to our API. It's making that prediction with our pre-trained model, and then it's responding with all of that information.
The benefit of having those two separate, as opposed to having Shiny kind of handle it all at once, is that Shiny has to render the front end and do the computation on the back end. That's fine for one person. But when you try to scale things, that puts a lot of load on the server. Having this separation lets the client side do some light calculations and rendering, and then the heavier calculations on the back end. Again, very simple demonstration of what microservices can do. One thing that I want to also kind of call out here is that one of the benefits of having them separate like this is that you can actually touch the back end without the front end.
I'm not sure if I can zoom in on this one, so I'll just speak to it briefly. I'll get into a little bit more on the testing, which uses Postman, which is what I'm showing here. But one benefit of a microservice back end is that you can touch those back-end points without having to go through a web interface. So one thing that I just did is I made a prediction, but that prediction service is actually calling another service that does our logging. So I can also make a request and see, OK, I just made a request two minutes ago for a prediction.
Just being able to touch the back end from different systems is a huge benefit for microservices because you don't have everything encapsulated into one code base.
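That idea, exercising an endpoint with no front end at all, can be sketched with a plain script. This is not the real Connect-hosted API: the toy server below stands in for the Plumber service, and the `/predict` path and JSON shape are invented for illustration. The point is only that any HTTP client, Postman, a script, or another system, can hit the same endpoint the UI uses.

```python
# Sketch: calling a (stand-in) prediction service directly, Postman-style.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class PredictHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Toy endpoint standing in for the real service on Connect.
        body = json.dumps({"forecast": [1.0, 2.0, 3.0]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

# Start the stand-in service on a free local port.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Touch the back end directly, no browser or front end involved.
url = f"http://127.0.0.1:{server.server_port}/predict"
with urllib.request.urlopen(url) as resp:
    payload = json.loads(resp.read())
print(payload["forecast"])  # the same JSON body a Postman request would show
server.shutdown()
```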
So what I'm showing here is the architecture of that relatively basic microservice app. So we saw the web interface, which is a React based application, and I made a request to one microservice, one service that does our forecast. That service also made a request to a separate service that handles all of our logging.
Reason to separate out logging
It may not make sense why you'd have these two separate until you start thinking about building other products; at that point, you probably want to reuse the same logging service. You get the benefit of not having to build logging into every single app, and can instead call the one instance that you have from every single one of your apps.
One example I can give you is that about a year ago, we had a fundamental change to our logging code, where we were using a package at the time. Every single one of our Shiny apps, which was roughly about 20 of them, had that code built in when they were published to RStudio Connect. When we had to make that change, we actually had to have all of our developers pause their work, spend about two weeks making that change in their code, testing their code, and then republishing their applications, every single one of them. Had we had a logging microservice, it would have been as simple as changing the one service, still doing the testing, but it would be substantially less work.
So that's the reason for the separation, and a good reason why you'd want to separate those duties instead of having it all in one code base. And the database is very similar to Shiny: you would have some sort of storage, so that your app isn't saving data to a file system or something like that.
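As a hedged sketch of that design, every app could share one small client helper that POSTs log events to the central logging service, rather than embedding a logging package in each app. The service URL and the event schema below are hypothetical; the shape of the real logging service isn't shown in the talk.

```python
# Sketch: a thin client for a shared logging microservice. The URL and
# payload fields are assumptions for illustration, not the real service.
import json
import urllib.request

LOGGING_SERVICE_URL = "https://connect.example.com/logging/event"  # hypothetical

def build_log_event(app_name, message):
    """Shape the record the logging service expects (assumed schema)."""
    return {"app": app_name, "message": message}

def log_event(app_name, message, url=LOGGING_SERVICE_URL):
    """POST one log record to the central logging microservice."""
    body = json.dumps(build_log_event(app_name, message)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    return urllib.request.urlopen(req)
```

With this split, a fundamental change to how logs are stored means changing the one service behind that URL, not republishing twenty apps.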
How to design and plan microservices
So now I'll get into a bit of the lessons learned that we came across while transitioning over to a microservice architecture. About a year, year and a half ago, we started the first application on this type of architecture. We were moving from a monolithic Shiny application; as Tom mentioned before, it was the application that grew to about 15,000 lines of code. There were some challenges with the monolithic architecture that more or less pushed us into making the decision: do we make the switch now? So a year and a half ago, we started that switch. The intent of this part of the presentation is to talk through some challenges that we ran into and how we overcame them.
Challenges to getting started with microservices
So there are four pretty overarching challenges with getting started with microservices. The first is that getting started with microservices can be very daunting. There's a lot of material out there, and microservices are defined very similarly, but slightly differently, depending on which source you look at. On top of that, planning your application is a very open-ended problem, because you're thinking of a service that could potentially be reused in a future use case that you have no idea about. So what you want to do is plan for a service that's general enough that it could be applied in the future without you having to change the interface at that point in time, but not so general that you're trying to build for the world.
On top of that, because these services can be used with different user interfaces, it becomes a lot more important to define what that interface is, because if you do have to change it in the future, you have to go back and you have to test all the older versions or handle version control very carefully. So getting started is quite a challenge and we'll talk to that a bit here.
Separation of developer duties comes with a lot of benefits because you can hire for specialized skill sets, but at the same time, coordination amongst your team is even more important in that case. The application that we built over the last year and a half was actually a worldwide team from literally all parts of the world. We had maybe a two hour window of everybody being online at the same time. And so we'll get into a little bit of how we overcame that challenge in a second as well.
The third thing here is that microservices are great for scaling things out, but they do come with their own set of risks that monolithic apps don't necessarily have. I'll talk a little bit more about that in a second. And then finally, with the whole microservice architecture, for us as data scientists there was a big knowledge gap. The way to address that is to attend meetups like this, later take a look at our blog posts, and do your own research as well. Data scientists traditionally may or may not know a lot about design patterns or HTTP requests, so there are generally some knowledge gaps. On top of that, smaller teams and smaller projects may not get as much of a benefit out of switching to a microservice architecture versus a monolithic app, purely based on the fact that microservices may take a bit more upfront cost. And if you're not planning on reusing those services, or have no intent to ever reuse them, it may be cost that you don't necessarily want to take on. So that's the fourth challenge: the cost-benefit of it is something to always consider.
So how do you address getting started? So one thing that we started with is using domain driven design. If you search microservices, this is very commonly what pops up first. Domain driven design is using the business domain. So knowledge about the business to conceptualize what you actually need to be building. And then you take that and you build those as entities in your system. So I've listed some steps here. The first thing is that you want to identify your entities in your business domain. In this graphic at the bottom left here, I've just put up a simple kind of e-commerce sort of thing. So you have customers who are their own entity, you have products that are their own entity, and you have orders that are also an entity. Those are three entities in a very simple system or a very simple business.
The second step is to understand how those entities relate to each other. Customers will search for products, they'll make orders, and orders will have products. A relatively simple relationship. But the point of these two parts is to get to the third part: what services do you need? What actions do you need to take on these entities? On the right-hand side here, for each one of the entities I have, I have used HTTP API verbs. So you get a sense of: OK, for a customer, I need to be able to get the list of customers, create customers, edit customers, and maybe remove customers. That defines the endpoints you would individually need to create, and those endpoints make up the service for customers. Similarly, for products, you may not need to be able to edit, which is the PUT request, or delete, so you may have a little less work for products. But again, you create that service for products, and then similarly for orders.
The key point for this slide is that following these three steps gets you from this kind of wide open field to these are what I need to be focusing on. There may be things around them, but these are the things that you need to be focusing on.
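Following those three steps, the customer entity from the example could be sketched as one small service whose routes mirror the HTTP verbs on the slide. This is an illustrative Flask sketch with an in-memory store, not the actual KPMG code (which uses Plumber), and the field names are assumptions.

```python
# Sketch: the "customers" entity service from the domain-driven design
# example, one route per HTTP verb identified in step three.
from flask import Flask, jsonify, request, abort

app = Flask(__name__)
CUSTOMERS = {1: {"id": 1, "name": "Ada"}}  # stand-in for a real database
NEXT_ID = 2

@app.route("/customers", methods=["GET"])
def list_customers():
    return jsonify(list(CUSTOMERS.values()))

@app.route("/customers", methods=["POST"])
def create_customer():
    global NEXT_ID
    customer = {"id": NEXT_ID, "name": request.get_json()["name"]}
    CUSTOMERS[NEXT_ID] = customer
    NEXT_ID += 1
    return jsonify(customer), 201

@app.route("/customers/<int:cid>", methods=["PUT"])
def edit_customer(cid):
    if cid not in CUSTOMERS:
        abort(404)
    CUSTOMERS[cid]["name"] = request.get_json()["name"]
    return jsonify(CUSTOMERS[cid])

@app.route("/customers/<int:cid>", methods=["DELETE"])
def remove_customer(cid):
    CUSTOMERS.pop(cid, None)
    return "", 204
```

A products service would look the same minus the PUT and DELETE routes, exactly as the slide's verb table suggests.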
Now, going into the second part of planning, which is applying design patterns, specifically cloud design patterns. I mentioned that these entities and relationships define the areas you need to focus on, but there are things around them that may pop up as well. That's what design patterns help you identify. There's plenty of material out there on design patterns, but three that we've used are known as sidecar patterns, anti-corruption layers, and backends for frontends. What these do is categorize services that may not necessarily apply to, say, the products entity, but may serve a very particular frontend that needs the list of products, or needs products formatted in a certain way. You can write a service that serves that frontend specifically.
Very commonly, we have websites that are designed for web and mobile, and they unfortunately have some differences. Being able to write a service that supports each one, or handles one versus the other, helps you abstract that out instead of having one service handle too many things. Similarly, sidecars are things like the logging service that I pointed to earlier: things that your application or your client will never touch, but that your backend architecture will touch, things used behind the scenes that are never seen by the user.
The third one here is anti-corruption layers. I bring this up because while we were switching over from a monolithic app to a microservices app, there was a transition we had to take. We had a legacy system that we had to interface with our newer application. Anti-corruption layers are designed for that. The intent here is that you serve from the legacy system to your new application, but you don't have to include that kind of information inside of your app services. The idea of microservices, starting with micro, is that you have them very specialized, doing one thing very well, but have logical separation to where one service is not stepping over another service.
The next part here is separation of development duties, and I'll just go into a bit of what we did. We defined a very structured documentation format for every one of our endpoints, every one of our services, and we also utilized Swagger pretty heavily. A lot of these details will be coming in the blog in the future, but at a high level, every single endpoint we built that comprises a service started with documentation, this exact documentation on the right here. The intent was to have the front end, back end, every developer on the team, on the same page, knowing exactly what the interface is going to be. Like I mentioned before, if you change the interface, everybody else who's touching it has to change the work they're doing. So establishing that up front was crucial for us to be effective.
We found that this kind of format helps in a lot of ways because we define what's required, both in a request and a response, or what comes back in your response, as well as types. And an example generally helps go a long way. On top of that, one of the great things about Plumber is that it builds your Swagger documentation automatically. And so while you're building your microservice, you can kind of validate what you're building, or anybody can validate on the team what you're building, against the documentation. They just simply have to go to the URL on RStudio Connect and click on the right endpoint.
Okay, so the third part here is talking about how to manage the newer risk that comes with microservices. So one of the big risks of microservices is that they can be distributed. A monolithic architecture is generally one virtual machine or one machine that's running everything. So all the calculations for the back end and for the front end are all from the single node. A microservice could be on a Lambda function, it could be on another virtual machine, it could even be external to your company. It could be using a third-party API that serves for a certain reason. All of those come with risks because every degree of freedom you give to your system will introduce a potential failure point.
And so being able to do functional testing and regression testing easily is crucial. With microservices and APIs being so fundamental to the web, there's a lot of supporting documentation and software that will help you do this. I showed Postman earlier today; Postman is what we used for every single one of our endpoints. For every endpoint that we developed, during our pull requests we reviewed the tests as well, made sure we had adequate testing, adequate coverage, and made sure that it worked when it was supposed to work and failed when we forced failures. Postman is also great for performing schema tests. In a loosely typed language like R or Python, it's not very common to check your variable types, or if you do, it kind of clutters up the code. Postman can do schema tests as well, so you can test whether there's a number where you expect a number.
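The kind of schema test described here can be approximated in a few lines: assert that a service response carries the fields and types the contract promises. The forecast response shape below is an assumed example, not the real API contract; Postman's built-in schema validation does the equivalent against a JSON Schema.

```python
# Sketch: a hand-rolled, Postman-style schema check for API responses.
def check_schema(payload, schema):
    """Return a list of violations: missing keys or wrong types."""
    problems = []
    for key, expected_type in schema.items():
        if key not in payload:
            problems.append(f"missing field: {key}")
        elif not isinstance(payload[key], expected_type):
            problems.append(f"{key}: expected {expected_type.__name__}, "
                            f"got {type(payload[key]).__name__}")
    return problems

# Contract for a hypothetical forecast endpoint's response.
FORECAST_SCHEMA = {"periods": int, "forecast": list}

good = {"periods": 48, "forecast": [1.2, 3.4]}
bad = {"periods": "48"}  # wrong type, and the forecast field is missing

assert check_schema(good, FORECAST_SCHEMA) == []
assert len(check_schema(bad, FORECAST_SCHEMA)) == 2
```

Running a check like this in every pull request catches a service that silently changed its response shape before any front end breaks.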
And finally, the last point here, which I don't have a slide on, is that there's a lot of research out there, and even attending this meetup today is a step in the direction of filling in the knowledge gaps and considering the costs and benefits. I just want to call out that there is some overhead with microservices, but not a lot, and there are multiplicative benefits in the future if you start reusing these services. But for each team, it depends on the size of the team, the amount of repeatable work that you do, and the knowledge gaps that you want to overcome.
Application demo: Coda
So now I'll switch over to a demonstration of the application that we built using microservices. This application is called Coda, the KPMG organization design analyzer. Coda is a tool that lets users analyze a company's census, the entire organization structure. We provide a couple of views of the organization and some statistics as well. The intent is that you can do a quick analysis if you need to. You can reorganize the organization: if you see people in finance reporting to people in sales, that may not be so efficient for your organization. The intent of Coda is to let you assess, organize, redesign, do some quick analysis or comparisons of your design, and then finally ship to your customer.
What I'm showing here is actually the reason we switched to a React-based front end. This is what we call a radial plot. In this organization, the CEO is in the center of the org, their direct reports are the next layer out, and then the reports beyond that. Doing this in Shiny became quite cumbersome, and we'll talk about that in a little bit, because you're working with two languages that have to speak to each other within the one application.
But Coda allows you to do things like, for example, if I wanted to select this person and reassign them here, I could reassign their entire organization, everybody underneath them, as well. I can do simple things like reorder these dots. And, though I won't run them here, we can create cuts of the same visual and produce any number of charts, because generally our clients do ask for PowerPoints.
Again, this is an analysis tool. So R&D is colored over here. We see that this person seems to be the lead of R&D, but there are also some people over here that maybe they shouldn't be managing, or maybe we need to separate that into another person, because it's such a wide span of people.
I won't go into the details of every one of the features. There's some comparison abilities as well. And then also just a relatively simple dashboard that shows your organization at different levels, broken up by the departments that are at each level.
Now if I go over to the implementation side. This is where users start to make changes to the organization itself. Again, same organization, same CEO. Let's say I want to search for a certain person, let's do Ferguson Page. I would find her over here. If I wanted to reassign her, I could move her to this person here, and now I've modified the organization. You can also add new people to the org by clicking here. And one of the features that I personally love is that you can track all the history of what you're doing here and roll it back. If I made a mistake, I can remove that change.
The Coda application started as a Shiny app. We were embedding D3 visuals, especially the radial chart, inside of the application. We actually had an instance of this that I think was plain JavaScript, not necessarily D3. But we started getting into the habit of writing JavaScript messages from the back end to the front end to trigger a callback: if a user clicked on a node here, what was the piece of information they were looking for? So we were data scientists writing in R and writing in JavaScript to get this all to work. That's why we transitioned to having React developers write this in JavaScript and D3, while we communicate with them using the language that we know, which is R.
That was the genesis of Coda and why we made this big switch. On top of that, the styling and everything was much easier to do in a framework like that. Of the four screens that I'm showing here, I touched on assess and design, the radial chart and the design of an organization. I didn't go into compare and present just for time's sake, but the intent of showing this is that for any of the visuals we have, we have people with the right skill sets building them, and the data scientists working within their own professional areas as well. That kind of separation of duties makes for a lot faster development, because you don't have a data scientist learning how to write JavaScript.
And then just a very basic application-level diagram of Coda, the application I just showed. Very similar to the simple demo I shared earlier, we have a React web app, but we also have a Power BI dashboard that interacts with our microservice. I mentioned earlier that microservices can be reused, and this is a perfect example of that. Power BI is interacting with the same microservice that our React app is. So now we have two interfaces, and we have the opportunity to say, okay, we want to build this in React because we need high interactivity, and we want to build this in Power BI because maybe we want to ship that out to the client as a dashboard. That gives us the freedom to use the best tool for the problem.
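Because the service speaks plain HTTP and JSON, any consumer, whether React, Power BI, or a script, uses the same contract. Here is a hedged sketch of that client-side logic; the base URL, endpoint path, and payload fields are all hypothetical, and the HTTP transport is injected as a callable so the same logic can be exercised without a live server:

```python
import json
from urllib.parse import urlencode

# Hypothetical URL of a microservice hosted behind RStudio Connect.
BASE_URL = "https://connect.example.com/org-api"

def get_org_stats(department, transport):
    """Fetch summary stats for a department from the shared microservice.

    `transport` is any callable taking a URL and returning a JSON string,
    so a React app, Power BI, a script, or a test harness can each supply
    its own HTTP layer while reusing the same endpoint contract.
    """
    url = f"{BASE_URL}/stats?{urlencode({'department': department})}"
    return json.loads(transport(url))

# A fake transport standing in for the live service in this sketch.
def fake_transport(url):
    return '{"department": "R&D", "head_count": 42, "span_of_control": 6.5}'

stats = get_org_stats("R&D", fake_transport)
```

In production, `transport` would be a thin wrapper over a real HTTP client; the point is that every consumer hits the identical endpoint.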
I mentioned sidecars earlier. Similarly, in our application, we use them for logging, some user authentication, and monitoring. On top of that, one additional major benefit is being able to test the endpoints either in a production server or in a staging server, and we do that regularly. For every single release, we test the entire app, the entire set of services, and we test periodically as well. Netflix actually has this idea of a Chaos Monkey, a testing node designed to try to find weaknesses in your system. That's something on our horizon as well.
Recap and benefits
So going back to what we showed earlier: what is that tradeoff between complexity and simplicity? Users needed interactive graphs, which we were trying to do in D3, and that was creating a lot of complications just in debugging. Then came exporting things into images and into PowerPoint. These new requirements kept coming along, and it was creating some discord within the data science team because they kept needing to learn more, and that's very taxing, very tiring. So this allows us to have people focus on the front-end technologies versus the data science piece versus the back-end piece.
So this is our path. Where we first deployed Coda and the other applications that we have, Coda being the example we use here, was as that monolithic Shiny application. It let us handle a fair amount of complexity without trying to hack around things. But then we got into JavaScript messaging, and then into trying to get D3 to work within Shiny, and it felt like we were doing too much hacking around. Over time, we knew that was going to be a risk to the maintainability of the applications themselves. By switching over to a microservices back end and a React front end, we could take on the added complexity the applications were asking for but get out of that danger zone of hacking things together to make them work.
So we talked about some of the benefits. Those benefits are: we enabled division of labor and specialization, letting people focus on what they do well, and the ability to program more advanced user interfaces. What you saw running here today was all running off of RStudio Connect, and I think, subjectively, qualitatively, it looks very different from what people typically expect to see running in a largely R-based application. There are multiple consumption points: developers can hit those APIs, Power BI and other BI applications can hit them, custom front-end applications can, other programs can. People can consume them; they're reusable. And you're relying on a wider universe of tools and libraries.
The R ecosystem is amazing. There are hexagons everywhere showing its interconnectivity, but a lot of those packages are really wrapping some other technology. This approach lets us reach out to those other technologies directly, so again, it avoids that sort of hackish territory. And as Bijan mentioned, there's a cost-benefit question: does this make sense, or can I rely exclusively on the R ecosystem, and likewise the Python ecosystem? It becomes more code-agnostic and really lets you adopt formal DevOps procedures: continuous integration, continuous deployment, code versioning with Git. All of this underlies everything we've talked about today, because you need to be able to do that very well.
But as Bijan and the team mentioned, there are a number of challenges. It does require more planning. There's a cost-benefit question for every project: do we do pure Shiny, or do we do this microservices approach? It requires strong team coordination, because it's not all in one thing, it's in different pieces, and you need to be coordinated across those pieces. I describe it as building a bridge starting from both banks of a river: you start on both sides and you need to meet in the middle, and if you don't coordinate, that bridge isn't going to meet. That's what this boils down to. There are unique risks within distributed services, and additional skills and resources may be required. But if you're a growing team, this is kind of the point: you can make hires in front-end development and other specializations and allow them to really flourish and focus on that. And again, you're working with industry-standard technologies.
I know we've got one more slide, but I think we can wrap it up; I think we're two and a half minutes over what we were going for. Thank you for having us here today. We're really glad to contribute to the ecosystem of knowledge that is the R community. Hopefully this is helpful for you; we had to do a lot of research when we were trying to implement this ourselves. As we mentioned, we'll do a technical write-up as well to get into the nitty-gritty of what makes this work versus not work. But Rachel, I'll turn it back to you if we have any time for Q&A.
Q&A
Yeah, there are a lot of great questions here. But I just want to say thank you so much, Tom and Bijan, for an awesome presentation. It's great to see APIs and microservices in action. I always say we're all clapping, you just can't hear us since we're not all in the same room clapping for you. Thank you so much. I'll go over to Slido. And just a reminder, if you want to ask questions, you can use the Slido link and put your name in so I can call on you, or you can ask anonymously too. But one of the questions was: can you use an R Shiny frontend together with a microservice backend?
Yeah, I can take that if you want, Tom. The answer is yes. You could use any frontend technology, and Shiny is a perfectly fine frontend technology as well. You would simply be making the same calls that a Python script or a React app would make.
Awesome, thank you. I see Rahul asked a question on Slido, and Rahul, feel free to jump in if you want to add any other context. It was: do you hire separately for backend data science development and frontend Shiny UI development? We do. We have a dedicated frontend development team now. As the team grew, we separated out the responsibilities of the data scientists, who originally did some pieces of frontend development, so now there's a specialized frontend development team. That team consists of React-focused developers but also UI/UX designers. There's an entire suite of design tools used to mock up how an application should look, which is fantastic, because instead of having to program something to show a user, you can show them wireframes or pretty advanced mockups, and those can be immediately exported into the structure of a webpage. That lets the React folks just start populating everything, so they don't have to take a sketch and ask, what color did you use? All of that automatically imports into the HTML and CSS that lets them do their work.
And our data scientists focus right now on two things: one is API and backend development, and the other is data science. So they still have development components, but they're programming in R, and soon Python. We've been focusing on R, and we're going to be doing more Python coming up. They create those APIs, but also do the pure data science work as well. So we've now separated those responsibilities. When we began, we were a team of five individuals; we're now 45. As our portfolio has gotten very large, Coda and two dozen other applications, that has allowed us to separate those functions.
Excuse me if I missed this, but are all microservices managed by a centralized unit? They don't have to be. One of the big benefits of microservices is that you can launch them on whatever service makes the most sense. Right now we do a lot on RStudio Connect because it does a fantastic job of not only hosting but load balancing too, and that's a big reason we were able to move so quickly. But you're able to use Azure Functions and, again, any sort of third-party service as well. If there is a service your firm decides is more beneficial to purchase and it's cheap enough, you can integrate it into the same suite of microservices and save yourself the headache of developing it. So they don't have to be on the same compute node; they can be as spread out as makes sense.
Someone else asked, who can access RStudio Connect in your organization? Right now we protect it with two-factor authentication in the Azure cloud, so only KPMG practitioners can access it. Eventually we're looking at getting to the point of a managed service or something like that, but right now it's KPMG employees.
Another question was: how difficult or easy is testing, given the distributed nature of microservices? Relatively easy. Even in a distributed environment, there's still a URL assigned to each one of your services. If it's a RESTful API, all you have to know is that URL and the verb you would use, a GET request or a PUT request. Postman does a pretty fantastic job of letting you set up those tests and also create a collection and share it with the rest of your team. You just need those two pieces of information. And from the user perspective, you don't need to know where the service lives; it's just there. So, relatively straightforward.
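That "URL plus verb" idea can be sketched as a tiny smoke-test collection, a stand-in for the Postman collections described above. The endpoint paths are hypothetical, and the `send` callable is injected so the same checks can run against staging, production, or (as here) a fake responder:

```python
# A minimal stand-in for a Postman collection: each check is just a
# verb, a URL, and the status code we expect back.
SMOKE_TESTS = [
    ("GET",  "/org/stats",    200),
    ("POST", "/org/reassign", 200),
    ("GET",  "/org/missing",  404),
]

def run_smoke_tests(send):
    """Run each (verb, url, expected-status) check.

    `send` is any callable (verb, url) -> status code, typically a thin
    wrapper over an HTTP client pointed at a staging or production server.
    """
    failures = []
    for verb, url, expected in SMOKE_TESTS:
        status = send(verb, url)
        if status != expected:
            failures.append(f"{verb} {url}: got {status}, wanted {expected}")
    return failures

# Fake responses standing in for a live environment in this sketch.
def fake_send(verb, url):
    return 404 if url.endswith("/missing") else 200

failures = run_smoke_tests(fake_send)
```

Sharing the list of checks, rather than the test runner, is what makes this easy to hand around a team, which is essentially what a shared Postman collection gives you.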
But when you decided to go the microservice route, what was your first step in building that architecture? So I think, Bijan, you and I, I flew down to Austin and we got into a whiteboarding session together to talk about the approach. How do we implement microservices within our application architecture? How does it overlap with our code and version control management and our continuous integration and continuous deployment? And what does that look like as a portfolio? The first bit was understanding where our challenges were. In the application development, in this case on Coda, we were trying to do a lot more, and we noticed there were more bugs happening and development timelines were getting harder to plan for. As somebody who has programmed throughout my career, you could tell it was getting very difficult for the team to navigate forward, even though the team was entirely dedicated to the project.
Understanding where those challenges were really allowed us to have a very productive conversation about how to implement microservices in a way designed to benefit the team, not just to do something a little bit different. From there, in terms of implementing the microservices themselves, Bijan, this is what you touched on in your area, domain-driven design: designing the microservices within the application in a way that made sense. What becomes interesting, and what we're now starting to tap into within our team, is that within a given application there are domains you can reason about. But now we have several related applications that might reuse some of the microservices originally built for other applications. So there's this sort of metadomain we're now considering: this application does something interesting that we want to consume, so how does it overlap with that application sitting over there? Now we're having that design conversation around the entire portfolio.
Okay. I had a question I wanted to ask the team, because I heard you were hiring, and I'm just curious to know a little bit more about what roles you're hiring for. So as I mentioned, we're growing. We're at 45 individuals now across four different countries, predominantly in the United States. And right now we're hiring for a machine learning engineer. We're looking for somebody who has a good understanding of the kind of technology we talked about today, how to implement it, and how to continue