Resources

Profvis - Profiling tools for Faster R code | RStudio Webinar - 2016

This is a recording of an RStudio webinar. You can subscribe to receive invitations to future webinars at https://www.rstudio.com/resources/web... . We try to host a couple each month, with the goal of furthering the R community's understanding of R and RStudio's capabilities. We are always interested in receiving feedback, so please don't hesitate to comment or reach out with a personal message.


Transcript

This transcript was generated automatically and may contain errors.

Okay, so as Bill said, I'm going to be talking about profvis and profiling tools for faster R code. Thanks for coming. So here's a question that probably comes up a lot if you work with R: how do I make my R code faster? And that naturally leads to another question, which is: why is my R code slow? My strategy for this talk is to walk you through a couple of examples and show you some of the tools that can help you diagnose these problems and speed up your code. Hopefully we'll also learn about a few particular things in R that can be surprisingly slow, and ways to speed them up.

Okay, so I'm just going to start with this example. Let's say you've got a data frame that has 400,000 rows and 150 columns, and I'm generating some random data here with rnorm. There's nothing special about this data; I picked it because it's a good size and gives good timings for these demonstrations. So one thing you might do is write a function that normalizes the columns. Let's say I want to take each of these 150 columns and make it so that the mean of each column is zero: for each column, I find its mean and then subtract that mean from the column. So I've got this function, norm_cols, and what it does is take the mean of every column using the apply function, then subtract the mean I calculated from each column, and return the resulting data frame. This is just a first-pass attempt at normalizing the columns.
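The setup described above might look roughly like this (a sketch reconstructed from the talk; the exact variable and function names on screen may differ):

```r
# Generate a 400,000 x 150 data frame of random normal values.
times <- 4e5
cols  <- 150
data  <- as.data.frame(matrix(rnorm(times * cols, mean = 5), ncol = cols))

# First-pass normalizer: center each column at zero.
norm_cols <- function(d) {
  means <- apply(d, 2, mean)        # column means via apply()
  for (i in seq_along(means)) {
    d[, i] <- d[, i] - means[i]     # subtract each column's mean
  }
  d
}
```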

Okay, so after defining this function, I would call norm_cols on the data that I've generated. Let's try that. I've got my code here, the same code you saw before. First I'll generate the data; this takes a few seconds, but fortunately we only have to do this part once. Okay, it's done. Now I'll define my function... oops, sorry, I typed an extra character there accidentally, and now it's trying to print out the data. There's a lot of it, so that's very slow; let me stop this and restart my R session. All right, starting over: let's create the data, define our function norm_cols, and then run the function. This time we'll wrap it in invisible so it doesn't try to print the result, because that is very slow. As you saw, that took a few seconds to run. I'll do it again so you can see: it sits there for a few seconds and then returns to the console.

Using system.time and the sampling profiler

Okay, so this takes a few seconds, and if we want a closer look at the amount of time it takes, the first-pass thing you might do is run system.time. You take the same code and wrap it in system.time, and it tells you how long the code inside takes to run. Let's try that: the total time was 2.6 seconds. So if you're trying to figure out which parts of your code are slow, this is the first thing you might do, just to see how long the whole thing takes. But after that you're typically left to your own devices: you have to look at your code and guess at which parts are slow and which are not.
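The timing step above is just a wrapper around the same call (assuming the norm_cols sketch from earlier):

```r
# system.time() reports how long the wrapped code takes overall;
# the presenter's machine showed about 2.6 seconds elapsed.
system.time({
  d <- norm_cols(data)
})
```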

Getting an accurate idea of which parts are expensive can take a lot of experience, and sometimes what you think is slow is not actually slow. So to help figure out which parts are slow, so you can speed them up, you can use profiling tools.

So R has a built-in profiler. It's a sampling profiler, and what it does is tell you what the R process is doing over time. Using it is really simple: you call Rprof(), then you run your code, and then you call Rprof(NULL), which stops the profiler. Here's what happens. Let's say this is time along the x-axis. The R process does a lot of computation, and periodically (I believe the default is every 20 milliseconds) the profiler stops the R process and queries it: what function are you in right now? Our code is very simple, so it'll be in the norm_cols function, and the profiler records that. Then the R process goes back to doing all sorts of computation, gets stopped again by the profiler, and again it's in the norm_cols function, so the profiler saves that again. It keeps doing this every 20 milliseconds until the code is done.
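Using the built-in profiler directly looks like this (the output file name is an arbitrary choice; 0.02 seconds is the default sampling interval):

```r
# Start R's built-in sampling profiler, writing samples to a file.
Rprof("profile.out", interval = 0.02)

d <- norm_cols(data)        # the code we want to profile

Rprof(NULL)                 # stop profiling
summaryRprof("profile.out") # summarize the recorded call-stack samples
```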

Now obviously, if this were all it recorded, it would not be particularly helpful information; we already know we're in the norm_cols function this entire time. But it actually records more than that. It records not just the outermost function being called, which was norm_cols, but also the other functions in the call, what's called the call stack. So norm_cols calls apply; here's our function definition, and it first calls apply. And apply, it turns out, calls as.matrix, which calls as.matrix.data.frame and unlist. This is the call stack, and these are all R functions. The functions at the top of the call stack typically call out to C code, which the profiler does not record and gives no insight into.

So that's what might happen at the first sample. At the second or third sample, something different might happen: norm_cols calls apply, and apply calls mean, which calls mean.default, and so on. Eventually, after all the apply work is done, the minus function gets called, because these operators in R are all functions too. So this is the information that the sampling profiler records.


Now, the data that comes out of Rprof, the built-in R profiler, takes some practice and expertise to work with. Oops, actually, let's go back here; one important thing to remember. If this map is like the full complexity of everything the computer is doing, then what the profiler tells you is a very low-resolution view of it. But that's okay, because for this sort of work we're trying to find the large, slow, expensive things and speed them up. We typically don't need all the fine detail in order to make improvements. We can see the large features, like the continents and the oceans, and those are the things we need to look at for speeding up our code.

Introducing profvis

Okay, so as I was saying, the data that comes out of the profiler, and the built-in tools in R, take some practice and expertise to use. So I've worked on a package called profvis, which makes it a lot easier to interact with and interpret the profiling information that gets recorded. To use it, you run install.packages("profvis") and then library(profvis), and then you take the same code and wrap it in profvis, similar to how we used system.time before.
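In code, the steps just described are (again assuming the norm_cols sketch from earlier):

```r
install.packages("profvis")  # one-time installation from CRAN
library(profvis)

# Wrap the code to profile in profvis(), just like system.time() before.
profvis({
  d <- norm_cols(data)
})
```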

Okay, so I'll show you what happens if we do that. Here's our code, wrapped in profvis; let's run it. I'm running this from within RStudio, and in RStudio the result pops up in this panel right here. If you're running it outside of RStudio, it comes up in a web browser, and the interactions are exactly the same. So there are two parts that show up. At the top is the code, all the code that we ran, and at the bottom there's a flame graph. The code itself is self-explanatory, and on the right there are two little graphs for each line of code that are useful. The most important one is the time: about 1,700 milliseconds were spent on this line of code and about 280 milliseconds on this one, and on this line there are about 2,000 milliseconds. It turns out this 2,010 is the sum of the other two lines: the time attributed to this line was actually spent inside the function it called, so it shows up within those other two lines as well. That's the time column. The memory column shows how much memory was allocated and deallocated within each line of code. In the call to the norm_cols function, there were 2,000 megabytes of memory allocated and 900 megabytes deallocated; I'll talk more about that in a bit.

Okay, so that's the code, and there's this corresponding thing at the bottom called a flame graph. This is really important. In the flame graph, the x-axis is time: this is where we started, it goes across as time progresses, and this is where it ended. All of the time was spent in the norm_cols function, and norm_cols called out to apply, which we saw before in that diagram I showed you. apply called as.matrix, which called a bunch of other functions, as.matrix.data.frame and unlist. After apply finished with as.matrix, it called aperm, and then it called the function we passed in, which is mean. And after apply finished, the minus function was called. The whole thing took about two seconds.

Then there's this GC block, which is a garbage collection event. When R has allocated a lot of memory and decides it needs to free up some of the memory it previously allocated, it does a garbage collection, which stops all the processing and just spends time clearing up memory. That's where the memory deallocations happen, so that minus 903 megabytes was the garbage collection event right there.

Comparing approaches for column means

Okay, so now let's talk about this code a bit and what's expensive here. The apply function first converts our data frame to a matrix, and then calls aperm on it, which essentially transposes the matrix. All of that work takes almost 800 milliseconds, and it's just to transform the data into the right form before it can start doing the actual work, which is taking the mean of each column. That seems kind of like a waste, so let's think about some other ways of getting the means of columns that might be more helpful.

So let's find some other ways of doing this and run profvis on them. Here are four ways. The first is to apply the mean function using apply, as before. The next is colMeans: there's a built-in function in R for taking column means, and it seems like that should be pretty fast, right, since it's a specialized function. And there are two other ways, using lapply and vapply. These use the fact that data frames in R are implemented as lists: a data frame is actually a list of vectors, where each column is one of those vectors. So we can lapply the mean over all of those columns just by calling it like this, and vapply does basically the same thing except it enforces a return type, which can be useful; but they're essentially the same. If we run this code, it gives us the ability to compare these four different ways of getting the means. The first is apply; we knew this was slow, and this time it took about 1,700 milliseconds. The next way we tried was colMeans, which is much faster, about four times faster, but it's not the fastest way, which may be a little surprising. And lapply and vapply are both very fast: each of them is over ten times faster than our original method. Now again, the profiler gives us useful information: it tells us not just that colMeans is slower, but also gives us some insight into why. (By the way, you can zoom in on the flame graph with these controls if you want a closer look.) The reason colMeans is slower is that, like the apply function, it converts the data to a matrix first. If you read the documentation for colMeans, it says that it's meant for matrix data, but if you give it a data frame it will of course convert it to a matrix, and that turns out to be an expensive operation. After that, this little empty block is where it calls out to C code to actually take the means, and that part is very fast. lapply and vapply don't have that matrix-conversion step, so they're overall much faster than colMeans or apply.
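The four-way comparison described above can be profiled in one profvis call, something like this (a sketch; the on-screen code may have differed slightly):

```r
library(profvis)

profvis({
  means <- apply(data, 2, mean)            # converts to a matrix first: slow
  means <- colMeans(data)                  # also converts to a matrix
  means <- lapply(data, mean)              # treats the data frame as a list of columns
  means <- vapply(data, mean, numeric(1))  # same, with an enforced return type
})
```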

Okay, so then we can go back and write a more optimized version of the code. In this example I used vapply, just because the data that comes out of it is a little more convenient to work with, but I could have used lapply here as well. We just make a copy of the data, take the means of each column, and then modify that data frame by subtracting the means. Let's take a quick look at that. Here's the original version; we worked on the slowest part of the code first, which was this line, and now that part is much faster in the corresponding part of our new code. Now it's this other line that's slower, so there's actually still some room for improvement, but this is already much faster than our original code. Originally it took about two seconds to do the normalizing, and now it takes about 400 milliseconds, so it's about five times faster just from that one optimization. Again, there's room for a bit more improvement here, but I'm not going to go into depth on that.
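The optimized normalizer might look like this (a sketch under the same assumptions as before; vapply avoids the matrix conversion that made apply slow):

```r
norm_cols_fast <- function(d) {
  # vapply works directly on the data frame's columns (a list of vectors),
  # so no matrix conversion or transpose is needed.
  means <- vapply(d, mean, numeric(1))
  for (i in seq_along(means)) {
    d[, i] <- d[, i] - means[i]     # subtract each column's mean
  }
  d
}
```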

Second example: parsing profiler output

Okay, so I want to show a second example. This is a problem that actually came up while I was working on profvis, in processing the profiler's data. The data recorded by the profiler looks like this; it's basically staged as a text file. Each line of text represents one sample, one moment in time, and shows the call stack. It's in order, so apply is at the bottom of the call stack, then as.matrix, then as.matrix.data.frame, then unlist. The next sample looks like that too, and a sample further down the line might only be calling apply and then aperm, and so on. There are about 750 lines of this, and we need it transformed into a data frame that looks like this, where each row represents one of those samples. It records the original line it came from, which is essentially the time step; the column, which (it's in reversed order here) is essentially the depth in the call stack; and the label. The way the original code did this was to take each line, convert it into a small data frame like this four-row one, and then stick all those little data frames together. I'm just going to walk through this code real quick, and you can look for some of the things that might seem slow. First we read in the text file, and we create a list to store all those little data frames. Then, for each line, we process the line of text: we save the line in a variable, split the line on spaces, create a data frame from the result, and store that data frame in the list. We keep looping over the lines, and at the end we combine all the data frames together by calling do.call with rbind. Okay, let's run this code real quick. It takes about 400 milliseconds, and we can see what's expensive.
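A sketch of that original parsing code (the file name and column names are assumptions; the structure follows the steps just described):

```r
lines <- readLines("profile.out")
results <- list()                  # not pre-allocated

for (i in seq_along(lines)) {
  line <- lines[i]                       # save the line in a variable
  labels <- strsplit(line, " ")[[1]]     # split on spaces
  df <- data.frame(                      # one small data frame per sample
    time  = i,                           # which time step this sample is
    depth = rev(seq_along(labels)),      # depth in the call stack
    label = labels,
    stringsAsFactors = FALSE
  )
  results[[i]] <- df                     # grow the list in the loop
}

# Combine all the little data frames at the end.
result <- do.call(rbind, results)
```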

Well, first let's take a look at some of the things that seem like they might be expensive. These are typical rules of thumb in the R community for things that are slow. First, this list here is not pre-allocated. I'm using a for loop, and people tend to say to avoid for loops because they can be slow. And I'm growing this list in the loop: because I did not pre-allocate the list, every time I add something to it, it may have to allocate a whole new list. These are things that seem like they should be expensive, but if we take a look at the actual profiling result, that's not necessarily the case. Again, it took about 400 milliseconds to do all this, and most of the time, as we can see, was spent here, creating the data frames. There's some time spent splitting up the strings, but much more time creating the data frames, and, at the end, putting the data frames together with do.call and rbind.

So that's interesting: actually storing the data and growing the list in the loop was not very expensive. It looks like that took about 20 milliseconds total, which is a negligible amount of time compared to everything else that's happening. It just goes to show that the expensive parts are sometimes counterintuitive: creating data frames, it turns out, is expensive in R.


Now, I won't go into detail about this, but the way I solved this particular problem was, instead of creating a data frame on each iteration of the loop, to put the data in a list, because that's essentially what a data frame is. A data frame is essentially a list, except it enforces a few things, like each of the vectors being the same length; but if we're careful, we can just use a list instead. Then I create a data frame once at the end, by calling data.frame and extracting each of these columns from the list of lists that I have. Again, I won't go into detail, but the point is this: since we know what the actual expensive parts are, instead of spending time optimizing the wrong thing, like converting the for loop to lapply or something like that, we can spend time fixing what's actually expensive, which is the data frame creation. If I run this faster version of the code, the whole thing takes 20 milliseconds, as opposed to 400. And again, we can do this because we know what the actual slow parts of the code are, and that's the information the profiling tools give us.
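The faster approach might be sketched like this (same assumed file and column names as the slow version; plain lists in the loop, one data.frame call at the end):

```r
lines <- readLines("profile.out")
results <- list()

for (i in seq_along(lines)) {
  labels <- strsplit(lines[i], " ")[[1]]
  results[[i]] <- list(                # a plain list, not a data frame
    time  = rep(i, length(labels)),
    depth = rev(seq_along(labels)),
    label = labels
  )
}

# Build a single data frame once, extracting each column
# from the list of lists.
result <- data.frame(
  time  = unlist(lapply(results, `[[`, "time")),
  depth = unlist(lapply(results, `[[`, "depth")),
  label = unlist(lapply(results, `[[`, "label")),
  stringsAsFactors = FALSE
)
```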

RStudio integration and sharing profiles

Okay, so those are my two examples, and as you can see, I've been using profvis with RStudio. I believe this still requires a recent daily build of RStudio, which you can find if you just google for "RStudio daily build". RStudio has some other nice integration features which I want to show you. First, there's a profiling menu. I've been running all of this by wrapping my code in profvis, but you don't have to do it that way if you're using RStudio. What you can do is select all of the code you want to run (this time not wrapped in profvis; it's just your code, in whatever R file you're working on) and choose Profile, then "Profile selected lines". This takes a few seconds, and then the profile shows up here like that, so you don't have to modify the code and add profvis anywhere. Once you have it like this, if you want to save it for later, you can just click on the save button here. Let's save it as temp.Rprofvis. If I close that and then find the file, temp.Rprofvis, and click on it, it opens up again here. It also turns out that if you want to share this with people, there are a couple of ways of doing it. This .Rprofvis file is really just an HTML file, so if I rename it to temp.html and double-click on it, it opens up in a web browser, and I can interact with it the same way as before, because it's just a web page. So that's one way of sharing it: save it as an HTML file and send it to somebody. But there's also a publish button here, and if you hit that, you can publish it to RPubs. If you're not familiar with RPubs, it's just a free hosting service that RStudio runs for documents. So I'll just publish it, call it "profvis example", and continue, and then it's online at this URL here. You can share that URL with people: if you want to show them the profiling of your code, what's fast and what's slow, you can just send them that URL. It's a nice, convenient way of doing it.

Okay, and one other thing I want to show you. Previously I selected my code and ran "Profile selected lines". Another way you can do it is to choose Profile, then "Start profiling", and then run your code; well, I guess I only have to run this one line here. So I started profiling, then I ran that line of code, and it's still profiling; when I hit stop, it stops, and then it pops up the information here. When you do it this way, though, you don't get the source code showing up here, so it's not quite as nice: you don't get to explore the code that you were running.

Oh yes, and as I mentioned before, you can use profvis without RStudio. If you're using R in the terminal, or R GUI, or say R from Emacs, and you run profvis, it will open the result in a separate web browser instead of popping it up in the RStudio panel like I was working with here.

Wrap-up and Q&A

Okay, so that pretty much wraps things up. Some things to remember: hopefully from this you've learned how the sampling profiler works and gotten a feel for the profvis interface, and we've also learned that performance bottlenecks are sometimes counterintuitive, so it's really useful to have visualization tools like profvis to help you figure out what those bottlenecks are so that you can fix them. If you go to the website for the package, rstudio.github.io/profvis, there's a lot more information. There are a lot of details I've glossed over in this talk, and if you're curious about them, you can learn more by visiting the website.

Okay, and I think I can take some questions now. First question: "I run code on a cluster; is there a way to pull the profile result to local?" I don't think the profiler will give appropriate information if you're running code on a cluster or in a distributed system. Unfortunately, I don't think the profiler records the information that's necessary to give you what you want there.

Let's see, there's another one: "Do you have to define the functions in the profvis call, or can you just call the function?" That's a really good question. In my examples I was defining the function inside the profvis call, and what that gives you is that you can view the code in the code viewer. If the function is defined outside of the profvis call, then you won't be able to see it in the code viewer. Now, if you're say developing a package and you want to load a package so that you can see all the source in the package, there are ways of doing that, which are described on the website. But generally speaking, it's best to define the functions inside profvis so that you can explore them.

Okay: "When I tried profvis this spring in RStudio on a Linux machine, it did not work; it only worked in Windows RStudio. Any progress?" That is surprising to me. It should work in RStudio on Linux, and if it doesn't, that's a bug. Now, it's possible that if you had an old version of RStudio, the result might not have come up inside RStudio, if that was the issue; but profvis itself should work fine across all platforms.

"How will this work with Shiny?" That is a really good question. Why don't I try to find an example here... oops, sorry... okay, so I have an example here of a Shiny application, and I can run it interactively too; on the website there's an example about profiling a Shiny application. Actually, I'll just run this code since it's right there. So I'm using runExample to run one of the built-in Shiny examples. Okay, so this is the tabsets example; I'll move the slider a few times, look at the summary, okay. And then when I close the Shiny app, profvis knows that it's finished and processes the data; that takes a little time. Okay, here we go; let me pop this out so it's a little bit larger.

Okay, so this is the Shiny app. The whole thing ran within runExample, which then called runApp, and the important things to look at are these outputs here. This one is blue, or, well, I guess it might be hard to tell, it's actually light blue: output$plot here. You'll notice I moved that slider a bunch of times, and every time I moved it, it would re-render this plot, so this gives you a view of what's taking time. I happen to have installed Shiny in a way that displays all of Shiny's source code here, so it's showing me all the internals of Shiny when I mouse over it; that's actually the reason the interactions are a little bit slow here. So output$plot calls a bunch of stuff, and it ends up calling hist, which was in the example, and plot, and so on. You can zoom in and get a little bit better look at this, but it gives you an idea of what takes time in the Shiny application. Now, you'll probably notice that the call stack is significantly deeper than in those simple examples I showed earlier, but it's actually far more complicated than that: there's this "Hide internal function calls" checkbox here, and you can toggle it.

Okay, and I have it split vertically instead of horizontally. So this is the full complexity of what's happening in Shiny, and you can see there are a lot of function calls. Normally we try to hide all that unnecessary information by hiding internal function calls, so you see something like this, which is a little bit easier to interpret. Again, if you go to the website for profvis, it has the same example, and on the website all these things are interactive as well, because profvis just generates web pages; they can be embedded into R Markdown documents, which is how the website is generated. So I hope that's helpful and gives some understanding of using it with Shiny.

Okay, let's see. Here's a question: "I don't see the Profile item in the top command list; it says I'm running the current version of RStudio. Do I need to use the daily update?" Yes, a daily build of RStudio. I'm using RStudio, and under About RStudio, this is version 0.99.1281. There's the release version of RStudio, and then there are these daily builds. The release versions will say they're up to date, but if you want all the cutting-edge features, then you want to use the daily builds. You can just search for "RStudio daily", and the RStudio daily builds page comes up. I'm on a Mac, so I'd choose Desktop, then Mac, and they're right there; I would just download and install this.

Okay, here's a question: "We discussed why 2,000 megabytes was allocated but only 903 was deallocated. Did the resulting new data frame need one gigabyte of storage?" That was back with this example here. What happened is that it allocated a lot of memory while it was taking those means, but it didn't deallocate it all, because, as you can see, a garbage collection event happened partway through the processing, but there wasn't another one at the end. If there had been another garbage collection event at the end, then I believe it would have deallocated about the same amount that it had allocated; well, not exactly, because it still needed to keep some memory for the means, but that's relatively small. And there are ways of triggering garbage collection: there's a function called gc which will just trigger a garbage collection, so if I had called that at the end, it would probably have come out that way.
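Forcing a collection at the end, as described above, would look something like this (a sketch using the earlier norm_cols example):

```r
library(profvis)

profvis({
  d <- norm_cols(data)
  gc()   # trigger garbage collection so deallocations are reported here
})
```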

All right, here's another question: "Why is converting a data frame to a matrix so computationally expensive?" It's actually not computationally expensive per se; it just requires a lot of memory allocation and moving things around in memory, and that's the expensive part. In fact, a lot of the time in R, when you have things that are surprisingly slow, it's because they're moving a lot of memory around in an inefficient way. So some of what I did here, using lapply() directly instead of converting to a matrix and then transposing with aperm() (which is what apply() does internally), saves time because it doesn't have to move things around in memory as much.
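As a small illustration of that point, here is a sketch comparing the two approaches on some made-up data (the data frame and sizes are just for demonstration). Both compute the same column means, but `apply()` first copies the whole data frame into a matrix, while `lapply()` iterates over the columns in place:

```r
df <- as.data.frame(matrix(rnorm(1000 * 50), ncol = 50))

# apply() coerces the data frame to a matrix first, copying all the
# data; that allocation is where most of the cost comes from.
means_apply <- apply(df, 2, mean)

# lapply() visits each column directly, with no matrix conversion.
means_lapply <- unlist(lapply(df, mean))

# Same result either way.
all.equal(means_apply, means_lapply)
```

On a data frame the size of the webinar's example (400,000 rows by 150 columns), that avoided copy is what makes the lapply() version noticeably faster.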

Okay, another question here: "Can I save the profile file within R code, instead of using the RStudio interface?" It is possible. If you want to share the profile with people outside of RStudio, or do it programmatically, there's a frequently asked questions section on the profvis website: "How do I share profvis visualizations?" You run profvis(), save the result in a variable, say p, and then call htmlwidgets::saveWidget() with the widget and a file name. That's how you'd save it without using the RStudio graphical interface.
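The steps just described can be sketched like this (the profiled expression and the output file name are placeholders; the saveWidget() call is the method from the profvis FAQ):

```r
library(profvis)
library(htmlwidgets)

# Capture the profile in a variable instead of letting it print.
p <- profvis({
  x <- rnorm(1e6)
  sort(x)
})

# Save it as a standalone HTML file that can be shared and opened
# in any browser, outside of RStudio.
saveWidget(p, "profile.html")
```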

Okay, this is a really good one: "In the flame graph, do the blank spaces correspond to C/C++ code being executed, or to time when you aren't clicking on anything in the Shiny app?" Let's see, in this example: the blank space at the very top, above everything else, is time that is probably spent in C code. The empty space between things is just because in those areas the call stack is not as deep; there weren't as many function calls, so there's nothing in between them. Now, there are cells in the flame graph that are yellow and ones that are white. The yellow ones are the ones for which we have source refs, which means we can show you the source code in the code panel on the left. That was the case for this Shiny application's code, and for Shiny itself, because, as I mentioned, I had loaded Shiny with source refs, so I can view all the Shiny code in that code panel. The white cells are calls into code that we don't have source refs for; these are probably functions in the base package, which is not installed with source refs by default in R, so we just don't have them. There are ways to install packages with source refs, but I haven't done that here.

Okay: "How do you install Shiny so it shows internal functions?" Great, we were just talking about this. Again, this is in the frequently asked questions section of the website, under "How do I get code from an R package to show in the code panel?" You need to install the package with source refs. Typically you'd call install.packages() with the package name and type = "source", and pass it the install option for keeping source. Any package you install that way will show its internals in the code panel of profvis.
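Concretely, the install command from the profvis FAQ looks like this (shiny is used here as the example package, matching the question):

```r
# Install a package from source, keeping source references so that
# its internal functions appear in profvis's code panel.
install.packages("shiny", type = "source",
                 INSTALL_opts = "--with-keep.source")
```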

"Is code with nested functions simpler but worse in memory usage? That is, is it good practice to define a function inside another function that will call it?" Defining functions within other functions in R has a very low memory and time overhead, so there's very little cost to doing it. Defining functions within functions can often help you mentally sort out your code, so if you're thinking about doing that somewhere, it's probably a good idea, but there's probably no real performance benefit or penalty either way.
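A quick sketch of the pattern being asked about, using a hypothetical `normalize_cols` in the spirit of the webinar's earlier example; the nested helper adds essentially no overhead but keeps the logic self-contained:

```r
# A helper defined inside the function that uses it: cheap in R,
# and it keeps the centering logic next to where it's applied.
normalize_cols <- function(df) {
  center <- function(col) col - mean(col)  # nested helper function
  as.data.frame(lapply(df, center))
}

df <- data.frame(a = 1:5, b = rnorm(5))
colMeans(normalize_cols(df))  # each column mean is now (numerically) zero
```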

Okay, here's another one: "Will profvis work with code being run in parallel, say with the parallel package?" I believe the answer is no. I haven't really tried this out myself, but I think somebody told me that it did not work, and that's because the built-in R profiler doesn't record the information you need when you're profiling code run with parallel.