Resources

How to use {pointblank} to understand, validate, and document your data

Rich Iannone

Abstract: This workshop will focus on the data quality and data documentation workflows that the pointblank package makes possible. We will use functions that allow us to: (1) quickly understand a new dataset; (2) validate tabular data using rules that are based on our understanding of the data; (3) fully document a table by describing its variables and other important details. The pointblank package was created to scale from small validation problems ("Let's make certain this table fits my expectations before moving on") to very large ones ("Let's validate these 35 database tables every day and ensure data quality is maintained"), and we'll delve into all sorts of data quality scenarios so you'll be comfortable using this package in your organization. Data documentation is seemingly and unfortunately less common in organizations (maybe even less than the practice of data validation). We'll learn all about how this doesn't have to be a tedious chore. The pointblank package allows you to create informative and beautiful data documentation that will help others understand what's in all those tables that are so vital to an organization.

Resources mentioned in the workshop:

* Workshop GitHub repository: https://github.com/rich-iannone/pointblank-workshop
* pointblank documentation: https://rstudio.github.io/pointblank/

Feb 8, 2026
1h 53min

image: thumbnail.jpg

Transcript#

This transcript was generated automatically and may contain errors.

Hello, everybody. We are excited to keep the positive and wonderful momentum going for the rPharma 2025 workshop series. And there's been a whole bunch of fantastic ones already. I am super excited to introduce our instructor today.

I definitely consider him one of my R friends IRL. I've known him for years, and he has built amazing tools in the R ecosystem. You may know him for his awesome gt package, but he does way more than that. And we are going to be talking about the pointblank package today, which is one of those underrated gems in the R ecosystem, and more recently, the Python ecosystem as well. So without further ado, I'm really excited to introduce our instructor for today, software engineer at Posit, Rich Iannone.

So we're really excited to have you here. And I'll be watching the chat for any questions. And I'll serve them up your way whenever you like. But the floor is yours. We're ready to get all into pointblank. It's one of my favorite packages.

OK, I know there was a Python workshop just before this one. Just so you know, we're doing this in R, but there actually is a pointblank for Python as well. So if you were using R and you had to go over to Python, you can transfer those pointblank skills right to the Python package, which is pretty close to the R one.

Just a little tidbit of information. I'm going to share my screen. I am hoping that this is very large, large enough for everybody on the workshop. You may want to zoom in a couple of notches. Just a bit more. OK, yeah, maybe like this. A little more. A little more, OK, here we go. This should do it.

All right. That's more for the on-demand recording, especially. Yeah, good call. So everything that is here is actually in a repo under my personal account: rich-iannone/pointblank-workshop. So all that you see here is there, and you can always get it at any time. And of course, it's available inside of Posit Cloud, which is where I'm using this. But here it is right there, pointblank-workshop. It's right in front of me.

So basically, over these next two hours I'm going to go through just a few of these QMD files: the first, second, third, and the fifth, I believe. That'll fit nicely in the two-hour time, and it'll leave lots of time for questions and for your own experimentation. So let's get right into this.

Introduction to data validation

OK, so let's first introduce what data validation is. pointblank is for data validation, obviously. And the workflow that we use inside this package is: we create something called an agent, with a function called `create_agent()`. So that's the first thing you've got to do. That's where you actually give the object your data, the table you want to validate. So that's very important. It's the first step.

And then in the middle, there are as many functions as you want: the declaration of validation steps using validation functions. The more you add, the more stringent your validation process becomes. And then finally, there's a third step, the interrogation of the data. This is where the agent finally carries out the validation tasks. Up to that point, it just has the data, but it doesn't actually do anything with it. So you have to use `interrogate()` to initiate the whole process.

So let's actually run through this. I'll take a table that's inside the pointblank package. It's called `small_table`, and it's really very small. I'm just going to run this real fast. And we see here, it's got 13 rows and eight columns. It's pretty good for validation tasks, just to experiment with the package. OK, so I'm going to close this up.

OK, so I'm going to do that first thing. I'm going to call this object `agent_1`. We're going to use the `create_agent()` function. The first argument is the table; we give it `small_table` from the package. We could optionally give it a friendly name, "small_table", in quotes. That's just for the display. In the end, you're going to get a report, and it's good to see the actual name of the table, so you're just echoing that right here. Another thing you can do is add a label. This is just a line of text that, again, appears in the report. And I'll show you the report.

It's going to be like this. I'm going to hit Play, and it appears right here. OK, so I have to scroll a little bit because this whole area doesn't actually capture the entire table. What we see here in the top, pointblank validation plan, no interrogation performed, because all we did in the first step here is just create the agent. We didn't add any steps. So we see that this table is actually empty. So let's make it less empty.
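Putting that first step into code, a minimal sketch based on the transcript might look like this (the label text and object name are illustrative):

```r
library(pointblank)

# Step 1: create the agent, handing it the target table,
# a friendly display name, and a label for the report
agent_1 <-
  create_agent(
    tbl = small_table,
    tbl_name = "small_table",
    label = "Workshop example"  # illustrative label text
  )

agent_1  # printing shows the (still empty) validation plan
```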

Adding validation steps

Let's add some steps. So in this case, we're going to take the agent that we already made. We save it over the initial one, but this time, we're going to add some steps. And there are lots of these validation functions, probably 30 or more. But the good thing is they all start with the same sort of pattern. There are groups of them: they start with `col_vals_` or `col_is_`, and sometimes they have special names like `rows_distinct()`. But there are just a few groups of these validation functions. So I'm going to add a few of these, and I'll go through what they actually do a little bit later. I just want to show you the entire process first. So let's actually run this.

OK, so now we get another table as a result. Again, we see no interrogation performed because we never used the last step, which is interrogate. But I wanted to show you what happens if you just don't do that. OK, so we get a number of steps. And this essentially is the validation plan. We have a number of steps. And what will happen when you use interrogate right here is that those steps will eventually have data.

OK, so we see it right now. OK, and now this is actually a good time to understand what this does, what the individual steps do, and what this table shows. So what we've done here, I'm just going back a little bit, we're checking that the column values in column D are greater than or equal to value 0, so a static value. We could compare it against other columns. But in this case, we're just choosing a static value inside of this value argument.

The second step is checking for values within a set. Column F has text values, and we're saying that we expect values in column F to be either low, mid, or high. OK, three more, which begin with `col_is_`: we're saying columns E and D are logical and numeric, respectively, and that columns B and F are character columns. Finally, for the last validation step, we have `rows_distinct()`. In this one, we're saying that all the rows are unique.
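Based on the steps just described, the validation plan might be sketched like this (argument values are inferred from the transcript):

```r
agent_1 <-
  agent_1 %>%
  col_vals_gte(columns = vars(d), value = 0) %>%            # values in d >= 0
  col_vals_in_set(
    columns = vars(f), set = c("low", "mid", "high")
  ) %>%
  col_is_logical(columns = vars(e)) %>%                     # type checks
  col_is_numeric(columns = vars(d)) %>%
  col_is_character(columns = vars(b, f)) %>%
  rows_distinct() %>%                                       # all rows unique
  interrogate()                                             # carry out the plan
```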

OK, those are our expectations. So we see in the actual output, though, it's kind of nice to see this because we sort of see our validation functions being echoed here. And if you hover over them, we sort of see plain English expectation text. So that sort of echoes what we expect to see. And if you scroll a little bit to the right, we see something called units, pass, and fail.

OK, so units: for each step, there are individual atomic test units. For these `col_vals_*` functions, we're checking values down an entire column, and there are 13 rows. So really, there are 13 individual checks. And because we see there are 13 units and 13 pass for the validation, and none that fail, we see that all the values match this expectation that all values in D are greater than or equal to 0.

Great, and we also see other ones, which are identity checks, like whether columns are of certain types. They have one test unit because they're checking one thing, the type, and we see that they all pass. It's only when we get to `rows_distinct()`, where we expect distinct rows across all of our columns (basically each row is different from the others), that we get two failures. So out of 13 test units, 11 pass and 2 fail. That's because we don't have entirely distinct rows in our `small_table` dataset.

Another cool thing is that we see right here a button, CSV. So we can actually click that, and what we get is it just downloads a CSV, which is kind of cool if someone doesn't have access to pointblank or R, and you just give them this file, and they're curious about which actual rows are the failure rows. They can actually look at this and see what the problems are, and this is available for all row-based validation functions. They can produce these if there are failures.


Understanding the validation report

OK, that was like the very first thing, and I want to check if there's any questions at all, because I see there's a few, but really it's just Phil. And OK, that's great. OK, but just know if there are any questions, feel free to ask in the chat, and I will take a break, and I will answer those.

OK, so I went over the validation report. Here it is in text form. I didn't go over everything. Basically, step is important. That's the column, the first column. Each time you use a validation function, it will make a new step. Sometimes one use of a validation function will make multiple steps. Say if you define multiple columns in a column type check, it will just expand to several rows or several steps.

Columns: if there are column-based validation checks, this will become relevant, and there'll be names of target columns in there. Same goes for the values column. If there are any values associated with a step, say, for instance, you're checking for a column greater than some value, it will appear in this column. So it gives you a little bit of information. Argument values will appear in this table, but not always; most of the time, they will.

There are actually two more columns, TBL and EVAL. OK, so I'm going to scroll back up here. In this case, we always get a circle with a line facing right, and we get a checkmark. There's a feature we haven't seen yet: you can modify the table within a step, and it's isolated to that step. You can mutate the table, in other words. In all our cases, we never did that. And the EVAL column means there were no evaluation issues with the table. We didn't use a column that didn't exist. If we had used a column that didn't exist, there'd be some problem flagged here, and this step would fail or be deactivated, and you would just know.

So that's what this does. It tells you if there's any problems with the validation itself, or the setup of the validation, but not so much the validation results. OK.

Pass, fail, and I'm going to go over these next few columns, W, S, and N. I'll just tell you right now, these are indicators that show whether we have a certain condition that is met, and these conditions are warn, stop, or notify. And as we see here, we see nothing but dashes, because we never actually set those up, but we will in a future example.

And finally, I just want to note that for pass and fail, when you look at the values in there, there are actually two values in each of the cells. What we get is the absolute number of test units that passed, and then the fraction thereof. And for fail, again, it's the number of test units that failed, and the fraction of test units that failed out of the total. And the last column will have the CSV buttons if there are any problem rows to see.

Setting action levels and thresholds

So now, I'm going to take this a step further. I'm going to use something called `action_levels()` to set thresholds. And we have three of them that we saw: warn, stop, and notify. What you do is you call this function, and I'm doing it in a way where I assign the result to an object, and then I introduce it to the `create_agent()` function under the actions argument right there. But the cool thing here is that if you set the object and print it, we can actually see what it prints out.

It just echoes back what we set: a warn failure threshold of 0.15 of all test units. So basically, if 15% of test units fail within a step, then it will enter the warn state. And we're increasing that for the other states. For stop, it's going to be 0.25. For notify, it's 0.35, or 35%.
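The threshold setup described here might be sketched as follows (the object name `al` is illustrative):

```r
library(pointblank)

# Fractional thresholds apply to the share of failing test units in a step
al <-
  action_levels(
    warn_at   = 0.15,  # warn if 15% of a step's test units fail
    stop_at   = 0.25,
    notify_at = 0.35
  )

al  # printing echoes the thresholds back
```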

So I'm going to use that object in another agent, calling this one `agent_2`. I'm going to operate over `small_table` again. And this time, we're going to use some other validation functions, some that we saw before, some not. And we're going to interrogate. So we're just going to see what the effect of the `action_levels()` object is. So I'm going to run that.

Great. So now I'm going to scroll down and see the results. We get another validation report table. And in this case, we see that we have circles here instead of dashes, because we set all three levels. We also see this in the title at the top: these are our global validation thresholds right there, exactly what we put in, 0.15, 0.25, and 0.35. And we also see that in this step right here, `col_vals_lt()`, it turned out not to be the case that all values in column A were less than 7. So we have some failures. And the failure fraction reached 0.15, which equals the warn threshold right there, which was set to 0.15. So we have a solid filled yellow circle.

And for this regex step, we're expecting that values in column B should match this regex. We have a lot of failures; over 50% of the test units failed. So we're actually well above the 0.35, or 35%, threshold. So all of these circles become filled. This is a visual way to see that our testing has failed above acceptable levels.

And we also get these visual indicators on the side, which, at a glance, show us how things went. Basically, solid green means everything passed without fail. Light green means things did fail, but no thresholds were met. And these colors are basically the same as the threshold levels here, so they show the highest level reached each time.

Great. So I'm going to make another one. And I'm going to demonstrate that it's actually pretty versatile what you can do with this action levels function. You can actually set it globally, as we did before. What we did is we take that object, which is made from this function, and pass it into actions. So that sort of set our thresholds for every single step. Or if you wanted to, you can set it for individual steps.

So over here, we see `col_vals_between()`. We set some parameters: we're saying that the values in column D should be between 0 and 4,000. But we also want to set some custom actions. So we're saying: warn if there's one test unit failure, stop if there are three, notify if there are five. So this will apply just to that step. Every other step will receive the thresholds from the global object. So I'm going to run this.
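A sketch of this mix of global and per-step thresholds, assuming the global thresholds object from earlier is called `al` (names here are illustrative):

```r
agent_3 <-
  create_agent(tbl = small_table, actions = al) %>%  # global thresholds
  col_vals_between(
    columns = vars(d), left = 0, right = 4000,
    # per-step override, using absolute test-unit counts
    actions = action_levels(warn_at = 1, stop_at = 3, notify_at = 5)
  ) %>%
  interrogate()
```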

Scroll down. Great. And we see that right here. So the one we set is `col_vals_between()`. We have one test unit failure, and that corresponds to this right here, warn at 1. So we can either use absolute numbers when we define values inside of `action_levels()`, like we did here, or we can use fractions, and those are what you see on the bottom.

Overview of validation functions

And speaking of validation functions, there are quite a few: 36 at last count. I'm just going to go through them real quickly because there are so many, but I want to give you the basic features. So whenever you see `col_vals_*` like this, it just means that we're checking values within a column: whether they're less than something, less than or equal, equal, and so on and so forth. So that's one distinct group.

Another group starts with `rows_`. We're checking something about entire rows. In this case, we're checking whether rows are unique or distinct, or whether rows are complete, which means that they have no NA values within a row. Another class of these validation functions is `col_is_*`. With these, we're just type checking. So we're saying this column is a character column: you give the columns that should be character. If they're integer, you can use the `col_is_integer()` validation function.

We may check for columns existing. For that, we use `col_exists()` and provide some column names. We're saying these columns should exist in the table that we provided. `col_schema_match()` is a pretty comprehensive one, which involves a helper function. With that, we're basically saying our table has these columns and these types, and we're setting some more parameters for loosening the checks, like specifying whether column order should be maintained. There are lots of options in there, but it's a way to check the structure of the table.

There are a few specialized ones here. I'm going to go through just two more. With `row_count_match()`, we're saying that our table should have the exact number of rows that we specified, or, compared to another table, it should have the same number of rows as that table. And `col_count_match()` is the same sort of thing, except for column counts. So a lot to keep track of, but hopefully it's a little bit easier since we have some conventions in naming. OK, great.

Q&A: agents, multi-agents, and type checking

OK, so I'm going to check for questions real quick. I don't think there are any, but it's always good. OK, there are some questions. Great.

Can you create an agent without specifying a table so it can be applied to different tables? No, we don't have that yet, but it's been asked a few times, and it's actually a pretty good idea for doing that sort of thing. What you could do, for using something a few times, is create some YAML. It's not really covered in this workshop, but you can actually create a YAML definition of an interrogation, and then use `set_tbl()` to set the table and then interrogate on that. So there are ways to swap in a table with the `set_tbl()` function. But unfortunately, the agent does require, initially, some sort of table to be set. We do want to loosen that and make this whole thing a bit more lazy, but until we get that laziness, we can't really do that with the agent object.

OK, another question right here. Can you stack agents to create one report on different tables? You actually can. What you can do is create multiple agents, and then once you have those objects, you can create something called a multi-agent. That gives you two reporting options: either a wide report, where you're checking the same agent over time, or, more or less what you're asking, basically multiple validation reports in one printout, which is what's offered by that function.

OK: `col_is_integer()`, if the type is character but the values are castable, does that pass or fail? Currently, it fails. It's actually just checking the column class. But that's actually a really good idea for a feature, integer-like: whether it could be cast to an integer and succeed. So I definitely encourage filing issues for this, and I may think about it myself as well, because this is an evolving package, and it's always good to have new ideas like that, that I haven't really even thought of.

Using validation functions directly on data

OK, great. I'm going to keep going in this one file. I'm going to show you a different way to do validations using the same functions. This is a little bit strange, but we can actually use validation functions directly on the data. We don't have to specify an agent. What you do is you just take your data object, and you pipe it to one of the functions from the list we saw, and then specify the arguments as before. So what does it even do? If there's a problem, it will stop. If there's no problem, it'll just sail through: the `small_table` object will just continue. So it acts as a bit of a filter, a filter for failure, as it were.

I'm going to run this to show you that, in the end, because this passes, we just get `small_table` back. OK, run. Great, no problems. We have `small_table`, no errors appeared, so we're good; this validation passed. Spoiler alert: this next chunk has error=TRUE set, which means that if you render this, it would still render. But obviously, we have some sort of problem now. When we narrow this range to 5 and 10 instead of 0 and 10, we have some values which are outside this range. So there's going to be a problem here. So let's run this.

OK, this is kind of cool. We don't get the table; we actually get this error message. And the error message is not bad in terms of description quality. Maybe it's a little bit confusing at the beginning, because it says: exceedance of failed test units, where values in A should have been between 5 and 10. So let me tell you what this exceedance of failed test units is. By default, the threshold is 1, and you can change it with the threshold argument; you can set thresholds just within this function. If you get one failure, then that failure or any more will trigger this error. So that's what we see here: the threshold is 1, and the failures are 10.
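The two cases just described might look like this as a sketch:

```r
library(pointblank)

# Passing case: the table flows through untouched
small_table %>%
  col_vals_between(columns = vars(a), left = 0, right = 10)

# Failing case: narrowing the range triggers an error once the
# default failure threshold (1 test unit) is exceeded
small_table %>%
  col_vals_between(columns = vars(a), left = 5, right = 10)
```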

`col_is_*` will check whether a column is of a certain type. So let's look at a few cases. Here's one that's passing: we're checking that column B is a character column. It is, because we see the table in the end when we pass the table through this function. Great. And in this case, date is not numeric; it's a date column. So I'm going to run this. OK, see, it's the same sort of thing. It actually tells you what type the column really is, which is nice. And of course, it's one test unit; there's only one failure possible, because we're just checking one thing, which is the type.

OK, so this is kind of cool. What you can do with these is insert these checks inside pipelines, if you want to. You can pipe to another function that does something, or just have a check run by itself like this. And if you don't set error=TRUE, rendering a notebook will just essentially stop if there are problems, which is what you want when you don't want to proceed with those problems.

OK, let's check out these rows functions. We have two of them: `rows_distinct()` and `rows_complete()`. Again, they check entire rows. And a cool thing is that there's actually a columns argument, and what it does is narrow things down. It's almost like selecting from the table beforehand, before doing the check. So you can check a subset. So if you wanted to have, say, a distinctness check excluding some columns, we can do that by providing the columns argument. Yeah, right there.

OK, but we're not doing that here. And we already know from before, if you're paying attention, that we don't have entirely distinct rows. So we actually get this error right here. Great. And for distinct rows, well, in this case, we're filtering a little bit. We're just taking a few of the top rows, and within that region of the table, we do have entirely distinct rows. So we get the table there.

OK, I'm just going to close this up. OK, `rows_complete()`. It's the same sort of situation. We do have NA values within our table, so the entire table will not have complete rows. But here's a case where we're just checking a few columns: date_time, date, A, and B, so four columns. Within the cells of those columns, there are no NAs down the entire table. So this will pass just fine, and we see the table.
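The completeness checks just walked through might be sketched like this (column names follow pointblank's `small_table`):

```r
library(pointblank)

# Fails: the full table contains NA cells somewhere
small_table %>% rows_complete()

# Passes: these four columns contain no NAs anywhere
small_table %>%
  rows_complete(columns = vars(date_time, date, a, b))
```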

Great. And we've got a few functions here which end with `_match`. They validate whether some aspect of the table, as a whole, matches some sort of expectation. So let's go through these again. `col_schema_match()`: that's column schema matching. `row_count_match()`: we're expecting a certain number of rows; either you give a literal value, or you compare against another table. This is great for things like joins: you have the original table, you're doing a join on that table, and you want to make sure that you have the same number of rows in the end, because some joins will add or drop rows. So this could be good for that sort of thing.

Same with `col_count_match()`. Maybe you have an expectation of how many columns there are after an operation, or just how many columns you expect in the initial data. You haven't seen it in a while, and it changes regularly; you want to say it has this many columns. `tbl_match()` is a bit of a strange one; maybe you won't reach for it very often. It's basically: does the table match some other table, exactly? Maybe you want that, maybe you don't. There are probably good reasons to use it, and it's a good one to have.

OK, let's look at `row_count_match()`. In this case, with `small_table`, we're expecting 13 rows in that table. And it's true, because it didn't error here. We took the data and passed it to this; if it didn't have 13 rows, you'd get an error. For columns, I said before we can compare against another table. It just so happens that `small_table` has the same number of columns as the penguins dataset in palmerpenguins. I'm just showing this as a slightly odd example, but it shows that you can use another table: it'll fetch the count from that table and use it here. So I'm running that, and it gives us `small_table` again, because the column count matches.
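A sketch of both count checks, assuming the palmerpenguins package is installed:

```r
library(pointblank)

# Expect exactly 13 rows; errors if the count differs
small_table %>% row_count_match(count = 13)

# Compare column counts against another table
small_table %>% col_count_match(count = palmerpenguins::penguins)
```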

Great. I'm going to check again for questions, because I like to do that. Table match versus the diffdf package? Not quite as powerful, but it will tell you. Basically, what happens is we count the columns, we count the rows, check the whole schema, and then check the values, and that's the order we do it in. If it errors at any point, we tell you what occurred. But we don't get deep into it; we don't report individual value differences. So it's not as detailed as the other package. It's like diffdf lite, I guess you could say.

Oh, and there's a nice, more obscure suggestion: a check for haven's tagged NA values (something like a col_has_tagged_na function). That looks fantastic, and I'll make a note of all this for future feature requests. And there's also, OK, here we go: maybe you can explain how to write my own validation functions.

We can do that. Let's scroll back up here. With `specially()`, you provide your own function, and I think what it expects is a vector of logical values, which is used for the reporting.

Or you can have a table where the final column is logical. As long as the function that you provide to `specially()` produces one of those, it'll work within the framework, especially the reporting framework. It needs some way to say how many test units you have and how many passed or failed. So you can definitely use this and supply any function.
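A minimal sketch of a custom check via `specially()`; the function shown (one logical test unit per row of column d) is an arbitrary example, not one from the workshop:

```r
library(pointblank)

agent_s <-
  create_agent(tbl = small_table) %>%
  specially(
    # fn receives the table; returning a logical vector yields
    # one test unit per element
    fn = function(x) !is.na(x$d)
  ) %>%
  interrogate()
```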

Getting data extracts from failing validations

OK, so now I'm going to show you what to do when things fail and you want to know more. So you can use those CSV buttons in the validation report. That's great. That's more for other people. Like if you're sharing this HTML file, which has a report, they can just click on the CSV button and get the right extracts of things that failed.

But you have the program right in front of you. You have pointblank, and we can do more. We can actually use the `get_data_extracts()` function, which does a lot more. Basically, it gives you those extracts in table form.

So `agent_3`, which is up here, the very last one, has some failures. So we have some CSV buttons. And every time you see a CSV button, that means there are, of course, rows that failed. And you can use this function right here, `get_data_extracts()`, to get the specific values.

If you just use it on the agent itself, like this, what we get is a list. It's presented a little bit in this way inside of a notebook right here. But this is essentially a list of tables, of whatever data frame type you passed in.

So this is good, and you can peel off the table you want from the list. Or there's the argument i: you provide a step number, and that step corresponds to the step in the report table. In this case, we're not getting a list; we're getting the actual extract of the table where there are row failures.

So I'm going to link this back; I'm going to scroll back a little bit. So this is step 9. I'm going to take a look and see what this was. Scrolling back: this is `col_vals_in_set()`. So we expected that values in F should be in the set of low and mid.

So if you look at column F in this extract table, we see we get nothing but high. So these are the failing rows; none of them are low or mid. So that's how we get this. Again, for small tables, you can easily inspect this. But as tables become huge, obviously, you're going to want this to see what the problems are.

So, just to wrap up this part: you can use the entire agent object to get a list of tables, or you can use specific values of i, which correspond to steps from the validation report table.
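Both ways of pulling extracts, sketched (the agent name and step number follow the transcript's example):

```r
library(pointblank)

# All extracts at once: a list, one element per step that had failing rows
get_data_extracts(agent_3)

# A single step's extract, by its step number in the report
get_data_extracts(agent_3, i = 9)
```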

Sundered data

So we have failing data. Another way to get at it is something called sundered data. Basically, it's just a split of the original table into good rows and bad rows. Bad rows are ones that have some failure within the row from any validation step, and good rows are those rows that have no failures at all within their cells, from any validation step.

So it's kind of like the more validation steps you have, the more potential there is for failures and more potential for bad rows.

`get_sundered_data()` is the function where you can get these pieces of the table. So let's actually use that. Let's make a new agent right here. We're getting small table again; I'm giving it a table name, small table again, and a little description. Just two validation functions here: values are greater than some value, and values are between two values.

But here's something kind of cool. In this case, I'm using `vars()` and I'm saying that values in column c should be between neighboring values of column a and column d. So it can use values in columns, like they're beside each other, and in this case we're not using literal values. And of course you can put a literal value here and a column value there, so you can mix and match whichever way you want.

Another cool argument, since I'm bringing this up, is `na_pass`. What I'm saying here is that if any of these columns have NAs, we're just going to say that the row passed; we're going to excuse that row. By default this is FALSE, but we can just say NAs count as a pass.

So I will run this. Great. And now we see the report. Two validation functions were used, so there are two steps. We have some failures. We didn't set any action levels right here, so we have nothing here but dashes. And this is a light green, which means that there were some failures, but no thresholds were exceeded, because there's nothing set here. Okay, so we have some failures, but we don't know where those failures are. Say, for instance, we want to use that data and just filter out the stuff that failed, no matter which step it was. We can do that with `get_sundered_data()`.

So I'm going to run this. We're taking the agent and passing it right to `get_sundered_data()` with no arguments, just by itself. And what we get here is the pass piece by default. So it must've been that, for these two validations, everything went fine for those rows.

So we can maybe just prove it a little bit. Column d is always greater than a thousand; we sort of see that. Yeah, that checks out. Okay, what's the other one? c is between a and d. Okay, so c in this case: yeah, a is smaller, d is much larger. If these are smaller, we're fine, and these pass.

This is the good set of data, but we can also get the complementary data piece: all the rows that failed. Basically, it will be the other rows. So I'm going to use `get_sundered_data(type = "fail")` and run that right now. Okay, so this is eight rows, and this is five rows, and in total we have 13 rows. So it's always a split of the data in some way. If you add these rows up together, they should of course add up to the count of the original number of rows.

So you can do that if you want the failed piece. Another thing you can do is get a combined dataset. What that does is it adds a flag column. Well, better just to show you; I'm going to run that right now. So we get this new column right here called `.pb_combined`, and it has either pass or fail, for whether all the values in a row passed or whether there were some failures somewhere within the row. And the labels within `.pb_combined` are flexible; you can set what they are.

That's the `pass_fail` argument: you can provide a vector like TRUE and FALSE, and if you run that, we get TRUE or FALSE instead of the text, which is pass or fail. Or you can provide numbers, in this case zero or one for pass or fail, which just saves you an additional transformation because it does it here for you.
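The sundering workflow above can be sketched roughly like this (again using pointblank's built-in `small_table`; the exact validation values are illustrative):

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(columns = vars(d), value = 1000) %>%
  col_vals_between(
    columns = vars(c),
    left = vars(a), right = vars(d),  # column-to-column comparison
    na_pass = TRUE                    # rows with NAs count as passing
  ) %>%
  interrogate()

get_sundered_data(agent)                  # the "pass" piece (the default)
get_sundered_data(agent, type = "fail")   # the complementary "fail" piece

# Combined: the original table plus a `.pb_combined` flag column,
# here relabeled to TRUE/FALSE instead of "pass"/"fail"
get_sundered_data(agent, type = "combined", pass_fail = c(TRUE, FALSE))
```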

Q&A: sundered data and the X list

Again, I will check for questions now just to make sure. Can sundered data that failed also have a list of failures? No, I don't think it has that. Basically, it's kind of take it or leave it at this point: there's some error, but it doesn't actually tell you which errors occurred where, and there could be multiple in different cells. There's no way to visualize that, or even find out, right now.

Now, before we end off this part, I want to show you another thing you can do, because there's lots of information: you can get something called an x-list from an agent. And really, it's just a giant list. Better to show you than to just talk about it, because it's just right here.

So use that function on the agent, and printing it gives you this display, because it's actually a lot of information; it's a list full of information. It's basically all the metadata from the interrogation: the time of start, the time of end, things like labels and descriptive things, the table itself, and really the important thing is the results here, like the step, some more metadata, and right here are the values of how many test units there were, how many passed, how many failed, and the fraction of test units which passed or failed.

Then we have conditions right here, and a few other things. So let's take a look at this; let's use one of these. Let's use `$n` from the x-list and run that. In this case, we see a vector; these are test units. It's a little bit strange because we know this is step one and this is the last step, but we just get these values, 1 and 13; basically, the position shows you the step number. Let's look at `$n_passed`; again, it's going step by step, starting at one. So step one had 1 passing and step two had 11 passing, and so on and so forth. The fraction is kind of the same.

Great. So this is the fraction of test units that passed at each step. Okay, great. So you can actually take this and do your own thing with it. If you want to do something aside from the report you get, you can make your own report. So in this case, we're making a table: we can just pass in the x-list, some vectors, and get ourselves a table. So: steps, whether we have warn, stop, or notify, and then we can move on with that and do other things, presumably.
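A small sketch of pulling per-step results out of the x-list (the agent setup here is an assumed example using `small_table`):

```r
library(pointblank)

agent <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(columns = vars(d), value = 1000) %>%
  col_vals_in_set(columns = vars(f), set = c("low", "mid", "high")) %>%
  interrogate()

x <- get_agent_x_list(agent)

x$n         # test units per step (position = step number)
x$n_passed  # passing test units per step
x$f_passed  # fraction of test units passed per step
x$warn      # TRUE/FALSE per step for the `warn` condition
```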

Emailing validation reports

Another cool thing you can do is email the interrogation report. And we do that with this function called `email_create()`. We have a package called blastula, and it works well with pointblank. So if you ever want to run these validations in production or in CI, and then notify yourself or notify someone else, you can do that with this. So I'm going to show you what agent 3 looks like when you pass it through `email_create()`.

It's actually going to appear here on the side, so I can open that up. What we get here is an email message body with some text saying when the validation was done, how many steps there are, and sort of a mini version of the validation report. It's missing some of the columns, but this is meant to fit within an email, so it's going to be kind of small. Great, so that's what you get there.

And the idea is that when you have that, you could use this sort of construction right here. Say, for instance, you had some sort of test failure with notify. If any of these results from the x-list are TRUE for notify, then you can basically take that email and send it off through blastula like that.
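That construction might look roughly like this; the recipient addresses and the stored credentials key are hypothetical placeholders, and `agent` is assumed to be an already-interrogated agent:

```r
library(pointblank)
library(blastula)

# Build the email body from the interrogated agent
email_object <- email_create(agent)

# If any step tripped the `notify` condition, send the report
x <- get_agent_x_list(agent)

if (any(x$notify)) {
  email_object %>%
    smtp_send(
      to = "data-team@example.com",            # hypothetical recipient
      from = "validation-bot@example.com",     # hypothetical sender
      subject = "Data validation: notify condition triggered",
      credentials = creds_key("smtp_creds")    # assumes stored credentials
    )
}
```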

Customizing the agent report

And speaking of reports being customized or changed, you can actually customize reports yourself a little bit. We have a function called `get_agent_report()`. Normally when you have an agent, you can just print it and it'll provide the report, but if you use `get_agent_report()`, it'll do the same thing while giving you a chance to use some options. For instance, we can change the title. I will run that and show you; in this case, it's the third example, and we're using Markdown here, and it just applies that Markdown to the title. So that's one option: changing the title.

You can do things like arrange steps in certain ways. In this case, "severity" is a keyword we can use. If I run that, it'll put the most severe steps first and then go down to the less severe ones. So that's a nice reordering; it shows you what failed first.

We can do that and also just keep the failure states. So basically, in this case, we're cutting out everything that was essentially green. Great. Last thing, you may not need it, but you can always change the language of the agent report, with the `lang` parameter; we have quite a few of them. So if you run this, it'll just change the language, and there are quite a few languages supported. All the text will just change to suit that language.
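The options mentioned above, combined into one call (a sketch, assuming `agent` is an interrogated agent; the title text and language choice are arbitrary):

```r
library(pointblank)

get_agent_report(
  agent,
  title = "**Validation** of `small_table`",  # Markdown is supported here
  arrange_by = "severity",                    # most severe steps first
  keep = "fail_states",                       # drop the all-green steps
  lang = "de"                                 # German report text
)
```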

So that was basically it for the first QMD file of this workshop. I want to provide maybe 10 minutes; if you have the environment set up, I really encourage you to play with some of these cells. Change a few things, maybe insert your own data. And I'm thinking of taking a long break, maybe a 10-minute break, up until five to the hour; it's more like an eight-minute break at this point. I'll be here to answer questions, obviously. So it's a mini Q&A period plus experimentation time.

Q&A: preconditions, database support, and design philosophy

All right. Thank you, Rich. And I'll keep an eye on the questions as they come in. Great first session; I'm really excited to learn even more of what pointblank's capable of.

Thanks. And there's actually a question here, so I'm going to answer that: is it possible to manipulate the data in the cells of a table before running a step, for example, is one value bigger than another if a table contains a 95% CI? I'm not going to read the rest, because the answer is yes. We didn't go through it because otherwise this workshop would be very long, but there's actually an argument in each of these functions which constitute a step.

I'm going to go to one of these. It's called preconditions. I'm looking for autocomplete, but it's not completing; I'll just go to the function itself. One sec; I chose the one function that didn't have it. Let's go back up. So the col_vals_*() functions, really most of them except for a few, allow you to change the table.

Preconditions. Okay, perfect. So in this case, I'm going to just move that away. You can provide an expression, and you can mutate the table before proceeding with the validation step. And this is only for that step; it's isolated within that step. So you can add a column which has some value you want to check against; you can have multiple columns. And then any values you provide for columns and value will work on the table that has been mutated.

So you can add a column here which exists only in the mutated table, say column x. It's not available in the original table, but it is after this function is applied to the table. Okay, so that's how you do it: you use formula syntax for that, or you can provide a bare function. So basically, you have some way to mutate your table, to shape it to what you need to make a validation work.
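A minimal sketch of this idea, using `small_table` and a made-up derived column `x` (the sum of columns a and c is purely illustrative):

```r
library(pointblank)

# Validate a derived column `x` that only exists after mutation;
# the mutation via `preconditions` is scoped to this step alone
agent <-
  create_agent(tbl = small_table) %>%
  col_vals_gt(
    columns = vars(x),
    value = 5,
    preconditions = ~ . %>% dplyr::mutate(x = a + c)
  ) %>%
  interrogate()
```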

I have a question, Rich, if you don't mind: are DuckDB databases supported in pointblank? They are, yeah. Nice. So basically anything, or most everything, in dbplyr and dplyr will be supported; it's used as the backend. Even databases like MySQL and SQLite are supported, and Postgres is in there. And we do a lot of tests to make sure that it does work, so we do a bit of verification on that end. Basically, as dbplyr gets better, this will get better in working with it.

Okay, so some questions about philosophical and/or design differences between this and assertr and validate. Yeah, I've taken a look at those packages. Basically, these are very similar, except the big thing I wanted to design for was having reporting that was publishable. So this is a good fit for sharing with people that don't use the program, or, you know, people who are allergic to console output, things like that. And you don't have to create your own reports; we have something serviceable that is available right away.

And, yeah, that's, I think, the biggest distinction; also the composable nature of mutating data and then handling failures and reporting. Say, for instance, you provide a column that doesn't exist: the report doesn't fail. It just keeps going; it just mentions that there's a problem you should address. So those are some of the things: we want to make sure that things generally work, but give you more information to fix things up if it has to do with a user problem, not so much a data problem itself.


Summary of the validation workflow

Basically, for data validation in pointblank, you need an agent, a set of validation functions, and then that last `interrogate()` function call. Would it have to collect all the NYC taxi data if only a couple of columns are getting validated? Nothing is being collected; everything's being done on the database side, for everything. There should be no transfer of large datasets to your machine. I hope that answers the question: because we're using dbplyr, we don't collect until the very end, and then we collect a small amount of data, which is basically just the tally of results, which is not very large. It's condensed down before we actually pull anything back.

Okay. So the agent creates a report that tries to be informative and easily explainable, as in those reports you've seen multiple times. We can set data quality thresholds with action levels; there can be default data quality thresholds and step-specific thresholds. So basically, you can use the `action_levels()` function, or the object created from it, within certain steps or globally within `create_agent()`. There are 36 validation functions, and they have a similar interface and many common arguments. They can be used with an agent or directly on the data, and directly on the data means that you pass data through, or you error based on data quality test units failing.

We can get data extracts pertaining to failing test units in rows of the input dataset with `get_data_extracts()`. It has an `i` argument if you want to match it to a step; that way you just get the table itself and not a list of tables. There's the option to obtain sundered data, which is the input data split by whether cells contain failing test units, and you can get either the passing piece, the failing piece, or a combined version.

There's a large amount of validation data available with that `get_agent_x_list()` function. It creates a list, and the print method shows you what's in the list, basically ways to access it and also how many objects are in each one of those. If you want to email as a result of validation, say when you're running this on a schedule, you can do that with `email_create()` combined with functions in the blastula R package. And finally, there are some customization opportunities with `get_agent_report()`. Normally you just print the agent itself, but you can use this to give it some options for display.

Scan data

Great. So I'm going to move on to the next QMD file in our set of QMD files from this workshop. And that is about scanning data and also using a function called `draft_validation()`.

Okay, so scan data. What this is, is basically just a large description of a dataset. You run one function, `scan_data()`, and give it a table; it provides an HTML report in the end with many different sections. You can cut it down or reorder it just by using this `sections` argument right here. Each of these letters corresponds to a section: overview, variables, interactions, correlations, missing values, and finally sample. In this case, I just want the overview, variables, missing values, and a sample of the data.
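A one-liner sketch of that call, using `small_table` as a stand-in for the workshop's dataset:

```r
library(pointblank)

# O = Overview, V = Variables, I = Interactions, C = Correlations,
# M = Missing values, S = Sample; inclusion and order follow the string
scan_data(small_table, sections = "OVMS")
```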

So I'm going to run this, and I'm not quite sure where it'll appear, either on the right or below, because it's been ages since I used this way of working. But we see here it's going; it'll take some time, usually depending on how many columns you have. But now it's done. Okay, so now we're taking a look at the table scan. If you click here, you get shortcuts to different sections, or you can just scroll down.

We have the overview: basically the dimensions of the table, some information about the processing of the table and... oh, it doesn't make it easy to scroll down. Jeez, I'm going to use this: variables. There we go. I'm going to try my best here to scroll down without the IDE fighting me, but we see here one column. It's a character column; we're showing just some basic data about the column, and toggle details, this is where some good stuff is, and I may not be able to get to it.

I'm going to actually take this and pop it out into a browser or somewhere else; anywhere else is good. Okay, there we go. This is much better. So if you click on this, we get some details. You can see things like, well, this is a character column, so we see common values, that's just the way it prints them out, and string lengths for the column, starting with the plot there.

Let's look at something that's more like a measurement; let's look at a column in depth here. There we go. So we see some stats about these values. Kind of cool: you can observe outliers here really nicely. So this is a little bit of pre-data-validation data quality: you just start scoping things out and really understanding the data. You see common values; it's not so great for numeric data, but it might be good for things like categorical data. With the min/max and max/min slices, we can sort of see frequency counts of different values.

But the really cool stuff is in the first one, the descriptive stats tables. And we've got a few more here, with some delta values here as well. And a cool thing here, though it's not really labeled: this is missing values. If it's all blue, there are none; this is the top of the table, this is the bottom, and it's sliced up into different sectors. We see here that we don't get missing values, because it's all blue in these columns, everything from this column here to the date_egg column.

But we start to see missing values in certain parts, you know, right here in this sector and also at the bottom; there's some fraction of missing values for these columns. We have even more missing values distributed throughout. And what makes lots of sense is comments: these are totally optional, it seems, so there would be lots of missing values. Basically, when there are comments, they're pretty sparse. But it's a cool way to understand the data: if you have just one missing value, this won't be blue. So, out of really large tables, you can flag really quickly that there are some missing values.

And then finally, there's the sample table. Not much here, but basically this is just a gt table: the first five rows and the last five rows of the entire table, and you can scroll across to see things and just get a feel for what the data is. In the sample table, we sort of see here, if you have really good eyes, that there are missing values, but it gives you the actual row numbers at the end, so you even have a sense of the size of the table.

Great. So that's the table scan. It's kind of cool; you can publish this as well. I've done it in the past; it's basically just a chunk of HTML in the end. And yeah, some additional instructions here: basically, the sections can be reordered. You can use a string like this, with a capital letter for each of the different parts, and you can omit some things. That's a way of customizing it.

I was going to show you something here too. Let's look at some pharma tables; let's look at CDISC data. Here we go. So in this case, I'm getting the adverse events table, and I just want the first few sections, which are overview, then sample second, and then variables last.

And I do apologize, it does sometimes take a little bit of time. Also, because we're on the cloud, it takes even longer, because these instances are not very powerful. But I'll just talk over this part and get back to it. And again, just like the validation reports, you can have them in different languages. Oh, it did pop out, so that's great. There are certain things I want to show you here; I'm going to pop this out as before, because this has some really cool features.

So again, overview, this time the sample is in the middle, and then down here we have this cool thing: labels. They get shown here too. This is a labelled dataset, so the labels just appear right inside this report, which is kind of cool. And there are lots more columns; that's why this took a long time. I believe it uses ggplot2 for this, because there are plots in some of these; we'll see these character value ones and their string lengths. This takes a little bit of time to process, so that requires a bit more waiting. But kind of cool: it does scale to lots of columns, you just have to wait a bit.

Great. Okay, I'm not going to show the other one, but I'm going to go to the next section here. Well, actually, I won't skip this: you can export this to an HTML file. There's a function called `export_report()`. This works for all sorts of objects within pointblank: it could be a validation report, it could be this thing, it could be the next thing I'm going to show you later. There are different types of reports, and exporting them can be easily handled with this one function called `export_report()`. You provide a file name, you provide the object, and it does the thing that you want: it writes that to disk.
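For instance, a sketch of exporting both kinds of objects (the file names here are made up, and `agent` is assumed to be an interrogated agent):

```r
library(pointblank)

# Export a table scan to HTML
tbl_scan <- scan_data(small_table, sections = "OVMS")
export_report(tbl_scan, filename = "small_table-scan.html")

# Works the same way for an agent's validation report
export_report(agent, filename = "agent-report.html")
```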

Draft validation

Okay. Now, I showed you this before: these validations right here, these validation workflows with lots of steps. I'm going to scroll up till I get to one which has lots. Here we go; great, all this stuff here. So I'm going to get back to that. But I realize that this could take some time, and maybe it's a bit discouraging when you start off with the package: you have to understand what these all are, you have to refer to the documentation quite a bit. It's a little hard to get started initially. But we have something for that: we have a function called `draft_validation()`.

What it will do is generate a draft validation plan. Basically, it'll just write a plan for you, into a new R file, using an input data table. Just like we did with `scan_data()`, you pop in the input table and away it goes. But with `draft_validation()`, the data table will be scanned to learn about its column data, and it'll provide you a set of starter validation steps. So a plan will be written for you. Let's look at storms from dplyr; okay, quite a few columns.

And what I'm saying is that `draft_validation()`, if you include that, will look at it and make a file for you.

We have to do this thing here, which is a little bit strange: we have to provide formula notation, this sort of tilde, initially, because we're trying to be lazy with it. It's really an expression for getting the data. So you can do more things with this if you want, like mutate or select if you wanted to. But we have to do this to make it work.

So I'm not going to run it, because it's already been run, actually. Storms validation: it's right here. Or I should say it was there; here it is. I'll pop that aside. So it creates this.

This is a brand new R file, and it generates `library(pointblank)`, which is what you need as a minimum. We fetched the data through the dplyr storms table; it can also take an expression like this to get the data, instead of just having it inline. You can provide a recipe for getting the data, essentially, this way.

And it even adds action levels by itself, just to show you it's possible. It's just a template thing; it doesn't mean much, you didn't provide that, but it does it to show you that the functionality exists, essentially. It does tell you things like, you know, that the validation plan was generated by this function. And a cool thing is it provides this; we don't have it updated yet for native pipes, but that's a thing we'll do pretty soon.

It does provide comments between each of these, and provides these with values that work, essentially. So if you were to run this, it would essentially run, and it basically just uses the limits of the data for lots of these between checks. And then at the end, it does a really nice thing as you get towards it: it does `rows_distinct()` and `col_schema_match()`. It calls up the `col_schema()` helper function to define the schema for the table, and it puts that within `col_schema_match()` because it requires this object.

And so if you run this check again with the table, and there's a change in the schema, this will flag that. Then it ends with `interrogate()`, and it just prints out the agent in the end. So it gives you this file, essentially, is what I'm showing you. Pretty cool; it's a good step.
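The generating call itself is short; a sketch, assuming the `filename` argument names the output file (dropping the `.R` extension):

```r
library(pointblank)

# The leading `~` makes this a recipe for getting the data:
# the expression is evaluated lazily, when the draft is generated
draft_validation(
  tbl = ~ dplyr::storms,
  filename = "storms_validation"  # writes storms_validation.R
)
```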

It's good when you have big tables and you're holding off on doing validations because it does take some time; this is a good way to speed things up a little bit initially. So I just want to show you that it's ready to run: all validation steps run without failing test units. That's the promise, at least; I've run it on multiple datasets to make sure that it seems to do the right thing.

It even knows about certain things like latitude and longitude columns, just by sniffing the column name and some of the content within it. So it does a few extra things.

So basically, all I want to say for this file is: it's a great idea to examine data you're unfamiliar with using `scan_data()`, because it can inform your data quality checks. And, speaking of which, the `draft_validation()` function gives you a really good quick start for data validation, because it scans your data, but in a different way, to create a file.

Expect and test function variants

So now I'm going to go to the next document here, the expect and test functions QMD. This is all about using some cousins of the validation functions, their variants. They all begin with `expect_` or begin with `test_`, but they have the same names. So basically, for all those 36 functions you saw, you're just glomming `expect_` or `test_` onto the front, and they have different functionality. I'll show you what that functionality really is.

So let's start with expect. The `expect_` prefix indicates that those functions are to be used with unit testing, like testthat. If you've ever used testthat, a lot of its functions begin with expect.

Another one is the `test_` prefix. Those variants of the functions will give you either TRUE or FALSE and nothing else; they produce logical output. So this is great for conditionals, or programming with data. Say, for instance, you don't want to carry on down a certain programming path based on some data quality issue: you can do that, or you can redirect to something else, probably a message or some sort of failure or what have you. It gives you options for programming with data.

So let's first look at the expectation functions, the ones that begin with `expect_`. The testthat package has a collection of functions beginning with expect. These functions here follow the same convention, and they can actually be used within the standard testthat workflow. You just provide a `test_that()` block with some name, then provide pointblank's expect functions; you can mix them with testthat's functions as well. And it works fine with testthat's reporter.

But, as opposed to what you do with testthat, we're testing data instead. So let's look at our table here, small table, and we want to test the values in column c. These aren't really trustworthy column names in this table, but that's what we have. So we have these values within that column, and say, for instance, we always expect that those values are between zero and 10. NA values, those are fine; we're going to permit those, we're going to pass those. So we can use `expect_col_vals_between()`. It's just like the `col_vals_between()` we saw before, just with expect in front of it. So we can run that.

Okay, so `na_pass` is TRUE because we're fine with NAs passing. These are the left and right values, and this is the column we're checking. Okay, run that. Oh, we see nothing; nothing happened, which is actually good, because in testthat, expect functions do nothing until they fail.

Okay, let's try something that fails. We're doing it with a different range, and obviously not all the values are going to be within this range. So I'm going to run this. Great, we get an error. It's just like the error you would get if you used `col_vals_between()` directly on the data; it's very similar to that, except this works within the testthat framework. That's the key difference here.

Okay. So if you're doing testthat, use `expect_col_vals_between()`, or there are lots of options: you can use the test variants to get TRUE or FALSE, and then you can always use testthat's `expect_true()` or `expect_false()`. So there are a lot of ways to do it.
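The two variants side by side, as a sketch using `small_table` (the 0-to-10 range matches the example above):

```r
library(pointblank)

# Expectation form: silent on success, fails loudly under testthat
expect_col_vals_between(
  small_table,
  columns = vars(c),
  left = 0, right = 10,
  na_pass = TRUE
)

# Test form: returns a single TRUE or FALSE instead
test_col_vals_between(
  small_table,
  columns = vars(c),
  left = 0, right = 10,
  na_pass = TRUE
)
```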

So just like there are 36 regular validation functions, we get 36 expect functions and 36 test functions. One thing that you can do, and this is a little suggestion: you can use `draft_validation()`, which we saw before, to generate a validation plan with the data as the primary input. And then we have another function called `write_testthat_file()`. It'll create a testthat .R file using the agent from the draft validation file.

That is actually right here. I'm using game revenue; it's used in a similar way as the other one. Use the tilde in front, give it a file name, and it'll actually create this testthat file. And it says right here: generated by pointblank. It runs a library statement, it loads the data, and then it creates a number of individual `test_that()` calls to wrap up these expect functions in the end. And because it begins with test, you can run tests right away, which is kind of cool. So if you have, say, a dataset in your package, you can test that dataset. Maybe it changes once in a while; you want to update it, and you want to make sure that things don't go out of your expected parameters when things get updated. You can always run this as part of your package checks, which is kind of neat.

So the test_ functions. I alluded to this before: they begin with test_, they match all the other ones, and they give us a single TRUE or FALSE. So, for instance, if you wanted a script that errors if there are missing values in the date_time column of the small_table dataset, we can write this. There we go. So if not `test_col_vals_not_null()`, it's a little bit confusing because we're using "not" twice, then we hit the stop() statement. I'm going to run this, and we'll see that we don't get the error at all. This one does. In this case, we're getting booleans from these two tests, we're negating both, and we're combining them with an OR. A little bit confusing, especially in a workshop, but I'm just going to run it, and it shows that because of these tests, we do have a FALSE passing through in this case, and we do get this: there are problems with small_table.
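A minimal sketch of that guard pattern, using a test_ function to halt a script when a check fails:

```r
library(pointblank)

# test_ functions return a single TRUE or FALSE, so they slot
# directly into control flow; negate to stop on a failing check
if (!test_col_vals_not_null(small_table, columns = vars(date_time))) {
  stop("There are problems with `small_table`.")
}
```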

So these are just variants of the validation functions. Each one is available as a test_ or expect_ variant, and again, you can validate tabular data in a testthat workflow. I do this sometimes for other packages. I've got a package called fontawesome. I'm always changing the dataset because there are always new icons being added, so I use these expect_ functions there to make sure that certain things are what I expect, that nothing gets too different, because I have functions acting on the data. So I definitely use these expect_ functions a lot. And the test_ functions are really good for programming. You don't have to use them; there are other ways to get TRUE or FALSE. We saw with the x-list that you can just obtain that vector of TRUEs and FALSEs, but this is not bad for a quick Boolean or logical value when you need it.


Again, I will check for questions. There is cucumber in addition to testthat. Yeah. When you really break it down, there are actually quite a few unit testing frameworks for R. And I hope that, with the options available, you can use these with a wide range of them.

Introduction to data documentation

I'm not going to take a break after this one because it was a short section. I'm going to go right into the next section, which is section 5. We're skipping 4 because this is only a two-hour workshop. And this is a pretty important section. Essentially, this is the documenting part of pointblank. So this is an introduction to data documentation. And it uses an entirely different set of functions, which are not about validation, but more about gathering metadata, explaining your data, and publishing what the data is. This is more aligned with things like data dictionaries and data documentation; it goes by different names. But I do think that a good thing to do, often, is to document our datasets. I'm reading this line right here. And we can do that through the use of several functions that let us define portions of information about a table.

Okay. So, again, I just can't let go of small_table; I'm going to use that here as well. So let's document the small_table dataset. It's easily available, right inside the package. Let's have a look at it again. Here it is, all of it. It's got these columns: date_time, date, a, b, c, d, e, f. Not the most exciting table to document, but as an example it's not bad; we can at least see all of it in front of us. There are only 13 rows. But really, when you're documenting data, you're not so concerned about the rows; you're more concerned with what the columns are, what the table represents, and how you use it. More description-type things than values. Values do play a part in this, and I'll show you that soon.

So, to start the process, we have another create function. This time it's called `create_informant()`. This creates an informant object, which is a bit different from the agent object, but it looks similar in terms of the way it's set up. We have tbl, tbl_name, and label, the same arguments as in `create_agent()`. I'm going to pass in small_table; the tbl_name is just a name that's going to be used in the report; and we're going to call the label "metadata for the small_table dataset". And as before, you just have to print the object to see it. Okay, so we'll run that now. This table is not as wide as the other one, so I don't have to scroll to the right constantly like I did before.
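A minimal sketch of that starting point (the label wording here is a paraphrase, not the exact text from the workshop):

```r
library(pointblank)

# Create an informant for small_table; tbl_name appears in the
# report header and label describes this documentation effort
informant <- create_informant(
  tbl = small_table,
  tbl_name = "small_table",
  label = "Metadata for the `small_table` dataset"
)

informant  # printing the object renders the (initially sparse) report
```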

And again, we can customize this. You don't always have to see "Pointblank Information" at the top; you can change that, and there's a function for it. But what it shows you is what type of table this is, the name that we gave it (small_table), and the dimensions at the top, because it's pretty important to know more about the table. This is all about table metadata, so we have rows and columns right away. Cool. And what we have here is another section; it really just has the columns: each column name in a box, with each column type just beside it. So that's what we start off with. It's not much, but it's something to go on. It's automatically generated information.

Obviously, the program knows nothing about what's actually in the columns or how to describe them; you provide that, and I'll show you how to do that next. So we have three functions here. They all begin with info_. `info_tabular()`: this is where you just add some text pertaining to the data table as a whole; you want to give an introductory paragraph, and it goes to the top. `info_columns()`: this is for adding information for each table column. Right now we see a lack of information, we have nothing here to the sides, and this is how you add information to each of these rows. And finally, `info_section()`: this is more of a free-form thing. It lets you add sections of text and provide ancillary information. So basically, you can add multiple sections, add text within those sections, and use Markdown. You can provide all sorts of information: the table users, stakeholders, contact info, whatever you want. You can add sections based on any metadata that is important, but they will be at the bottom of the report table at the end.

Okay, enough talking. Let's actually try doing this with some code. So I'm going to start over again: `create_informant()` with small_table, example two. In this case, I'm using `info_tabular()`. The argument is `description`. You can leave that name out; there's only one argument here, and we're just saying this table is included in the pointblank package. I've used Markdown here; you can see the two stars on each side, so this will be bold. Markdown is just assumed to be there. Then `info_columns()`, with date_time: I'm only describing the date_time column. And what I'm providing is `info`. Actually, that's not a fixed argument name; you can use any argument name you want, and it will basically be used as the label in what's written below. You'll see. So it will say "this column is full of timestamps", and that's how I'm describing date_time.

Okay, finally, this one is a bit more complicated: `info_section()`. You provide your arguments again: the section_name, "further information", and then within that section, "examples and documentation". And then we can write some multiline Markdown here. Okay, I'm going to run this. Great. I'm going to scroll down a little and see if I can pop this out, just so we can see it a little better. There we go. That's much better.
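Put together, those three info_ calls could look roughly like this (the text strings are paraphrased from the workshop, not copied exactly):

```r
library(pointblank)

informant <- create_informant(
  tbl = small_table,
  tbl_name = "small_table",
  label = "Metadata for the `small_table` dataset"
) %>%
  # Text for the TABLE section at the top; the argument name
  # ("description") becomes the subsection label
  info_tabular(
    description = "This table is included in the **pointblank** package."
  ) %>%
  # Text attached to one column; any argument name works as the label
  info_columns(
    columns = vars(date_time),
    info = "This column is full of timestamps."
  ) %>%
  # A free-form section appended at the bottom of the report
  info_section(
    section_name = "further information",
    `examples and documentation` = "Examples can be found in the package documentation."
  )

informant
```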

Okay. So, this is info_tabular(): the text I provided under description, "the table is included in the pointblank package". I'm going to scroll up so you can see where it comes from. Description: that's where this label comes from, and this bit of text is right here, rendered as HTML from the Markdown. Columns: the first column is date_time. We happened to provide that in info_columns(), with `info`, which is right here. Basically, this argument name becomes this label, and "this column is full of timestamps" is this text right here. So you can associate metadata with each of the columns. Great. And then, finally, info_section(): that lets you add sections at the back. It's a bit multi-tiered, because we have one call of it, and this is a real argument called section_name. We called that "further information"; that's right here. And then within that, we can have multiple of these, but we just chose one: "examples and documentation". That's right here, and then the large piece of text we have right here. And this is a link, because it's all just Markdown, and this does work, because it's using commonmark to render that. Okay, I'm going to close this up. So that's a very basic way of doing this. And, of course, we can go further; we can describe each of the columns.

But I want to keep going a little more on what you use these for. So, info_tabular(), again, is for text at the top, in the table section. Use named arguments to define the subsection names; it can be "description" or something else, and you just put in some text. Some ideas: a high-level summary of the table; what each row of the table represents; the main users of the table; a description of how the table is generated; information on the frequency of updates. These are just suggestions, but they're pretty good.

Okay, I'll get a little more into info_columns(), though, because that's where most of the work should be done: you want to describe each of the columns. So, create_informant() is how you create the report, and info_columns() is for adding information. To add information to the columns, you'd have a separate call for each column; with info_columns() itself, there's no way to provide information in bulk. There is a way with a separate function, but not with info_columns().

Okay, so let's try this with a much more interesting dataset: the penguins dataset. This is a very long bit of code, but basically what I'm doing is creating the informant using penguins, providing some information at the top (this is a label), and then for each of the columns I'm providing a description, using Markdown wherever possible. And another cool thing, you don't quite see it yet, but you might notice that I used `ends_with("mm")`. All these columns here end with "mm". And if you use info_columns(), all the text you add is additive: it will just keep appending text to the back, so the order is essential here. Basically, these columns right here have different bits of text, but they all pertain to some sort of units, millimeters in this case. So we can actually use this: anything that ends with "mm", which is these columns right here, will get this extra bit of text in parentheses, "in units of millimeters". Kind of cool. So let's run that, so we can see it.

Okay, I'm going to scroll down. I'm going to learn my lesson and just make this into a separate window, expand it, and zoom a little. Great. So we add the pieces of text, and the new thing I want to note is that this "in units of millimeters" wasn't typed three times, once for each of these; it was added to multiple columns at once with a tidyselect statement. Because each time you call info_columns(), it keeps appending text to the same key at the end, we can add common text to multiple columns. So that's kind of cool.

Okay, scroll down. So, again, tidyselect: we can use all the helpers. The most useful ones are `starts_with()`, `ends_with()`, `contains()`, `matches()`, and `everything()`, in case you want to add some piece of text to every column. Pretty useful when the text is common to, or the same across, multiple columns.
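A small sketch of the additive behavior, assuming the penguins data is available (e.g., via the palmerpenguins package); the description strings are illustrative:

```r
library(pointblank)
library(palmerpenguins)  # assumed source of the penguins dataset

# info_columns() text is additive: a later call with a tidyselect
# helper appends to whatever each matched column already has
informant <- create_informant(tbl = penguins, tbl_name = "penguins") %>%
  info_columns(vars(bill_length_mm), info = "A number denoting bill length") %>%
  info_columns(vars(bill_depth_mm), info = "A number denoting bill depth") %>%
  # Appended to every column whose name ends with "mm"
  info_columns(ends_with("mm"), info = "(in units of millimeters)")
```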

Okay, so info_section(). That's a bit of a bigger function, in that you can add multiple sections at the end of that informant table. This is information that doesn't fit at the top, in the table or column sections, so it's almost like additional or reference information that's important. Some of it might be source information, for instance, citations for the papers the dataset was involved with. So I'm going to take that same object, add on info_section() with the section_name "source", and then provide this multiline Markdown. Okay, I'm going to run that and scroll down.

There we are. Okay, so this hasn't changed, but this has: now we have references appearing here, and they're all nicely formatted. These DOI links do what they're supposed to; the links are nicely formatted because of some internal styling we do. That's great. And the note here is just an additional piece of text we put in. So you can do anything you want with this last function, info_section(): as many sections as you want, as much text as you want, for things that don't need to be up front and don't pertain exactly to individual columns. Great. We'll close this.

Okay. And then, yeah, basically I'm going through some ideas about other things that could go in the back. I'll just run through them, because why not: info related to the source; definitions and explanations; persons responsible; further details on table production; important issues with the table; notes on upcoming changes; links to other information; report-generation metadata, including things like update history and persons responsible. If you can do it with Markdown, then you can do it here.

Okay. And I want to say, as before, you don't have to see that title at the top, which says "Pointblank Informant". You can get rid of it or change it. You can use `get_informant_report()`, and you provide options there, mostly the title, and you can change the width of the table a bit so it fits whatever main document you're putting this into. This is kind of like a keyword; it's all in the documentation for get_informant_report(). You can include some static text, or this keyword with colons on each side, `:tbl_name:`, so the title just becomes the table name. Let's actually run that right here. It just says "penguins", because that's the dataset we're using; the table name is hoisted up as the title of this whole report. Great. And you might notice that it's not as wide, because it's now 600px. And, as I've mentioned before, we have this function that does a lot, called `export_report()`. You can use it with scan-data reports, you can use it with the validation report, and it gives you an opportunity to take that object and make HTML out of it, so you can embed it anywhere else you want. So that's export_report().
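A sketch of those two finishing steps (the output file name here is hypothetical):

```r
library(pointblank)

informant <- create_informant(tbl = small_table, tbl_name = "small_table")

# Customize the report; the ":tbl_name:" keyword hoists the table
# name up as the report title
report <- get_informant_report(informant, title = ":tbl_name:")

# Write the report out as standalone HTML for embedding elsewhere
export_report(informant, filename = "small_table_documentation.html")
```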

Getting deeper into data documentation

And I would say there's definitely more to this than that. There's actually another .qmd which we hopefully want to get done. It's called "getting deeper into data documentation", and I won't waste any time getting to it, because it's actually pretty useful; it might seal the deal in terms of using this. Okay. So, there's more you can do. You can provide static text, or you can actually use parts of the data to change the text. That's the concept of snippets. `info_snippet()` is one function, and there are other versions of this, but let me show what it does, with some examples. I want to set the stage by saying that lots of information about the table could come from the table itself. If you wanted to show some categorical values from a column, you could take those from the data instead of typing them in yourself. Or if you wanted the range of values in an important numeric column, you could obtain that from the data. It could be KPI values that you calculate using data from the table. So you can fashion a function to get information from the table and include it in your documentation. And that's done with the info_snippet() function.

Let's look at small_table again; we're getting away from penguins. Because it's simple, it might illustrate our points better. Okay, very small table; we've got numeric values and things like that. So for column d, we could, for instance, make a small pipeline here to get the mean and round the value. Great. That just proves that we can take the data and use it. So let's do that. And apologies for the magrittr pipes; these examples don't use the native R pipe yet, but you could do it that way, too. The idea here is that you create an informant and then create a snippet: we're giving it a name, mean_d, and providing a function to get that value. I'll run that. Great. I'm not showing the report, because we're actually not done yet. We have mean_d.

Okay, but what are we doing with it? Well, we're going to insert it into some text. So the next thing we do is take that same object and use info_columns(). In this case, we're looking at column d, and we're saying "this column contains fairly large numbers, much larger than the numbers in column a". You have to write something. "The mean value is", and, check this out, in curly braces, `{mean_d}`. That corresponds to this right here. So we define what the snippet is, and then we can insert it with curly braces. Let's actually run that.

Great. Okay, let's keep going. I'm going to write the whole thing again, so we see it all together: create_informant(), then create the snippet, so we're getting some aggregate value from our data right here, calling it mean_d, and inserting it in a piece of text. And then, because we have to access the data, we have to use `incorporate()`. It's just like interrogate(): if we're doing any of this where we actually have to access the data, we have to call a function, and in this case that function is incorporate(). Okay, I'm going to run that. Kind of cool. It says some things in the console: incorporation started, there's a single snippet to process, information gathered, snippets processed, information built. That's encouraging. I'm going to run this now.
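The whole snippet workflow, sketched end to end (the descriptive text is paraphrased from the workshop):

```r
library(pointblank)

informant <- create_informant(
  tbl = small_table,
  tbl_name = "small_table"
) %>%
  # Define a snippet: a function applied to the target table
  info_snippet(
    snippet_name = "mean_d",
    fn = ~ . %>% .$d %>% mean() %>% round(2)
  ) %>%
  # Reference the snippet inside text using curly braces
  info_columns(
    vars(d),
    info = "This column contains fairly large numbers. The mean value is {mean_d}."
  ) %>%
  # incorporate() queries the data and fills in the snippet values,
  # much like interrogate() does for validation plans
  incorporate()
```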

Okay. Now I'm really excited to get down to the column I changed, which is d; that's the only column. Oh, of course, it's cut off. There we go. "This column contains fairly large values", blah, blah, blah, "and the mean value is", oh, cool, this value I got right here. Nice. "Which is far greater than any number in that other column." It doesn't matter exactly what I said; the key thing is that this value can change over time. Your table may change, but these values update. As long as you run the whole thing and use incorporate(), it will take that recipe and insert values obtained from the table, which is kind of cool for evolving tables. It's really neat: it means the documentation doesn't get stale, and you don't have to use manual values. And we have some variants: we built in snippet functions for common things. So we've got a few


functions available to make it easier to get commonly used text snippets. Okay. So, `snip_list()`: what it does is get a list of column categories. Nice. `snip_lowest()` and `snip_highest()`: get the lowest and highest value from a column; you can use those in a min/max type situation. `snip_stats()`: get an inline statistical summary. Kind of cool. So, let's see how those are used. Each of these functions can be used directly as the `fn` value in info_snippet(), and we don't have to specify the table, since it assumes the target table is the one we're snipping data from. So let's go back to a good dataset, penguins. We're going to get two snippets with snip_list(): basically, a list of values from the species column and a list of values from the island column. We want to know our islands.
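A sketch of the two snip_list() snippets, again assuming penguins is available via palmerpenguins (the info text is paraphrased):

```r
library(pointblank)
library(palmerpenguins)  # assumed source of the penguins dataset

# snip_* helpers can be passed directly as `fn`; they act on the
# informant's target table automatically
informant <- create_informant(tbl = penguins, tbl_name = "penguins") %>%
  info_snippet(snippet_name = "species_snippet", fn = snip_list(column = "species")) %>%
  info_snippet(snippet_name = "island_snippet", fn = snip_list(column = "island")) %>%
  info_columns(
    vars(species),
    info = "A factor denoting penguin species ({species_snippet})."
  ) %>%
  info_columns(
    vars(island),
    info = "A factor denoting the island in the Palmer Archipelago ({island_snippet})."
  ) %>%
  incorporate()
```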

Oh, and here's where it's used. Nice. And here's where the other one is used. Cool. Okay, let's see it. I'm running that, and the console said two snippets to process, all check marks. Now let's look at this; I'll zoom and expand. Very cool. I'm going to compare that against the text that we have. "A factor denoting penguin species", and then we said {species_snippet}. Cool. That's right here, and it got that list because of snip_list(). This can change over time; more species can be added, and this will just keep track of that. And we use the other snippet, {island_snippet}, right here: "a factor denoting the island in the Palmer Archipelago, Antarctica", and we have that list right here. It was obtained from the data, which is kind of cool. Great.

Nice. So, I'll close that. It also works for numeric values. Let's use snip_list() to provide a text snippet based on values in the year column. So, in this case, we're creating a snippet for year. I'm using the same object over and over, so it's going to add to the same thing, and I use incorporate() as well, which is really important if you use snippets. I'm going to run this. So the year should be put in as well. Here we go: info for year, "the study year: 2007, 2008, and 2009". Nice.

Great. So it can be used for numbers and also for categories and text. Okay, snip_lowest() and snip_highest(); I think you know where I'm going with this one. I'm going to create those snippets. Here's a cool thing: min_depth. I specified it down here, but I'm using it up here. So the order doesn't really matter in the end, which is nice; you don't have to worry about that. I just want to demonstrate that: the snippet definitions could be down below, or all gathered up in one spot, and it shouldn't affect the outcome. Great.
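A sketch of that ordering point: the text references the snippets before they are defined, and incorporate() resolves everything regardless (snippet and column names here follow the workshop's example; the exact wording is paraphrased):

```r
library(pointblank)
library(palmerpenguins)  # assumed source of the penguins dataset

# Snippets can be defined after the text that references them;
# ordering doesn't matter once incorporate() runs
informant <- create_informant(tbl = penguins, tbl_name = "penguins") %>%
  info_columns(
    vars(bill_depth_mm),
    info = "A number denoting bill depth, in the range {min_depth} to {max_depth} mm."
  ) %>%
  info_snippet(snippet_name = "min_depth", fn = snip_lowest(column = "bill_depth_mm")) %>%
  info_snippet(snippet_name = "max_depth", fn = snip_highest(column = "bill_depth_mm")) %>%
  incorporate()
```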

Now, we have lots of snippets, because we keep adding to this one informant object. There are six in total; it keeps the earlier ones, too. Okay, so let's print this out.

Great. We'll expand this. There we are. Okay, there we go. "Integer denoting flipper length, in units of millimeters. Largest observed is 231 mm." Great, because of this text right here; we just insert the number there. And this one for bill_depth_mm: "a number denoting bill depth, in the range of 13.1 to 21.5 millimeters." That is up here. Great.

So, yeah, kind of cool: your data can change, and this keeps up with those changes. You can do all sorts of other things, too. There are small conventions: links can be written like this, or with link text like that, and dates can be enclosed in parentheses and formatted differently, is what I'm saying. pointblank will try to find certain things and make them look a little different. So let's take a look at that in action. In this case, I'm putting a date within parentheses. Good stuff. And links, well, I don't know if you can see this; this is not much different from regular Markdown, but the dates thing is definitely different. Let's run this. Again, we're using the same object over and over, rewriting it and seeing the change. No new snippets, just more text. I'm going to run this.

Expand that out. Okay. So we see here there's a date; it just gets its own line. Sometimes dates are important, so that's what that does: if you include parentheses around a date, it'll do that. And here, some nicely styled links; the underline appears when you hover over them. And that's all I want to say about that, but it's kind of cool that you can do it.

Labels and styled text

Okay, labels. This is maybe getting a little far, but we can actually enclose bits of text in double parentheses or triple parentheses for different types of labels: a rectangular label or a rounded-rect label. So you can do things like this. Let's run it. No new snippets, same as before; it's good to see that it reports that. We just add more text, and we have to scroll to the very bottom to see it. Of course, I'm going to pop this out first, then scroll and maybe zoom in a little. "Additional notes. Data types: factor, numeric, integer." That's based on this right here: the section we added, "additional notes", with the "data types" subsection. And it just has text, with the triple parens making these rounded-rectangle labels. Sometimes good to have.

Great. Okay, styled text. There are many more things you can do. You can also add, in double angle brackets, some CSS; it's a quick way to style text any way you want. These are just a few suggestions that work well: you can change the color of the text or the background color; do some text decoration, like overlines, line-throughs, or underlines; change the font style, the letter spacing, a border around it; change the font, bold, italic, the font size. All sorts of things.

Let's try one more example where we take these labels and change the border values. This may be the last one, I think. I'm going to run this. And just so you know, there are two vignettes on the pointblank website that show all this stuff, basically running through exactly this, aside from the content here, so this reference material is pretty easy to find. But here: we have these values in square-bordered labels, and they have color fills for each. This might be good if you want to color-code certain things; not bad for metadata, especially things like keywords and data types.

So, that's that. And that's finally it. Basically, we have a lot of snip_ functions, and we have info_snippet(), where you can provide your own function; again, we have several provided helpers that you can just use inside the `fn` argument. That's a way to query the table and produce text that goes within your own text; you just use curly braces to insert the snippet values. And when you print it out, you always see this sort of thing: "incorporation started". You have to use incorporate(). If you see that you still have curly braces in your text, that means you didn't use incorporate(), or maybe something worse happened, but that's usually what happens (or doesn't happen) if you don't use incorporate().

And that is kind of it. Hopefully this is a way to fairly easily get data documentation into an object that you can publish, providing information for other people that might want to use the data. I'm going to look at questions; that really is the end of my content here. For the last 5 or 10 minutes, I'll just take questions. It could be any question; doesn't matter to me, I'll answer them.

Q&A

And in addition to questions, there are actually some really funny comments in here, like "mean_d smells like glue". It is kind of like glue; I think it might even use glue under the hood. So, basically, you don't have to call glue yourself with any of these text snippets.

Okay, got another question: what are some of the more important use cases for generating HTML data documentation? So, if I publish data in a package, I might write a Quarto document about it. It might be for colleagues, or people you're onboarding with datasets, especially if you have core datasets that update frequently, where understanding them is essential to carrying out your work. This is a quote-unquote nice thing to do, but I also want to make it easy. And there are actually some functions I haven't covered here which allow you to do the bulk thing: use a data frame filled with descriptions of columns and apply them to the columns in one pass. And you can imagine combining that with Excel, for instance: someone might start in Excel, you grab that data into a data frame, and then use this function to apply those pieces of text instead of calling info_columns() multiple times. Which is fine, but there are other ways to do it.

So, got a question here: when pointblank is used on database tables, are the calculations performed in the database, or is the data pulled locally first? Definitely the first one: in the database. We do everything with dbplyr, and we don't purposefully collect any data before everything is summarized down into the final results.


Eric said his adventures with pointblank can be seen. I loved this before, and I have to admit, I almost forgot about it, but it's wild. I loved it, and there's a link to it right now in the webinar chat; go see it. I do recommend it.

Okay, well, thank you for joining us today. And if you don't mind, I may ask one or two more things, because it's not every day we get to have you talk to us about all things pointblank. I've always been one of those people that subscribes to learning by doing, and so that little table contest was a great nudge to take your previous materials; and now, with this material, there's even more of an abundance of material to choose from. And one thing pointblank really opened my eyes to is that this could be a great use case for a CI/CD situation, where maybe you're getting data updated every month or every week, but you have the same checks you want to run every time, right? And you don't want to do that manually. So being able to take what you've taught us here and put it into a set of R scripts that could be called in a cron-job-like fashion on GitHub Actions, I think, is a real game changer for what we deal with a lot in our day-to-day here.

Yeah. And I've known people that do that. And there's a whole thing I haven't talked about, which is the YAML workflow. You can express all these things in YAML if you want to; it's actually more shareable with people that don't use R and don't want to see R function calls. Then there's one function, called `yaml_agent_interrogate()`, I believe. So that's a good way to run in production, because it splits things: you have these very committable YAML files, and just one R script which calls the YAML file with the data, which is pretty nice for CI.
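A sketch of that split between a committable YAML file and a one-line production script (file names are hypothetical, and the speaker himself hedges on the exact function name):

```r
library(pointblank)

# Write an agent's validation plan out as a shareable YAML file
agent <- create_agent(tbl = ~ small_table, tbl_name = "small_table") %>%
  col_vals_not_null(vars(date_time))

yaml_write(agent, filename = "small_table_checks.yml")

# In production (e.g., a scheduled CI job), one call reads the YAML,
# rebuilds the agent, and interrogates the target data
agent <- yaml_agent_interrogate(filename = "small_table_checks.yml")
```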

Yeah. And maybe not quite a related question, but would you say there's a type of data you wouldn't use pointblank for? Are there any types of variables or structures where you'd say, nah, probably not a good fit? Probably non-tabular data, and also some things you can have in tables, like list columns; it doesn't really handle those. Unless you handle it yourself with specially(). You can do pretty much anything you want with that one function, which gives you carte blanche. Just write your own UDF, your own function, to create a validation. As long as you give it the result it expects, which is a vector of logicals, or a table whose final column is logical, you can do pretty much whatever you can imagine in terms of validation.
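A sketch of that escape hatch with specially(); the table and its list column here are hypothetical:

```r
library(pointblank)

# Hypothetical table with a list column
my_tbl <- dplyr::tibble(
  id = 1:3,
  measurements = list(c(1.2, 3.4), numeric(0), c(5.6))
)

agent <-
  create_agent(tbl = my_tbl) |>
  specially(
    # UDF returning one logical per row: each list entry must be non-empty
    fn = function(x) vapply(x$measurements, function(m) length(m) > 0, logical(1))
  ) |>
  interrogate()
```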

Excellent. And I really love the idea of having these HTML-based reports. I know I've been on a crusade in my day job to get away from static Word-document reports and whatnot. So having HTML to get at these details in a novel way has been a huge, huge help for making pointblank more mainstream in our data-checking needs.

Well, that's awesome. That's great. Yeah, that's what I was thinking too initially, although I didn't really know until people started using it and reports came back. That's actually good to hear.

Does it matter if the source table is grouped? I think it either conveniently ignores the grouping or ungroups it at the beginning. I know we handle it at some point, but I forget the actual behavior. I think it does matter, but it's handled, is my understanding. If you have grouped tables and the grouping does adversely affect a validation, let me know, and we can always provide some options there.
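If you want to be safe in the meantime, dropping the grouping yourself before validation is a cheap defensive step:

```r
library(dplyr)
library(pointblank)

grouped_tbl <- mtcars |> group_by(cyl)

# Defensive pattern: hand pointblank an explicitly ungrouped table
agent <-
  create_agent(tbl = grouped_tbl |> ungroup()) |>
  col_vals_gt(columns = mpg, value = 0) |>
  interrogate()
```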

And I believe you touched on it earlier, Rich, but it sounds like Parquet files are just as supported as anything else for pointblank, right? In the Python version, yes. On the R side they may work just by virtue of dplyr reading them, but probably not; I haven't had a report, and I haven't done the legwork myself to verify that Parquet files work. So that's a thing to do. But if you use Python, the answer is yes, though we haven't talked about Python much here.

Have you thought about using this with some of the new LLM agent-based tools? Have you thought about that piece for pointblank? Yeah, I've thought about it. That stuff is there, unfortunately only in the Python version; the Python version was developed later, just in the last year compared to this one. There's still stuff missing in it, like all the informant stuff: the data documentation side isn't there yet. But in the Python version, if you want to use AI, you can, and there are quite a few avenues for that. There's a way to create validation plans through LLMs, which is not dissimilar to draft_validation(). There's also an assistant for just talking and reasoning things through: you describe what you want, it knows the API, you say which validations you want, and it will suggest them to you and you put them in. There's also another validation function there called prompt. What you do is provide text describing what the check is, so a validation step is essentially a prompt; you're basically using English to validate certain parts of your table with that function. Those things aren't in the R version yet, but I plan to get them in in due course.

Rich, I'm excited for next year. Highlighting it, bringing it more to the public. What are some things you think we can do with pointblank for next year? I'm planning workshops and talks, and I may also create some videos as well; I'm planning the same for Great Tables and other packages. It's pretty much just that: getting the information out there. It could take multiple forms, but people have just got to know about this. I think it's great; check your data and you'll sleep better at night.

Was pointblank one of your early packages? Not one of the first, but it's definitely early; it started in 2017. I think my earliest package was somewhere around 2013, and I believe I initially did some packages to do with atmospheric modeling. It's cool that this package was based on some work you were actually doing. Yes, I had data quality issues myself, and I didn't want to just throw a bunch of SQL at the problem; that's annoying. I had tons of tables, so I figured I'd not do my work and write a package instead, and a month later I'd do the work. I got it spun up pretty quickly because it just didn't have all the good reporting yet; gt wasn't a thing at the time, so the report tables used different R packages that were available circa 2017.

I don't know if you saw, but at posit::conf there's a workshop called Data Science Workflows, and pointblank was a big part of that. I was actually a TA for one of those, the very first one in Chicago. The idea is that it uses pointblank to validate data and then trigger emails automatically that basically let you know what shape your data is in. Based on the pointblank reports, actions can get triggered: data looks good, send it off; data looks bad, flag it for further review. We didn't quite cover that here. In action_levels(), there's actually a separate argument that allows you to provide functions, so once thresholds are exceeded it's not just visual: you can actually fire off functions. It could be logging, it could be errors, or stopping things dead in their tracks. You can actually react to failures in that way.
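A sketch of those threshold functions; the message() calls here are placeholders for whatever logging or emailing you'd wire in:

```r
library(pointblank)

al <- action_levels(
  warn_at = 0.1,    # WARN when 10% of a step's test units fail
  notify_at = 0.25, # NOTIFY at 25%
  fns = list(
    warn   = ~ message("A step crossed the warn threshold; logging it"),
    notify = ~ message("Notify level reached; e.g., send an email here")
  )
)

agent <-
  create_agent(tbl = ~ small_table, actions = al) |>
  col_vals_not_null(columns = date) |>
  interrogate()
```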

Someone asked about Snowflake, because that's a use case I hear about all the time. It didn't quite work early on, but I think since around 2021 things got a little more improved. I think there's actually a blog post kicking around somewhere where someone did use Snowflake with pointblank for R. Yeah: pointblank with Snowflake, Databricks, all of the data platforms.

I did test it on the R side for Spark and Databricks, so it does actually work. Not just theoretically; I verified that.

I'll give you one last chance to post here in the chat. Otherwise, we will see you next year, hopefully, Rich. Yeah, love to be here.

Okay, well, I think that's it. We'll go ahead and end the webinar. Yeah, Rich, you shared your slides, so I think people should be in good shape. That concludes day two of R/Pharma. We'll be kicking off the APAC track tonight at 9 p.m.; they've got a series of talks that will run until about 3 a.m. Then we pick back up here at 10 a.m. tomorrow with the keynote from Simon from Posit, who will be talking about AI. It should be an exciting 24 hours, so hopefully we'll see everybody tomorrow, or stick around for APAC tonight. I think that's it. Thanks, Rich. All right, thanks for having me.