Resources

Bryan Butler | R in Marketing - Survey Design | RStudio

R in Marketing - Survey Design for Applications of Machine Learning Presented by Bryan Butler In marketing, a common way to gather data and insights is through a survey. It is often thought that surveys do not provide enough data for machine learning. In this talk, we’ll go through some survey design best practices for applications of machine learning. Then we’ll cover a use case where we use survey responses to segment customers with unsupervised machine learning, and then perform classification with supervised machine learning. Speaker Bio: Bryan Butler is VP of Business Insights & Analytics at Eastern Bank

Nov 18, 2021
59 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

This is only going to be about a 30-minute talk, so we should have plenty of time at the end for any kind of question, discussion, or whatever anybody else wants to talk about after all of this. And this is a project that I've worked on over the years, and it keeps coming back. So that's one of the reasons why I decided to bring this up with a marketing group.

I've been in and out of marketing groups for the past few years, and also worked in marketing research. So a little bit about me: I head up analytics and data science at a small bank in Boston. I started out in chemistry and then got an MBA, did a lot of work in the insurance industry doing some pretty exotic stuff, got heavily involved in Monte Carlo and pricing, and then got involved in quant banking consulting. So as you can see, like every other data scientist, I just went the exact straight route out of college into data science.

Overview of the project

So what we're going to talk about here is what this project is all about, and I'm going to try to give it to you in a way that you can take to your next marketing project. I'll talk a little bit about what good survey design is, or what I think it is.

There's an interesting bit of intellectual property that I had the ability to work on called customer quotient, similar in a way to NPS, but also very different. We'll look at some parameters and constraints, and then get into the machine learning part: how do you actually do machine learning with survey data?

So the goal of this was: how do you take a survey, because that's what a lot of people in marketing research do, and develop a segmentation engine out of it, so that as people complete your survey, you can put them in the right segment, and your segments can be based on whatever you like. In this case, this was a sort of behavioral-psychographic segmentation, which means it wasn't so much "you bought product A or B" or "you live in this part of the world." This is more about how you feel and how you behave.

And so it gives you a little different approach to the different dimensions of marketing. It was looking to reveal (this is, again, part of a much larger project) how people view companies. The other way to think about it is that, at the end of the day, it rates how customer-centric a company is. And that's how the questions were phrased.

Good survey design

So these are some of the things that I've learned about what makes a good survey. These are just my takeaways from being involved in surveys and trying to apply machine learning: how do you ensure you get a lot of respondents, et cetera. The first thing is, I want a goal: what is the survey designed to answer? And then you're going to want to have some supporting questions. So structurally, that's how we want to start. And for the goal, you shouldn't have more than three things you're trying to answer. I usually say one to two is good, just in case you pick up some other ancillary information. And these goal statements are binaries, yes/no's.

My first experience was dealing with a survey where people ask, "Well, how likely are you to purchase from us again?" And you end up with everybody in the neutral range, and it doesn't really help you out. Or what do you do when you get a handful at the positive, a handful in the negative, and a handful in the neutral? Do you drop them? Do you lump them in with something else? A simple yes/no resolves that issue. With that, you want as few supporting questions as you can get away with while still getting the information that you want. I like to be between five and 10 questions. If you start getting over 10 questions, you're asking for too much information, and you'll find that people will drop out over time.

The flip side is that in the marketing research world, you do have the ability to pay people (pick your incentive: Amazon gift card, PayPal, whatever else), and they will stay in longer, knowing that if they finish, they'll get some money at the end. I've done many of both, and there doesn't seem to be a difference in engagement, as in you don't get different answers if you pay people.

And then here's a key thing: I like my supporting questions on a 1-to-10 scale. So we've got our one binary. Now instead of "how likely are you to buy from me again," it's "I have bought from you again" or "I will buy from you again," yes or no. And then we're going to ask a bunch of questions that support it. I think everybody's seen the Likert scales: strongly agree, agree, neutral, et cetera. I do not like that scale. One, you don't get as much variability in the modeling phase, whereas 1 to 10 gives you plenty of variability. And I think if you go over 10, people don't know what the numbers mean.

And just as a little aside, I worked in the wine industry for a while, and they have their 100-point wine scale. But the funny thing is that 99-point-something percent of the wines are rated between 80 and 100. So what they really have is a 20-point scale; they just disguise it as a 100-point scale. Same thing here: 10 points will give you enough variability.

And then the other thing is, people always say, "Oh, we can ask a question a different way to make sure we don't have people gaming the system." Maybe if you've got a giant survey that's like a personality test, but that's a whole different animal. If you're surveying your customers, they're not going to bother with the duplicates. And if they start seeing duplicates, they'll think this is a waste of time. And it's a waste of your time, too. Every question matters.

And I've seen some other indices out there where they try to decompose something like a net promoter score into three indices, and the three indices are so incredibly correlated that you just wasted two questions asking about the other two when they're like 90% correlated. You really have one index. Anybody who's worked in marketing has heard of net promoter score. It's kind of a hybrid, because it does ask on a 0-to-10 scale, but at the end of the day you can make binaries out of it: people that score you nine and 10 are your promoters, seven and eight are your passives, and six and below are your detractors. And the score only works with promoters and detractors. So it's a very unique scale. But when you do NPS, you should really get into the whys.
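As a small aside, the NPS arithmetic described here can be sketched in a few lines. This is an illustration in Python (the talk's own work is in R), using the standard cut-offs of 9-10 for promoters and 0-6 for detractors:

```python
def nps(scores):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

# 4 promoters, 3 passives (7-8), 3 detractors out of 10 respondents
print(nps([10, 9, 9, 10, 8, 7, 7, 6, 5, 3]))  # -> 10.0
```

Note that the passives (7-8) count toward the denominator but drop out of the numerator, which is why, as the speaker says, the score "only works with promoters and detractors."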

And then a good rule of thumb is to try to get about 1,000 responses, because with that you can do some really good modeling. I know everybody thinks this is a world of big data: I need 10,000, I need 100,000, I need a million. No, you don't. You need a well-designed survey. And if you get close to 1,000 responses, you will get a reasonable amount of information.

What's interesting about surveys is that if your questions are distinctive enough, then even if you get smaller segments, you will be able to predict them. For instance, if segment one is driven by questions one to three, and segment two is maybe question three and question four, but segment three is this oddball question five, then even if segment three comes out small and you get into a highly imbalanced data scenario, the fact that question five is so unique will help with prediction even in an imbalanced situation, because it's not random.

Customer quotient

So a little bit about customer quotient. This was developed by a colleague of mine at C Space in Boston, Dr. Manel Austin. It builds on this idea of how customer-centric your company is, and the questions are asked from the customer's perspective. You build up the pyramid: with relevance, you're just in the game at that point. Then there's a series of questions around customer experience, how well you do. Then, the better you are, you get into openness, and that begins to really unlock the doors: are you willing to have an honest and open two-way dialogue with your customers?

Empathy is really what changes the game. When we've grouped different companies into the segments, it's not until you do reasonably well at openness and get into empathy that things really change: when you start rating up on the empathy scores, you begin to run away from the pack and perform extremely well. And the last level is what we call emotional validation. Those are the people out there who are selling your product on their recommendations.

What we used to talk about is: you have three people having a conversation, and someone says, "I need to get a new smartphone." The first person says, "Get an iPhone," and the other person really doesn't say anything. That iPhone person is your emotional validation person. They'll give you five reasons why you should get an iPhone. Or on the flip side, they'll give you five reasons why you should get an Android, whatever it is, but they are selling your product. That's called emotional validation. Those are your ultimate customers, because they are doing your marketing. And we find that companies that score higher up there actually have to spend less on marketing and advertising over the long term, and they get better scores. That's just a little bit of the intellectual property in this space that I worked on.

For those with traditional marketing backgrounds, you talk about the four P's and the three C's. The four P's are product, price, place, and promotion; the three C's are customer, company, and competition. What these do is map the various questions that feed the CQ scoring into those traditional marketing buckets. So when we talked about pitching this to different companies, if I had a traditional marketer, I could say, this is your product piece, this is your price piece. And you can see with price, it's all from the customer's point of view: "The company appreciates my loyalty." That's literally how the question worked; it's action-oriented. Or "I don't feel ripped off." You can see in most of these questions (these are paraphrases) that we've made them action-oriented and from the customer's standpoint. Versus a lot of surveys, where the company is asking, "Do you like us? Do we do a good job at this?" This whole framework reverses that.

Project parameters and constraints

So let's talk about this project that we had. Client specification number one: it had to be built in Excel. And that happens. I wish I could just do it in Markdown or something easy like that, or a web app. But at the same time, it's workable. What it does do is change the model processing a little bit. I'm not going to be able to run XGBoost or any of that. But at the same time, I may not need to, and there's a good way I get around this down the road. Our sample size was 900, not 1,000. But that's fine. It's like, okay, we're probably only going to get about three or four segments out of this.

Part of where this came from is that a major company had done a project with a major consulting firm, and the consultants said, well, you have eight distinctive segments. And it was like, okay, really? So we're going to test whether they really have eight distinctive behaviors by using this survey. And the initial step: like I said, this survey did not have a dependent variable. That's why I said early on that a good survey has a binary outcome. So we had to create one, or create some sort of segmentation outcome. The nice thing is that survey questions on a 1-to-10 scale are easy to work with.

And if you think about it from the machine learning world, as we go down the road, we're not going to have to center and scale the data or any of that. It's already prepackaged on a 1-to-10 scale, and we're good. Whether we do unsupervised learning or supervised, we are not going to have to rescale all the data, which makes it nice.

And when you do some of these things (these last ones came out of my discussion with the client): what's good enough accuracy? Are there penalties for false positives or false negatives? If we put a person in the wrong segment, how costly is it to your business? And should we add penalties if mistakes are costly in one way or another?

That would change the modeling a little bit too; we would add a cost matrix down the road if we had to. But in this case, there weren't too many big deals. They said, you know what, you give me 80% of the right answers, I'll be happy. I was like, okay, 85, that's what we're shooting for. Hopefully we can do better. One of the things for me is I always like to be in the 90s.

The machine learning roadmap

So here's a little roadmap, and you can reuse it for any kind of survey work you do. I've used this format for a lot of NPS modeling that I've done for various companies over time.

Design a nice survey. For NPS, we ask that question, but then we ask questions about customer service, about pricing, about other aspects of the business, because what you're trying to do is figure out what drives that yes/no answer. Then we're going to do some clustering. I like hierarchical clustering; it's easy for visualization, and the scale is easy. So we're going to separate our respondents into different segments, or clusters. And then we're going to create a dummy variable for each one: you're in segment one or you're not, you're in segment two or you're not, you're in segment three or you're not.
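The clustering-plus-dummy-variables step might look something like the following sketch. This is not the speaker's code (the talk's tooling is R); it's a toy Python/SciPy version with fabricated survey answers, just to make the roadmap concrete:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy survey data: 90 respondents x 5 questions on a 1-to-10 scale,
# drawn from three loose response profiles (one per underlying segment)
profiles = np.array([[9, 8, 3, 2, 5], [2, 3, 8, 9, 5], [5, 5, 5, 5, 9]])
X = np.clip(np.repeat(profiles, 30, axis=0) + rng.normal(0, 1, (90, 5)), 1, 10)

# Ward hierarchical clustering, then cut the tree into 3 segments
Z = linkage(X, method="ward")
segments = fcluster(Z, t=3, criterion="maxclust")  # labels 1..3

# One 0/1 dummy column per segment: "you're in segment k or you're not"
dummies = np.stack([(segments == k).astype(int) for k in (1, 2, 3)], axis=1)
print(dummies.sum(axis=0))  # respondents per segment
```

Each dummy column then becomes the dependent variable for one of the binary models discussed below.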

I think a lot of people would try to go straight to a multinomial model: hey, we're just going to predict what segment you're in with one equation. Those require a lot of data. So if you're in one of these big-data worlds and you've got 10,000 or 50,000 respondents, you could probably pull it off, but with 1,000, simple binaries are going to be the way to go. So here's my little magic. What I do first is use elastic net regression: the glmnet model, from the glmnet package by Trevor Hastie and Rob Tibshirani. It was designed for genomics, it will really help you reduce dimensionality, and it's extremely fast.

To give you a little more information on this, for people who are familiar with ridge or lasso regression, this is the hybrid. Ridge will attempt to squeeze down all your coefficients to make them very small. What the lasso will do is, if there are multiple variables that are correlated, pick one and leave the other ones out. Now, the problem with lasso regression is you don't know why it made that decision. Is it a function of the gradient descent, or a function of the order of the data? You just don't know, and they talk about that in their paper. Whereas what the elastic net does is: say I have three variables that are correlated. If they're statistically valid, we leave all three in; if they're not, we pull them all out. And that's how it gets you your dimensionality reduction.
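The speaker's actual tool is R's glmnet; as a rough equivalent, scikit-learn's logistic regression supports the same elastic net penalty (`l1_ratio=0` is pure ridge, `l1_ratio=1` is pure lasso). A toy sketch with invented data, where two questions carry the signal and a third is a correlated near-duplicate:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# 300 respondents x 8 questions on a 1-10 scale; only Q1 and Q2 drive the
# binary segment label, and Q3 is a near-duplicate of Q1 (correlated)
X = rng.integers(1, 11, size=(300, 8)).astype(float)
X[:, 2] = X[:, 0] + rng.normal(0, 0.5, 300)
y = (X[:, 0] + X[:, 1] > 11).astype(int)

# Elastic net = hybrid of ridge and lasso, via the l1_ratio mixing parameter
model = LogisticRegression(penalty="elasticnet", solver="saga",
                           l1_ratio=0.5, C=0.5, max_iter=5000)
model.fit(X, y)
print(np.round(model.coef_, 2))  # irrelevant questions shrink toward zero
```

In glmnet the same mixing parameter is called `alpha`; the behavior the speaker describes (keeping or dropping correlated variables as a group rather than arbitrarily picking one) is the grouping effect of this penalty.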

So the last step is: all right, we've got to go into Excel. Now, you probably could do an elastic net in Excel (maybe not), but you can easily do a logistic regression. So what we're going to do is build out our template. When we get batches of scores, we just run them through our worksheet with the logistic regressions in there, it uses a voting method, and it chooses which segment you're in.
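The voting method could be sketched like this (a hypothetical Python version; the real deliverable was logistic regressions wired into an Excel worksheet). Each segment gets its own binary model, and the highest predicted probability wins:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Toy data: 600 respondents, 6 questions, 3 true segments, each segment
# tied to a different pair of questions
X = rng.integers(1, 11, size=(600, 6)).astype(float)
true_seg = np.argmax([X[:, 0] + X[:, 1],
                      X[:, 2] + X[:, 3],
                      X[:, 4] + X[:, 5]], axis=0)

# One binary logistic regression per segment: "in segment k or not"
models = [LogisticRegression(max_iter=2000).fit(X, (true_seg == k).astype(int))
          for k in range(3)]

# The voting step: score each respondent with all three models and
# assign the segment with the highest probability
probs = np.column_stack([m.predict_proba(X)[:, 1] for m in models])
assigned = probs.argmax(axis=1)
print(f"agreement with true segments: {(assigned == true_seg).mean():.0%}")
```

This is the one-vs-rest ensembling the speaker describes later: instead of one multinomial equation, three simple binaries and an argmax, which is easy to replicate cell-by-cell in a spreadsheet.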

Clustering and elbow plot

So the first thing we did is run the famous elbow plot used for clustering, based on the within-cluster sum of squares. On one axis is the number of clusters; they told me they had eight segments (or at least their consulting firm did), so I went up to eight clusters. What we're looking for is the bend in the plot. You can see you get the elbow at about three or four, at which point it all flattens out. If I had a lot more data, I probably would shoot for four clusters. Given that it was only 900 respondents, three would probably do the job. You can see that the sum of squares goes down the most, obviously, between one, two, and three clusters; between three and four, it's little enough that it tails off. That's why we're looking at that inflection point. So we plot them all out visually, and we get three groups.
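For concreteness, here is one way to compute the numbers behind an elbow plot: the within-cluster sum of squares (WSS) for increasing k. This is a minimal hand-rolled k-means in Python on fabricated 1-to-10 survey data, not the speaker's R code:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy 1-to-10 survey answers with three underlying response profiles
centers = np.array([[9, 8, 2], [2, 3, 9], [5, 9, 5]])
X = np.clip(np.repeat(centers, 100, axis=0) + rng.normal(0, 1, (300, 3)), 1, 10)

def wss(X, k, iters=25):
    """Total within-cluster sum of squares after basic k-means (Lloyd's)."""
    cents = X[np.linspace(0, len(X) - 1, k).astype(int)]  # spread-out init
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - cents) ** 2).sum(-1), axis=1)
        cents = np.array([X[labels == j].mean(0) if (labels == j).any()
                          else cents[j] for j in range(k)])
    return float(((X - cents[labels]) ** 2).sum())

# The drop is steep up to k=3, then flattens out: that's the elbow
for k in range(1, 7):
    print(k, round(wss(X, k)))
```

With hierarchical clustering the same WSS-versus-k curve can be computed by cutting the dendrogram at each k; the elbow reads the same way.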

You can see in the hierarchical clustering that you get the divisions almost cleanly: between the blue block and the green block, and between the green block and the red block. The other way to interpret this is that the people in the blue block are very different from the people in the red block, with some gradations across the middle. I use this plot a lot when I do text analytics, because it's really fun there. What you're seeing when you do this with text is: these people are saying one thing, they're saying something similar to these guys, but now it's getting different, and then you get something really weird at the other end. So I like this plot from a visual standpoint. The other axis is just the distance metric.

And as I mentioned, because all of our questions were on a 1-to-10 scale, I didn't have to rescale anything to do this clustering. And yes, one segment is small, and it may be difficult to predict, but it doesn't automatically mean that. Because, as I said, if there are certain questions answered by a handful of people in the group that make them markedly different, then you'll still be pretty good at predicting whether or not people are in there.

So when I overlay my cluster analysis across their eight segments, it was kind of interesting. I was like, well, you may have eight segments, but you have three distinct behaviors within each segment. What I mean is that there is this grouping here, we'll call it behavior one for now, that seems to cut across all of them with the exception of this segment. And we also have behavior two, with people in between, but then this behavior number five stands out. And it appears that this convenience store resellers segment is actually kind of small in general. But looking at those numbers, it's pretty clear you could describe it two different ways. You can say we have eight segments with three behaviors, or you could say I actually have 24 segments. So granted, I have this "social couples" segment, but there are actually three forms of social couples.

Now, one of the interesting things about looking at some of these segments is: what about people that would cross over between them? How did the company specifically put them in each segment? Can you move, or will you move, over time? How long do you stay a new mom? How long do you stay a social couple? Some of these others are a little more static. But we got a nice clean picture here, so it's good. Our elbow plot told us we had three or four segments, most likely three; our hierarchical clustering came up with about three segments, and we can see it here. So the nice thing is, we have three segments, three equations.

Elastic net and logistic regression results

So here's a little more on the elastic net. The reason I use it is that it's going to give me my upper bound of performance, or accuracy in this case. My goal is to get down to a logistic regression with a couple of questions. But if my elastic net comes in at only 80% accuracy in the rough, before we even do any kind of dimensionality reduction, then my logistic regression isn't even going to cross my 80% threshold, and that's going to be problematic down the road. And the way it works is, think about your logistic regressions: the probability you're in segment one or not, the probability you're in segment two or not, and the probability you're in segment three or not. For each person, you get three scores, and the highest score wins. So in this case, the data point came back with a 90% probability of segment three. It beats 55, it beats 21. Therefore, they get assigned to segment three.

Versus the traditional way of using a single logistic regression, where you generally assign at a 50% probability threshold: if it's greater than 50, it's in that segment. Here, we're using a little ensembling-type approach, and we're basically going to vote. Standard train/test split: 900 data points gets me about 650 training and 250 testing. It's not great; it's not bad.

How did we do? Here's the first segment, segment number one, with the elastic net regression. Near perfect: very few false positives and false negatives. Good split here, so it's not so imbalanced that it's basically an easy guess. So this is nice: 99.26%. Really good. And you'll see down here, when we did these, I'm not going to show you the same plot three different times.

Another key piece to look at is the no information rate, 0.61. That is essentially your data split. If the no information rate had been 0.90 or higher, which is what you get with imbalanced data sets, then 95% or 99% accuracy is a good lift, but it's not as impressive. Here we'd be at 60% on a random guess, and we're near 100% accurate. And you can see we got 94% on segment two, and 80% on the small segment. That's not surprising: we knew it was going to be a small segment and a little more challenging. But from the client's standpoint, they were quite happy with what they were seeing initially.
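The no information rate mentioned here is just the accuracy of always guessing the majority class. A tiny illustrative sketch (Python, not from the talk):

```python
import numpy as np

def no_information_rate(y):
    """Accuracy of always guessing the majority class: the baseline any
    model's accuracy gets compared against."""
    y = np.asarray(y)
    return max(float((y == c).mean()) for c in np.unique(y))

# 61% of respondents outside the segment, as in the 0.61 NIR from the talk
y = np.array([0] * 61 + [1] * 39)
print(no_information_rate(y))  # -> 0.61
```

So "99% accuracy" means very different things against a 0.61 baseline versus a 0.95 baseline; the gap between accuracy and NIR is the lift.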

This is a nice little variable importance plot from the elastic net model, with your questions in rank order of importance. When the bars plot out this way, that means the question puts you in the segment; you'll see in another plot they go the other way, meaning that when you answer that question, it kicks you out of the segment. Now, there are other, more sophisticated approaches out there. There's LIME (local interpretable model-agnostic explanations); that's probably the most common one people are using now. But one of the things this tells me is that I can begin to remove some of the questions from this equation, right? What I'm trying to do is simplify this as much as possible. So for segment one, we're just going to use a couple of questions, so that we only have three variables for this equation. That makes it perfectly simple, especially since we don't have a lot of data.
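One simple stand-in for a variable importance plot, when the model is a logistic regression, is to rank questions by the magnitude of their coefficients and keep the top few. A hypothetical sketch with invented data, where questions 2, 7, and 16 carry the signal:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
# 17 survey questions, but only Q2, Q7, and Q16 actually drive the segment
X = rng.integers(1, 11, size=(600, 17)).astype(float)
y = (X[:, 1] + X[:, 6] + X[:, 15] > 17).astype(int)

model = LogisticRegression(max_iter=2000).fit(X, y)

# Rank questions by |coefficient| and keep the top three
# for the reduced, Excel-friendly equation
rank = np.argsort(-np.abs(model.coef_[0]))
top3 = sorted(int(i) + 1 for i in rank[:3])  # 1-based question numbers
print("keep questions:", top3)  # -> keep questions: [2, 7, 16]
```

This works here because all questions share the same 1-to-10 scale, so coefficients are directly comparable; the sign of each coefficient is what puts you in the segment or kicks you out, as described above.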

So what did segment one's model say? They appreciate my loyalty; I feel proud, a sense of belonging; and they believe the company's employees use their own products and services. So we think this is highly correlated to an emotional validation segment. These are your really good customers. You want to take care of them; you want to know who these people are.

In the world of computer vision, I was at a conference back when we used to go to those things in person, and someone said, you should run image recognition on your customers when they come into your bank, matching the face to the balance or something like that, so you could say, wow, this person's a really big customer, you should do something to take care of them. Interesting portrayal, but someone would say, yeah, but everybody's an important customer. It's just another way to think about how you take care of your loyal customers and how you show them their loyalty, and that's why question 16 is an important one. And when we were presenting something like this to some internal people, my marketing research person used the example of Amazon Pantry. She said, they get it wrong all the time, but they make it right all the time. So maybe I order six things, five out of the six are right, and one of them's wrong, but she said, they always make it right. And I think that's part of it: even if you make a mistake, how do you get people coming back to you? It's all about fixing your errors as a marketer and as a company.

So now I reduce it down. This is what I said: we're going to go to a simple logistic regression with just a couple of questions. I lose a little bit of accuracy. Rather than almost perfect with only one miss, we've now made a few more false positives, where we think they're in segment one when they're really not. But we're still at 94%, so we're good. And again, we've got our no information rate, so we've got good lift. We now have a happy logistic regression model for segment one.

For segment two, as I said, looking at the variable importance, when you answer these questions a certain way, they actually pull you out of the segment. You need them as a way to separate segment one from segment two. And if we ran this through LIME, you would actually see red and green scoring for these questions, because they're important in that they will either put you in or push you out of the segment.

And these are a different set of questions, as you can see. Our questions 16 and 21 are up here, not as important. And here we're imbalanced, so it's a mixed accuracy, right? We think we're really good, but most of these people are getting pushed out into the other segment. And this is the logistic regression model; that's where I said the no information rate is 90. We're getting a little bit more lift, not much, and we're making equal errors, false positives and false negatives. But we definitely have these questions that are oriented around customer service; that's essentially what these four questions point to.

And then segment three, our small one, and this is why it's hard. You've got some questions that kick you out of the segment and some questions that put you in the segment, a little bit of both. Think about it this way: in segment one, these people were high scorers. In segment three, you're getting low scorers on those questions, but a different scoring pattern across these, so those questions push you out of the segment, while a few of the segment two questions push you into it. The nice thing is, we do okay on a small segment. The no information rate is 52%, almost a pure coin toss, and we're able to beat that by almost 30 points, at 78%. So again, I mentioned I'd love to be at 80% and would love to hit 85. But given that the baseline is a coin toss, and given the data set, at roughly 80% we're doing okay.

Obviously, if I used a far more sophisticated model, we could probably do a little bit better. But it goes back to the constraints: the customer wanted it in Excel, and they wanted to be able to batch-load their own data and see the levers. And I think that's the other part of the importance of logistic regression (and the same with elastic net, which is just logistic regression with a regularization function): a business person can say, if I score better on this question, I'm actually going to increase my loyalty scores. That's the partial derivative part. This is what I do when I analyze surveys. I say, look, if you increase your score on this question by one point, the probability that they are one of your promoters, or frequent buyers, or whatever, goes up by 5%. And if you increase it by two, it goes up more. So it's a very quantitative approach to saying: how much do I need to move the lever?
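The "move the lever" calculation can be made concrete: score everyone with the fitted model, add one point to a single question, and re-score. A hedged Python sketch on fabricated data (the talk's own models lived in R and Excel):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
# Toy promoter (1) vs non-promoter (0) data driven by two 1-to-10 questions
X = rng.integers(1, 11, size=(500, 2)).astype(float)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 2, 500) > 11).astype(int)
model = LogisticRegression(max_iter=2000).fit(X, y)

# "Move the lever": add one point to question 1 for everyone and look at
# the average change in predicted promoter probability
base = model.predict_proba(X)[:, 1]
bumped = model.predict_proba(X + np.array([1.0, 0.0]))[:, 1]
print(f"average lift from +1 on question 1: {(bumped - base).mean():+.1%}")
```

Because the logistic curve is nonlinear, the per-point lift varies by respondent (it's largest near a 50% probability), which is why averaging over the sample gives a more honest lever estimate than quoting a single coefficient.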

And I'll say, in the NPS world (I was doing this at my last company in healthcare), it's a lot easier to move a person from detractor to neutral than it is to take a neutral person and make them a promoter. So as you think about ranking the actions you're going to take: in this case, your dependent variable is promoter or not, you rank the questions by importance, and that tells you what's driving your answers. You should focus on the ones that are low-hanging fruit, the ones that make it easy to get people out of the detractor hole. We actually did that in a couple of places, and it worked. Our executives actually had in their bonus plan that NPS had to go up by four or five points, and this created a bit of an action plan for them.

Summary and wrap-up

So what did we do here? We started out with our elastic net. We got the survey down from 17 to nine questions. That's good; obviously, the elastic net is very good at that. I didn't show these here, but we also looked at cross-validation, ROC curves, and confusion matrices (I showed you the confusion matrices just so we could get the few extra stats out of them). We have a model for each segment, and we use our voting approach. The overall accuracy was over 85%, and the client said that was acceptable, so we didn't have to come up with penalties for misclassification. Making a false positive or false negative on a segment, from a marketing standpoint, is not the end of the world. Maybe if you're spending lots and lots of money on something, it becomes more important. But in this case, it was really about understanding what's going on within their customer segments.

And that's it. So, looking back: you can take a survey, make a binary question out of it, apply clustering (maybe you like k-means better than hierarchical, or something like that), get your segment labels, and then apply some machine learning against it. And if you get about 1,000 respondents, as you've seen here, even with 900, it'll be reasonably accurate. And you can do this with any survey. I do this on a regular basis, over and over with different surveys, to tease out some really good quantitative aspects.

Q&A

Awesome. Thank you so much, Bryan. I see there are a few questions starting to come in on Slido. But just to let people know, if you want to unmute yourself or raise your hand on Zoom, you can ask questions live too.

Lee, I see you asked a question earlier in the Zoom chat to Bryan; I believe it was, do you reverse code? Would you want to add any other context to that question? Can you hear me okay? Yep, I can. So earlier, when you were talking about item types, you talked about Likert scales and 10-point scales and all that. I'm wondering if you use any reverse coding or Thurstone-type items to make sure people aren't just repeatedly filling out the same value over and over when you do your assessments.

I don't like to do that. I used to have long conversations with people about this. It's not so much reverse coding as straight-liners, which I think is what you're getting at: somebody gives everything a five, or whatever it is. I look at it, but I don't want to throw it out, because especially with a Likert scale, you don't know; that could be a very valid data point. With a 10-point scale, you tend to see a lot less of what they call straight-lining. But it's always part of the EDA, because we had that conversation all the time.
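The straight-liner check described here is easy to make concrete. A minimal sketch: flag any respondent whose answers are all identical, so they can be inspected during EDA rather than dropped automatically. The sample responses are made up.

```python
def flag_straight_liners(responses):
    """Return indices of respondents whose answers are all identical."""
    return [i for i, row in enumerate(responses) if len(set(row)) == 1]

survey = [
    [5, 5, 5, 5, 5],   # straight-liner: same answer throughout
    [4, 2, 5, 3, 4],
    [1, 1, 1, 1, 1],   # straight-liner
    [3, 4, 3, 5, 2],
]
print(flag_straight_liners(survey))   # [0, 2]
```

In practice one might loosen the rule (e.g. flag rows whose variance falls below a threshold), but the point from the talk stands: flag and review these rows, don't discard them, since a constant row can be a valid response on a Likert scale.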

Thanks, Bryan. I can read some of the anonymous questions, but again, if you want to use Slido, you can put your name in there, and you can ask live as well. The first one was: how are the surveys conducted and collected? I've noticed YouTube does surveys in between videos. How did you do these?

This survey was done through a marketing research platform; actually, most of them were done that way. So the respondents are anonymous to me. They just fill it out, and we get the data out on the back end, from something like SurveyMonkey. I've done this exact same analysis with data from SurveyMonkey too. That was in the insurtech industry, with about 2,000 respondents from different groups.

And a lot of them are generally web based, where you've had a transaction, just like you said with YouTube: hey, how did you find this video, give us a rating from one to 10. Or sometimes you'll get a note: at the end of this call, you'll be asked to take a survey. The healthcare company I worked for had it that way too, which was interesting, because now you're punching in numbers on a phone. You can still do one to 10; it's just a little different than clicking a box, and you don't get comments as much. So then you have to ask good, detailed questions.

I was curious: if you don't know who is answering the survey, and you're going and making changes through the business, how do you measure what actually made the change in the survey, or whether it was those actual customers?

Most of these types are directed at your customers, so you know your customers. In fact, I did one recently at our bank. It was a survey we had all their information for, and it was actually a rerun of a survey they ran pre-pandemic, now post-pandemic, to see what was going on. But it was highly skewed by the age of the respondents, and you just have to point that out. I'd actually show a slide to our presidents and say: look, this is our real customer base, this is what they look like, and this is what our respondent group looks like by age and demographics. They are different.