Resources

ChalkTalk: Globalizing Data Science Education with AI-generated Videos (Kene David Nwosu)

Speaker(s): Kene David Nwosu

Abstract: We present ChalkTalk, an open-source tool that converts Quarto documents into engaging educational videos with AI-powered voices and avatars. By adding simple text-to-speech (TTS) and text-to-video (TTV) attributes to markdown files, educators can automatically generate multilingual video content while maintaining the reproducibility benefits of Quarto. At The GRAPH Courses, where we've trained over 3,000 learners globally, we are testing out this tool to scale our video content creation. We'll demonstrate its integration with Quarto and present preliminary findings from our A/B testing with students.

GitHub: https://github.com/the-graph-courses/chalktalk_studio

Slides: https://www.dropbox.com/scl/fi/flsb81eznl8ypuwpxi53t/Posit-Conf-Prezi-Chalktalk-Kene-David-Nwosu.pptx?rlkey=hxmeyb9vijp0ucgpjqmod5qwt&e=4&dl=0

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

All right, hi everyone. I'm Kene. I'm going to be talking about what's up there. It's a bit of a mouthful. ChalkTalk. Automating Video Tutorials with Large Language Models and Text-to-Speech. And subtitle, what I've learned about the art of vibe coding. And I'm the Curriculum Director at The Graph Courses. We're based between Geneva, London, and kind of global.

Okay, I'll start with this picture, which will probably be familiar to many of you as you were traveling over to posit::conf. You're at the security checkpoint at the airport and need to remove your laptop and other large electronics from your bag. For most people, it's a minor inconvenience. For me, it's a slightly more major inconvenience.

Because here are some things I need to remove usually from my bag. Here's my big laptop that has a big graphics processing unit. My extra portable display. Usually I have a spare laptop in case anything goes wrong with my main laptop. And I have a camera and a microphone as well. And sometimes big headphones. And after taking all of that out, my bag looks like this. With all of the charging cables. So I still get the extra search at those checkpoints.

And why do I live like this? Well, because this is my job. Hello and welcome back. Welcome to this lesson on lines, scales, and labels with ggplot2. You now know how to select your variables, how to filter your data entries. In this lesson, you will be learning how to pivot data. R is the programming language you're going to use to write code. These three components, data, aesthetics, and geometry. Superior or equal to 25. It's easy to infer the mean. Okay, I think you get the point.

About The Graph Courses

So I work for an organization called The Graph Courses, and that's our homepage there. We teach data and code skills for health and life sciences. So that's a lot of our Python stuff. We have taught about 4,000 students in our free self-paced courses on our YouTube and our website, and have graduated about 500 students from our boot camps, which run 8 to 12 weeks. And we also do custom trainings for universities and other organizations.

And as part of doing this, we've made hundreds of videos now. And so this means that I have to carry all of that equipment. And I think it is kind of worth it because videos are valuable. This is something we know from research and also just from talking to our students. Here is, for example, a study looking at Gen Z and their learning preferences. And you can see YouTube has a 59% preference versus printed books having 47%. That's from an online poll by Harris and Pearson.

Or you can also look at the effect on learning in higher ed. Where adding videos to existing content gives you 0.88 additional standard deviation of improvement on a range of scores. And that's from meta-analysis of a bunch of RCTs. So videos are valuable. And we know this as well from our experiences with students. But videos are expensive. They're expensive in terms of time spent. In terms of the equipment you need to carry around. And in terms of tears as well.

So I have this mental anguish index of different things that cause me suffering. One is chasing students for late assignments. Another is my R console crashing in the middle of a regression; that's a 10 on the index. And recording and debugging videos is a 50 on that scale, because it's a perfect union of Murphy's Law (everything that can go wrong will go wrong) and Hofstadter's Law (things take longer than you think they will). I've had lots of painful sessions recording videos, and my colleagues will also share that experience.

The case for AI-generated videos

And so starting a few years ago when ChatGPT came around, we've been seeing headlines like this. About how AI may be coming for our jobs. And to this, I've been thinking, well, yes, please. Or more specifically, just this part of my job, this extra tedious painful part of video creation. Can AI make it easier? Or maybe take it away completely?

And in particular, there's been a confluence of trends over the last few years. That have contributed to me thinking this could be possible. One, like we know, is large language models. Now, these are not yet competent teachers. They can't yet write very good lessons. But they could maybe get you started on lesson writing. Get you started on video creation. Help you overcome that blank page problem. And then they could maybe create videos for existing lessons as well.

And then text-to-speech has gotten quite good. So it used to sound quite horrible. But these days, text-to-speech has improved a significant amount. Which I'll show you in a moment. And so the question is, could we plug these two things together? Large language models and text-to-speech models. In order to make a kind of automated video creator.

The problem, though, is that the folks on our team, including me, are mostly data people. Not really app developers. But the solution maybe is another trend. Which is the rise of something called vibe coding. Now, what is vibe coding? It's typified by some quotes like this. The hottest new programming language is English. And how I learned to stop worrying and trust the model. The idea here being that models are getting good enough at writing code. That you often don't actually need to read or understand the code to work with them.

And so maybe, not being app developers, we, or I, could try to start building out this tool that I've dreamed about. An extra detail is that I knew I would have maybe four weeks of free time to actually work on this specific task, in preparation for posit::conf. So the motivating questions for my talk are: Can I vibe code this dream application, an AI video creator, in about four weeks? And maybe more importantly, if I can actually do this, is there still a point to making this app? Is programming education still useful? Does it still make sense to teach students how to code if I can just do this without being a proper web developer?


Live demo of ChalkTalk

So now it's four weeks later, and I'm going to try to do a live demo of what I have so far, and then we can judge what the answer is. So here's the app. It's currently running on localhost. Let me refresh it and clear the cache. And I'm going to open up a new presentation. We've been thinking about making a presentation or a video on the base pipe: we taught the magrittr pipe a long time ago, and we haven't updated with a base pipe video. So maybe we could use this to try to get started on that.

I have a small issue here. I'll close that. And let me zoom in. And let me just say a single sentence. Make two slides about using the base pipe in R. Now, we're not supposed to do live demos at Posit. So hopefully this goes well.

All right. And you can see that was pretty fast. It's using a Qwen model, an open-source model that runs fairly quickly. And we can actually edit some of this text if we wanted. Next, I'm just going to generate the AI voiceover, which uses ElevenLabs text-to-speech models on the back end. And then I can start it. Let's see if it works.

Welcome to our discussion about the base pipe operator in R version 4.1 and above. The base pipe operator was introduced in R version 4.1 as a native way to chain operations. It allows you to pass the result of one function directly as the first argument to the next function. This makes code more readable by avoiding nested function calls like traditional R syntax. Unlike the magrittr pipe, the base pipe is built into R without requiring additional packages. Let's look at some practical examples of using the base pipe operator in R. Here's a simple example that takes a vector of numbers, filters those greater than 5, and calculates their mean. Here's a data frame example that selects rows where age is greater than 30, then extracts the name column. The underscore placeholder can be used.

Okay. So maybe some progress. You may notice or I definitely noticed some hallucinations from the model. For example, it had filter with a capital F. And I don't know what get element name is really doing there. So maybe we could use a different model. But the important thing here is that you can actually edit the text yourself. And you can click in and edit the script as well on the back end that the model has written. So it can give you a kind of initial draft.
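For readers following along, the base pipe behavior the generated voiceover describes can be sketched in R like this (a minimal example of my own, not the demo's actual slide code):

```r
x <- c(2, 4, 6, 8, 10)

# Nested calls read inside-out
round(mean(x[x > 5]), 1)
# [1] 8

# The base pipe (R >= 4.1) passes the left-hand result as the
# first argument of the next function, so steps read left to right
x[x > 5] |> mean() |> round(1)
# [1] 8

# The underscore placeholder (R >= 4.2) targets a named argument
# when the piped value shouldn't go in the first position
mtcars |> lm(mpg ~ wt, data = _) |> coef()
```

Unlike the magrittr pipe, this requires no extra package, which is exactly why a refreshed video on it is worth making.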

And then the other thing I wanted to mention is that it's kind of ugly at the moment. So maybe we could ask the AI to make it more beautiful, make it more pretty. And it will go in and try to do a decent job. Yeah, the heading is a bit messed up, but we could move that around. You could also ask it to translate to a different language. Let's say translate it to Spanish. And again, the model should do an okay job there. It has moved the header back where I don't want it. But overall, we've made some progress, I would say.

Reflecting on the demo

Let's go back to our slide deck. Okay, let me make sure I cover those things; I wanted to make sure I didn't forget those two. Yep, yep, okay, we covered that in the demo. Okay, so going back to the motivating questions. Can I vibe code my dream application in four weeks? I would say not quite, maybe almost, depending on how you define it. There are many things still pending. One is we don't have avatars, so it's just pure visuals. There's no SVG support, so at the moment you can't really have rich diagrams in there. There's no highlighting for the AI to refer to a specific part of the slide deck. There are many bugs that I tried to hide in that demo that we still haven't fixed. And many other things. Maybe most importantly, we need a water gun to keep students awake, if that's going to be the kind of video that they'll be watching when they take our courses.

But I would say that it's already promising in a number of ways. One is I could already imagine using it as a slide creator, if not yet as a video creator; the slide functionality seems to work quite well. And maybe we could do short snippet video lessons, or use this for some package documentation videos. There are also many languages that we're hoping to expand into, and we could start thinking about using this for those expansions. And maybe some of you may find a use for this kind of software. It's open source, on our GitHub at the-graph-courses/chalktalk_studio. So do take a look, do fork it, and see if you can use it for your own purposes.

Lessons from vibe coding

Okay, so that's the first part of the talk, which is about ChalkTalk. I'd hoped that it would be so good and so ready that I'd just teach you how to use it, fully deployed. But that hasn't been the case. So instead, I'll talk to you about what I've learned about the art of vibe coding over those four weeks. I'll split this into two buckets: one is reasons to vibe code more, and the other is caveats to consider.

And here they are. Models improve fast. Iteration is cheap. Apps are safer than stats. And then in terms of caveats, smaller is better. Security, security. And vibe coding is a bad name. So let's jump into some of those.

The first is that models improve fast. This is kind of self-explanatory; I think we really are still in the exponential phase of improvement for some of these coding models. Here is one illustration: SWE-bench, a popular software engineering benchmark that evaluates large language models on their ability to solve open issues from a range of open-source repositories on GitHub. And you can see that over one year, from August of last year to August of this year, the frontier models have gone from maybe 33% performance on this to about 75%. So they're learning software development much faster than I can. What was impossible a few months ago is now routine.

And I have been checking in monthly or every few months to see, like, at what point are the models getting good enough where they can actually build these things that I've been dreaming of for me. And so I'd say to you as well, if you've been put off before, maybe you tried some AI coding tools and you found they were not good enough for your use cases, do retry now.

The second is that iteration is cheap. Generations are quite fast, and models are non-deterministic, so you should retry often. Many times you try out some coding problem, the first solution is bad, and you decide that the model is not capable of that kind of reasoning. But because you can generate many iterations very quickly, I'd recommend that you just retry. In many cases, going back and forth with the model, I would only get to a working solution on the 10th or 11th try.

And to call out Positron Assistant, which by the way is a great way to get into vibe coding: they recently added a way to restore a previous checkpoint. So after the model makes some mistake, you can just go back to your previous checkpoint and start again. And here's a prompt that I often use in my debugging with these AI tools: I have the model identify five possible causes of a specific bug, code up a solution for each of those theoretical causes, and put each in a separate script, and then I go in and test each of those. So you can iterate very quickly this way and get to good outcomes.

And finally, apps are safer than stats. What I mean here is that it's easier to spot hallucinations in software applications than in statistical code. One reason I was initially very skeptical about these AI coding tools is that they can write bad code that you don't spot, and then your analysis is broken and you're in a mess because of the AI tool. Here are two examples of mistakes from a hypothetical model. You would imagine that the one on the left, "I've now made the button red," is much easier to spot than the one on the right, "I've now written the bootstrap code." The mistakes are both pretty blatant, but only one of them is easy to spot. So even if you're not feeling confident enough in the models to write your statistical code, they could maybe be building simple apps for you.
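To make that asymmetry concrete, here is a toy example of my own (not from the talk's slides): a bootstrap where the generated code forgets `replace = TRUE` runs without error and looks plausible, which is exactly the kind of silent statistical bug that's hard to spot.

```r
set.seed(1)
x <- rnorm(50, mean = 10)

# Buggy "bootstrap": sample() defaults to replace = FALSE, so each
# resample is just a permutation of x and every resampled mean is
# numerically identical -- the standard error comes out as ~0
boot_bad <- replicate(1000, mean(sample(x, length(x))))
sd(boot_bad)  # effectively 0: silently wrong, no error raised

# Correct bootstrap: resample with replacement
boot_ok <- replicate(1000, mean(sample(x, length(x), replace = TRUE)))
sd(boot_ok)  # a plausible standard error for the mean
```

A broken button announces itself the moment you load the app; this bug only announces itself if you already know what a bootstrap distribution should look like.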

And so if you've been convinced, here is a sort of vibe coding starter pack of software and tools you could use for this kind of AI-assisted development. One, of course, is Positron, which is bringing in a lot of useful AI tooling at the moment. You could start deploying HTML pages to GitHub Pages. And here are some ideas of things you could build. Interactive learning games: for example, at The Graph Courses we have many multiple-choice questions, and Klaus also talked about having interactive learnr exercises; you could try to vibe code those into more fun games for students. Student work galleries as well, and course websites: if you haven't built these yet because you're a bit scared of web and app development, you should definitely try to get into that. If you want to go deeper, there are also command-line interface tools like Claude Code, Gemini, Codex, and so on. And I'll put links at the end of the slide deck.

Caveats to consider

Okay, so those are some reasons to vibe code more. Now here are some caveats to consider. We'll go through each of them in turn. First is smaller is better. And so early on in this vibe coding process, maybe when I had one to four files, less than 2,000 lines, I could ask very vague questions or vague requests and get immediate correct responses from the models. But as the code base grew, you start to need to refer to specific files and give very detailed specifications about what you want the models to do. And for mature repos, you will run into a high anguish index. So we had said that recording and debugging videos was a 50 on that index. Letting a model make changes unsupervised on your large code base is off the charts. So as you get to large repos, these models really start to flounder.

And as one demonstration of the kind of large repo I wouldn't recommend using models on unsupervised: here's py-shiny, which is about 1,600 files and about a million tokens. That wouldn't even fit into the context window of the largest or best models at the moment.

Second is security, security. One area where hallucinations and mistakes are not obvious, even in application development, is security. You can have very big security holes in your application, and if you vibe coded it, you will not know where they are. This is known in the software development industry: there have been a few cases of folks who had big database holes, and there are questions about whether those were caused by the rise of vibe coding. Here's one example from my development of ChalkTalk: in testing, the models would often just make all our API routes public, meaning anyone could use up all of our credits, and I would need to go in and fix that. So that's one demonstration of the security issue. It still needs deep expertise, and this is one reason we haven't released ChalkTalk as a hosted platform yet, and are not planning to until we've had deeper professional security consulting on it.

And then last, and maybe most importantly, vibe coding is a bad name. It gives the impression that it's a very easy thing: that it doesn't require any focus and you don't need to know any coding. But I'd say it still requires focus even to build fairly simple apps, and some 101-level knowledge of web development will come in super handy.


And then for substantial apps, models need a lot of guidance. My real history is that I've done a lot of Shiny development and some web development massive open online courses, so I've learned a little web stuff. And you can tell from the kinds of prompts I use when communicating with these models: lots of references to CSS and body tags and things of that sort. So again, like Klaus, I've been forced to learn CSS as part of my data science career. And there are lots of last-mile issues as well: the models are bad at finding docs, handling auth, and so on.

Closing thoughts

Okay, so going back to the motivating questions from my talk: can I vibe code my dream application? I would say almost. If so, is there still a point? Is programming education still useful? I would say definitely yes. Because even as we move to a world where most code is AI-generated, you still need to speak the language of code to specify what you want to the models and to understand what you get out of them. And there will always be many last-mile problems that still need humans.

And as a quick analogy to that: should children learn arithmetic, one may wonder? I think most people would say yes. Because even in a world where we have calculators that can do arithmetic millions to billions of times faster than humans, you still need to speak the language of math to specify what you want to these calculators and to understand what you get out of them. And there are many math problems that still need humans with strong foundations; many of these are last-mile problems.

Okay, so closing again, my motivating questions. I can almost build this app. There is still a point of programming education. And unfortunately, I still have to travel with a bunch of recording equipment. And I'll say thank you. Here's some of our team. And here's a bunch of links.