Melissa van Bussel - Practical Tips for Using Generative AI in Data Science Workflows
Now that we're a couple of years into the age of Generative AI, it's clear that this technology has the power to transform the way that we work. As Generative AI continues to evolve, the ways that we use these models should evolve, too. In this talk, we'll explore how we, as data professionals, can maximize the benefits of these tools in 2024 and how they can be incorporated into our everyday workflows. We'll also look at creative use cases that might not seem immediately obvious, but that will allow us to combine Generative AI with other data science tools that we already know and love, like Quarto and Shiny.
Talk by Melissa van Bussel
Slides: https://github.com/melissavanbussel/posit-conf-2024/blob/main/slides.pdf
GitHub Repo: https://github.com/melissavanbussel/posit-conf-2024
image: thumbnail.jpg
Transcript#
This transcript was generated automatically and may contain errors.
My name is Melissa van Bussel and I'm a Senior Data Analyst at Statistics Canada, but if you know me, you probably know me as the person who's been posting a lot of YouTube videos about Generative AI. In the last year and a half-ish, I've spent most of my free time making videos and giving talks and workshops about how to use Generative AI as an R programmer.
Now initially, this all started because I was a bit fearful of Generative AI. I was really afraid that large language models were going to take my job, or worse, take over the world, and so I wanted to learn more about how these models worked and I wanted to understand how to best use them. And the more that I started to learn about Gen AI, the less afraid I became. And in fact, the exact opposite started to happen. I started to become really excited about this stuff. I became a Gen AI fanatic and it sort of became a special interest for me for a while. I became completely convinced that this was going to permanently change the way that I thought about programming and about data science more broadly. I wanted to talk about it with anyone who would let me, and whenever one of my friends or someone online would be skeptical about how useful the technology actually is, I was the one jumping into the conversation and proclaiming that this was the next big world-changing thing.
And in some ways, I was definitely right. There's no denying that Generative AI has forever changed the world as we know it. But even on a more individual level, this really did change my world. My videos started getting a lot more views than they ever had before, and people started reaching out to me more often, and my network started to grow really, really fast. And then I got the email from Hadley asking if I would talk about Generative AI at this conference, and I was ecstatic. It truly felt like a dream come true. I was going to get to talk about my favorite topic at my favorite conference, surrounded by the best possible people.
But as I've been preparing for this talk over the last six months or so, something has happened. And I'm a little bit embarrassed to admit this, but I started getting tired of hearing and talking about Generative AI. I ended up becoming Gen-AI fatigued. The more that I used these tools in different projects and in different contexts, the more walls I hit and the more frustrated I felt. And once this light switch flipped for me from fanatic to fatigued, it started to become much harder for me to have these conversations. And this definitely wasn't helped by the sheer number of posts that I would see on social media that were like this, many of which were using really aggressive marketing tactics. It was just really tiring. But as I've been preparing for this talk, I've managed to find some ways to get really excited about this stuff again.
I have a feeling that most people in this room have probably felt one of those three feelings at some point. Or maybe you're like me and you've even felt a combination. And as a side note here, if you feel like you don't see yourself in any of those three categories, then it might be because I probably should have called the last one Gen-AI hesitant. But then I wouldn't have had three words that all start with the same letter and my overly organized brain just would not have been okay with that.
Either way though, I recognize that we're all in a different place in our relationships with generative AI. And so I've tried to design this talk in a way that there's something for everyone regardless of where you're at. If you're a Gen-AI fanatic, then I want to talk about the latest and greatest improvements that we've seen recently and why you might care as someone who works with data. If you're Gen-AI fatigued, then first of all I'm sorry. I'm really sorry that you have to listen to an hour-long talk about Gen-AI, especially right before dinner and especially after a full day of listening to talks that you're probably way more excited about than this one. But if you're AI fatigued, then I want to show you some creative use cases that you hopefully haven't seen before that might make you start to feel a bit excited about Gen-AI again. And finally, if you're AI fearful, then I want to share some best practices for working with Gen-AI tools more responsibly that will hopefully help ease some of your worries.
For the Gen-AI fanatics: GPT-4o capabilities
So this first part of the talk is for the AI fanatics in the room, but before I dive into this section, I want to talk a little bit about what it's like to be a fan. Now I'm Canadian and I come from the country's capital, Ottawa, which is home to the world's largest skating rink, the Rideau Canal. So naturally, as you might expect, I am a huge fan of Love is Blind. Now for those who don't know, Love is Blind is a Netflix dating reality TV show where contestants try to fall in love with each other and get engaged without ever seeing each other face-to-face. It's a really silly concept for a show, but it's definitely a guilty pleasure of mine. Whenever a new season is going on, I become completely obsessed and I fully immerse myself in the Love is Blind subculture.
Now this is obviously partially because I like the show, but it's also because my social media algorithms become completely overtaken by Love is Blind content as soon as a new season is out. I'll like one Instagram reel about which couples are the cutest together and the next thing I know I'm deep in a YouTube rabbit hole watching videos made by a therapist who psychoanalyzes each contestant and their relationships. Point being, whenever there's new content to consume, I'm consuming it. This obsession with wanting to know the latest and greatest is something that members of all fandoms have in common. Whether the thing that you're obsessed with is hockey, Love is Blind, or Gen AI.
For the Gen AI fans, this part is for you. We're going to be talking about the latest and greatest advancements that we've seen recently. In particular, I'm going to be focusing on OpenAI's GPT-4o model, which was just released a couple of months ago back in May. GPT-4o is OpenAI's first model that combines text, vision, and audio capabilities all in the same model. This means you can do things like upload images, documents, and even CSVs and interact with all of them in the same chat interface. You can also generate and then download outputs in all of those formats as well. The other huge difference with GPT-4o is that the model's available even on the free tier subscription. While the features I'm going to talk about here aren't, technically speaking, new, they might be new to anyone in the audience who hasn't been paying for a ChatGPT Plus subscription.
One of the features that I've been finding really cool about the model is its image processing capabilities. I can take an image on my phone and then I can upload that photo into the chat and then ask questions about that image. I would have loved this technology back when I was in university because I often found myself taking pictures of math that was written out on chalkboards or whiteboards. Now going from an image that has math in it to a digital format that has math in it is a breeze and it's pretty accurate too. So with this prompt here I'm asking the model to take the text from the image of the whiteboard that you saw before and then convert that into a Quarto HTML document all while keeping the text formatting the same. The model is able to do that on the first go and is able to correctly identify where the headings should go and where the math should go. And the file that you see here is actually what it generated and this works on the first go without errors and if you render it this is what it looks like. And I copied and pasted that directly from the actual Quarto document that was generated so it really did turn out looking like that. I just made things a little bit bigger so that it wouldn't look so funny on these big screens.
Even with more complicated handwriting I've been really impressed with the model's ability to read and process math. So with this prompt I'm not only asking it to convert the image into a Quarto HTML document but this time I'm also asking it to add styling so that the colors in the document match the colors of the pens that I was using. And once again the model has no trouble doing this for us.
Using images as inputs also comes in handy if you've got data that are printed out. And if you've ever had a job doing data entry, you're definitely going to appreciate how easy GPT-4o makes this. You can digitize the data and then ask questions about that data all in the same chat. The data that you see on the screen here is entirely fake. It was actually simulated using GPT-4o, and it's just something that shows the annual salary for a bunch of made-up job roles and industries. I'm going to be using this data set in the next few examples.
So first I'm asking ChatGPT to convert that table into a more usable format. A CSV is generated, and then I can download that CSV from within the chat. Not only can you see an interactive preview of the data set, but you can also modify it on the fly. So you can do things like highlight different columns or cells and then provide data cleaning or data formatting instructions. You can also do things like simulate additional observations, including for variables that don't even exist in the data set. Here I'm asking it to simulate additional observations so that there's a total of 1,000 rows in the data set. And then I'm also asking it to create two new variables: one representing the number of years of experience, and the other being the highest level of education attained. Whenever you make modifications to the data set, the downloadable CSV is also going to be updated accordingly to reflect those changes. And if you want to know how those modifications were applied, you can click on the view analysis button to see the underlying code that was run to generate those changes.
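As an illustration of the kind of code the view analysis button might reveal, here's a rough Python sketch of simulating observations like the ones described above. The column names and category values are my own assumptions for illustration; the talk's data set is itself made up.

```python
import random

# Hypothetical column names and categories -- assumptions for illustration,
# not taken from the talk's (itself simulated) data set.
JOB_ROLES = ["Data Analyst", "Data Scientist", "Data Engineer"]
INDUSTRIES = ["Finance", "Healthcare", "Technology"]
EDUCATION = ["High school", "Bachelor's", "Master's", "PhD"]

def simulate_rows(n_rows, seed=42):
    """Simulate observations, including the two new variables described
    above: years of experience and highest level of education attained."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    return [
        {
            "job_role": rng.choice(JOB_ROLES),
            "industry": rng.choice(INDUSTRIES),
            "annual_salary": round(rng.uniform(50_000, 150_000), 2),
            "years_experience": rng.randint(0, 30),
            "education": rng.choice(EDUCATION),
        }
        for _ in range(n_rows)
    ]

rows = simulate_rows(1000)
print(len(rows))  # 1000
```

In the chat interface, of course, the model writes and runs this kind of code for you; the sketch just shows there's nothing magic about the "simulate 1,000 rows" step.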
And while it's awesome to be able to provide data to the model in unconventional ways, like uploading an image or simulating data, you can of course provide data sets in more conventional ways as well, like uploading a file from your computer or connecting to OneDrive or Google Drive. Once you've got some data that you're ready to analyze, doing so is as easy as asking. You can ask for things like a short overview of the data set, summary statistics, or other exploratory analysis. One of the prompts that I've been using and finding really helpful is asking the model to extract key insights and then pull those key insights together into a short paragraph. And I realize that the data set I'm using here is entirely fake, so those insights aren't interesting at all. But you can see how something like this would be a great starting point if you were analyzing real data.
With the advanced data analysis feature you can easily make quick visualizations like scatter plots, box plots, and heat maps, to name just a few. And with certain types of data visualizations, the plots are now interactive. That means you can do things like hover over any of the data points or change any of the colors. Once you're happy with how the plot looks, you can download it as an image. And if you want to create more complicated data visualizations, you can do that too, but the model doesn't currently support interactivity for visualizations like that. With that being said, you can still make stuff that looks like that and have it appear in the chat. It's just going to show up as a static image instead of an interactive plot.
And at this point I know there's probably some people in the room who might be thinking, it's gonna take my job. Like, this is exactly what I do at work. And I know that that's a really scary thought. But what I want to show is that doing this actually takes quite a lot of data analysis expertise. Prompt engineering is a skill that's going to go hand-in-hand with your existing data science and programming skills, rather than one that's going to replace the other. And the reason why I say that is because the key to getting these plots to look right is to be really, really specific and detailed in your prompts.
In the next few slides we're going to build up this plot that you see here which is a bar chart that shows the average annual salary grouped by job role and industry. But we're going to do this using just one very detailed prompt. And the way that I like to do this is by building things up in the same way that I would if I were using ggplot. So I like to think about building the prompt up layer by layer using the grammar of graphics but by using natural language instead of code.
Okay, I hope everyone's ready, because you're all about to become professional prompt engineers. First we tell the model the data that we'd like to use, and here I'm using that job salaries data set from the example before. Next we're going to specify which aesthetics we want to map onto each axis: I'm putting average salary on the y-axis and job role on the x-axis, and then I'm also asking for each job role to have a different color on the plot. Next we need to specify which type of data visualization we'd like to create. Here I want to create a bar chart, but specifically one that's grouped by, or faceted by, the industry variable. We can also add other statistics to the plot as well, for example a horizontal line showing the average salary for each industry. The variable that's on the y-axis is a dollar value, so I want to add some formatting here by adding some dollar signs and some commas. And then I want this plot to look nice, but I don't want to spend a bunch of time fiddling around with themes and color palettes, so I'm going to ask ChatGPT for help with this. Here I'm asking for a modern color palette and a minimal theme, and I want the legend to be displayed on the right side of the plot. All in all, this is the final prompt, and it's very long, but it's also specific enough that it's going to give us exactly what we want. And with just that one prompt, GPT-4o not only shows us the code that's necessary in order to generate that visualization, but it also executes that code and displays the image in the chat.
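To make the layer-by-layer idea concrete, here's a small sketch of how that same prompt could be assembled in code, one grammar-of-graphics layer per sentence. The exact wording of each layer is an approximation of the slides, not the talk's literal prompt.

```python
# Each ggplot-style layer becomes one natural-language sentence; joining
# them produces the single long, specific prompt described above.
LAYERS = [
    "Using the job salaries data set,",
    "create a bar chart with average annual salary on the y-axis and "
    "job role on the x-axis, giving each job role a different color.",
    "Facet the chart by industry.",
    "Add a horizontal line showing the average salary for each industry.",
    "Format the y-axis labels as dollar values with commas.",
    "Use a modern color palette and a minimal theme, and display the "
    "legend on the right side of the plot.",
]

def build_prompt(layers):
    """Join the layers into one prompt string."""
    return " ".join(layers)

print(build_prompt(LAYERS))
```

The nice thing about thinking this way is that if one layer comes out wrong, you can tweak just that sentence and re-prompt, exactly like editing one line of a ggplot call.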
And this is because you can now run Python code from directly within the chat, which is a huge improvement over what we've seen in the past. Now you can see right away whether or not there are errors in the code, and you don't need to copy-paste the code into an IDE in order to execute it. And if you want to make iterative updates to the code, you don't have to worry about switching back and forth between different windows. These are just a few of the ways that GPT-4o can be helpful for data analysis, but the possibilities are truly endless. If you want to learn more about the model's capabilities, you can check it out on OpenAI's website. The examples that I've talked about so far are mostly for quick and dirty data exploration, but you can also use OpenAI's models programmatically in your scripts, which I'll talk about a little bit later.
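As a preview of that programmatic use, here's a minimal sketch assuming the official `openai` Python package and an `OPENAI_API_KEY` environment variable. The model name, prompts, and `build_request` helper are my own placeholders, not code from the talk.

```python
import os

def build_request(prompt, model="gpt-4o"):
    """Build a chat-completions payload without sending it, so the
    request can be inspected (or tested) offline."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "You are a helpful data analysis assistant."},
            {"role": "user", "content": prompt},
        ],
    }

# Only send the request if an API key is actually available.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    response = client.chat.completions.create(
        **build_request("Give me a short overview of this data set.")
    )
    print(response.choices[0].message.content)
```

Separating payload construction from the network call is just a convenience here: it lets you look at exactly what will be sent before spending any API credits.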
For the Gen-AI fatigued: creative use cases
Okay so for this part of the talk I want to address the people in the room who are tired of hearing about generative AI. Now there's different reasons why you might be feeling this way but if you're anything like me then your relationship with generative AI might feel a bit like the experience of getting a new kitchen gadget. A couple of Christmases ago I was super excited to be gifted an ice cream maker. The advertising on the box said it was only going to take 20 minutes to make the ice cream so I thought this was going to be a really fun and convenient way to have ice cream whenever I wanted. But when it came time to actually make the ice cream it turns out that it takes 20 minutes after you've pre-frozen the bowl by putting it in the freezer for a full 24 hours. Definitely not as straightforward as I thought it was going to be. And now this gadget that I was once super excited about isn't getting used at all and is just collecting dust. It doesn't really feel practical enough for regular use.
I tried out the ice cream maker a few times and it kind of worked but eventually I just realized that buying the pre-made store-bought ice cream was a lot easier and honestly took about the same amount of time. And you might feel the same way about generative AI. Despite your initial excitement the lack of practical convenient use cases that you can integrate into your everyday workflow might have you feeling like it's just another overhyped kitchen gadget taking up counter space. Instead of letting it go completely unused though I want to show you some creative use cases that you hopefully haven't seen before that might make gen AI feel as exciting again as it did that first day that you used it.
This section is going to be a little bit more technical than the first so I want to start by giving a quick preview of what I'm going to talk about before I get too deep into the nitty-gritty technical details. First I'm going to talk about a Quarto reveal.js theme generator that I made. Next I'm going to talk about an easy way to use gen AI to make hex sticker designs and finally I'm going to talk about some tools that make teaching and communicating about data a little bit easier.
When working with data there are often things that we do that are really important parts of the process but that end up taking way longer than we want them to. It's often been said that the vast majority of data analysis is actually data cleaning but I find for myself my process looks a little bit more like this and I know that I'm not alone. Making things look pretty is something that this particular community is really really good at and if you're someone like Allison or Megan or Emil then you probably don't mind spending a lot of time on making things look pretty but if you're someone like me and you're not artistically talented but you still care about making things look nice then you might find yourself wishing there was an easier way. So if you're someone who finds fiddling with CSS and color palettes to be kind of tedious then I want you to know that you're not alone and I want to share with you some ways that I've been using generative AI to cut down on the amount of time that I spend on making things look pretty specifically when it comes to Quarto.
I've given quite a few talks over the last few years, which means I've spent quite a lot of time on making Quarto slide decks look nice. This is entirely a me problem, because every time I give a new talk I want the theme to be something that people have never seen before, and I also want it to be something that goes really well with the content of the slides themselves. So like this talk, for example: when I was picking out a theme, I wanted to make sure that I picked something that was fairly modern and minimal looking, because it just would not have made sense to give a talk about AI and use Comic Sans. Now because I'm giving talks, I want to make my slides publicly available to other people, and I want to do this in a way that's fully reproducible, where people can access not only the slides themselves but also all of the underlying files that generate them. Whenever I make changes to the content of the slides, I want all of that to be version controlled, and I also want the public-facing version of the slides to be automatically updated as well. This makes GitHub Pages perfect for the job. And if there's anyone in the audience who's wondering if I used Quarto and GitHub Pages for this talk, the answer is no, because this slide deck has over 220 slides in it, most of which just have images on them, and if you've ever made a Quarto presentation before then you know that that would have taken way, way too long. But in general I do try to use Quarto and GitHub Pages for presentations.
Now if you've never used GitHub Pages to host a Quarto project before, then I think the best way that I can describe it is that it kind of feels like assembling a puzzle. The end result is super satisfying, but you might spend hours searching for that one missing piece. And the reason why I say that is because the file and folder structure that you need to use in order to get everything working correctly is, honestly, a little bit complicated. But the good news is that once you've figured out how to successfully assemble this puzzle once, you're always going to know exactly how it all fits together.
Whenever you make a new Quarto theme and you post it using GitHub Pages, this folder structure that you see here is going to be the exact same every single time. And if you want to make a new slide theme, it's really only a handful of files that need to be modified. You're going to need any images that you want to use for the slide backgrounds, and then you're also going to need to make some modifications to the custom SCSS file. Within the custom SCSS file, it's actually only a handful of lines of code that need to be modified, and in particular it's really just those strings that you see highlighted there in yellow that need to be changed at all in order to create an entirely new-looking theme. And that's because if we change just those strings that are highlighted in yellow, then we'd be changing the main font family and the color palette, and those are the styling elements that make the largest, most noticeable differences.
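To make that concrete, here's a rough sketch of the kind of string substitution involved: only the font family and the color palette change between themes. The SCSS variable names below follow Quarto's reveal.js conventions, but the template itself is my own assumption, not the one from the talk's repo (and the talk's app is written in R, not Python).

```python
# A minimal custom-SCSS template where only the highlighted strings vary.
SCSS_TEMPLATE = """/*-- scss:defaults --*/
$font-family-sans-serif: "{font}", sans-serif;
$body-bg: {background};
$body-color: {foreground};
$link-color: {accent};
"""

def fill_theme(font, background, foreground, accent):
    """Drop a new font and three-color palette into the template."""
    return SCSS_TEMPLATE.format(
        font=font,
        background=background,
        foreground=foreground,
        accent=accent,
    )

print(fill_theme("Inter", "#1a102b", "#f5f0ff", "#a678f0"))
```

Because the surface area is this small, "generate a new theme" really does reduce to "pick one font and three colors," which is exactly the kind of constrained task a language model handles well.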
Now we know that the folder structure stays the same each time, and we know that all we need to do in order to create a new theme is generate some new images and make some minimal styling changes to the custom SCSS file. We also know that generative AI is pretty good at generating images and at writing code. So we can combine all this information together and conclude that creating Quarto themes is a use case that's perfect for generative AI. With all of these facts in mind, I developed a Shiny app that starts by setting up this consistent file structure, then uses Gen AI to create new images for the slide backgrounds, and then uses ChatGPT to generate styling suggestions and update the custom SCSS file accordingly. The app looks like this, and if you want to use it, all you need to do is enter a phrase for the theme and then provide your OpenAI API key. Once you click on the generate my theme button, all of the files that are necessary for the project are going to be generated for you. You can then download those files, and if you push the repo to GitHub, then the slides are going to be automatically hosted using GitHub Pages without you needing to modify any of the files. So everything's already properly configured for you.
So for example, I used the phrase abstract purple minimal, and here's the resulting slide deck that was generated for me. And I didn't modify any of these files, so this is actually what it produced. I know it doesn't look as pretty as it would have if someone like Allison or Megan or Emil had made it, but I think it looks pretty good. If you're interested in trying this app out for yourself, I'm going to share the link to it in a couple of minutes, but I first want to talk about how generative AI is being used here. First, the background images are generated using the image generation endpoint from the OpenAI API, and the prompt that's passed to the model is a concatenation of the user's input phrase and the words desktop background.
And you might be wondering... oh, I actually forgot about this slide. So if, for example, you know that you want to create a softer color palette for your theme, you might decide to use the phrase pastel chalk. What the app is going to do is take that phrase and modify it so that the words desktop background are added, and the prompt that actually gets passed to the image generation model is going to be pastel chalk desktop background. And you might be wondering why that's necessary. The reason is that if we don't include the words desktop background, then the model is probably going to try to create a really photorealistic image like that one, whereas if we include them, we're going to get something that looks a lot more appropriate for background images in a slide deck. From here, the app changes the opacity of the images to 20% so that if there's any text on the slides, it's actually going to be visible. To generate the styling suggestions, the code looks really similar to what we saw before, but this time it's the chat completions endpoint that's being used. The prompt that gets sent to the model is super long, and it says: based on the user's input phrase, recommend a Google font, a color palette consisting of three colors, and a Pandoc syntax highlighting theme. And then the key part of the prompt is the stuff that's at the bottom there, where I'm specifying the format that I want the results to come back in. I'm asking for a JSON format so that the results come back in a way that I can use programmatically, instead of getting back a big long paragraph. And then finally, this information is used to populate the custom SCSS file. If you want to see the details on how it gets populated, then you can scan the QR code on the left, and that's going to take you to the Shiny app on shinyapps.io.
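The two Gen-AI pieces just described can be sketched like this: appending the words desktop background to the user's phrase before it reaches the image model, and parsing the JSON-formatted styling suggestions that come back from the chat completions endpoint. The JSON keys here are hypothetical; the app's actual field names may differ, and the app itself is written in R rather than Python.

```python
import json

def image_prompt(user_phrase):
    # "pastel chalk" becomes "pastel chalk desktop background"
    return f"{user_phrase} desktop background"

def parse_styling(response_text):
    """Parse the model's JSON reply into the pieces the SCSS file needs.
    The keys (font, colors, highlight_theme) are assumed names."""
    styling = json.loads(response_text)
    missing = {"font", "colors", "highlight_theme"} - set(styling)
    if missing:
        raise ValueError(f"model reply missing keys: {missing}")
    if len(styling["colors"]) != 3:
        raise ValueError("expected a three-color palette")
    return styling

# A made-up example of a reply in the requested format:
reply = ('{"font": "Inter", "colors": ["#1a102b", "#f5f0ff", "#a678f0"], '
         '"highlight_theme": "dracula"}')
print(image_prompt("pastel chalk"))  # pastel chalk desktop background
print(parse_styling(reply)["font"])  # Inter
```

Validating the reply before using it matters in practice: models occasionally drift from the requested format, and failing loudly beats silently writing a broken SCSS file.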
From there you can try the app out for yourself, or you can take a look at the underlying code that generates it. If you want to see a video tutorial on how to use the app and how to upload everything so that it's all working with GitHub Pages, then I've got a video on my YouTube channel, which you can access by scanning the QR code on the right.
Hex stickers with generative AI
When I was preparing for this talk and trying to decide what to include in it, I messaged a lot of different people to get their perspective on what they thought might be interesting to this particular audience. A while back I was chatting with one of the other keynote speakers, Hannes, and he said that he really liked the idea of talking about practical applications of Gen AI in data science. And he also humorously shared with me that he wasn't quite sure how a cat picture generator could ever come in handy to someone who works with data. And when I saw that message I thought to myself, yeah, you know what, that's totally fair. There's definitely not a ton of obvious ways to use a cat picture generator as a data scientist, and if I'm being completely honest, the only way that I'd ever seen image generators being used in my everyday life was in a group chat at my workplace for our R and Python user group, or R pug for short. One of my colleagues, Jude, would regularly send these hilarious AI-generated images of pugs, and everyone in the chat would have a great time trying to find everything that was wrong and broken with the images. And while this wasn't exactly a practical use case, it definitely improved morale. Still, Hannes' message did make me wonder: is there a more practical way for people who work with data to use a cat picture generator?
So I thought about it, and I want to share with you an example of exactly how a cat picture generator might come in handy to this audience. Imagine you've just created an awesome new R package and you want to make a hex sticker, which means you need a logo to represent the package. And for the sake of this discussion, let's say that this package just so happens to be themed around cats. If you want to make a logo for a package, then you're probably looking for an image that's got a transparent background and that can be scaled fairly easily without becoming too pixelated. There are now text-to-SVG generators that are perfect for exactly this. You just enter a phrase and then an editable vector graphic is created for you. There's a ton of different services that offer this, but if you're looking for a free option, then ReCraft AI might be a great solution for you. It's super easy to get started: you just enter a prompt and then you specify a few options. You can set the height and width ratio. You can optionally specify a color palette. You can set how simple or how detailed you want the image to be. And then you can choose from a ton of different art styles. And when I say a ton, I mean a ton. I only put a few on this slide, but ReCraft AI has over 20 different styles for generating vector images. Some of the styles are definitely better suited than others depending on what you're trying to use the image for, and I find that if you're trying to make a hex sticker, the cartoon style works pretty well. And just like with other image generators, you can do things like fine-tune or generate variations. And then because the image that's been created is a vector graphic rather than a raster image, you can also easily change any of the colors. Once you've got an image that you're happy with, you can easily turn it into a hex sticker design.
ReCraft AI also has a lot of other really cool features too including the ability to generate mock-ups which is awesome if you want to see what something's going to look like before you order a bunch of them.
Tools for teaching and communicating data science
Next I want to talk about two tools that make teaching and communicating about data a little bit easier. The first is a tool called Scribe, and Scribe allows you to automatically create step-by-step tutorials without needing to copy-paste screenshots or record any videos. This is super helpful if you want to teach someone how to do a programming task. As a simple example, let's say I want to show someone how to make a new R project using Posit Cloud. First I would start the capture using the Scribe Chrome extension, and then Scribe is going to start automatically writing numbered instructions based on what I'm doing in the capture. Anytime I click somewhere or type something, the screenshot is going to be automatically generated for me, and then the location where I've typed or clicked is going to be highlighted in orange. If I don't like the way that any of the descriptions have been written, I can click on the instruction to modify what it says. Once the capture is stopped, you can use Gen AI to create a title and a description for the tutorial, again both of which can be modified if you don't like them. Once you're happy with the tutorial, you can share it for free with others, either via a direct link or by email, or you can embed it as an iframe, for example in a Quarto website. Now, using the Scribe Chrome extension is free, but there's also a pro subscription available, which gives you access to things like the desktop version of Scribe. The desktop version is awesome if you've got something that you want to show someone how to do using an IDE that you only have installed locally on your machine. The pro subscription also comes with the ability to export in a variety of really useful formats, including Markdown.
The last tool that I want to talk about that I find really helpful for teaching data science is called Descript, and this is honestly probably one of the coolest things I have ever used. I've used Descript to edit every single YouTube video that I've made in the last three years, and it has saved me countless hours of video editing. That's because with Descript, you can edit videos or podcasts as if you're editing a Word document. You just upload the video or audio file into Descript, and it gets transcribed for you. Transcription is available in over 20 different languages, and it's impressively accurate, even for technical words and phrases. Once the transcript is ready, you can edit the transcript in order to edit the video itself. So if I've got this paragraph highlighted here and then I delete that paragraph, it's going to automatically trim the video to the correct locations. With the click of a button, you can do things like remove awkward silences and filler words, which is great if you're someone who's guilty of saying "um" way too much.
Descript also has AI speech generation capabilities, and you can make a clone of your own voice in as little as 60 seconds. So if you don't like the way a word sounded when you originally recorded it, you can use your voice clone to regenerate the content. And if you don't like the wording that you used and you want to change that up altogether, then you can use the overdub feature to replace it without needing to re-record the clip. Descript has a lot of other really cool AI-powered features as well. For example, they've got an eye contact feature, which makes it look like you're making eye contact with the camera even if you're not. So if you're someone who makes videos using a script, then you don't need to memorize that script. You can just read it directly from your screen, and it's going to make it look like you're looking directly into the camera anyway. If you're a perfectionist like me and you often find yourself re-recording sentences when you make videos because you want them to sound perfect, then you're gonna love Descript's remove retakes feature. Descript automatically detects your repeated takes, and then you can remove the bad takes instantly. A feature that I've been using a lot and finding super helpful is the YouTube description generator. From your transcript, Descript will generate a YouTube-style video description, including detecting where chapters should go and then providing the timestamps for those chapters in the generated description. So if you're someone who loves talking about or teaching about data science, but you've been feeling overwhelmed by the amount of work it takes to edit videos or podcasts, then I really recommend giving Descript a try.
For the AI fearful: responsible use
If you didn't find yourself fitting into those first two categories, then maybe you resonate more with the idea of being fearful of generative AI. Like maybe you're afraid that AI-powered machines are gonna take over the world, and our AI overlords are gonna keep us trapped in a vat of goop, forcing us to live in a simulated universe for the rest of our lives. But I know that I'm giving this talk to a really technical audience, so I think most people in this room probably have a pretty intuitive sense of what these models are and aren't capable of. So I know that "fearful" isn't exactly the right word. Really, though, this part of the talk is for anyone who wants some guidance on how to use these tools more responsibly.
The Government of Canada has put together a guide on the use of generative AI, and the target audience for that guide is public servants. I'm a Canadian public servant, so I personally think the guide is pretty good. But regardless of what you do for your job or what context you're using Gen AI in, this guide has a lot of really useful tips. You can find the guide at the link at the bottom, but that link is super long, so it's definitely easiest if you just Google "guide on the use of generative AI" and then click on the link from the Government of Canada. The guide is super detailed, but they've also developed a really convenient acronym that highlights the key principles. The acronym is FASTER, and it stands for fair, accountable, secure, transparent, educated, and relevant.
Fair means ensuring that outputs are accessible, inclusive, and comply with human rights. Many of these models have used publicly available data from the internet as part of their training data, which means they have the potential to produce outputs that amplify historical biases, stereotypes, and other types of harmful information. So it's really important to always manually review outputs and remove anything that's biased, non-inclusive, or discriminatory. Gen AI should never be used to make decisions about an individual that could impact them materially or legally, or lead to discrimination in services. Doing this could violate human rights, but it's also against the terms of service for most of the popular tools. For example, OpenAI has a policy that prohibits its users from using the models to make decisions about an individual related to things like credit, housing, and employment. Google also has a Gen AI prohibited use policy, which contains some pretty similar terms.
Accountable means ensuring that outputs are accurate, legal, and ethical. To ensure that outputs are accurate, they should always be manually reviewed by a human to make sure they're factually correct. Large language models shouldn't be used as search engines, and they also shouldn't be used for tasks that you're not already somewhat skilled