Data Science in the Energy Industry | Frank Hull | Data Science Hangout
To join future data science hangouts, add it to your calendar here: https://pos.it/dsh - All are welcome! We'd love to see you! We were recently joined by Frank Hull, Director of Data Science and Analytics at ACES, to chat about forecasting energy demand and prices, managing over a thousand data models, full-stack data science, and advanced machine learning techniques for time series analysis. In this Hangout, we explore the necessity of managing a vast number of data models in the energy industry. Frank's team at ACES oversees over a thousand models (nearly 2,000, actually!), a staggering number explained by the complexity and fragmentation of the wholesale energy market. The United States is divided into various Independent System Operators (ISOs), each possessing unique regulations and diverse resource mixes. Each of ACES's 40+ portfolios can operate in different geographical areas within an ISO, presenting distinct challenges that necessitate individual modeling. These models are used to simulate a wide range of time horizons, from the next hour or day-ahead market to long-term financial planning and infrastructure decisions spanning 25 years. This intricate modeling helps in understanding hourly price shapes, demand patterns, supply mixes, and evaluating the effectiveness of new energy generators or hedging strategies, all with the goal of lowering variable costs for cooperatives and mitigating critical risks like blackouts during peak demand. Resources mentioned in the video and zoom chat: Tidymodels → https://www.tidymodels.org/ Orbital Project → https://orbital.tidymodels.org/ U.S. Energy Information Administration (EIA) Open Data → https://www.eia.gov/opendata/ Kuzco R Package → https://posit.co/blog/kuzco-computer-vision-with-llms-in-r/ If you didn’t join live, one great discussion you missed from the zoom chat was about handling imbalanced binary classification models. 
Participants discussed why techniques like SMOTE might not perform well in production with real-world data, shared experiences with alternative methods such as standard up/downsampling, and highlighted challenges in maintaining prediction accuracy in deployment despite strong training results. Let us know below if you’d like to hear more about this topic! ► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu Follow Us Here: Website: https://www.posit.co Hangout: https://pos.it/dsh LinkedIn: https://www.linkedin.com/company/posit-software Bluesky: https://bsky.app/profile/posit.co Thanks for hanging out with us! Timestamps: 00:00 Introduction 04:05 "What's ISO?" 08:20 "What are your go to models for analysis in the energy field?" 10:48 "Do you tend to use traditional stochastic models for time series analysis or more of the recent ML methods?" 13:30 "What is a full stack data scientist? What's the overlap between a full stack data scientist and something like an ML engineer or a data engineer?" 18:38 "Is there a specific data science skill set that's needed to get into energy analysis?" 19:59 "What is the portfolio model?" 23:36 "How have you found convincing regulators and other stats oriented stakeholders to trust and believe your AI fancy machine learning models that they can't really dive in and and prove to themselves that that's being statistically valid? Or have you found some good ways to demonstrate that?" 26:50 "Are there any good examples of open data in energy?" 27:54 "How are you keeping on top of the documentation for all of these models? Over a thousand models is a lot. Is there any learning you could share from that experience to help other people keep on top of their documentation?" 30:33 "How would you suggest handling missing data in time series forecasting?" 33:10 "Do you see long term electricity prices decreasing in the next twenty five years due to the abundance of renewables like wind and solar in lower population areas?" 
35:14 "Do you have any career advice?" 36:50 "How do you see data science evolving within the energy industry?" 38:39 "How do you keep up to date on new packages?"
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Herron, and this is a recording of our weekly community call that happens every Thursday at 12 p.m. U.S. Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience. Can't wait to see you there.
Well, with that, I am so excited to be joined by our featured leader today, Frank Hull, Director of Data Science and Analytics at ACES. Frank, I would love it if you would introduce yourself. Tell us a little bit about you, what you do, and something you like to do for fun.
Yeah, yeah. So I lead a data science team in energy, which is a lot of fun. We get to solve a lot of difficult tasks. Also, you know, for fun, I have a lot of different hobbies. I kind of think of them as, like, cyclic. So right now I'm really into gardening. It's kind of taking over right now just because I'm in the middle of summer in Colorado. But other than that, you know, I spend a lot of time outdoors. So hiking, climbing, camping is a big part of my life.
Also really into playing guitar. I've been learning drums over the last few years. I was kind of a musician first before I ever got into math or data or anything like that. I always try to think of it, I'm like, why do I like number patterns so much? I think it's just because I studied guitar tabs so much in middle school and high school, just to learn new songs. And then I started noticing number patterns and I was like, oh, I'm going to get into math. But other hobbies, I kind of went back and forth from homebrewing and doing other stuff like that. I have two sons, so three and one. So mostly playing with monster trucks and Hot Wheels in my free time.
And then, you know, I've also, like, been working in open source software a little bit here and there, kind of building packages out in R, posting those on GitHub and just kind of talking about it on Bluesky or X or something like that. So kind of like a hobby of many hobbies. And then, kind of goes back to my team, I think we're all kind of full stack data scientists. So we all do a little bit of Shiny, do a little bit of data engineering, do some machine learning and everything in between.
Understanding the energy industry and model scale
Awesome. Well, I want to dig in a little bit to what you do and give everybody a little bit of context so that they can ask good questions in Slido. Because not everybody kind of knows a lot about energy markets or the energy industry. And when I learned what you do, I was shocked because you told me that your team oversees over a thousand models. And I was like, excuse me, what? Did you just, like, was that a verbal typo? Like more than a thousand models? Why do you need that? But then you explained it to me and it made perfect sense. So I'd love it if you could talk a little bit about that.
Yeah, totally. Yeah. So, like, we're in wholesale energy. So, like, ACES is an alliance of cooperatives for energy services. We have a whole kind of menu of services that we offer from front office to back office. I'm in front office. I support our portfolio management team, our portfolio strategy team, our portfolio modeling team. We also have resource planning and fundamentals and a capacity team. And then we kind of, like, bounce ideas back and forth with the transmission team. So there's, like, all these different teams that are, like, supporting these co-ops. And really, we're trying to lower their variable costs, but also understand the risk associated with the portfolio.
And, like, you would think you could set up, like, one model, but, like, just like the whole fact that the United States is split up into different ISOs and they have different regulations, they have different resource mix, they – each one of our portfolios could be in a different area of that ISO, and that brings on different challenges. So, you know, we're simulating over 40 portfolios, and then each of those have their own – maybe more than one load. And whenever I say load, I guess it's – so, like, load is being, like, a demand on the system. So it's kind of a supply and demand problem. And you need to serve this load. And they might be in an area of the country that has tons of wind. They might be in an area of the country that's, like, natural gas and solar and battery. And all of these things create different issues with the way that you can kind of create a supply mix to serve that load.
We have a quick question to interject. What's ISO?
Yeah, it's an independent system operator. There's a lot of acronyms in energy. So I'll try not to say too many, but please stop me if I do. Yeah.
So I think that the gist is there is so much variation across a region, across a town, between towns, not only in the energy that they demand, but also the energy that is available to be consumed, right? So you've got to model – you've got to model, for example, the San Antonio metropolitan area where I live. Like, how much are we going to need? And you go out a pretty far time horizon, like 25 years, right?
Yeah, yeah. So whether you're regulated or not, you kind of want to have an idea of what's going to happen in the next hour. Or then you start thinking, you're like, I want to know what's going to happen tomorrow, so there's a day ahead market. And then when you're doing financial planning and analysis, you want to know what happens next year. But then when you're talking to a city council or talking about putting new steel in the ground, and you're trying to defend whether or not to build a new combustion turbine in this area of the country, you kind of need to simulate out 25 years to understand what the hourly price shapes would be, what the hourly demand might be in that area, what the supply mix is going to be, and then whether or not that particular hedge or new generator would even be useful in that area of the country. So you start like, I want to know what happens in the next five minutes, the next hour. I want to plan for tomorrow. And then you start getting everyone in on the company, and you're like, okay, well, we need to plan years ahead of time.
And the independent system operator is also a market, so it's trying to also help that planning. But there's also non-ISO areas of the United States too. So regardless, you kind of don't want to have a blackout. So how do we mitigate that? And also-
What happened in Texas in 2021?
Yeah, yeah. You don't want to run out of supply when there's high demand. So that's a huge issue, especially if demand is increasing across the United States.
One more context-building thing that Frank had told me the last time I talked to him that I thought was really interesting was that some regions have surplus energy, and that the energy price can actually be negative. You could have a price for energy that is below zero. That kind of blew my mind a little bit. If there's an area where there is a bunch of wind electricity being generated, but there aren't a lot of homes there consuming that power, where does that power go? And how do you make sure that it ends up where there are high population density areas that really need that? That's a really tough problem to solve, isn't it?
Yeah, yeah. And you can think of this as not anyone's fault, but wind just blows more in Kansas, Oklahoma, Minnesota, but not a lot of people are in those areas. So then you have to build transmission to get it to the demand. So, Kansas City being really far away from western Kansas where you see tons of wind farms, how do you get that wind power over to the people? And within the markets, it's designed in a way so it's going to disincentivize supply where there's already tons of supply and no demand. So, you'll see prices go negative around renewable generation where there's a pocket of renewables and they haven't quite built out the transmission.
So, yeah, you have to – you start simulating, you come up with like a vanilla model, and it starts to work for everywhere, and then you start running into these edge cases and you have to build new models to handle these different nodes. So, that's where you end up thinking I have one model to simulate top-down. I'm just thinking total load of the system, total wind of the system, total solar penetration of the system. And then you're like, okay, I got a pretty good net load model for the entire system. But then you start getting into nuances of, hey, I need to simulate this one node for this portfolio that happens to be a wind turbine surrounded by other wind turbines. And you're like, how do I create a price shape for that location? It can be negative quite often. So, yeah, there is this really interesting markets plus energy, plus physics, plus engineering that all goes into it. Yeah.
Go-to models and machine learning approaches
Yeah. Yeah. And for anybody wondering, Frank's background is physics, but this does seem like a lot of finance, economics, mechanical engineering even, a little bit in there, electrical engineering. Well, we have so many good questions in the Slido. They are stacking up like crazy. So, let's see if we can get through some of them. And the one that I see that I think would be great to start with is from Edward Leland. Edward, are you available to unmute and ask your question live?
All right. Yeah. Sounds good. Hey, I'm also a physics background person. In my undergrad, I was a double major in physics and comp sci. Good to know other people got out of physics and are doing comp sci stuff. I don't know. It broke my brain. Anyway, sorry. Here's the question. What are your go-to models for analysis in your field? Because you got like 1,000 plus. Do you have to evaluate every problem individually, or do you have a class or type of models that you get to use for a lot of different applications?
Yeah, that's a good question. I think you have a class by maybe fuel resource type. So, we have a few engineering models for wind turbines, and that might change depending on what kind of turbine blade is on the particular wind farm or what the manufacturer is. Same with solar, depending on where it's located and you have an understanding of the kind of solar irradiance in that area and what kind of panel it is. We have pretty good engineering models for those as well. When it comes to kind of coming up with like demand profiles and price shapes, those can – you can start trying a few statistical methods, but I think you end up kind of doing a few different machine learning methods. So, tree-based, rule-based, linear models, polynomial models, those work pretty well depending on how you kind of feature engineer.
So, I think that's kind of where it really helps to work with the business. So, like, that's where my team's kind of working with all these portfolio strategists, portfolio management teams, where they might see and have a hunch on, like, hey, this is what's driving this congestion. I think that's what's driving this negative pricing. And then you're working with them and kind of feed that into the model, and then the model picks up on the price patterns. So, that's kind of our methodology. My team, we're using tidymodels for 80 percent, 90 percent of the models that we're building, doing everything from scratch. So, yeah.
Yeah. Yeah, there's a few other resources in Python, Julia and R. NREL, they make a lot of toolkits. And then when you're – sometimes I talk – I say model a lot, so I'm always saying, like, regression model, but then there's the portfolio modelers that have a portfolio model. And then I'm talking to IT about models, and they're, like, talking data model, and I'm like, okay, I'm talking about regression model here. The portfolio modeling team has their own software as well. And that's kind of putting together all of the P&Ls off of the loads and the supplies and stacking that and coming up with the average variable cost and risk associated with it.
Stochastic models vs. machine learning for time series
Because he had asked in the same vein, like, do you tend to use traditional stochastic models for time series analysis or more of the recent ML methods in time series analysis? I'm also curious about whether you have used LSTM, which is long short-term memory, for longer-term predictions and how that has performed.
Yeah, yeah, that's a great question. You know, we've moved away from Monte Carlo. So, I would say 10 years ago, I was doing a lot of Monte Carlo stochastic simulation. I think the problem comes when you're in a transitional time. So, you went from just like, all right, is coal or is natural gas cheaper? Then we know which plant is going to run and we can kind of simulate those commodities and, multiplying by the heat rate of that power plant, come up with a power price. And you're kind of chasing that. Then now, we started introducing wind, it's taking over like 50% of the world, and then you start adding solar and then batteries. And then you have these price shapes where, for Monte Carlo simulation, the tails are just too fat and long. So, we started doing machine learning everywhere for stochastics.
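The fuel-cost arithmetic described here (commodity price multiplied by a plant's heat rate) can be sketched in a few lines. This is only an illustration: the fuel prices, heat rates, and the function name `marginal_power_price` are assumptions made up for the example, not figures or code from ACES.

```python
def marginal_power_price(fuel_price_per_mmbtu, heat_rate_mmbtu_per_mwh):
    """Fuel cost of one MWh: fuel price ($/MMBtu) times heat rate (MMBtu/MWh)."""
    return fuel_price_per_mmbtu * heat_rate_mmbtu_per_mwh


# Hypothetical plants: (fuel price in $/MMBtu, heat rate in MMBtu/MWh).
plants = {
    "coal": (2.0, 10.0),
    "natural_gas": (3.0, 7.0),
}

# Fuel-only cost per MWh for each plant, and the plant that would run first.
costs = {name: marginal_power_price(price, rate) for name, (price, rate) in plants.items()}
cheapest = min(costs, key=costs.get)
```

With only two dispatchable fuels, simulating the commodity prices is enough to decide which plant sets the price, which is why this approach worked before wind, solar, and batteries changed the price shapes.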
So, starting with, you can start really simple, just take like one weather day and just trying to understand like, for one weather day, what are the options for a wind profile, a solar profile, and a load profile. And then you can come up with the net load and the subsequent price shape. I say one weather day, but then you can just do that over a thousand times and you have a thousand load, solar, and wind shapes. So, I think that's kind of like what we're moving towards as an industry standard. I see that across other companies. There's at least like 7 out of the 10 companies that I know are moving towards that or are already there. And then that's where we are, and we're providing the service for over 45, 46 energy companies across the United States.
So, I think everyone's more inclined to say, like, the number of billion-dollar extreme weather disasters keeps increasing over time. That's like, I think all the risk is associated with weather. So, trying to understand that and starting from that framework instead of just a simple Monte Carlo simulation. That's a great question. And then LSTMs, we don't use those. I work with Purdue and their geodata science department on testing out a few in MISO. Just the complexity and then the computation time. And then I just don't think for us, we don't have our data like normalized and stationary enough for that to be scalable across all the data. A tree-based model or rule-based models are better for that, in my opinion.
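The weather-day idea can be sketched as follows: each sampled day yields a joint load, wind, and solar profile, and net load is load minus wind minus solar. Everything in `sample_weather_day` is a hypothetical stand-in (the transcript says the real profiles come from machine learning and engineering models), and the price shape step is left out because the transcript does not describe how it is computed.

```python
import math
import random

HOURS = 24


def sample_weather_day(rng):
    """Hypothetical generator: one weather day gives hourly load, wind, solar (MW)."""
    temp_factor = rng.uniform(0.8, 1.2)   # hotter or colder days scale demand
    wind_factor = rng.uniform(0.0, 1.0)   # how windy the day is
    cloud_factor = rng.uniform(0.3, 1.0)  # 1.0 means a clear sky
    daily_shape = [math.sin(math.pi * (h - 6) / 12) for h in range(HOURS)]
    load = [temp_factor * (800 + 200 * s) for s in daily_shape]
    wind = [wind_factor * 300 for _ in range(HOURS)]
    solar = [cloud_factor * 250 * max(0.0, s) for s in daily_shape]
    return load, wind, solar


def net_load(load, wind, solar):
    """Demand left for dispatchable supply after wind and solar."""
    return [l - w - s for l, w, s in zip(load, wind, solar)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
scenarios = [net_load(*sample_weather_day(rng)) for _ in range(1000)]

# One example summary of the simulated risk: the 95th percentile daily peak.
peak_net_loads = sorted(max(day) for day in scenarios)
p95_peak = peak_net_loads[int(0.95 * len(peak_net_loads))]
```

Because load, wind, and solar are drawn from the same weather day, their correlation is preserved in each scenario, which is the main difference from sampling each series independently.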
Full-stack data science and career paths
Very good to know. This is really good information for anyone who's like, where can I do this in practice? Like I learned all these things in grad school or undergrad, and I kind of want to do this. The energy industry might be a place for you, right? We had a couple of questions in the Slido that were about like what is a full-stack data scientist? I think Noor had asked and Abigail as well. But I wanted to give sort of you the chance to talk a little bit about your background as well because you told me that you learned. This is such a backward experience from what I normally see for everybody. You learned like Python in undergrad. You got out into industry and you learned Excel in industry. You were like confronted with huge Excel files. And then your IT department asked you if you know R. Now, every IT department that I have ever talked to, I have been trying to convince them to let me work in R, not the other way around.
So, if you could talk a little bit about how that happened for you. And then the question from Abigail and I think from Noor as well was like, what does full-stack data scientist mean to you? How do you explain that you're a full-stack data scientist to someone without confusing them? What's the overlap between a full-stack data scientist and something like an ML engineer or a data engineer?
Yeah. Yeah. Those are all great questions. I was scrolling through here and I didn't realize how many people were here. But I was looking for the person that recommended R, see if he was in here. I don't think he is. He might be. I could have skipped over his name. But yeah, we have a short-term analytics team and he had led analytics services, part of IT, kind of being that arm to help out our term training and stuff like that. And he looked over my shoulder one day and was just like, hey, I see you messing around with VBA. Have you ever heard of R? And I was just like, no, I've never heard of R. I was someone that if I ever had to find the max in a vector, I was creating my own algorithm to look for a max. I didn't realize people use the formula, just MAX in Excel, and use max in R. I was like, whoa, you already have half of the problem solved for you.
Yeah. So from physics, I was learning C, Fortran, Python, and I got into a business and I was like, what is... Excel was a way for me to do data entry in a lab and then take that back and write a report. I never had thought that someone could build a full model in Excel or... I didn't know what SQL was. I was like, what is this? So I learned all of that and data science wasn't like a degree or a program when I went to college. So it was something that I happened into because I kind of really liked simulations. Like that was what I wanted to do.
So at that point, got into industry, I already felt kind of like I had a little bit of everything going on. I hire people like that, that you can see can do multiple tasks. And I think of our whole team as being like a knowledge sphere. If someone would come in and add to that knowledge sphere, that's great. And then if they already know other pieces of the knowledge sphere, that's awesome. And I think we can all share that knowledge sphere with each other. And that's where it's like, I think I might have our manager of data science here. He's an ML engineer pretty much, but like he's a manager of data science because he's seen enough across the whole spectrum of data science.
And then like my lead data scientist who might be in here, I saw him of himself. That's like a funny character in this group. He's almost more like also a data engineer. He's also a plumber, building plumber APIs all the time, building shiny applications. He's also like scheduling and automating tasks on Cron with our new data analyst that we just hired. And she's been doing unit testing and she has more of a software engineer background. There's like study stuff that I've never learned. So it's like we can come in and all have like different backgrounds that like teach each other the other aspects.
I have another data scientist that we just hired. So, like, my team's growing this year. So I have quite a few new people, but she had been in a data role in a company where everything came back to her. And from agile development to like what data's in the database to how do I build a predictive algorithm on top of this? So I try to like, I think like full stack is like, maybe you're not perfect in Shiny, but you know how to build a Shiny app, at least with the proper tool. It's like, maybe you can't do it from scratch, but you know how to use Shiny Assistant and like, and like get an application going on top of your machine learning algorithm. Maybe you don't know a ton of machine learning, but you can lean on someone else and learn quite a bit on the team. And so then you end up within a couple of years, you know, you know, a little bit of data engineering, a little bit of ML engineering, a little bit of Shiny application development, and then like launching an API and doing automation. So I feel like every person has a strength and weakness too, so.
That's great to know because there's a question, there's an anonymous question in the Slido that says, is there a specific data science skill set that's needed to get into energy analysis? So time series, but like, is there anything else? It sounds like there's room for a lot of different skill sets. Yeah, yeah, there is. So like if you are just really good at SQL, I think there's opportunities. And then my team, you know, my team is leaning more towards machine learning and working with the business, but there's other data scientists in our company that work under IT and like handle other processes, you know. So I think if you want to get into energy, you know, I think it's kind of just like a little bit, maybe learn a little bit about simulation, a little bit about like Monte Carlo methods, that'll get you in the door, knowing some SQL. There's just so much, like so many different avenues to like get you into the door too, because if you have an economic background, you can start off as a real-time trader and then kind of like edge your way into a company that way too.
Convincing regulators and documenting models
My question, I've had experience trying to convince regulators to use machine learning models in situations that are very statistics heavy in the energy industry. And so my question is, how have you found convincing regulators and other, like, stats-oriented stakeholders to trust and believe your AI, fancy machine learning models that they can't really dive in and prove to themselves as being statistically valid? Or have you found some good ways to demonstrate that? Yeah, those are good questions.
I guess it depends if you're regulated or not too. And I guess are you trying to, like, convince an ISO? Or are you trying to, like, defend your method in front of a city council?
My experience has mostly been in verifying energy efficiency measurements, savings associated with energy efficiency measures. But I'm more interested in your experience, both talking with regulators and other folks that are maybe more less regulatory driven, but also, you know, have an interest in ensuring that your forecasts or predictions are accurate.
Yeah, yeah. I mean, it kind of depends. Like, for the most part, I haven't had too much, like, pushback for any of our methods. You know, we have developed white papers. So, like, that's something that I think, like, if you do have to defend yourself in front of, like, a city council or going to an ISO and trying to, like, submit your long-term load forecast and saying, hey, I deviated from your methodology a bit because I was trying to capture this. And then, like, trying to, like, have that presented in a white paper, I think that that goes a long way. And I'm telling you, like, white papers are really hard to do too, especially if you're, like, constantly changing your algorithm. So, using, you know, packaging everything and using version control goes a long way. And then, you know, making it so, like, the white papers automatically get built within pkgdown is really nice.
So, like, I've been going from, like, over just the last few years, you know, just going from, hey, I wrote this algorithm in a script. This is a really cool script. Here, take this script. And then, like, people start editing it and trying to change it for their one goal and purpose. And I was, like, oh, crap, now I have to manage all these scripts. And then, you know, going and then being, like, oh, well, maybe I should make this into a function. Eventually, we're just, like, we needed to package everything. So, I think just starting there first and just, like, defending everything with a white paper really goes a long way. So, I think that's probably the best way to counteract that.
And I wanted to ask kind of an overarching one or two for the last little bit here because we usually ask a career advice question. So, there was a question anonymously asked about, you know, are there any good examples of, like, open data in energy? Thomas had answered a couple of different options. I'm going to put all of those options in the chat here.
Okay. So, there was one other question that I am also curious about, which was, like, how are you keeping on top of the documentation for all of these models? Over 1,000 models is a lot. Is there any learning you could share from that experience to help other people keep on top of their documentation?
Yeah. Yeah. And I feel like at least for, like, a white paper standpoint, it's, like, having it for, like, kind of your overall process. And then for, like, detailed down to the model, try, like, a few different tricks, like model scorecards and stuff like that. Making something and I would say that this is constantly changing, too. So, like, if anyone else has ideas to share, like, that's totally cool. But, like, keeping, like, an ID for every single model and then having, like, all the information there that's, like, valuable to your team, but also maybe to the strategist, maybe to the modelers. So, like, what was the start and end date of your training data? Simple stuff like that helps. What was the out of sample score on your test data? Stuff like that. And then you can kind of save off model objects and come back to them if you need to.
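A model scorecard along these lines could be sketched as a small record keyed by model ID. The field names, the example values, and the in-memory `registry` are hypothetical choices for illustration; the transcript only names the ID, the training start and end dates, and the out-of-sample score as examples of what to track.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelScorecard:
    """One record per model; field names are illustrative assumptions."""
    model_id: str
    portfolio: str
    target: str                 # e.g. hourly load or hourly price
    train_start: date           # first date in the training data
    train_end: date             # last date in the training data
    out_of_sample_score: float  # score on held-out test data
    score_metric: str


registry = {}


def register(card):
    """Store a scorecard, refusing to silently overwrite an existing ID."""
    if card.model_id in registry:
        raise ValueError(f"model_id already registered: {card.model_id}")
    registry[card.model_id] = card


card = ModelScorecard(
    model_id="load-0001",
    portfolio="example-portfolio",
    target="hourly_load_mw",
    train_start=date(2019, 1, 1),
    train_end=date(2023, 12, 31),
    out_of_sample_score=0.93,
    score_metric="r_squared",
)
register(card)
```

Keeping the record immutable and keyed by ID makes it easy to look up what a saved model object was trained on when someone comes back to it later.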
And I know, like, some industries are way ahead, like, in health, like, care and stuff like that where they're using targets all the time. I know that that can be beneficial in some places, too, where you can also come back and see when the model was trained and if the new data wasn't added, then you don't need to retrain the model. So, you basically just need to build data catalogs for everything, which was kind of, like, eye opening, too, for me, too, because, like, I came at it, like, oh, we can handle this and then, like, have that technical debt and have that conversation with the team where, like, hey, no, we actually do need model scorecards and to be able to monitor the model health. We thought we could skip over it, but you'll come back to all these things later. Even, like, functionizing and putting every function into a library, like, that saves you so much time.
Yeah. I heard Josh had said in the chat, like, using Quarto websites to document the groups of models, that's such a great way to use Quarto that, like, I hadn't thought about yet. I don't know. I heard the same exact thing, but I know Julia Silge led a workflow demo on model cards. Is that similar? Oh, I don't know. It's, like, you know, that's a, we have our own, like, model card, but, like, yeah, using vetiver and pins is, like, another route that you can go. I'm also, like, signed up for this workshop for orbital. I might use that some places. I don't know. Do you all use Snowflake or no? No, we do not, but I still think there's, like, quite a bit we can learn from that whole concept. Yeah. That's what I've been trying to tell people. They're, like, oh, I don't use Snowflake. Maybe that's not going to apply to me. I'm, like, just go learn about the orbital part of it.
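The "no new data, no retrain" check can be sketched with a content fingerprint of the training data. This is not the API of the targets R package; it is a minimal Python illustration of the same idea, and the function names and the dictionary used as a fingerprint store are assumptions.

```python
import hashlib
import json


def data_fingerprint(rows):
    """Stable SHA-256 hash of the training rows (a list of JSON-serializable dicts)."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_retrain(model_id, rows, fingerprints):
    """True when the data differs from what the model was last trained on."""
    return fingerprints.get(model_id) != data_fingerprint(rows)


def record_training(model_id, rows, fingerprints):
    """Call after a successful training run to remember the data that was used."""
    fingerprints[model_id] = data_fingerprint(rows)


fingerprints = {}
rows = [{"hour": "2024-01-01T00:00", "load_mw": 512.0}]

first_check = needs_retrain("load-0001", rows, fingerprints)    # never trained yet
record_training("load-0001", rows, fingerprints)
second_check = needs_retrain("load-0001", rows, fingerprints)   # data unchanged

rows.append({"hour": "2024-01-01T01:00", "load_mw": 498.5})
third_check = needs_retrain("load-0001", rows, fingerprints)    # new hour arrived
```

With a couple of thousand models, skipping the ones whose inputs have not changed can save most of the retraining time on a given run.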
Handling missing data in time series
And Connor has a little asterisk saying, please ask this for me, so I am happy to. So, Connor Tompkins asks, how would you suggest handling missing data in time series forecasting? For example, if there is a blackout, that demand is censored. So, would you impute the missing data with another time series model and then fit a final model based on that, like, actual plus imputed data? That's a good question. Or do you impute at all? We do have some tools to impute, but I hardly ever use them. But the team could probably interject and be like, Frank, don't say that. I use it, like, every day.
No, there's, like, it's a two-sided issue where, you know, you have issues and zeros being, like, associated with energy data is, like, all the time, like, daylight savings time. It's very annoying. And then, you know, or daylight savings time is very annoying because it doesn't skip an hour, but, like, duplicates, like, just sums the total of the two hours, and you're just, like, you need to, like, smooth some spikes out of data. Like, EIA data is sometimes horrible in that regard. Like, it's just submitted to the EIA, and it's just, like, it's a free data source, so I might have used it a few times, but the amount of cleaning that you have to do is immense.
But, you know, like, for zeros, it kind of depends. Is it one hour? All of our models are hourly. So, I didn't really preface that, but, like, all of our models are pretty much hourly models. If it's one hour versus, like, two weeks, we use different imputation techniques. So, you know, if this is one hour, like, you can just kind of smooth it with the average of the last two on either side. That's pretty naive, but it's pretty much acceptable. But, I mean, if you're missing, like, a week or two, then you might need to come through and, like, impute that with a separate model altogether, and then depending on, yeah, if it's one day versus two weeks, how complicated of an algorithm you're going to use. Right, yeah, and there's also some pretty good papers on SMOTE, and maybe why it's not the greatest thing to use. So, those are things I would go look out for and read.
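The naive one-hour fix Frank describes can be sketched as below. This is an illustrative Python sketch, not ACES's tooling. The function name and the `window` and `max_gap` parameters are assumptions, as is the choice to represent a missing hour as `None`. Gaps longer than `max_gap` are deliberately left alone, since those would need a separate imputation model.

```python
from typing import List, Optional


def impute_short_gaps(
    values: List[Optional[float]], window: int = 2, max_gap: int = 1
) -> List[Optional[float]]:
    """Fill short runs of missing hourly values with the mean of their neighbors.

    A run of at most `max_gap` missing hours is replaced by the average of up
    to `window` observed values on each side. Longer runs are left as None.
    """
    out = list(values)
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and values[j] is None:
            j += 1
        if j - i <= max_gap:
            left = [v for v in values[max(0, i - window):i] if v is not None]
            right = [v for v in values[j:j + window] if v is not None]
            neighbors = left + right
            if neighbors:
                fill = sum(neighbors) / len(neighbors)
                for k in range(i, j):
                    out[k] = fill
        i = j
    return out


# Hypothetical hourly load in MW: one isolated gap and one three-hour gap.
load = [100.0, 102.0, None, 106.0, 108.0, None, None, None, 110.0]
filled = impute_short_gaps(load)
# The isolated gap becomes the mean of 100, 102, 106 and 108; the long gap stays None.
```

A suspicious zero would first have to be turned into `None`, and whether a zero is a real reading or a reporting artifact is its own judgment call.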
Long-term electricity prices and the future of energy data science
Do you see long-term electricity prices decreasing in the next 25 years due to the abundance of renewables, like wind and solar, in lower population areas? What a good question, Ross.
Yeah, this is something I've talked with many people from many different companies. I'm waiting to see who's right, but I think overall it's not a secret. I think everyone thinks it's trending downward, but there's also ancillary service prices, too. So, I skip over those a lot, but if wind's serving your load and then you have a lull, what solves for that lull in wind? It could just be for 30 minutes. So, you need to have some sort of antithetical response to that to keep the frequency of the grid constant. So, right now, that's normally natural gas. In the long term, you have wind and solar. You have so much solar. You have so much battery that you can start to flatten out prices. Prices should trend towards zero because the energy cost is zero, but then there's like the maintenance of the solar and the wind and the transmission grid that does cost money.
But I think when you start – I think what will happen is you start layering in and everything starts trending towards zero, but also gets more volatile, too. So, like, maybe prices are typically zero but could bounce around to $100. I don't know. I just pulled $100 out of my hat. So, any kind of – like, hey, we're typically averaging $30 now, standard deviation of one. Maybe it goes to zero and standard deviation of two or three. So, we'll see.
Career advice and what's exciting in energy data science
Yeah, career advice is, like, interesting. So, like, I feel like I've always led data scientist, mathematician, statistician kind of people. So, I typically lean introvert. So, I just say, like, get out of your chair throughout the day and, like, just go talk to people. Tell them what you're doing. Just, like, go and talk to an executive. You're going to feel so awkward and uncomfortable, probably. Just get up from your desk and go and, like, quit playing with data for five minutes. And then also, like, the other thing, too, is, like, you know, these machine learning algorithms. Like, I could spend, like, months trying to tune a model. Or I can just say, hey, it's good enough. So, like, understanding the business. The business is probably, like, I just need it now. And if it has a rate of 6%, let's just go. Because, you know, if it takes you three more weeks to get down to 2%, like, I don't need it by then. I need it today. So, like, understanding what your business threshold is for that accuracy and knowing how to communicate to them, too. So, like, business leaders like to use MAPE. We don't train our models with MAPE, but when I communicate to them, I say MAPE. And MAPE for anybody is M-A-P-E, mean absolute percentage error. So, that's, like, accuracy in a forecasting model. And sometimes it can be a little bit more accessible than some of the other things that people in the data science side might be using.
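For anyone who has not computed it, MAPE is the average of |actual - forecast| / |actual|, expressed as a percentage. The Python sketch below is a minimal illustration. The function name and the sample numbers are made up. Skipping hours where the actual value is zero is an assumption made here to avoid dividing by zero, not something Frank described.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Hours with an actual value of zero are skipped (an assumption of this
    sketch), because the percentage error is undefined there.
    """
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score against")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


# Hypothetical hourly demand (MW) and a forecast for the same hours.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(f"MAPE: {mape(actual, forecast):.1f}%")  # errors of 10%, 5% and 0% average to about 5%
```

The zero-handling caveat is one reason a team might train on a different loss and only report MAPE to stakeholders, as described above.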
And the last one to lead us out here, Zach asked a fantastic question. And I think it's a good one to talk about, like, what you're excited about. It is how do you see data science evolving within the energy industry? Is there stuff that you think that's happening that's cool and new? What are you excited about? Yeah. Yeah. I think it's, like, constantly evolving. Like, I think, like, 15 years ago, you wouldn't need to build those complex models to handle the simulations. And then 15 years from now, I have no idea what might occur in energy that might make it twice as hard, but we'll have twice as much compute. So, we'll see what happens. But, you know, like, also another crazy thing in energy right now is bringing data centers back to the United States to train artificial intelligence, which creates so much demand. And then it's like, okay, well, now I think, like, the next five, 10 years is, like, how do we build on enough supply to serve this demand? So, it's constantly changing. There's always new risks and all these new problems to be solved.
I think Thomas brought up energy efficiency. Like, that's a huge problem in itself. It goes all the way down to a house level and trying to simulate, like, every single appliance in someone's house and, like, understanding how that changes over time. So, that's great.
Yeah. This is a good follow-on to this. If anyone did not attend Sajay Suresh's episode for Microsoft, Sajay is a Senior Director of Applied Science and Data at Microsoft, but Sajay works on forecasting demand for these data centers, and that's a really complicated and hard thing to do. His episode was fantastic, so go watch that one. And then I wanted to give Rachel a chance to ask a question and also mention really quickly that Frank also does open source package creation, and, I think, kuzco just came out. I love the name kuzco because I love the Emperor's New Groove. I don't know if that's why it's called that, but it's a package for R to help with computer vision, right? Yeah, yeah, yeah, and I don't know if Isabella's here, but thanks for the mug.
Rachel, what did you want to ask? Oh, I was just curious, Frank, because you mentioned a few different packages, like tidymodels and targets, and you heard about the orbital package too. Like, how do you keep up to date on new packages? Especially for me working at Posit, it's always really helpful for me to hear, like, how people find out about things. Yeah, I don't know if everyone wants to know that I stay up all night just, like, surfing all of Posit's websites and keeping track. No, like, you know, like, sometimes now that I'm more familiar with GitHub, sometimes I, like, will peek at one of the packages and see if it has any merges lately, and I can kind of get ahead of, like, a CRAN release, which is kind of fun when you're on to something like that, like mirai with purrr. It was something that my team and I were, like, watching for weeks, and I was, like, is it going to get posted to CRAN this week or next week? Like, when's it going to happen?
You know, so, like, I'm on Bluesky. I'm constantly watching any new releases from, like, Simon. I follow him, I think, everywhere. Simon's on the tidymodels team, but also leads, like, tons of LLM stuff, so. I just saw Simon's here, too. Simon's in the chat. That's awesome. Oh, weird. I've never actually talked to Simon face-to-face. Simon, I guess we have to unmute you.
Yeah, so, like, just following a few of the software engineers across Posit is a great resource. Just following the boots on the ground, going straight to the grassroots, and just, like, following the people that are building. I think that's where I get most of my information.
I love it. Say their names, and they appear. You never know who's out there in the Hangout, right? We have over 100 people. Usually, we have about 150 people lately, so somebody might be hanging out and listening. Well, I want to wrap up and let everybody know who is coming next week, because I am so excited to have Jenny Bryan next week. Get your Laptop on Fire merch ready, and come hang out with Jenny and everybody, and we'll talk about all kinds of good things. Get your Posit Conf registration in. There is still time to book tickets, to book flights, to book hotel rooms. If you cannot attend in person, come hang out with me on the Discord and attend virtually. I promise it's still going to be amazing and fun and a great place to connect. And if you think that the chat was super valuable today, you can go save it. Click the little three dots in the top right of your chat and go to Save Chat. That will let you keep all of your resources. But when these videos end up on YouTube, I do try to get all those resources in the description for you as well. So if there's something you've missed in the past, you can go find it. Thank you, Frank, for hanging out with us today. Thank you, Rachel and Isabella, for helping behind the scenes. Thank you so much, Frank. Bye, everybody.