
Making Ecological Modelling Accessible with EcoCommons (Jenna Wraith, QCIF) | posit::conf(2025)

Breaking Barriers: Making Ecological Modelling Accessible with EcoCommons

Speaker(s): Jenna Wraith

Abstract: Would koalas prefer to live on the East Coast or West Coast of the US? Answering complex questions like this requires robust and scalable technology. Enter EcoCommons, a cutting-edge platform built on an R package that's scientifically rigorous and designed for scale. EcoCommons efficiently runs millions of models to help researchers, governments, and NGOs make data-driven decisions. Whether it's predicting habitat loss, optimising conservation strategies, or advising policy, this platform turns R into a powerhouse for real-world impact, all while building a thriving community of practice. In this talk, we'll explore the challenges of running R at scale, lessons learned, and how our community is shaping the future of modelling.

posit::conf(2025)


Transcript

This transcript was generated automatically and may contain errors.

Hi everyone, thank you so much for having me here today. I'm so excited to be part of posit::conf this year, and I really wish I could be there with you in person.

My name is Dr Jenna Wraith. I'm the head of the Sustainable Futures department at QCIF, a not-for-profit organisation focused on building digital infrastructure for researchers, and I lead the EcoCommons project. Today I'm going to talk to you about how we make modelling accessible and the barriers we've faced along the way, so that you can hopefully address some of them in your own work.

But before I dive in, it's customary in Australia to first acknowledge the traditional custodians of the land on which we're meeting today. For me, that's the Wiradjuri people, and I'd like to pay my respects to elders past, present and emerging, and extend that respect to all traditional custodians of the lands on which we're meeting today.

The USB stick that started it all

My role in the EcoCommons project started with something as ordinary as a USB stick. This was about eight years ago. I was a PhD candidate in conservation science, juggling a collaboration between the government, the university and the department of e-research. One day I needed a data set from my government collaborators, but it was too big, too inconsistent, too messy and too sensitive to be shared online. So one of my colleagues copied it onto a USB stick, put it in an envelope and mailed it to me. About three weeks later, it arrived in the post.

I was able to access the data, but then I had to deliver that same USB stick to the department of e-research. When I delivered it, we had a good laugh: here we were in the 21st century, solving biodiversity problems using snail mail. But it was also a light-bulb moment for me. That little USB stick turned out to be a really big turning point. It landed me a new role piloting a platform across all states and territories in Australia, and that work eventually led to a five million dollar investment from the Australian Research Data Commons and the Australian government in what's now EcoCommons.

So when I say EcoCommons is about accessibility, it's not just theory. It comes from lived experience, from starting my career as a human courier service for biodiversity data. That USB moment showed me just how many barriers are in the way: messy data, clunky sharing and so much wasted effort.


What EcoCommons is

So the question became: what if we could build something better? What if there was one place where data, tools and compute power all came together, and anyone could use it? That's what EcoCommons is: a free, open-source, cloud-based platform that removes barriers to ecological modelling. The platform is built in Python, and all of our models and algorithms are built on R packages. It brings together trusted biodiversity and environmental data sets, integrates tools for analysis and modelling, and makes them available in two key ways.

First, as point-and-click workflows for people who aren't coders, and second, as R and Python notebooks for those who really love code. This means a student, a government scientist and a conservation NGO can all access the same data sets and the same models, and get reproducible results. That really helps to level the playing field.

Key features of the platform

So let's walk through some of the key features. First, EcoCommons brings together thousands of data sets in one place, and they're not just raw files: they've been curated and standardised so they're ready to use, which alone saves weeks of work. Second, we make data wrangling easier. We've built tools to harmonise formats and manage metadata, so you don't have to spend your time cleaning data. Third, we've embedded trusted models and algorithms, ones that are widely used and validated by the scientific community.
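The talk doesn't show how EcoCommons harmonises formats. As a minimal sketch, assuming two hypothetical occurrence sources with different field names (`source_a` uses Darwin Core-style terms such as `decimalLatitude`; `source_b` is invented), the Python below maps records to one schema, drops records with missing or out-of-range coordinates, and removes duplicates. All names here are illustrative, not the EcoCommons API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Occurrence:
    species: str
    latitude: float
    longitude: float
    source: str


# Hypothetical field mappings: source_a uses Darwin Core-style names,
# source_b is an invented schema standing in for a second repository.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "source_a": {"species": "scientificName", "lat": "decimalLatitude", "lon": "decimalLongitude"},
    "source_b": {"species": "taxon", "lat": "lat", "lon": "lon"},
}


def harmonise(record: dict, source: str) -> Optional[Occurrence]:
    """Map one raw record to the common schema, or return None if it is unusable."""
    fields = FIELD_MAPS[source]
    try:
        lat = float(record[fields["lat"]])
        lon = float(record[fields["lon"]])
    except (KeyError, TypeError, ValueError):
        return None  # missing or unparseable coordinates
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # coordinates outside valid ranges
    species = str(record.get(fields["species"], "")).strip()
    if not species:
        return None  # no species name to attach the record to
    return Occurrence(species, lat, lon, source)


def harmonise_all(records: Iterable[Tuple[dict, str]]) -> List[Occurrence]:
    """Harmonise (record, source) pairs, keeping the first copy of duplicate points."""
    seen = set()
    cleaned: List[Occurrence] = []
    for record, source in records:
        occ = harmonise(record, source)
        if occ is None:
            continue
        key = (occ.species, round(occ.latitude, 5), round(occ.longitude, 5))
        if key not in seen:
            seen.add(key)
            cleaned.append(occ)
    return cleaned
```

With this design, the same koala sighting reported by both sources collapses to a single record, while a row with an unparseable latitude is silently dropped.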

For our advanced users, we provide Jupyter notebooks so you can customise workflows or run experiments at scale. All of this runs in the cloud, so you're not limited to your own machine. I've seen many a laptop start to crash running just one species distribution model; in EcoCommons, we can run millions. Finally, everything follows the FAIR data principles: our data and workflows are findable, accessible, interoperable and reusable, which helps ensure that results are high quality, transparent and shareable.

Koala habitat modelling demo

Okay, so here's a light-hearted example to showcase the platform. We asked the question: if koalas lived in the United States, would they prefer the East Coast or the West Coast? Let's dive in with a demonstration. This is the EcoCommons platform, where you can access data sets, view your results and run new analyses. We're going to run our species distribution models using the Biodiversity and Climate Change Virtual Laboratory, where you can run species distribution models and climate change projections. Today we're going to use a Posit koala example.

The next step is to select your species occurrence records. These are simply records of where we know koalas already live here in Australia. We have access to different biodiversity data repositories through APIs, so a user can simply search for the species and then visualise the records on the map. The next step is to select absence data. Absence data are simply locations where we know koalas don't occur. We generate these in the back end of the system, using different methods depending on user needs, and this really helps improve the model.
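The talk says absence data are generated in the back end using different methods depending on user needs, without naming them. One common approach in species distribution modelling is random pseudo-absence (background) sampling. The sketch below is an illustrative assumption, not the EcoCommons method: it draws seeded random points inside a bounding box while avoiding grid cells that already contain presence records. The 0.5-degree cell size and the function names are hypothetical.

```python
import math
import random
from typing import List, Set, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees
Extent = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def grid_cell(p: Point, cell_size: float) -> Tuple[int, int]:
    """Index of the square grid cell that contains point p."""
    return (math.floor(p[0] / cell_size), math.floor(p[1] / cell_size))


def pseudo_absences(
    presences: List[Point],
    extent: Extent,
    n: int,
    cell_size: float = 0.5,
    seed: int = 42,
    max_tries: int = 100_000,
) -> List[Point]:
    """Draw up to n random points in the extent that avoid cells with presence records."""
    min_lon, min_lat, max_lon, max_lat = extent
    occupied: Set[Tuple[int, int]] = {grid_cell(p, cell_size) for p in presences}
    rng = random.Random(seed)  # fixed seed, so the same inputs give the same draws
    points: List[Point] = []
    for _ in range(max_tries):
        if len(points) >= n:
            break
        candidate = (rng.uniform(min_lon, max_lon), rng.uniform(min_lat, max_lat))
        if grid_cell(candidate, cell_size) not in occupied:
            points.append(candidate)
    return points
```

Fixing the seed is what keeps the generated absences, and therefore the fitted model, reproducible between runs; other strategies (for example, sampling only beyond a distance buffer from presences) would slot into the same function.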

The next step is to select the climate and environmental data. EcoCommons has access to thousands of data sets in one place, and today we're going to use a global climate data set. Here we select the layers within the data set that we want to use in our model; we're selecting a few that we know are important for koalas. You can visualise these on the map: we can see Australia, and we can zoom out to see what it looks like across the globe. The next step is to select the study extent, and we're going to use a global extent for this model today.
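Selecting layers and a study extent effectively defines which environmental values each model sees. As a toy illustration (not EcoCommons code), the sketch below represents each layer as a small in-memory grid, keeps only points inside the chosen extent, and looks up every layer's value at each point. Real platforms read georeferenced rasters; the `Grid` class, layer names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Extent = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class Grid:
    """Toy raster layer: values[row][col], with row 0 along the northern edge."""
    values: List[List[float]]
    west: float   # longitude of the left edge
    north: float  # latitude of the top edge
    res: float    # cell size in degrees

    def value_at(self, lon: float, lat: float) -> Optional[float]:
        col = int((lon - self.west) // self.res)
        row = int((self.north - lat) // self.res)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[0]):
            return self.values[row][col]
        return None  # point falls outside the layer


def in_extent(lon: float, lat: float, extent: Extent) -> bool:
    min_lon, min_lat, max_lon, max_lat = extent
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def extract(points: List[Tuple[float, float]], layers: Dict[str, Grid], extent: Extent) -> List[Dict[str, float]]:
    """Return one {layer: value} row per point inside the extent with complete data."""
    rows = []
    for lon, lat in points:
        if not in_extent(lon, lat, extent):
            continue  # outside the study extent
        values = {name: layer.value_at(lon, lat) for name, layer in layers.items()}
        if all(v is not None for v in values.values()):
            rows.append(values)
    return rows
```

A global extent, as in the demo, lets through every point that has data; a smaller, ecologically targeted extent simply filters more points out before modelling.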

Next, we select our algorithms. We have access to around 20 different algorithms, all built on R packages, and a user can change all of the settings they need, just as if they were running the R package themselves. Then we hit Start Experiment, go grab a coffee and wait for our results. Now we can see that our job has been submitted, and then we can view our results here.
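The talk doesn't list the roughly 20 algorithms. One long-established species distribution modelling algorithm is BIOCLIM, a climate envelope method; the sketch below is a simplified, pure-Python take on that idea, not the R implementation EcoCommons wraps. Each variable is scored by where a site's value falls within the distribution of values at presence sites, and the site score is the minimum across variables. The midrank percentile and the `Bioclim` class are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


class Bioclim:
    """Simplified BIOCLIM-style climate envelope (illustrative only)."""

    def fit(self, presence_rows: List[Dict[str, float]]) -> "Bioclim":
        # Store the sorted values of each environmental variable at presence sites.
        variables = presence_rows[0].keys()
        self.training = {v: sorted(row[v] for row in presence_rows) for v in variables}
        return self

    def predict(self, row: Dict[str, float]) -> float:
        scores = []
        for var, values in self.training.items():
            x = row[var]
            # Midrank percentile of x among the training values (0 to 1).
            p = (bisect_left(values, x) + bisect_right(values, x)) / (2 * len(values))
            # 1.0 at the median, shrinking toward the edges of the observed
            # range, and 0.0 outside it.
            scores.append(2 * min(p, 1 - p))
        # A site is only as suitable as its most limiting variable.
        return min(scores)
```

Taking the minimum means a site that is ideal for rainfall but far outside the observed temperature range still scores zero, which is the core assumption of envelope models.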

So let's see what it looks like for koalas in Australia. We can see a really nice distribution. The darker the colour (in this case, orange), the more likely it is that the area contains suitable habitat for the species. So what does it look like for the States? If we zoom out, we can see some really interesting results for the United States. Let's dive in a little deeper.

And here are the results. The model suggests that if eucalyptus trees and the appropriate habitat were available, koalas could potentially thrive in some interesting places in the United States. We can see predicted distribution around Seattle, near Austin in Texas, and across the Southeast, particularly Georgia and Florida.

Of course, we're not seriously considering exporting koalas to the US. While it's fun to see where koalas might like to live in the United States, this example demonstrates something much bigger: the ability to ask complex ecological questions and get clear, reproducible answers. That matters because in the real world we're facing urgent biodiversity challenges, and these same tools are being used to guide conservation, land management, biodiversity and climate planning.

Why this work matters

Which brings me to my next point: why this work matters. Globally, biodiversity is declining at a pace we've never seen before. Here in Australia, we have one of the highest rates of mammal extinction in the world, and climate change, invasive species and habitat loss are reshaping ecosystems faster than we can keep up. To respond effectively, we need trusted models and data. We need to know where species are now, how they're likely to shift in the future, and which management actions will make the biggest difference.

And that's the role EcoCommons plays: turning complex data into the evidence we need for data-driven decisions. But getting to this point wasn't simple. Before EcoCommons, barriers made this kind of modelling almost impossible.

Data was scattered across institutions, often in inconsistent formats, and sometimes only accessible if you knew the right person. Running models required high-performance computing, which many ecologists simply didn't have access to, so work stalled. Even if you had the compute, you needed advanced coding skills, which excluded some really brilliant practitioners who just hadn't been trained in programming. On top of that, you needed modelling expertise. And even then, workflows were often black boxes: two people could run what looked like the same model and end up with very different results. All of this meant that biodiversity modelling could be slow, fragmented and often inaccessible.

How EcoCommons breaks down barriers

EcoCommons solves this first barrier with accessible data. We integrate trusted biodiversity data and environmental data sets into the platform. They're standardised and curated, so they're ready to use. No more USBs, no more hunting down files across multiple institutions. This makes modelling faster, but also makes it fairer. Everyone works from the same reliable data foundation.

EcoCommons runs in the cloud, so a lot of the heavy lifting happens behind the scenes. You don't need to own or rent a supercomputer. Whether you're running a small student project with a few models or a government project with millions of models, the platform scales to fit. This opens the door for non-government organisations, not-for-profits, smaller research groups and anyone who doesn't have expensive infrastructure.
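How EcoCommons schedules work in the cloud isn't described in the talk. The property that makes very large model counts feasible is that each run (say, one species, algorithm and climate scenario) is independent, so runs can be fanned out in parallel. The sketch below shows that pattern with Python's `concurrent.futures`; `run_model` is a hypothetical placeholder, and a production platform would more likely submit jobs to a batch queue or container scheduler than use an in-process thread pool.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from itertools import product
from typing import Dict, List


def run_model(species: str, algorithm: str, scenario: str) -> Dict[str, str]:
    """Hypothetical placeholder for one independent model run."""
    return {"species": species, "algorithm": algorithm, "scenario": scenario, "status": "done"}


def run_all(species: List[str], algorithms: List[str], scenarios: List[str], max_workers: int = 8) -> List[Dict[str, str]]:
    # Every (species, algorithm, scenario) combination is an independent job.
    jobs = list(product(species, algorithms, scenarios))
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_model, *job) for job in jobs]
        for future in as_completed(futures):
            results.append(future.result())  # re-raises any error from the job
    # Completion order varies, so sort for a stable, reproducible listing.
    return sorted(results, key=lambda r: (r["species"], r["algorithm"], r["scenario"]))
```

With 2 species, 3 algorithms and 2 scenarios this submits 12 jobs; because the count grows multiplicatively, adding species or scenarios makes the number of models climb quickly.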

Skills are another big barrier. EcoCommons provides point-and-click workflows for people who are new to modelling: they can select data sets, choose algorithms and run analyses through an intuitive interface. At the same time, coders can work in R and Python directly, so they're not limited by the interface; we do this through our library of Jupyter notebooks. We also run a lot of training and provide documentation to help people build their confidence and skills over time. The result is that instead of locking people out of modelling, we're helping to bring them in.

And finally, methods. Too often, ecological modelling, and modelling as a whole, has been inconsistent or opaque. You don't always know what steps have been taken, and you can't always reproduce someone else's results. So we've made our workflows open, transparent and reproducible. Every step is documented, and the methods can be shared and modified. This transparency builds trust and also helps accelerate science, because results can be repeated, compared and improved.
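The talk doesn't specify how EcoCommons records its workflow steps. One common way to make a run repeatable is to store a canonical record of its inputs, parameters and random seed alongside a content hash, so two runs with the same configuration get the same identifier. The sketch below assumes that design; the field names and dataset IDs are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List


def fingerprint(obj: Any) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def provenance_record(dataset_ids: List[str], algorithm: str, params: Dict[str, Any], seed: int) -> Dict[str, Any]:
    """Bundle everything needed to rerun a model, plus a deterministic run ID."""
    config = {
        "datasets": sorted(dataset_ids),  # selection order shouldn't change the ID
        "algorithm": algorithm,
        "params": params,
        "seed": seed,
    }
    return {"config": config, "run_id": fingerprint(config)}
```

Sharing a record like this with the results lets someone else rerun, compare or tweak a model, and a changed parameter shows up immediately as a different run ID.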

So, what does all this add up to? EcoCommons is helping conservation groups plan on-ground actions, it's helping governments assess risk and make policy decisions, it's supporting NGOs and researchers who want to test new ideas and scale up models. And beyond individual projects, it's building a community of practice. People aren't just using EcoCommons in isolation, they're sharing their workflows, contributing data sets, and really learning from one another.

Real-world impact: the bristlebird

I want to ground this impact in a real story, using the northern eastern bristlebird as an example. There are fewer than 40 of these birds left in the wild here in Australia, and they hide in really dense patches of habitat across southeast Queensland and northeast New South Wales. EcoCommons is helping BirdLife Australia model the bird's habitat and highlight likely sites. This has helped their field teams focus surveys and deploy eco-acoustic recorders efficiently, saving time and resources. And the results are already really exciting: we have promising new sites, and we're directly informing monitoring and recovery strategies. That shows what EcoCommons is really about: practical, targeted impact for conservation.


Lessons learned and community

We've learned a lot building EcoCommons. Scaling our workflows to run millions of models was definitely technically challenging: we had to rethink performance, reproducibility and how we handle large data sets, and we're continually improving on this. We've had to balance the needs of very different user groups, from advanced coders who want full flexibility to newcomers to the field who just want something intuitive. And we've learned the importance of community feedback. Many of our improvements come directly from users telling us what worked, what didn't, and what they needed next.

EcoCommons isn't just a platform; it's a community of practice. We're building it together with researchers, government agencies, NGOs and students, which means sharing tools and workflows, providing training and support, and learning from each other's experiences. The strength of a platform like this comes not only from the technology, which is great, but above all from the people. This community approach is what makes our platform sustainable and impactful in the long term.

So, let's come back to that USB in the post. We've come a really long way since mailing data around. And EcoCommons is about breaking down these barriers, so data, tools, and conservation impact are accessible to everyone. I'd really like to invite you to try it out, contribute data sets or workflows, and join our community. Because when accessibility is built in from the beginning, we can move faster, and we can do better science for the planet.

If you're interested in learning a bit more about EcoCommons, please check out one of our papers that explains our methods, and also has some really nice case studies that you can look at.

And of course, building a platform like this isn't possible without an incredible community of partners and collaborators, so I'd like to take a moment to thank all of these people for their help.

And thank you again for having me here today. Again, I wish I could be there in person with you, but I'll be online to answer questions. And I encourage you to sign up for our newsletter or check out our GitHub page and stay in touch. Thanks so much.

Q&A

Can you hear us okay? Good. It's a brisk 3 a.m. here in Australia today. I was gonna say, happy good morning. Right. So, we have time for a question or two. Here's one. How much time and effort did it take to build this system and to get all the data?

Yeah, that's a great question. It took years. We had a first iteration that was up and running for a couple of years, and then we did a full overhaul, which took three years. And to be honest, we're still at it every day: we've got a team of developers working on bugs, new tech stacks and new functionality. So I think we'll be working on it forever at this point.

Excellent. One more question. When you were doing your analysis, what was the extent in the system? You chose the global extent. What does that do?

Yeah. That just means it projects your model everywhere across the globe, and because we used a global climate layer, there was data for everywhere. But normally we would select an extent based on known habitat requirements; that's what we'd usually do, and it's often quite small and targeted based on ecological needs.

All right. Well, thank you for going to heroic lengths to answer our questions, and let's give her one more round of applause.