Let's Import Free, High-Quality Datasets into your Python IDE (using Positron and PydyTuesday)

Let's download data. Learn how to install a dataset using the pydytuesday package and do some basic visualization of the data using Positron's data viewers.

We hope you join us in participating in PydyTuesday! Don't forget to use the hashtags #TidyTuesday and #PydyTuesday wherever you like to hang out online (Bluesky, Mastodon, LinkedIn, etc.). Have fun out there! We can't wait to see the predictive models, visualizations, dashboards, and data apps that you create.

Resources and repos to star:

TidyTuesday GitHub Repo: https://github.com/rfordatascience/ti...

Posit PydyTuesday GitHub Repo: https://github.com/posit-dev/python-t...

TidyTuesday hashtag search on Bluesky: https://bsky.app/search?q=tidytuesday

Other videos in this PydyTuesday playlist: PydyTuesday | Python How-to Videos

#pythoncontent

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

With your Python environment set up and necessary libraries installed, you're now poised to kick off your actual Python project.

Most data science projects begin with data, so in this segment we'll demonstrate how to effortlessly download a dataset using the PydyTuesday Python library.

The datasets we will focus on reside on the PydyTuesday GitHub repository. This repository is an incredible resource for exploratory data science, with a brand new dataset added every week.

For our example, let's grab a dataset posted in May of 2025 that dives into water quality at Sydney beaches.

Downloading the dataset

When you click on the dataset name, you'll see a Python code snippet with precise instructions for using the PydyTuesday library to download it. Let's simply copy this snippet and paste it into a new Python script in Positron, which we'll name WaterQuality.py.

In addition to the PydyTuesday library, we'll also need to import the Pandas library. This will allow us to read our downloaded dataset directly into our Python environment.

One of Positron's great features is the ability to run your Python code line by line. Simply place your cursor on a line and repeatedly press Command-Enter (on macOS) or Control-Enter (on Windows and Linux).

After importing both the PydyTuesday and Pandas libraries, we'll run the code on line 5. This will download the Sydney water quality data and its associated files directly into your project's working directory.
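The download step described above might look roughly like the following sketch. It assumes the pydytuesday package is installed and exposes a get_date function that takes the dataset's week as an ISO date string, as in the snippet shown on the dataset page. The download_week helper is a hypothetical wrapper added here for illustration and is not part of the library or the video.

```python
from datetime import date


def download_week(week: str) -> str:
    """Download one week's dataset files into the current working directory.

    `week` is an ISO date string such as "2025-05-20" (assumed format).
    Returns the validated date string.
    """
    # Validate first so a typo fails fast, before any network access.
    parsed = date.fromisoformat(week)  # raises ValueError on a malformed date

    # Imported lazily so the helper can be defined even where the package
    # is not installed. Assumption: pydytuesday provides get_date(date_str).
    import pydytuesday

    pydytuesday.get_date(parsed.isoformat())
    return parsed.isoformat()
```

Calling download_week("2025-05-20") should then place that week's CSV files and accompanying metadata in the working directory, assuming that date matches the Sydney beaches dataset; check the dataset page for the exact value.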

Loading and exploring the data

Now that the data is in our project, let's use the read_csv function from the Pandas library to load it as a data frame and assign it to a variable we'll call WaterQuality. And just like that, you'll see the WaterQuality variable appear in your variables pane.
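As a minimal sketch of the loading step, the example below reads CSV data into a data frame with pandas. So that it runs without the downloaded file, it reads from a small in-memory stand-in; the column names and values are invented for illustration and are not those of the real dataset. With the real data you would pass the path of the downloaded CSV file instead.

```python
import io

import pandas as pd


def load_dataset(source):
    """Read a CSV file (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Stand-in for the downloaded file. These columns are hypothetical.
sample_csv = io.StringIO(
    "site,date,bacteria_count\n"
    "Beach A,2025-01-01,12\n"
    "Beach B,2025-01-01,30\n"
    "Beach A,2025-01-02,8\n"
)

water_quality = load_dataset(sample_csv)

print(water_quality.shape)   # (rows, columns)
print(water_quality.head())  # first few rows
```

Once the assignment runs, the water_quality variable is the object that would show up in the variables pane.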

With Positron, you can inspect this data instantly. Just click the data frame icon, which will bring up the data visualizer. Here you can explore the raw data, view various summary statistics for each column, and get a quick overview of your dataset.
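The same kind of per-column overview can also be approximated in code, which is handy if you want it in a script or report. The sketch below uses a tiny made-up data frame (hypothetical columns and values) and a hypothetical summarize helper; it is not how Positron computes its summaries.

```python
import pandas as pd

# Hypothetical stand-in data; the real dataset has its own columns.
df = pd.DataFrame(
    {
        "site": ["Beach A", "Beach B", "Beach A", "Beach C"],
        "bacteria_count": [12.0, 30.0, None, 8.0],
    }
)


def summarize(frame):
    """Per-column summary: data type, missing values, distinct values."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing": frame.isna().sum(),
            "unique": frame.nunique(),  # missing values are not counted
        }
    )


summary = summarize(df)
print(summary)
print(df.describe())  # count, mean, std, and quartiles for numeric columns
```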

Now that you've successfully read in your WaterQuality data, you are ready to start analyzing. Whether you want to create stunning visualizations, build insightful tables, or try out that new modeling technique you've been meaning to learn, the sky is truly the limit. Just remember to have fun and enjoy every step of your Python learning journey.


Featured software