How to Scrape Free Stuff From Reddit

Created on Jul 24, 2021
Updated on Apr 14, 2022

Finding free stuff on the web is fun, especially when you can scrape the data straight from the command line.

In this tutorial, you’ll use your terminal to fetch data from Reddit, specifically free stuff from subreddits like r/Udemy, where Reddit users post the latest coupons. The script is customizable, so you can apply it to any other subreddit.

Minimal example

Let’s take it step by step and first see how to scrape Reddit. Do you really need API credentials to get this information? In fact, you just need to know the endpoint you want and then use a command-line utility like curl or wget to fetch the data:

curl -sA 'udemy subreddit scraper' 'https://www.reddit.com/r/udemy/top.json?t=month'

That command returns the top posts from the udemy subreddit over the last month as JSON; the t=month query parameter also accepts hour, day, week, year, and all.

curl is used here with these options:

-s runs curl in silent mode, hiding the progress meter and error output.
-A sets the User-Agent header (here 'udemy subreddit scraper'); Reddit may throttle requests that keep curl’s default user agent.

Getting the titles of subreddit posts

If you want to see the hierarchy of that JSON document, pipe the output into the jq program. To get the titles of those top Reddit posts, read the value of the title key, which lives inside each post’s data object; those data objects are the elements of the children array nested under the top-level data key.
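Roughly, the response is shaped like this (trimmed down to the keys this tutorial uses; the real payload carries many more fields):

```json
{
  "data": {
    "children": [
      { "data": { "title": "...", "url": "..." } },
      { "data": { "title": "...", "url": "..." } }
    ]
  }
}
```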

curl -sA 'udemy subreddit scraper' 'https://www.reddit.com/r/udemy/top.json?t=month' | jq '.data.children[].data.title'

Getting free stuff

Let’s see how to filter titles on the keyword Free. This assumes a course is free only if the word Free appears explicitly in the title:

$ curl -sA 'udemy subreddit scraper' 'https://www.reddit.com/r/udemy/top.json?t=month' | jq '.data.children[].data | select(.title|test("Free")).title'
"List of 40+ Free & Some Best Selling Discounted Tuesday, June 22, 2021"
"Free Udemy Course - 4 July 2021"

Here, we used select to filter each child on a specific value: the title is piped into the test function, which matches it against a regex containing the word Free.

But this still has a problem: we don’t get any results where the title uses FREE (capitalized) or any other variation of the word. For example, free , FRee , FREe , and FREE aren’t matched by that command. We can fix that by adding the case-insensitive flag "i" as the second argument to the test function:

$ curl -sA 'udemy subreddit scraper' 'https://www.reddit.com/r/udemy/top.json?t=month' | jq '.data.children[].data | select(.title|test("free"; "i")).title'

"18 FREE Programming courses on Udemy! 3 days only!"
"List of 40+ Free & Some Best Selling Discounted Tuesday, June 22, 2021"
"Free Udemy Course - 4 July 2021"
"[FREE] Video Editing Courses - Adobe Premiere, After Effects, Davinci Resolve, Photoshop"

Now we get more results: titles containing the keyword free, in any capitalization, are all matched.

Now, let’s customize it and put it in a bash script:

#!/bin/bash
SUBREDDIT="$1"
curl -sA 'subreddit reader' \
  "https://www.reddit.com/r/${SUBREDDIT}/top.json?t=month" \
  | jq '.data.children[].data | select(.title|test("free"; "i")).title'

We now pass the subreddit name as an argument. Save the script as e.g. top_free.sh, then give it the name of the subreddit you want to scrape free stuff from:

$ chmod u+x top_free.sh # make the script executable
$ ./top_free.sh udemy   # replace udemy with any subreddit you want

"18 FREE Programming courses on Udemy! 3 days only!"
"List of 40+ Free & Some Best Selling Discounted Tuesday, June 22, 2021"
"Free Udemy Course - 4 July 2021"
"[FREE] Video Editing Courses - Adobe Premiere, After Effects, Davinci Resolve, Photoshop"

More customized command line

Let’s add even more customization to the bash script, so you can pass whatever query keyword you desire instead of ‘free’:

#!/bin/bash
SUBREDDIT="$1"
QUERY="${2:-free}"
# --arg passes the query into jq as a variable, avoiding quoting issues
curl -sA 'subreddit reader' \
  "https://www.reddit.com/r/${SUBREDDIT}/top.json?t=month" \
  | jq --arg q "$QUERY" '.data.children[].data | select(.title|test($q; "i")) | .title'

The parameter expansion ${2:-free} means the query is taken from the second argument; if that argument is unset or empty, the query defaults to the word free .
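To see that default-value expansion in isolation, here is a tiny self-contained sketch (the show_query function is just an illustration, not part of the script):

```shell
#!/bin/bash
# Demonstrates the ${parameter:-default} expansion used in the script above.
show_query() {
  local query="${2:-free}"   # use $2 if set and non-empty, otherwise "free"
  echo "$query"
}

show_query udemy courses   # prints: courses
show_query udemy           # prints: free
```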

Let’s save this bash script as top_stuff.sh and look for titles that contain the keyword courses :

$ chmod u+x top_stuff.sh
$ ./top_stuff.sh udemy courses

"18 FREE Programming courses on Udemy! 3 days only!"
"13 Udemy (100% off Coupons) Programming Courses [Limited Time]"
"Udemy 15 (100% off Coupons) Programming Courses [Limited Time]"
"A tendency in coding courses I find really annoying"
"[FREE] Video Editing Courses - Adobe Premiere, After Effects, Davinci Resolve, Photoshop"

Getting both titles and URLs

Now that we have the desired titles, why don’t we also return each post’s URL, so we can click through to explore the post and its comments?

#!/bin/bash
SUBREDDIT="$1"
QUERY="${2:-free}"
# --arg passes the query into jq as a variable, avoiding quoting issues
curl -sA 'subreddit reader' \
  "https://www.reddit.com/r/${SUBREDDIT}/top.json?t=month" \
  | jq --arg q "$QUERY" '.data.children[].data | select(.title|test($q; "i")) | {title, url} | .[]'

Here, we emit both the title and url of each child using jq’s object construction syntax ({title, url}, described in the jq documentation); the final .[] iterates over the object’s values so the title and URL each print on their own line.

When we run that command again, we get:

"18 FREE Programming courses on Udemy! 3 days only!"
"https://www.reddit.com/r/Udemy/comments/ogxrrp/18_free_programming_courses_on_udemy_3_days_only/"
"Free Udemy Course - 4 July 2021"
"https://www.reddit.com/r/Udemy/comments/od89ex/free_udemy_course_4_july_2021/"
"Free Unity + AWS DynamoDB Course! 3 days only!"
"https://www.reddit.com/r/Udemy/comments/oq5iog/free_unity_aws_dynamodb_course_3_days_only/"
"[FREE] Video Editing Courses - Adobe Premiere, After Effects, Davinci Resolve, Photoshop"
"https://www.reddit.com/r/Udemy/comments/ohd0at/free_video_editing_courses_adobe_premiere_after/"

Final thoughts

In this tutorial, we’ve seen how to scrape the top monthly free stuff from any subreddit, retrieving each post’s title and URL so you can view the post and participate in the community if you want.

We also generalized the script to search for any keyword you want in the subreddit of your choice.

Please let me know if you have any further questions, and if you want more scraping posts, just comment below!

Enjoy!
