Python is an extremely powerful language for journalists who want to scrape information from online sources. This series of videos, made for students on the MA in Data Journalism at Birmingham City University, explains some core concepts to get started in Python, how to use Colab notebooks within Google Drive, and introduces some code to get started with scraping.
Video 1: Python lists
The first video introduces lists in Python, why they’re so important to scraping, and the different ways that lists are used, from generating URLs to scrape to storing the data that’s in the webpages you want to scrape.
It also explains the difference between one-stage (where the URLs can be generated) and two-stage scraping (where a list of URLs must be scraped first, before a second scraper is written to scrape those).
Video 2: Using Google Colab to write Python in Google Drive
The second video introduces Google Colab as a way to write Python within Google Drive — and some basic programming concepts such as variables, comments, printing, lists and looping. It also explains how lists play a key role in scraping within data journalism.
Video 3: Learning from an example scraper
The third video walks through a Python scraper, explaining how to read the code, identifying concepts from the first video and learning further programming concepts such as libraries, functions and dictionaries.
The scraper demonstrates how lists and loops are used within a scraper, how CSS selectors are used to ‘target’ particular pieces of information on a webpage, and some useful Python libraries for scraping.
All three videos can be watched in this playlist.
You’ll find related resources and tutorials in the repo here.
This is part of a series of video posts.
Pingback: VIDEO PLAYLIST: An introduction to Python for data journalism and scraping (Online Journalism Blog) | ResearchBuzz: Firehose
Pingback: Menos gente quiere informarse: cómo hacer para que permanezcan - Medianalisis