Today I will be introducing my MA Data Journalism students to SQL (Structured Query Language), a language used widely in data journalism to query databases, datasets and APIs.
I’ll be partly using the mapping tool Carto as a way to get started with SQL, and thought I would share my tutorial here (especially as since its recent redesign the SQL tool is no longer easy to find).
So, here’s how you can get started using SQL in Carto — and where to find that pesky SQL option.
3 ways to use SQL in Carto
There are actually three ways to use SQL queries in Carto that I know of. These are:
- Applying SQL queries within a dataset
- Applying SQL queries in a map
- Using SQL to query your data using the Carto API
I’ll just cover the first two.
Before you begin, download some crime data from Data.police.uk. I’ll be using this for my examples.
Applying SQL queries within a dataset in Carto
Here’s a brief summary of the steps when you want to query a dataset in Carto
- Open Carto
- Upload your dataset
- Open the dataset
- Toggle the switch at the bottom left from ‘Metadata’ to ‘SQL’
- Write your SQL query, and apply it
- Create a map from the query if you want
Start by logging in and go to the ‘datasets’ section (there’s a button in the upper right corner).
Click on New dataset.
Click on Browse and find the data.
Click Connect dataset to upload it.
After a moment you should be able to see the data. There might be some annoying empty columns but those can be ignored. Look out for columns that are in green (numbers) and those that aren’t (text). This will affect the sorts of queries you can make against those columns (you can only sum and average number columns, for example)
Now, while inside your data, look at the bottom of the page for a toggle switch marked ‘METADATA’ and ‘SQL’.
Click on that toggle switch to move it from METADATA to SQL. It should light up green and the bottom half of the screen should now show a dark area where you can type a SQL query:
In fact, a SQL query will already be there in that box. It will look something like this:
SELECT * FROM table_2017_09_west_midlands_street_1
This means select all columns from the table with that name (this dataset).
You can now adapt this query to query your data in different ways. Notice that as you start typing a column name it will bring up a menu to make it easier for you to select the right name. Here is one example of a query which generates a pivot table of crime totals by type:
select crime_type, count(*) from table_2017_09_west_midlands_street_1 group by crime_type order by crime_type desc
That should give you an idea of what types there are. You can choose one of those to write a new query which filters the data to only show one type:
SELECT * FROM table_2017_09_west_midlands_street_1 WHERE crime_type = 'Anti-social behaviour'
Click CREATE MAP to see just those crimes shown.
Applying SQL queries within a Carto map
If you’ve already created a map and want to create a SQL query from within there, you just need to go into the data within the map.
First go into your Carto maps view and open the map. Once opened, the map will occupy the right two thirds of the screen, but the left third of the screen should contain a list of all the layers of that map (you may have only one).
Click on the layer containing the data you want to query.
That left area should now change to show ‘STYLE’ – one of 5 tabs including ‘DATA’, ‘ANALYSIS’, ‘POP-UP’ and ‘LEGEND’.
Click on the ‘DATA’ tab.
At the bottom of this left hand area you should now be able to see a toggle switch with VALUES and SQL. Click on that to toggle it to SQL.
Once again, when toggled it will turn green and that area of the screen will turn dark. A ready-made SQL query will also be in this window that you can start to adapt:
Change that query to:
select * from table_2017_09_west_midlands_street_1 where crime_type = 'Anti-social behaviour'
And you will only see the crimes shown that match the query.
Click APPLY to apply it.
Now you can leave the data window and the query should still be applied. Note that when you are looking at your map again you should now see a ‘SQL’ box on the layer where you queried it.
That’s it. You can always change your query by going back into the layer, and then into the DATA view. When you’re happy with it, publish.
Using SQL queries elsewhere
Carto is just a good place to start making SQL queries — there are dozens of other contexts where you might use them too. Here are some places to go next:
- Franchise is a new browser app which allows you to run SQL queries against databases within your browser. But it also has a number of sample datasets which you can practise on — including map and chart views of the results
- Data.world is a data hosting and sharing service which allows you to run — and share — SQL queries on your own data. They have published a Gitbook on SQL which is an excellent reference
- Google Sheets has a special `QUERY` function which allows you to run a SQL query on a specified table. You can find a chapter on this in my book Finding Stories in Spreadsheets
- The Firefox extension SQLite Manager allows you to set up a SQLite database and query it with SQL in your browser. Note that it now has to be used on the ESR release of Firefox
- If you use R the package `sqldf` allows you to query data using SQL
- If you’re learning Python, PythonAnywhere allows you to use SQL queries too
- Quickcode (formerly Scraperwiki) is a tool which allows you to write scrapers in a number of languages. It also allows you to query those scrapers using SQL.
- The Carto API allows you to query your own datasets on the platform using SQL, and get data in JSON format.