COINS Expenditure Database Published by Government – Open Data

This looks like an excellent start. The Coalition Government has just published the COINS database, which is the detailed database of Government spending:

The release of COINS data is just the first step in the Government’s commitment to data transparency on Government spending.

You can get the database from the website here. There are explanations to help you get to grips with it here.

Tim Almond notes (via chat) that it is a 68MB zipped file which extracts to 4GB, i.e. huge. Getting to grips with it will require serious database tools, but I predict that someone will have created easier ways of querying it within 48 hours. Here is the full statement:

COINS: publishing data from the database

The data is available to download; the following guidance explains more about the release.

What is COINS?

COINS – the Combined On-line Information System – is used by the Treasury to collect financial data from across the public sector to support fiscal management, the production of Parliamentary Supply Estimates and public expenditure statistics, the preparation of Whole of Government Accounts (WGA) and to meet data requirements of the Office for National Statistics (ONS).

Up to nine years of data can be actively maintained – five historic (or outturn) years, the current year and up to three future (or plan) years depending on the timing of the latest spending review. COINS is a consolidation system rather than an accounts application, and so it does not hold details of individual financial transactions by departments.

Why are you doing this?

The coalition agreement made clear that this Government believes in removing the cloak of secrecy from government and throwing open the doors of public bodies, enabling the public to hold politicians and public bodies to account. Nowhere is this truer than in being transparent about the way in which the Government spends your money. The release of COINS data is just the first step in the Government’s commitment to data transparency on Government spending.

As the Prime Minister has made clear, by November, all new items of central government spending over £25,000 will be published online and by January of next year, all new items of local government spending over £500 will be published on a council-by-council basis.

Who might find the data useful?

COINS contains millions of rows of data; as a consequence the files are large and the data held within the files complex. Using these download files will require some degree of technical competence and expertise in handling and manipulating large volumes of data. It is likely that these data will be most easily used by organisations that have the relevant expertise, rather than individuals. Having access to this data, institutions and experts will be able to process it and present it in a way that is more accessible to the general public. In addition, subsets of data from the COINS database will also be made available in more accessible formats by August 2010.

Downloading the data

The COINS data are provided in two files for each financial year: the ‘fact table’ (fact table extract 200x xx.txt) and the ‘adjustment table’ (adjustment table extract 200x xx.txt). The contents of these two files are explained in ‘What is COINS data?’.

The ‘fact tables’ are approximately 70MB. With a fast broadband link of 8Mbps, it will take approximately 10 minutes to download this file. The ‘adjustment tables’ are approximately 40MB, and this will take approximately 5 minutes to download. Both files have been compressed as ZIP archives. When unzipped, they expand significantly, to around 5GB and 0.5GB respectively.

The data are provided in a txt file format. The structure of the data is similar to a CSV (comma-separated values) file: each row is a string of characters, with each field separated by an ‘@’. We estimate that there are around 3.5 million rows in the ‘fact tables’, and around 500,000 rows in the ‘adjustment tables’. While the contents of the latter can be loaded into Excel 2007, the former is too large for the Excel software. In order to read the data, they will need to be loaded into appropriate database software.
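As a starting point for anyone who wants to try this, here is a rough sketch in Python of streaming an ‘@’-separated file into a SQLite database in batches, so the whole file never has to fit in memory. I have not run it against the real files, and several things are assumptions on my part rather than anything the statement confirms: that the first row holds unique column names, that the file name and encoding shown in the comment are right, and that every column can be stored as text. The function name load_at_delimited and the table name coins_fact are made up for illustration.

```python
import csv
import sqlite3
from itertools import islice


def load_at_delimited(lines, conn, table="coins_fact", batch_size=10_000):
    """Stream '@'-separated rows into a SQLite table and return the row count.

    `lines` is any iterable of text lines (an open file works).
    Assumption: the first row contains unique column names.
    """
    # QUOTE_NONE: treat quote characters as ordinary data, split only on '@'.
    reader = csv.reader(lines, delimiter="@", quoting=csv.QUOTE_NONE)
    header = next(reader)

    # Every column is stored as TEXT; types can be tightened up later.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))

    placeholders = ", ".join("?" * len(header))
    insert = 'INSERT INTO "{}" VALUES ({})'.format(table, placeholders)

    total = 0
    while True:
        chunk = list(islice(reader, batch_size))
        if not chunk:
            break
        # Skip blank or malformed rows whose field count differs from the header.
        rows = [row for row in chunk if len(row) == len(header)]
        conn.executemany(insert, rows)
        total += len(rows)
    conn.commit()
    return total


# Hypothetical usage (file name and encoding are guesses, adjust as needed):
#
#   conn = sqlite3.connect("coins.db")
#   with open("fact_table_extract_2009_10.txt", encoding="utf-8", newline="") as f:
#       print(load_at_delimited(f, conn))
```

Once the rows are in SQLite, ordinary SQL queries will work against the table. If the file turns out not to be UTF-8, changing the encoding argument to open() is the first thing to try.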

We need some crowd-sourced hackery here to stop everyone reinventing the wheel. If you want to be in at the start of this conversation, try chatting to Tim on Twitter.

The next step will need to be a regular and reliable process to allow meaningful continuing analysis.

