In order to do this Marion had downloaded details on the candidates who had stood successfully in the previous election.
“It was a very young intake. But it wasn’t easy to calculate their ages.”
Indeed. You would think that calculating ages in Excel would be easy. But there is no off-the-shelf function to help you do so. Or at least, no easy-to-find function.
Instead there are a range of different approaches: some of them particularly, and unnecessarily complicated.
In this extract from Finding Stories in Spreadsheets I will outline one approach to calculating ages, which also illustrates a useful technique in using spreadsheets in stories: the ability to break down a problem into different parts.
I’ll then show some other solutions to the same problem, and when those might be more appropriate.
At the end there’s an exercise for you to use these techniques yourself to calculate the ages of inmates at Guantanamo Bay.
Breaking down the problem
If your spreadsheet problem isn’t solvable with a single function, the best approach is often to break it down into smaller problems which, individually, can.
That is the case here.
For example: we can calculate an age by subtracting the year of a person’s date of birth (DOB) from the year of another date (today, for example, or the election day).
But this approach has room for a small error: it is only correct if the person’s date of birth comes before or on the date we are calculating against. If the person’s date of birth comes after the date we are using, the result will be an age that is one year older than they are at the moment.
For example: if we take the year 2014 and subtract 1974 to find out the age of someone born that year, we will get the result 40. But if they were born any later than today’s date they are still 39.
Once you’ve identified a possible error like this, don’t write it off immediately. You may be able to add an ‘error checking’ formula to correct it – and that is the case here.
Ultimately the key question to ask is: does this formula get me any closer to the result I need?
In this case the answer is ‘Yes’: a simple subtraction formula of ‘this year minus the year of the person’s date of birth’ will get us either the correct result, or one which is only one year out.
All we need is a way of identifying which result it is, and correcting the initial age accordingly.
Calculating the years
First, however, we need to calculate the two years that we need to work with: the year of someone’s date of birth; and the year of the date we want to calculate their age for (that might be today, or it might be a key date such as the day they were voted into power).
To calculate the year of the date of birth you can use the
YEAR function by typing it into any cell in your spreadsheet like so:
The formula above takes any date in A2 and gives you a number representing just the year in that date (2014, for example).
|If you get a result which is not a four digit number but yet another date then don’t panic: the result is not wrong: it’s just the way the results are formatted.Remember that all dates are stored as numbers. The number ‘2014’, for example, is the way that Excel stores the date 06/07/1905 (2014 days since the beginning of the 20th century).When that number is formatted as a date, you won’t see the number 2014 but instead the date 06/07/1905.
To fix this, change the formatting. Right-click on the cell concerned and select Format cells. Then change the formatting from date to Number or General. You may also need to reduce the decimal places to avoid getting `2014.00`.
To calculate the difference between two years – that is, the number of years between someone’s date of birth and another date – simply subtract the result of one YEAR formula from another YEAR formula like so:
Remember that the first
YEAR function in that calculation should refer to the more recent date; you want to subtract the earlier year (a smaller number) from that.
Let’s say, for example, that in column B we have the date of an election, repeated all the way down. And in column A we have each candidate’s date of birth.
The formula above – when also repeated down a column – grabs the year of the election date and subtracts from it the year of each candidate’s date of birth.
Given that the election date is always the same, another way of doing this would be to avoid using a cell reference for the first date and enter it directly into the formula so:
Or even simpler:
The result should be a column of ages that are at worst a year out.
Now to correct those.
Checking whether a birthday comes before or after a particular date
We’ve already described the factor that will separate the accurate results from those which are one year out: the individual’s birthday will come after the date we are using as the basis of our subtraction.
We could use the TEXT function to identify that birthday, by just grabbing the day and month of any date. Here’s how you would do that for a DOB in cell A2:
The two Ds and two Ms in
"ddmm" indicate that we want two character codes for the day (D) and month (M).
The two digit code for the first day of any month would be 01; for the second day 02; and so on. The same two digit codes apply to each month too.
The results will range from 0101 (the first day of January) to 3112 (the 31st day of December).
However, this will be treated as a whole number – so 0107 (the first of July) will be treated as smaller than 0201 (the second of January).
Instead, then, we need to use the US date formatting approach of month-day:
The results will now range from 0101 (the first day of January) to 1231 (the 31st day of December).
When done this way we can be sure that all dates in February (beginning 02) will be greater than dates in January (beginning 01), and so on.
To check if one date is later than another we can just use the ‘greater than’ operator > like so:
This asks whether the result of the first TEXT formula (someone’s birthday) is greater than the result of a second TEXT formula (the day of an election).
There are only two results: TRUE (yes, it is greater) or FALSE (no, it is not).
If it is greater it means that that person’s birthday came after the election, and their age should be adjusted accordingly.
If it is not later, the age does not need adjusting.
Making an adjustment based on the results
We could use an
IF function to adjust the age based on the results of these calculations…
…but in this case we don’t need it.
This is because, helpfully, TRUE and FALSE are treated as 1 and 0 by Excel: something we can use to our advantage in our formula.
We’ll make this simpler by putting our initial age guess in column C. Our table so far, then, has these columns:
A: The person’s DOB
B: The election date
C: A calculation to get an age which is either accurate or a year out: =YEAR(B2)-YEAR(A2) for row 2, and so on.
Now in D we can enter the following to correct that age in C:
The formula starts with the age from C2 and subtracts something from it.
What it subtracts is the result of a formula, in parentheses: a TRUE or FALSE result to the question “Is A2’s month-day greater than B2’s month-day value?”)
A TRUE result is the same as 1, so the result in those cases is
C2-1. In other words the age is reduced by 1 because the person hadn’t had their birthday.
A FALSE result is the same as 0, so the result is
C2-0: in other words, the age is correct and does not need adjusting.
You can cut out the need for column C by putting that directly into the formula like so:
And you can cut out the need for column B by putting the year and the month-day value directly into the formula too:
Note the two places where we’ve altered the formula: firstly at the start, with
YEAR(B2); and secondly in the other point of the formula where B2 was mentioned:
TEXT(B2,"mmdd"). In this case we don’t need the
TEXT function to tell us the month-day value because we know it will always be
"0505" (I’ve shown you the other approach in case your dates are not always the same).
TODAY to calculate an age against today’s date
The same formula can be adjusted to apply to today’s date using the
This function does exactly what you might expect: returns today’s date. For that reason it has no ingredients at all: it is typed like this:
In our case we can either use the formula
=TODAY() in all the cells in column B, or put it directly into the formula where any references to that column were used, like so:
…But by this point you’d be forgiven for having lost track of how you got here.
The problem with long formulae like this – and the equivalent for an election date above – is that when you come back to the spreadsheet later it may not make much sense.
Worse, if anyone else has to use the spreadsheet they are likely to struggle even more to understand what is being calculated.
For those reasons it is often better to have columns with each stage of the calculation clearly labelled, rather than combine all the calculations into one complex, albeit impressive formula.
Making it easy to understand: breaking the formula back up
So let’s take a moment to break those stages down:
- We begin with a column of dates of birth, called ‘DATE OF BIRTH’
- Then we create a second column to hold the date(s) we’re using as the start of our calculation: either today’s date, in which case each cell will hold the formula
=TODAY(), or a date in the past such as an election day. That date will be copied down the whole column. This will be called ‘TODAY’S DATE’ or ‘ELECTION DATE’. So far, it’s going to be pretty easy for any colleagues – or our future self – to understand this spreadsheet.
- Next we create a column to calculate the year of the later date minus the year of the date of birth. We call this ‘AGE MAYBE ONE YEAR OUT’ or ‘AGE APPROX’ or something similar.
- In the fourth column we write a formula to check if the age is a year out: specifically if the month-day value of the date of birth is greater than the month-day value of the date we’re subtracting from. When copied down the column should be filled with results that are either
FALSE. That column is called ‘AGE ONE YEAR OUT?’
- Finally we create a column that corrects that age where the value in the fourth column is
TRUE(the month and day or birth is later than the month and day of the date we’re subtracting from). We title that column ‘CORRECTED AGE’.
How many columns you have depends on how clear you want the spreadsheet to be to yourself and others. For example it may be worth combining the last two into one column simply called ‘CORRECTING AGE’.
But the point remains: there’s no point combining multiple calculations into one powerful formula if the formula isn’t going to make sense to you later, or if your data is going to be used by colleagues or people who come after you.
Other ways of calculating ages: the unsupported DATEDIF function
There are many different ways to arrive at the same results. Ian Balboa, for example, notes that Excel used to have a function for calculating the difference between two dates:
DATEDIF. But “The bad news is that for some reason Microsoft no longer supports this function in Excel.”
What this means is that if you begin typing
=DATEDIF in a modern version of Excel, you will notice that the software does not seem to recognise it in the way that it does with other functions: it doesn’t attempt to finish the word ‘DATEDIF’ when you begin typing it; and it doesn’t bring up a tooltip hinting at the parameters that the function needs, as it does with other functions.
DATEDIF function will work in Excel – but it’s not supported: the software won’t suggest it, or help you write it very much unless you go as far as to type the function and open the parenthesis, and then click on the link that appears in the tooltip. And that’s only in some versions of Excel.
This makes it a difficult function to use – because it does have some very specific ingredients.
DATEDIF needs three ingredients:
- A start date
- An end date
- And the units you want to count: for example the difference in years, months or days. These are indicated by specific codes, detailed below.
The first two ingredients are relatively straightforward: the only thing you need to remember is to place dates in quotation marks, e.g.
"05/05/2005" – otherwise it will treat the date as a calculation: i.e. 5 divided by 5 divided by 2005.
You can include dates formatted as numbers – for example, 38477 for the fifth of May 2005 – but the extra effort involved in finding out the numerical equivalent for any date makes that pretty pointless. Just stick some quotation marks around it!
And of course you can use cell references instead to point at a cell containing a date – as long as Excel is treating that cell as a date and not as a string of text (text will be aligned left; numbers and dates aligned right).
The final ingredient – the units you want to count – is expressed as a one- or two-letter code in quotation marks. The three single character codes simply measure whole years, months or days as follows…
"Y"will return the number of whole years between the start and end dates. This addresses the problem of birth dates that occur later than the end date.
"M"will return the number of whole months between the start and end dates.
"D"will return the number of days between both dates.
…but the three double-character codes essentially calculate remainders left over once the whole years or months are counted. The key in each is the second character:
"MD"will return the number of days between the start and end date after whole years and whole months are counted. In other words, if someone is 12 years, five months and three days old, this code will return
3(the number of days left). The maximum possible result, then, is 30 (31 days being a whole month).
"YM"will return the number of whole months between the start and end date after subtracting whole years. So again, if there are 12 years, five months and three days between two dates, the result will be
5(the number of whole months left after years are excluded). The maximum possible result, then, is 11 (12 months being a whole year).
"YD"is a little different: it assumes your two dates are less than a year apart, and counts the number of days between them. The maximum possible result here is 364. This is very similar to using
"D"to simply calculate the difference between two dates – the one difference is that limit: two dates which are more than a year apart will still return the result
As I say, because Excel doesn’t support this function any more you will need to have these codes to hand when you use it.
The #NUM! error
If you get a #NUM! error when using the DATEDIF function it is most likely because the start date is later than the end date. The function doesn’t expect this, and cannot give you a negative result.
You can combine the two sets of codes to calculate how old someone is in both years and months. For example, the following formula will calculate how old someone was on election day in years, based on a date of birth in cell A2, and an election date in cell B2:
In the next cell you can use a similar formula to tell you how many months old they were in addition to those years:
The result of both gives you the full picture:
DATEDIF in Google Sheets
DATEDIF function is much better supported in Google’s own spreadsheets tool, Google Sheets. Not only does Google Sheets suggest
DATEDIF if you begin typing ‘DATE’ but it also has a comprehensive ‘hint’ box that appears once you open the parenthesis, including a link to even more information.
Watching out for leap years in other calculations
Yet another method for calculating ages comes from journalism professor Steve Doig, who uses the following formula:
“The first part of the formula calculates the number of days between the two dates, which is then divided by 365,25 (the 0,25 accounts for leap years) to produce the years.
The leap year point is really important: remember that Excel stores dates as the number of days since 1900 – so years with 366 days will have those extra days added into the total. This is another example of identifying possible problems in calculations and building in corrections.
This is also the first appearance of the function
DATE, which bears some explanation.
DATE function converts any date into a number. This might seem a bit pointless given that Excel does this anyway: type 05/05/2005 into Excel and it will assume you mean May 5th 2005 and store it accordingly, as the number 37015 (although it will continue to format it as
Indeed, you could rewrite the example formula above to strip out the
DATE function entirely, so that it looked like so (with UK date formatting):
However, dates do not always play nicely. The
DATE function is a useful way of making sure that your date is understood correctly by Excel.
DATE function takes three ingredients: a year, a month, and a day of the month – in that order. If you have a date (or dates) which has been formatted unusually, this ensures that they are interpreted correctly.
But in order to do so, you’ll need to be able to extract the different parts of any given ‘date’ – and that’s the subject of the next chapter on
- Ages can be particularly interesting angles to explore in any dataset: the oldest, youngest, or general demographic makeup of people in a dataset (younger, older) can all be newsworthy.
There are a number of different ways of calculating an age, but none are straightforward.
You can subtract one date’s year from another date’s year – but some results could be one year older than the person’s age if their birthday occurred later in the year than the date you are subtracting from.
Using a function for error correction helps you identify the results you need to change.
In this case, we can extract the month-day value of a date of birth and test if it is greater than the month-day value of the more recent date, using the
"mmdd"as our second ingredient.
FALSEresult (known as a Boolean) is also equal to 1 or 0. So
FALSEcan be used as 1 and 0 in any calculations – quite handy if you need to subtract one from a number when something is TRUE.
DATEDIFfunction allows you to calculate the difference between any two dates in terms of days, whole months or whole years – so you can use it to calculate ages.
DATEDIFfunction is not supported in modern versions of Excel, so you have to know which ingredients are used, and in what order.
DATEDIFfunction is, however, supported in Google Sheets, where you can find help on using it.
The three ingredients of the
DATEDIFfunction are, in order: the earlier date, the later date, and the units you want to count in: years (
"Y"), months (
"M") or days (
"D"), or ‘remainder’ months or days once whole years or months have been counted.
If you get a
#NUM!error when using the
DATEDIFfunction it is most likely because the start date is later than the end date.
Another alternative is to count the number of days between a date of birth and the more recent date, and divide that by 365.25 to get an age including decimal places.
.25in that calculation is to account for leap years where there are 366 days.
If you want to calculate an age based on today’s date, and ensure that the date is correct whenever we open the spreadsheet, you can use the
TODAYfunction to generate today’s date.
NOWfunction does exactly the same, but also adds the current time.
NOWfunctions have no parameters: so they use empty parentheses like so:
Find the story: how old are Guantanamo prisoners?
The spreadsheet was found on the site for the Center for the Study of Human Rights in the Americas by using the advanced search
"Date of Birth" Guantanamo filetype:xls.
It has six fields (columns): a simple index number, most likely added by the creator of the sheet; a date of birth; a name; an ‘ISN’ code (you’d have to find out what this means); citizenship details; and their place of birth.
- Firstly, check that the date of birth is being treated as a number by Excel. How would you do this?
- Insert a new column between current columns B and C which gives us the age of each individual based on their date of birth. What formula will you use? Do you need to use more than one column?
- What sort of error checking might you need – whether in the sheet itself or through follow up phonecalls?
- Now that you have their ages what stories might you tell about those?