So the problem is twofold: there is no consistent unit, and there are data other than the currency in the cell. Ideally, what we would have are data like this:

The GRAIN dataset has a column called Proposed Investment. This records the amount of cash paid for land in US Dollars. However they are recorded as text, not as numbers. This means the spreadsheet can’t use these values to do the mathematical operations required to make totals, averages, or sort the numbers from highest to lowest. Further, the data have not been entered in a consistent form. Here are some examples from the dataset:

Where there is no data, a warning sign like #VALUE! will be shown.

Select the range H1 to H417 using the mouse, and paste the formula into that range. You will see that if there is any data in Column G, a new value will be displayed in column H, as below:

Create a new column H called “Projected Investment (US$ millions) to the immediate right of the current column G, Projected Investment. We will use column H as a working column to display the outcomes of our calculations.

We can solve this with a combination of automation and old-fashioned hand correction of data. A part solution using a combination of formulas

That’s some crazy stuff, dude! Explain yourself.

It’s complex but a good exercise in thinking about what data is, and how spreadsheets can process text quite effectively by combining different functions into formulas.

Let’s start with the simplest and most common sort of case from the GRAIN database:

US$77 million

We want to turn this into a number that the spreadsheet can work with:

US$77,000,000

There are two things that we can immediately do: specify the currency as an attribute of all numbers in the column “Projected investment” so we know that all numbers in this column are US$. This removes the need to put the text “US$” in each cell:

77,000,000

As nearly all the entries are over 1 million in size, it’s sensible to specify that all numbers in the column “Projected investment” are in millions. This removes the need to include the trailing zeros – the 000,000 – in the cells. This leaves us with:

77

So, the actual task the formula needs to do is to change “US$77 million” to “77”. We want to remove everything but the number 77, with as little potential for error as possible and in a way that can be applied to as many of the other data in the ‘Projected Investment’ column as possible.

This is where the LEFT function comes in. Look at the value we want to change: US$77 million. Count the characters, including the spaces: there are 13 in total. The LEFT function reads the value, and displays only characters to the left of and including the character number you give it. Here’s how it works on the value “US$77 Million”:

Formula You see Which is… =LEFT(“US$77 million”,2) US The first 2 characters =LEFT(“US$77 million”,3) US$ The first 3 characters =LEFT(“US$77 million”,5) US$77 The first 5 characters

In the above examples we included in the formula the actual text that we wanted to analyse using LEFT. We can specify which cell we want to analyse (this is called cell referencing). For example, in the spreadsheet we might have:

row G Formula in column I Outcome in column I 22 US$77 million =LEFT(G22,5) US$77 23 US$56 million =LEFT(G23,5) US$56 24 US$45 million =LEFT(G24,5) US$45

Building the formula this way enables it to be copied down a column, as the cell numbers will update automatically as the position of the formula changes. We can further improve the formula and remove some of the text that we ask LEFT to analyse. This is where the SUBSTITUTE function is useful. Here’s how it works, then we’ll apply it in combination with the LEFT function:

row G Formula in column I Outcome in column I 22 US$77 million =SUBSTITUTE(G22,”US$”,””) 77 million 23 US$56 million =SUBSTITUTE(G23,”US$”,””) 56 million 24 US$45 million =SUBSTITUTE(G24,”US$”,””) 45 million

The SUBSTITUTE function takes the content of a cell (eg. G22, which has the text US$77 million), looks in it for the specific text you tell it to (in this case “US$”), then substitutes it with what you tell it to (in this case, for “”, which is nothing at all) and prints the result (77 million).

We can combine SUBSTITUTE with LEFT. So, in the below, LEFT does its work on text that has already had characters removed through the SUBSTITUTE function:

row G Formula in column I Outcome in column I 22 US$77 million =LEFT(SUBSTITUTE(G22,”US$”,””),2)) 77 23 US$56 million =LEFT(SUBSTITUTE(G23,”US$”,””),2)) 56 24 US$45 million =LEFT(SUBSTITUTE(G24,”US$”,””),2)) 45

So, we have the numbers we need now but there is a problem. Not all the original numbers recorded in ‘Projected Investment’ are 2 digits long. Here’s what happens if we run this formula on a more varied set of data in the G column:

row G Formula in column I Outcome in column I 22 US$7710 million =LEFT(SUBSTITUTE(G22,”US$”,””),2)) 77 23 US$5.34 million =LEFT(SUBSTITUTE(G23,”US$”,””),2)) 24 US$450 million =LEFT(SUBSTITUTE(G24,”US$”,””),2)) 45

Uh oh! You can clearly see there are mistakes in the outcome column. This is because we have told LEFT to show only 2 characters each time (remember we have removed the “US$” using SUBSTITUTE, so LEFT doesn’t count those). However, to show the correct figure for “US$7710 million” in row 22, LEFT would have to count 4 characters. So how can we give LEFT the correct number of characters?

Look at the values again. They have something else in common: yes, they have a space separating the number from the text “millions”. Its position will vary each time but we can find it tell the LEFT function to show it where the number ends in each case. The SEARCH function can be used to do this. It works by looking through data you give it for a character you specify, and then tells you the position of that character:

row G Formula in column I Outcome in column I 22 US$7710 million =SEARCH(” “,G22) 8 23 US$5.34 million =SEARCH(“ “,G23) 8 24 US$450 million =SEARCH(“ “, G24) 7

So, in row 22, we are looking for a space (“ “) in the text in cell G22 (US$7710 million). Count from the left, that space is in position number 8. We can give this number 8 to the LEFT function:

row G Formula in column I Outcome in column I 22 US$7710 million =LEFT(G22,(SEARCH(“ “,G22))) US$7710 23 US$5.34 million =LEFT(G23,(SEARCH(“ “,G23))) US$5.34 24 US$450 million =LEFT(G24,(SEARCH(“ “,G24))) US$450

Note that these outcomes also include the space after the number. Let’s add the SUBSTITUTE function back into the formula: Wherever there is a cell reference (eg. G22) we can put a SUBSTITUTE function removing the text US$:

row G Formula in column I Outcome in column I 22 US$7710 million =LEFT(SUBSTITUTE(G22,”US$”,””), (SEARCH(“ “,SUBSTITUTE(G22,”US$”,””)))) 7710 23 US$5.34 million =LEFT(SUBSTITUTE(G23,”US$”,””), (SEARCH(“ “,SUBSTITUTE(G23,”US$”,””)))) 5.34 24 US$450 million =LEFT(SUBSTITUTE(G24,”US$”,””), (SEARCH(“ “,SUBSTITUTE(G24,”US$”,””)))) 450

That explanation help?