I am just picking up Excel, but have experience with R and Stata. Does Excel have a missing symbol convention? That is, does Excel have something like R's NA or Stata's .?
For example, I evaluate an IF() statement and want to return a value that will be omitted from later calculations I use "NA". Is this the correct approach in Excel? It seems like I am missing a basic concept, but I can't get Google to give me a better answer.
To make this more concrete, I have a conditional like =IF([@[Div Dummy]]=1,"NA",EOMONTH(A8,0)). Is there a way to get a numeric "missing value" placeholder so that I don't get warnings about conflicted data types?
3 Answers
You're almost there. You can use NA() in your formulas to return that "error" value. It's very useful when charting because it's not charted (unlike 0).
4You can leave cells empty and use (if I remember correctly) =isblank() to do the conditional. And it's worth noting that many numerical functions like =sum() and =product() will ignore empty cells, ala sum(x, na.rm=TRUE) in R.
But no, this is one of dozens of reasons why Excel is not a serious tool for statistical data. (Although it's not terrible for simple simulation and optimization problems.)
2I find array formulas in Excel very useful for solving a lot of these kinds of problems. If you don't know array formulas yet, here's a example:
Data Col A Col B Col C
Row 1: Fruit Number Color
Row 2: Apple 5 Green
Row 3: (leave A3 blank) 10 Yellow
Row 4: Peach 6 (leave C4 blank)
Row 5: Grape 6 PurpleThen, enter this in another cell, somewhere:
=AVERAGE(IF((A2:A5<>"")*(C2:C5<>""),B2:B5))To make it an array, you MUST use Ctrl+Shift+Enter. Your formula will look like this:
{=AVERAGE(IF((A2:A5<>"")*(C2:C5<>""),B2:B5))}but do not type the curly braces; they will appear automatically when you use Ctrl+Shift+Enter.
In that formula "" stands for missing, <> means not equal to, and the * means AND. So, that formula says, If A2:A5 is not missing and C2:C5 is not missing, then return the average of B2:B5. In this case, the result is 5 (average of 4 and 6). 5 and 10 are excluded because values are missing in A3 and C4.
Now try this:
=AVERAGE(IF((A2:A5<>"")+(C2:C5<>""),B2:B5))Enter as an array.
The only difference is + instead of *. + means OR. So, in this case the answer is 6.25, because all of the values in B have a non-missing value in A or C.
AVERAGE can be replaced by a number of other functions such as MAX, MIN, SUM, COUNT, and you can put conditionals in front of the array: e.g.,
=IF(B2>3,AVERAGE(IF((A2:A5<>"")+(C2:C5<>""),B2:B5)),"N/A")Enter as an array formula. So, if some value gives permission, then run the array, otherwise return "N/A". The permission value can be anywhere else on your spreadsheet and does not have to be contained in an array.
You can also filter by values within the target column: for example:
=AVERAGE(IF(((A2:A5<>"")+(C2:C5<>""))*(B2:B5>4),B2:B5))so, if (A2:A5 is not missing OR C2:C5 is not missing) AND B2:B5 is greater than 4, the answer is 7 because it will take the average of 5, 6, and 10.
You can also do this...
=AVERAGE(IF((A2:A5="Apple")+(A2:A5="Pear"),B2:B5))(note that "Apple" and "Pear" can be replaced with cell references). The answer is 4.5 because it will average 4 and 5.
or this: First enter =10/0 in B3. You will get a divide by zero error that will replace the value 10.
=SUM(IF((ISNUMBER(B2:B5)),B2:B5))Enter as array.
So, this looks across the array B2:B5 and will only include numbers. Since #DIV/0 is not a number, it will sum 4, 5, 6 for an answer of 15.
I'm sure there are limits somewhere on the number of conditionals within an array but I've never run into a situation where I ran out. As well, this is such a flexible tool, only really limited by your creativity. Lot's of interesting things you can do with embedded MAX and MIN functions within an array formula :)
Be aware though that like other kinds of formulas in Excel, array formulas use resources and can slow your spreadsheet.
I find these particularly useful for creating summary reporting in tables that can be highly customized. Because these formulas can be dragged, if you are careful with your absolute references ($), you can very quickly create a large number of these.