1 Attachment(s)
SUMPRODUCT Step By Step - Part 3
3 More On Conditions
3.1 Overview
In the previous chapters we have explored the basics of SUMPRODUCT, and learnt how to use it to count with any number of conditions, over single or multiple ranges.
It doesn’t take much effort to extend the counting function into summing, and that will be covered in this chapter, as well as an introduction into using the double unary, and to OR conditions.
In this chapter, all of the examples will be working off of the data in Figure 5 - Comms Data table.
http://img133.imageshack.us/img133/3339/fig31.png
Figure 5 - Comms Data table
3.2 Summing With Multiple Conditions
We will start with a simple multi-condition sum, to calculate the total number of sales in the first quarter of the year 2009. This is multi-condition as it requires testing the date to be between 1st Jan and 31st March.
As we saw in the previous chapter, it is straightforward to use SUMPRODUCT to count the number of entries between these two dates with
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))
To extend this into a sum formula, it is just necessary to add the range to be summed, the value range.
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))*(D2:D20)))
Breaking this formula down we have:
- ((B2:B20>=DATE(2009,1,1)) - this tests the date range to be greater than or equal to 1st Jan 2009
- ((B2:B20<DATE(2009,4,1)) - this tests the date range to be less than 1st Apr 2009
- Using the * operator on these two conditions creates an AND condition that gives all of the items within the required date range, the rows where the date is greater than or equal to 1st Jan 2009 AND less than 1st Apr 2009, or within the first quarter of 2009
- (D2:D20) – the range to be summed is presented as a simple range.
The * operator is used here not to force an AND condition, but to simply multiply the amounts by the results of the multiple conditional tests, which is then summed.
3.3 Using The Double Unary
We have previously used the * operator as a way of performing AND operations, condition 1 AND condition 2. It is not that * acts as an AND, it is because each conditional test returns an array of TRUE or FALSE values, and multiplying these TRUE/FALSE arrays gives a result array of 1s and 0s. Using the formula
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))
The first condition (B2:B20>=DATE(2009,1,1) returns an array
{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE ;TRUE;TRUE;TRUE;TRUE}
The second condition (B2:B20<DATE(2009,4,1) returns an array
{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FA LSE;FALSE;TRUE;TRUE;TRUE}
The * operator multiplies these two arrays together,
{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE ;TRUE;TRUE;TRUE;TRUE}*
{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FA LSE;FALSE;TRUE;TRUE;TRUE}
giving a resultant array of {0;1;1;1;1;1;1;1;1;0;1;1;1;1;0;0;1;1;1}. This array is the summed to give a result of 15.
If we extend this to including the values in column B, the formula
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1))*(D2:D20))
gives a result of 1738, because the intermediate arrays are
{TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE ;TRUE;TRUE;TRUE;TRUE}*
{FALSE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;TRUE;TRUE;FA LSE;FALSE;TRUE;TRUE;TRUE}*
{700;75;125;500;100;75;250;88;25;18;22;333;33;20;15;40;17;25;50}
giving a resultant array of {0;75;125;500;100;75;250;88;25;0;22;333;33;20;0;0;17;25;50}.
This array is then summed to give a result of 1738.
In the last formula, the values are not involved in a conditional test, just multiplied as is.
The important thing to realize here is that * operator is utilised here to coerce the TRUE/FALSE values to 1 or 0, which is required so that the basic math operations, multiply and add (what SUMPRODUCT does), have something to work on.
In addition to the * operator, many other operators can be used. We will address most very briefly later, but there is one that you will come across a great deal if you search the internet for solutions, and that is the double unary, --.
The double unary does not directly replace the * operator, it has to be used differently. The above formula with the double unary would not be
=SUMPRODUCT((B2:B20>=DATE(2009,1,1))--(B2:B20<DATE(2009,4,1))--(D2:D20))
but would be
=SUMPRODUCT(--(B2:B20>=DATE(2009,1,1)),--(B2:B20<DATE(2009,4,1)),D2:D20)
In the previous formula, the * operator was used to multiply each element of two arrays to get another array of 1s and 0s. In this formula, the double unary coerces each array of TRUE/FALSE values to 1 or 0 directly. SUMPRODUCT manages the multiplication of each (numeric) array and sum the results, this is SUMPRODUCT in its most basic form as described in section 1.1 SUMPRODUCT Basics.
The double unary is my preferred way of using SUMPRODUCT for multiple conditional tests because it is closer to basic SUMPRODUCT usage. Using the * operator means that the multiplication, or product, part of the formula is handled by the * operator, and SUMPRODUCT just does the summing.
[We could also use =SUMPRODUCT((B2:B20>=DATE(2009,1,1))*(B2:B20<DATE(2009,4,1)),D2:D20) which still uses the * operator to force the TRUE/FALSE arrays to a final array of 1s and 0s, and SUMPRODUCT will multiply that resultant array by the value array and sum the results]
There are some instances where the * operator doesn’t work, and some where the double unary doesn’t work, these will be covered in detail later.
3.4 OR Conditions
You may well be aware that OR conditions are managed in an IF test by using the OR function, such as =IF(OR(J1>20,K1="X"),"Match","No match"). Unfortunately this construct does not work in multiple conditional tests in SUMPRODUCT.
But consider an alternative IF test. This formula =IF((J1>20)+(K1="X"),"Match","No match") produces the same result as the previous example, so let’s examine it some more.
Each of the two conditions, (J1>20) and (K1="X"), return TRUE or FALSE., so we are in familiar territory already. This formula works because of the way that Excel adds TRUE/FALSE values. Table 1 - OR Table shows the results of each combination of TRUE and FALSE.
TRUE TRUE 2
TRUE FALSE 1
FALSE TRUE 1
FALSE FALSE 0
Table 1 - OR Table
Each result except FALSE/FALSE is non-zero, and Excel will treat any non-zero value as TRUE, zero as FALSE. In other words, by adding the conditions, if either condition is TRUE the result is TRUE.
3.4.1 Counting OR Conditions
We can use this approach within SUMPRODUCT as well, by adding the conditions using the + operator. As an example, to calculate the total number of sales of ADSL Routers or sales to Bell Labs we can use
=SUMPRODUCT((C2:C20="ADSL Routers")+(A2:A20="Bell Labs"))
which gives the result 8.
Breaking this down we can see that (C2:C20="ADSL Routers") returns the array
{FALSE;TRUE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE;FALSE ;FALSE;TRUE;FALSE;TRUE;FALSE;FALSE}
and (A2:A20="Bell Labs") returns the array
{TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRU E;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE}
Adding these arrays gives a resultant array {1;1;0;0;0;1;0;0;1;0;0;0;1;0;1;0;1;1;0} using the table rules shown above.
Summing this resultant array gives the final result of 8.
It should be clear that the + operator is acting in a similar way to the * operator, that is it is explicitly performing the math on the conditions, leaving SUMPRODUCT to simply sum the resultant array.
Unfortunately, there is a twist in this little technique. In our previous example we showed how to count the sales to Bell Labs OR for ADSL Routers. If we repeat this for Verizon instead of Bell Labs using
=SUMPRODUCT(((A2:A20="Verizon")+(C2:C20="ADSL Routers")))
Looking at the data we can see there are 7 items that match these criteria, but this formula returns 9. So what has gone wrong? If you look carefully at the data you can see that rows 3 and 16 are for Verizon and are ADSL Routers, so these items are being double-counted (look back at Table 1 - OR Table, you can see that TRUE+TRUE returns 2). Fortunately this is easily remedied, we can just test the OR conditions for > 0, because the results of meeting either or both of the conditions are both > 0 whereas neither conditions met is = 0. Testing for > 0 gives a TRUE/FALSE value, so we have to again coerce this to 1s and 0s. The net result of this is the following formula
=SUMPRODUCT(--((A2:A20="Verizon")+(C2:C20="ADSL Routers")>0))
which returns the correct result of 7.
3.4.2 Summing OR Conditions – OR and AND
Calculating the quantity with an OR condition is possible just as it is with the AND condition, by combining OR and AND operations. A little care has to be exercised to ensure that the correct operator precedence is followed. If we were to just add the quantity using a * operator
=SUMPRODUCT((A2:A20="Bell Labs")+(C2:C20="ADSL Routers")*(D2:D20))
returns the result of 210, which is clearly too few. If we look at the final resultant array, we can see that it returns {1;75;0;0;0;75;0;0;25;0;0;0;1;0;15;0;17;1;0} which sums to 210. We can see here that each first conditional tests that are TRUE returns a value of 1 into the resultant array, not the quantity. This is because the * operator is processed before the + operator, so this part of the formula (C2:C20="ADSL Routers")*(D2:D20) is evaluated first, and the other condition is ORed with this resultant array, rather than the first two conditions being ORed before that resultant array is multiplied by the value array.
This is easily rectified with an extra set of brackets
=SUMPRODUCT(((A2:A20="Bell Labs")+(C2:C20="ADSL Routers"))*(D2:D20))
which returns a final resultant array of {700;75;0;0;0;75;0;0;25;0;0;0;33;0;15;0;17;25;0}, which gives the correct result
of 965. This works because brackets are evaluated before the * operator.
It should be clear that this version of the formula
=SUMPRODUCT(((A2:A20="Bell Labs")+(C2:C20="ADSL Routers")),D2:D20)
also works.
[We could include the double unary before the ORed tests, =SUMPRODUCT(--((C2:C20="ADSL Routers")+(A2:A20="Bell Labs")),D2:D20), but this would be superfluous as the + operator explicitly coerces the conditions into 1/0 values]
3.4.3 NOT OR Conditions
Whereas AND conditions easily accommodate not equal values
=SUMPRODUCT(--(A2:A20="Verizon"),--(C2:C20<>"ADSL Routers"))
it is not so simple for OR conditions. This formula
=SUMPRODUCT(((A2:A20="Verizon")+(C2:C20<>"ADSL Routers")))
returns 18 not 2 as it would appear it should. The reason for this will be covered in the next step.
3.5 Summary
Now we have learnt how to use SUMPRODUCT to sum items that match multiple conditions as well as counting those multiple conditions, and we have also covered the use of the double unary as against the * operator.
Beyond that, we have extended our capabilities by using SUMPRODUCT to cater for OR tests as well as AND tests.
Next time we will look at including extra functions within the test, beyond simple equals, not equals, etc.