Monday, December 27, 2010

How to replicate a "GROUP BY" clause in Excel

As a Data analyst, I often use the “group by” clause in sql to aggregate records against a key field. For instance, to bring back records where a field has greater than 10 entries I’d use a statement like “select city, count(*) from city_sales group by city having count(*) > 10”
However I’m currently in a role where I can’t use a database :( So, can Excel do the business? Well yes, pivot tables can provide a workaround. It’s a bit more involved than using a decent database (and obviously you’d need that database if you’re dealing with hundreds of thousands of rows) but at least it’s an option.
Here’s a short example. Supposing I have various sales records for different cities (see data below). I’ve just listed a few records, but there could be 100s or 1000s of records. Supposing I wanted to see the total sales figure, for those cities with greater than 3 sales?

What I’d do is create a couple of pivot tables. The first one involves putting the city in the row label, and then putting the inv_num field in the data values area. Note that I have to change the “Value field settings” here to show the count (and not the sum). It’s then easy enough to change the sort by to a descending sort on the count of inv_num.

I now create a similar pivot table, but this time I put the Amount field into the Data Value, and this time keep the formula as “Sum of”.

It’s now a simple matter to copy the relevant data, and then use VLOOKUP to bring back the sum of sales for those cities you’ve brought back.

Remember I can also put a field in the Report filter (equivalent to a Where clause in sql) so I could (for example) select only cities from a specified area.
I still prefer sql by far for this type of calculation, but if Excel is your only tool, at least we have this option of the pivot tables.
Hope this is of use to someone out there!

No comments:

Post a Comment