Chapter 8 Aggregation functions {aggregation-functions}

Since we have already dived deep into aggregations in ??, this chapter will expand on aggregations by going through some of the other aggregation functions found in Oracle.

8.1 Median and quantiles

Medians are easy to compute, simply switch AVG with MEDIAN and you are set. Quartiles, deciles and percentiles give you a lot of options, and so they are not implemented in the same way - hence, the syntax is a bit trickier. Let's look at an example, again from the transactions table, where we compare the median, quartiles, P10 and P90 of the unitprice between countries.

SELECT country, 
PERCENTILE_DISC(0.1) WITHIN GROUP(ORDER BY unitprice) AS p10_price,
PERCENTILE_DISC(0.25) WITHIN GROUP(ORDER BY unitprice) AS q1_price,
MEDIAN(unitprice) AS median_price,
PERCENTILE_DISC(0.5) WITHIN GROUP(ORDER BY unitprice) AS p50_price,
PERCENTILE_DISC(0.75) WITHIN GROUP(ORDER BY unitprice) AS q3_price,
PERCENTILE_DISC(0.9) WITHIN GROUP(ORDER BY unitprice) AS p90_price
FROM trx
GROUP BY country
ORDER BY COUNT(1) DESC
FETCH FIRST 10 ROWS ONLY
(#tab:pctl_sql)Displaying records 1 - 10
COUNTRY P10_PRICE Q1_PRICE MEDIAN_PRICE P50_PRICE Q3_PRICE P90_PRICE
United Kingdom 0.65 1.25 2.1 2.1 4.1 8.0
Germany 0.55 1.25 1.9 1.9 3.8 8.2
France 0.55 1.25 1.8 1.8 3.8 8.0
EIRE 0.55 1.25 2.1 2.1 4.2 8.5
Spain 0.65 1.25 2.1 2.1 4.2 8.5
Netherlands 0.42 0.85 1.4 1.4 2.5 5.0
Belgium 0.55 1.25 1.9 1.9 4.2 9.9
Switzerland 0.42 1.25 1.7 1.7 3.8 8.0
Portugal 0.65 1.25 1.6 1.6 3.2 6.0
Australia 0.55 1.25 1.8 1.8 3.8 6.0

To abbreviate the output, we have ordered the output by number of sales, and limited the result to the ten countries with the most sales.

The MEDIAN function is simple, taking only the unitprice as input. As you can see from the median_price and p50_price columns, this is equivalent to taking the 50th percentile.

The PERCENTILE_DISC function which returns the percentiles, is somewhat more involved. In order to understand the logic, remember what percentiles (and quartiles and medians) really are: They are the value at a given observation, after the data has been sorted in ascending order by that value. Therefore, the PERCENTILE_DISC function only takes the percentile as an argument. The variable we are interested in, unitprice, is specified in the WITHIN GROUP statement, as the variable we are ordering by. Because the SQL syntax is quite consistent, you could even add some confusion to your calculations by altering the sort order. Try it yourself by writing ORDER BY unitprice DESC in the WITHIN GROUP statement for p10_price. Suddenly you will have the 90th percentile instead of the 10th.

A second, smaller cause for confusion is the function name, PERCENTILE_DISC. For our purposes, this is the normal percentile function we want to use. But there is a different percentile function, PERCENTILE_CONT, which tries to linearly interpolate the values and return the given percentile from that interpolated function.

8.2 First values {first-values}

Another interesting thing to grab, is the first or last value of some variable, ordered by some other variable. To illustrate this, ask yourself: Are customers likely to buy larger quantities of products with a low unit price, or a high unit price?

If we want to investigate this, we can select the minimum quantity purchased of the cheapest product sold in each country, and the maximum quantity sold of the most expensive product sold in each country.

SELECT country, 
  AVG(quantity) KEEP (DENSE_RANK FIRST ORDER BY unitprice) AS min_q_cheap_products,
  AVG(quantity) KEEP (DENSE_RANK LAST ORDER BY unitprice) AS max_q_expensive_products
FROM trx
GROUP BY country
ORDER BY COUNT(1) DESC
FETCH FIRST 10 ROWS ONLY
Table 8.1: Displaying records 1 - 10
COUNTRY MIN_Q_CHEAP_PRODUCTS MAX_Q_EXPENSIVE_PRODUCTS
United Kingdom 1 -1.0
Germany 1 0.0
France 1 0.0
EIRE 68 0.0
Spain 11 -1.0
Netherlands 144 -1.0
Belgium 39 1.0
Switzerland 1 3.1
Portugal 20 0.0
Australia 103 1.0

8.3 Aggregating text

Aggregating numbers is one thing, but averages and sums don't lend themselves as nicely to text data. What Oracle can do however, is to concatenate text for you. Say, for example, you want to collect all the products of an order in a single cell. This can be done with the special LISTAGG() function, which takes two arguments: The text field, and whatever character string you want to separate the concatenated cells by. In addition (because this just couldn't be this simple), the function must be followed by a WITHIN GROUP statement, in which you specify how the cells should be ordered before concatenation.

SELECT INVOICENO, COUNT(1) AS number_of_goods, LISTAGG(DESCRIPTION, ', ') WITHIN GROUP ( ORDER BY stockcode) AS goods
FROM TRX
--WHERE INVOICENO IN('536384', '536385')
GROUP BY INVOICENO
FETCH FIRST 3 ROWS ONLY
Table 8.2: 3 records
INVOICENO NUMBER_OF_GOODS GOODS
536365 7 GLASS STAR FROSTED T-LIGHT HOLDER, SET 7 BABUSHKA NESTING BOXES, WHITE METAL LANTERN, RED WOOLLY HOTTIE WHITE HEART., KNITTED UNION FLAG HOT WATER BOTTLE, CREAM CUPID HEARTS COAT HANGER, WHITE HANGING HEART T-LIGHT HOLDER
536366 2 HAND WARMER RED POLKA DOT, HAND WARMER UNION JACK
536367 12 HOME BUILDING BLOCK WORD, LOVE BUILDING BLOCK WORD, RECIPE BOX WITH METAL HEART, IVORY KNITTED MUG COSY , BOX OF VINTAGE ALPHABET BLOCKS, BOX OF VINTAGE JIGSAW BLOCKS , POPPY'S PLAYHOUSE BEDROOM , POPPY'S PLAYHOUSE KITCHEN, FELTCRAFT PRINCESS CHARLOTTE DOLL, DOORMAT NEW ENGLAND, ASSORTED COLOUR BIRD ORNAMENT, BOX OF 6 ASSORTED COLOUR TEASPOONS

In this query, we select three columns, two of which are calculated in the aggregation. The COUNT is simple enough, the LISTAGG deserves some further explanation. The item description is in the aptly named DESCRIPTION field, which is the first argument to LISTAGG. The second argument is a comma followed by space, in single quotes, specifying that we want the descriptions separated by a space and a comma - which makes the text readable.

The WITHIN GROUP clause is not optional even if you couldn't care less about the order in which the descriptions appear. You can, of course, arrange the items alphabetically by ordering by the description.