Wednesday, December 26, 2012

Rank Functions

0 comments
RANK(), DENSE_RANK() and ROW_NUMBER() are three important reporting functions in Postgresql. These are also used in Oracle.

RANK
Retuns unique serial number for each row, except for rows that have duplicate values, in which case the same ranking is assigned
and a gap appears in the sequence for each duplicate ranking.

DENSE_RANK
Retunrs unique serial number for each row, except for rows that have duplicate values, in which case the same ranking is assigned.

ROW_NUMBER
Returns unique serial number for each row. For rows that have duplicate values, numbers are randomly assigned.

Let we have a table as follows:-

SELECT * FROM t1;

 id  | points 
----+----
  1 | 19
  2 | 86
  3 | 24
  4 |  5
  5 | 56
  6 | 48
  7 | 55
  8 | 74
  9 | 69
 10 | 34
 11 | 96
 12 | 32
 13 | 35
 14 | 92
 15 | 18
 16 | 85
 17 | 25
 18 | 88
 19 | 59
 20 | 78
(20 rows)

There is no duplicate value in "points" column.
So using the following query, we get same result for rank(), dense_rank() and row_number().

SELECT *,
RANK() OVER( ORDER BY points DESC ) rnk,
DENSE_RANK() OVER(ORDER BY points DESC ) drank,
ROW_NUMBER() OVER( ORDER BY points DESC ) rnum
FROM t1;

  id  | points  | rnk | drank | rnum
----+----+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 20 | 78 |   6 |     6 |    6
  8 | 74 |   7 |     7 |    7
  9 | 69 |   8 |     8 |    8
 19 | 59 |   9 |     9 |    9
  5 | 56 |  10 |    10 |   10
  7 | 55 |  11 |    11 |   11
  6 | 48 |  12 |    12 |   12
 13 | 35 |  13 |    13 |   13
 10 | 34 |  14 |    14 |   14
 12 | 32 |  15 |    15 |   15
 17 | 25 |  16 |    16 |   16
  3 | 24 |  17 |    17 |   17
  1 | 19 |  18 |    18 |   18
 15 | 18 |  19 |    19 |   19
  4 |  5 |  20 |    20 |   20

Now, removing a random row from the table and execute the query again to see if the result is changed:

BEGIN;
DELETE from t1 where id=8;

SELECT *,
RANK() OVER( ORDER BY points DESC ) rnk,
DENSE_RANK() OVER(ORDER BY points DESC ) drank,
ROW_NUMBER() OVER( ORDER BY points DESC ) rnum
FROM t1;

 id  | points  | rnk | drank | rnum
----+----+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 20 | 78 |   6 |     6 |    6
  9 | 69 |   7 |     7 |    7
 19 | 59 |   8 |     8 |    8
  5 | 56 |   9 |     9 |    9
  7 | 55 |  10 |    10 |   10
  6 | 48 |  11 |    11 |   11
 13 | 35 |  12 |    12 |   12
 10 | 34 |  13 |    13 |   13
 12 | 32 |  14 |    14 |   14
 17 | 25 |  15 |    15 |   15
  3 | 24 |  16 |    16 |   16
  1 | 19 |  17 |    17 |   17
 15 | 18 |  18 |    18 |   18
  4 |  5 |  19 |    19 |   19
(19 rows)

ROLLBACK;

Again, there is no difference among the results of rank(), dense_rank() and row_number().

Now, I interted a new rocord in the table to make a duplicate value in "points" column.

INSERT INTO t1 VALUES (21, 78);
Now we have two records with 78 points. Using the query we got result :-

 id  | points  | rnk | drank | rnum
----+----+-----+-------+------
 11 | 96 |   1 |     1 |    1
 14 | 92 |   2 |     2 |    2
 18 | 88 |   3 |     3 |    3
  2 | 86 |   4 |     4 |    4
 16 | 85 |   5 |     5 |    5
 21 | 78 |   6 |     6 |    6
 20 | 78 |   6 |     6 |    7
  8 | 74 |   8 |     7 |    8
  9 | 69 |   9 |     8 |    9
 19 | 59 |  10 |     9 |   10
  5 | 56 |  11 |    10 |   11
  7 | 55 |  12 |    11 |   12
  6 | 48 |  13 |    12 |   13
 13 | 35 |  14 |    13 |   14
 10 | 34 |  15 |    14 |   15
 12 | 32 |  16 |    15 |   16
 17 | 25 |  17 |    16 |   17
  3 | 24 |  18 |    17 |   18
  1 | 19 |  19 |    18 |   19
 15 | 18 |  20 |    19 |   20
  4 |  5 |  21 |    20 |   21
(21 rows)

The difference is obvious.
For rank() function in "rnk" column, we get rank 6 for both ID 21 and 20 and the next rank is 8 and the last rank is 21. Here rank 7 is not used as rank 6 gets two records.
For dense_rank() function in "drank" column, we get rank 6 for both ID 21 and 20 and the next rank is 7 and the last rank is 20. Here no gap is created.
For row_number() function in "rnum" column, we get two different values for the same "points" as it only gives the row number instead of ranking.

Leave a Reply

Labels