Resources for further learning
For more in depth information about PostgresSql and SQL, the following resources could help.
- PostgreSQL installation
- Aggregrate functions in SQL
- SELECT in SQL
- Matching functions in SQL
- INSERTing data in SQL
- How to UPDATE data in SQL
- How to DELETE data in SQL
- SQL overview in PostgreSQL
- PSQL commands for PostgreSQL
Linear Interpolation
What is Linear Interpolation?
Linear interpolation estimates a value between two known points by assuming a straight-line relationship. Formula:
Example Dataset:
10, 11, 12, 40
The position is as follows: |Position|Value| |——–|—–| |0|10| |1|11| |2|12| |3|40|
We want the 50th percentile (median) using PERCENTILE_CONT(0.5).
Number of elements: 4
Position of desired percentile:
(4 - 1) * 0.5 = 1.5
Calculation of interpolated value:
11 + ( ((1.5 − 1) * (12 − 11)) / (2 - 1) ) =>
11 + ( (0.5 * 1) / 1) =>
11 + 0.5 = 11.5
Variance
What is Variance?
Variance is a statistical measure that shows how spread out the numbers in a dataset are. If all numbers are close to each other, the variance is low. If all numbers are very different from each other, the variance is high.
Formula:
Example 1
Dataset:
10, 11, 12, 40
Calculation of the variance:
- Calculate the average of the dataset:
(10 + 11 + 12 + 40) / 4 = 18,25 - Apply the variance formula:
( (10-18,25)² + (11-18,25)² + (12-18,25)² + (40-18,25)² ) / 4 => (68,0625 + 52,5625 + 39,0625 + 473,0625) / 4 => 632,75 / 4 = 158,1875
Example 2
Dataset:
10, 11, 12, 14
Calculation of the variance:
- Calculate the average of the dataset:
(10 + 11 + 12 + 14) / 4 = 11,75 - Apply the variance formula:
( (10-11,75)² + (11-11,75)² + (12-11,75)² + (14-11,75)² ) / 4 => (3,0625 + 0,5625 + 0,0625 + 5,0625) / 4 => 8,75 / 4 = 2,1875Standard deviation
What is Standard deviation?
Standard deviation is a statistical measure that shows how much the numbers in a dataset deviate from the average.
Formula:
Example 1
Dataset:
10, 11, 12, 40
- Calculate the variance
( (10-18,25)² + (11-18,25)² + (12-18,25)² + (40-18,25)² ) / 4 = 158,1875
- Take the square root of the variance
√158,1875 => 12,577
Example 2
Dataset:
10, 11, 12, 14
- Calculate the variance
( (10-11,75)² + (11-11,75)² + (12-11,75)² + (14-11,75)² ) / 4 = 2,1875
- Take the square root of the variance
√2,1875 => 1,479
Calculate with SQL in PostgreSQL
Try this SQL statement to see how PostgreSQL performs these calculations automatically. The dataset used here is the same as in Example 1.
WITH data(value) AS (
VALUES
(10),
(11),
(12),
(40)
)
SELECT
COUNT(*) AS n,
AVG(value) AS average,
VAR_POP(value) AS variance,
STDDEV_POP(value) AS standard_deviation,
PERCENTILE_CONT(0.5)
WITHIN GROUP (ORDER BY value) AS median
FROM data;
The WITH clause defines a Common Table Expression (CTE), which creates a temporary result set that can be used like a table within this query. CTEs are explained in more detail in a later workshop.
The WITHIN GROUP clause defines the order in which values are processed by the percentile calculation. It defines a preprocessing step that orders the input values before the PERCENTILE_CONT calculation is performed.