Query Optimization – Writing Efficient SQL
- Bulk Inserts
- INSERT … SELECT
- RETURNING clause
- Common Table Expressions (CTEs)
- Upserts
- Window functions
Bulk Inserts
Instead of inserting rows one by one, use a multi-row INSERT:
INSERT INTO products (id, name, price)
VALUES
(1, 'Keyboard', 49.99),
(2, 'Mouse', 19.99),
(3, 'Monitor', 199.99);
This is much faster because it requires only one statement: one network roundtrip, one parse/plan phase, and (with autocommit) one transaction instead of one per row.
For very large datasets, prefer COPY (direct load from CSV or STDIN).
COPY products (id, name, price)
FROM '/path/to/products.csv' DELIMITER ',' CSV HEADER;
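If the CSV file lives on the client machine rather than on the database server, psql's \copy meta-command performs the same load from the client side. A minimal sketch, assuming the same table and a local products.csv:
\copy products (id, name, price) FROM 'products.csv' DELIMITER ',' CSV HEADER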
INSERT … SELECT
Insert the result of a query directly into another table:
INSERT INTO archive_orders (id, created_at, amount)
SELECT id, created_at, amount
FROM orders
WHERE created_at < now() - interval '1 year';
This avoids fetching the rows into the application and re-inserting them: everything stays inside the database.
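To complete the archival you typically delete the copied rows in the same transaction, so the move is all-or-nothing. A sketch (the CTE section below shows a single-statement variant of the same move):
BEGIN;
-- Copy the old rows into the archive ...
INSERT INTO archive_orders (id, created_at, amount)
SELECT id, created_at, amount
FROM orders
WHERE created_at < now() - interval '1 year';
-- ... then remove them from the live table.
-- now() is fixed for the duration of the transaction,
-- so both statements use the same cutoff.
DELETE FROM orders
WHERE created_at < now() - interval '1 year';
COMMIT;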
RETURNING Clause
Get inserted/updated values immediately without an extra select:
INSERT INTO users (username, created_at)
VALUES ('alice', now())
RETURNING id;
Useful when you need the new primary key directly.
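RETURNING also works on UPDATE and DELETE. A sketch, assuming a hypothetical last_login column on users:
-- Update a row and read back the new values in one roundtrip.
UPDATE users
SET last_login = now()          -- last_login is a hypothetical column here
WHERE username = 'alice'
RETURNING id, last_login;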
Common Table Expressions (CTEs) — WITH Clause
Use CTEs to combine multiple steps into one statement:
WITH moved AS (
DELETE FROM queue
WHERE status = 'ready'
RETURNING id, payload
)
INSERT INTO processing (id, payload, started_at)
SELECT id, payload, now() FROM moved;
This combines a delete and an insert into a single atomic statement: the rows are moved from queue to processing in one step.
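CTEs are just as useful for read-only queries: they let you name an intermediate result and keep a multi-step query readable. A sketch against the orders table used later in this section (the threshold is an arbitrary example value):
-- Name the intermediate result once, then aggregate over it.
WITH big_orders AS (
    SELECT customer_id, amount
    FROM orders
    WHERE amount > 100
)
SELECT customer_id, COUNT(*) AS big_order_count
FROM big_orders
GROUP BY customer_id;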
Optimizing Inserts and Updates with Upserts
Context
We want to store page view counts per URL per hour.
CREATE TABLE page_views (
url TEXT NOT NULL,
time_bucket TIMESTAMP NOT NULL,
views INT NOT NULL,
PRIMARY KEY (url, time_bucket)
);
A web application reports one “page view” at a time. For each (url, time_bucket) combination, we need to increase the counter.
Part 1 — Naïve approach (without upsert)
Step 1. Insert some sample data:
INSERT INTO page_views (url, time_bucket, views)
VALUES ('/home', '2025-09-04 10:00', 5);
Step 2. Now add a new page view for the same (url, time_bucket).
The naïve approach requires multiple steps:
-- Step 2a: Check if record exists
SELECT views FROM page_views
WHERE url = '/home' AND time_bucket = '2025-09-04 10:00';
-- Step 2b: If record exists → update
UPDATE page_views
SET views = views + 1
WHERE url = '/home' AND time_bucket = '2025-09-04 10:00';
-- Step 2c: If record does not exist → insert
INSERT INTO page_views (url, time_bucket, views)
VALUES ('/home', '2025-09-04 10:00', 1);
👉 Problems:
- Multiple queries per insert.
- Risk of race conditions if many clients update at the same time.
- Higher latency.
Part 2 — Upsert (PostgreSQL)
Postgres provides a single statement that does both insert and update:
INSERT INTO page_views (url, time_bucket, views)
VALUES ('/home', '2025-09-04 10:00', 1)
ON CONFLICT (url, time_bucket)
DO UPDATE SET views = page_views.views + EXCLUDED.views;
- If the record does not exist → a new row is inserted.
- If the record exists → it is updated (`page_views.views + EXCLUDED.views`). Here `page_views.views` refers to the current value of `views`, and `EXCLUDED.views` to the value you tried to insert.
👉 Advantages:
- One query instead of multiple.
- Atomic and concurrency-safe.
- Fewer roundtrips → better performance.
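When you only want to ignore duplicates rather than update them, PostgreSQL also offers ON CONFLICT DO NOTHING:
-- Insert the row if it is new; silently skip it if the key already exists.
INSERT INTO page_views (url, time_bucket, views)
VALUES ('/home', '2025-09-04 10:00', 1)
ON CONFLICT (url, time_bucket) DO NOTHING;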
A similar syntax also exists for MySQL and MariaDB.
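For comparison, a sketch of the MySQL/MariaDB equivalent, which uses ON DUPLICATE KEY UPDATE instead of ON CONFLICT (this is the classic VALUES() form; MySQL 8.0.20+ also offers a newer row-alias syntax):
-- MySQL / MariaDB: VALUES(views) refers to the value from the VALUES list.
INSERT INTO page_views (url, time_bucket, views)
VALUES ('/home', '2025-09-04 10:00', 1)
ON DUPLICATE KEY UPDATE views = views + VALUES(views);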
- PostgreSQL: see the PostgreSQL `INSERT` documentation for the full `ON CONFLICT` syntax.
MERGE (PostgreSQL 15+)
MERGE is the SQL-standard alternative to upsert, and is more flexible for complex synchronization between two tables:
MERGE INTO inventory AS i
USING new_inventory AS n
ON i.sku = n.sku
WHEN MATCHED THEN
UPDATE SET quantity = i.quantity + n.quantity
WHEN NOT MATCHED THEN
INSERT (sku, quantity) VALUES (n.sku, n.quantity);
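MERGE can also express cases a plain upsert cannot, such as deleting matched rows. A sketch that removes items whose incoming quantity is zero, assuming the same inventory/new_inventory tables (WHEN clauses are evaluated in order, so the DELETE branch is checked first):
MERGE INTO inventory AS i
USING new_inventory AS n
ON i.sku = n.sku
WHEN MATCHED AND n.quantity = 0 THEN
    DELETE                           -- drop items that are sold out
WHEN MATCHED THEN
    UPDATE SET quantity = i.quantity + n.quantity
WHEN NOT MATCHED THEN
    INSERT (sku, quantity) VALUES (n.sku, n.quantity);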
Questions
🧠 Why is the upsert approach more efficient?
Click to reveal the answer
Without an upsert (`INSERT ... ON CONFLICT DO UPDATE`), applications often have to:
1. First run a `SELECT` to check if the row exists.
2. Then run either an `INSERT` or an `UPDATE`.
That means **two queries and two network roundtrips**. With an upsert, the database handles both cases in **one atomic query**, which reduces latency and load.
🧠 How does it help prevent race conditions?
Click to reveal the answer
If two clients try to insert the same key at the same time:
- With manual `SELECT` + `INSERT`, you may get a conflict (duplicate key error).
- With an upsert, PostgreSQL detects the conflict and applies the `DO UPDATE` logic automatically.
This ensures consistency **without requiring extra application-side locking**.
🧠 Can you think of situations where you should not use upserts?
Click to reveal the answer
Upserts are very convenient, but there are a few cases where they may not be ideal:
- **High write contention:** if many transactions frequently update the same row, the conflict checks can become a bottleneck.
- **Complex update logic:** sometimes you need more control than a simple `DO UPDATE` allows.
- **Auditing requirements:** when you need to distinguish clearly between an `INSERT` and an `UPDATE` for logging or compliance reasons.
In those cases, a more explicit application workflow may be better.
Window Functions
Window functions let you perform aggregate-like calculations across sets of rows, but without collapsing the result into a single row per group. Each row keeps its identity, while still carrying the aggregated information.
Example 1 — Aggregation with GROUP BY
SELECT customer_id, SUM(amount) AS total_per_customer
FROM orders
GROUP BY customer_id;
This query collapses rows: you only get one row per customer.
Example 2 — Window function
SELECT id, customer_id, amount,
SUM(amount) OVER (PARTITION BY customer_id) AS total_per_customer
FROM orders;
Here each row keeps its own id and amount, but also shows the total per customer.
- `PARTITION BY customer_id` → defines the “window” (the set of rows per customer).
- You can also use `ORDER BY` inside the window to compute running totals, ranks, or moving averages (see the ranking sketch below).
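For example, a sketch that ranks each customer’s orders by amount, using the same orders table:
-- RANK() numbers the rows inside each customer's window, largest amount first.
SELECT id, customer_id, amount,
       RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS amount_rank
FROM orders;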
Visual: GROUP BY vs Window Function
flowchart TB
subgraph Input["Orders table"]
O1["id=1, customer=Alice, amount=50"]
O2["id=2, customer=Alice, amount=30"]
O3["id=3, customer=Bob, amount=70"]
end
subgraph GroupBy["GROUP BY customer_id"]
GA["Alice, total=80"]
GB["Bob, total=70"]
end
subgraph Window["SUM(amount) OVER (PARTITION BY customer_id)"]
WA1["id=1, Alice, amount=50, total=80"]
WA2["id=2, Alice, amount=30, total=80"]
WB1["id=3, Bob, amount=70, total=70"]
end
Input --> GroupBy
Input --> Window
Why it helps
- Avoids writing multiple subqueries.
- Often runs in a single table scan, instead of one scan per subquery.
- Very powerful for analytics (rankings, percentiles, cumulative sums).
Example 3 — Running total
SELECT id, order_date, amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
This computes a cumulative sum ordered by date, directly in one query.
Example 4 — Running total per customer
SELECT id, customer_id, order_date, amount,
SUM(amount) OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS running_total
FROM orders;
This computes a cumulative sum per customer ordered by date.
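A moving average uses the same pattern with an explicit frame. A sketch of a three-row moving average over order dates:
-- Average of the current row and the two preceding rows, in date order.
SELECT id, order_date, amount,
       AVG(amount) OVER (
           ORDER BY order_date
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
       ) AS moving_avg_3
FROM orders;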
👉 In short: window functions combine the detail view of each row with the power of aggregates, in one efficient query.