The Relational Model
A relational database organises data into tables (relations) — rows are records, columns are attributes. Tables are linked by keys. This model, proposed by Edgar Codd in 1970, underpins PostgreSQL, MySQL, SQLite, and Oracle — still the most widely-used database family.
| Concept | Meaning |
|---|---|
| Primary Key | Unique identifier for each row; cannot be NULL |
| Foreign Key | Column referencing a primary key in another table; enforces referential integrity |
| Unique Constraint | Column value must be unique across all rows (e.g., email) |
| NOT NULL | Column must always have a value |
| Index | Data structure that speeds up reads on a column (at the cost of slower writes) |
SQL — Core Syntax
SELECT and Filtering
-- Basic SELECT
SELECT id, name, email FROM users WHERE active = true;
-- Sorting and limiting
SELECT * FROM products ORDER BY price DESC LIMIT 10;
-- Aggregation
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000; -- HAVING filters groups (WHERE filters rows)
-- Subquery
SELECT name FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);
JOINs
JOINs combine rows from two or more tables based on a related column. Understanding JOINs is the most tested SQL skill in interviews.
-- INNER JOIN — only rows that match in BOTH tables
SELECT o.id, u.name, o.total
FROM orders o
INNER JOIN users u ON u.id = o.user_id;
-- LEFT JOIN — all rows from left table, matching rows from right (NULL if no match)
SELECT u.name, o.id AS order_id
FROM users u
LEFT JOIN orders o ON o.user_id = u.id; -- includes users with no orders (NULL order_id)
-- RIGHT JOIN — opposite of LEFT JOIN (prefer LEFT JOIN for clarity)
-- FULL OUTER JOIN — all rows from both tables (NULL where no match)
-- SELF JOIN — table joins with itself
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
Window Functions
-- ROW_NUMBER, RANK, DENSE_RANK — rank rows within a group
SELECT name, salary, department,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;
-- Running total
SELECT date, amount,
SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;
-- LAG / LEAD — access previous/next row value
SELECT date, revenue,
LAG(revenue, 1) OVER (ORDER BY date) AS prev_day_revenue
FROM daily_sales;
Normalization
Normalization is the process of organising a database to reduce data redundancy and ensure data integrity. It works by applying a series of rules called Normal Forms (NF).
| Form | Rule | Violation example |
|---|---|---|
| 1NF | Atomic values, no repeating groups, unique rows | Storing "Alice, Bob" in a single "members" column |
| 2NF | 1NF + no partial dependencies (every non-key column depends on the whole PK) | Order table stores customer_name — depends on customer_id, not order_id |
| 3NF | 2NF + no transitive dependencies (no non-key column depends on another non-key column) | Employee table stores department_name and department_manager — manager depends on dept, not employee |
| BCNF | 3NF + every determinant is a candidate key | Rarely violated once 3NF is reached; needed for complex composite keys |
Normalise vs denormalise
Normalisation reduces redundancy and prevents update anomalies — ideal for write-heavy OLTP systems. Denormalisation (joining data ahead of time) reduces JOIN overhead for read-heavy OLAP/analytics systems. Real-world systems often normalise the source of truth (relational DB) and denormalise for query optimisation (read replicas, data warehouses).
Transactions and ACID
A transaction is a group of SQL operations that execute as a single unit. Transactions are the mechanism that ensures data stays consistent even when multiple operations happen concurrently or when failures occur. ACID is the set of properties that guarantee this.
| Property | Guarantee |
|---|---|
| Atomicity | All operations succeed or all are rolled back — no partial updates |
| Consistency | A transaction brings the DB from one valid state to another, respecting all constraints |
| Isolation | Concurrent transactions don't interfere with each other (appear to execute serially) |
| Durability | Once committed, a transaction's effects persist even if the system crashes immediately after |
-- Classic bank transfer: debit one account, credit another
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE id = 1; -- debit
UPDATE accounts SET balance = balance + 500 WHERE id = 2; -- credit
COMMIT;
-- If anything fails between BEGIN and COMMIT, ROLLBACK reverts both changes
Isolation Levels
| Level | Dirty Read | Non-repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Yes | Yes | Yes |
| Read Committed | No | Yes | Yes |
| Repeatable Read | No | No | Yes |
| Serializable | No | No | No |
Indexing
An index is a data structure that allows the DB to find rows matching a condition without scanning the entire table. Without an index, a query like WHERE email = 'a@b.com' scans every row — O(n). With an index on email, it's O(log n).
Most databases use B-tree indexes by default — balanced tree, good for equality and range queries. Hash indexes are O(1) for equality but useless for ranges. Full-text indexes enable efficient text search.
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at); -- composite index
-- Index usage rules:
-- 1. Index on columns used in WHERE, JOIN, ORDER BY, GROUP BY
-- 2. Composite index: leftmost prefix rule — (a, b, c) helps WHERE a=? and WHERE a=? AND b=?
-- but NOT WHERE b=? alone
-- 3. SELECT * forces table scan even with index; select specific columns
-- 4. Low-cardinality columns (booleans) — not worth indexing; DB will skip it
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42; -- inspect query plan
NoSQL Overview
NoSQL databases trade relational constraints for scalability, flexibility, or specific query patterns. There is no single NoSQL model — the term covers four distinct paradigms.
| Type | Model | Examples | Best for |
|---|---|---|---|
| Document | JSON-like documents | MongoDB, Firestore | Flexible schemas, nested data |
| Key-Value | Hash map at scale | Redis, DynamoDB | Caching, sessions, leaderboards |
| Column-Family | Rows with dynamic columns | Cassandra, HBase | Write-heavy, time-series, wide tables |
| Graph | Nodes and edges | Neo4j, Neptune | Social graphs, recommendation engines |
Key Takeaways
- Understand the four JOIN types — INNER, LEFT, RIGHT, FULL OUTER. LEFT JOIN is the most commonly used.
- Normalise to 3NF for write-heavy systems; denormalise strategically for read performance.
- ACID transactions prevent data corruption in concurrent systems. Know what each letter means.
- Index on columns used in WHERE, JOIN, ORDER BY. Composite indexes follow the leftmost prefix rule.
- NoSQL is not a replacement for SQL — pick based on data model and access pattern, not hype.