Database Management · contentintech

The Relational Model

A relational database organises data into tables (relations) — rows are records, columns are attributes. Tables are linked by keys. This model, proposed by Edgar Codd in 1970, underpins PostgreSQL, MySQL, SQLite, and Oracle — still the most widely-used database family.

Concept	Meaning
Primary Key	Unique identifier for each row; cannot be NULL
Foreign Key	Column referencing a primary key in another table; enforces referential integrity
Unique Constraint	Column value must be unique across all rows (e.g., email)
NOT NULL	Column must always have a value
Index	Data structure that speeds up reads on a column (at the cost of slower writes)

SQL — Core Syntax

SELECT and Filtering

-- Basic SELECT
SELECT id, name, email FROM users WHERE active = true;

-- Sorting and limiting
SELECT * FROM products ORDER BY price DESC LIMIT 10;

-- Aggregation
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;  -- HAVING filters groups (WHERE filters rows)

-- Subquery
SELECT name FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

JOINs

JOINs combine rows from two or more tables based on a related column. Understanding JOINs is the most tested SQL skill in interviews.

-- INNER JOIN — only rows that match in BOTH tables
SELECT o.id, u.name, o.total
FROM orders o
INNER JOIN users u ON u.id = o.user_id;

-- LEFT JOIN — all rows from left table, matching rows from right (NULL if no match)
SELECT u.name, o.id AS order_id
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;  -- includes users with no orders (NULL order_id)

-- RIGHT JOIN — opposite of LEFT JOIN (prefer LEFT JOIN for clarity)

-- FULL OUTER JOIN — all rows from both tables (NULL where no match)

-- SELF JOIN — table joins with itself
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;

Window Functions

-- ROW_NUMBER, RANK, DENSE_RANK — rank rows within a group
SELECT name, salary, department,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Running total
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;

-- LAG / LEAD — access previous/next row value
SELECT date, revenue,
       LAG(revenue, 1) OVER (ORDER BY date) AS prev_day_revenue
FROM daily_sales;

Normalization

Normalization is the process of organising a database to reduce data redundancy and ensure data integrity. It works by applying a series of rules called Normal Forms (NF).

Form	Rule	Violation example
1NF	Atomic values, no repeating groups, unique rows	Storing "Alice, Bob" in a single "members" column
2NF	1NF + no partial dependencies (every non-key column depends on the whole PK)	Order table stores customer_name — depends on customer_id, not order_id
3NF	2NF + no transitive dependencies (no non-key column depends on another non-key column)	Employee table stores department_name and department_manager — manager depends on dept, not employee
BCNF	3NF + every determinant is a candidate key	Rarely violated once 3NF is reached; needed for complex composite keys

Normalise vs denormalise

Normalisation reduces redundancy and prevents update anomalies — ideal for write-heavy OLTP systems. Denormalisation (joining data ahead of time) reduces JOIN overhead for read-heavy OLAP/analytics systems. Real-world systems often normalise the source of truth (relational DB) and denormalise for query optimisation (read replicas, data warehouses).

Transactions and ACID

A transaction is a group of SQL operations that execute as a single unit. Transactions are the mechanism that ensures data stays consistent even when multiple operations happen concurrently or when failures occur. ACID is the set of properties that guarantee this.

Property	Guarantee
Atomicity	All operations succeed or all are rolled back — no partial updates
Consistency	A transaction brings the DB from one valid state to another, respecting all constraints
Isolation	Concurrent transactions don't interfere with each other (appear to execute serially)
Durability	Once committed, a transaction's effects persist even if the system crashes immediately after

-- Classic bank transfer: debit one account, credit another
BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 1;  -- debit
  UPDATE accounts SET balance = balance + 500 WHERE id = 2;  -- credit
COMMIT;
-- If anything fails between BEGIN and COMMIT, ROLLBACK reverts both changes

Isolation Levels

Level	Dirty Read	Non-repeatable Read	Phantom Read
Read Uncommitted	Yes	Yes	Yes
Read Committed	No	Yes	Yes
Repeatable Read	No	No	Yes
Serializable	No	No	No

Indexing

An index is a data structure that allows the DB to find rows matching a condition without scanning the entire table. Without an index, a query like WHERE email = 'a@b.com' scans every row — O(n). With an index on email, it's O(log n).

Most databases use B-tree indexes by default — balanced tree, good for equality and range queries. Hash indexes are O(1) for equality but useless for ranges. Full-text indexes enable efficient text search.

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);  -- composite index

-- Index usage rules:
-- 1. Index on columns used in WHERE, JOIN, ORDER BY, GROUP BY
-- 2. Composite index: leftmost prefix rule — (a, b, c) helps WHERE a=? and WHERE a=? AND b=?
--    but NOT WHERE b=? alone
-- 3. SELECT * forces table scan even with index; select specific columns
-- 4. Low-cardinality columns (booleans) — not worth indexing; DB will skip it

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;  -- inspect query plan

NoSQL Overview

NoSQL databases trade relational constraints for scalability, flexibility, or specific query patterns. There is no single NoSQL model — the term covers four distinct paradigms.

Type	Model	Examples	Best for
Document	JSON-like documents	MongoDB, Firestore	Flexible schemas, nested data
Key-Value	Hash map at scale	Redis, DynamoDB	Caching, sessions, leaderboards
Column-Family	Rows with dynamic columns	Cassandra, HBase	Write-heavy, time-series, wide tables
Graph	Nodes and edges	Neo4j, Neptune	Social graphs, recommendation engines

Key Takeaways

Understand the four JOIN types — INNER, LEFT, RIGHT, FULL OUTER. LEFT JOIN is the most commonly used.
Normalise to 3NF for write-heavy systems; denormalise strategically for read performance.
ACID transactions prevent data corruption in concurrent systems. Know what each letter means.
Index on columns used in WHERE, JOIN, ORDER BY. Composite indexes follow the leftmost prefix rule.
NoSQL is not a replacement for SQL — pick based on data model and access pattern, not hype.