Learn/cs fundamentals/Database Management
Intermediate~20 min read

Database Management

Relational model, SQL, normalization, transactions (ACID), indexing, and NoSQL overview.

SQLACIDNormalizationIndexing

The Relational Model

A relational database organises data into tables (relations) — rows are records, columns are attributes. Tables are linked by keys. This model, proposed by Edgar Codd in 1970, underpins PostgreSQL, MySQL, SQLite, and Oracle — still the most widely-used database family.

ConceptMeaning
Primary KeyUnique identifier for each row; cannot be NULL
Foreign KeyColumn referencing a primary key in another table; enforces referential integrity
Unique ConstraintColumn value must be unique across all rows (e.g., email)
NOT NULLColumn must always have a value
IndexData structure that speeds up reads on a column (at the cost of slower writes)

SQL — Core Syntax

SELECT and Filtering

-- Basic SELECT
SELECT id, name, email FROM users WHERE active = true;

-- Sorting and limiting
SELECT * FROM products ORDER BY price DESC LIMIT 10;

-- Aggregation
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;  -- HAVING filters groups (WHERE filters rows)

-- Subquery
SELECT name FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

JOINs

JOINs combine rows from two or more tables based on a related column. Understanding JOINs is the most tested SQL skill in interviews.

-- INNER JOIN — only rows that match in BOTH tables
SELECT o.id, u.name, o.total
FROM orders o
INNER JOIN users u ON u.id = o.user_id;

-- LEFT JOIN — all rows from left table, matching rows from right (NULL if no match)
SELECT u.name, o.id AS order_id
FROM users u
LEFT JOIN orders o ON o.user_id = u.id;  -- includes users with no orders (NULL order_id)

-- RIGHT JOIN — opposite of LEFT JOIN (prefer LEFT JOIN for clarity)

-- FULL OUTER JOIN — all rows from both tables (NULL where no match)

-- SELF JOIN — table joins with itself
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;

Window Functions

-- ROW_NUMBER, RANK, DENSE_RANK — rank rows within a group
SELECT name, salary, department,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

-- Running total
SELECT date, amount,
       SUM(amount) OVER (ORDER BY date) AS running_total
FROM transactions;

-- LAG / LEAD — access previous/next row value
SELECT date, revenue,
       LAG(revenue, 1) OVER (ORDER BY date) AS prev_day_revenue
FROM daily_sales;

Normalization

Normalization is the process of organising a database to reduce data redundancy and ensure data integrity. It works by applying a series of rules called Normal Forms (NF).

FormRuleViolation example
1NFAtomic values, no repeating groups, unique rowsStoring "Alice, Bob" in a single "members" column
2NF1NF + no partial dependencies (every non-key column depends on the whole PK)Order table stores customer_name — depends on customer_id, not order_id
3NF2NF + no transitive dependencies (no non-key column depends on another non-key column)Employee table stores department_name and department_manager — manager depends on dept, not employee
BCNF3NF + every determinant is a candidate keyRarely violated once 3NF is reached; needed for complex composite keys

Normalise vs denormalise

Normalisation reduces redundancy and prevents update anomalies — ideal for write-heavy OLTP systems. Denormalisation (joining data ahead of time) reduces JOIN overhead for read-heavy OLAP/analytics systems. Real-world systems often normalise the source of truth (relational DB) and denormalise for query optimisation (read replicas, data warehouses).

Transactions and ACID

A transaction is a group of SQL operations that execute as a single unit. Transactions are the mechanism that ensures data stays consistent even when multiple operations happen concurrently or when failures occur. ACID is the set of properties that guarantee this.

PropertyGuarantee
AtomicityAll operations succeed or all are rolled back — no partial updates
ConsistencyA transaction brings the DB from one valid state to another, respecting all constraints
IsolationConcurrent transactions don't interfere with each other (appear to execute serially)
DurabilityOnce committed, a transaction's effects persist even if the system crashes immediately after
-- Classic bank transfer: debit one account, credit another
BEGIN;
  UPDATE accounts SET balance = balance - 500 WHERE id = 1;  -- debit
  UPDATE accounts SET balance = balance + 500 WHERE id = 2;  -- credit
COMMIT;
-- If anything fails between BEGIN and COMMIT, ROLLBACK reverts both changes

Isolation Levels

LevelDirty ReadNon-repeatable ReadPhantom Read
Read UncommittedYesYesYes
Read CommittedNoYesYes
Repeatable ReadNoNoYes
SerializableNoNoNo

Indexing

An index is a data structure that allows the DB to find rows matching a condition without scanning the entire table. Without an index, a query like WHERE email = 'a@b.com' scans every row — O(n). With an index on email, it's O(log n).

Most databases use B-tree indexes by default — balanced tree, good for equality and range queries. Hash indexes are O(1) for equality but useless for ranges. Full-text indexes enable efficient text search.

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);  -- composite index

-- Index usage rules:
-- 1. Index on columns used in WHERE, JOIN, ORDER BY, GROUP BY
-- 2. Composite index: leftmost prefix rule — (a, b, c) helps WHERE a=? and WHERE a=? AND b=?
--    but NOT WHERE b=? alone
-- 3. SELECT * forces table scan even with index; select specific columns
-- 4. Low-cardinality columns (booleans) — not worth indexing; DB will skip it

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42;  -- inspect query plan

NoSQL Overview

NoSQL databases trade relational constraints for scalability, flexibility, or specific query patterns. There is no single NoSQL model — the term covers four distinct paradigms.

TypeModelExamplesBest for
DocumentJSON-like documentsMongoDB, FirestoreFlexible schemas, nested data
Key-ValueHash map at scaleRedis, DynamoDBCaching, sessions, leaderboards
Column-FamilyRows with dynamic columnsCassandra, HBaseWrite-heavy, time-series, wide tables
GraphNodes and edgesNeo4j, NeptuneSocial graphs, recommendation engines

Key Takeaways

  • Understand the four JOIN types — INNER, LEFT, RIGHT, FULL OUTER. LEFT JOIN is the most commonly used.
  • Normalise to 3NF for write-heavy systems; denormalise strategically for read performance.
  • ACID transactions prevent data corruption in concurrent systems. Know what each letter means.
  • Index on columns used in WHERE, JOIN, ORDER BY. Composite indexes follow the leftmost prefix rule.
  • NoSQL is not a replacement for SQL — pick based on data model and access pattern, not hype.