Reference
Clear definitions and explanations for indexing, databases, distributed systems, and backend concepts — with use cases and trade-offs.
69 terms
ACID is an acronym for database transaction properties: Atomicity, Consistency, Isolation, Durability.
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.
Backpressure is a technique to prevent system overload by signaling to upstream components to slow down when downstream is overwhelmed.
BASE is an acronym contrasting ACID for distributed systems: Basically Available, Soft state, Eventually consistent. It describes a model where availability is prioritized over immediate consistency.
A Bloom filter is a space-efficient probabilistic data structure to test whether an element is in a set. It can tell “possibly in set” or “definitely not in set”.
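As a minimal sketch of the idea (the `BloomFilter` class and its parameters are illustrative, not from any particular library): k hash functions set k bits per item, so a lookup with any unset bit means “definitely not in set”, while all bits set means only “possibly in set”.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k bit positions from seeded MD5 digests of the item.
        for seed in range(self.k):
            digest = hashlib.md5(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("users:42")
bf.add("users:7")
```

Added items always test positive; the trade-off is a tunable false-positive rate (a function of m, k, and the number of items), and no deletions.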
Caching stores frequently accessed data in a faster storage layer (often memory) to speed up retrieval.
The CAP theorem (Brewer’s theorem) states that in the presence of a network partition, a distributed data store can only guarantee two of the following three: consistency, availability, and partition tolerance.
Change Data Capture (CDC) is the process of tracking and capturing changes in a database and delivering change events (inserts/updates/deletes) to downstream systems.
Checkpointing is the process of flushing all committed changes from the write-ahead log (WAL) to the main database storage, and then truncating the log up to that point.
A circuit breaker is a pattern where a system stops sending requests to a failing service for a period, allowing it to recover and preventing resource exhaustion.
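A minimal state-machine sketch of the pattern (class and threshold values are illustrative): the breaker trips open after consecutive failures, rejects calls while open, and lets a probe through after a cooldown.

```python
import time

class CircuitBreaker:
    """Sketch: opens after `threshold` consecutive failures;
    allows a probe request again after `reset_after` seconds."""
    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold, self.reset_after, self.clock = threshold, reset_after, clock
        self.failures = 0
        self.opened_at = None  # None => circuit closed (healthy)

    def allow_request(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_after:
            return True  # half-open: let one probe through
        return False     # open: fail fast without calling the service

    def record_success(self):
        self.failures, self.opened_at = 0, None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()

cb = CircuitBreaker(threshold=2, reset_after=30.0)
cb.record_failure()
cb.record_failure()   # second consecutive failure trips the breaker
```

While open, callers fail fast instead of tying up threads and sockets on a service that cannot respond.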
In databases, clustering can refer to either grouping similar data together or deploying multiple nodes for scalability/HA. A database cluster is a set of database servers (nodes) that work together as a single system.
Columnar storage stores data column-by-column (all values of one column together). Row-based storage stores data row-by-row (complete records stored together).
Compaction is the process of merging and recycling data files to reclaim space and improve read performance, especially in Log-Structured Merge (LSM) tree storage engines.
A composite index (multi-column index) indexes multiple columns together in one index. It creates a single B-tree keyed by the tuple of column values.
Connection pooling maintains a pool of open database connections that can be reused, reducing the overhead of opening/closing connections.
Consensus refers to the process whereby a group of distributed nodes agree on a single value or decision. It is critical in systems that need consistency despite node failures.
Consistent hashing is a technique to distribute keys across nodes so that only a minimal number of keys must move when nodes join or leave.
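A sketch of a consistent-hash ring (the `HashRing` class and vnode count are illustrative): nodes are hashed to many points on a ring, a key belongs to the first node point at or after its hash, and adding a node only reassigns the keys that now land on the new node.

```python
import bisect
import hashlib

def _hash(key):
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother balance."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        points = [p for p, _ in self.ring]
        idx = bisect.bisect(points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["db1", "db2", "db3"])
before = {k: ring.node_for(k) for k in (f"user:{i}" for i in range(1000))}
ring.add("db4")  # one node joins the cluster
after = {k: ring.node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
```

With naive `hash(key) % N` partitioning, changing N from 3 to 4 would remap roughly three quarters of all keys; here only the keys claimed by the new node move, and they all move to it.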
A covering index is a (secondary or composite) index that includes all columns needed for a query. The index “covers” the query, so the database can answer it from the index alone, without reading the table rows.
A data lake is a centralized repository that stores large volumes of raw data (structured, semi-structured, unstructured) in its native format, usually in a low-cost object store or file system.
Data locality is the principle of placing computation (queries, processing) close to where the data resides, rather than moving large data sets across the network.
Data migration involves moving data between systems or formats (e.g., during an upgrade or platform change).
Data skew is an uneven distribution of data (or workload) across partitions or nodes. Some partitions get more data/traffic than others.
A data warehouse is a centralized repository that stores curated, cleaned, and structured data optimized for queries and reporting, typically in a columnar format.
A deadlock occurs when two or more transactions block each other indefinitely, each holding a resource the other needs.
Denormalization is the process of intentionally adding redundancy to a database schema to improve read/query performance. In a denormalized design, data is duplicated across tables to avoid expensive joins.
A dirty read is when a transaction reads data written by another uncommitted transaction. If that other transaction rolls back, the reading transaction has read data that never logically existed.
Eventual consistency is a weak consistency model where, if no new updates are made, all replicas of data will eventually converge to the same value.
Exactly-once processing guarantees that an operation (like processing a message) is performed only once, despite retries and failures.
Failover is the automatic or manual switching of operations from a failed primary system to a secondary (standby) system.
Federation refers to a setup where multiple autonomous databases or services coordinate to respond to queries, often by delegating parts of a query to the system that owns the relevant data.
Geo-replication is replicating data across geographically distributed data centers.
Hash partitioning assigns data to shards by hashing a key and using the hash result to pick a partition.
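In its simplest form the shard choice is just a stable hash modulo the partition count; a short sketch (the function name is illustrative):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Hash partitioning sketch: a stable hash of the key picks the shard.
    A stable hash is used deliberately -- Python's built-in hash() is
    randomized per process, which would break routing across restarts."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Every writer and reader computes the same shard for the same key.
shard = partition_for("user:42", 8)
```

This spreads keys evenly but sacrifices range queries (adjacent keys land on different shards), and changing `num_partitions` remaps most keys unless consistent hashing is used instead.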
High availability means a system is continuously operational and accessible with minimal downtime (often 99.99% uptime or better).
Horizontal scaling (scale-out) means adding more machines (nodes) to a system to handle increased load.
A hot partition (or hot shard/key) is a data partition that receives disproportionately high traffic, causing resource saturation.
Idempotency means an operation can be applied multiple times without changing the result beyond the initial application. In distributed systems, it ensures that retries do not cause duplicate effects.
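A common way to get idempotency for non-idempotent operations is an idempotency key; a sketch under assumed names (`PaymentProcessor`, `charge` are hypothetical):

```python
class PaymentProcessor:
    """Sketch: an idempotency key makes retried requests safe to replay."""
    def __init__(self):
        self.balance = 0
        self.seen = {}  # idempotency_key -> result of the first execution

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.seen:
            # Duplicate or retry: return the stored result, do not re-apply.
            return self.seen[idempotency_key]
        self.balance += amount
        self.seen[idempotency_key] = self.balance
        return self.balance

p = PaymentProcessor()
p.charge("req-1", 100)
p.charge("req-1", 100)   # network retry of the same request: no double charge
```

The client generates the key once per logical operation and reuses it on every retry; the server deduplicates by key.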
A database index is a data structure that improves the speed of data retrieval on a table at the cost of additional writes and storage. It works like a book’s index: look up a key and jump straight to its location instead of scanning every page.
Leader–follower (also called master–slave) replication is a strategy where one node (the leader) receives all write operations, and one or more follower nodes replicate from it, typically serving reads.
Load balancing is distributing incoming requests or tasks across multiple servers/resources to optimize performance and avoid overload.
Lock escalation is the process of converting many fine-grained locks (like row or page locks) into a coarser lock (like a table lock) to reduce lock management overhead.
Logical replication replicates database changes at a logical level (e.g., SQL statements or row change events). Physical replication copies the exact data files or disk blocks byte-for-byte.
A Log-Structured Merge-tree (LSM tree) is a data structure optimized for high write volumes. It buffers writes in memory and periodically merges them into sorted files on disk.
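A toy sketch of the write path (the `TinyLSM` class is illustrative and omits the WAL, tombstones, and compaction that real engines need): writes land in a memtable, full memtables are flushed as immutable sorted runs, and reads check newest data first.

```python
import bisect

class TinyLSM:
    """LSM-tree sketch: memtable in memory, immutable sorted runs on 'disk'."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []   # list of sorted [(key, value), ...]; newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: write the memtable out as an immutable sorted run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:         # newest data first
            return self.memtable[key]
        for run in reversed(self.runs):  # then runs, newest to oldest
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
db.put("a", 1)
db.put("b", 2)   # memtable full: flushed as a sorted run
db.put("a", 9)   # newer value in the memtable shadows the flushed one
```

Because newer data shadows older runs, reads can touch many runs over time; that is what compaction (merging runs) exists to fix.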
A materialized view is a database object that stores the result of a query as a physical table. Unlike a regular (virtual) view, it contains actual data and must be refreshed to stay current.
Multi-leader (also called multi-master) replication allows multiple nodes (leaders) to accept writes. Data is replicated between leaders to keep them in sync.
MVCC (Multi-Version Concurrency Control) is a concurrency control method that keeps multiple versions of data records to allow non-blocking reads. Instead of locking, transactions see the versions of rows consistent with their snapshot.
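A minimal sketch of the version-visibility rule (the `MVCCStore` class is illustrative and ignores commit/abort status): each write appends a version stamped with a transaction id, and a reader sees the latest version at or before its snapshot.

```python
import itertools

class MVCCStore:
    """MVCC sketch: append-only versions; readers never block writers."""
    def __init__(self):
        self.versions = {}                # key -> [(txid, value), ...] ascending
        self._txids = itertools.count(1)

    def begin(self):
        return next(self._txids)          # the txid doubles as the snapshot

    def write(self, txid, key, value):
        self.versions.setdefault(key, []).append((txid, value))

    def read(self, snapshot, key):
        # Visible = the newest version written at or before the snapshot.
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot]
        return visible[-1] if visible else None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "x", "old")
reader = store.begin()      # snapshot taken here
t3 = store.begin()
store.write(t3, "x", "new") # a later writer does not disturb the reader
```

The reader keeps seeing `"old"` even after the concurrent write, without holding any lock; garbage-collecting versions no snapshot can see (vacuuming) is the maintenance cost.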
Normalization is the process of organizing a database schema to reduce redundancy and improve integrity. Typically, data is divided into multiple tables linked by foreign keys.
Optimistic locking is a concurrency control strategy where a transaction proceeds without locking resources, checking for conflicts only at commit. Typically a version number or timestamp is compared at commit time, and the transaction retries on a mismatch.
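The version-check idea in a few lines (the `VersionedRow` class is illustrative; in SQL this is usually `UPDATE ... WHERE id = ? AND version = ?`):

```python
class VersionedRow:
    """Optimistic locking sketch: updates carry the version they read;
    a mismatch at commit time means another writer got there first."""
    def __init__(self, value):
        self.value, self.version = value, 1

    def update(self, new_value, expected_version):
        if self.version != expected_version:
            raise RuntimeError("write conflict: row changed since it was read")
        self.value = new_value
        self.version += 1

row = VersionedRow(100)
v = row.version           # two concurrent transactions both read version 1
row.update(150, v)        # first writer commits; version becomes 2
try:
    row.update(120, v)    # second writer still holds version 1 -> conflict
    conflict = False
except RuntimeError:
    conflict = True
```

The losing transaction typically re-reads the row and retries; no locks are held between read and commit, which is why this shines when conflicts are rare.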
Pessimistic locking is the strategy of locking data resources before accessing them to prevent conflicts. A transaction acquires locks on rows it will read or write before proceeding.
A phantom read occurs when a transaction re-reads rows matching a condition and finds new rows that were inserted or deleted by another transaction after the first read.
The query planner/optimizer is a component that determines the most efficient way to execute a database query. A cost-based optimizer (CBO) estimates the cost of candidate execution plans and picks the cheapest.
In distributed systems, a quorum is the minimum number of votes (or nodes) that must agree to perform an operation. It is a technique to ensure consistency without requiring every node to respond.
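The standard read/write quorum condition in code form (function name is illustrative): with N replicas, W write acknowledgments and R read acknowledgments, R + W > N forces every read quorum to overlap the latest write quorum.

```python
def quorum_is_consistent(n: int, w: int, r: int) -> bool:
    """Quorum sketch: R + W > N guarantees every set of R readers
    intersects every set of W writers in at least one replica, so a
    read always touches at least one replica with the latest write."""
    return r + w > n

# Common configuration: N=3, W=2, R=2 -> reads and writes each wait for
# two replicas, and any two read replicas overlap any two write replicas.
```

Lowering W or R trades that overlap guarantee for lower latency (e.g. N=3, W=1, R=1 can return stale reads).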
Range partitioning divides data by ordered key ranges. Each shard holds a contiguous range of key values.
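Shard lookup then becomes a binary search over the sorted split points; a sketch (class name and split points are illustrative):

```python
import bisect

class RangePartitioner:
    """Range partitioning sketch: sorted split points define contiguous
    key ranges; binary search picks the shard that owns a key."""
    def __init__(self, split_points):
        # e.g. splits ["g", "n", "t"] define 4 shards:
        # (..,"g"], ("g","n"], ("n","t"], ("t",..)
        self.splits = sorted(split_points)

    def shard_for(self, key):
        return bisect.bisect_right(self.splits, key)

p = RangePartitioner(["g", "n", "t"])
```

Unlike hash partitioning, this keeps adjacent keys together and makes range scans cheap, at the cost of hot partitions when writes concentrate in one key range (e.g. monotonically increasing keys).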
A read replica is a read-only copy of a database instance. The primary database handles writes; read replicas asynchronously replicate data from the primary.
Read skew is an anomaly where a transaction reads outdated data in one table or column while reading updated related data in another, leading to an inconsistent view within a single transaction.
Rebalancing is redistributing data across nodes or partitions to ensure even load and resource utilization.
Resharding is changing the number or configuration of shards (data partitions) in a sharded database, moving data between shards accordingly.
Schema evolution refers to managing changes to the database schema over time without disrupting services.
A schema registry is a centralized service to store and enforce data schema versions for messages/events (common in streaming systems like Kafka).
A secondary index is any index on a table column that is not the primary key. It is stored separately from the table rows.
Snapshot isolation is a transaction isolation level where each transaction operates on a snapshot of the database as of the start of the transaction.
Split-brain is a failure scenario in clustered systems where network failures cause two (or more) segments of the cluster to believe they are the sole primary, leading to conflicting writes.
Strong consistency means every read receives the most recent write (or an error). All nodes see the same data at the same time.
Three-phase commit is an extension of 2PC designed to reduce blocking. It adds a “pre-commit” phase so participants can make a safe decision without blocking indefinitely.
Throttling controls resource usage by limiting the rate of requests. Rate limiting enforces a maximum number of requests in a time window.
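A common implementation is a token bucket; a sketch (class name and parameters are illustrative, with time passed in explicitly to keep it deterministic):

```python
class TokenBucket:
    """Rate-limiting sketch: tokens refill at `rate` per second up to
    `capacity`; a request is allowed only if a token is available."""
    def __init__(self, rate, capacity, now=0.0):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, now

    def allow(self, now):
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # over the limit: reject (or queue/delay) the request

tb = TokenBucket(rate=1, capacity=2)
burst = [tb.allow(0.0) for _ in range(3)]   # burst of 3 requests at t=0
```

The capacity bounds burst size while the rate bounds sustained throughput; rejected requests are typically answered with an error such as HTTP 429.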
Time-based partitioning divides data by time (e.g., daily/weekly/monthly partitions). Each partition holds data for a time range.
Two-phase commit is an atomic commitment protocol for distributed transactions. A coordinator asks all involved nodes (participants) whether they can commit; the transaction commits only if every participant votes yes, otherwise it aborts.
Vertical scaling (scale-up) is increasing the resources (CPU, RAM, disk) of a single machine.
Write skew is an anomaly where two concurrent transactions each read overlapping data and then write non-overlapping fields, resulting in an overall state that violates an invariant neither transaction would break alone.
Write-Ahead Logging (WAL) is a technique where changes are first recorded in a log before being applied to the database files. It ensures atomicity and durability.
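A toy sketch of the log-then-apply discipline (the `WALStore` class is illustrative; real WALs are fsynced files with LSNs and checkpoints): because every change hits the log first, replaying the log after a crash rebuilds any state the data files lost.

```python
class WALStore:
    """WAL sketch: append to the log before touching the 'data files';
    recovery replays the log to rebuild lost state."""
    def __init__(self):
        self.log = []    # durable append-only log (an fsynced file in practice)
        self.data = {}   # the main storage (data files / pages)

    def set(self, key, value):
        self.log.append(("set", key, value))   # 1) write-ahead to the log
        self.data[key] = value                 # 2) only then apply

    def recover(self):
        # After a crash the data files may be stale; replay the log in order.
        rebuilt = {}
        for op, key, value in self.log:
            if op == "set":
                rebuilt[key] = value
        self.data = rebuilt

store = WALStore()
store.set("a", 1)
store.set("a", 2)
store.data = {}      # simulate losing unflushed data pages in a crash
store.recover()
```

This is also why checkpointing (defined above) is needed: once logged changes are safely in the data files, the replayed prefix of the log can be truncated.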