Reference
Clear definitions and explanations for indexing, databases, distributed systems, and backend concepts — with use cases and trade-offs.
69 terms
ACID is an acronym for database transaction properties: Atomicity, Consistency, Isolation, Durability.
A B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.
Backpressure is a technique to prevent system overload by signaling to upstream components to slow down when downstream is overwhelmed.
BASE is an acronym contrasting ACID for distributed systems: Basically Available, Soft state, Eventually consistent. It describes a model where availability is prioritized over immediate consistency.
A Bloom filter is a space-efficient probabilistic data structure to test whether an element is in a set. It can tell “possibly in set” or “definitely not in set”.
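As a minimal sketch of the idea (the `BloomFilter` class and its parameters are illustrative, not from any particular library): k hash functions set k bits per item, so a lookup with any unset bit means “definitely not in set”, while all bits set means only “possibly in set”.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k bit positions from seeded MD5 digests of the item.
        for seed in range(self.k):
            digest = hashlib.md5(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely not in set"; True means "possibly in set".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("users:42")
bf.add("users:7")
```

Added items always test positive; the trade-off is a tunable false-positive rate (a function of m, k, and the number of items), and no deletions.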
Caching stores frequently accessed data in a faster storage layer (often memory) to speed up retrieval.
The CAP theorem (Brewer’s theorem) states that in the presence of a network partition, a distributed data store can only guarantee two of the following three: consistency, availability, and partition tolerance.
Change Data Capture (CDC) is the process of tracking and capturing changes in a database and delivering change events (inserts/updates/deletes) to downstream systems.
Checkpointing is the process of flushing all committed changes from the write-ahead log (WAL) to the main database storage, and then truncating the log up to that point.
A circuit breaker is a pattern where a system stops sending requests to a failing service for a period, allowing it to recover and preventing resource exhaustion.
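A minimal state-machine sketch of the pattern (class and threshold values are illustrative): the breaker trips open after consecutive failures, rejects calls while open, and lets a probe through after a cooldown.

```python
import time

class CircuitBreaker:
    """Sketch: opens after `threshold` consecutive failures;
    allows a probe request again after `reset_after` seconds."""
    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold, self.reset_after, self.clock = threshold, reset_after, clock
        self.failures = 0
        self.opened_at = None  # None => circuit closed (healthy)

    def allow_request(self):
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.reset_after:
            return True  # half-open: let one probe through
        return False     # open: fail fast without calling the service

    def record_success(self):
        self.failures, self.opened_at = 0, None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()

cb = CircuitBreaker(threshold=2, reset_after=30.0)
cb.record_failure()
cb.record_failure()   # second consecutive failure trips the breaker
```

While open, callers fail fast instead of tying up threads and sockets on a service that cannot respond.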
In databases, clustering can refer to either grouping similar data together or deploying multiple nodes for scalability/HA. A database cluster is a set of database servers (nodes) that work together as a single system.
Columnar storage stores data column-by-column (all values of one column together). Row-based storage stores data row-by-row (complete records stored together).
Compaction is the process of merging and recycling data files to reclaim space and improve read performance, especially in Log-Structured Merge (LSM) tree storage engines.
A composite index (multi-column index) indexes multiple columns together in one index. It creates a single B-tree keyed by the tuple of column values.
Connection pooling maintains a pool of open database connections that can be reused, reducing the overhead of opening/closing connections.
Consensus refers to the process whereby a group of distributed nodes agree on a single value or decision. It is critical in systems that need consistency despite node failures.
Consistent hashing is a technique to distribute keys across nodes so that only a minimal number of keys must move when nodes join or leave.
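A sketch of a consistent-hash ring (the `HashRing` class and vnode count are illustrative): nodes are hashed to many points on a ring, a key belongs to the first node point at or after its hash, and adding a node only reassigns the keys that now land on the new node.

```python
import bisect
import hashlib

def _hash(key):
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother balance."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        points = [p for p, _ in self.ring]
        idx = bisect.bisect(points, _hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["db1", "db2", "db3"])
before = {k: ring.node_for(k) for k in (f"user:{i}" for i in range(1000))}
ring.add("db4")  # one node joins the cluster
after = {k: ring.node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
```

With naive `hash(key) % N` partitioning, changing N from 3 to 4 would remap roughly three quarters of all keys; here only the keys claimed by the new node move, and they all move to it.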
A covering index is a (secondary or composite) index that includes all columns needed for a query. The index “covers” the query, so the database can answer it from the index alone, without reading the table rows.
A data lake is a centralized repository that stores large volumes of raw data (structured, semi-structured, unstructured) in its native format, usually in a low-cost object store or file system.
Data locality is the principle of placing computation (queries, processing) close to where the data resides, rather than moving large data sets across the network.
Data migration involves moving data between systems or formats (e.g., during an upgrade or platform change).
Data skew is an uneven distribution of data (or workload) across partitions or nodes. Some partitions get more data/traffic than others.
A data warehouse is a centralized repository that stores curated, cleaned, and structured data optimized for queries and reporting, typically in a columnar format.
A deadlock occurs when two or more transactions block each other indefinitely, each holding a resource the other needs.
Denormalization is the process of intentionally adding redundancy to a database schema to improve read/query performance. In a denormalized design, data is duplicated across tables to avoid expensive joins.
A dirty read is when a transaction reads data written by another uncommitted transaction. If that other transaction rolls back, the reading transaction has read data that never logically existed.
Eventual consistency is a weak consistency model where, if no new updates are made, all replicas of data will eventually converge to the same value.
Exactly-once processing guarantees that an operation (like processing a message) is performed only once, despite retries and failures.
Failover is the automatic or manual switching of operations from a failed primary system to a secondary (standby) system.
Federation refers to a setup where multiple autonomous databases or services coordinate to respond to queries, often by delegating parts of a query to the system that owns the relevant data.
Geo-replication is replicating data across geographically distributed data centers.
Hash partitioning assigns data to shards by hashing a key and using the hash result to pick a partition.
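In its simplest form the shard choice is just a stable hash modulo the partition count; a short sketch (the function name is illustrative):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Hash partitioning sketch: a stable hash of the key picks the shard.
    A stable hash is used deliberately -- Python's built-in hash() is
    randomized per process, which would break routing across restarts."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Every writer and reader computes the same shard for the same key.
shard = partition_for("user:42", 8)
```

This spreads keys evenly but sacrifices range queries (adjacent keys land on different shards), and changing `num_partitions` remaps most keys unless consistent hashing is used instead.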
High availability means a system is continuously operational and accessible with minimal downtime (often 99.99% uptime or better).
Horizontal scaling (scale-out) means adding more machines (nodes) to a system to handle increased load.
A hot partition (or hot shard/key) is a data partition that receives disproportionately high traffic, causing resource saturation.
Idempotency means an operation can be applied multiple times without changing the result beyond the initial application. In distributed systems, it ensures that retries do not cause duplicate effects.
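A common way to get idempotency for non-idempotent operations is an idempotency key; a sketch under assumed names (`PaymentProcessor`, `charge` are hypothetical):

```python
class PaymentProcessor:
    """Sketch: an idempotency key makes retried requests safe to replay."""
    def __init__(self):
        self.balance = 0
        self.seen = {}  # idempotency_key -> result of the first execution

    def charge(self, idempotency_key, amount):
        if idempotency_key in self.seen:
            # Duplicate or retry: return the stored result, do not re-apply.
            return self.seen[idempotency_key]
        self.balance += amount
        self.seen[idempotency_key] = self.balance
        return self.balance

p = PaymentProcessor()
p.charge("req-1", 100)
p.charge("req-1", 100)   # network retry of the same request: no double charge
```

The client generates the key once per logical operation and reuses it on every retry; the server deduplicates by key.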
A database index is a data structure that improves the speed of data retrieval on a table at the cost of additional writes and storage. It works like a book’s index: look up a key and jump straight to its location instead of scanning every page.
Leader–follower (also called master–slave) replication is a strategy where one node (the leader) receives all write operations, and one or more follower nodes replicate from it, typically serving reads.
Load balancing is distributing incoming requests or tasks across multiple servers/resources to optimize performance and avoid overload.
Lock escalation is the process of converting many fine-grained locks (like row or page locks) into a coarser lock (like a table lock) to reduce lock management overhead.
Logical replication replicates database changes at a logical level (e.g., SQL statements or row change events). Physical replication copies the exact data files or disk blocks byte-for-byte.
A Log-Structured Merge-tree (LSM tree) is a data structure optimized for high write volumes. It buffers writes in memory and periodically merges them into sorted files on disk.
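A toy sketch of the write path (the `TinyLSM` class is illustrative and omits the WAL, tombstones, and compaction that real engines need): writes land in a memtable, full memtables are flushed as immutable sorted runs, and reads check newest data first.

```python
import bisect

class TinyLSM:
    """LSM-tree sketch: memtable in memory, immutable sorted runs on 'disk'."""
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []   # list of sorted [(key, value), ...]; newest last
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: write the memtable out as an immutable sorted run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:         # newest data first
            return self.memtable[key]
        for run in reversed(self.runs):  # then runs, newest to oldest
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
db.put("a", 1)
db.put("b", 2)   # memtable full: flushed as a sorted run
db.put("a", 9)   # newer value in the memtable shadows the flushed one
```

Because newer data shadows older runs, reads can touch many runs over time; that is what compaction (merging runs) exists to fix.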
A materialized view is a database object that stores the result of a query as a physical table. Unlike a regular (virtual) view, it contains actual data and must be refreshed to stay current.
Multi-leader (also called multi-master) replication allows multiple nodes (leaders) to accept writes. Data is replicated between leaders to keep them in sync.
MVCC (Multi-Version Concurrency Control) is a concurrency control method that keeps multiple versions of data records to allow non-blocking reads. Instead of locking, transactions see the versions of rows consistent with their snapshot.
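A minimal sketch of the version-visibility rule (the `MVCCStore` class is illustrative and ignores commit/abort status): each write appends a version stamped with a transaction id, and a reader sees the latest version at or before its snapshot.

```python
import itertools

class MVCCStore:
    """MVCC sketch: append-only versions; readers never block writers."""
    def __init__(self):
        self.versions = {}                # key -> [(txid, value), ...] ascending
        self._txids = itertools.count(1)

    def begin(self):
        return next(self._txids)          # the txid doubles as the snapshot

    def write(self, txid, key, value):
        self.versions.setdefault(key, []).append((txid, value))

    def read(self, snapshot, key):
        # Visible = the newest version written at or before the snapshot.
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot]
        return visible[-1] if visible else None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "x", "old")
reader = store.begin()      # snapshot taken here
t3 = store.begin()
store.write(t3, "x", "new") # a later writer does not disturb the reader
```

The reader keeps seeing `"old"` even after the concurrent write, without holding any lock; garbage-collecting versions no snapshot can see (vacuuming) is the maintenance cost.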
Normalization is the process of organizing a database schema to reduce redundancy and improve integrity. Typically, data is divided into multiple tables linked by foreign keys.
Optimistic locking is a concurrency control strategy where a transaction proceeds without locking resources, checking for conflicts only at commit. Typically a version number or timestamp is compared at commit time, and the transaction retries on a mismatch.
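The version-check idea in a few lines (the `VersionedRow` class is illustrative; in SQL this is usually `UPDATE ... WHERE id = ? AND version = ?`):

```python
class VersionedRow:
    """Optimistic locking sketch: updates carry the version they read;
    a mismatch at commit time means another writer got there first."""
    def __init__(self, value):
        self.value, self.version = value, 1

    def update(self, new_value, expected_version):
        if self.version != expected_version:
            raise RuntimeError("write conflict: row changed since it was read")
        self.value = new_value
        self.version += 1

row = VersionedRow(100)
v = row.version           # two concurrent transactions both read version 1
row.update(150, v)        # first writer commits; version becomes 2
try:
    row.update(120, v)    # second writer still holds version 1 -> conflict
    conflict = False
except RuntimeError:
    conflict = True
```

The losing transaction typically re-reads the row and retries; no locks are held between read and commit, which is why this shines when conflicts are rare.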
Pessimistic locking is the strategy of locking data resources before accessing them to prevent conflicts. A transaction acquires locks on rows it will read or write before proceeding.
A phantom read occurs when a transaction re-reads rows matching a condition and finds new rows that were inserted or deleted by another transaction after the first read.
The query planner/optimizer is a component that determines the most efficient way to execute a database query. A cost-based optimizer (CBO) estimates the cost of candidate execution plans and picks the cheapest.
In distributed systems, a quorum is the minimum number of votes (or nodes) that must agree to perform an operation. It is a technique to ensure consistency without requiring every node to respond.
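The standard read/write quorum condition in code form (function name is illustrative): with N replicas, W write acknowledgments and R read acknowledgments, R + W > N forces every read quorum to overlap the latest write quorum.

```python
def quorum_is_consistent(n: int, w: int, r: int) -> bool:
    """Quorum sketch: R + W > N guarantees every set of R readers
    intersects every set of W writers in at least one replica, so a
    read always touches at least one replica with the latest write."""
    return r + w > n

# Common configuration: N=3, W=2, R=2 -> reads and writes each wait for
# two replicas, and any two read replicas overlap any two write replicas.
```

Lowering W or R trades that overlap guarantee for lower latency (e.g. N=3, W=1, R=1 can return stale reads).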
Range partitioning divides data by ordered key ranges. Each shard holds a contiguous range of key values.
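Shard lookup then becomes a binary search over the sorted split points; a sketch (class name and split points are illustrative):

```python
import bisect

class RangePartitioner:
    """Range partitioning sketch: sorted split points define contiguous
    key ranges; binary search picks the shard that owns a key."""
    def __init__(self, split_points):
        # e.g. splits ["g", "n", "t"] define 4 shards:
        # (..,"g"], ("g","n"], ("n","t"], ("t",..)
        self.splits = sorted(split_points)

    def shard_for(self, key):
        return bisect.bisect_right(self.splits, key)

p = RangePartitioner(["g", "n", "t"])
```

Unlike hash partitioning, this keeps adjacent keys together and makes range scans cheap, at the cost of hot partitions when writes concentrate in one key range (e.g. monotonically increasing keys).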
A read replica is a read-only copy of a database instance. The primary database handles writes; read replicas asynchronously replicate data from the primary.
Read skew is an anomaly where a transaction reads outdated data in one table or column while reading updated related data in another, leading to an inconsistent view within a single transaction.
Rebalancing is redistributing data across nodes or partitions to ensure even load and resource utilization.
Resharding is changing the number or configuration of shards (data partitions) in a sharded database, moving data between shards accordingly.
Schema evolution refers to managing changes to the database schema over time without disrupting services.
A schema registry is a centralized service to store and enforce data schema versions for messages/events (common in streaming systems like Kafka).
A secondary index is any index on a table column that is not the primary key. It is stored separately from the table rows.
Snapshot isolation is a transaction isolation level where each transaction operates on a snapshot of the database as of the start of the transaction.
Split-brain is a failure scenario in clustered systems where network failures cause two (or more) segments of the cluster to believe they are the sole primary, leading to conflicting writes.
Strong consistency means every read receives the most recent write (or an error). All nodes see the same data at the same time.
Three-phase commit is an extension of 2PC designed to reduce blocking. It adds a “pre-commit” phase so participants can make a safe decision without blocking indefinitely.
Throttling controls resource usage by limiting the rate of requests. Rate limiting enforces a maximum number of requests in a time window.
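A common implementation is a token bucket; a sketch (class name and parameters are illustrative, with time passed in explicitly to keep it deterministic):

```python
class TokenBucket:
    """Rate-limiting sketch: tokens refill at `rate` per second up to
    `capacity`; a request is allowed only if a token is available."""
    def __init__(self, rate, capacity, now=0.0):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = capacity, now

    def allow(self, now):
        # Refill tokens for the time elapsed since the last request.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # over the limit: reject (or queue/delay) the request

tb = TokenBucket(rate=1, capacity=2)
burst = [tb.allow(0.0) for _ in range(3)]   # burst of 3 requests at t=0
```

The capacity bounds burst size while the rate bounds sustained throughput; rejected requests are typically answered with an error such as HTTP 429.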
Time-based partitioning divides data by time (e.g., daily/weekly/monthly partitions). Each partition holds data for a time range.
Two-phase commit is an atomic commitment protocol for distributed transactions. A coordinator asks all involved nodes (participants) whether they can commit; the transaction commits only if every participant votes yes, otherwise it aborts.
Vertical scaling (scale-up) is increasing the resources (CPU, RAM, disk) of a single machine.
Write skew is an anomaly where two concurrent transactions each read overlapping data and then write non-overlapping fields, resulting in an overall state that violates an invariant neither transaction would break alone.
Write-Ahead Logging (WAL) is a technique where changes are first recorded in a log before being applied to the database files. It ensures atomicity and durability.
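A toy sketch of the log-then-apply discipline (the `WALStore` class is illustrative; real WALs are fsynced files with LSNs and checkpoints): because every change hits the log first, replaying the log after a crash rebuilds any state the data files lost.

```python
class WALStore:
    """WAL sketch: append to the log before touching the 'data files';
    recovery replays the log to rebuild lost state."""
    def __init__(self):
        self.log = []    # durable append-only log (an fsynced file in practice)
        self.data = {}   # the main storage (data files / pages)

    def set(self, key, value):
        self.log.append(("set", key, value))   # 1) write-ahead to the log
        self.data[key] = value                 # 2) only then apply

    def recover(self):
        # After a crash the data files may be stale; replay the log in order.
        rebuilt = {}
        for op, key, value in self.log:
            if op == "set":
                rebuilt[key] = value
        self.data = rebuilt

store = WALStore()
store.set("a", 1)
store.set("a", 2)
store.data = {}      # simulate losing unflushed data pages in a crash
store.recover()
```

This is also why checkpointing (defined above) is needed: once logged changes are safely in the data files, the replayed prefix of the log can be truncated.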