Database Glossary
1NF (First Normal Form)
In a database, 1NF means each table cell contains a single value, eliminating repeating groups or arrays. This basic rule ensures data integrity and simplicity.
2NF (Second Normal Form)
2NF builds on 1NF, ensuring each non-key attribute depends on the entire primary key. This reduces data redundancy and improves data consistency.
3NF (Third Normal Form)
In 3NF, a table is in 2NF and has no transitive dependencies, where a non-key attribute depends on another non-key attribute. This normalization level enhances data integrity and reduces redundancy.
4NF (Fourth Normal Form)
4NF requires a table to be in BCNF and to contain no nontrivial multivalued dependencies other than those determined by a superkey. In practice, two independent multi-valued facts about the same entity (for example, an employee's skills and an employee's spoken languages) are stored in separate tables rather than combined in one.
ACID (Atomicity, Consistency, Isolation, Durability)
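ACID is a set of transaction properties that make database operations reliable. It comprises atomicity (a transaction applies entirely or not at all), consistency (a transaction moves the database from one valid state to another), isolation (concurrent transactions do not interfere with each other), and durability (committed changes survive failures).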
ADDM (Automatic Database Diagnostic Monitor)
ADDM is an Oracle Database feature that analyzes performance statistics collected by the database, identifies likely bottlenecks, and recommends tuning actions, reducing the manual effort of performance troubleshooting.
ADR (Automatic Diagnostic Repository)
ADR is a file-based repository in Oracle Database that stores diagnostic data such as trace files, dumps, the alert log, and health check reports.
ADR base
The ADR base is the root directory of the ADR, under which the diagnostic data for each database instance is stored.
Advanced index compression
Advanced index compression is a technique that reduces the storage size of indexes, improving query performance and reducing storage costs.
Advanced row compression
Advanced row compression is a technique that reduces the storage size of rows, improving query performance and reducing storage costs.
Alert log
An alert log is a database log that records critical events, such as errors and warnings, to help administrators troubleshoot and resolve issues.
Analytic function
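An analytic function is a database function that computes a value over a group of related rows (a window) and returns a result for every row instead of collapsing the group into one row. Typical uses are rankings, running totals, and moving averages.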
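As an illustration, the sketch below ranks rows within a group and attaches a group total to every row. It uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, which added window functions; the sales table and its columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('Ana', 'east', 300), ('Bo', 'east', 500),
        ('Cy', 'west', 200), ('Di', 'west', 100);
""")

# RANK() and SUM() are evaluated per region (the window), but every
# input row is kept in the output.
ranked = conn.execute("""
    SELECT rep, region, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank,
           SUM(amount) OVER (PARTITION BY region) AS region_total
    FROM sales
    ORDER BY region, region_rank
""").fetchall()

for row in ranked:
    print(row)
```

Unlike a GROUP BY aggregate, the query returns four rows, one per sale, each carrying its rank and its region's total.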
Analytic query
An analytic query is a database query that performs complex data analysis, such as aggregations and rankings, to support business intelligence and decision-making.
Antijoin
An antijoin is a database operation that returns all rows from one table that do not have matching rows in another table.
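A minimal sketch of an antijoin written with NOT EXISTS, using Python's built-in sqlite3 module. The customers and orders tables are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1), (11, 3);
""")

# Antijoin: keep only the customers for which no matching order exists.
antijoin_rows = conn.execute("""
    SELECT c.name
    FROM customers AS c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders AS o WHERE o.customer_id = c.id
    )
    ORDER BY c.name
""").fetchall()

print(antijoin_rows)
```

The same result can usually be obtained with a LEFT JOIN followed by a filter on NULL in the right-hand key.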
Access control list (ACL)
An ACL is a list of permissions that define access rights to a database or its objects, ensuring secure data access and management.
Active record ORM
Active record ORM is an object-relational mapping technique that simplifies database interactions by providing a direct mapping between database tables and application objects.
Anti-caching
Anti-caching is a technique that prevents caching of frequently updated data, ensuring that applications always retrieve the latest data from the database.
Atomicity
Atomicity is a database property that ensures that database transactions are executed as a single, indivisible unit, maintaining data consistency and integrity.
Attributes
Attributes are columns or fields in a database table that define the structure and properties of the data.
Authentication
Authentication is the process of verifying the identity of users or applications before granting access to a database or its objects.
Authorization
Authorization is the process of determining the access rights and permissions of users or applications to a database or its objects.
Availability
Availability refers to the ability of a database to provide access to data and perform operations without interruption or downtime.
BASE
BASE is an acronym that stands for Basically Available, Soft state, and Eventual consistency. It describes the design philosophy of many distributed databases, which favor availability over the strict guarantees of ACID.
BCNF (Boyce-Codd Normal Form)
BCNF is a stricter version of 3NF. A table is in BCNF if, for every nontrivial functional dependency, the determining set of columns is a superkey. This removes redundancy that 3NF can still allow when a table has overlapping candidate keys.
Blue-green deployments
Blue-green deployments are a deployment strategy that involves running two identical production environments, one with the new version and one with the old version, to minimize downtime and ensure smooth rollbacks.
Bottleneck
A bottleneck is a performance constraint in a database system that slows down data processing and retrieval, often caused by inadequate resources or inefficient queries.
CAP theorem
The CAP theorem states that it is impossible for a distributed database system to simultaneously guarantee Consistency, Availability, and Partition tolerance.
CRUD (Create, Read, Update, Delete)
CRUD is an acronym that represents the four basic operations that can be performed on data in a database: Create, Read, Update, and Delete.
Cache
A cache is a temporary storage area that holds frequently accessed data to improve performance and reduce latency.
Cache-aside
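Cache-aside is a caching strategy in which the application manages the cache itself: it looks in the cache first, and on a miss it reads from the database and stores the result in the cache for later requests. The cache sits beside the database rather than in front of it.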
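The read and write paths of cache-aside can be sketched as follows. This is only an illustration: plain dictionaries stand in for the cache and the database, and the names get_user, update_user, and db_reads are invented for the example.

```python
cache = {}                                         # stand-in for a cache such as Redis
database = {"user:1": "Ada", "user:2": "Grace"}    # stand-in for the database
db_reads = 0                                       # counts how often the database is hit

def get_user(key):
    """Read path: try the cache, fall back to the database, then fill the cache."""
    global db_reads
    if key in cache:
        return cache[key]
    db_reads += 1
    value = database.get(key)
    if value is not None:
        cache[key] = value
    return value

def update_user(key, value):
    """Write path: update the database, then invalidate the cached copy."""
    database[key] = value
    cache.pop(key, None)

get_user("user:1")             # miss: reads the database and fills the cache
get_user("user:1")             # hit: served from the cache
update_user("user:1", "Ada L.")
latest = get_user("user:1")    # miss again, because the entry was invalidated
print(latest, db_reads)
```

Invalidating on write, rather than updating the cache in place, is one common choice; it keeps the write path simple at the cost of one extra miss.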
Cache invalidation
Cache invalidation is the process of removing outdated or invalid data from a cache to ensure that the cache remains consistent with the main database.
Canary releases
Canary releases are a deployment strategy that involves releasing a new version of a database system to a small group of users before rolling it out to the entire user base.
Candidate key
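A candidate key is a minimal column or set of columns in a database table that uniquely identifies each row; removing any column from it would break uniqueness. One of a table's candidate keys is chosen as the primary key.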
Cascade
Cascade is a database operation that deletes or updates related rows in a table when a row is deleted or updated.
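A small sketch of a cascading delete, using Python's built-in sqlite3 module. SQLite enforces foreign keys only when the foreign_keys pragma is switched on for the connection. The authors and books tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        title TEXT,
        author_id INTEGER REFERENCES authors(id) ON DELETE CASCADE
    );
    INSERT INTO authors VALUES (1, 'Codd'), (2, 'Date');
    INSERT INTO books VALUES (1, 'A', 1), (2, 'B', 1), (3, 'C', 2);
""")

# Deleting the parent row also deletes the child rows that reference it.
conn.execute("DELETE FROM authors WHERE id = 1")
remaining = conn.execute("SELECT title FROM books ORDER BY id").fetchall()
print(remaining)
```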
Cassandra
Cassandra is a distributed NoSQL database system that provides high availability and scalability for large-scale data storage and processing.
Check constraint
A check constraint is a rule, written as a Boolean expression, that every row in a table must satisfy. The database rejects any insert or update that would make the expression false.
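For illustration, the sketch below declares a check constraint and shows the database rejecting a row that violates it. It uses Python's built-in sqlite3 module; the products table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        name  TEXT NOT NULL,
        price REAL NOT NULL CHECK (price >= 0)
    )
""")
conn.execute("INSERT INTO products VALUES ('pen', 1.50)")

try:
    conn.execute("INSERT INTO products VALUES ('broken', -3)")
    rejected = False
except sqlite3.IntegrityError:
    # The negative price violates the CHECK expression.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(rejected, row_count)
```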
Cluster
A cluster is a group of computers or nodes that work together to provide high availability and scalability for a database system.
Collation
Collation is the process of defining the order and sorting of characters in a database system, often used for language-specific sorting and searching.
Collections
Collections are groups of data objects that are stored and managed together in a database system.
Column
A column is a vertical arrangement of cells in a database table that contains a specific piece of information.
Column database
A column database is a type of database that stores data in columns instead of rows, optimized for analytical workloads.
Column family
A column family is a group of columns in a column database that are stored together on disk, improving query performance.
Command query responsibility segregation
Command query responsibility segregation is a design pattern that separates commands (write operations) from queries (read operations) to improve system scalability and maintainability.
Commit
Commit is a database operation that ends a transaction and makes its changes permanent and visible to other users.
Composite key
A composite key is a primary key that consists of multiple columns, used to uniquely identify each row in a database table.
Concurrency
Concurrency is the ability of a database system to handle multiple transactions simultaneously, improving system performance and responsiveness.
Connection pooling
Connection pooling is a technique that reuses existing database connections to improve system performance and reduce overhead.
Consistency
Consistency is a database property that ensures that data is accurate and reliable, often achieved through transactions and locking mechanisms.
Constraint
A constraint is a rule that ensures data integrity by restricting the values that can be entered into a column or table.
Cursor
A cursor is a control structure that enables traversal over the records in a database table, often used for iterative processing.
Dark launching
Dark launching is a deployment strategy that involves releasing a new feature or functionality to a small group of users without announcing it publicly.
Data
Data is the information stored in a database system, often in the form of tables, rows, and columns.
Data definition language (DDL)
DDL is a language used to define the structure and organization of a database, including tables, indexes, and relationships.
Data independence
Data independence is the ability of a database system to change its physical storage or data structure without affecting the application code.
Data mapper ORM
Data mapper ORM is an object-relational mapping technique that provides a layer of abstraction between the application code and the database, simplifying data access and manipulation.
Data type
A data type is a classification of data that determines its format, range, and behavior, such as integer, string, or date.
Database
A database is a collection of organized data that is stored in a way that allows for efficient retrieval and manipulation.
Database abstraction layer
A database abstraction layer is a software layer that provides a uniform interface to different database systems, simplifying database access and manipulation.
Database administrator (DBA)
A DBA is a person responsible for designing, implementing, and maintaining a database system, ensuring its performance, security, and reliability.
Database engine
A database engine is the core software component that manages and processes data in a database system, providing query optimization, indexing, and caching.
Database management system (DBMS)
A DBMS is a software system that allows users to define, create, maintain, and manipulate databases, providing a layer of abstraction between the user and the physical storage.
Database model
A database model is a conceptual representation of a database system, including its structure, relationships, and constraints.
Database proxy
A database proxy is a software layer that sits between the application code and the database, providing caching, security, and performance enhancements.
Dataset
A dataset is a collection of data, often used for machine learning, data analysis, or data visualization.
Denormalization
Denormalization is the process of deliberately introducing redundancy into a normalized schema, for example by duplicating columns or maintaining summary tables, to speed up reads at the cost of more complex writes.
Dirty read
A dirty read is a phenomenon where a transaction reads data that has not been committed by another transaction, potentially leading to inconsistent data.
Distributed database
A distributed database is a database system that is spread across multiple physical locations, often to improve performance, scalability, and availability.
Document
A document is a self-contained piece of data, often in a JSON or XML format, used in NoSQL databases and document-oriented databases.
Document database
A document database is a type of NoSQL database that stores data in documents, often in a JSON or XML format.
Durability
Durability is a database property that ensures that once a transaction has been committed, its effects are permanent and survive crashes, power loss, and restarts.
Encoding
Encoding is the process of converting data into a defined representation that can be stored or transmitted, such as UTF-8 for text or Base64 for binary data. Unlike encryption, encoding is not meant to keep data secret.
Encrypted transport
Encrypted transport is a method of securing data in transit by encrypting it, often using protocols such as SSL/TLS or HTTPS.
Ephemerality
Ephemerality is a property of data that is temporary or short-lived, often used in caching or messaging systems.
Ephemeral storage
Ephemeral storage is a type of storage that is temporary or short-lived, often used in caching or messaging systems.
Eventual consistency
Eventual consistency is a consistency model in which replicas in a distributed system may temporarily disagree, but converge to the same value once updates stop arriving. It is common in NoSQL databases.
Eviction
Eviction is the process of removing data from a cache or other temporary storage, often due to memory constraints or expiration policies.
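One widely used eviction policy is least recently used (LRU), where the entry that has gone longest without being read or written is removed first. The sketch below is a minimal LRU cache built on Python's collections.OrderedDict; the class name LRUCache is chosen for the example.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry

lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")      # "a" is now more recently used than "b"
lru.put("c", 3)   # capacity exceeded, so "b" is evicted
print(lru.get("a"), lru.get("b"), lru.get("c"))
```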
Entity
An entity is a thing or concept that has existence and can be described with a set of attributes or properties, often used in data modeling.
Expand and contract pattern
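The expand and contract pattern is a technique for changing a database schema without downtime. The schema is first expanded with the new structure alongside the old one, data and application code are migrated to the new structure, and the old structure is then removed (contracted).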
Extract-transform-load (ETL)
ETL is a process of extracting data from multiple sources, transforming it into a standardized format, and loading it into a target system, often used in data warehousing.
Feature flags
Feature flags are a technique used to toggle the availability of features or functionality in a system, often used in agile development and continuous integration.
Field
A field is a single element of data in a database or data structure, often corresponding to a column or attribute.
Flat-file database
A flat-file database is a type of database that stores data in a plain text file, often used for simple data storage and retrieval.
Functional dependency
Functional dependency is a relationship between two attributes in a database, where the value of one attribute determines the value of another.
Foreign key
A foreign key is a field in a database table that refers to the primary key of another table, establishing a relationship between the two tables.
Full-text search
Full-text search is a technique used to search for specific words or phrases within a large body of text, often used in search engines and document databases.
Graph database
A graph database is a type of database that stores data as nodes and edges, often used to model complex relationships and networks.
GraphQL
GraphQL is a query language for APIs that allows for flexible and efficient data retrieval, often used in modern web and mobile applications.
HTAP database
HTAP (Hybrid Transactional and Analytical Processing) database is a type of database that supports both transactional and analytical workloads, often used in real-time analytics and reporting.
Hierarchical database
A hierarchical database is a type of database that organizes data in a tree-like structure, often used in file systems and document management systems.
Horizontal scaling
Horizontal scaling is a technique used to increase the capacity of a database system by adding more nodes or servers, often used in distributed databases and cloud computing.
Hot backup
A hot backup is a type of backup that is performed while the database is still online and available, often used in high-availability and disaster recovery scenarios.
In-memory database
An in-memory database is a type of database that stores data in RAM (Random Access Memory) instead of disk storage, often used for high-performance and real-time applications.
Index
An index is a data structure that improves the speed of data retrieval by providing a quick way to locate specific data, often used in databases and file systems.
Ingesting
Ingesting is the process of loading data into a database or data warehouse, often used in data integration and ETL (Extract, Transform, Load) processes.
Inner join
An inner join is a type of database join that returns only the rows that have matching values in both tables, often used in data integration and reporting.
Isolation
Isolation is a database property that ensures that each transaction operates independently, without interference from other transactions.
Isolation levels
Isolation levels are a set of rules that define the degree of isolation between transactions, often used to balance concurrency and consistency.
Join
A join is a database operation that combines rows from two or more tables, based on a common column or condition.
Key
A key is a column or set of columns in a database table that uniquely identifies each row, often used as a primary key or foreign key.
Key-value database
A key-value database is a type of NoSQL database that stores data as a collection of key-value pairs, often used for simple data storage and retrieval.
Left join
A left join is a type of database join that returns all rows from the left table, and the matching rows from the right table, if any.
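The difference between an inner join and a left join is easiest to see side by side. The sketch below uses Python's built-in sqlite3 module with two hypothetical tables, one of whose rows has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering');
    INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', NULL);
""")

# Inner join: only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    INNER JOIN departments AS d ON d.id = e.dept_id
    ORDER BY e.id
""").fetchall()

# Left join: every employee; department columns are NULL when nothing matches.
left_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    LEFT JOIN departments AS d ON d.id = e.dept_id
    ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Python's sqlite3 module returns SQL NULL as None, so the unmatched employee appears as ('Grace', None) in the left join only.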
Lexeme
A lexeme is a unit of language, such as a word or phrase, often used in natural language processing and text analysis.
Locale
A locale is a set of cultural and linguistic preferences, such as language, currency, and date format, often used in internationalization and localization.
Lock
A lock is a mechanism used to synchronize access to a shared resource, such as a database table or row, often used to prevent concurrent updates.
MariaDB
MariaDB is a relational database management system, forked from MySQL, often used for web and cloud applications.
Microservice architecture
Microservice architecture is a software design pattern that structures an application as a collection of small, independent services, often used in cloud-native and distributed systems.
Migration (database, schema)
Migration is the process of moving data from one database to another, or of applying incremental, versioned changes to a schema so that it evolves along with the application. It is common in database upgrades and refactoring.
MongoDB
MongoDB is a NoSQL document-oriented database, often used for big data and real-time web applications.
Monolithic architecture
Monolithic architecture is a software design pattern that structures an application as a single, self-contained unit, often used in traditional and legacy systems.
Multiversion concurrency control (MVCC)
MVCC is a concurrency control mechanism that keeps multiple versions of each row so that every transaction reads a consistent snapshot of the data. Readers do not block writers and writers do not block readers.
MySQL
MySQL is a relational database management system, often used for web and cloud applications.
Neo4j
Neo4j is a graph database, often used for modeling complex relationships and networks.
Network database
A network database is a type of database that extends the hierarchical model by allowing a record to have more than one parent, so that data forms a graph of linked records rather than a strict tree. It predates the relational model.
NewSQL
NewSQL is a category of relational databases that provide high performance and scalability, often used in big data and real-time analytics.
NoSQL
NoSQL is a category of databases that do not use the traditional relational model, often used in big data and real-time web applications.
Node
A node is a single point in a network or graph, often used in graph databases and social media analysis.
Nonrepeatable read
A nonrepeatable read is a phenomenon where a transaction reads the same row twice and gets different values, because another transaction modified the row and committed between the two reads.
Normalization
Normalization is the process of organizing data in a database to minimize data redundancy and improve data integrity.
OLAP database
An OLAP (Online Analytical Processing) database is a type of database designed for analytical workloads, often used in business intelligence and data warehousing.
OLTP database
An OLTP (Online Transaction Processing) database is a type of database designed for many short, concurrent transactions, often used in e-commerce and financial applications.
ORM
ORM (Object-Relational Mapping) is a technique that maps objects in an application to tables in a relational database, often used in software development.
Object relational impedance mismatch
Object relational impedance mismatch is a problem that occurs when trying to map objects in an application to tables in a relational database, often due to differences in data modeling and querying.
Optimistic concurrency control
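Optimistic concurrency control is a technique that lets transactions read and modify data without taking locks, and checks for conflicts only when a transaction tries to commit. If another transaction changed the same data in the meantime, the commit is rejected and the work is retried. It suits workloads where conflicts are rare.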
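One common way to apply this idea at the application level is a version column: an update succeeds only if the row still carries the version the writer originally read. The sketch below uses Python's built-in sqlite3 module; the accounts table and the update_balance helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
conn.commit()

def update_balance(conn, account_id, new_balance, expected_version):
    """Apply the update only if nobody changed the row since it was read."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 rows updated means a conflicting write happened

# Two writers both read version 1; only the first update is accepted.
first = update_balance(conn, 1, 150, expected_version=1)
second = update_balance(conn, 1, 120, expected_version=1)
final = conn.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()
print(first, second, final)
```

The rejected writer would normally re-read the row, recompute its change, and try again with the new version number.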
Outer join
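An outer join is a type of database join that keeps rows that have no match in the other table, filling the missing columns with null values. A left or right outer join keeps the unmatched rows of one table; a full outer join keeps the unmatched rows of both.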
Parameterized query
A parameterized query is a query that uses placeholders for values, often used to prevent SQL injection attacks and improve performance.
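The sketch below contrasts a query built by string formatting with a parameterized one, using Python's built-in sqlite3 module and its ? placeholder. The users table and the malicious input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (name TEXT, is_admin INTEGER);
    INSERT INTO users VALUES ('alice', 1), ('bob', 0);
""")

malicious = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text and changes the query's logic.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is passed separately and is only ever treated as a value.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The unsafe query matches every user, while the parameterized query looks for a user literally named `nobody' OR '1'='1` and finds none.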
Persistence
Persistence is the ability of a system to store and retrieve data, often used in databases and file systems.
Persistent storage
Persistent storage is a type of storage that retains data even when power is turned off, often used in hard drives and solid-state drives.
Pessimistic concurrency control
Pessimistic concurrency control is a technique that locks data before reading or modifying it, so that conflicting transactions must wait until the lock is released. It suits workloads where conflicts are frequent.
Phantom read
A phantom read is a phenomenon where a transaction runs the same query twice and gets a different set of rows, because another transaction inserted or deleted matching rows and committed in between.
PostgreSQL
PostgreSQL is a relational database management system, often used for web and cloud applications.
Precision (searching)
Precision is the proportion of retrieved results that are actually relevant to the query, often used in information retrieval and search engines.
Primary key
A primary key is a column or set of columns in a database table that uniquely identifies each row, often used as a unique identifier.
Query
A query is a request for specific data or information, often used in databases and search engines.
Query builder
A query builder is a tool or library that helps construct database queries, often used in software development.
Query language
A query language is a language used to define and execute database queries, often used in relational databases and NoSQL databases.
Query planner
A query planner is a component of a database system that chooses how to execute a query, comparing alternatives such as which indexes to use and in which order to join tables, and passes the chosen plan on for execution.
Raft consensus algorithm
Raft is a consensus algorithm used in distributed systems to ensure consistency and fault tolerance, often used in cloud-native and distributed databases.
Read committed isolation level
Read committed is an isolation level that ensures that a transaction sees only data that was committed before each statement began. It prevents dirty reads but still allows nonrepeatable reads and phantom reads.
Read operation
A read operation is a database operation that retrieves data from a database, often used in queries and reports.
Read-through caching
Read-through caching is a caching strategy in which the application reads only from the cache, and the cache itself loads missing data from the underlying database and stores it before returning it.
Read uncommitted isolation level
Read uncommitted is the weakest isolation level. It allows a transaction to see changes that other transactions have not yet committed, so dirty reads are possible.
Recall
Recall is a measure of the proportion of relevant data that is retrieved in a search result, often used in information retrieval and search engines.
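Recall is usually reported together with precision. As a worked example, the sketch below computes both for one query; the function name precision_recall and the document identifiers are made up for the illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query's results."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: share of the retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents returned, 3 documents are truly relevant, 2 of them were found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Here precision is 2/4 and recall is 2/3: returning more documents can raise recall while lowering precision.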
Record
A record is a single entry in a database table, often consisting of multiple fields or columns.
Redis
Redis is an in-memory data store that can be used as a database, cache, or message broker, often used in real-time web applications and microservices.
Relational database
A relational database is a type of database that organizes data into tables with defined relationships, often used in enterprise applications and data warehousing.
Relational database management system (RDBMS)
An RDBMS is a software system that manages and provides access to a relational database, often used in enterprise applications and data warehousing.
Repeatable read isolation level
Repeatable read is an isolation level that ensures that rows a transaction has already read return the same values if read again, even while other transactions modify the data. In the SQL standard it prevents dirty and nonrepeatable reads but may still allow phantom reads.
Replication
Replication is the process of creating multiple copies of data, often used in distributed systems and high-availability architectures.
Right join
A right join is a type of database join that returns all rows from the right table, and the matching rows from the left table, if any.
Role-based access control (RBAC)
RBAC is a security model that grants access to resources based on a user’s role, often used in enterprise applications and cloud computing.
Row
A row is a single entry in a database table, often consisting of multiple fields or columns.
Serial scanning
Serial scanning is a search technique that examines each record in turn to find matches, without the help of an index. It is simple but becomes slow as the amount of data grows.
Sentinel value
A sentinel value is a special value used to indicate the end of a data structure or the absence of data, often used in programming languages and data processing.
SQL
SQL (Structured Query Language) is a language used to manage and manipulate data in relational databases, often used in enterprise applications and data warehousing.
SQL injection
SQL injection is a type of attack in which malicious SQL is inserted into a query through unsanitized user input, letting the attacker read or modify data they should not have access to. It is a common vulnerability in web applications.
SQLite
SQLite is a self-contained, file-based relational database management system, often used in mobile and embedded systems.
Sanitizing input
Sanitizing input is the process of cleaning and validating user input data to prevent security vulnerabilities and ensure data integrity.
Scaling
Scaling refers to the ability of a system to handle increased load or demand, often used in cloud computing and distributed systems.
Schema
A schema is a blueprint or structure of a database, defining the relationships between tables, columns, and data types.
Serialization
Serialization is the process of converting data into a format that can be stored or transmitted, often used in data processing and communication.
Server
A server is a computer or device that provides services or resources to other computers or devices over a network, often used in cloud computing and distributed systems.
Service-oriented architecture (SOA)
SOA is a software design pattern that structures an application as a collection of services, often used in cloud-native and distributed systems.
Shard
A shard is a horizontal partition of a database, often used in distributed databases and cloud computing.
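A simple way to decide which shard holds a given row is to hash the row's key. The sketch below is one such scheme and is only an illustration; the function shard_for, the shard count, and the user identifiers are assumptions made for the example.

```python
import hashlib

def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard number in the range 0..shard_count-1."""
    # A stable hash is used so that every process computes the same shard.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

SHARD_COUNT = 4
placement = {}
for user_id in ["user-1", "user-2", "user-3", "user-4", "user-5"]:
    placement.setdefault(shard_for(user_id, SHARD_COUNT), []).append(user_id)

print(placement)
```

With plain modulo hashing, changing the number of shards moves most keys to a different shard, which is why production systems often prefer consistent hashing or range-based partitioning.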
Stale data
Stale data is data that no longer reflects the current state of its source, for example a cached value that has since been changed in the database.
Standard column family
A standard column family is a type of column family in a column-family NoSQL database, often used in big data and real-time analytics.
Stemming
Stemming is a process of reducing words to their base form, often used in natural language processing and text analysis.
Stop words
Stop words are common words that are ignored in search queries and text analysis, often used in information retrieval and search engines.
Storage engine
A storage engine is a component of a database system that manages data storage and retrieval, often used in relational databases and NoSQL databases.
Stored procedure
A stored procedure is a named routine, written in SQL or a procedural extension of it, that is stored in the database and executed on the database server when called.
Super column family
A super column family is a type of column family in a column-family NoSQL database, often used in big data and real-time analytics.
Superkey
A superkey is any set of columns that uniquely identifies each row in a database table. It may contain more columns than uniqueness requires; a superkey with no unnecessary columns is a candidate key.
Table
A table is a collection of related data in a database, often consisting of rows and columns.
Table aliases
Table aliases are temporary names given to tables in a database query, often used in complex queries and data analysis.
Three-tier architecture
Three-tier architecture is a software design pattern that structures an application into three layers: presentation, application, and data, often used in enterprise applications and cloud computing.
Token
A token is a unit of data or a symbol in a programming language, often used in parsing and lexical analysis.
Transaction
A transaction is a sequence of operations that are executed as a single, all-or-nothing unit, often used in relational databases and data processing.
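The all-or-nothing behavior can be illustrated with a money transfer. The sketch below uses Python's built-in sqlite3 module, whose connection object commits on success and rolls back on an exception when used as a context manager. The accounts table and the transfer helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK, so the credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the failed transfer the credit to bob has already been executed when the debit is rejected, yet the rollback leaves both balances as they were.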
Two-phase commit
Two-phase commit is a protocol used to ensure atomicity and consistency in distributed transactions, often used in cloud computing and distributed systems.
Two-phase locking
A concurrency control mechanism that ensures database transactions are executed in two phases: a growing phase where locks are acquired, and a shrinking phase where locks are released, to prevent data inconsistencies.
Upsert
A database operation that combines the insert and update operations, allowing you to add a new record if it doesn’t exist, or update an existing record if it does.
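A minimal sketch of an upsert using SQLite's INSERT ... ON CONFLICT ... DO UPDATE clause through Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.24 or later; other databases use different syntax for the same operation. The page_views table and the record_view helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (page TEXT PRIMARY KEY, views INTEGER NOT NULL)"
)

def record_view(conn, page):
    # Insert a new counter, or increment the existing one if the page is known.
    conn.execute(
        "INSERT INTO page_views (page, views) VALUES (?, 1) "
        "ON CONFLICT(page) DO UPDATE SET views = views + 1",
        (page,),
    )

for page in ["/home", "/docs", "/home"]:
    record_view(conn, page)

counts = dict(conn.execute("SELECT page, views FROM page_views"))
print(counts)
```

Doing this in one statement avoids the race between a separate existence check and the following insert or update.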
Value
A single piece of data stored in a database, such as a number, text, or date, that represents a specific attribute or characteristic of an entity.
Vertical scaling
A method of increasing a database’s capacity by adding more resources, such as CPU or memory, to a single server, to improve performance and handle growing workloads.
Vertices
In graph databases, vertices (or nodes) represent entities or objects, connected by edges that define relationships between them, forming a graph structure.
View
A virtual table based on the result of a query, providing a simplified and organized way to access and manipulate data, without physically storing the data.
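For illustration, the sketch below defines a view over a hypothetical orders table using Python's built-in sqlite3 module, and shows that the view reflects new rows without being refreshed, since its query runs each time it is read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY, customer TEXT, total REAL, status TEXT
    );
    INSERT INTO orders VALUES
        (1, 'Ada', 20.0, 'paid'),
        (2, 'Ada', 5.0, 'cancelled'),
        (3, 'Bo', 12.5, 'paid');

    -- The view stores only the query, not its result.
    CREATE VIEW paid_totals AS
        SELECT customer, SUM(total) AS paid_total
        FROM orders
        WHERE status = 'paid'
        GROUP BY customer;
""")

before = conn.execute("SELECT * FROM paid_totals ORDER BY customer").fetchall()
conn.execute("INSERT INTO orders VALUES (4, 'Bo', 7.5, 'paid')")
after = conn.execute("SELECT * FROM paid_totals ORDER BY customer").fetchall()
print(before)
print(after)
```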
Volatile storage
A type of storage that loses its data when power is turned off, such as RAM, which is used to temporarily store data during database operations.
Wide-column store
A type of NoSQL database that stores data in a column-family format, optimized for large amounts of data and high-performance queries.
Write-ahead logging (WAL)
A technique used to ensure atomicity and durability by recording every change in an append-only log on persistent storage before applying it to the database files, so that after a failure the log can be replayed to recover committed changes.
Wildcard
A special character or symbol used in search queries to represent unknown or variable characters, allowing for flexible and pattern-based searching.
Weight (search)
A numerical value assigned to a search result, indicating its relevance or importance, used to rank and prioritize search results.
Write-around caching
A caching strategy that writes data directly to the underlying storage, bypassing the cache, so that the cache is not filled with data that may never be read. The first read after a write is a cache miss.
Write-back caching
A caching strategy that writes data to the cache first, and then lazily writes it to the underlying storage, improving performance but risking data loss in case of a failure.
Write operation
A database operation that modifies or inserts new data into the database, such as an insert, update, or delete operation.
Write-through caching
A caching strategy that writes data to both the cache and the underlying storage as part of the same operation, keeping the two consistent at the cost of slower writes than write-back caching.
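A minimal sketch of the idea, with a plain dictionary standing in for the underlying storage. The class name WriteThroughCache and its methods are chosen for the example and do not refer to any particular library.

```python
class WriteThroughCache:
    def __init__(self, store):
        self.store = store   # stand-in for the underlying storage
        self.cache = {}

    def write(self, key, value):
        # The write is complete only after both copies have been updated.
        self.store[key] = value
        self.cache[key] = value

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store[key]   # raises KeyError if the key does not exist
        self.cache[key] = value
        return value

backing_store = {"old": "from storage"}
wt_cache = WriteThroughCache(backing_store)
wt_cache.write("k", "v1")
print(backing_store["k"], wt_cache.read("k"), wt_cache.read("old"))
```

Because storage is updated on every write, a cache failure loses no data, which is the main contrast with write-back caching.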
XML (eXtensible Markup Language)
A markup language used to store and transport data in a format that is both human-readable and machine-readable, often used for data exchange and integration.
XQuery
A query language used to retrieve and manipulate data stored in XML format, providing a flexible and powerful way to extract and transform data.
Zero-day attack
A cyber attack that exploits a previously unknown vulnerability in a database or system, often before a patch or fix is available, making it challenging to defend against.