
Data Management Glossary of Terms

Algorithm - A set of instructions or rules designed to perform a specific task or solve a problem.

Analytics - The systematic computational analysis of data, often used to discover insights and support decision-making.
 
Artificial Intelligence (AI) - The simulation of human intelligence in machines programmed to think, learn, and adapt like humans.  
 
Big Data - Vast data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.  
 
Bias (in AI) - A tendency in AI models to produce systematically prejudiced results due to incorrect assumptions in the machine learning process.  
 
Clustering - A machine learning technique that groups data points with similar characteristics.  
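
As a concrete illustration, here is a minimal sketch of k-means, one common clustering technique, on one-dimensional points. The function name and data are hypothetical and chosen purely for illustration:

```python
# A minimal 1-D k-means sketch (illustrative, not production code).
def kmeans_1d(points, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: attach each point to its nearest center.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans_1d(data, centers=[0.0, 10.0]))  # two clusters, near 1 and 9
```

Real clustering libraries add distance metrics, convergence checks, and smarter initialization, but the assign-then-update loop above is the core idea.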
 
Cloud Computing - The delivery of computing services (servers, storage, databases, networking, software, etc.) over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale.
 
Data Analytics - The science of analyzing raw data to draw conclusions from that information, often with the help of software and algorithms.
 
Data Architecture - A set of standards and models used to organize, manage, and store data to support an organization's needs, including the design of databases and data flows.  
 
Data Cleansing - Identifying and correcting (or removing) inaccurate records from a database to ensure data quality.  
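
A small sketch of typical cleansing steps, using hypothetical records: trimming whitespace, normalizing case, and dropping duplicate or incomplete rows:

```python
# Hypothetical raw records with whitespace, a duplicate id, and a missing value.
raw = [
    {"id": 1, "email": " ada@example.com "},
    {"id": 1, "email": "ada@example.com"},   # duplicate id: drop
    {"id": 2, "email": "GRACE@EXAMPLE.COM"},
    {"id": 3, "email": ""},                  # missing email: drop
]

def cleanse(records):
    seen, clean = set(), []
    for r in records:
        email = r["email"].strip().lower()   # correct formatting issues
        if not email or r["id"] in seen:     # remove incomplete or duplicate rows
            continue
        seen.add(r["id"])
        clean.append({"id": r["id"], "email": email})
    return clean

print(cleanse(raw))
```

Production cleansing pipelines also validate formats, standardize codes, and log what was changed, but the pattern is the same: detect, correct, or remove.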
 
Data Culture - The collective mindset and behaviors within an organization that promote data-driven decision-making and the strategic use of data as a valuable asset.  
 
Data Governance - The overall management of data availability, usability, integrity, and security in an organization.  
 
Data Lake - An extensive repository of raw, unstructured, and structured data that can be processed and analyzed for various purposes.  
 
Data Management - The practice of collecting, keeping, and using data securely, efficiently, and cost-effectively, ensuring data quality and compliance.  
 
Data Mart - A subset of a data warehouse designed to focus on specific business functions or departments, providing faster access to relevant data.  
 
Data Mining - The process of discovering patterns and knowledge from large amounts of data using statistical, machine learning, and AI techniques.  
 
Data Science - An interdisciplinary field that uses scientific methods, algorithms, and systems to extract insights and knowledge from structured and unstructured data.  
 
Data Warehouse - A centralized repository for structured data that is used for reporting and analysis, typically involving ETL processes and star or snowflake schemas.  
 
Deep Learning - A subset of machine learning involving neural networks with many layers, typically used for complex pattern recognition tasks like image and speech recognition.  
 
Dimension - A structure in a star schema or snowflake schema that categorizes facts and measures to enable users to answer business questions. Dimensions often contain descriptive information about the data.  
 
ETL (Extract, Transform, Load) - A process used in data warehousing to extract data from various sources, transform it into a suitable format, and load it into a destination database.  
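
The three stages can be sketched end to end with the Python standard library; the CSV source and table names here are hypothetical:

```python
import csv, io, sqlite3

# Extract: read rows from a CSV source (an in-memory string for illustration).
source = io.StringIO("name,amount\nwidget,10\ngadget,5\n")
rows = list(csv.DictReader(source))

# Transform: cast amounts to integers and normalize names to uppercase.
rows = [(r["name"].upper(), int(r["amount"])) for r in rows]

# Load: insert the transformed rows into the destination database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
print(db.execute("SELECT SUM(amount) FROM sales").fetchone()[0])  # 15
```

In practice the extract step reads from files, APIs, or operational databases, and the load targets a data warehouse, but the E-T-L sequence is unchanged.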
 
Fact Table - A central table in a star schema or snowflake schema of a data warehouse that stores quantitative data for analysis and is surrounded by dimension tables.  
 
Foreign Key (FK) - A field (or collection of fields) in one table that refers to the primary key of another table, creating a relationship between the two tables.
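
A sketch of the relationship using SQLite (table names are hypothetical); note that SQLite only enforces foreign keys when the pragma is enabled:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id))""")

db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.execute("INSERT INTO orders VALUES (100, 1)")        # OK: customer 1 exists
try:
    db.execute("INSERT INTO orders VALUES (101, 999)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The constraint keeps every order pointing at a real customer, which is the relationship the definition describes.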
 
Index - A database structure that improves the speed of data retrieval on a table or view by providing quick lookup capabilities.  
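
A sketch in SQLite showing an index changing a lookup from a full scan to a seek (table and index names are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER, ts TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(i, f"2024-01-{i % 28 + 1:02d}") for i in range(1000)])

# Without an index this lookup scans every row; with one, it is a quick seek.
db.execute("CREATE INDEX idx_events_ts ON events (ts)")
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE ts = '2024-01-05'"
).fetchone()
print(plan)  # the plan should mention idx_events_ts rather than a full scan
```

Indexes speed reads at the cost of extra storage and slower writes, which is why tables are indexed selectively rather than on every column.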
 
Internet of Things (IoT) - A network of physical objects (devices, vehicles, buildings) embedded with sensors, software, and other technologies to connect and exchange data with other devices and systems over the internet.  
 
Join (SQL) - An SQL operation that combines rows from two or more tables based on a related column between them.  
 
Key-Value Store - A type of NoSQL database that stores data as key-value pairs, where each key is unique, and its associated value can be a string, JSON object, or other types.  
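
A Python dict behaves like an in-memory key-value store and makes the access pattern concrete (the keys and values here are hypothetical):

```python
# Unique keys mapping to arbitrary values: strings, nested objects, lists, ...
store = {}
store["user:1"] = {"name": "Ada", "roles": ["admin"]}  # put
store["session:abc"] = "2024-01-01T00:00:00Z"

value = store.get("user:1")                            # get by key
print(value["name"])                                   # Ada
store.pop("session:abc", None)                         # delete
print("session:abc" in store)                          # False
```

Dedicated key-value databases (Redis, DynamoDB, and similar systems) add persistence, replication, and expiry, but expose essentially this same get/put/delete interface.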
 
Machine Learning - A branch of AI that involves training algorithms to learn from and make predictions or decisions based on data.  
 
Metadata - Data that provides information about other data, such as details about data origin, structure, or context.  
 
Neural Network - A series of algorithms, loosely modeled on the structure of the human brain, that attempts to recognize underlying relationships in a set of data by passing it through layers of interconnected nodes.  

Normalization - A process in database design that organizes columns and tables of a database to reduce data redundancy and improve data integrity.  
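
A sketch in SQLite of the payoff: instead of repeating the department name on every employee row, the name lives once in its own table and employees store only the key, so an update touches exactly one row (the schema is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                        dept_id INTEGER REFERENCES departments(id));
INSERT INTO departments VALUES (1, 'Engineering');
INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 1);
-- Renaming the department now updates exactly one row:
UPDATE departments SET name = 'R&D' WHERE id = 1;
""")

rows = db.execute("""
    SELECT e.name, d.name FROM employees e
    JOIN departments d ON e.dept_id = d.id ORDER BY e.id
""").fetchall()
print(rows)  # [('Ada', 'R&D'), ('Grace', 'R&D')]
```

In a denormalized design the rename would have to touch every employee row, and a missed row would leave the data inconsistent; that is the redundancy and integrity problem normalization addresses.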
 
OLAP (Online Analytical Processing) - A category of software tools that provide analysis of data stored in a database, often used in business intelligence applications for multidimensional queries.  
 
Overfitting - A modeling error in machine learning where a model is too closely fitted to the specific data set, capturing noise instead of underlying patterns, and performs poorly on new data.  
 
Primary Key (PK) - A unique identifier for a row within a table. Each table should have one primary key, which cannot contain NULL values.  
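
Both guarantees, uniqueness and no NULLs, can be seen directly in SQLite (table and key values are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# id is the primary key: a unique, non-NULL identifier for each row.
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY NOT NULL, name TEXT)")
db.execute("INSERT INTO users VALUES ('u1', 'Ada')")

try:
    db.execute("INSERT INTO users VALUES ('u1', 'Duplicate')")  # same key
except sqlite3.IntegrityError as e:
    print("rejected:", e)

try:
    db.execute("INSERT INTO users (name) VALUES ('No key')")    # NULL key
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Only the first row survives; the database itself, not application code, enforces the key's guarantees.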
 
Predictive Analytics - The use of historical data, statistical algorithms, and machine learning techniques to predict future outcomes.  
 
Privacy by Design - An approach that ensures privacy and data protection are built into systems and processes from the outset.  
 
Query - A request for data or information from a database, often written in SQL (Structured Query Language).  
 
Regression - A statistical method used in data science and machine learning to model and analyze the relationships between variables.  
 
Snowflake Schema - A type of database schema used in data warehousing, where dimension tables are normalized, splitting data into additional tables to reduce redundancy.  
 
Star Schema - A type of database schema used in data warehousing where a central fact table is surrounded by dimension tables, creating a star-like structure.  
 
Structured Data - Data that is organized in a predefined format (e.g., rows and columns in databases), making it easy to search and analyze.  
 
Supervised Learning - A type of machine learning where a model is trained on labeled data (data that contains the correct output) and learns to predict outputs from new input data.  
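
A minimal supervised learner is the 1-nearest-neighbour classifier: memorize labeled examples, then label a new point like its closest neighbour. The training points and labels below are hypothetical:

```python
# Labeled training data: 2-D points with known class labels.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]

def predict(point):
    # Predict the label of the closest training example (squared distance).
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    _, label = min(train, key=lambda ex: dist2(ex[0], point))
    return label

print(predict((1.1, 0.9)))  # A
print(predict((5.1, 4.9)))  # B
```

Everything about supervised learning is visible here in miniature: labeled inputs at training time, and predictions for new, unlabeled inputs afterwards.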
 
Text Mining - The process of deriving meaningful information from text data, often using techniques such as natural language processing (NLP).  
 
Training Data - The dataset used to train a machine learning algorithm to make accurate predictions or decisions.  
 
Unstructured Data - Data that is not organized in a predefined way, such as text, images, and videos, making it harder to analyze.  
