
Data Management Glossary of Terms

Algorithm - A set of instructions or rules designed to perform a specific task or solve a problem.

Analytics - The systematic computational analysis of data, often used to discover insights and support decision-making.
 
Artificial Intelligence (AI) - The simulation of human intelligence in machines programmed to think, learn, and adapt like humans.  
 
Big Data - Vast data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.  
 
Bias (in AI) - A tendency in AI models to produce systematically prejudiced results due to incorrect assumptions in the machine learning process.  
 
Clustering - A machine learning technique that groups data points with similar characteristics.  
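
As a concrete illustration, here is a minimal sketch of k-means, one common clustering technique, on one-dimensional points. The function name and data are hypothetical and chosen purely for illustration:

```python
# A minimal 1-D k-means sketch (illustrative, not production code).
def kmeans_1d(points, centers, rounds=10):
    for _ in range(rounds):
        # Assignment step: attach each point to its nearest center.
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centers)

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
print(kmeans_1d(data, centers=[0.0, 10.0]))  # two clusters, near 1 and 9
```

Real clustering libraries add distance metrics, convergence checks, and smarter initialization, but the assign-then-update loop above is the core idea.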
 
Cloud Computing - The delivery of computing services (servers, storage, databases, networking, software, etc.) over the Internet ("the cloud") to offer faster innovation, flexible resources, and economies of scale.
 
Data Analytics - The science of analyzing raw data to draw conclusions from that information, often with the help of software and algorithms.
 
Data Architecture - A set of standards and models used to organize, manage, and store data to support an organization's needs, including the design of databases and data flows.  
 
Data Cleansing - Identifying and correcting (or removing) inaccurate records from a database to ensure data quality.  
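
A small sketch of typical cleansing steps, using hypothetical records: trimming whitespace, normalizing case, and dropping duplicate or incomplete rows:

```python
# Hypothetical raw records with whitespace, a duplicate id, and a missing value.
raw = [
    {"id": 1, "email": " ada@example.com "},
    {"id": 1, "email": "ada@example.com"},   # duplicate id: drop
    {"id": 2, "email": "GRACE@EXAMPLE.COM"},
    {"id": 3, "email": ""},                  # missing email: drop
]

def cleanse(records):
    seen, clean = set(), []
    for r in records:
        email = r["email"].strip().lower()   # correct formatting issues
        if not email or r["id"] in seen:     # remove incomplete or duplicate rows
            continue
        seen.add(r["id"])
        clean.append({"id": r["id"], "email": email})
    return clean

print(cleanse(raw))
```

Production cleansing pipelines also validate formats, standardize codes, and log what was changed, but the pattern is the same: detect, correct, or remove.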
 
Data Culture - The collective mindset and behaviors within an organization that promote data-driven decision-making and the strategic use of data as a valuable asset.  
 
Data Governance - The overall management of data availability, usability, integrity, and security in an organization.  
 
Data Lake - An extensive repository of raw, unstructured, and structured data that can be processed and analyzed for various purposes.  
 
Data Management - The practice of collecting, keeping, and using data securely, efficiently, and cost-effectively, ensuring data quality and compliance.  
 
Data Mart - A subset of a data warehouse designed to focus on specific business functions or departments, providing faster access to relevant data.  
 
Data Mining - The process of discovering patterns and knowledge from large amounts of data using statistical, machine learning, and AI techniques.  
 
Data Science - An interdisciplinary field that uses scientific methods, algorithms, and systems to extract insights and knowledge from structured and unstructured data.  
 
Data Warehouse - A centralized repository for structured data that is used for reporting and analysis, typically involving ETL processes and star or snowflake schemas.  
 
Deep Learning - A subset of machine learning involving neural networks with many layers, typically used for complex pattern recognition tasks like image and speech recognition.  
 
Dimension - A structure in a star schema or snowflake schema that categorizes facts and measures to enable users to answer business questions. Dimensions often contain descriptive information about the data.  
 
ETL (Extract, Transform, Load) - A process used in data warehousing to extract data from various sources, transform it into a suitable format, and load it into a destination database.  
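
The three stages can be sketched end to end with the Python standard library; the CSV source and table names here are hypothetical:

```python
import csv, io, sqlite3

# Extract: read rows from a CSV source (an in-memory string for illustration).
source = io.StringIO("name,amount\nwidget,10\ngadget,5\n")
rows = list(csv.DictReader(source))

# Transform: cast amounts to integers and normalize names to uppercase.
rows = [(r["name"].upper(), int(r["amount"])) for r in rows]

# Load: insert the transformed rows into the destination database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (name TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
print(db.execute("SELECT SUM(amount) FROM sales").fetchone()[0])  # 15
```

In practice the extract step reads from files, APIs, or operational databases, and the load targets a data warehouse, but the E-T-L sequence is unchanged.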
 
Fact Table - A central table in a star schema or snowflake schema of a data warehouse that stores quantitative data for analysis and is surrounded by dimension tables.  
 
Foreign Key (FK) - A field (or collection of fields) in one table that refers to the primary key of another table, creating a relationship between the two tables.
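
A sketch of the relationship using SQLite (table names are hypothetical); note that SQLite only enforces foreign keys when the pragma is enabled:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id))""")

db.execute("INSERT INTO customers VALUES (1, 'Ada')")
db.execute("INSERT INTO orders VALUES (100, 1)")        # OK: customer 1 exists
try:
    db.execute("INSERT INTO orders VALUES (101, 999)")  # no such customer
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The constraint keeps every order pointing at a real customer, which is the relationship the definition describes.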
 
Index - A database structure that improves the speed of data retrieval on a table or view by providing quick lookup capabilities.  
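
A sketch in SQLite showing an index changing a lookup from a full scan to a seek (table and index names are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER, ts TEXT)")
db.executemany("INSERT INTO events VALUES (?, ?)",
               [(i, f"2024-01-{i % 28 + 1:02d}") for i in range(1000)])

# Without an index this lookup scans every row; with one, it is a quick seek.
db.execute("CREATE INDEX idx_events_ts ON events (ts)")
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE ts = '2024-01-05'"
).fetchone()
print(plan)  # the plan should mention idx_events_ts rather than a full scan
```

Indexes speed reads at the cost of extra storage and slower writes, which is why tables are indexed selectively rather than on every column.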
 
Internet of Things (IoT) - A network of physical objects (devices, vehicles, buildings) embedded with sensors, software, and other technologies to connect and exchange data with other devices and systems over the internet.  
 
Join (SQL) - An SQL operation that combines rows from two or more tables based on a related column between them.  
 
Key-Value Store - A type of NoSQL database that stores data as key-value pairs, where each key is unique, and its associated value can be a string, JSON object, or other types.  
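
A Python dict behaves like an in-memory key-value store and makes the access pattern concrete (the keys and values here are hypothetical):

```python
# Unique keys mapping to arbitrary values: strings, nested objects, lists, ...
store = {}
store["user:1"] = {"name": "Ada", "roles": ["admin"]}  # put
store["session:abc"] = "2024-01-01T00:00:00Z"

value = store.get("user:1")                            # get by key
print(value["name"])                                   # Ada
store.pop("session:abc", None)                         # delete
print("session:abc" in store)                          # False
```

Dedicated key-value databases (Redis, DynamoDB, and similar systems) add persistence, replication, and expiry, but expose essentially this same get/put/delete interface.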
 
Machine Learning - A branch of AI that involves training algorithms to learn from and make predictions or decisions based on data.  
 
Metadata - Data that provides information about other data, such as details about data origin, structure, or context.  
 
Neural Network - A series of algorithms, loosely modeled on the structure of the human brain, that attempts to recognize underlying relationships in a set of data by passing it through layers of interconnected nodes.  

Normalization - A process in database design that organizes columns and tables of a database to reduce data redundancy and improve data integrity.  
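
A sketch in SQLite of the payoff: instead of repeating the department name on every employee row, the name lives once in its own table and employees store only the key, so an update touches exactly one row (the schema is hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                        dept_id INTEGER REFERENCES departments(id));
INSERT INTO departments VALUES (1, 'Engineering');
INSERT INTO employees VALUES (1, 'Ada', 1), (2, 'Grace', 1);
-- Renaming the department now updates exactly one row:
UPDATE departments SET name = 'R&D' WHERE id = 1;
""")

rows = db.execute("""
    SELECT e.name, d.name FROM employees e
    JOIN departments d ON e.dept_id = d.id ORDER BY e.id
""").fetchall()
print(rows)  # [('Ada', 'R&D'), ('Grace', 'R&D')]
```

In a denormalized design the rename would have to touch every employee row, and a missed row would leave the data inconsistent; that is the redundancy and integrity problem normalization addresses.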
 
OLAP (Online Analytical Processing) - A category of software tools that provide analysis of data stored in a database, often used in business intelligence applications for multidimensional queries.  
 
Overfitting - A modeling error in machine learning where a model is too closely fitted to the specific data set, capturing noise instead of underlying patterns, and performs poorly on new data.  
 
Primary Key (PK) - A unique identifier for a row within a table. Each table should have one primary key, which cannot contain NULL values.  
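
Both guarantees, uniqueness and no NULLs, can be seen directly in SQLite (table and key values are hypothetical):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# id is the primary key: a unique, non-NULL identifier for each row.
db.execute("CREATE TABLE users (id TEXT PRIMARY KEY NOT NULL, name TEXT)")
db.execute("INSERT INTO users VALUES ('u1', 'Ada')")

try:
    db.execute("INSERT INTO users VALUES ('u1', 'Duplicate')")  # same key
except sqlite3.IntegrityError as e:
    print("rejected:", e)

try:
    db.execute("INSERT INTO users (name) VALUES ('No key')")    # NULL key
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Only the first row survives; the database itself, not application code, enforces the key's guarantees.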
 
Predictive Analytics - The use of historical data, statistical algorithms, and machine learning techniques to predict future outcomes.  
 
Privacy by Design - An approach that ensures privacy and data protection are built into systems and processes from the outset.  
 
Query - A request for data or information from a database, often written in SQL (Structured Query Language).  
 
Regression - A statistical method used in data science and machine learning to model and analyze the relationships between variables.  
 
Snowflake Schema - A type of database schema used in data warehousing, where dimension tables are normalized, splitting data into additional tables to reduce redundancy.  
 
Star Schema - A type of database schema used in data warehousing where a central fact table is surrounded by dimension tables, creating a star-like structure.  
 
Structured Data - Data that is organized in a predefined format (e.g., rows and columns in databases), making it easy to search and analyze.  
 
Supervised Learning - A type of machine learning where a model is trained on labeled data (data that contains the correct output) and learns to predict outputs from new input data.  
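
A minimal supervised learner is the 1-nearest-neighbour classifier: memorize labeled examples, then label a new point like its closest neighbour. The training points and labels below are hypothetical:

```python
# Labeled training data: 2-D points with known class labels.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((4.8, 5.2), "B")]

def predict(point):
    # Predict the label of the closest training example (squared distance).
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    _, label = min(train, key=lambda ex: dist2(ex[0], point))
    return label

print(predict((1.1, 0.9)))  # A
print(predict((5.1, 4.9)))  # B
```

Everything about supervised learning is visible here in miniature: labeled inputs at training time, and predictions for new, unlabeled inputs afterwards.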
 
Text Mining - The process of deriving meaningful information from text data, often using techniques such as natural language processing (NLP).  
 
Training Data - The dataset used to train a machine learning algorithm to make accurate predictions or decisions.  
 
Unstructured Data - Data that is not organized in a predefined way, such as text, images, and videos, making it harder to analyze.  
