Data Lake
A data lake is a storage system that holds large volumes of raw data in its native format, whether structured or unstructured, until it is needed. It complements a data warehouse by keeping that raw data available cheaply and flexibly for analytics, machine learning, and AI.
Where a warehouse stores cleaned, structured data for fast querying, a lake keeps raw events and varied formats, which is useful for data science and model training.
Many organizations run both: raw data goes in the lake, and resolved, modeled data goes in the warehouse, so they keep full history alongside a clean, queryable customer view.
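As a rough illustration of running a lake and a warehouse side by side, here is a minimal Python sketch. Everything in it is an assumption made for the example: a local folder of JSON Lines files stands in for the lake, an in-memory SQLite database stands in for the warehouse, and the event fields, the customer_spend table, and the function names are hypothetical. Real deployments typically use object storage and a dedicated warehouse engine.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical raw events, kept exactly as they arrived (fields vary per event).
RAW_EVENTS = [
    {"type": "signup", "email": "Ana@Example.com", "ts": "2024-01-05T10:00:00Z"},
    {"type": "purchase", "email": "ana@example.com", "amount": 40.0, "ts": "2024-01-06T12:30:00Z"},
    {"type": "purchase", "email": "bo@example.com", "amount": 15.5, "ts": "2024-01-07T09:15:00Z"},
    {"type": "page_view", "url": "/pricing", "ts": "2024-01-07T09:16:00Z"},
]


def write_to_lake(lake_dir: Path, events: list) -> Path:
    """Store every event untouched, one JSON object per line (the 'lake')."""
    lake_dir.mkdir(parents=True, exist_ok=True)
    path = lake_dir / "events.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return path


def load_warehouse(raw_path: Path) -> sqlite3.Connection:
    """Build a cleaned, modeled table from the raw file (SQLite as the 'warehouse')."""
    totals = {}
    with raw_path.open(encoding="utf-8") as f:
        for line in f:
            event = json.loads(line)
            if event.get("type") != "purchase":
                continue  # other event types remain available in the lake only
            email = event["email"].strip().lower()  # normalize the customer key
            totals[email] = totals.get(email, 0.0) + event["amount"]

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_spend (email TEXT PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customer_spend (email, total) VALUES (?, ?)", totals.items()
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw_path = write_to_lake(Path(tmp) / "lake", RAW_EVENTS)
        conn = load_warehouse(raw_path)
        query = "SELECT email, total FROM customer_spend ORDER BY email"
        for row in conn.execute(query):
            print(row)
        conn.close()
```

The raw file still holds all four events, including the signup and page view that the modeled table ignores, so a later analysis or model can go back to them without re-collecting anything.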