Modern Data Lake Workshop

Architecture, Analytics, and Real-World Use Cases

Course ID: 42791
Date: 01/12/2025
Time: Daily seminar, 9:00-16:30
Location: John Bryce ECO Tower, Homa Umigdal 29, Tel Aviv

Overview

In this hands-on workshop, you’ll learn the fundamentals of data lakes and why so many organizations are adopting them. We’ll cover how data lakes work, how they compare to traditional databases and big data tools, and what makes them powerful. You’ll build your own data lake from the ground up using an object store, a metastore, a query engine, and analytics tools. With the query engine, you’ll explore and manipulate your data to understand how it flows through the system. We’ll also introduce analytics tools through real-world big data use cases that you can try on your own datasets. As we go deeper, you’ll learn about advanced topics such as Apache Iceberg tables for handling updates and deletes, along with key aspects of managing a data lake: security, best practices, and cost control.

Who Should Attend

  • Data Engineers: Focused on data ingestion, transformation, and management.
  • Developers: Integrating applications with data lakes via APIs and SDKs.
  • Database Administrators (DBAs): Adding a new technology stack or migrating existing databases to a data lake.

Course Contents

Foundations of Data Lakes

  • What is a data lake?
  • Data lakes vs. traditional databases: key differences
  • Why data lakes? Benefits and common use cases
  • Data lake architecture overview
  • Data formats in data lakes: Intro to Apache Parquet

Hands-On Workshop Setup

  • Introduction to Docker
    • Key concepts: containers, images, and tags
    • Essential Docker commands
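
A taste of the commands this module covers, using the workshop’s MinIO image as the example (any image works the same way):

    # Download an image; the tag after ":" pins a version ("latest" is the default)
    docker pull minio/minio:latest

    # Start a detached container, mapping host port 9000 to the container's 9000
    docker run -d --name storage -p 9000:9000 minio/minio server /data

    # List running containers, then locally stored images
    docker ps
    docker images

    # Stop and remove the container when you are done
    docker stop storage && docker rm storage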

Object Store Integration

  • Set up your own object store with MinIO
  • Load data into the object store
  • Explore and browse stored data
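
A minimal sketch of this module’s steps; the credentials, bucket name (datalake), and file name (sales.csv) are placeholders you’ll replace with your own:

    # Run MinIO with its web console on port 9001 (placeholder credentials)
    docker run -d --name minio -p 9000:9000 -p 9001:9001 \
      -e MINIO_ROOT_USER=admin -e MINIO_ROOT_PASSWORD=admin12345 \
      minio/minio server /data --console-address ":9001"

    # Point the MinIO client (mc) at the server, create a bucket, upload a file
    mc alias set local http://localhost:9000 admin admin12345
    mc mb local/datalake
    mc cp sales.csv local/datalake/tier1/sales.csv

    # Browse what landed in the object store
    mc ls --recursive local/datalake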

Query Engine & Metastore

  • Overview: what are a metastore and a query engine?
  • Deploy Hive Metastore and Trino
  • Create and query tables using Trino CLI
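
A sketch of the deployment, assuming the apache/hive image’s standalone-metastore mode, a shared Docker network so the containers can reach each other by name, and a hive catalog file you write yourself (property names can vary between Trino versions):

    # Standalone Hive Metastore on its default port
    docker run -d --name metastore -p 9083:9083 \
      -e SERVICE_NAME=metastore apache/hive:4.0.0

    # catalog/hive.properties (assumed contents: point Trino at the metastore
    # and at MinIO, plus your MinIO access keys):
    #   connector.name=hive
    #   hive.metastore.uri=thrift://metastore:9083
    #   hive.s3.endpoint=http://minio:9000
    #   hive.s3.path-style-access=true

    # Trino with that catalog directory mounted in place
    docker run -d --name trino -p 8080:8080 \
      -v "$PWD/catalog:/etc/trino/catalog" trinodb/trino

    # Open the Trino CLI and run a first statement
    docker exec -it trino trino
    trino> CREATE SCHEMA hive.workshop WITH (location = 's3a://datalake/workshop/');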

Data Transformation & Optimization

  • Convert CSV to Parquet using Trino (Tier 1 → Tier 2)
  • Generate Tier 3 data for a use case
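
The shape of this transformation in Trino SQL, run from the CLI opened above; table, column, and bucket names are illustrative (the Hive connector reads CSV columns as varchar, hence the casts):

    -- Tier 1: expose the raw CSV files already sitting in the object store
    CREATE TABLE hive.workshop.sales_raw (order_id varchar, amount varchar, ts varchar)
    WITH (format = 'CSV', external_location = 's3a://datalake/tier1/');

    -- Tier 2: rewrite them as typed, columnar Parquet in one CTAS statement
    CREATE TABLE hive.workshop.sales WITH (format = 'PARQUET') AS
    SELECT CAST(order_id AS bigint)    AS order_id,
           CAST(amount AS double)      AS amount,
           from_iso8601_timestamp(ts)  AS ts
    FROM hive.workshop.sales_raw;

    -- Tier 3: a small aggregate shaped for one reporting use case
    CREATE TABLE hive.workshop.daily_revenue WITH (format = 'PARQUET') AS
    SELECT date(ts) AS day, sum(amount) AS revenue
    FROM hive.workshop.sales
    GROUP BY date(ts);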

Data Analytics

  • Connect your own analytics tool (e.g., Apache Zeppelin) to the data lake
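
One way to wire Zeppelin up, assuming a recent apache/zeppelin image and Docker Desktop’s host.docker.internal hostname; the properties shown are Zeppelin’s standard JDBC interpreter settings, and the Trino JDBC driver (io.trino:trino-jdbc) must be added as an interpreter dependency:

    # Zeppelin's UI also wants port 8080, so map it to 8082 on the host
    docker run -d --name zeppelin -p 8082:8080 apache/zeppelin:0.11.1

    # In the Zeppelin UI, configure the generic JDBC interpreter:
    #   default.driver = io.trino.jdbc.TrinoDriver
    #   default.url    = jdbc:trino://host.docker.internal:8080/hive/workshop
    #   default.user   = workshop

    # A notebook paragraph can then query the lake directly:
    #   %jdbc
    #   SELECT day, revenue FROM daily_revenue ORDER BY day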

Advanced Topics

  • Apache Iceberg: table format with full CRUD support
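
What “full CRUD” looks like in practice, sketched in Trino SQL against an assumed iceberg catalog (configured like the hive one, but with connector.name=iceberg):

    -- Create an Iceberg table; the data files are still Parquet underneath
    CREATE TABLE iceberg.workshop.customers (id bigint, email varchar, tier varchar)
    WITH (format = 'PARQUET');

    INSERT INTO iceberg.workshop.customers VALUES (1, 'a@example.com', 'free');

    -- Row-level updates and deletes, which plain Hive/Parquet tables lack:
    -- Iceberg records the changes in new files and metadata snapshots
    UPDATE iceberg.workshop.customers SET tier = 'pro' WHERE id = 1;
    DELETE FROM iceberg.workshop.customers WHERE tier = 'free';

    -- Every change is a snapshot, so you can inspect the table's history
    SELECT snapshot_id, committed_at
    FROM iceberg.workshop."customers$snapshots";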

Wrap-Up

  • Summary of key learnings
  • Resources for continuing your data lake journey
