How do I stay up to date with the latest features in Data Lake Stack?

Use feature.delivery to track releases from 19 Data Lake Stack repositories in one chronological view. Simply select the repositories you want to monitor and get automatic updates when new features are released.

What's new in Data Lake Stack?

Stay informed about the latest Data Lake Stack updates by monitoring releases from key repositories, including Apache Spark, Apache Hive, Hadoop Common, and more.

How do I track the latest features in the Data Lake Stack?

feature.delivery consolidates releases from multiple GitHub repositories into a single timeline, making it easy to track new features, bug fixes, and updates across the entire Data Lake Stack ecosystem.

https://feature.delivery

?l=

Use this link to track the latest updates across the 19 repositories in the Data Lake Stack.

Staying up to date with the latest features of the Data Lake Stack in 2026

How does it work?

feature.delivery is a free, web-based platform that helps developers track the latest releases from multiple GitHub repositories in one chronological view. By centralizing release information across tools, libraries, and frameworks, it lets you follow updates throughout your development stack without checking each repository separately.
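To make the idea of a single chronological view concrete, here is a minimal sketch of how releases from several repositories could be merged into one timeline. It is not feature.delivery's implementation, which this page does not describe. It assumes the public GitHub REST API endpoint for listing releases, and the function names fetch_releases and merge_timeline are hypothetical.

```python
import json
from datetime import datetime
from urllib.request import Request, urlopen

GITHUB_API = "https://api.github.com"


def fetch_releases(repo):
    """Fetch recent releases for an 'owner/name' repo from the GitHub REST API.

    Hypothetical helper: unauthenticated requests are rate-limited by GitHub.
    """
    req = Request(
        f"{GITHUB_API}/repos/{repo}/releases?per_page=10",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)


def merge_timeline(releases_by_repo):
    """Flatten {repo: [release, ...]} into one list sorted newest first."""
    timeline = []
    for repo, releases in releases_by_repo.items():
        for rel in releases:
            published = rel.get("published_at")
            if not published:  # draft releases carry no publication date
                continue
            timeline.append({
                "repo": repo,
                "tag": rel["tag_name"],
                # GitHub timestamps look like 2024-01-02T03:04:05Z
                "published_at": datetime.fromisoformat(
                    published.replace("Z", "+00:00")
                ),
            })
    timeline.sort(key=lambda entry: entry["published_at"], reverse=True)
    return timeline
```

Calling merge_timeline({r: fetch_releases(r) for r in ["apache/airflow", "delta-io/delta"]}) would return entries from both repositories interleaved by publication date. Repositories that publish only Git tags and no GitHub releases would simply contribute no entries under this approach.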

Check out this 1-minute intro video to see it in action.

The Data Lake Stack (AWS S3, Apache Spark, Apache Hive) offers a scalable and cost-effective solution for big data analytics. It combines the durability and scalability of AWS S3, the distributed processing of Apache Spark, and the data warehousing features of Apache Hive, so businesses can store, process, and analyze massive volumes of structured and unstructured data. It supports data ingestion, high-performance querying, and integration with a broad ecosystem of open source tools, making it well suited to data engineering, ETL, and advanced analytics workflows.

Here's a breakdown of the Data Lake Stack by category:

Core Data Lake Libraries

These foundational libraries power the key components of the data lake stack, enabling scalable storage, distributed computation, and efficient querying. They form the backbone of any modern data lake solution.

Apache Spark

apache/spark

Apache Hive

apache/hive

Hadoop Common

apache/hadoop

Cloud Storage Integration

Libraries and connectors that enable seamless interaction between big data processing frameworks and cloud-native storage solutions such as AWS S3.

Hadoop AWS

apache/hadoop/tree/trunk/hadoop-tools/hadoop-aws

s3fs

dask/s3fs

AWS SDK for Java

aws/aws-sdk-java

Data Lake Table Formats

Modern open-source table formats designed for big data analytics, providing ACID transactions, schema evolution, and time travel capabilities.

Apache Hudi

apache/hudi

Apache Iceberg

apache/iceberg

Delta Lake

delta-io/delta

Data Ingestion and ETL

Tools for ingesting, transforming, and loading data into the data lake from a variety of sources, supporting batch and streaming pipelines.

Apache NiFi

apache/nifi

Apache Airflow

apache/airflow

StreamSets Data Collector

streamsets/datacollector

Data Catalog and Metadata Management

Solutions for discovering, cataloging, and governing data assets in the data lake, ensuring data quality and compliance.

Apache Atlas

apache/atlas

Amundsen

amundsen-io/amundsen

Query Engines

High-performance distributed SQL engines for interactive and batch querying of data stored in the data lake.

PrestoDB

prestodb/presto

Trino

trinodb/trino

Data Visualization and Exploration

Open source tools for exploring, visualizing, and analyzing data residing in the data lake, enabling better business insights.

Apache Superset

apache/superset

Redash

getredash/redash

Data Security and Governance

Libraries and frameworks to secure, audit, and govern access to sensitive data in the data lake environment.

Apache Ranger

apache/ranger

Discover the latest features, improvements, and innovations in the Data Lake Stack (AWS S3, Apache Spark, Apache Hive) by visiting the repositories listed above. Open each repository to explore its official releases, documentation, and community contributions.