Tip of the Apache Iceberg

by Jay Cuthrell
Share and discuss on LinkedIn or HN

Fudge Sunday readers will recall my use of songs as inspiration. While the newsletter is not going back to the series format, the lyrics of the Schoolhouse Rock! song "Mother Necessity" are fitting here.

Source: YouTube — watch?v=aBx-ilTzLec

The topic of ever more responsive, effective, and efficient data analytics is not new to the Fudge Sunday newsletter. Today, the communities and companies involved in Hudi, Iceberg, Presto, Spark, Superset, Trino, and many more are producing a steady stream of rapidly evolving cloud native projects and stories.

Shot

Just 132 days ago, Compute.AI was mentioned in Fudge Sunday #214 “Are You Gonna Go Parquet”.

A few readers were curious about how to “kick the tires” and get a demo of Compute.AI.

Chaser

Last week, the first public GitHub repository for Compute.AI appeared. Practitioners can now launch a containerized deployment of the Compute.AI SQL engine.

The Compute.AI team has posted LinkedIn updates covering the announcement itself and a video demo of the Jupyter Notebook integration.

Source: YouTube

There is also a video demo of the Apache Superset integration.

Source: YouTube

As such, it’s time to add to the topics list. 🤓

Getting Informed

When reading about cloud native projects, it is common to see references to the Apache Software Foundation (ASF). In nearly a quarter of a century, the ASF has grown to almost 300 projects.

One such project, Apache Iceberg, began at Netflix in 2017 and was donated to the ASF in 2018. Iceberg is an open table format for huge analytic datasets, and the donation aimed to promote the efficacy and longevity of that modern approach to managing very large tables. The project's rapid rise within the ASF reflects a community focused on making sure Iceberg does a few things very well.

Then, in 2021, a commercial entity known as Tabular was formed and funded to simplify, secure, and streamline the adoption of Iceberg.

Source: Techmeme — Tabular, which provides an independent storage platform based on Apache Iceberg, raised $26M led by Altimeter Capital, bringing its total funding to $37M
Source: YouTube — Why You Shouldn’t Care About Iceberg | Tabular

Commercial options for Iceberg beyond Tabular are growing. There are recognizable companies like Cloudera and Snowflake, and there are smaller companies like ClickHouse and Starburst that deserve attention as well. Not to mention the Hudi play from companies like Onehouse.

Source: Techmeme — ClickHouse, which provides an online analytical processing database management system, raises $250M at a $2B valuation led by Coatue and Altimeter
Source: Techmeme — Starburst, which offers the Trino distributed SQL query engine, raises a $250M Series D led by Alkeon at a $3.35B valuation, after raising $100M in January 2021
Source: Techmeme — Onehouse, a cloud data lake service built on the Apache Hudi platform, raised a $25M Series A led by Addition and Greylock, bringing its total funding to $33M

In 2024, expect more community activities, including practitioner challenges that showcase the art of the possible using various projects mentioned in this newsletter.

Source: morling.dev — The One Billion Row Challenge
Source: 1 billion rows challenge in PostgreSQL and ClickHouse

So, what will be the next big thing in Apache Iceberg and related projects?

Until then… Place your bets!

Disclosure

I am linking to my disclosure.


P.S. As I've gotten older, I have come to appreciate getting snail mail. If you have time to drop me a postcard, that would be amazing.
