Are You Gonna Go Parquet

by Jay Cuthrell

This week we take a look at the past, present, and future of Apache Parquet.

This week’s musical inspiration in title and lyrics:

Getting Informed

A year goes by fast. In fact, a year ago I penned the footnote-laden “You Get A Line and I’ll Get A Poll Result” (2022).

Just over ten years ago, Apache Parquet was released as a more efficient columnar file format with an emphasis on _speed to read data_. At the time, “big data” was getting regular coverage, largely centered on the Apache Hadoop ecosystem that had developed over the prior seven years.

Today, the need to feed A.I. and M.L. models is part of the growing interest in “data analytics”, with increasing emphasis on _speed to value_. In fact, as I mentioned back in “Dig Your Own SQL” (2021), that market emphasis on _speed to value_ is only going to accelerate.

Now it’s time for reading 📖, watching 📺, and listening 🎧 suggestions:

So that’s why you’ve got to try 🎶

Ten years ago, the hottest “big data” companies of 2013, according to CIO, were probably names that many IT folks with longer-lived careers would recognize. But where are they now?

Perhaps you are wondering why you don’t see two of the most talked-about “big data” companies of recent years, Databricks and Snowflake. Well, it’s worth noting that Databricks, a current “data analytics” company, only took its first funding in 2013, and Snowflake would not launch until 2014.

We must engage and rearrange 🎶

Setting aside “big data” circa 2013, what is the importance of Apache Parquet to modern “data analytics” companies today? Here’s a quick scan for Apache Parquet references across just a few names.

While the list above is by no means exhaustive, it shows how Apache Parquet adoption reflects ten years of progress. It is also worth noting that PrestoDB was open sourced in 2013, Apache Spark became a top-level Apache project in 2014, and PrestoSQL (now Trino) began in 2019.

So, it’s fair to say… this is just the tip of the Apache Iceberg… which began in 2017.

So, what will be the next big thing in Apache Parquet and other _speed to value_ technologies?

Until then… Place your bets!

Disclosure

I am linking to my disclosure.

🤓


