Are You Gonna Go Parquet

This week we take a look at the past, present, and future of Apache Parquet.

This week’s musical inspiration in title and lyrics:

Getting Informed

A year goes by fast. In fact, a year ago I penned the footnote-laden “You Get A Line and I’ll Get A Poll Result” (2022).

Just over ten years ago, Apache Parquet was released as a more efficient file format with an emphasis on _speed to read data_. At the time, “big data” was getting regular coverage, much of it centered on the Apache Hadoop ecosystem that had developed over the prior seven years.

Today, the need to feed A.I. and M.L. models is part of the growing interest in “data analytics”, with an increasing emphasis on _speed to value_. In fact, as I mentioned back in “Dig Your Own SQL” (2021), the market emphasis on _speed to value_ is only going to accelerate.

Now it’s time for reading 📖, watching 📺, and listening 🎧 suggestions:

So that’s why you’ve got to try 🎶

The hottest “big data” companies of 2013, according to CIO, were probably names that many IT folks with longer-lived careers would recognize. But where are they now?

Perhaps you are wondering why you don’t see two of the most talked-about “big data” companies of recent years, Databricks and Snowflake. Well, it’s worth noting that Databricks took its first funding in 2013 and Snowflake would not launch until 2014.

We must engage and rearrange 🎶

Setting aside “big data” circa 2013, what is the importance of Apache Parquet to modern “data analytics” companies today? Here’s a quick scan for Apache Parquet references across just a few names.

While the list above is not exhaustive in any way, it shows that adoption of Apache Parquet reflects ten years of progress. It is also worth noting that PrestoDB was open sourced in 2013, Apache Spark became a top-level Apache project in 2014, and PrestoSQL, since renamed Trino, began in 2019.

So, it’s fair to say… this is just the tip of the Apache Iceberg… which began in 2017.

So, what will be the next big thing in Apache Parquet and other speed to value related technologies?

Until then… Place your bets!

Work Plug

As a reminder, after a 25+ year walkabout, I’m an IBMer (again). For 2023, in “Work Plug”, I share a new link each week that is educational, accessible, and relevant to platform engineering from fellow IBMers[1] in the wider IBM Community.

Stay tuned!


I am linking to my disclosure.

  1. Shout out to Marina Danilevsky
