Events and Efficiency

by Jay Cuthrell

This week we take a look at what’s next in events and efficiency in runtimes.

Housekeeping

Now, back to our regularly scheduled Fudge Sunday newsletter! 🤓

Getting Informed

Runtime and runtimes have many computing definitions. However, a C-suite-simple way to think of runtime is the amount of time any software application workload takes to run, because time is money.

As many esteemed clouderati have provocatively stated, the cloud is just someone else’s computer… and that someone has immensely more resources than most, plus a mechanism to make that computer easily accessible and elastically expandable… and while it’s in use, there is absolutely a meter that is always running.

So, the argument goes, runtime optimizations have immense value. You know you’ll be chopping wood, but how long it takes to finish the job depends on how sharp the instrument is, whether you rent it (public cloud) or invest in owning ax handles and periodically sharpening the ax yourself (private cloud).
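To put rough numbers on the running meter, here is a minimal sketch. The function names, the hourly rate, the job length, and the speedup are all hypothetical figures chosen for illustration, not real cloud prices or benchmark results. It only shows the arithmetic behind "time is money": metered cost is runtime multiplied by rate, so a faster runtime shrinks the bill proportionally.

```python
def job_cost(runtime_hours: float, rate_per_hour: float) -> float:
    """Metered cost of one run: the meter runs for as long as the job does."""
    return runtime_hours * rate_per_hour


def savings_from_speedup(
    runtime_hours: float,
    rate_per_hour: float,
    speedup: float,
    runs_per_month: int,
) -> float:
    """Monthly savings if the same job finishes `speedup` times faster."""
    before = job_cost(runtime_hours, rate_per_hour)
    after = job_cost(runtime_hours / speedup, rate_per_hour)
    return (before - after) * runs_per_month


# Hypothetical figures for illustration only (not real cloud prices).
baseline = job_cost(runtime_hours=2.0, rate_per_hour=12.0)
monthly = savings_from_speedup(2.0, 12.0, speedup=2.0, runs_per_month=30)

print(f"cost per run: ${baseline:.2f}")
print(f"monthly savings at 2x speedup: ${monthly:.2f}")
```

The same arithmetic applies to the private cloud side of the metaphor, except the "rate" becomes an amortized cost of owning and sharpening the ax rather than a rental fee.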

In terms of public clouds, there are several options for runtimes. Generally speaking, serverless and low wait times were early differentiators, but that gap has closed. Preferences are now more likely to reflect committed spending with a primary cloud service provider and the sharpness of the FinOps pencils in use at an organization.

For example, Azure offers Fabric, which builds on Apache Spark and has become increasingly integrated with other Azure analytics offerings. GCP offers serverless Spark that integrates with BigQuery, Vertex AI, and Dataplex. OCI offers Data Flow as a fully managed Apache Spark service.

By comparison (only to show the timelines involved), AWS offers Amazon EMR, which was originally named Amazon Elastic MapReduce. AWS has consistently blogged about price and performance improvements for workloads like Spark (as well as Hadoop, Hive, Presto, Trino, etc.) and about a simple path to consumption as the industry has shifted from spinning disks to solid-state disks to drastically lower-latency in-memory approaches for real-time stream processing or batch processing.

So, this isn’t new… and Apache Spark has been around since 2010. But, the sharpening of axes continues.

The Axis of Axes

Previously, I mentioned going down the rabbit hole on Velox and Comet. It’s time to share what I’ve been taking in and what I’ve digested so far.

Velox

Velox has the Meta/Facebook name attached to it. As a C++ library, it enjoys a large potential contributor audience.

Even after watching videos from the VeloxCon agenda, it wasn’t 100% clear to me where this project is headed, but it was interesting to see IBM, Meta/Facebook, Pinterest, Intel, and Microsoft presenting.

Comet

Comet is very new by comparison. As a Rust library, it has a smaller contributor audience for now.

Other Axes

Of course, there are still the ClickHouse solutions, the StarRocks solutions, the Sneller solutions, the Compute AI solutions 🤓, and what will likely be new players in this space for the foreseeable future. Older axes benefit from grinding.

Of course, I try to be fair in my newsletter as I have no ax to grind.

So… 🤓

What’s the probability of an industry-altering technology emerging that will fundamentally change our perceptions of time to value by processing ever more data far faster and more economically than in years past?

Until then… place your bets.

Disclosure

I am linking to my disclosure.


p.s. As I’ve gotten older, I have come to appreciate getting snail mail. If you have time to drop me a postcard, I’m going to be scanning the picture side of the postcards I’ve received and linking to a Fudge Sunday Reader Postcards gallery (with suitable redactions and filtering for greater anonymity) as a newsletter trailer of sorts. Stay tuned! ✉️
