Pipes Before Pipelines

by Jay Cuthrell
Share and discuss on LinkedIn or HN

Years ago, I was a writer for ReadWriteWeb, which is now known as ReadWrite. The channel I wrote for on the site was ReadWriteHack.

As the name implies, I would read about a new tool or concept, then write a short “hack” with links to try out the tool or concept. Readers were encouraged to read, write, and “hack” around too.

One of my ReadWriteHack articles in late 2010 was about Yahoo Pipes. Today, the technique I briefly outlined is one we take for granted using tools like Grammarly or Eleventy’s Inclusive Language Plugin.
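The idea behind that hack can be sketched in a few lines. Here is a minimal, hypothetical Python version, assuming a hand-picked list of weak phrases; real tools like Grammarly or Eleventy’s Inclusive Language Plugin ship curated lists and far smarter matching:

```python
import re

# Hypothetical list of "weak wording" phrases to flag; curate your own.
WEAK_PHRASES = ["very", "really", "just", "obviously", "simply"]

def flag_weak_wording(text):
    """Return (line_number, phrase) pairs for each flagged phrase."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in WEAK_PHRASES:
            # Word-boundary match so "just" does not flag "adjust"
            if re.search(r"\b" + re.escape(phrase) + r"\b", line, re.IGNORECASE):
                findings.append((lineno, phrase))
    return findings

sample = "This is very easy.\nSimply run the script.\nDone."
print(flag_weak_wording(sample))  # → [(1, 'very'), (2, 'simply')]
```

The same pattern, wired to an RSS feed instead of a string, is essentially what Yahoo Pipes let non-programmers click together.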

Source: ReadWrite — Hack of the Day: Stronger Wording by Script or Click

Yahoo Pipes eventually went offline in late 2015. However, the memory of Yahoo Pipes lives on.

Must Read: A Love Letter to Yahoo Pipes

Yahoo Pipes offered a glimpse of a low-code/no-code future of user-friendlier integration platform as a service (iPaaS) offerings. Enjoy this marvelous Glenn Fleishman retelling — it is a true love letter to Yahoo Pipes.

Source: retool.com — Pipe Dreams: The life and times of Yahoo Pipes

Getting Informed

Getting directions from one place to another in Google Maps or Apple Maps (setting tropes aside) on a modern smartphone is an amazing fusion of web technologies, mobile computation, speech recognition, natural language processing, telecom, global positioning systems, low-energy radio telemetry, and constantly enriched near real-time data pipelines. When it works, it is a magic trick that we have become accustomed to treating as normal and commonplace.

Data Pipelines Then and Now

Almost three decades ago, the data pipelines behind mapping visualization and web mapping were remarkable enough to share with a growing community of enthusiasts. The data sets were, for the time, quite massive, eventually crossing into dozens of terabytes (TB).

For example, many readers might recall or use Google Earth. There was also MSN Virtual Earth.

Source: CNET — Google Earth vs Virtual Earth 3D: War of the world viewers

By 2030, the marketing brand pledges and product capabilities in web mapping will have outpaced science fiction. Again, as a riff on Arthur C. Clarke’s third law:

Any sufficiently advanced product is indistinguishable from marketing.

Today, the modern Google Maps experience increasingly leverages A.I. to provide a preview of directions that would have been science fiction in the early days of web mapping. The product marketing name “Immersive View” is an accurate claim.

Source: Google — New ways AI is making Maps more immersive

Sounds of Enrichment, Data Pipelines, and the Future

Readers might recall a recent post on the past, present, and future of Apache Parquet.

Source: Jay Cuthrell — Are You Gonna Go Parquet

Readers will probably recognize the familiar three-step pattern of update, upgrade, and upskill (or upswing, upstream, upsize), along with vendor-specific variants like modernize, automate, and transform. Or, if you assume that change is hard, this is the hard work of introducing change into IT operations management (ITOM).

Updating and modernizing a technology might be as simple as moving between versions of software or generations of hardware. For anyone who carries a pager, here be dragons.

Upgrading and automating a technology might be an existential threat to the people who did things the legacy way (the way that previously worked). For anyone who has managed teams with legacy skills, here be dragons.

Upskilling and transforming the day-to-day ways of embracing and adopting a newly updated and upgraded technology might prove to be an even greater challenge than the prior steps. For anyone who has led a so-called digital transformation, the William Gibson quote about the future being here but not yet evenly distributed will likely ring true — and here be dragons.

It is important to consider where A.I. claims are woven into both the marketing of increasingly digital product experiences and the underpinning IT processes that will need to enable data enrichment for a digital product reality. Luckily, more and more thoughtful content is being created by product marketing teams that cater to data pipeline practitioners.

We can’t all be winners in the Financial Modeling World Cup, let alone the Football/Soccer World Cup, but with A.I. there is a promise of a future where being okay at Microsoft Excel might come with an A.I. ability that moves us past Bruce Sterling’s “Clippy++” and “spicy autocomplete” towards true copilots that help us all bend it like Beckham.

Source: webflow.io — Our Ten Year Thesis
Source: Generational — Generative AI for modern business intelligence
Source: Airbyte — Why AI shouldn’t reinvent ETL | Airbyte
Source: Ascend.io — The Case for Automated ETL Pipelines
Source: prefect.io — A platform approach to workflow orchestration
Source: Pecan AI — What is Predictive GenAI?
Source: dremio.com — Using Generative AI as a Data Engineer | Dremio
Source: Polestarsolutions — Generative AI Redefining Data Engineering- Polestar Solution
Source: Monte Carlo Data — Generative AI And The Future Of Data Engineering
Source: WEKA — Extracting the Signals from the Noise
Source: featurebyte.com — FeatureByte Copilot: Revolutionizing Feature Ideation with Generative AI | FeatureByte
Source: prophecy.io — Introducing Prophecy Data Copilot - generative AI for data engineering
Source: Amazon — AWS Announces Amazon Q to Reimagine the Future of Work
Source: McKinsey & Company — The data dividend: Fueling generative AI
Source: SnapLogic — A New Data Frontier: Using GenAI to Integrate and Automate the Enterprise With AWS
Source: DataStax — Streaming real time data pipelines using generative AI | DataStax

At the same time, open-source software is increasingly being shared by organizations that are at the forefront of practical applications of data pipelines. Kudos to Stripe, RudderStack, Dagster Labs and others.

Source: GitHub — GitHub - stripe/veneur: A distributed, fault-tolerant pipeline for observability data
Source: GitHub — GitHub - rudderlabs/rudder-server: Privacy and Security focused Segment-alternative, in Golang and React
Source: GitHub — GitHub - dagster-io/dagster-open-platform: Dagster Labs’ open-source data platform, built with Dagster.
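Under all of these tools sits the same extract-transform-load (ETL) shape that platforms like Dagster orchestrate at scale. A minimal sketch in Python, with illustrative stand-ins for the source and sink (real pipelines read from APIs, logs, or databases and write to warehouses or lakes):

```python
def extract():
    # Stand-in source: raw event records as dicts
    return [
        {"user": "a", "amount": "10.50"},
        {"user": "b", "amount": "3.25"},
        {"user": "a", "amount": "1.00"},
    ]

def transform(records):
    # Clean and aggregate: parse string amounts, total per user
    totals = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + float(rec["amount"])
    return totals

def load(totals, sink):
    # Stand-in sink: append sorted rows to a list
    for user, total in sorted(totals.items()):
        sink.append((user, round(total, 2)))
    return sink

print(load(transform(extract()), []))  # → [('a', 11.5), ('b', 3.25)]
```

Everything the vendors above sell — orchestration, observability, retries, lineage, copilots — is scaffolding around keeping this three-step flow correct and fresh as data grows.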

Lastly, data pipelines are not exclusively the domain of e-commerce customer data, where what you bought results in a suggestion of what to buy next. It is important to remember that there are real-world implications around getting real-world data, real-world evidence, and real-time data in the healthcare and life sciences world too (SPACE, SPIFD, SPIFD2).

Source: regulations.gov — Regulations.gov
Source: elsevier.com — ScienceDirect

So, what will be the next big thing for data pipelines?

Until then… Place your bets!

Disclosure

I am linking to my disclosure.

