Pipes Before Pipelines

by Jay Cuthrell
Share and discuss on LinkedIn or HN

Years ago, I was a writer for ReadWriteWeb, which is now known as ReadWrite. The channel I wrote for on the site was ReadWriteHack.

As the name implies, I would read about a new tool or concept, then write a short “hack” with links to try out the tool or concept. Readers were encouraged to read, write, and “hack” around too.

One of my ReadWriteHack articles in late 2010 was about Yahoo Pipes. Today, the technique I briefly outlined is one we take for granted using tools like Grammarly or Eleventy’s Inclusive Language Plugin.
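The idea behind that hack can be sketched in a few lines. Here is a minimal, hypothetical Python version, assuming a hand-picked list of weak phrases; real tools like Grammarly or Eleventy’s Inclusive Language Plugin ship curated lists and far smarter matching:

```python
import re

# Hypothetical list of "weak wording" phrases to flag; curate your own.
WEAK_PHRASES = ["very", "really", "just", "obviously", "simply"]

def flag_weak_wording(text):
    """Return (line_number, phrase) pairs for each flagged phrase."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in WEAK_PHRASES:
            # Word-boundary match so "just" does not flag "adjust"
            if re.search(r"\b" + re.escape(phrase) + r"\b", line, re.IGNORECASE):
                findings.append((lineno, phrase))
    return findings

sample = "This is very easy.\nSimply run the script.\nDone."
print(flag_weak_wording(sample))  # → [(1, 'very'), (2, 'simply')]
```

The same pattern, wired to an RSS feed instead of a string, is essentially what Yahoo Pipes let non-programmers click together.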

Source: ReadWrite — Hack of the Day: Stronger Wording by Script or Click

Yahoo Pipes eventually went offline in late 2015. However, the memory of Yahoo Pipes lives on.

Must Read: A Love Letter to Yahoo Pipes

Yahoo Pipes offered a glimpse of a low-code/no-code future of user-friendlier integration platform as a service (iPaaS) offerings. Enjoy this marvelous Glenn Fleishman retelling — it is a true love letter to Yahoo Pipes.

Source: retool.com — Pipe Dreams: The life and times of Yahoo Pipes

Getting Informed

Getting directions from one place to another in Google Maps or Apple Maps (setting tropes aside) on a modern smartphone is an amazing fusion of web technologies, mobile computation, speech recognition, natural language processing, telecom, global positioning systems, low-energy radio telemetry, and constantly enriched near real-time data pipelines. When it works, it is a magic trick that we have become accustomed to treating as normal and commonplace.

Data Pipelines Then and Now

Almost three decades ago, the data pipelines behind mapping visualization and web mapping were remarkable enough to share with a growing community of enthusiasts. The data sets were, for the time, quite massive, eventually crossing into dozens of terabytes (TB).

For example, many readers might recall or use Google Earth. There was also MSN Virtual Earth.

Source: CNET — Google Earth vs Virtual Earth 3D: War of the world viewers

By 2030, the marketing brand pledges and product capabilities in web mapping will have outpaced science fiction. Again, as a riff on Arthur C. Clarke’s third law:

Any sufficiently advanced product is indistinguishable from marketing.

Today, the modern Google Maps experience increasingly leverages A.I. to provide a preview of directions that would have been science fiction in the early days of web mapping. The product marketing name “Immersive View” is an accurate claim.

Source: Google — New ways AI is making Maps more immersive

Sounds of Enrichment, Data Pipelines, and the Future

Readers might recall a recent post on the past, present, and future of Apache Parquet.

Source: Jay Cuthrell — Are You Gonna Go Parquet

Readers will probably recognize the familiar three-step pattern of update, upgrade, and upskill (or upswing, upstream, upsize), along with vendor-specific variants like modernize, automate, and transform. Or, if you assume that change is hard, this is the hard work of introducing change into IT operations management (ITOM).

Updating and modernizing a technology might be as simple as moving between versions of software or generations of hardware. For anyone who carries a pager, here be dragons.

Upgrading and automating a technology might be an existential threat to the people who did things the legacy way (the way that previously worked). For anyone who has managed teams with legacy skills, here be dragons.

Upskilling and transforming the day-to-day ways of embracing and adopting a newly updated and upgraded technology might prove to be an even greater challenge than the prior steps. For anyone who has led a so-called digital transformation, the William Gibson quote about the future being here but not yet evenly distributed will likely ring true — and here be dragons.

It is important to consider where A.I. claims are woven into both the marketing of increasingly digital product experiences and the underpinning IT processes that will need to enable data enrichment for a digital product reality. Luckily, more and more thoughtful content is being created by product marketing teams that cater to data pipeline practitioners.

We can’t all be winners in the Financial Modeling World Cup, let alone the Football/Soccer World Cup, but with A.I. there is a promise of a future where being okay at Microsoft Excel might come with an A.I. ability that moves us past Bruce Sterling’s “Clippy++” and “spicy autocomplete” towards true copilots that help us all bend it like Beckham.

Source: webflow.io — Our Ten Year Thesis
Source: Generational — Generative AI for modern business intelligence
Source: Airbyte — Why AI shouldn’t reinvent ETL | Airbyte
Source: Ascend.io — The Case for Automated ETL Pipelines
Source: prefect.io — A platform approach to workflow orchestration
Source: Pecan AI — What is Predictive GenAI?
Source: dremio.com — Using Generative AI as a Data Engineer | Dremio
Source: Polestarsolutions — Generative AI Redefining Data Engineering- Polestar Solution
Source: Monte Carlo Data — Generative AI And The Future Of Data Engineering
Source: WEKA — Extracting the Signals from the Noise
Source: featurebyte.com — FeatureByte Copilot: Revolutionizing Feature Ideation with Generative AI | FeatureByte
Source: prophecy.io — Introducing Prophecy Data Copilot - generative AI for data engineering
Source: Amazon — AWS Announces Amazon Q to Reimagine the Future of Work
Source: McKinsey & Company — The data dividend: Fueling generative AI
Source: SnapLogic — A New Data Frontier: Using GenAI to Integrate and Automate the Enterprise With AWS
Source: DataStax — Streaming real time data pipelines using generative AI | DataStax

At the same time, open-source software is increasingly being shared by organizations that are at the forefront of practical applications of data pipelines. Kudos to Stripe, RudderStack, Dagster Labs and others.

Source: GitHub — GitHub - stripe/veneur: A distributed, fault-tolerant pipeline for observability data
Source: GitHub — GitHub - rudderlabs/rudder-server: Privacy and Security focused Segment-alternative, in Golang and React
Source: GitHub — GitHub - dagster-io/dagster-open-platform: Dagster Labs’ open-source data platform, built with Dagster.
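Under all of these tools sits the same extract-transform-load (ETL) shape that platforms like Dagster orchestrate at scale. A minimal sketch in Python, with illustrative stand-ins for the source and sink (real pipelines read from APIs, logs, or databases and write to warehouses or lakes):

```python
def extract():
    # Stand-in source: raw event records as dicts
    return [
        {"user": "a", "amount": "10.50"},
        {"user": "b", "amount": "3.25"},
        {"user": "a", "amount": "1.00"},
    ]

def transform(records):
    # Clean and aggregate: parse string amounts, total per user
    totals = {}
    for rec in records:
        totals[rec["user"]] = totals.get(rec["user"], 0.0) + float(rec["amount"])
    return totals

def load(totals, sink):
    # Stand-in sink: append sorted rows to a list
    for user, total in sorted(totals.items()):
        sink.append((user, round(total, 2)))
    return sink

print(load(transform(extract()), []))  # → [('a', 11.5), ('b', 3.25)]
```

Everything the vendors above sell — orchestration, observability, retries, lineage, copilots — is scaffolding around keeping this three-step flow correct and fresh as data grows.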

Lastly, data pipelines are not exclusively the domain of e-commerce customer data, where what you bought results in a suggestion of what to buy next. It is important to remember that there are real-world implications around getting real-world data, real-world evidence, and real-time data in the healthcare and life sciences world too (SPACE, SPIFD, SPIFD2).

Source: regulations.gov — Regulations.gov
Source: elsevier.com — ScienceDirect

So, what will be the next big thing for data pipelines?

Until then… Place your bets!

Disclosure

I am linking to my disclosure.

