Newsletter

Stay Ahead in Data Engineering

Join thousands of data engineers and leaders who receive our weekly newsletter. Get insights, best practices, and exclusive content delivered to your inbox.

No spam. Unsubscribe at any time.

What You'll Get

Insights that move the needle

Every edition is packed with actionable content — from deep-dive technical articles to strategic perspectives on where data engineering is headed.

Expert Insights

Data engineering best practices, architectural patterns, and industry trends from practitioners who've been in the trenches.

Product Updates

Be the first to know about new Belvedere features, platform capabilities, and integration announcements.

Exclusive Content

Guides, templates, and resources available only to subscribers — practical tools you can use immediately.

What We've Been Writing About


A Write-Audit-Publish (WAP) Skill for Agentic Data Pipelines

Haydn Strauss | 4 min read | Data Engineering | Published July 14, 2026

AI agents are great at building data pipelines that look like they work until you dig into the results.

Write-audit-publish (WAP) helps fix that. Stage the data, audit it against a declared contract, and only publish once every clause passes. Netflix popularized this pattern in 2017.
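As a rough illustration of the stage, audit, publish gate described above, here is a minimal Python sketch. It is not the released skill: the `Clause` structure, the `write_audit_publish` function, and the in-memory lists standing in for staged and published tables are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class Clause:
    """One named rule in the contract (hypothetical structure)."""
    name: str
    check: Callable[[list[Row]], bool]


def write_audit_publish(
    rows: list[Row], contract: list[Clause], published: list[Row]
) -> list[str]:
    """Return the names of failed clauses; publish only if there are none."""
    # Write: stage a copy so the published data is never touched directly.
    staged = list(rows)

    # Audit: evaluate every clause of the contract against the staged data.
    failures = [clause.name for clause in contract if not clause.check(staged)]

    # Gate: any failure stops the run and leaves the published data as it was.
    if failures:
        return failures

    # Publish: replace the published contents only after every clause passes.
    published[:] = staged
    return []
```

The function returns the failed clause names rather than raising, so a calling agent can report exactly which part of the contract stopped the run.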

A pipeline that finishes successfully is not the same as one whose output is correct.

We’ve built a number of internal skills to make our own data pipelines safer, and this one felt useful enough to release as a free WAP skill for coding agents.

The first test was on Netflix’s Top 10 dataset. The initial run stopped at the gate. Our contract said every film should have “N/A” as the season title, but the agent found nine rows that didn’t match. The contract was wrong, not the data. We fixed it, started a fresh run, and the second attempt published cleanly, with the total reconciling to exactly 185,656,120,000 hours viewed.

We ran it again on an NFL play-by-play pipeline (converting play description strings into structured stat tables). It caught a parser bug that left 1,723 completed passes without matching receptions, exactly the kind of thing a "successful" run hides.
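A reconciliation clause of the kind that catches this sort of bug could look roughly like the sketch below. The column names `play_id` and `complete` and the function name are hypothetical; the actual pipeline's schema and checks are not shown in this excerpt.

```python
def unmatched_completions(passes: list[dict], receptions: list[dict]) -> list:
    """Return the play ids of completed passes that have no reception row.

    Assumes each pass row has 'play_id' and a boolean 'complete', and each
    reception row has 'play_id' (hypothetical column names).
    """
    reception_ids = {row["play_id"] for row in receptions}
    return [
        row["play_id"]
        for row in passes
        if row["complete"] and row["play_id"] not in reception_ids
    ]
```

An audit step would treat any non-empty result as a failed clause and block publication, even though the parsing job itself exited without errors.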

Below, we dig a bit more into how the skill works. Give it a read, or point your coding agent at this URL and try it yourself.

Read whole article

Belvedere: Your Agentic Data Manager for Mission Operations

Brian Frutchey | 1 min read | Product | Published June 25, 2026

Belvedere uses AI agents to build the data pipeline, not to be the pipeline. The agents handle the design work: profiling sources, drafting transforms, and wiring governance. What they produce is a transparent, repeatable pipeline your team can read, audit, and run cheaply.

This walkthrough shows how that plays out for mission operations, starting from raw, fragmented sources, scoping a data contract, and landing a governed data product with lineage and access controls intact.

Want to see it on your own data? Book a demo and we'll tailor it to your environment.

Read whole article

The Foundation Behind Reliable AI Agent Analytics

Haydn Strauss | 4 min read | Analytics | Published June 16, 2026

Anthropic recently published how it runs self-service analytics on Claude. One result caught my eye: context + skills took its analytics agent from 21% accuracy to consistently above 95%.

The result highlights that generating SQL is the easy part. The hard part is everything underneath it: canonical datasets, a semantic layer, lineage, maintained skills, and provenance on every answer.

That jump came from the foundation, not a bigger model. With the context right, the agent on top matters much less.

Why Agents Alone Fail

In addition to cost, three context problems keep coming up.

Entity ambiguity. "Active users" or "revenue" has several definitions in the warehouse. The agent picks one and writes correct SQL against the wrong data.
Staleness. The definition was right when written. Then the pipeline changed and the skill was never updated.
Retrieval failure. The right definition exists somewhere, but the agent can't find it, or grabs the wrong version.

Two of these, staleness and retrieval, can't be fixed easily by prompting alone. They need the context to be a versioned, owned asset wired to the pipeline it describes.
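One way to picture context "wired to the pipeline" is a definition record that stores a fingerprint of the transform it describes, so a change to the pipeline flags the definition as stale. The sketch below is an assumption for illustration only: the `Definition` fields, the `fingerprint` helper, and the idea of hashing the transform SQL are not taken from the article.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(transform_sql: str) -> str:
    """Hash the transform text so any edit to it changes the fingerprint."""
    return hashlib.sha256(transform_sql.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Definition:
    """A versioned, owned piece of context (hypothetical structure)."""
    name: str
    owner: str
    version: int
    text: str
    pipeline_fingerprint: str  # fingerprint of the transform when this was written


def is_stale(definition: Definition, current_transform_sql: str) -> bool:
    """True when the pipeline has changed since the definition was recorded."""
    return definition.pipeline_fingerprint != fingerprint(current_transform_sql)
```

A check like this could run whenever the pipeline changes, sending stale definitions back to their owner instead of leaving the agent to use them silently.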

Anthropic tried the shortcut of handing the agent the raw query corpus, and accuracy barely moved. As they put it: "The information was there, the agent saw it, and it still didn't use it."

Read whole article

View all articles