Blog

Engineering notes

Design rationale and war stories from building SQE: replacing a Trino fork with Rust, surviving the Iceberg rebase, the caching layers, the DuckDB-shaped surprise, and shipping column-level lineage. Synced from the source repository.

44 posts

3 Jul 2026
One Ranger policy, uniform access across SQE and Spark

A lakehouse rarely has one query engine. Spark writes the tables, SQE serves the interactive queries, and both touch the same Iceberg data. The usual failure is policy drift: each engine has its own access-control plugin, the same masking intent gets translated twice, and the two translations disagree. We avoided that by not translating twice. SQE and Spark read the same Apache Ranger hive service, so one policy written once enforces the same way in both. The SSN reads xxx-xx-1111 whichever engine ran the query.

security, ranger, spark, governance
2 Jul 2026
A big write shouldn't take down the node

Reads on SQE spill to disk when they run out of memory. Writes did not. A large CTAS, a wide MERGE, or an oversized client upload could balloon a coordinator buffer past its memory limit and get the process OOM-killed, which takes every other query on that node with it. We gave the write path the same memory discipline the read path already had: pool-track the buffers that must exist so an oversized write fails as one typed error, and stream the ones that never needed to buffer at all.

write-path, iceberg, reliability, datafusion
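
The "pool-track the buffers" idea from the teaser above can be sketched roughly as follows. This is a minimal illustration, not SQE's code: `MemoryPool`, `WriteError`, and `buffer_write` are hypothetical names, and a real engine would likely use its runtime's own memory pool rather than a hand-rolled counter.

```rust
/// Hypothetical typed error for a write that would exceed its memory budget.
#[derive(Debug, PartialEq)]
pub enum WriteError {
    MemoryLimitExceeded { requested: usize, available: usize },
}

/// Hypothetical byte-counting pool (illustrative only, not SQE's API).
pub struct MemoryPool {
    limit: usize,
    used: usize,
}

impl MemoryPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes` before growing a buffer; fail instead of overcommitting.
    pub fn try_reserve(&mut self, bytes: usize) -> Result<(), WriteError> {
        // `used` never exceeds `limit`, so this subtraction cannot underflow.
        let available = self.limit - self.used;
        if bytes > available {
            return Err(WriteError::MemoryLimitExceeded { requested: bytes, available });
        }
        self.used += bytes;
        Ok(())
    }

    /// Return previously reserved bytes to the pool.
    pub fn release(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }

    pub fn used(&self) -> usize {
        self.used
    }
}

/// Buffer a write only while the pool has room. On failure, hand back
/// whatever this write had reserved so other queries are unaffected.
pub fn buffer_write(pool: &mut MemoryPool, chunks: &[Vec<u8>]) -> Result<Vec<u8>, WriteError> {
    let mut buf = Vec::new();
    let mut reserved = 0;
    for chunk in chunks {
        if let Err(e) = pool.try_reserve(chunk.len()) {
            pool.release(reserved);
            return Err(e);
        }
        reserved += chunk.len();
        buf.extend_from_slice(chunk);
    }
    Ok(buf)
}
```

The point of the sketch is the failure mode: an oversized write returns one typed error to its own caller and leaves the pool as it found it, instead of pushing the whole process into the OOM killer.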
2 Jul 2026
Metabase connected, then showed zero tables

A BI tool connecting to a query engine is not one query. It is a scripted handshake of a dozen metadata calls, and every one has to look exactly like Trino or the tool gives up without an error. We pointed a real Metabase at SQE's Trino endpoint and watched it fail six different ways, among them a PREPARE the parser rejected, a SHOW TABLES column that collapsed every table into one, catalogs it never enumerated, quoted identifiers that matched nothing, and a timestamp type signature the JDBC driver refused to parse. None of them threw. They just showed nothing.

trino-compatibility, metabase, superset, jdbc
25 Jun 2026
The fix that fixed nothing: SSB, dynamic filters, and a 4x bug

SSB is the one benchmark suite SQE loses to Trino, 2.5x at scale factor 10, while it wins everything else. The obvious cause was a single-threaded fact-table scan, so we built a planner rule to parallelize it. A correctness smoke caught the rule returning 240 million rows from a 60-million-row table, a latent 4x duplication bug we then fixed. Then the parallel scan turned out to change nothing: SSB did not move. The real cause is a dynamic filter that is a huge win on clustered fact tables and pure overhead on SSB's uniformly-distributed one. A slow number is a hypothesis, and the fix you are sure of can be perf-neutral.

benchmarks, performance, datafusion, testing
19 Jun 2026
When GRANT becomes a Ranger policy

SQE got a second access-control backend: write GRANT/REVOKE as Apache Ranger policy and let Polaris enforce it. The protocol took an afternoon. The identity model took the week, because Polaris federation does not work the way the token suggests.

ranger, polaris, security, iceberg
19 Jun 2026
One mask, and Spark and SQE agree to the byte

SQE's fine-grained Ranger backend does row filters, column masks, role-conditional masking, and tags. The kicker: the same Ranger policy on the same Polaris catalog produces byte-exact identical masked output in SQE and in standard Apache Spark. We proved it live, and the first run failed for a reason no amount of reading could have predicted.

ranger, spark, security, iceberg
19 Jun 2026
Snowflake's governance model on open Iceberg

Snowflake gives you masking policies, row access policies, object tags, and a GRANT model. SQE gives you the same primitives on Apache Ranger and open Iceberg, enforced by plan rewrite and shared across engines. Here is the mapping, the one real edge SQE has, and the gaps we have not closed yet.

ranger, snowflake, security, iceberg
18 Jun 2026
DataFusion 54: what it actually unblocked

DataFusion 54 landed and we bumped to it. The port was mostly mechanical, with one real behavioral change in the shuffle hasher. The interesting part is what the release notes implied and the engine did not deliver: LATERAL joins are logical-plan only, array lambdas still fail, and the one compatibility win we found was a documentation bug, not a DataFusion feature. We tested every claim before we wrote it down.

datafusion, iceberg, benchmarks, sql
15 Jun 2026
The filter that rebuilt itself 14,600 times

A two-table TPC-H join that Trino ran in 2.2s took us 161s at SF10. We blamed partition layout, then single-node joins, then a subquery pattern. All three were wrong. A CPU profile and two timers found the truth: a runtime filter we pushed to the probe scan was getting re-snapshotted once per batch, and each snapshot rebuilt a 300,000-node expression tree. The fix snapshots it once. q12 went 161s to 2.7s, q17 176s to 7.1s, q10 from a 300s failure to 3.3s, with the result rows unchanged and no knob touched.

performance, datafusion, iceberg, benchmarks
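
The shape of the bug in the teaser above (one expensive snapshot per batch versus one per scan) can be shown with a toy model. Everything here is a stand-in: `RuntimeFilter`, its sorted-vector "snapshot", and both scan functions are hypothetical, and the sketch assumes the filter's contents are final by the time the scan starts.

```rust
use std::cell::Cell;

/// Stand-in for a runtime filter whose snapshot is expensive to build.
pub struct RuntimeFilter {
    values: Vec<i64>,
    snapshots_built: Cell<usize>,
}

impl RuntimeFilter {
    pub fn new(values: Vec<i64>) -> Self {
        Self { values, snapshots_built: Cell::new(0) }
    }

    /// Pretend this rebuilds a large expression tree; here it sorts a copy.
    pub fn snapshot(&self) -> Vec<i64> {
        self.snapshots_built.set(self.snapshots_built.get() + 1);
        let mut sorted = self.values.clone();
        sorted.sort_unstable();
        sorted
    }

    pub fn snapshots_built(&self) -> usize {
        self.snapshots_built.get()
    }
}

/// Slow shape: the snapshot is rebuilt for every batch.
pub fn scan_per_batch(filter: &RuntimeFilter, batches: &[Vec<i64>]) -> Vec<i64> {
    let mut out = Vec::new();
    for batch in batches {
        let snap = filter.snapshot();
        out.extend(batch.iter().copied().filter(|v| snap.binary_search(v).is_ok()));
    }
    out
}

/// Fixed shape: snapshot once, reuse it for every batch.
pub fn scan_snapshot_once(filter: &RuntimeFilter, batches: &[Vec<i64>]) -> Vec<i64> {
    let snap = filter.snapshot();
    let mut out = Vec::new();
    for batch in batches {
        out.extend(batch.iter().copied().filter(|v| snap.binary_search(v).is_ok()));
    }
    out
}
```

Both functions return the same rows; only the snapshot count differs, which matches the teaser's "result rows unchanged and no knob touched".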
14 Jun 2026
The 14x gap that wasn't: q95, contention, and the number we almost fixed

TPC-DS q95 was our worst query: 18 seconds against Trino's 1.3, a 14x loss that justified building a whole optimizer feature. Before we wrote a line of it, we pulled the plan and the profile. The 12-million-row self-join the feature was meant to shrink did not exist, the engine ran the query in under half a second, and the 18 seconds lived only in a benchmark harness running both engines on one starved host. On a clean rig SQE runs q95 in 240ms and beats Trino 12x. A slow benchmark number is a hypothesis until you reproduce it in isolation.

benchmarks, performance, datafusion, testing
13 Jun 2026
One file, one thread, and the 910ms that explained the SSB gap

On SSB at SF1 Trino ran scan-heavy queries about 2x faster than SQE, even though we pruned just as well. A new line in our query profile found it: a 151MB lineorder file decoded on a single thread, 94% of a 969ms query spent waiting on one scan. The obvious fix (more partitions) was a trap that regressed q72 from 17s to 100s once before. The safe fix parallelizes decode inside the scan without changing the plan the optimizer sees.

performance, datafusion, iceberg, benchmarks
13 Jun 2026
Six groups where the spec allows four

TPC-H q01 returned six (returnflag, linestatus) combinations on our generated data. The spec defines exactly four. Both SQE and Trino agreed on the wrong answer, because both read the same broken tables. The bug class behind it: fields the spec derives from other fields, drawn instead as independent uniform random. Five defects in TPC-H, one in SSB, all the same shape.

benchmarks, testing, duckdb, correctness
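
To see why derived fields yield four groups and independent draws yield six, here is a small enumeration. It follows the TPC-H derivation rules as I understand them: line status is 'O' when the ship date is after the spec's CURRENTDATE and 'F' otherwise, the return flag is 'R' or 'A' when the receipt date is on or before CURRENTDATE and 'N' otherwise, and the receipt date falls 1 to 30 days after the ship date. The function names and the day-offset encoding are illustrative assumptions, not the generator's code.

```rust
use std::collections::BTreeSet;

/// Day numbers are offsets from CURRENTDATE; 0 means "on CURRENTDATE".
pub fn line_status(ship_day: i32) -> char {
    if ship_day > 0 { 'O' } else { 'F' }
}

/// `coin` stands in for the random choice between 'R' and 'A'.
pub fn return_flag(receipt_day: i32, coin: bool) -> char {
    if receipt_day <= 0 {
        if coin { 'R' } else { 'A' }
    } else {
        'N'
    }
}

/// Every (returnflag, linestatus) pair reachable when both are derived.
pub fn derived_groups() -> BTreeSet<(char, char)> {
    let mut groups = BTreeSet::new();
    for ship_day in -60..=60 {
        for delay in 1..=30 {
            for coin in [false, true] {
                let receipt_day = ship_day + delay;
                groups.insert((return_flag(receipt_day, coin), line_status(ship_day)));
            }
        }
    }
    groups
}

/// The broken shape: both fields drawn independently of the dates.
pub fn independent_groups() -> BTreeSet<(char, char)> {
    let mut groups = BTreeSet::new();
    for flag in ['A', 'N', 'R'] {
        for status in ['F', 'O'] {
            groups.insert((flag, status));
        }
    }
    groups
}
```

Because the receipt date always trails the ship date, an open line item can never carry 'R' or 'A', so the derived set has four members while the independent set has all six.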
12 Jun 2026
The benchmark that lied, the oracle that didn't, and the day Trino was wrong

Our SF0.1 compare run looked great: zero mismatches across seven suites. Then we asked DuckDB to check the data and found that 16 'passing' TPC-DS queries had never selected a single row, TPC-C had zero warehouses, and the one real disagreement between SQE and Trino was Trino's fault. Plus: the dynamic filter that shipped 6 million rows because nobody carried it across a node swap.

benchmarks, testing, duckdb, performance
10 Jun 2026
columns(2) must match fields(16)

Differential testing against Trino caught a distributed-scan bug the whole test suite missed: every projected query failed once a scan actually distributed. The first fix restored correctness by disabling projection pushdown. Then a Claude agent on the Fable model found the real bug, one line in the worker's streaming rewrite, and got the speed back: 3.1x overall, 10x on the worst query.

distributed, testing, performance, debugging
7 Jun 2026
Lake Formation gates the catalog, not the rows

Our own README promised fine-grained Lake Formation in the SQE Glue quickstart. The engine does not do it. Here is what Lake Formation actually enforces when SQE reads S3 directly, and where SQE's real column and row masking lives.

lake-formation, glue, security, iceberg
7 Jun 2026
Runnable docs, or how the quickstarts became a test suite

We turned every 'how to run SQE for X' into a self-contained run.sh that goes from clean state to captured output. The point was documentation. The payoff was a test suite that caught a missing metric, an overclaimed capability, and a benchmark that would have polluted our baselines.

documentation, testing, quickstart, observability
2 Jun 2026
A dashboard for the engine

SQE already tracked every query and worker in memory, but there was no way to look at it without tailing logs or wiring Grafana. So we put a read-only dashboard on the health port: no login, no build step, no external assets. One HTML page over a small JSON API, with a tiny in-memory sampler so the charts move over time. Here is how it is built and why it stays deliberately small.

observability, frontend, operations
26 May 2026
Speaking Quack: SQE as a DuckDB server, a DuckDB client, and a federation engine

DuckDB 1.5 ships a wire protocol called Quack. We re-implemented it in pure Rust, turned SQE into both server and client, and proved you can JOIN an Iceberg table with a remote DuckDB table in a single SELECT. Two waves of bugs, one federated query.

duckdb, quack, datafusion, iceberg
26 May 2026
The type matrix as a roadmap: seven DuckDB types in two days

We started with a markdown table tracking which DuckDB types we could round-trip. The table became the roadmap. DECIMAL, LIST, STRUCT, MAP, ARRAY, ENUM, UNION each got their own MR. The surprises came from how DuckDB models the relationships: MAP is LIST<STRUCT>, UNION is STRUCT with a tag field, and DECIMAL packs four widths into one logical type.

duckdb, quack, datatypes, rust
25 May 2026
Porting DuckDB's BinarySerializer to pure Rust

Ten sub-MRs in a day. The wire format, the fixture-driven debugging loop, and two bugs that the C++ reference encoder ships without telling you: WriteListWithDefault elision, and uninitialised bytes at NULL VARCHAR positions.

duckdb, quack, rust, wire-protocol
17 May 2026
The SSB regression that wasn't

MR #220 wired runtime filters into iceberg-rust's scan path and dropped TPC-DS 67%. SSB looked like it regressed 6%. Two failed heuristic attempts, one parquet-trace session, and ten warm passes later, the regression turned out to be measurement noise. The fix-the-fix that wasn't, and the data-clustering insight that explains why two suites with the same code path behave nothing alike.

performance, datafusion, iceberg, benchmarks
16 May 2026
q72, our nemesis, and the Int32 that hid for a month

TPC-DS q72 sat at 10 seconds while every other query ran in under 1.4. Five days of investigation chased scan parallelism, range-based NDV, iceberg-rust upgrades, and the RisingWave fork. None of those were the bug. The bug was a silently-skipped Err arm in our dynamic-filter evaluator that swallowed every Int32 vs Int64 type clash. Fixing it: 15.5s to 0.77s. q72 now beats Trino.

performance, datafusion, iceberg, debugging
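
A swallowed `Err` arm of the kind the q72 teaser describes might look like the sketch below. The `Value` enum and all function names are hypothetical, and widening both sides to Int64 is only one plausible fix; the teaser does not say how SQE resolved the clash.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Int32(i32),
    Int64(i64),
}

/// Strict comparison: refuses to compare mismatched integer widths.
pub fn less_or_equal(a: Value, b: Value) -> Result<bool, String> {
    match (a, b) {
        (Value::Int32(x), Value::Int32(y)) => Ok(x <= y),
        (Value::Int64(x), Value::Int64(y)) => Ok(x <= y),
        _ => Err(format!("type clash: {:?} vs {:?}", a, b)),
    }
}

/// Buggy shape: an error is treated as "cannot prune", so the row is kept.
/// With mismatched types, every row survives and the filter does nothing.
pub fn keep_row_swallowing(value: Value, max: Value) -> bool {
    match less_or_equal(value, max) {
        Ok(keep) => keep,
        Err(_) => true, // silently hides the type clash
    }
}

fn widen(v: Value) -> i64 {
    match v {
        Value::Int32(x) => i64::from(x),
        Value::Int64(x) => x,
    }
}

/// One possible fix: widen both sides to Int64 before comparing.
pub fn keep_row_coercing(value: Value, max: Value) -> bool {
    widen(value) <= widen(max)
}
```

Keeping the row on error is the safe default for correctness, which is why a bug like this stays invisible: results remain right and only the pruning disappears.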
15 May 2026
Nineteen MRs, four waves, and the failure modes of agent batches at scale

Two days after the nine-PR audit pass, we ran a bigger one. 130 issues filed, 19 themed MRs merged across four waves. Same workflow, more failure modes. Watchdog stalls, a reboot mid-wave, a broken main, and config.rs as the conflict magnet. Here is what actually happened.

developer-experience, agentic-ai, code-review, git
13 May 2026
Nine PRs, two merge conflicts, and the value of themed branches

Sunday afternoon: an audit dropped eighteen issues into the tracker. Monday morning we had nine merge requests open. Two of them hit conflicts on rebase. Neither one mattered. Here is why that was deliberate, not lucky.

developer-experience, code-review, git, security
13 May 2026
read_parquet shouldn't read /etc/shadow

Modern object-store abstractions unify the filesystem and HTTP behind a single URL. That's a feature for ergonomics. It's a security trap when the URL comes from a user. SQE shipped that trap and then closed it. The IMDS pivot is the part worth telling.

security, ssrf, datafusion, rust
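
As a rough illustration of closing the trap described above, a check on user-supplied URLs could allow only known schemes and refuse literal internal addresses such as the cloud metadata endpoint at 169.254.169.254. This is a deliberately incomplete sketch under my own assumptions: `check_user_url`, `UrlError`, and the allowlist are hypothetical, IPv6 literals are not handled, and a hostname that resolves to an internal address is not caught here.

```rust
use std::net::Ipv4Addr;

#[derive(Debug, PartialEq)]
pub enum UrlError {
    MissingScheme,
    SchemeNotAllowed(String),
    BlockedHost(String),
}

/// Validate a URL that came from a user before handing it to an object store.
pub fn check_user_url(url: &str) -> Result<(), UrlError> {
    // Bare paths such as "/etc/shadow" have no scheme and are rejected outright.
    let (scheme, rest) = url.split_once("://").ok_or(UrlError::MissingScheme)?;
    let scheme = scheme.to_ascii_lowercase();
    // Hypothetical allowlist; a real deployment would make this configurable.
    if !matches!(scheme.as_str(), "s3" | "https") {
        return Err(UrlError::SchemeNotAllowed(scheme));
    }

    // Authority is everything up to the first '/', '?' or '#'.
    let authority = rest
        .split(|c| c == '/' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Drop any "user:pass@" prefix, then any ":port" suffix.
    let host = authority.rsplit('@').next().unwrap_or("");
    let host = host.split(':').next().unwrap_or("");

    // Refuse literal IPv4 addresses that point back inside the network.
    if let Ok(ip) = host.parse::<Ipv4Addr>() {
        if ip.is_loopback() || ip.is_private() || ip.is_link_local() || ip.is_unspecified() {
            return Err(UrlError::BlockedHost(host.to_string()));
        }
    }
    Ok(())
}
```

A string check like this is only a first gate. Defending against redirects and DNS tricks needs the same policy applied again at connection time.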
10 May 2026
Mounting catalogs from SQL: ATTACH, DETACH, and the registry pattern

SQE now ships DuckDB-style ATTACH / DETACH and CREATE / DROP SECRET. The story of building it covers parser extension, credential hygiene, a lifecycle bug we found in the integration tests, and a state-store pattern that is starting to repeat.

iceberg, duckdb, datafusion, catalogs
9 May 2026
Shipping OpenLineage: column-level lineage for an Iceberg engine

SQE now emits OL 2-0-2 events with column-level lineage on every write. Here is what they look like, why we walked the LogicalPlan to build them, and the disk spool we did not want to write.

openlineage, lineage, datafusion, iceberg
7 May 2026
How we accidentally created a DuckDB

SQE started as a distributed Iceberg query engine. Five MRs later it queries CSVs from disk, Parquet from S3, and Parquet from HuggingFace. We did not plan that.

duckdb, iceberg, datafusion, embedded
7 May 2026
One Binary, No Cluster: SQE Goes Embedded

We built SQE for distributed Iceberg, but most of the time you just want to look at a parquet file. Here's how we made the engine work both ways without forking the codebase.

cli, embedded, duckdb, datafusion
29 Apr 2026
SQE Talks to Five Catalogs Now: HMS, Nessie, Glue, JDBC, S3 Tables

We claimed the engine was catalog-agnostic. Time to prove it. One branch, five live integration tests, one small AWS SigV4 patch, and a matrix score that moved from 153 to 158.

iceberg, catalog, aws, s3-tables
29 Apr 2026
Why a Public Iceberg Matrix Beats Vendor Spec Sheets

Sixty-three capabilities, three levels, no marketing. The Iceberg Matrix is what compatibility looks like when the rubric is public and the evidence has to land in code. Here is why it works, what we learned from sitting on it, and why every open standard needs one.

iceberg, matrix, open-source, compatibility
26 Apr 2026
The Iceberg Matrix and the Quiet Bug Hiding in V3

We thought the V3 path worked. The unit tests said it worked. The matrix called it 'partial' and we agreed. Then we wrote eleven end-to-end tests and discovered Polaris had been silently rejecting every V3 column type for months.

iceberg, v3, polaris, testing
16 Apr 2026
Our Nemesis: TPC-DS Query 72 and the Limits of a Custom SQL Engine

One query. Ten tables. Twelve times slower than Trino. Everything we tried, what worked, what didn't, and where the ceiling is.

performance, tpc-ds, datafusion, trino
14 Apr 2026
DataFusion 53, a Vendored Fork, and 40% Faster Queries

We upgraded SQE from DataFusion 52 to 53 by forking and rebasing iceberg-rust ourselves. The result: 27-40% faster across every benchmark suite.

datafusion, iceberg, performance, rust
13 Apr 2026
43 Findings, Zero Deferred: A Production Security Audit of a Rust SQL Engine

We ran a full production sign-off audit against SQE and found 43 issues across security, runtime safety, logic bugs, and code quality. Then we fixed all of them in one session.

security, rust, audit, production-readiness
13 Apr 2026
How Agentic AI Helped Us Beat Trino

221 queries, 7 suites, one week: how an AI assistant running automated benchmarks drove a major performance breakthrough.

ai, performance, benchmarks, development-process
12 Apr 2026
Five Layers of Caching and an 8.8x Speedup Over Trino

How multi-layer caching took SQE from slower than Trino to 2.5-8.8x faster across every benchmark suite.

performance, caching, trino, benchmarks
10 Apr 2026
Streaming Writes, Sort Order Safety, and the IN (Subquery) Workaround

Fixing OOM in CTAS, safe Iceberg sort order for mixed writers, and working around DataFusion limitations.

performance, correctness, iceberg, streaming
9 Apr 2026
From 63% to 95%: Building Trino SQL Compatibility in a Single Day

Implementing 70+ UDFs, Iceberg time travel, metadata TVFs, and engine-level SQL features for Trino drop-in replacement.

trino, compatibility, udfs, iceberg
24 Mar 2026
Building a Comprehensive SQL Benchmark Suite

Seven benchmark suites, 222 queries, and the infrastructure to measure performance honestly.

benchmarks, tpch, tpcds, performance
22 Mar 2026
We Replaced Our Trino Fork with a Rust SQL Engine

How we went from maintaining a 2M-line Java fork to shipping a 50MB binary that runs every query as the authenticated user.

rust, datafusion, trino, architecture
22 Mar 2026
How We Build Software with AI Assistants

From brainstorm to production in four phases: structured AI collaboration that produces better software than either alone.

ai, development-process, claude, productivity
22 Mar 2026
Making SQE Work Everywhere: Pluggable Auth and Catalogs

How we're turning a single-vendor query engine into something that runs against any identity provider, any catalog, and any cloud.

auth, oidc, polaris, architecture
22 Mar 2026
When Your SQL Engine Understands Meaning

SQL engines know table shapes. We're adding ontologies, property graphs, vector search, and AI-native interfaces.

ai, ontology, vector-search, future
