Rukuzu: Porting a 200,000 line codebase from C++ to Rust and eventually making it faster
Loxation Team
Porting 200K Lines of C++ to Rust with Claude: A Systematic Workflow for Graph Database Translation
The Problem: You Need Both Implementations
Here’s a situation that will sound familiar to anyone building systems software: you have a mature C++ codebase — in our case, kuzu, an embedded graph database with roughly 200,000 lines of C++ — and you want a pure Rust version. Not because the C++ is bad, but because your target environment demands it. Mobile devices. Edge deployments. Contexts where a C++ toolchain isn’t available, where FFI boundaries create debugging nightmares, and where a single statically-linked binary is worth its weight in gold.
But you can’t just flip a switch. You need to keep the C++ version running while you build the Rust one. You need to test them against each other. You need a way to know, at every step, that the Rust port produces the same results as the original.
This article describes the workflow we developed — encoded as a Claude Code custom command — for systematically porting kuzu to a Rust reimplementation we call rukuzu. The workflow emerged from hard experience: 15 ported crates, 2,700+ tests, and a lot of lessons about what translates mechanically and what requires human judgment. The DEALER project (our fuzzy OWLv2 EL++ Description Logic reasoner) consumes both backends through a pluggable architecture, giving us a production-grade testbed for correctness and performance comparison.
Architecture First: The Pluggable Backend
Before discussing the porting workflow, it’s worth understanding why we could afford to port incrementally. DEALER’s persistence layer — the dealer-store crate — uses Rust’s feature flag system to compile against either backend:
[features]
default = ["rukuzu"]
kuzu = ["dep:kuzu"]
rukuzu = ["dep:rukuzu", "dep:rukuzu-common", "dep:rukuzu-storage"]
[dependencies]
rukuzu = { path = "../../../rukuzu/crates/rukuzu", optional = true }
kuzu = { path = "../../../kuzu/tools/rust_api", optional = true }
Switching backends is a single cargo flag:
cargo build --release # rukuzu (default)
cargo build --release --no-default-features --features kuzu # kuzu C++
The trick that makes this work is a trait abstraction over the query interface:
pub trait GraphConnection {
    fn query(&self, cypher: &str) -> Result<StoreResult>;
}
Both kuzu’s Connection<'a> and rukuzu’s Connection implement this trait. Everything above — the schema DDL, the ontology importer, the ontology loader, the taxonomy exporter — takes &dyn GraphConnection and never knows which backend it’s talking to. The value types are marshaled through a common StoreValue enum:
pub enum StoreValue {
    String(String),
    Double(f64),
    Float(f32),
    Int64(i64),
    Int32(i32),
    Bool(bool),
    Null,
}
This architecture gave us two critical properties. First, we could develop rukuzu incrementally while DEALER continued running against kuzu. Second, we could run the same test suite against both backends and diff the results. When the outputs diverge, you know exactly where to look.
The Claude Code Custom Command
Claude Code supports custom commands — markdown files in .claude/commands/ that encode reusable workflows. Ours is called CONVERT_COMPLEX_CPP_TO_RUST.md, and it encodes everything we learned about systematic porting into a repeatable protocol that Claude follows every time we start a new module.
The command is structured as a mandatory four-step workflow, followed by extensive pattern tables and a pitfalls section. Here’s how it works in practice.
Step 1: Compact Context
Before any porting work begins, Claude summarizes all progress from prior sessions. This sounds mundane, but it’s essential. Porting a graph database is a multi-session project. Claude’s context window is finite. If you don’t start each session by compressing prior state into a concise summary, you lose track of what’s been ported, what’s been tested, and what’s blocking.
We learned this the hard way: early sessions would sometimes re-port code that already existed, or miss dependencies because the conversation had lost track of the dependency graph.
Step 2: Research Both Implementations
This is the step that distinguishes our workflow from naive “translate this file” prompting. Before writing any Rust, Claude reads the C++ reference code and the existing Rust code side by side. The command specifies exactly what to look for: every class, struct, enum, and function in the C++ module, mapped to its Rust equivalent or marked as absent.
The critical instruction is: “Pay special attention to ownership semantics, lifetime relationships, virtual dispatch patterns, shared mutable state, and friend class access.” These are the five categories where C++ and Rust diverge most dangerously. A shared_ptr<T> might become Arc<T>, or it might become owned T if the sharing was incidental. A T* might be &T, &mut T, or Option<&T> depending on whether it’s nullable and mutable. You can’t determine the right translation without reading how the pointer is actually used.
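The four possible readings of a raw `T*` can be sketched as Rust signatures. The `Node` type and functions here are hypothetical; the point is that the same C++ parameter type maps to four different Rust types depending on usage.

```rust
// Sketch of the four T* translations, using a hypothetical `Node` type.
// Which one is correct depends on how the C++ pointer is actually used.

struct Node {
    id: u64,
}

// C++: `const Node*` used as a never-null, read-only borrow -> &Node
fn read_id(node: &Node) -> u64 {
    node.id
}

// C++: `Node*` mutated through the pointer -> &mut Node
fn bump_id(node: &mut Node) {
    node.id += 1;
}

// C++: `Node*` that may legitimately be nullptr -> Option<&Node>
fn maybe_id(node: Option<&Node>) -> Option<u64> {
    node.map(|n| n.id)
}

// C++: `Node*` that secretly transfers ownership -> owned Node (or Box<Node>)
fn take_node(node: Node) -> u64 {
    node.id
}

fn main() {
    let mut n = Node { id: 41 };
    bump_id(&mut n);
    assert_eq!(read_id(&n), 42);
    assert_eq!(maybe_id(None), None);
    assert_eq!(take_node(n), 42); // ownership moves here; `n` is gone
}
```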
Step 3: Save the Comparison
Claude writes its findings to a plan file before writing any code. The comparison document covers four categories: gaps (C++ code with no Rust equivalent), divergences (Rust code that departs from C++ semantics), architectural decisions (where Rust idioms should replace C++ patterns), and specific C++ files to cross-reference during implementation.
This step forces a pause between analysis and implementation. It’s the engineering equivalent of “measure twice, cut once.” We found that without an explicit written plan, Claude (like any developer) would sometimes start coding from an incomplete mental model and hit ownership problems halfway through.
Step 4: Create Implementation Plan
The plan must be phased, bottom-up, and testable. Each task references the specific C++ file being ported, lists the Rust files to create or modify, includes test count expectations, and must be independently verifiable. The phase template looks like:
Phase N: [Name]
├── N.1 [Foundation type/trait] ← no dependencies
├── N.2 [Core implementation] ← depends on N.1
├── N.3 [Integration/wiring] ← depends on N.1, N.2
├── N.4 [Tests + verification] ← depends on N.3
└── Verify: cargo test + clippy
After every step: cargo build --workspace, cargo test --workspace, cargo clippy --workspace. All must pass with zero warnings. This isn’t negotiable. The command explicitly states: “Never weaken or rewrite a test to hide a bug — fix the code.”
The Translation Pattern Tables
The heart of the command is a set of tables mapping C++ patterns to Rust equivalents. These aren’t theoretical — they’re the patterns that actually worked across 15 crates of porting. A few deserve detailed discussion.
Ownership: The Non-Obvious Cases
The obvious translations — unique_ptr<T> to Box<T>, shared_ptr<T> to Arc<T> — are straightforward. The interesting cases are the non-owning pointers. In kuzu, a raw T* might mean any of four things: a non-owning borrow, a nullable borrow, an output parameter, or a disguised owning pointer that someone forgot to wrap. The command forces Claude to determine the actual semantics from usage context before choosing &T, &mut T, Option<&T>, or owned T.
The most dangerous pattern is shared_ptr<mutex<T>>, which becomes Arc<Mutex<T>> in Rust — but the command specifies parking_lot::Mutex over std::sync::Mutex. The reason: parking_lot doesn’t poison on panic, which matters in a database engine where one failed query shouldn’t corrupt the mutex for subsequent queries.
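The poisoning behavior that motivates this choice is easy to demonstrate with the standard library alone. This sketch (not from the rukuzu codebase) shows what happens to a `std::sync::Mutex` after a panic while the lock is held; `parking_lot::Mutex`, by contrast, simply releases the lock.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Why the command prefers parking_lot::Mutex: std's Mutex is poisoned
// if a thread panics while holding it, so every later lock() returns
// Err — one failed query would wedge the mutex for all subsequent ones.

fn main() {
    let data = Arc::new(Mutex::new(0_i32));

    let d = Arc::clone(&data);
    // The spawned thread panics while holding the lock.
    let _ = thread::spawn(move || {
        let _guard = d.lock().unwrap();
        panic!("simulated failed query"); // poisons the std Mutex
    })
    .join(); // join() returns Err(panic payload); ignored here

    // With std::sync::Mutex, the lock is now poisoned.
    assert!(data.lock().is_err());
}
```

With `parking_lot`, the equivalent `data.lock()` after the panic would succeed, which is the behavior a database engine wants: the query failed, but the shared state remains usable.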
Class Hierarchies: Enum Dispatch Over Trait Objects
This was perhaps our biggest architectural lesson. kuzu, like most large C++ codebases, uses deep inheritance hierarchies. The naive translation would be trait objects (Box<dyn Trait>). But the command encodes a critical insight: “Prefer enum dispatch over trait objects when the set of variants is closed and known at compile time.”
In practice, this meant converting kuzu’s operator class hierarchy (dozens of classes inheriting from a common base) into Rust enums. kuzu’s LogicalType became an enum with 31 variants. LogicalOperatorType became 49 variants. The payoff: exhaustive match checking catches missing cases at compile time, and there’s no vtable overhead.
The command is explicit: “always use exhaustive match (no _ => wildcard). This ensures the compiler catches missing variants when new ones are added.” This saved us multiple times when adding new features.
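A minimal sketch of the pattern, with illustrative variant names (not kuzu's actual 49-variant `LogicalOperatorType` list):

```rust
// Enum dispatch: the closed set of operators becomes enum variants
// instead of a class hierarchy with virtual dispatch.

enum LogicalOperator {
    Scan { table: String },
    Filter { predicate: String },
    Projection { columns: Vec<String> },
}

impl LogicalOperator {
    // Exhaustive match, no `_` wildcard: adding a new variant becomes
    // a compile error at every match site until it is handled.
    fn name(&self) -> &'static str {
        match self {
            LogicalOperator::Scan { .. } => "SCAN",
            LogicalOperator::Filter { .. } => "FILTER",
            LogicalOperator::Projection { .. } => "PROJECTION",
        }
    }
}

fn main() {
    let plan = vec![
        LogicalOperator::Scan { table: "Person".into() },
        LogicalOperator::Filter { predicate: "n.age > 21".into() },
        LogicalOperator::Projection { columns: vec!["n.name".into()] },
    ];
    let names: Vec<_> = plan.iter().map(|op| op.name()).collect();
    assert_eq!(names, ["SCAN", "FILTER", "PROJECTION"]);
}
```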
The Lifetime Problem in FFI
One pattern worth highlighting is how the kuzu backend handles lifetimes. kuzu’s C++ Connection object borrows from a Database object. In Rust, this means Connection<'a> where 'a is the lifetime of the database. But our GraphExporter struct needs to own both:
pub struct GraphExporter {
    conn: Connection<'static>,
    #[allow(dead_code)]
    db: Box<Database>,
}
The 'static lifetime comes from an unsafe transmute — the database is heap-allocated (stable address) and the connection is dropped before the database (Rust drops fields in declaration order). This is the kind of thing you can’t derive mechanically from the C++ code. The C++ doesn’t have this problem because it doesn’t track lifetimes. In Rust, you have to reason about it explicitly and document the safety invariant.
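A generic, self-contained sketch of the pattern follows. The `Database` and `Connection` types here are stand-ins, not kuzu's FFI types, and the pointer cast plays the role of the transmute; the safety argument is exactly the one described above and must hold for the real code too.

```rust
// Sketch of the self-referential pattern (hypothetical types, not
// kuzu's API). Safety rests on two invariants: the Box gives the
// Database a stable heap address, and `conn` is declared before `_db`,
// so Rust drops the borrower before the owner.

struct Database {
    name: String,
}

struct Connection<'a> {
    db: &'a Database,
}

struct Exporter {
    conn: Connection<'static>, // dropped first (declaration order)
    _db: Box<Database>,        // stable heap address, dropped second
}

impl Exporter {
    fn new(name: &str) -> Self {
        let db = Box::new(Database { name: name.to_string() });
        // SAFETY: `db` is heap-allocated (stable address even if the
        // Exporter moves) and outlives `conn` because struct fields
        // drop in declaration order.
        let db_ref: &'static Database =
            unsafe { &*(db.as_ref() as *const Database) };
        Exporter { conn: Connection { db: db_ref }, _db: db }
    }

    fn db_name(&self) -> &str {
        &self.conn.db.name
    }
}

fn main() {
    let exporter = Exporter::new("pizza");
    assert_eq!(exporter.db_name(), "pizza");
}
```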
The rukuzu backend, by contrast, just uses Arc<Database>:
pub struct GraphExporter {
    conn: Connection, // holds Arc<Database> internally
}
No unsafe code, no lifetime gymnastics. This is one of the tangible benefits of having the Rust reimplementation: the architecture can be designed around Rust’s ownership model instead of fighting it.
The Pitfalls Section: Hard-Won Knowledge
The command includes a section of specific pitfalls discovered during the port. These are the bugs that cost hours to diagnose and minutes to fix once understood. A few examples:
String identity in bound expressions. kuzu’s C++ code stores string literals with surrounding quotes in bound expressions — "hello" is stored as "\"hello\"". Both parsing and evaluation must strip them consistently. This is exactly the kind of semantic subtlety that a mechanical translation misses.
Anonymous variables need explicit propagation. kuzu’s query planner creates anonymous variables (_anon_0) that exist in the plan tree but not in user-facing maps. In C++, table IDs attached to these variables are accessible through global lookups. In Rust, where we’ve eliminated global state in favor of explicit parameter passing, these IDs must be propagated through the plan structure manually.
Pre-resolve column bindings once, not per-tuple. C++ code might resolve column positions lazily during iteration. In Rust, this pattern is both slower (redundant lookups) and error-prone (inconsistent resolution if state changes mid-iteration). The command specifies: build a column map before the tuple loop.
DML operators must evaluate expressions, not extract literals. SET p.age = p.age + 1 requires evaluating the right-hand side against the matched tuple. Early porting attempts tried to extract literal values, which works for SET p.age = 25 but fails catastrophically for expressions. The command says: “Port the expression evaluator, not just the value extractor.”
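The difference between the two approaches is clearest with a toy expression tree. This `Expr` type is a deliberate simplification, not rukuzu's AST: a literal extractor can only handle the `Literal` leaf, while the evaluator handles the whole tree per tuple.

```rust
use std::collections::HashMap;

// "Port the expression evaluator, not just the value extractor":
// the SET right-hand side is evaluated against each matched tuple.

enum Expr {
    Literal(i64),
    Property(String), // e.g. p.age, looked up in the current tuple
    Add(Box<Expr>, Box<Expr>),
}

fn eval(expr: &Expr, tuple: &HashMap<String, i64>) -> Option<i64> {
    match expr {
        Expr::Literal(v) => Some(*v),
        Expr::Property(name) => tuple.get(name).copied(),
        Expr::Add(l, r) => Some(eval(l, tuple)? + eval(r, tuple)?),
    }
}

fn main() {
    // SET p.age = p.age + 1, as an expression tree
    let rhs = Expr::Add(
        Box::new(Expr::Property("p.age".into())),
        Box::new(Expr::Literal(1)),
    );
    let tuple = HashMap::from([("p.age".to_string(), 41)]);
    // Literal extraction would fail here; evaluation yields 42.
    assert_eq!(eval(&rhs, &tuple), Some(42));
}
```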
Each of these pitfalls was added to the command after we hit it in practice. The command is a living document — every major bug gets encoded as a warning for future sessions.
What Claude Does Well (and Where Humans Still Lead)
After porting 15 crates with this workflow, we have a clear picture of where Claude adds the most value and where human judgment remains essential.
Claude excels at:
Mechanical translation. Type definitions, accessor methods, serialization boilerplate, trait implementations for common derives — Claude can produce these quickly and accurately, especially with the pattern tables for guidance. A C++ struct with 20 fields and getters/setters translates to a Rust struct with derives and pub fields in seconds.
Pattern matching against the tables. When Claude encounters a shared_ptr<T>, it consults the table and produces Arc<T>. When it sees a class hierarchy, it asks whether the set is closed (enum) or open (trait object). The tables encode the decision process that would otherwise require senior Rust experience.
Test generation. Given the C++ test suite, Claude produces equivalent Rust tests rapidly. The #[cfg(test)] mod tests pattern is consistent and Claude follows it reliably.
Consistency enforcement. The verification checklist (cargo build, cargo test, cargo clippy with zero warnings) runs after every step. Claude doesn’t skip it or rationalize why a warning is acceptable.
Humans still lead on:
Ownership architecture. When a C++ module uses shared mutable state in non-obvious ways, determining the correct ownership boundary requires understanding the algorithm’s invariants. Claude can propose options, but a human needs to decide whether Arc<RwLock<T>> or message passing or restructured ownership is the right call.
Semantic fidelity. The pitfalls section exists because mechanical translation can produce code that compiles and passes trivial tests but has subtly wrong semantics. The string quoting issue, the anonymous variable propagation, the expression-vs-literal distinction — these require understanding what the code means, not just what it does.
Performance architecture. Deciding to use enum dispatch instead of trait objects is a performance and maintainability decision. Choosing parking_lot over std::sync is a performance decision. Choosing FxHashMap over HashMap (as we do in DEALER, with 114 uses across the codebase) is a performance decision informed by profiling. Claude can implement any of these once told, but the decision to use them comes from benchmarking and experience.
The Feedback Loop: Pluggable Backends as Verification
The most valuable part of this entire setup is the feedback loop. Because DEALER supports both backends through feature flags, we can run the same ontology classification pipeline against both kuzu and rukuzu and compare results. The Cypher queries are identical. The schema is identical. The only thing that differs is the engine underneath.
When we port a new kuzu module to rukuzu, the verification process is:
- Run DEALER’s test suite with --features kuzu. Record all taxonomy outputs.
- Run the same suite with --features rukuzu. Record all outputs.
- Diff. Any divergence is a bug in the port.
This gives us something rare in porting projects: an oracle. We don’t have to guess whether the Rust code is correct. We can test it against the C++ original using the exact same workload. The pluggable architecture isn’t just convenient — it’s the foundation of our confidence in the port.
Results: The Numbers Tell Two Stories
The workflow has produced rukuzu: a graph database in Rust that can serve as a drop-in replacement for kuzu in embedded contexts. DEALER uses it as the default backend. The test suite passes identically against both backends. The rukuzu build produces a single statically-linked binary with no C++ dependencies — exactly what we needed for mobile deployment.
But the pluggable architecture gave us something beyond correctness verification: it gave us a head-to-head performance comparison on the exact same workload. The results are instructive.
Where the Backends Are Identical
The reasoning pipeline — parsing, normalization, classification, taxonomy extraction — runs the same code regardless of backend. As expected, the timings are indistinguishable:
Parse: ~183 µs (rukuzu) vs ~190 µs (kuzu)
Classify: ~366 µs (rukuzu) vs ~370 µs (kuzu)
Normalize & taxonomy: essentially identical
This confirms that the GraphConnection trait abstraction adds no measurable overhead. The backend selection is a compile-time feature flag, not a runtime dispatch.
Where They Diverge
The interesting numbers are in the database operations — export (writing the classified taxonomy to the graph DB) and query (reading it back):
┌───────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ Stage │ rukuzu (Rust) │ kuzu (C++) │ Winner │
├───────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ db_export │ ~128 µs (40k iters) │ ~13.5 ms (400 iters) │ rukuzu ~105× faster │
├───────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ db_query │ 1.58 ms │ 277 µs │ kuzu ~5.7× faster │
└───────────┴──────────────────────┴──────────────────────┴──────────────────────┘
rukuzu is 105× faster on writes. kuzu is 5.7× faster on reads. This is a classic write-vs-read performance tradeoff, and the reasons are illuminating.
Why rukuzu dominates writes: No FFI boundary. When DEALER exports a taxonomy to rukuzu, it’s calling Rust functions with Rust types — no marshaling, no C ABI overhead, no string conversion across language boundaries. The data flows from DEALER’s FxHashMap<ConceptId, TaxonomyNode> into rukuzu’s storage engine through native Rust method calls. For kuzu, every node insertion and edge creation crosses the FFI boundary: Rust types are converted to C types, passed through extern "C" functions, reconverted on the C++ side, and processed by kuzu’s C++ storage engine. Multiply that overhead by every concept and every subsumption edge in the taxonomy, and you get two orders of magnitude.
Why kuzu dominates reads: kuzu has years of query optimization that rukuzu hasn’t replicated yet. Its C++ query engine uses sophisticated join algorithms, column-oriented storage, and query plans that have been refined across thousands of benchmark runs. rukuzu’s query engine is young — correct, but not yet optimized for complex traversals. This is expected: query optimization is the hardest part of a database engine, and it’s the last thing you port.
What This Means for Deployment
The full classification pipeline — parse, normalize, classify, extract taxonomy, export to database — is dominated by the export cost. For this workload, rukuzu wins handily: the entire pipeline completes in well under a millisecond, versus ~14 ms with kuzu. For a mobile device classifying ontologies on the fly, this is the difference between imperceptible and noticeable.
But if your workload is query-heavy after an initial load — say, a clinical decision support system that classifies once at startup and then handles thousands of subsumption queries — kuzu’s mature query engine gives it an edge. kuzu’s 277 µs query time is already fast in absolute terms, and in a latency-sensitive query loop the gap to rukuzu’s 1.58 ms adds up.
This is exactly why the pluggable architecture matters. It’s not just a porting convenience — it’s a deployment choice. You can ship rukuzu for write-heavy mobile scenarios and kuzu for query-heavy server scenarios, with the same codebase and the same test suite covering both.
The Workflow in Action: Closing the Query Gap
The benchmarks told us exactly where to focus next. rukuzu’s 105× write advantage is a structural win — it comes from eliminating the FFI boundary, and kuzu can’t match it without rewriting its ingestion path in Rust. But rukuzu’s 5.7× query disadvantage is a solvable engineering problem. kuzu’s query engine has years of optimization; rukuzu’s is young. The gap isn’t architectural — it’s a matter of implementing the same optimizations in Rust.
This is where the workflow comes full circle. The same custom command that guided the initial port now guides the optimization work. The process looks like this:
Step 1: Hand Claude the Benchmark Data
The conversation starts with the concrete numbers: rukuzu query at 1.58 ms, kuzu at 277 µs, 5.7× gap. This isn’t a vague “make it faster” — it’s a specific target with a specific oracle to measure against. Claude knows what “done” looks like: rukuzu query performance within striking distance of kuzu on the same workload.
Step 2: Research kuzu’s Query Engine (Following the Command)
The custom command’s Step 2 kicks in: read both implementations side by side. Claude reads kuzu’s C++ query engine — its query planner, its operator pipeline, its column-oriented storage access patterns — and compares them to rukuzu’s current implementation. The comparison reveals where the gap comes from.
For a graph database handling taxonomy queries like “What are the direct superclasses of MargheritaPizza?”, the critical path is: parse the query, plan the traversal, scan the edge index, and return results. kuzu optimizes each of these stages with techniques accumulated over years: pre-computed adjacency lists, vectorized scans, compiled query plans. rukuzu’s implementation is correct but naive — it may be doing linear scans where kuzu does indexed lookups, or allocating intermediate results where kuzu streams them.
Step 3: Plan the Optimizations
Following the command’s phase template, the optimization work gets broken into testable steps:
Phase 1: Query Plan Analysis
├── 1.1 Profile rukuzu query path — identify hot spots
├── 1.2 Compare kuzu's plan for same query — note differences
├── 1.3 Document optimization targets with expected impact
└── Verify: benchmark baseline recorded
Phase 2: Index-Based Edge Lookup
├── 2.1 Implement adjacency list index (Ref: kuzu/src/storage/adj_list.cpp)
├── 2.2 Wire index into query executor
├── 2.3 Benchmark — expect 2-3× improvement on edge traversals
└── Verify: cargo test + benchmark vs baseline
Phase 3: Reduce Allocation in Hot Path
├── 3.1 Profile allocation patterns during query
├── 3.2 Replace Vec allocations with pre-sized buffers or arena
├── 3.3 Benchmark — expect further 1.5-2× improvement
└── Verify: cargo test + benchmark vs Phase 2
Each phase references the specific kuzu C++ code being studied, produces measurable improvement, and runs the full test suite to ensure correctness isn’t sacrificed for speed.
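The core idea of Phase 2 — replace a linear scan over an edge list with a precomputed adjacency index — can be sketched in a few lines. The types here are illustrative, not rukuzu's storage layer, and the referenced kuzu file path is the hypothetical one from the plan above.

```rust
use std::collections::HashMap;

// Sketch of an adjacency-list index: build once at load time (O(E)),
// then answer "who are node X's neighbors?" without scanning every edge.

struct EdgeIndex {
    // source node -> list of destination nodes
    adjacency: HashMap<u64, Vec<u64>>,
}

impl EdgeIndex {
    fn build(edges: &[(u64, u64)]) -> Self {
        let mut adjacency: HashMap<u64, Vec<u64>> = HashMap::new();
        for &(src, dst) in edges {
            adjacency.entry(src).or_default().push(dst);
        }
        EdgeIndex { adjacency }
    }

    // Per-query lookup: one hash probe, instead of the O(E) scan a
    // naive executor might do for every traversal step.
    fn neighbors(&self, node: u64) -> &[u64] {
        self.adjacency.get(&node).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    // SubClassOf edges as (subclass, superclass) pairs
    let edges = [(1, 2), (1, 3), (2, 3)];
    let index = EdgeIndex::build(&edges);
    assert_eq!(index.neighbors(1), &[2u64, 3]);
    assert!(index.neighbors(3).is_empty());
}
```

For a taxonomy query like "direct superclasses of MargheritaPizza", the traversal becomes one index lookup per node instead of one pass over the full edge set.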
Step 4: Benchmark Against the Oracle
After each optimization phase, the pluggable architecture provides the answer: run DEALER’s benchmark suite with --features rukuzu, compare to the kuzu baseline. The gap should narrow. If it doesn’t, the optimization targeted the wrong bottleneck — go back to profiling.
This is the feedback loop that makes the whole system work. The benchmarks aren’t just a report card — they’re the input to the next iteration of the custom command workflow. Each round of optimization generates new patterns (what worked), new pitfalls (what didn’t), and new entries in the command document for future sessions.
Why This Works Better Than “Just Optimize”
A developer working without this structure would likely profile rukuzu’s query path, find something slow, fix it, and hope for the best. The custom command forces a more disciplined approach: study how kuzu solves the same problem first, then implement the correct solution in Rust. The command is explicit about this: “Go back to the C++ reference implementation. Read how it handles the case. Then implement the correct solution in Rust, regardless of difficulty.”
This matters because query optimization is full of tempting shortcuts. You might add a cache that helps the benchmark but hurts real workloads. You might optimize for a single query pattern while regressing others. Studying kuzu’s battle-tested approach first — and implementing it in idiomatic Rust rather than cargo-culting the C++ — produces optimizations that are both effective and maintainable.
The custom command’s pitfall about pre-resolving column bindings is a case in point. kuzu’s C++ code resolves column positions lazily during iteration. This is a performance antipattern that kuzu gets away with because its column-oriented storage makes resolution cheap. In rukuzu, with different storage internals, lazy resolution showed up as a measurable bottleneck. The right fix wasn’t to copy kuzu’s lazy approach — it was to pre-resolve once before the tuple loop, which is both faster and more idiomatic in Rust.
Where Things Stand
The Claude Code custom command has been refined through 15 crate-porting sessions, and it continues to evolve as we move from porting to optimization. Every session makes it better — new patterns get added to the tables, new pitfalls get documented, new verification steps get tightened. The command is now about 3,000 words of condensed porting knowledge. It’s the kind of institutional knowledge that would normally live in a senior engineer’s head and get transmitted through pairing sessions. Instead, it’s a markdown file that any Claude session can follow.
The broader lesson is about the nature of AI-assisted systems programming. Claude doesn’t replace the architect — it amplifies them. The human designs the ownership model, identifies the architectural divergences, reads the benchmark data, and decides which optimization to pursue. Claude handles the volume: the hundreds of type definitions, the thousands of test translations, the relentless consistency of running clippy after every change. The custom command is the contract between them: the human encodes their expertise as constraints and patterns, and Claude executes faithfully within those constraints.
For anyone contemplating a large C++ to Rust port, our advice so far: build the pluggable architecture first. Keep both implementations running. Benchmark them against each other on your actual workload — the results will surprise you, and they’ll tell you exactly where to focus. Encode your porting patterns in a reusable document. Test against the oracle at every step. And treat the porting workflow itself as a living document — it’s not overhead, it’s the thing that makes both the port and the subsequent optimization possible.
Coming in Part 2
The benchmarks in this article were run against a small test ontology — the pizza subset with 33 axioms. That’s useful for establishing the methodology, but it doesn’t tell us how the two backends behave under realistic load. Clinical ontologies like SNOMED CT have 350,000+ concepts. The Gene Ontology has 40,000+. The write/read tradeoff we observed — rukuzu 105× faster on exports, kuzu 5.7× faster on queries — may look very different at scale. Write overhead is often O(n) in the number of nodes and edges; query optimization gains may compound as indexes become more valuable on larger graphs.
In Part 2, we’ll present the results of the query optimization work currently underway: what specific kuzu optimizations we ported to rukuzu, how much of the 5.7× query gap we closed, and what the numbers look like on larger and more complex ontologies. We’ll also discuss what the optimization process itself taught us about the custom command workflow — which patterns needed updating, which new pitfalls emerged, and whether the “study the C++ first, then implement in idiomatic Rust” discipline held up when the C++ approach was fundamentally at odds with Rust’s ownership model.
The goal isn’t parity. It’s understanding the performance envelope well enough to make the right deployment choice for each workload — and having the tooling to keep narrowing the gap.
This is Part 1 of a four-part series. Part 2 will present query optimization results and benchmarks on production-scale ontologies.
About the Author
Jonathan Borden, M.D. served as an invited expert on the W3C OWL Working Group and founder and architect at Loxation. The DEALER project and the kuzu-to-rukuzu porting workflow described here are part of Loxation’s initiative to bring ontology reasoning to mobile and edge devices. Reach him at jonathan@loxation.com.