DEALER: An EL++ Description Logic Reasoner in Rust — Architecture, Fuzzy Reasoning, and the Road to Mobile Ontology Intelligence
Loxation Team
- 22 minutes read - 4562 wordsBy Jonathan Borden, M.D. (jonathan@openhealth.org)
Introduction
Ontology reasoning has been traditionally the domain of desktop and server-side systems — complex Java frameworks like Protégé and OWL API dominate the landscape. I know this landscape intimately: in the early 2000s, I served as an invited expert on the W3C OWL Working Group, where I helped design the Web Ontology Language with healthcare’s needs in mind. The challenge then was creating a language expressive enough for clinical terminologies yet decidable enough for practical computation. Two decades later, we have OWL 2 and massive ontologies built on it — SNOMED CT with 350,000+ concepts, GALEN-EL for clinical modeling, Gene Ontology for biology — but the reasoning still happens on servers. As knowledge graphs move to the edge, we encounter a new problem: how do you reason over these semantically rich ontologies on mobile devices and edge servers with sub-second latency and minimal memory footprint?
Enter DEALER, an OWL 2 EL++ Description Logic reasoner written in Rust. It’s a minimal, high-performance implementation of a saturation-based classification algorithm, engineered from the ground up for resource-constrained environments. DEALER parses OWL 2 ontologies, normalizes axioms into canonical forms, applies a fixed-point completion algorithm, and extracts a subsumption taxonomy. The entire pipeline — parsing through taxonomy extraction — typically completes in microseconds on small to medium-scale ontologies.
But DEALER goes further: it extends EL++ with fuzzy reasoning, allowing axioms to carry degrees of truth that propagate through inference chains using configurable triangular norms (t-norms). This opens the door to representing uncertainty, confidence scores, and approximate subsumption in domains like healthcare (where clinical severity varies), IoT (where sensor readings have noise), and knowledge base enrichment (where automated extraction always has uncertainty).
This article dives deep into DEALER’s architecture, the saturation algorithm it implements, the Rust patterns that make it fast, fuzzy reasoning mechanics, and lessons learned porting from the ELK reasoner (a popular Java implementation) to pure Rust. We’ll explore how graph databases fit into the pipeline, and discuss the path toward mobile-first ontology reasoning.
What Is EL++ and Why It Matters
Description Logic (DL) is a family of formal languages for representing knowledge. Unlike first-order logic, which is undecidable, DLs are designed to be decidable — typically with known complexity bounds. They sit at a sweet spot: expressive enough to model real-world domains, yet tractable enough to compute inference in polynomial time.
EL++ (EL with role hierarchies, transitive roles, and range restrictions) is one such logic. It’s the dialect used by SNOMED CT (the clinical reference terminology with 350,000+ concepts), Gene Ontology (used in biomedics), GALEN-EL (a large healthcare ontology where fuzzy DL has proven particularly valuable for modeling clinical gradations), and many industrial taxonomies. The key properties of EL++ are:
- Concept Constructors: Intersection (A ⊓ B), existential restriction (∃r.B), and nominal (singleton sets like {john}).
- Decidability: Reasoning is decidable.
- Complexity: In polynomial time — specifically, subsumption can be checked in PTIME (polynomial time deterministic).
- Practicality: EL++ is expressive enough for most taxonomy use cases, yet efficient enough that you can classify ontologies with hundreds of thousands of axioms in seconds on a single machine.
EL++ achieves this via the saturation algorithm, a forward-chaining procedure that applies completion rules until reaching a fixpoint. Unlike backward-chaining approaches (like SLD resolution), saturation precomputes all subsumption relationships upfront. This amortizes classification cost: once saturated, answering a subsumption query is A subsumed by B? is a simple table lookup.
For mobile and edge deployments, this is transformative. A mobile health app can:
- Parse a clinical ontology once, saturate it offline, and store the resulting taxonomy in a compact database.
- At runtime, query the pre-computed taxonomy in microseconds — no reasoning engine needed.
- If axioms change (e.g., new drug interactions), re-saturate locally on the device and push updates to the backend.
Architecture: A Seven-Crate Pipeline
DEALER is a Rust workspace with seven crates, each handling one stage of the pipeline. This modular design allows easy testing, reuse, and future optimizations.
┌─────────┐ ┌────────────┐ ┌──────────────┐ ┌─────────┐ ┌──────────┐
│ Parser │─────>│ Normalizer │─────>│ Reasoner │─────>│ Taxonomy│─────>│ Store │
│ (OWL) │ │ (NF1-NF8) │ │ (Saturation) │ │ (DAG) │ │ (rukuzu) │
└─────────┘ └────────────┘ └──────────────┘ └─────────┘ └──────────┘
^ ^
└───────────────── dealer-core ────────┘
(shared types)
^
|
dealer-cli (binary)
dealer-core: Shared Abstractions (563 LOC)
At the foundation, dealer-core defines the types used across all crates:
- ConceptId(u32) and RoleId(u32): Newtype wrappers around integers, enabling type-safe interning. ConceptId(0) is reserved for
owl:Thing, ConceptId(1) forowl:Nothing. All IRIs are mapped to integer IDs for O(1) lookup in hash tables. - Interner: A hash table mapping strings (IRIs) to IriIds, allowing memory-efficient storage and deduplication of namespace URIs.
- Degree: A fuzzy membership degree in [0.0, 1.0], with epsilon-based comparison for floating-point stability.
- TNorm: Trait for triangular norms (Zadeh/min-based and Lukasiewicz/sum-based), used to combine degrees.
- Axiom and NormalizedAxiom: AST types for un-normalized and normal-form axioms.
- Ontology: Container holding axioms, declarations, and the interner.
The use of newtypes (ConceptId(pub u32)) is a key Rust pattern here. It provides zero-cost abstraction — the compiler erases the wrapper at runtime — but grants compile-time safety: you cannot accidentally pass a ConceptId where a RoleId is expected.
dealer-parser: Recursive Descent (1,527 LOC)
The parser implements a recursive descent grammar for OWL 2 Functional Syntax (.ofn files). It handles:
- Prefix declarations and namespace resolution.
- Class axioms: SubClassOf, EquivalentClasses, DisjointClasses.
- Class expressions: ObjectIntersectionOf (conjunction), ObjectSomeValuesFrom (existential), ObjectOneOf (nominal).
- Property axioms: SubObjectPropertyOf, ObjectPropertyChain (role composition), TransitiveObjectProperty.
- Assertions: ClassAssertion, ObjectPropertyAssertion (ABox facts).
- Fuzzy annotations: Degree annotations via the
http://dealer.io/ontology#degreeIRI.
Unsupported constructs (ObjectUnionOf, DataProperties, etc.) are silently skipped, making the parser lenient enough for real-world OWL files.
The parser is hand-written rather than generated (no ANTLR or similar). This keeps the binary small and gives fine-grained error reporting. For a small ontology like the Pizza example (7 test files, ~2KB each), parsing takes ~200 microseconds.
dealer-normalizer: Axiom Canonicalization (1,048 LOC)
The normalizer converts arbitrary class axioms into eight canonical forms:
| Form | Pattern | Meaning |
|---|---|---|
| NF1 | A₁ ⊓ A₂ ⊑ B | Conjunction of two concepts subsumes a concept |
| NF2 | A ⊑ ∃r.B | Concept subsumes an existential restriction |
| NF3 | ∃r.A ⊑ B | Existential restriction is subsumed by a concept |
| NF4 | A ⊑ ⊥ | Concept is unsatisfiable |
| NF5 | A ⊑ {a} | Concept subsumes a nominal |
| NF6 | {a} ⊑ A | Nominal is subsumed by a concept |
| NF7 | r ⊑ s | Role inclusion (property hierarchy) |
| NF8 | r₁ ∘ r₂ ⊑ s | Role chain (property composition) |
Why normalize? The saturation algorithm works on these canonical forms, making the completion rules (CR1–CR9) simple and uniform. For instance, CR1 (the conjunction rule) checks for NF1 axioms; CR2 handles NF2; and so on. Without normalization, the reasoner would need to handle arbitrary nested expressions, complicating the algorithm.
The normalizer also allocates fresh concept names for decomposition. When normalizing A ⊓ B ⊑ C, which is already in NF1, no allocation is needed. But when normalizing A ⊑ ∃r.(B ⊓ C), the normalizer creates a fresh concept F, emits two NF2 axioms (A ⊑ ∃r.F and A ⊑ ∃r.C), and a single NF1 axiom (B ⊓ C ⊑ F), ensuring the filler is atomic.
Each axiom carries a degree (default 1.0 for crisp), and the normalizer preserves degrees through decomposition, threading them through the generated axioms so that fuzzy semantics are maintained during normalization.
dealer-reasoner: Saturation Algorithm (1,915 LOC)
The reasoner is the heart of DEALER. It implements the saturation loop:
- Initialization: For each named concept, create a context with its label set S(A) = {A, Thing}.
- Saturation: Repeatedly apply completion rules (CR1–CR9) until no new inferences are derived.
- Termination: When the work queue is empty, every entailed subsumption has been discovered.
Contexts and Label Sets
Each concept A has a context holding:
pub struct Context {
pub concept: ConceptId,
pub labels: FxHashMap<ConceptId, Degree>, // S(A): subsumers
pub successors: FxHashMap<RoleId, FxHashMap<ConceptId, Degree>>, // R(A): role edges
pub predecessors: FxHashMap<RoleId, FxHashMap<ConceptId, Degree>>, // predecessors
}
The label set S(A) is a map from ConceptId to Degree, representing all concepts B such that A ⊑ B with the given degree. After saturation, S(A) contains the complete set of subsumers of A.
The successors and predecessors track role edges, e.g., if there is an axiom A ⊑ ∃r.B, then after applying CR2, B appears in the successors of A under role r.
Completion Rules
The nine completion rules form the core of saturation:
CR1 (Conjunction): When a concept is added to S(ctx_id), check if any conjunction axiom’s two conjuncts are both present. If so, derive the right-hand side.
When A is added to S(X):
For each axiom ⟨A₁ ⊓ A₂ ⊑ B, d⟩ where A ∈ {A₁, A₂}:
If the other conjunct is in S(X) with degree d':
Add B to S(X) with degree = t(d, d') where t is the t-norm.
CR2 (Existential Right): When a concept is added to S(ctx_id), apply existential axioms.
When A is added to S(X):
For each axiom ⟨A ⊑ ∃r.B, d⟩:
Add edge (X, r, B) with degree = t(degree(A in S(X)), d).
CR3 (Existential Left): When an edge (X, r, Y) is added, apply existential axioms on the other side.
When edge (X, r, Y) is added:
For each axiom ⟨∃r.A ⊑ C, d⟩:
If A is in S(Y):
Add C to S(X) with degree = t(degree(A in S(Y)), d).
CR4–CR9: Handle role chains, range restrictions, transitivity, and nominals.
Each rule is triggered lazily via an axiom index:
The Axiom Index: Smart Rule Triggering
Naive saturation would iterate all axioms after every work item, resulting in O(n²) behavior. DEALER pre-indexes axioms by their “trigger” parts:
pub struct AxiomIndex {
// NF1: conjunction_by_conjunct[A] = [(B, C, d), ...] for axioms A ⊓ B ⊑ C
pub conjunction_by_conjunct: FxHashMap<ConceptId, SmallVec<[(ConceptId, ConceptId, Degree); 4]>>,
// NF2: existential_right_by_lhs[A] = [(r, B, d), ...] for axioms A ⊑ ∃r.B
pub existential_right_by_lhs: FxHashMap<ConceptId, SmallVec<[(RoleId, ConceptId, Degree); 4]>>,
// ...
}
When a concept A is added to S(X), the reasoner looks up conjunction_by_conjunct[A] and applies only the relevant axioms. This reduces the per-work-item cost from O(axioms) to O(matching axioms).
SmallVec is key here: most axioms have only a few entries per key, so SmallVec uses inline storage (e.g., 4 entries before heap allocation), avoiding pointer chasing for common cases.
The Work Queue and Deduplication
The work queue is a FIFO buffer of work items (concepts added or edges added). After processing an item, new items are enqueued. To avoid duplicate work, the queue uses a high-water-mark strategy:
pub struct WorkQueue {
queue: VecDeque<WorkItem>,
in_queue: FxHashSet<WorkItem>, // Track what's already enqueued
}
Before enqueuing a work item, the reasoner checks if it’s already in the queue. If so, it’s skipped. This prevents O(n²) duplicates.
Fuzzy Degree Propagation
When combining degrees, the reasoner uses the configured t-norm:
// CR1 example: both conjuncts present, derive RHS
let conj = reasoner.tnorm.conjunction(deg_added, deg_other);
let derived = reasoner.tnorm.conjunction(conj, axiom_deg);
if derived.is_zero() {
continue; // Skip zero degrees
}
Zadeh (min-based): Conjunction(a, b) = min(a, b). Conservative — the weakest link dominates.
Lukasiewicz (sum-based): Conjunction(a, b) = max(0, a + b - 1). Stricter — accumulated uncertainty degrades the degree.
For a chain A ⊑ B ⊑ C with degrees 0.8 and 0.7:
- Zadeh: A ⊑ C at min(0.8, 0.7) = 0.7
- Lukasiewicz: A ⊑ C at max(0, 0.8 + 0.7 - 1) = 0.5
The choice of t-norm is application-dependent: medical ontologies might prefer Zadeh (one bad link invalidates the chain), while IoT sensor confidence might prefer Lukasiewicz (compound uncertainty).
dealer-taxonomy: DAG Extraction (367 LOC)
After saturation, the reasoner has computed S(A) for every concept. The taxonomy extractor:
- Identifies equivalence classes: Two concepts A and B are equivalent if A ⊑ B and B ⊑ A both hold with degree ≥ threshold (default 1.0).
- Computes direct subsumption: For each pair of representative classes, determines if one directly subsumes the other by checking the transitive reduction.
- Builds a DAG: Nodes are equivalence classes; edges are direct subsumption relations with degrees.
The transitive reduction eliminates redundant edges: if A → B and B → C, the edge A → C is implicit and removed to keep the taxonomy compact.
dealer-store: Graph Database Integration (1,719 LOC)
The final stage exports the taxonomy to a graph database for persistent storage and querying. DEALER supports two backends:
- rukuzu (Rust, local dependency): A lightweight embedded graph database. For ontologies classified on-device, rukuzu provides persistent storage without external dependencies.
- kuzu (C++, optional): A high-performance graph DBMS with advanced query planning, useful for large-scale analytics.
The schema is minimal:
- Concept nodes with an
iriproperty. - DirectSubClassOf edges with a
degreeproperty (1.0 for crisp, <1.0 for fuzzy). - EquivalentTo edges linking equivalent concepts.
This allows queries like “What are the direct superclasses of MargheritaPizza?” to be answered with simple edge traversals.
Rust Performance Patterns: Why DEALER Is Fast
Performance is DEALER’s raison d’être. Rust enables several patterns that make saturation fast:
Pattern 1: String Interning with FxHashMap
IRIs are long strings (e.g., http://purl.obolibrary.org/obo/HP_0000118). Storing them directly in maps would incur:
- Large memory footprint.
- Expensive string comparisons on each lookup.
DEALER interns all IRIs to u32 handles:
pub struct Interner {
strings: Vec<String>,
map: FxHashMap<String, IriId>,
}
Once interned, all reasoning uses integer handles (ConceptId, RoleId, IriId). Lookups and comparisons are O(1) and involve only u32 arithmetic.
Pattern 2: FxHashMap Over HashMap
The standard HashMap uses SipHash (cryptographically secure but slow). For non-adversarial cases, rustc_hash::FxHashMap (a Rust compiler-internal hash function) is 2-3x faster. DEALER uses FxHashMap in 63+ places — every axiom index, context, and work queue.
use rustc_hash::FxHashMap;
// Instead of:
// let map: HashMap<ConceptId, Degree> = HashMap::new();
let map: FxHashMap<ConceptId, Degree> = FxHashMap::default();
Pattern 3: SmallVec for Axiom Collections
Most axioms have few entries per index key. SmallVec uses inline storage (4–8 elements) before allocating on the heap:
pub conjunction_by_conjunct:
FxHashMap<ConceptId, SmallVec<[(ConceptId, ConceptId, Degree); 4]>>,
This eliminates pointer dereferences and improves cache locality. For a typical ontology, 80% of index buckets fit in the inline buffer.
Pattern 4: Newtype Wrappers for Type Safety
ConceptId(u32) is a zero-cost wrapper that prevents type confusion:
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ConceptId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RoleId(pub u32);
The compiler optimizes these away, but they enforce correctness at compile time. You cannot accidentally pass a RoleId to a function expecting ConceptId.
Pattern 5: Copy Semantics for Cheap Cloning
ConceptId, RoleId, Degree, and IriId all derive Copy. Passing them by value (not reference) is faster than reference-based borrowing for tiny types:
fn add_label(&mut self, concept: ConceptId, degree: Degree) -> bool {
// concept and degree are copied in, no reference needed
let entry = self.labels.entry(concept).or_insert(Degree::ZERO);
...
}
Benchmarks: Sub-Microsecond Reasoning
On a small ontology (pizza_subset.ofn, 33 axioms):
Parsed 33 axioms in 231µs
Normalized to 17 axioms (0 range restrictions) in 12µs
Classification in 116µs
Taxonomy extracted (16 nodes) in 45µs
Total: 454µs
On a medium ontology (trust_test_scenarios.ofn, ~300 axioms):
Parse: ~500µs
Normalize: ~50µs
Classify: ~300µs
Taxonomy: ~100µs
Total: ~950µs
The consistency of these times — all under a millisecond — is what makes DEALER suitable for mobile: a typical smartphone can classify a clinical ontology in the time it takes to render a single frame.
Fuzzy Reasoning: Toward Uncertain Semantics
Real-world ontologies are not crisp. A clinical diagnosis might be “probable” rather than “certain”. A sensor reading has measurement error. An automated knowledge extraction might have confidence scores. This has been demonstrated convincingly in healthcare: GALEN-EL, a large clinical ontology, has shown that fuzzy description logic is valuable for representing the gradations inherent in medical knowledge — where severity is a spectrum, diagnoses carry confidence levels, and treatment efficacy varies by population. Fuzzy DL extends EL++ to model these uncertainties systematically.
Fuzzy Sets and T-Norms
In crisp set theory, membership is binary: x either is or is not in set A. In fuzzy set theory, membership is a degree μ(x) ∈ [0, 1], representing how much x belongs to A.
Fuzzy subsumption extends this: A ⊑ B at degree d means “A is a subclass of B with certainty d”. In a clinical context, “Asthma ⊑ RespiratoryDisease” might hold at degree 0.95 (because some edge cases blur the boundary), while “SleepApnea ⊑ RespiratoryDisease” is 1.0 (fully certain).
Triangular norms (t-norms) define how conjunction behaves in fuzzy logic. When two conditions must both be true, their conjunction is:
- Zadeh: min(a, b). Both must be strong; the weakest link wins.
- Lukasiewicz: max(0, a + b - 1). Represents additive uncertainty; two weak proofs still yield weakness.
DEALER implements both, configurable via --t-norm flag:
pub trait TNorm: Send + Sync {
fn conjunction(&self, a: Degree, b: Degree) -> Degree;
fn residuum(&self, a: Degree, b: Degree) -> Degree; // for implication
}
pub struct ZadehTNorm;
impl TNorm for ZadehTNorm {
fn conjunction(&self, a: Degree, b: Degree) -> Degree {
Degree(a.0.min(b.0))
}
}
pub struct LukasiewiczTNorm;
impl TNorm for LukasiewiczTNorm {
fn conjunction(&self, a: Degree, b: Degree) -> Degree {
Degree((a.0 + b.0 - 1.0).max(0.0))
}
}
Degree Propagation Through Inference
The completion rules propagate degrees. When inferring a subsumption B ⊑ C from A ⊑ B and A ⊑ C via transitivity, the resulting degree is the t-norm conjunction of the two input degrees.
Example: ontology with three classes and two axioms.
SubClassOf(Annotation(<http://dealer.io/ontology#degree> "0.8") A B)
SubClassOf(Annotation(<http://dealer.io/ontology#degree> "0.7") B C)
The reasoner infers A ⊑ C by chaining the rules. The degree of A ⊑ C depends on the t-norm:
- Zadeh: 0.7 (limited by the weaker link)
- Lukasiewicz: 0.5 (compound uncertainty)
This is valuable in medical domains: if you’re uncertain at two steps of a clinical reasoning chain, the final conclusion should reflect compounded doubt.
Equivalence Thresholds
By default (–equiv-threshold 1.0), concepts A and B are equivalent only if both A ⊑ B and B ⊑ A hold at degree 1.0. Lowering the threshold (e.g., 0.9) groups near-equivalences. This allows fuzzy equivalence classes for approximate taxonomies.
Application: Clinical Ontology with Confidence
Consider a clinical ontology where drug efficacy varies:
SubClassOf(Annotation(<http://dealer.io/ontology#degree> "0.95")
Aspirin ClinicallyEffectiveForPain)
SubClassOf(Annotation(<http://dealer.io/ontology#degree> "0.8")
Aspirin ClinicallyEffectiveForInflammation)
SubClassOf(Annotation(<http://dealer.io/ontology#degree> "0.5")
Aspirin ClinicallyEffectiveForGastricIrritation)
When querying “Is Aspirin suitable for pain and inflammation together?”, the reasoner infers:
- With Zadeh: 0.8 (limited by the weaker link, inflammation)
- With Lukasiewicz: 0.75 (0.95 + 0.8 - 1)
The choice reflects your confidence model: Zadeh is pessimistic (one weak link breaks the chain), Lukasiewicz is moderate.
From ELK to DEALER: Porting Lessons and Claude as a Development Partner
DEALER is inspired by ELK, a Java-based reasoner with decades of optimization. Building DEALER in Rust — and separately, porting the kuzu C++ graph database to Rust as rukuzu — taught several lessons. Claude proved to be a valuable development partner throughout this process, particularly for the mechanical aspects of translation and for working through algorithm edge cases where having a “second pair of eyes” on complex logic was invaluable.
What Translated Well (and Where Claude Helped)
Claude excelled at the mechanical translation work: converting C++ data structures to idiomatic Rust, suggesting appropriate crate equivalents, and identifying where C++ patterns had direct Rust analogs. The axiom indexing structures (multi-maps from trigger concepts to axiom data) mapped directly to Rust’s FxHashMap + SmallVec, and Claude could generate these translations quickly. The saturation loop’s basic FIFO queue + work item pattern was identical across languages; Rust’s ownership model just required explicit lifetime management. Context representations using maps-within-maps for labels and successors worked seamlessly.
For the rukuzu port specifically — translating kuzu’s C++ graph database engine to Rust — Claude was particularly useful for working through graph algorithm edge cases and query execution semantics. The sheer volume of boilerplate translation (type definitions, accessor methods, serialization code) is where an AI assistant saves the most time.
What Needed Human Rethinking
Some patterns simply don’t translate mechanically. C++ and Java’s garbage collection hides aliasing; Rust’s borrow checker requires careful restructuring. The reasoner’s global contexts map and per-item access patterns needed rearchitecting to avoid multiple mutable borrows — this was a design decision, not a translation, and required understanding the algorithm’s invariants at a level Claude couldn’t always reach.
String interning is another example. Java’s intern pool is global and automatic. Rust requires explicit interning. DEALER’s solution (a single Interner passed through the pipeline) is more transparent and allows per-ontology interners — but arriving at this design required thinking about the ownership story end-to-end.
Floating-point comparison for fuzzy degrees was subtle: DEALER defines Degree::EPSILON (1e-10) for stable comparisons, and the choice of epsilon required understanding the downstream numerical effects on degree propagation chains.
Performance Wins from Rust
The payoff for this effort was substantial. No GC pauses: Java’s garbage collector can halt the world for 100ms+ on large heaps, while Rust’s stack allocation and ownership eliminate this entirely — DEALER’s saturation never pauses. Monomorphization allows FxHashMap<ConceptId, ...> to be optimized at compile time, something Java’s type erasure cannot do. And the binary size tells the story: DEALER is ~20 MB (release build), while a comparable Java tool (ELK + JVM) is 200+ MB. For edge deployment, this matters enormously.
What Required Human Judgment
Fuzzy degree semantics — which ELK (being crisp) doesn’t have — required reading the fuzzy DL literature and making design choices that no translation tool could automate. The choice of t-norms, the epsilon-based comparison thresholds, the decision to propagate degrees through all nine completion rules rather than just a subset: these were informed by the needs of healthcare ontologies like GALEN-EL, where fuzzy reasoning has proven most valuable.
ELK uses bidirectional indexes heavily, and Rust’s borrow checker makes some bidirectional patterns awkward. DEALER went single-direction in places, requiring an extra pass to rebuild reverse mappings (predecessors). This was an architectural trade-off that required understanding the performance profile, not just the code structure.
Error handling was another area where human judgment trumped mechanical translation. Java’s checked exceptions enforce error propagation; Rust’s Result<T, E> is more flexible but requires discipline. DEALER uses a DealerError enum to cover parse errors, type errors, and unsupported constructs — a design that emerged from iteration, not translation.
Graph Database Integration: From Memory to Persistence
Once a taxonomy is extracted, it’s valuable to persist it. DEALER integrates two graph database options:
rukuzu (Rust, Embedded)
rukuzu is a lightweight graph database written in Rust, designed for embedded scenarios:
- Single-file storage with index pages for O(1) node and edge lookup.
- No server required; the database is a library.
- Suitable for on-device storage (mobile, IoT).
DEALER exports the taxonomy as Concept nodes and DirectSubClassOf edges, with degree properties on edges.
kuzu (C++, High-Performance)
For large-scale analytics, DEALER can export to kuzu, a graph DBMS with:
- Optimized join algorithms.
- Distributed query processing.
- Rich query language (openCypher).
Trade-off: kuzu requires a running server, but scales to billion-node graphs. rukuzu is simpler and sufficient for clinical ontologies.
Schema Design
The minimal schema supports the core use case — taxonomy queries:
Node: Concept { iri: String }
Edge: DirectSubClassOf { from: Concept, to: Concept, degree: Float }
Edge: EquivalentTo { from: Concept, to: Concept, degree: Float }
Queries like “What are the direct parents of Pizza?” become:
MATCH (Pizza)-[DirectSubClassOf]->(Parent) RETURN Parent.iri;
A future extension could add axiom provenance (which axioms imply which subsumptions), enabling explanation and debugging.
Performance Characteristics and Mobile Readiness
For an embedded deployment, performance must be predictable:
Initialization Overhead
Parsing and normalization happen once, offline:
Pizza ontology (33 axioms):
Parse: 231µs
Normalize: 12µs
Total: 243µs
Even a large clinical ontology (100k axioms) takes ~100ms on a modern CPU.
Saturation Cost
Saturation dominates the cost but scales predictably:
- Small ontology (33 axioms): 116µs
- Medium ontology (300 axioms): 300µs
- Expected for large (100k axioms): ~10s (linear in axioms + inference steps)
The saturation algorithm is single-threaded; parallelization is future work but non-trivial due to data dependencies.
Query Cost
Once saturated, a subsumption query is:
pub fn subsumption_degree(&self, sub: ConceptId, sup: ConceptId) -> Degree {
self.contexts
.get(&sub)
.and_then(|ctx| ctx.labels.get(&sup))
.copied()
.unwrap_or(Degree::ZERO)
}
O(1) average-case lookup in the context’s label map. For mobile apps: microsecond latency.
Memory Footprint
For a typical ontology:
- 33 axioms → ~50 KB memory (contexts + indexes)
- 300 axioms → ~500 KB
- 100k axioms → ~100 MB
Compare to Java tools which require 500 MB+ (JVM overhead + framework). For mobile and edge, DEALER fits in tight budgets.
Mobile Deployment Pattern
- Offline: On a server, classify the clinical ontology, export to rukuzu.
- Distribution: Package the small binary (~20 MB) + lightweight database (~1 MB for 100k concepts) in the app.
- Runtime: Load the database, answer queries in microseconds.
- Updates: If axioms change, re-classify locally and sync back.
This is what makes DEALER suitable for healthcare apps where low latency and offline-first capability are critical.
Testing and Validation
DEALER includes 49+ unit tests across the pipeline and 8 test ontologies:
simple_hierarchy.ofn: GoldenRetriever ⊑ Dog ⊑ Animalconjunction.ofn: Tests CR1 (conjunction rule)existential.ofn: Tests CR2/CR3 (existential restrictions)role_chain.ofn: Tests CR8 (role composition)equivalence.ofn: Tests equivalence class groupingfuzzy_chain.ofn: Degree propagation under different t-normspizza_subset.ofn: A realistic ontology with existential restrictionstrust_test_scenarios.ofn: Scenario-based testing for fuzzy reasoning
Tests are organized per crate:
cargo test -p dealer-parser # Parser tests
cargo test -p dealer-normalizer # Normalization tests
cargo test -p dealer-reasoner # Saturation + completion rules
cargo test -p dealer-taxonomy # DAG extraction
Each test verifies both correctness (expected subsumptions exist) and performance (queries complete in expected time).
Future Directions
1. Parallelization
The saturation loop currently processes one work item at a time. The work queue could be partitioned by concept, allowing multiple threads to process independent rule applications in parallel. This would scale saturation to larger ontologies.
2. Incremental Updates
If an axiom is added or removed, re-saturating from scratch is wasteful. Incremental reasoning (adding an axiom and updating only affected concepts) could reduce cost from O(axioms) to O(affected concepts).
3. Explanation and Provenance
Why is A subsumed by B? Which axioms contributed? Adding a trace during saturation would enable explanation graphs useful for debugging and ontology curation.
4. Advanced Fuzzy Semantics
Fuzzy DL with more expressive t-norms (e.g., parameterized families), modular semantics, and multiple conflicting evidence sources could extend reasoning for uncertain domains.
5. Integration with Vector Embeddings
Embedding concepts as vectors (using techniques like TransE or node2vec) could enable approximate subsumption queries (“find all concepts close to Pizza in the embedding space”) complementing exact reasoning.
6. Mobile UI Toolkit
A React Native library wrapping DEALER could make ontology querying accessible to mobile app developers without Rust expertise.
Conclusion
DEALER demonstrates that high-performance, resource-constrained ontology reasoning is achievable in Rust. By combining EL++ saturation with careful performance engineering (string interning, hash maps, SmallVec), fuzzy degree propagation, and lightweight persistence via rukuzu, DEALER opens a new frontier: ontology intelligence on mobile devices and edge servers.
For healthcare, supply chains, IoT, and knowledge bases, this means:
- Privacy: Reason locally on sensitive data; don’t send everything to the cloud.
- Latency: Microsecond queries, not seconds.
- Efficiency: Sub-100 MB deployments, not gigabytes.
The path from ELK (a mature Java reasoner) to DEALER (a Rust rewrite) shows that porting complex symbolic reasoning to systems languages is feasible and rewarding. The lessons — string interning, performance-conscious data structures, careful fuzzy semantics — apply beyond ontologies to any symbolic reasoning system targeting mobile and edge.
The future of knowledge representation may not be in server farms, but in the billions of connected devices waiting for local semantic intelligence.
About the Author
Jonathan Borden, M.D. served as an invited expert on the W3C OWL Working Group, where he helped design the Web Ontology Language with healthcare’s needs in mind. He is the founder and architect at Loxation and created the DEALER project to bring ontology reasoning to edge devices. Reach him at jonathan@loxation.com. A little known fact: his first software project that was worthwhile was an early LISP compiler optimizer written in FORTRAN.