Our research areas are not independent tracks — they are layers of a single system. Optimizer behaviour informs dataset design. Knowledge graph structure influences tokenization strategy. Software efficiency work feeds back into every layer. Each programme is developed with the others in mind.
Most production optimizers accumulate first- and second-moment estimates in full FP32, which imposes significant memory pressure during large-scale training. Our approach encodes weight-update direction and magnitude as integer indices on a cubic geometric lattice, allowing the optimizer state to remain compact without sacrificing the directional sensitivity needed for stable descent.
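As a rough illustration of the idea, a minimal lattice quantizer might look like the following — a sketch assuming a uniform signed cubic lattice with `levels` points per axis; the function names and parameters are hypothetical, not the production optimizer:

```python
def quantize_state(m, levels=16, bound=1.0):
    """Snap each coordinate of a moment-estimate vector onto a uniform
    cubic lattice over [-bound, bound], storing only small integer indices.
    (Hypothetical sketch, not the production quantizer.)"""
    step = 2 * bound / (levels - 1)
    return [round((max(-bound, min(bound, x)) + bound) / step) for x in m], step

def dequantize_state(indices, step, bound=1.0):
    """Recover an approximate moment vector from its lattice indices."""
    return [i * step - bound for i in indices]

m = [0.31, -0.72, 0.05, 0.99]
indices, step = quantize_state(m)
m_hat = dequantize_state(indices, step)
# each coordinate round-trips to within half a lattice step (~0.067 here),
# while the stored state shrinks from a 32-bit float to a 4-bit index
```

With 16 levels, one byte (or less, packed) replaces four bytes of FP32 state per coordinate, which is where the memory saving comes from.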
Convergence behaviour has been validated on real corpus data. The method demonstrates stable loss reduction on architectures where Adam-class optimizers produce erratic gradient trajectories — particularly in early training phases where loss landscape geometry is poorly conditioned.
Theoretical grounding draws on dynamical systems analysis of gradient flow, treating training trajectories as paths through a locally low-rank manifold rather than unconstrained parameter space.
The graph encodes semantic relationships between concepts, entities, and linguistic structures at a scale and specificity that general-purpose embeddings cannot match. Edges carry typed relational weights that are updated through an iterative resonance propagation process rather than static lookup.
Disambiguation is handled by propagating activation across the local neighbourhood of a query context, allowing the system to resolve ambiguous terms by their structural position in the graph rather than by surface co-occurrence statistics alone. This approach handles domain shift and rare vocabulary substantially better than vector-similarity methods.
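A toy sketch of the activation-spreading idea, assuming a small hand-built graph with typed, weighted edges — all node names, relation types, weights, and hop/decay parameters here are invented for illustration:

```python
from collections import defaultdict

# Toy typed-edge graph: node -> list of (neighbour, relation_type, weight).
GRAPH = {
    "bank":         [("bank/finance", "sense", 0.5), ("bank/river", "sense", 0.5)],
    "bank/finance": [("loan", "related", 0.9), ("deposit", "related", 0.8)],
    "bank/river":   [("water", "related", 0.9), ("shore", "related", 0.8)],
    "loan":         [("bank/finance", "related", 0.9)],
    "water":        [("bank/river", "related", 0.9)],
}

def spread_activation(seeds, hops=2, decay=0.5):
    """Propagate activation outward from seed nodes, attenuating per hop."""
    act = defaultdict(float)
    for s in seeds:
        act[s] = 1.0
    frontier = dict(act)
    for _ in range(hops):
        nxt = defaultdict(float)
        for node, a in frontier.items():
            for nb, _rel, w in GRAPH.get(node, ()):
                nxt[nb] += a * w * decay
        for nb, a in nxt.items():
            act[nb] += a
        frontier = nxt
    return act

def disambiguate(term, context):
    """Pick the sense of `term` most activated by the context nodes."""
    senses = [nb for nb, rel, _ in GRAPH.get(term, ()) if rel == "sense"]
    act = spread_activation(context)
    return max(senses, key=lambda s: act[s])
```

Here `disambiguate("bank", ["loan"])` resolves to the finance sense purely from graph position, with no co-occurrence statistics involved.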
The architecture is designed to operate efficiently on commodity hardware — a deliberate constraint imposed by our deployment targets in under-resourced educational environments.
Standard web-scraped corpora carry significant noise: near-duplicate documents, domain imbalance, and low-quality sources that inflate token counts without adding signal. Our pipeline applies multi-stage quality scoring that surfaces these problems before they reach the training loop, rather than relying on post-hoc evaluation to detect degraded model behaviour.
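A stripped-down version of such a gate might chain a length heuristic with a shingle-based near-duplicate check. The thresholds, stage names, and verdict strings below are illustrative assumptions, not the production scoring stack:

```python
def shingles(text, n=3):
    """Set of word n-grams used for near-duplicate comparison."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def score_document(doc, seen_shingles, min_words=5, dup_threshold=0.8):
    """Multi-stage gate: cheap length heuristic first, then a more
    expensive near-duplicate check against previously accepted documents."""
    if len(doc.split()) < min_words:
        return "reject:too_short"
    sh = shingles(doc)
    for prev in seen_shingles:
        if jaccard(sh, prev) >= dup_threshold:
            return "reject:near_duplicate"
    seen_shingles.append(sh)
    return "accept"

seen = []
verdicts = [score_document(d, seen) for d in (
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy dog",
    "too short",
)]
# → ["accept", "reject:near_duplicate", "reject:too_short"]
```

Ordering the stages cheapest-first means most rejected documents never reach the expensive comparisons — the same principle that lets problems surface before the training loop.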
For multilingual and low-resource targets we apply additional curation discipline — domain coverage is assessed against the intended deployment context, not against general web distribution. Provenance is tracked at the document level throughout the pipeline, allowing targeted corpus interventions without full reprocessing.
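Document-level provenance makes targeted interventions a simple filter rather than a rebuild. A minimal sketch, assuming a hypothetical record layout (the field names and source labels are invented):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    source: str  # document-level provenance: where this record was collected
    text: str

def retract_source(corpus, bad_source):
    """Drop every document from one source without touching the rest of
    the corpus -- a targeted intervention, no full reprocessing needed."""
    return [d for d in corpus if d.source != bad_source]

corpus = [
    Document("d1", "crawl/news", "article text"),
    Document("d2", "crawl/forum", "thread text"),
    Document("d3", "crawl/news", "article text"),
]
cleaned = retract_source(corpus, "crawl/forum")
# → keeps d1 and d3; only the forum-sourced document is removed
```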
The same tooling underpins our knowledge graph construction, ensuring consistency between the structured graph layer and the unstructured training corpus.
Performance in ML workloads is rarely limited by raw compute alone. Memory bandwidth saturation, cache inefficiency, poorly scheduled kernel launches, and suboptimal data layout are responsible for a significant fraction of wall-clock time in most training and inference pipelines. We approach these problems at the source rather than compensating with additional hardware.
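One way to see why bandwidth so often dominates is a roofline-style back-of-envelope check: compare a kernel's arithmetic intensity against the machine's balance point. The peak FLOP/s and bandwidth figures below are assumed round numbers, not measurements of any particular device:

```python
def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte of memory traffic."""
    return flops / bytes_moved

def bound_by(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Roofline-style classification: a kernel whose arithmetic intensity
    falls below the machine balance point (peak FLOP/s divided by peak
    bytes/s) is limited by memory bandwidth, not compute."""
    balance = peak_flops / peak_bandwidth
    intensity = arithmetic_intensity(flops, bytes_moved)
    return "memory-bound" if intensity < balance else "compute-bound"

# Elementwise add of two FP32 vectors of length n:
# n FLOPs against 12n bytes of traffic (two reads + one write, 4 bytes each).
n = 1_000_000
verdict = bound_by(flops=n, bytes_moved=12 * n,
                   peak_flops=10e12, peak_bandwidth=1e12)
# intensity ≈ 0.083 FLOP/byte vs a balance point of 10 → "memory-bound"
```

At 0.083 FLOP/byte against a balance point of 10, no amount of extra compute helps — only better data layout and reuse does, which is why we attack these problems at the source.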
Our optimization work targets heterogeneous environments — the same efficiency principles applied to NVIDIA hardware must translate to Apple Silicon and AMD GPUs without platform-specific rewrites. This constraint drives a deeper analysis of where performance bottlenecks actually originate, rather than relying on vendor-specific tuning tricks that obscure the underlying problem.
Efficiency research feeds directly back into our training infrastructure and informs our consulting engagements — the techniques developed for our own systems are the same ones we apply when auditing external codebases.