Designing a Hardware Abstraction Layer for Heterogeneous Neuromorphic Chips

Tomasz Brandt · June 3, 2025 · 11 min read

[Figure: Hardware abstraction layer architecture diagram showing chip targets behind a common API]

The neuromorphic chip landscape in 2025 looks less like a consolidated market and more like a Cambrian explosion: Intel Loihi 2 with its 128 neurocores and Lava Python framework, BrainChip Akida AKD1000/AKD1500 targeting embedded vision and audio, SynSense Xylo for sub-milliwatt audio inference, Innatera Spiketrum with an analog neuromorphic frontend, and several others in various stages of silicon maturity. Each has a different programming model, different memory topology, different spike routing architecture, and different power management API.

If you're building inference software on top of one chip, this heterogeneity is annoying. If you're building a compiler that must target all of them, it's a foundational design problem. A poorly designed hardware abstraction layer forces you to rewrite the optimizer for each target; a well-designed one makes the optimizer target-agnostic while letting the backend be fully target-specific.

Why neuromorphic HAL design is harder than standard embedded HAL design

Standard embedded HAL design (CMSIS for Cortex-M, or the various Linux kernel driver models) is well understood: abstract the register interface, provide standard peripheral driver APIs, expose a consistent memory model. The abstraction boundary is between software that doesn't care about silicon details and a driver layer that does.

Neuromorphic HAL design faces a more fundamental problem: the abstract machine itself varies. On Cortex-M, every core executes a variant of the same Thumb instruction set under a shared programming model, so the HAL mainly abstracts peripherals. On neuromorphic hardware, the computational primitive itself varies:

Loihi 2: synchronous compartment updates, integer synaptic weights, programmable delay lines, axon routing via mesh NoC
Akida AKD1500: rate-coded input from standard frame data, layer-by-layer spike propagation, fixed 4-bit or binary weights, SPI/UART host interface
SynSense Xylo: asynchronous event-driven execution, analog input frontend for audio, on-chip learning support
Innatera Spiketrum: analog memristive synapses, mixed-signal computation, fundamentally different energy-precision trade-off model

A HAL that hides all of these behind a single abstract interface necessarily loses information that the optimizer needs. A HAL that exposes all chip-specific features to the optimizer requires the optimizer to be aware of every chip variant. Neither extreme is correct.

The NMC HAL layered model

NMC's HAL uses a three-layer model with explicit capability negotiation:

Layer 1: The abstract spike machine (ASM)

The ASM defines the minimal abstract interface that every neuromorphic target must implement. It has four mandatory capabilities:

class AbstractSpikeMachine:
    def load_network(self, nmc_binary: bytes, config: NMCConfig) -> NetworkHandle: ...
    def run_inference(self, handle: NetworkHandle, input_events: EventBuffer) -> OutputBuffer: ...
    def get_energy_sample(self, handle: NetworkHandle) -> EnergyReading: ...
    def reset(self, handle: NetworkHandle) -> None: ...

Every target provides an ASM implementation. Application code that calls only ASM methods is portable across all targets without modification. The NMC runtime's default inference loop uses only ASM methods, which is why the same compiled application binary can run on different chips without recompilation at the application level.
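To make the portability claim concrete, here is a minimal runnable sketch of the four-method interface with a toy target behind it. The data types (NMCConfig, NetworkHandle, EnergyReading), the EchoTarget class, and the portable_inference helper are all assumptions made for illustration; they are not the NMC runtime's actual definitions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class NMCConfig:
    timesteps: int = 16  # placeholder field; the real config is not shown in the article


@dataclass
class NetworkHandle:
    network_id: int


@dataclass
class EnergyReading:
    microjoules: float


# Event and output buffers are modelled as plain lists of (timestep, channel) pairs.
EventBuffer = list
OutputBuffer = list


class AbstractSpikeMachine(ABC):
    """The four mandatory ASM methods, expressed as an abstract base class."""

    @abstractmethod
    def load_network(self, nmc_binary: bytes, config: NMCConfig) -> NetworkHandle: ...

    @abstractmethod
    def run_inference(self, handle: NetworkHandle, input_events: EventBuffer) -> OutputBuffer: ...

    @abstractmethod
    def get_energy_sample(self, handle: NetworkHandle) -> EnergyReading: ...

    @abstractmethod
    def reset(self, handle: NetworkHandle) -> None: ...


class EchoTarget(AbstractSpikeMachine):
    """Toy stand-in for a real target: echoes events and charges a flat cost per event."""

    def __init__(self, uj_per_event: float):
        self.uj_per_event = uj_per_event
        self._energy = {}

    def load_network(self, nmc_binary, config):
        handle = NetworkHandle(network_id=len(self._energy))
        self._energy[handle.network_id] = 0.0
        return handle

    def run_inference(self, handle, input_events):
        self._energy[handle.network_id] += self.uj_per_event * len(input_events)
        return list(input_events)

    def get_energy_sample(self, handle):
        return EnergyReading(self._energy[handle.network_id])

    def reset(self, handle):
        self._energy[handle.network_id] = 0.0


def portable_inference(machine: AbstractSpikeMachine, binary: bytes, events: EventBuffer):
    """Calls only ASM methods, so it runs unchanged against any implementation."""
    handle = machine.load_network(binary, NMCConfig())
    machine.reset(handle)
    output = machine.run_inference(handle, events)
    return output, machine.get_energy_sample(handle)
```

Swapping EchoTarget for any other AbstractSpikeMachine subclass leaves portable_inference untouched, which is the property the default inference loop relies on.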

Layer 2: Capability extensions

Beyond the ASM baseline, chips expose optional capabilities via a capability registry. The runtime queries these at initialization:

caps = device.query_capabilities()
# Example output for Loihi 2:
# {
#   "synchronous_timestep": True,
#   "async_event_driven": True,
#   "programmable_delay": True,
#   "max_delay_steps": 62,
#   "synapse_precision": ["int8", "int4", "binary"],
#   "on_chip_learning": True,
#   "analog_frontend": False,
#   "multi_chip_routing": True
# }

Compiler optimization passes that can exploit chip-specific capabilities (the fanout splitter, the delay-line optimizer, the synapse-precision selector) query the registry during the compilation phase that selects backend parameters. A pass that requires programmable_delay=True to optimize synaptic delay patterns is simply skipped on targets where that capability is absent, leaving delays at the default (1 step).
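The gating logic can be sketched in a few lines. This is an illustration under assumptions, not the NMC compiler's code: the function names optimize_delays and select_precision are hypothetical, and the second capability dictionary describes an invented fixed-delay target rather than any real chip.

```python
DEFAULT_DELAY_STEPS = 1  # the default the article gives for targets without programmable delays


def optimize_delays(requested, caps):
    """Delay-line pass: becomes a no-op when the target lacks programmable delays."""
    if not caps.get("programmable_delay", False):
        return [DEFAULT_DELAY_STEPS] * len(requested)
    max_delay = caps.get("max_delay_steps", DEFAULT_DELAY_STEPS)
    # Clamp each requested delay into the range the hardware can realize.
    return [min(max(d, DEFAULT_DELAY_STEPS), max_delay) for d in requested]


def select_precision(preferred, caps):
    """Precision selector: pick the first preferred precision the target supports."""
    supported = caps.get("synapse_precision", [])
    for precision in preferred:
        if precision in supported:
            return precision
    raise ValueError(f"none of {preferred} supported; target offers {supported}")


# Subset of the Loihi 2 registry output shown above.
loihi2_caps = {
    "programmable_delay": True,
    "max_delay_steps": 62,
    "synapse_precision": ["int8", "int4", "binary"],
}

# Hypothetical registry for a target with no programmable delays.
fixed_delay_caps = {"synapse_precision": ["int4", "binary"]}

print(optimize_delays([3, 80, 1], loihi2_caps))
print(optimize_delays([3, 80, 1], fixed_delay_caps))
print(select_precision(["int8", "int4"], fixed_delay_caps))
```

Treating a missing key as "capability absent" keeps the passes safe against targets whose shims predate a given capability flag.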

Layer 3: Native driver shim

The native driver shim implements the ASM interface and capability extensions by calling the chip vendor's SDK directly. For Loihi 2, the shim wraps Intel's Lava framework. For Akida, it wraps BrainChip's MetaTF runtime. The shim layer is the only place where vendor-specific code lives; everything above it is vendor-agnostic.

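Shim development is gated by SDK availability and licensing. Lava itself is open source, but the extensions needed to run on Loihi 2 hardware are distributed only to members of the Intel Neuromorphic Research Community (INRC), which requires a membership agreement with Intel. Akida's MetaTF is publicly available. This creates an asymmetry in our current support matrix that's driven by access and IP considerations, not by technical difficulty.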

The capability mismatch problem: what the HAL cannot hide

We're not claiming the HAL makes all chips equivalent from a user's perspective — it doesn't. There are irreducible differences that propagate up to the model design level.

Synapse precision mismatch

Loihi 2 supports INT8 synaptic weights; Akida AKD1000 supports 4-bit and binary weights only. A model trained with INT8 weight precision will lose accuracy when mapped to Akida — not because of a compiler bug, but because the hardware can't represent the precision. The compiler emits a warning when quantizing weights to fit the target's precision constraints:

WARNING [nmc-compiler]: Target 'akida-akd1000' requires weight_precision='int4'.
  Quantizing 2 Linear layers from int8 → int4.
  Expected accuracy impact: 0.8–2.1% (estimated from per-layer weight distribution analysis).
  Recommendation: retrain with --weight-precision=int4 for better results.

The HAL can transparently apply the quantization during loading, but it cannot recover lost information. The right fix is to retrain with the target precision in mind, which means model training must be target-aware even when the runtime API is target-agnostic.
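A small sketch shows why the loss is irreversible. It assumes a fixed scale of 16 between the int8 and int4 grids, which is a simplification chosen for illustration; a real compiler would more likely pick scales per layer from the weight distribution, as the warning above suggests.

```python
import math

INT4_MIN, INT4_MAX = -8, 7
SCALE = 16  # assumed fixed ratio between the int8 and int4 grids


def requantize_int8_to_int4(weights):
    """Map int8 weights onto the 16 levels an int4 synapse can represent."""
    quantized = []
    for w in weights:
        if not -128 <= w <= 127:
            raise ValueError(f"{w} is not a valid int8 weight")
        q = math.floor(w / SCALE + 0.5)  # round half up, independent of banker's rounding
        quantized.append(max(INT4_MIN, min(INT4_MAX, q)))
    return quantized


def dequantize_int4(quantized):
    """Best-effort reconstruction; distinct int8 inputs collapse to the same value."""
    return [q * SCALE for q in quantized]


def mean_abs_error(original, reconstructed):
    return sum(abs(a - b) for a, b in zip(original, reconstructed)) / len(original)


weights = [0, 8, -8, 23, 127, -128]
q4 = requantize_int8_to_int4(weights)
print(q4, dequantize_int4(q4), mean_abs_error(weights, dequantize_int4(q4)))
```

All 256 int8 values land on 16 levels, so no loader can undo the mapping afterwards. Retraining at the target precision lets the optimizer place weights on those 16 levels deliberately instead of having them rounded there.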

Neuron count limits

Loihi 2 supports up to 1 million neurons per chip across its 128 neurocores. Xylo supports a maximum of 64 neurons in its standard inference configuration (the Xylo-IMU variant supports up to 1K). A model designed for Loihi 2 with 8K neurons cannot run on Xylo without architectural redesign. The HAL reports this at load time via a NetworkCapacityError rather than silently truncating the network, because a truncated network would still run and return plausible-looking but wrong outputs.
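The fail-fast behavior might look like the following sketch. Only the exception name comes from the text above; its fields, the check_capacity helper, and the layer sizes are assumptions for illustration.

```python
class NetworkCapacityError(Exception):
    """Raised at load time when a network needs more neurons than the target has."""

    def __init__(self, target, required, available):
        super().__init__(
            f"target '{target}' provides {available} neurons, network requires {required}"
        )
        self.target = target
        self.required = required
        self.available = available


def check_capacity(target, layer_sizes, max_neurons):
    """Return the neuron count if the network fits; raise instead of truncating."""
    required = sum(layer_sizes)
    if required > max_neurons:
        raise NetworkCapacityError(target, required, max_neurons)
    return required


XYLO_MAX_NEURONS = 64  # the standard-configuration limit quoted in the text

print(check_capacity("xylo", [16, 32, 16], XYLO_MAX_NEURONS))

try:
    check_capacity("xylo", [4096, 4096], XYLO_MAX_NEURONS)
except NetworkCapacityError as err:
    print(err)
```

Carrying the required and available counts on the exception lets a benchmark harness skip ineligible targets programmatically instead of parsing the message.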

Input encoding lock-in

Akida's accelerated inference path is designed for rate-coded input from standard frame-based sensors. Feeding it an event stream from a dynamic vision sensor (DVS) requires explicit event-to-frame conversion, which reintroduces the latency overhead that event-driven processing was meant to eliminate. Xylo's analog frontend, conversely, expects continuous audio and doesn't natively consume pre-encoded spike trains from an upstream DVS camera. These constraints exist at the silicon level and the HAL exposes them as input encoding requirements rather than hiding them.
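The latency cost of event-to-frame conversion follows directly from how the binning works, as this sketch shows. The event layout (timestamp in microseconds, x, y, polarity) and both function names are assumptions for illustration, not a vendor format or an NMC API.

```python
def events_to_frames(events, width, height, window_us):
    """Accumulate (t_us, x, y, polarity) events into one count frame per time window."""
    if not events:
        return []
    n_frames = max(t for t, _, _, _ in events) // window_us + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t_us, x, y, _polarity in events:
        frames[t_us // window_us][y][x] += 1  # polarity is ignored in this sketch
    return frames


def added_latency_us(t_us, window_us):
    """How long an event waits before the frame containing it is complete."""
    frame_end = (t_us // window_us + 1) * window_us
    return frame_end - t_us


events = [(0, 0, 0, 1), (400, 1, 0, 1), (1200, 1, 1, 0)]
for frame in events_to_frames(events, width=2, height=2, window_us=1000):
    print(frame)
print(added_latency_us(0, 1000), added_latency_us(400, 1000))
```

An event at the start of a window waits a full window before the frame can be dispatched, so the window length sets a floor on worst-case latency that no downstream optimization can remove.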

Benchmarking across targets through the HAL

One benefit of a well-designed HAL is that the benchmark harness can run the same model on multiple targets through a common API and collect comparable measurements. The NMC benchmark suite runs each model variant against all supported targets that meet the network capacity requirements:

nmc benchmark \
    --model compiled/kws_int8.nmc \
    --targets loihi2,akida-akd1500,xylo \
    --dataset shd_test \
    --metrics accuracy,energy_per_inference,latency_p50,latency_p99 \
    --output benchmark_results.json

The results reveal target-specific trade-offs that a single-target benchmark would miss. On an SHD classification task (20-class spoken-digit recognition from spike-encoded audio), a representative cross-target comparison shows three patterns. Xylo achieves the lowest energy per inference, but only for models that fit within its 64-neuron constraint. Loihi 2 achieves better accuracy on larger models at a higher energy cost. Akida AKD1500 sits in between: it draws less power than Loihi 2 but is less accurate on complex tasks.

Future HAL considerations: analog neuromorphic targets

The most challenging HAL design question for the next 12–18 months is how to abstract analog neuromorphic hardware — specifically Innatera Spiketrum and research-stage memristive crossbar chips. Analog neuromorphic hardware has fundamentally different error characteristics than digital: weight values drift with temperature, synaptic conductances have device-to-device variation, and noise contributes to computation in ways that are sometimes useful (stochastic resonance) and sometimes not.

The ASM interface assumes deterministic inference — the same input always produces the same output. Analog hardware violates this. The HAL extension for analog targets will need to expose a stochastic inference mode with configurable sample-and-average semantics, which changes the runtime contract at a level that affects the application design rather than just the driver implementation. This is an active design question in the NMC HAL specification work.
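One possible shape for sample-and-average semantics is sketched below. Since the design is still open, everything here is an assumption: the function names, the representation of an output as per-class spike counts, and the uniform integer jitter used as a stand-in for analog noise.

```python
import random


def sample_and_average(run_once, n_samples):
    """Run a non-deterministic inference n_samples times and average the class counts."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    totals = None
    for _ in range(n_samples):
        counts = run_once()
        if totals is None:
            totals = [0.0] * len(counts)
        for i, c in enumerate(counts):
            totals[i] += c
    return [t / n_samples for t in totals]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def make_noisy_target(clean_counts, jitter, seed):
    """Stand-in for an analog chip: perturbs each output count on every run."""
    rng = random.Random(seed)

    def run_once():
        return [max(0, c + rng.randint(-jitter, jitter)) for c in clean_counts]

    return run_once


run_once = make_noisy_target(clean_counts=[2, 9, 4], jitter=3, seed=0)
averaged = sample_and_average(run_once, n_samples=32)
print(averaged, argmax(averaged))
```

The application now chooses n_samples, trading energy and latency for output stability. That choice is what moves the contract change out of the driver and into application design.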

The core lesson from two years of HAL development is that the abstraction boundary must be placed at a level that captures the semantics of spike-coded computation — not just the register interface — while permitting the compiler to make target-specific decisions at compile time rather than forcing them into runtime adaptation. Compile-time specialization with runtime portability is achievable; runtime polymorphism that hides all hardware differences is not, and attempting it produces a HAL that is technically clean but practically useless.