Migration Guide

Step-by-step migration paths from TFLite Micro, TensorRT, and NMC 0.8.x to NMC 0.9.x on neuromorphic silicon.

Migrating from TFLite Micro

TFLite Micro runs int8-quantized dense inference on Cortex-M. NMC replaces the quantized model with a spike-coded equivalent and the TFLite Micro runtime with nrm_runtime.h. Expect an accuracy change of typically −1 to −2% in exchange for a 47–83× improvement in inferences per watt.

1. Export your model from PyTorch (not TFLite)

NMC accepts PyTorch models as TorchScript or ONNX, but not .tflite files. If your current pipeline starts from a TFLite model, go back to the float32 PyTorch source model and export it with torch.jit.trace before compiling with NMC.

2. Replace tflite_micro_runner with nrm_step

Remove the TFLite Micro include headers, arena allocator, and interpreter setup. Replace them with calls to nrm_arena_init, nrm_load, and nrm_step, in that order. The event loop structure is similar: both runtimes run one inference per timestep.
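
The init-once, step-per-timestep structure described above can be sketched as follows. This guide does not give the prototypes of nrm_arena_init, nrm_load, or nrm_step, so the block uses stand-in functions with hypothetical names and signatures (demo_ctx_t, demo_arena_init, demo_load, demo_step, demo_run) and a hypothetical 4 KB arena. Treat it as an outline of the control flow, not as the NMC API; check nrm_runtime.h for the real declarations.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins only: the real prototypes live in nrm_runtime.h
 * and may differ in names, arguments, and return codes. */
typedef struct {
    uint8_t       *arena;       /* caller-provided working memory */
    size_t         arena_size;
    const uint8_t *model;       /* compiled .snn image, NULL until loaded */
    uint32_t       steps_run;   /* number of timesteps executed so far */
} demo_ctx_t;

/* Stands in for nrm_arena_init: binds a static buffer to the context. */
static int demo_arena_init(demo_ctx_t *ctx, uint8_t *buf, size_t size) {
    if (ctx == NULL || buf == NULL || size == 0) return -1;
    ctx->arena = buf;
    ctx->arena_size = size;
    ctx->model = NULL;
    ctx->steps_run = 0;
    return 0;
}

/* Stands in for nrm_load: attaches a model image to the context.
 * The size check is a toy rule for this sketch, not NMC behaviour. */
static int demo_load(demo_ctx_t *ctx, const uint8_t *snn, size_t snn_len) {
    if (ctx == NULL || snn == NULL || snn_len == 0) return -1;
    if (snn_len > ctx->arena_size) return -1;
    ctx->model = snn;
    return 0;
}

/* Stands in for nrm_step: advances the network by one timestep. */
static int demo_step(demo_ctx_t *ctx) {
    if (ctx == NULL || ctx->model == NULL) return -1;
    ctx->steps_run++;
    return 0;
}

/* Setup once, then one step per timestep, mirroring the
 * interpreter-setup / invoke-loop split in a TFLite Micro application. */
int demo_run(const uint8_t *snn, size_t snn_len,
             uint32_t timesteps, uint32_t *steps_done) {
    static uint8_t arena[4096];   /* hypothetical arena size */
    demo_ctx_t ctx;

    if (demo_arena_init(&ctx, arena, sizeof arena) != 0) return -1;
    if (demo_load(&ctx, snn, snn_len) != 0) return -2;

    for (uint32_t t = 0; t < timesteps; t++) {
        if (demo_step(&ctx) != 0) return -3;
    }
    *steps_done = ctx.steps_run;
    return 0;
}
```

Keeping the arena in a static buffer matches the no-OS, no-heap style of the TFLite Micro code it replaces, which should keep the change to the surrounding application small.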

3. Replace tensor I/O with spike I/O

TFLite Micro reads and writes tensors directly; NMC uses the HAL spike bus. Convert your sensor data to a spike-encoded format using neurmorph.encode in Python, or the bundled C encoder in nrm_encode.h.
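
To illustrate what converting sensor data to spikes involves, here is a minimal sketch of delta (send-on-delta) encoding, one common spike-coding scheme. It is not the nrm_encode.h API, and this guide does not state which scheme the bundled encoders use. The type demo_spike_t and the function demo_delta_encode are hypothetical names introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical spike event: the timestep index at which it fired and
 * its polarity (+1 for a rise, -1 for a fall). */
typedef struct {
    uint32_t t;
    int8_t   polarity;
} demo_spike_t;

/* Delta encoding: emit one spike whenever the signal has moved at least
 * `threshold` away from the reference level, then reset the reference
 * to the current sample. Returns the number of spikes written, which is
 * never more than max_out. */
size_t demo_delta_encode(const int16_t *samples, size_t n, int16_t threshold,
                         demo_spike_t *out, size_t max_out) {
    if (samples == NULL || out == NULL || n == 0 || threshold <= 0) return 0;

    size_t  count = 0;
    int32_t ref = samples[0];   /* first sample sets the initial reference */

    for (size_t i = 1; i < n && count < max_out; i++) {
        int32_t diff = (int32_t)samples[i] - ref;
        if (diff >= threshold) {
            out[count].t = (uint32_t)i;
            out[count].polarity = 1;
            count++;
            ref = samples[i];
        } else if (diff <= -threshold) {
            out[count].t = (uint32_t)i;
            out[count].polarity = -1;
            count++;
            ref = samples[i];
        }
    }
    return count;
}
```

A slowly varying signal produces few or no spikes under this scheme, which is the property that makes event-driven input cheap compared with pushing a full tensor every timestep.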

Migrating from TensorRT

TensorRT runs FP16/INT8 dense inference on NVIDIA GPUs. NMC targets neuromorphic silicon at a fraction of the power draw. The migration path is similar to the TFLite Micro path above, but model sizes and deployment architecture differ significantly.

Key differences to plan for:

  • TensorRT model size is typically 5–15 MB; NMC .snn binaries are 80–200 KB for equivalent models.
  • TensorRT requires a CUDA environment; the NMC runtime is C99 with no OS dependency.
  • Latency characteristics invert: TensorRT is fastest with batching, while NMC is faster at batch size 1 (a single sensor event).

Migrating from NMC 0.8.x

NMC 0.9.x introduces breaking changes to the Python SDK and the runtime C API. The .snn binary format is not backward compatible with 0.8.x binaries — recompile all models.

Common migration issues

  • Accuracy below expected: Run nmc calibrate with a larger representative dataset. The default 512-sample calibration may be insufficient for models trained on highly varied input distributions.
  • SRAM overflow on deploy: Check the nmc inspect output for the total SRAM estimate. If it exceeds the target's internal SRAM, apply dead-spike elimination with a stricter threshold using --dse-threshold 0.05.
  • HAL spike bus not draining: Check that hal_spike_read returns the actual number of spikes read, not always 0. A stub that always returns 0 causes the runtime to stall indefinitely.
  • Latency higher than benchmarks: Confirm hal_power_mode(ACTIVE) is called before the first nrm_step. Many default HAL implementations leave the chip in SLEEP at startup.
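
The spike bus issue above comes down to the return value of the read function. A minimal sketch of a read that reports the real count, assuming spikes are buffered in a ring that the HAL fills, is shown below. The names demo_spike_ring_t, demo_ring_push, and demo_hal_spike_read are hypothetical, as is the signature; the real hal_spike_read prototype is defined by your HAL. The sketch is single-threaded, so a port that fills the ring from an interrupt handler would also need appropriate volatile or atomic accesses.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_RING_SIZE 16u   /* power of two, so indices wrap with a mask */

typedef struct {
    uint32_t events[DEMO_RING_SIZE];
    uint32_t head;   /* total events written (producer side) */
    uint32_t tail;   /* total events read (consumer side) */
} demo_spike_ring_t;

/* Producer side: returns 1 if the event was stored, 0 if the ring is full. */
int demo_ring_push(demo_spike_ring_t *r, uint32_t ev) {
    if (r->head - r->tail == DEMO_RING_SIZE) return 0;
    r->events[r->head & (DEMO_RING_SIZE - 1u)] = ev;
    r->head++;
    return 1;
}

/* Stands in for hal_spike_read: copies up to `max` pending events into
 * `out` and returns how many were actually copied. It returns 0 only
 * when nothing is pending, never as a hard-coded placeholder. */
size_t demo_hal_spike_read(demo_spike_ring_t *r, uint32_t *out, size_t max) {
    size_t n = 0;
    while (n < max && r->tail != r->head) {
        out[n++] = r->events[r->tail & (DEMO_RING_SIZE - 1u)];
        r->tail++;
    }
    return n;
}
```

When bringing up a new HAL, a quick check is to push a known number of events and confirm that successive reads return counts that sum to that number.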