NEF — Neural Essence Format
execution pipeline
nef.matmul(a,b) → graph node
No execution at this stage
01 — INPUT
User Code · API Call
Python / Go / C++
raw tensor ops
↳ no compute yet
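The lazy-call behavior above can be modeled in a few lines. This is an illustrative sketch, not the real NEF API: every name here (`Node`, `tensor`, `matmul`) is hypothetical. The point is that an op call only records a graph node with shape/dtype metadata; no arithmetic runs.

```python
# Hypothetical sketch of lazy op recording: calling an op builds a
# graph node carrying metadata instead of computing anything.

class Node:
    def __init__(self, op, inputs, shape, dtype):
        self.op, self.inputs = op, inputs
        self.shape, self.dtype = shape, dtype  # metadata recorded at build time

def tensor(shape, dtype="f32"):
    return Node("input", [], shape, dtype)

def matmul(a, b):
    # (m, k) @ (k, n) -> (m, n); only shapes are checked, no numbers touched
    assert a.shape[1] == b.shape[0], "inner dims must match"
    return Node("matmul", [a, b], (a.shape[0], b.shape[1]), a.dtype)

x = tensor((4, 8))
y = tensor((8, 2))
z = matmul(x, y)  # returns a graph node, not a result
```

Calling `matmul` here costs a shape check and one object allocation, which is what makes "no compute yet" cheap enough to do eagerly on every API call.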
Builds DAG of ops with shape,
dtype, FLOPs metadata. Device-agnostic.
02 — GRAPH BUILDER
Lazy DAG / IR
Directed Acyclic Graph · no execution
nodes = ops
edges = tensors
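With nodes as ops and edges as tensors, the builder's core job is keeping the DAG in a dependency-respecting order. A minimal sketch (the `Op` type is invented for illustration) of the post-order walk an executor would use:

```python
# Sketch: topological (post-order) walk over a lazy DAG.
from dataclasses import dataclass, field

@dataclass(eq=False)
class Op:
    name: str
    inputs: list = field(default_factory=list)  # edges = producing ops

def toposort(root):
    order, seen = [], set()
    def visit(n):
        if id(n) in seen:
            return
        seen.add(id(n))
        for inp in n.inputs:
            visit(inp)       # producers first
        order.append(n)      # then the consumer
    visit(root)
    return order

a, b = Op("input"), Op("input")
mm = Op("matmul", [a, b])
out = Op("relu", [mm])
names = [n.name for n in toposort(out)]  # inputs before matmul before relu
```

Acyclicity is what guarantees this walk terminates; a cycle would mean a tensor depending on itself.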
OPTIMIZER PASSES
Merges adjacent
elementwise ops
03A
Node Fusion
+ Constant Fold
Removes dead nodes,
finds memory reuse
03B
Dead Elim.
+ Mem Reuse
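Of the passes above, dead-node elimination is the simplest to sketch: keep only nodes reachable from the outputs. The dict-based graph shape here is invented for illustration and is not NEF's IR; fusion and memory reuse would be sibling passes over the same structure.

```python
# Sketch of pass 03B's dead-node elimination: mark everything reachable
# from the outputs, then drop the rest.

def dead_code_elim(graph, outputs):
    # graph: {name: (op, [input names])}
    live, stack = set(), list(outputs)
    while stack:
        n = stack.pop()
        if n in live:
            continue
        live.add(n)
        stack.extend(graph[n][1])  # inputs of a live node are live
    return {k: v for k, v in graph.items() if k in live}

g = {
    "x":    ("input",  []),
    "y":    ("input",  []),
    "mm":   ("matmul", ["x", "y"]),
    "dead": ("relu",   ["y"]),   # never feeds an output
    "out":  ("relu",   ["mm"]),
}
pruned = dead_code_elim(g, ["out"])  # "dead" is removed
```

Running elimination after fusion matters: fusing ops can orphan intermediate nodes that this pass then sweeps away.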
Heuristic op→hw mapping.
Inserts memory transfer nodes.
04 — DEVICE PLANNER
Hardware Assignment
MatMul→GPU · Quant→NPU · fallback→CPU
zero manual device
management required
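A toy version of that heuristic placement (the preference table and fallback rule here are invented for illustration): each op is looked up in a device table, unmatched ops fall back to CPU, and a transfer node is recorded whenever an edge crosses devices.

```python
# Sketch of heuristic op->hw mapping with transfer insertion.
PREFER = {"matmul": "GPU", "quantize": "NPU"}  # illustrative rules

def plan(ops):
    # ops: topologically ordered list of (name, op, [input names])
    placement, transfers = {}, []
    for name, op, ins in ops:
        dev = PREFER.get(op, "CPU")  # fallback -> CPU
        placement[name] = dev
        for i in ins:
            if placement[i] != dev:  # edge crosses devices
                transfers.append((i, placement[i], dev))
    return placement, transfers

ops = [("x", "input", []), ("q", "quantize", ["x"]), ("mm", "matmul", ["q"])]
placement, transfers = plan(ops)
# x stays on CPU, q goes to NPU (CPU->NPU copy), mm to GPU (NPU->GPU copy)
```

Making transfers explicit graph nodes lets later stages schedule and overlap them like any other op.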
Cached by (op, shape, dtype, backend).
Warm re-run = zero recompile.
05 — KERNEL COMPILER
Backend Lowering
PTX · HIP · SPIR-V · AVX-512 · NPU SDK
kernel cache
≥ 95% hit rate
compile once
run forever
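The `(op, shape, dtype, backend)` cache key above can be modeled with a plain dict; identical keys on a warm re-run never reach the compiler. The function names are stand-ins, not NEF internals.

```python
# Sketch of a kernel cache keyed by (op, shape, dtype, backend).
compile_calls = 0

def compile_kernel(op, shape, dtype, backend):
    global compile_calls
    compile_calls += 1  # the expensive step we want to run at most once
    return f"{backend}-kernel[{op} {shape} {dtype}]"  # stand-in for PTX etc.

_cache = {}

def get_kernel(op, shape, dtype, backend):
    key = (op, shape, dtype, backend)
    if key not in _cache:
        _cache[key] = compile_kernel(op, shape, dtype, backend)  # miss: compile once
    return _cache[key]                                           # hit: skip compile

k1 = get_kernel("matmul", (4, 2), "f32", "PTX")
k2 = get_kernel("matmul", (4, 2), "f32", "PTX")  # warm re-run, no recompile
```

Including shape and dtype in the key is what makes the cache safe: a specialized kernel is only reused for byte-identical launch configurations.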
Async dispatch · parallel branches
· zero-copy where hardware supports it
06 — EXECUTION RUNTIME
Async Graph Dispatch
parallel streams · mem transfer · sync barriers
GPU util
≥ 85% sustained
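Async dispatch with a sync barrier can be sketched with a thread pool standing in for device streams: two independent branches run concurrently, and the consuming op blocks on both futures (the barrier) before it fires. This models the scheduling shape only, not NEF's runtime.

```python
# Sketch: parallel branch dispatch with a join barrier.
from concurrent.futures import ThreadPoolExecutor

def run_branch(name):
    # stand-in for a stream of kernels on one device
    return f"{name}-done"

with ThreadPoolExecutor() as pool:
    left = pool.submit(run_branch, "branch-a")   # dispatched without waiting
    right = pool.submit(run_branch, "branch-b")  # runs concurrently with left
    joined = (left.result(), right.result())     # sync barrier: wait for both
```

Keeping branches independent until the barrier is what lets the scheduler keep the device busy while transfers for the other branch are in flight.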
Tensor pulled to CPU memory
only when explicitly accessed
07 — OUTPUT
Materialized Tensor
→ .numpy() · .execute() · hydrad consumer
lazy → concrete
on demand only
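On-demand materialization reduces to a cached thunk: nothing runs until the first explicit access, and later accesses reuse the result. `LazyTensor` and `materialize` are hypothetical names for illustration, not the NEF API.

```python
# Minimal model of lazy -> concrete on explicit access.
class LazyTensor:
    def __init__(self, compute):
        self._compute = compute  # thunk over the whole pending graph
        self._value = None

    def materialize(self):
        if self._value is None:          # first access triggers execution
            self._value = self._compute()
        return self._value               # later accesses reuse the result

t = LazyTensor(lambda: [[1, 2], [3, 4]])
# nothing has executed yet at this point
result = t.materialize()  # lazy -> concrete, on demand only
```

This is why intermediate tensors never touch host memory: only the tensors a user actually asks for pay the execution and transfer cost.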
hardware targets
NVIDIA · CUDA
AMD · ROCm
CPU · AVX-512
NPU · Vendor
Intel · SPIR-V
cache hit → skip compile
planning overhead < 1 ms for graphs under 10K nodes · write once · run anywhere