zero-dependency C runtime for typed tensor graphs and memory-planned execution
Thymos is an in-progress zero-dependency C runtime for typed tensor graphs and memory-planned execution. The aim is a small, auditable core that can eventually run inference on memory-constrained hardware (e.g. Jetson-class) without dragging in the Python/PyTorch stack.
Initial target: scalar CPU execution, a graph IR with validation, an arena/context-based allocator, and linear/logistic regression as the first end-to-end demos. Everything beyond that — quantized inference, GGUF, CUDA, classical ML models from published research — is on the roadmap, not in the codebase.
Design motif: typed morphisms. Every operation has explicit inputs, outputs, and context. No hidden state. The point of writing it in C from scratch is to keep that property all the way down.
What currently exists in the repo.
Early scaffolding only — see the repo for the current state of the code. This page will be updated as modules land.
Active design and prototyping.
Graph IR and op registry. Arena/context allocator and lifetime model. Scalar CPU backend for core tensor ops. Validation pass over the graph. Linear and logistic regression as end-to-end smoke tests.
Next, once the core is stable.
GGUF inspection and quantized weight loading. Quantized matmul (Q4_0 / Q8_0) on CPU. Transformer forward pass on top of the graph IR. Classical ML models (SVR, LS-SVR, p-Laplacian semi-supervised regression).
Long-term.
Optional CUDA backend (custom kernels, cuBLAS where it pays). Jetson Orin Nano deployment for LLM inference under tight memory budgets. Benchmarks on real hardware against llama.cpp / ggml as reference points.
The reason the core looks the way it does.
The runtime is organized around a graph of typed ops over a single arena/context. Allocation is bump-style inside the arena; the entire context is freed in one shot. The intended shape of the API:
// arena + context (planned API — subject to change)
ThymosArena *arena = thymos_arena_create(512 * MB);
ThymosCtx *ctx = thymos_ctx_init(arena);
// typed tensors as graph nodes
ThymosTensor *a = thymos_tensor(ctx, THYMOS_F32, (int[]){4, 4}, 2);
ThymosTensor *b = thymos_tensor(ctx, THYMOS_F32, (int[]){4, 4}, 2);
ThymosTensor *c = thymos_matmul(ctx, a, b);
// validate, plan memory, execute
thymos_graph_validate(ctx);
thymos_graph_run(ctx);
thymos_arena_destroy(arena); // single free
Morphism layer (planned): explicit domain/codomain on each op, so composition can be checked statically rather than discovered at runtime. The first place this earns its keep is graph validation.
milestone                              status
─────────────────────────────────────────────────
graph IR + op registry                 in progress
arena/context allocator                in progress
scalar CPU backend (core ops)          in progress
linear / logistic regression demos     in progress
GGUF inspection                        planned
quantized matmul (Q4_0, Q8_0)          planned
transformer forward pass               planned
SVR / LS-SVR / p-Laplacian             planned
CUDA backend                           planned
Jetson Orin Nano deployment            long-term
Benchmarks will be published once a real end-to-end path lands. No numbers are claimed before then.