zero-dependency C runtime for typed tensor graphs and memory-planned execution
Thymos is an in-progress zero-dependency C runtime for typed tensor graphs and memory-planned execution. The aim is a small, auditable core that can eventually run inference on memory-constrained hardware (e.g. Jetson-class) without dragging in the Python/PyTorch stack.
Initial target: scalar CPU execution, a graph IR with validation, an arena/context-based allocator, and linear/logistic regression as the first end-to-end demos. Everything beyond that — quantized inference, GGUF, CUDA, classical ML models from published research — is on the roadmap, not in the codebase.
Design motif: typed morphisms. Every operation has explicit inputs, outputs, and context. No hidden state. The point of writing it in C from scratch is to keep that property all the way down.
What currently exists in the repo.
Early scaffolding only — see the repo for the current state of the code. This page will be updated as modules land.
Active design and prototyping.
Graph IR and op registry. Arena/context allocator and lifetime model. Scalar CPU backend for core tensor ops. Validation pass over the graph. Linear and logistic regression as end-to-end smoke tests.
Next, once the core is stable.
GGUF inspection and quantized weight loading. Quantized matmul (Q4_0 / Q8_0) on CPU. Transformer forward pass on top of the graph IR. Classical ML models (SVR, LS-SVR, p-Laplacian semi-supervised regression).
Long-term.
Optional CUDA backend (custom kernels, cuBLAS where it pays). Jetson Orin Nano deployment for LLM inference under tight memory budgets. Benchmarks on real hardware against llama.cpp / ggml as reference points.
The reason the core looks the way it does.
The runtime is organized around a graph of typed ops over a single arena/context. Allocation is bump-style inside the arena; the entire context is freed in one shot. The intended shape of the API:
// arena + context (planned API — subject to change)
ThymosArena *arena = thymos_arena_create(512 * MB);
ThymosCtx *ctx = thymos_ctx_init(arena);
// typed tensors as graph nodes
ThymosTensor *a = thymos_tensor(ctx, THYMOS_F32, (int[]){4, 4}, 2);
ThymosTensor *b = thymos_tensor(ctx, THYMOS_F32, (int[]){4, 4}, 2);
ThymosTensor *c = thymos_matmul(ctx, a, b);
// validate, plan memory, execute
thymos_graph_validate(ctx);
thymos_graph_run(ctx);
thymos_arena_destroy(arena); // single free
Morphism layer (planned): explicit domain/codomain on each op, so composition can be checked statically rather than discovered at runtime. The first place this earns its keep is graph validation.
milestone                              status
─────────────────────────────────────────────────
graph IR + op registry                 in progress
arena/context allocator                in progress
scalar CPU backend (core ops)          in progress
linear / logistic regression demos     in progress
GGUF inspection                        planned
quantized matmul (Q4_0, Q8_0)          planned
transformer forward pass               planned
SVR / LS-SVR / p-Laplacian             planned
CUDA backend                           planned
Jetson Orin Nano deployment            long-term
Benchmarks will be published once a real end-to-end path lands. No numbers are claimed before then.