NUX and GGML: Bringing AI to Kernel Space
In the past few months, mostly pushed by friends more knowledgeable than me in this field, I started (something not exactly original, I know) to divert my attention to the recent improvements in machine learning and AI.
My fascination with the field has come and gone over the years. The first time I took a real interest as an adult engineer was in 2010, after watching Jeff Hawkins' 2002 TED talk. If you haven't watched it, watch it now; it's a brilliant talk!
I was living in Amsterdam at the time, and I remember spending every possible hour outside work tinkering with the idea of prediction. I downloaded Numenta's first whitepaper about their Cortical Learning Algorithm, and I did what I usually do when I want to understand something: I reimplemented it. Twice.
Speaking of Numenta, they're definitely up to something. Their recent papers, although I have only skimmed them, look extremely promising and super-interesting. If you haven't already, check out their Thousand Brains Project. It seems like a place to spend a lifetime of fun.
But of course, today all the discourse is about everything that happened since this paper. And I couldn't ignore it.
GGML to the rescue
Personal taste here, but in order to experiment with things, I need a way to do it without resorting to Python.
I have been briefly exposed to PyTorch at work, and that was enough experience for me.
For some time I thought this meant the whole AI thing would be out of reach for me, but then a friend pointed me to GGML.
GGML is a tensor library used by projects such as llama.cpp. Among the repository examples you can even find some simple but effective GPTs.
It was originally meant to support CPUs only (and aarch64 Macs in particular), but it now has backends for BLAS, OpenMP, and various hardware platforms such as CUDA, Metal, and Vulkan.
The code, which has all the obvious signs of a fast-growing project, is a mix of C and minimal C++. On a cursory glance, it seems to be architected in this way:
- A set of tools to open, save and load models.
- Functions that create a computational graph from the models.
- A VM that executes the computational graphs in a thread pool.
What I liked about GGML is that the architecture makes sense and it's easily hackable, if you can stomach CMake. The sketch below shows roughly how these pieces fit together.
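To give a feel for that shape, here is a minimal usage sketch loosely modelled on the repository's simple-ctx example: build a graph of tensor operations, then hand it to the CPU thread pool. GGML moves fast, so exact names and signatures may differ between versions; treat this as an illustration, not canonical usage.

```c
#include "ggml.h"
#include <stdio.h>

int main(void)
{
    /* 1. Create a context backed by a fixed memory arena. */
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  /* 16 MiB arena */
        .mem_buffer = NULL,              /* let GGML allocate it */
        .no_alloc   = false,
    };
    struct ggml_context *ctx = ggml_init(params);

    /* 2. Define tensors and the op connecting them. This only
     *    describes a computational graph; nothing runs yet. */
    struct ggml_tensor *a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 4);
    struct ggml_tensor *b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 3);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);
    struct ggml_tensor *out = ggml_mul_mat(ctx, a, b);

    /* 3. Expand the graph and execute it on the CPU thread pool. */
    struct ggml_cgraph *gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/4);

    printf("result: %lld x %lld\n",
           (long long)out->ne[0], (long long)out->ne[1]);

    ggml_free(ctx);
    return 0;
}
```

Everything lives in one arena-backed context, and execution is a separate, explicit step over the graph: exactly the "build the graph, then run it in a thread pool" split described above.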
GGML in kernel space
The goal of my NUX prototyping kernel framework is to make it quick to create custom kernels. It has its own libc (libec, based on the NetBSD libc) and powerful memory management. This means that, as long as file I/O is not required, you should be able to port any C program to run in kernel mode.
Another thing that NUX offers is complete control over what the hardware is doing. If I run some code in kernel space on a CPU, I can make sure that nothing will ever interrupt it.
NUX also supports IPIs, so we can use those (or simple SMP barriers) to synchronize between CPUs.
I realised quickly that this fits GGML's architecture really well. You could, for example, boot a machine and assign all of its secondary CPUs to the GGML thread pool, while using the bootstrap CPU for system control and drivers.
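To make that concrete, here is a rough sketch of what the secondary-CPU side could look like, using C11 atomics. All the names here (secondary_cpu_loop, mailbox, submit) are hypothetical illustrations of the idea, not actual NUX or GGML interfaces.

```c
#include <stdatomic.h>
#include <stddef.h>

struct job {
    void (*fn)(void *);  /* e.g. one slice of a GGML graph compute */
    void *arg;
};

/* Single-slot mailbox: the bootstrap CPU publishes a job here and
 * the first idle secondary CPU claims it. */
static _Atomic(struct job *) mailbox;

/* Entry point for each secondary CPU after SMP bring-up. With no
 * scheduler and interrupts masked, nothing ever preempts this loop:
 * the CPU is dedicated to the compute pool. */
void secondary_cpu_loop(void)
{
    for (;;) {
        struct job *j = atomic_exchange_explicit(
            &mailbox, NULL, memory_order_acquire);
        if (j != NULL)
            j->fn(j->arg);
        /* else: spin; an IPI (or monitor/mwait) could be used to
         * idle more politely while waiting for work. */
    }
}

/* Runs on the bootstrap CPU: publish work for the pool. */
void submit(struct job *j)
{
    atomic_store_explicit(&mailbox, j, memory_order_release);
}
```

The acquire/release pair is the "simple SMP barrier" mentioned above: the release store makes the job's contents visible before a secondary CPU acquires the pointer.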
Of course, I decided to implement just that, and to give a FOSDEM talk about the effort!
An early prototype
Today, I published on GitHub something that has been living dangerously unbacked-up on my machine for the past few months: blasbare.
It has been my workspace for experiments in porting various computing architectures to NUX.
As it stands, there's a simple kernel that runs the simple-ctx GGML example.
Despite its simplicity, it compiles the full GGML library with the CPU backend.
Work still needs to be done, and the documentation is lacking, but these are early days.
This project will be discussed in more detail at FOSDEM 2025 in Brussels later this month. Hope to see you there!