NUX and GGML: Bringing AI to Kernel Space
In the past few months, mostly pushed by friends more knowledgeable than me in this field, I started (something not exactly original, I know) to divert my attention to the recent improvements in machine learning and AI.
My fascination with the field has come and gone over the years. The first time I took a real interest as an adult engineer was in 2010, after watching Jeff Hawkins' 2002 TED talk. If you haven't watched it, watch it now; it's a brilliant talk!
I was living in Amsterdam at the time, and I remember spending every possible hour outside work tinkering with the idea of prediction. I downloaded Numenta's first whitepaper about their Cortical Learning Algorithm, and I did what I usually do when I want to understand something: I reimplemented it. Twice.
Speaking of Numenta, they're definitely up to something. Their recent papers, although I have only skimmed them, look extremely promising and super-interesting. If you haven't already, check out their Thousand Brains Project. It seems like a place to spend a lifetime of fun.
But of course, today all the discourse is about everything that happened since this paper. And I couldn't ignore it.
GGML to the rescue
Personal taste here, but in order to experiment with things, I need a way to do it without resorting to Python.
I have been briefly exposed to PyTorch at work, and that was enough experience for me.
For some time I thought this meant the whole AI thing would be out of reach for me, but then a friend pointed me to GGML.
GGML is a tensor library used by projects such as llama.cpp. Among the repository examples you can even find some simple but effective GPTs.
It was originally meant to support CPUs only (and aarch64 Macs in particular), but it now has backends for BLAS, OpenMP, and various hardware platforms such as CUDA, Metal, and Vulkan.
The code, which has all the obvious signs of a fast-growing project, is a mix of C and minimal C++. On a cursory glance, it seems to be architected in this way:
- A set of tools to open, save and load models.
- Functions that create a computational graph from the models.
- A VM that executes the computational graphs in a thread pool.
What I liked about GGML is that the architecture makes sense and it's easily hackable, if you can stomach CMake. The sketch below shows roughly how these pieces fit together.
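To give a feel for that shape, here is a minimal usage sketch loosely modelled on the repository's simple-ctx example: build a graph of tensor operations, then hand it to the CPU thread pool. GGML moves fast, so exact names and signatures may differ between versions; treat this as an illustration, not canonical usage.

```c
#include "ggml.h"
#include <stdio.h>

int main(void)
{
    /* 1. Create a context backed by a fixed memory arena. */
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  /* 16 MiB arena */
        .mem_buffer = NULL,              /* let GGML allocate it */
        .no_alloc   = false,
    };
    struct ggml_context *ctx = ggml_init(params);

    /* 2. Define tensors and the op connecting them. This only
     *    describes a computational graph; nothing runs yet. */
    struct ggml_tensor *a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 4);
    struct ggml_tensor *b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 2, 3);
    ggml_set_f32(a, 1.0f);
    ggml_set_f32(b, 2.0f);
    struct ggml_tensor *out = ggml_mul_mat(ctx, a, b);

    /* 3. Expand the graph and execute it on the CPU thread pool. */
    struct ggml_cgraph *gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, out);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/4);

    printf("result: %lld x %lld\n",
           (long long)out->ne[0], (long long)out->ne[1]);

    ggml_free(ctx);
    return 0;
}
```

Everything lives in one arena-backed context, and execution is a separate, explicit step over the graph: exactly the "build the graph, then run it in a thread pool" split described above.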
GGML in kernel space
The goal of my NUX prototyping kernel framework is to make it quick to create custom kernels. It has its own libc (libec, based on the NetBSD libc) and powerful memory management. This means that, as long as file I/O is not required, you should be able to port any C program to run in kernel mode.
Another thing that NUX offers is complete control over what the hardware is doing. If I run some code in kernel space on a CPU, I can make sure that nothing will ever interrupt it.
NUX also supports IPIs, so we can use those (or simple SMP barriers) to synchronize between CPUs.
I realised quickly that this fits GGML's architecture really well. You could, for example, boot a machine and assign all of its secondary CPUs to the GGML thread pool, while using the bootstrap CPU for system control and drivers.
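To make that concrete, here is a rough sketch of what the secondary-CPU side could look like, using C11 atomics. All the names here (secondary_cpu_loop, mailbox, submit) are hypothetical illustrations of the idea, not actual NUX or GGML interfaces.

```c
#include <stdatomic.h>
#include <stddef.h>

struct job {
    void (*fn)(void *);  /* e.g. one slice of a GGML graph compute */
    void *arg;
};

/* Single-slot mailbox: the bootstrap CPU publishes a job here and
 * the first idle secondary CPU claims it. */
static _Atomic(struct job *) mailbox;

/* Entry point for each secondary CPU after SMP bring-up. With no
 * scheduler and interrupts masked, nothing ever preempts this loop:
 * the CPU is dedicated to the compute pool. */
void secondary_cpu_loop(void)
{
    for (;;) {
        struct job *j = atomic_exchange_explicit(
            &mailbox, NULL, memory_order_acquire);
        if (j != NULL)
            j->fn(j->arg);
        /* else: spin; an IPI (or monitor/mwait) could be used to
         * idle more politely while waiting for work. */
    }
}

/* Runs on the bootstrap CPU: publish work for the pool. */
void submit(struct job *j)
{
    atomic_store_explicit(&mailbox, j, memory_order_release);
}
```

The acquire/release pair is the "simple SMP barrier" mentioned above: the release store makes the job's contents visible before a secondary CPU acquires the pointer.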
Of course, I decided to implement just that, and to give a FOSDEM talk about the effort!
An early prototype
Today, I published on GitHub something that has been living dangerously unbacked-up on my machine for the past few months: blasbare.
It has been my workspace for experiments in porting various computing architectures to NUX.
As it stands, there's a simple kernel that runs the simple-ctx GGML example.
Despite its simplicity, it compiles the full GGML library with the CPU backend.
Work still needs to be done, and the documentation is lacking, but these are early days.
This project will be discussed in more detail at FOSDEM 2025 in Brussels later this month. Hope to see you there!