Designing a Looper on the Daisy Seed x Simple Touch

Teacher:
Vladyslav Lytvynenko
Date:
February 12, 2024

Introduction

This lesson covers the design behind [ INSTRUMENT NAME ], built with the Simple Touch kit. The kit includes the Daisy Seed MCU, a touch sensor (MPR121), stereo in and out, pots, switches, faders etc.

Although at first glance a looper might seem like a relatively simple piece of music gear, designing one revealed a fair number of challenges. Those in turn led to some lessons which I’d like to share below.

Along the way we’ll learn how to:

  • work with the Daisy Seed SDRAM
  • read from and write to an audio buffer
  • modify playback pitch and speed with linear interpolation
  • prevent sonic artifacts when changing loop start and length
  • reuse the same knob to control different parameters
  • use programming concepts like classes, references, declaration and definition, dependency injection etc.

Overview

The instrument has three independent loopers fed from a shared audio buffer 7 seconds long, so from a single recording it can produce three layers of loops played with different parameters. Every layer has independent settings for:

  • loop start
  • loop length
  • playback speed/direction (-2x…2x)
  • decay (0 = one shot,  0…1 fading out on release, 1 = infinite loop)
  • volume
  • pan

Controls for the first four parameters are shared between layers: to change a parameter of a particular layer, the operator holds the corresponding pad while turning the knob. At the same time every layer has a dedicated volume/pan knob for more fluent mixing during a performance. In this case holding a pad switches the function of the knob between volume (when released) and pan (while touched).

looper.png

Under the hood

The source code can be found on the Synthux Academy GitHub page. It’s written in C++ using the DaisyDuino library, which is mostly used for controls. For the most part the code is platform independent, so it can be reused on other boards, inside plug-ins etc.

I’ve structured the code in classes, which gave these building blocks:

  • buffer
  • loopers
  • control knobs
  • knob value routers
  • touch terminal

…organized this way:

Screenshot 2024-01-15 at 08.38.34.png

This is by no means the only right way of building a looper. I hope it provides some insights and explains some concepts. We’ll go through it step by step. Let’s start with the buffer.

Buffer

Audio requires a significant amount of memory by microcontroller standards. Seven seconds of stereo audio requires:

2 channels * 7 seconds * 48000 Hz sample rate * 4 bytes (32-bit float) per sample ≈ 2.7 MB
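The same arithmetic can be checked at compile time. This is only a sketch: the constant names are illustrative and are not taken from the project source.

```cpp
#include <cstddef>

// Illustrative constants (names are assumptions, not from the project).
constexpr std::size_t kChannels     = 2;
constexpr std::size_t kSeconds      = 7;
constexpr std::size_t kSampleRateHz = 48000;

// Bytes needed to hold the whole stereo recording as 32-bit floats.
constexpr std::size_t kBufferBytes =
    kChannels * kSeconds * kSampleRateHz * sizeof(float);

static_assert(sizeof(float) == 4, "this sketch assumes a 32-bit float");
static_assert(kBufferBytes == 2688000, "2,688,000 bytes, i.e. about 2.7 MB");
```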

So it comes in handy that the Daisy Seed is equipped with 64MB of SDRAM. All we have to do is tell the compiler that the buffer has to be allocated in that memory region. We do this by adding DSY_SDRAM_BSS to the array declaration. As we’re working with stereo, we declare two arrays and then pack them into an array of two pointers, which can be indexed like a two-dimensional array.

//Calculate buffer length
static const uint32_t kBufferLengthSec = 7;
static const uint32_t kSampleRate = 48000;
static const size_t kBufferLenghtSamples = kBufferLengthSec * kSampleRate;

//Allocate SDRAM memory for two channels
static float DSY_SDRAM_BSS buf0[kBufferLenghtSamples];
static float DSY_SDRAM_BSS buf1[kBufferLenghtSamples];

//Combine channels into two-dimensional array
static float* buf[2] = { buf0, buf1 };

<aside>💡 Reading through the code you’ll encounter types like uint8_t and uint32_t. They are so-called fixed-width integer types, and their purpose, as the name suggests, is to guarantee an exact number of bits for an integer. The number of bits in a plain int can vary depending on the platform. The Daisy Seed board is built on an STM32 microcontroller where int (as well as float) is 32 bits wide, so uint32_t has the same size as unsigned int and unsigned long int. One more type, size_t, is designed to be able to represent the largest possible object size (including the size of an entire array) and on STM32 is also 32 bits wide. You’ll notice it wherever big values are involved, like buffer sizes, indices etc.

</aside>
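A few compile-time checks make the guarantee of fixed-width types concrete. This is a small sketch; kMaxU32 is a name made up for the illustration.

```cpp
#include <cstdint>
#include <limits>

// Fixed-width types have the same size on every platform that provides them.
static_assert(sizeof(std::uint8_t)  == 1, "uint8_t is exactly 8 bits");
static_assert(sizeof(std::uint32_t) == 4, "uint32_t is exactly 32 bits");

// Largest value a uint32_t can hold: 2^32 - 1.
constexpr std::uint32_t kMaxU32 = std::numeric_limits<std::uint32_t>::max();
static_assert(kMaxU32 == 4294967295u, "2^32 - 1");
```

A check like sizeof(int) == 4, by contrast, would pass on STM32 and on most desktop compilers, but the language does not guarantee it.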

To work with this buffer we’ll add a class Buffer, which has three main tasks:

  • provide convenient interface for reading and writing
  • ensure that we won’t overshoot the buffer size while reading
  • prevent clicks in the beginning and at the end of the recorded audio.

The Buffer is defined in the buf.h header. As with all the other classes in this project, the header contains both *declaration* and definition, so no .cpp files are involved. This is done for simplicity.

Init

Looking at the Init() method you'll notice it requires a pointer float **buf to the raw buffer. We’ll pass it the pointer to the buffer we’ve just defined above. This is done in the setup() function of the main .ino file. In software development this technique is called *[dependency injection](https://en.wikipedia.org/wiki/Dependency_injection)*. In this case it allows the Buffer class to be platform independent, as it doesn’t use any platform-specific functionality and is not concerned with the way the buffer is actually allocated.

One more important detail in the Init() method is that it sets the entire buffer to zero. I’d recommend always doing this, as there’s no guarantee that the provided memory is empty.
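The two ideas above, injecting the raw memory and clearing it, can be sketched as follows. SketchBuffer is a hypothetical, simplified stand-in and not the project's Buffer class; in particular, the real Init() takes more parameters.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for the project's Buffer class.
class SketchBuffer {
public:
    // The raw memory is injected by the caller; the class never allocates.
    void Init(float** buf, std::size_t length) {
        _buf = buf;
        _length = length;
        // Clear both channels: the provided memory may contain garbage.
        for (std::size_t ch = 0; ch < 2; ++ch) {
            for (std::size_t i = 0; i < _length; ++i) {
                _buf[ch][i] = 0.f;
            }
        }
    }

    std::size_t Capacity() const { return _length; }
    float At(std::size_t ch, std::size_t i) const { return _buf[ch][i]; }

private:
    float** _buf = nullptr;
    std::size_t _length = 0;
};
```

On the Daisy the two channel arrays would be declared with DSY_SDRAM_BSS, while in a desktop test they can be ordinary arrays. The class works the same way in both cases because it only ever sees the pointer.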

Read

Let’s move on to Read(). Notice that the output arguments have type float&. The ampersand denotes a reference, so what the method receives is not a copy of a float value but access to the caller’s variable. Another option would be passing a raw pointer like float* directly; however a reference (which is in fact just [syntactic sugar](https://en.wikipedia.org/wiki/Syntactic_sugar) over the pointer) allows for cleaner code since we don’t have to dereference pointers before writing the values there.
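To compare the two styles side by side, here is a minimal sketch. Both functions are hypothetical and only copy their inputs to the outputs.

```cpp
// Output through references: the callee writes straight into the caller's variables.
inline void ReadByReference(float in0, float in1, float& out0, float& out1) {
    out0 = in0;
    out1 = in1;
}

// Same thing with raw pointers: every write needs an explicit dereference.
inline void ReadByPointer(float in0, float in1, float* out0, float* out1) {
    *out0 = in0;
    *out1 = in1;
}
```

At the call site the difference is ReadByReference(a, b, left, right) versus ReadByPointer(a, b, &left, &right).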

Write

This part has a bit of logic going on. The reason is that if we enable or disable recording instantly, chances are we’ll get a nasty click at the beginning and at the end of the recording. To prevent this we need a slight fade-in after recording starts and a slight fade-out once it stops. The fade length is set to 192 samples (see the third parameter of the Init() method). This is 4ms at a 48K sample rate, which is practically unnoticeable, but enough to prevent the click.

The fade in/out is driven by _state which has one of four values defined in the State enum : idle, fadein, fadeout and sustain.

<aside>💡 Note: State has type enum class, not just enum. The advantage of this type is that its values are not implicitly convertible to other types, e.g. we can’t assign State::fadein to an int variable. Apart from that, enum class cases are local to the enum, e.g. to access the fadein case we always need to type State::fadein. All this makes enum class far less error prone.

</aside>

Once the record pad is touched, _state changes to fadein and _envelope_position increments until it reaches the _envelope_slope value. After that _state jumps to sustain and stays there as long as the recording pad is held. Once the pad is released, _state changes to fadeout and stays there until _envelope_position reaches zero, which signifies entering the idle state.

<aside>💡 Note: recording time in current implementation overshoots the time of holding the pad by 4ms.

</aside>
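The state transitions described above can be sketched as a small standalone class. This is a simplified model, not the project's Write() code: RecordEnvelope, SetGate() and Next() are made-up names, and the slope is assumed to be greater than zero.

```cpp
#include <cstddef>

enum class State { idle, fadein, sustain, fadeout };

// Hypothetical, simplified model of the record envelope (not the project's code).
class RecordEnvelope {
public:
    explicit RecordEnvelope(std::size_t slope) : _slope(slope) {}

    // Called when the record pad is touched (true) or released (false).
    void SetGate(bool open) {
        if (open) {
            if (_state == State::idle || _state == State::fadeout) _state = State::fadein;
        } else {
            if (_state == State::fadein || _state == State::sustain) _state = State::fadeout;
        }
    }

    // Advances by one sample and returns the gain (0..1) to apply to the input.
    float Next() {
        switch (_state) {
            case State::fadein:
                if (++_position >= _slope) {
                    _position = _slope;
                    _state = State::sustain;
                }
                break;
            case State::fadeout:
                if (_position > 0) --_position;
                if (_position == 0) _state = State::idle;
                break;
            case State::idle:
            case State::sustain:
                break;
        }
        return static_cast<float>(_position) / static_cast<float>(_slope);
    }

    State GetState() const { return _state; }

private:
    std::size_t _slope;
    std::size_t _position = 0;
    State _state = State::idle;
};
```

Each incoming sample would be multiplied by the value returned from Next() before being written to the buffer. With a slope of 192 the gain reaches 1 after 192 samples and falls back to 0 in the 192 samples after release, which is where the extra 4ms of recording comes from.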

Length

The buffer has a Length() method. The value returned depends on the state of the buffer: if the recording head hasn’t yet reached the end of the buffer, its position is returned as the length. Otherwise the full buffer length is returned. This ensures that we only read from the area where something has been recorded.

All right, that’s all about the buffer. It’s time to jump into the looper itself.

Looper

In the simplest scenario we could just read from the Buffer directly in the AudioCallback, starting again from the beginning once we reach the end. However, we want a few more tricks: variable loop start and length, and variable speed and direction. These are implemented inside the Looper class in the **looper.h** header. Looking closer you’ll notice there are actually two classes defined there: Looper and Window. The latter is used by (and only by) the Looper. They are both parts of the same logic, so it makes sense to put them side by side.

<aside>💡 The Window class is mentioned in the file twice: the first time only as a name above the Looper, and the second time below it with the full definition. This technique is called forward declaration. Unlike in an .ino file, where declarations can appear in any order, in standard C/C++ files an entity has to be declared somewhere above the line where it’s used (except for properties and methods inside a class/struct). On the other hand, Window in our case is a sub-entity of the Looper and it’s better to have the “main” class at the top of the file. That’s where forward declaration does the trick. It’s like saying “here’s the thing you’re supposed to work with and I’ll give you more detail later”.

</aside>
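Here is a minimal sketch of the pattern with two plain (non-template) classes. SketchLooper and SketchWindow are hypothetical names, and the real classes are templates with far more functionality.

```cpp
// Forward declaration: only the name, so the class below can refer to it.
class SketchWindow;

// The "main" class comes first in the file.
class SketchLooper {
public:
    explicit SketchLooper(SketchWindow* window) : _window(window) {}
    // Declared here, defined below once SketchWindow is a complete type.
    bool IsWindowActive() const;

private:
    SketchWindow* _window; // a pointer to an incomplete type is allowed
};

// Full definition of the helper class.
class SketchWindow {
public:
    void Activate() { _active = true; }
    bool IsActive() const { return _active; }

private:
    bool _active = false;
};

// By now the compiler knows what SketchWindow looks like.
inline bool SketchLooper::IsWindowActive() const { return _window->IsActive(); }
```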

Playback

Let’s consider the basic scenario - just playing through the buffer from beginning to end. Although the output will be continuous playback, that’s not how the Looper sees it. Instead it operates with a series of successive crossfading windows. The reason for that will become clear when we discuss the loop start and length settings. A window has a triangular envelope and is short. The default slope duration is 192 samples, which makes a window duration of 384 samples, or 8ms at a 48K sample rate. The Window class is responsible for the window behavior. The Looper has an array containing several (minimum two) windows. Let’s see how it works.

Once we start playback, the first window gets activated. Upon activation the window receives the current playhead position, the loop start and the playback speed. The loop start parameter is only used when reading from the audio buffer, where it’s added to the playhead position. The playhead in its turn only circles between 0 and the loop length. This simplifies the logic. The window has its own internal iterator and a reference to the audio buffer. Once activated, it operates on its own. The Looper only activates windows and sums their output.

In its Process() method the looper checks the iterator position of the active window(s). If a window has reached its middle (IsMiddle() returns true), the looper activates a new one, passing it the playhead position of the currently active window as the start (see _Activate(float play_head)). Later in Process() the looper takes the outputs of the active windows and sums them to produce the output value.

Once the window’s iterator reaches the end, i.e. the 384th position in our case, the window marks itself inactive and is ready to be reused for the next activation.

Screenshot 2024-01-18 at 20.18.13.png

Window envelope

The window envelope has the shape of an isosceles triangle. Every sample read by the window gets multiplied by the corresponding attenuation value, which depends on the window’s iterator position. The attenuation value is not calculated for every sample, but is taken from a lookup table (LUT) computed in advance. This technique significantly reduces the amount of realtime computation, especially for complex envelopes/waves. Because of this, LUTs are extremely common in audio programming. For the Window, the table is generated by the Slope() function and is stored in the kSlope property.

static constexpr std::array<float, win_slope> kSlope { Slope<win_slope>() };

There’s a lot going on here. Let’s slow down for a second and zoom in a bit.

The static keyword in this case means that kSlope is the same instance no matter how many instances of Window are created (in fact it’s not associated with window objects at all and exists even when no windows are created). This saves a significant amount of memory. Otherwise every window created would have its own copy of the table.

constexpr means that kSlope is calculated when the program is being compiled, i.e. before it even runs on the device.

std::array<float, win_slope> is one of the standard C++ containers. It’s just a relatively thin wrapper around a plain array. Essentially it’s the same as float kSlope[win_slope], however this container provides many convenient features without additional performance cost, so I’d highly recommend using it when possible.

There’s no = operator, but there are { }. In C++ this is a recommended way of initialization. However in this case = would also do.

Finally, the Slope function call has this <win_slope> part. This is because Slope is a function template (the less correct name in use is “template function”). This means that at compile time the compiler generates a version of this function, replacing all the occurrences of win_slope with a concrete value, in our case 192. It’s needed because the function returns an array whose length has to be specified at compile time.

<aside>💡 Looper and Window are also templates - class templates. The reason I’ve done it this way is the same as for the Slope function: they contain arrays, the compiler needs to know each array’s size at compile time, and at the same time I wanted to keep them configurable. I could have used a global constant or even a macro, however that would be visible to the entire program and I wanted to avoid that.

</aside>

The lookup itself happens in the private _Attenuation() method of the window.
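To make the idea concrete, here is one way such a table and lookup could look. This is a sketch under assumptions rather than the project's implementation: SketchSlope, kSketchSlope and SketchAttenuation are made-up names, the table is assumed to hold a linear ramp, and the code needs C++17 because it writes to a std::array inside a constexpr function.

```cpp
#include <array>
#include <cstddef>

// Builds a rising ramp 0, 1/N, 2/N ... (N-1)/N at compile time.
// Requires C++17, where std::array element access is constexpr.
template <std::size_t N>
constexpr std::array<float, N> SketchSlope() {
    std::array<float, N> table{};
    for (std::size_t i = 0; i < N; ++i) {
        table[i] = static_cast<float>(i) / static_cast<float>(N);
    }
    return table;
}

constexpr std::size_t kSketchSlopeLen = 192;
constexpr std::array<float, kSketchSlopeLen> kSketchSlope{ SketchSlope<kSketchSlopeLen>() };

// Looks up the triangular gain for an iterator in [0, 2 * kSketchSlopeLen).
// The rising half reads the table directly, the falling half mirrors it.
constexpr float SketchAttenuation(std::size_t iterator) {
    return iterator < kSketchSlopeLen
        ? kSketchSlope[iterator]
        : 1.f - kSketchSlope[iterator - kSketchSlopeLen];
}
```

Only the rising half of the triangle is stored, and the falling half is derived from it, which keeps the table at 192 entries for a 384-sample window.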

Loop start and length

So far we’ve been talking about simple cycling from the beginning of the buffer to the end. Let’s now add start and length variation. Take a look at the SetLoop() method of the Looper:

_loop_start = static_cast<size_t>(loop_start * _buffer->Length());

auto new_length = static_cast<size_t>(loop_length * _buffer->Length() / _delta);
_win_per_loop = std::max(static_cast<size_t>(new_length * kSlopeKof), static_cast<size_t>(2));

<aside>💡 You may have noticed that the Length() method of the buffer is called using the arrow -> syntax. That’s because _buffer is a pointer to the Buffer, not the Buffer object itself.

</aside>

Loop start is straightforward - it just multiplies the normalized input value by Length() (see above).

The loop length is more interesting. It’s quantized to the half of the window (because windows are overlapping by half) with the minimum length of one whole window. This ensures that the loop always starts and ends with a fade.

One more thing to notice in this code is that the loop length also depends on _delta, which is the speed of playback. We’ll cover that in the Speed and Direction section below.
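The quantization can be tried out in isolation with a sketch like the one below. HalfWindowsPerLoop is a hypothetical helper, not the project's SetLoop(). It uses integer division by the slope length where the original multiplies by kSlopeKof, and it takes the absolute value of the speed, which is an assumption made here to keep the cast well defined for reverse playback.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical helper: converts a normalized loop length (0..1) into a count
// of half-windows. Assumes delta != 0; its sign (direction) is ignored here.
inline std::size_t HalfWindowsPerLoop(float loop_length,
                                      std::size_t buffer_length,
                                      float delta,
                                      std::size_t slope) {
    // Length in samples, compensated for playback speed.
    const auto samples = static_cast<std::size_t>(
        loop_length * static_cast<float>(buffer_length) / std::fabs(delta));
    // Quantize to whole half-windows, but never fewer than two (one full window).
    return std::max(samples / slope, static_cast<std::size_t>(2));
}
```

For a full 7 second buffer (336000 samples) and a slope of 192, a loop length of 0.5 at normal speed gives 168000 / 192 = 875 half-windows, while a loop length of 0 is clamped to the minimum of 2.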

What happens when we change the loop start during playback? As mentioned above, the window gets the loop start parameter upon its activation. This value is kept unchanged for this window until the next activation. If we don’t change the loop start, every window receives the same value, which results in contiguous playback. However, when the loop start is changed during playback, every window gets a slightly different loop start, so it reads from a different location in the buffer. But because windows have a triangular envelope and are crossfading, there will be no click during the change. Instead it produces a time stretching effect.

Screenshot 2024-01-23 at 16.40.43.png

Speed and Direction

As was mentioned before each window has an internal iterator and a playhead. Let’s look closer at what they are.

size_t _iterator; as its name suggests is responsible for traversing through the window. It is initialized to 0 upon the activation and is being incremented by 1 until it reaches 2 x 192 = 384. At this point the window gets deactivated.

float _play_head; on the other hand is the read position on the buffer. It’s incremented by the _delta we’ve mentioned above, which ranges from -2 to +2 and is effectively the speed and direction of playback. Depending on the _delta value the playhead will be catching up with or falling behind the iterator. If, let’s say, both were initialized to 0 and _delta is 1.34, at the end of the window the iterator will be 384 whereas the playhead will be 1.34 * 384 = 514.56.

This separation allows keeping the chain of windows in strict order while varying the speed and direction.

We just saw that _play_head is a float, but as we know a buffer, like any array, has only integer indices, so how do we read samples? By using linear interpolation. The principle is really simple - it’s finding a point lying on the straight line connecting two known values. Let’s say the playhead is at 3.57. The output value will lie between the samples at index 3 and index 4, with a higher contribution from the latter because 3.57 is closer to 4 than to 3. Another name for this is weighted average. The value is calculated with the formula:

$$S = A + \mathit{frac} \cdot (B - A),$$

where A and B are the samples at the two neighbouring indices and frac is the fractional part of the playhead position (0.57 in this example).

Screenshot 2024-01-24 at 21.30.35.png
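As a worked example with made-up sample values, suppose the sample at index 3 is A = 0.2, the sample at index 4 is B = 0.4, and the playhead is at 3.57, so the fractional part is 0.57:

$$S = 0.2 + 0.57 \cdot (0.4 - 0.2) = 0.2 + 0.114 = 0.314$$

The result lies closer to B than to A, as expected for a playhead that is closer to index 4.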

The interpolation, reading from the buffer and applying the envelope attenuation all happen in the Process(Buffer* buf, float& out0, float& out1) method of the Window class.

//Get previous integer index by truncating fractional part
auto int_ph = static_cast<size_t>(_play_head);
//Extract fractional part
auto frac_ph = _play_head - int_ph;
//Get next integer index depending on the direction
auto next_ph = _delta > 0 ? int_ph + 1 : int_ph - 1;

//Read from the buffer
auto a0 = 0.f;
auto a1 = 0.f;
auto b0 = 0.f;
auto b1 = 0.f;
buf->Read(int_ph + _loop_start, a0, a1);
buf->Read(next_ph + _loop_start, b0, b1);

//Get attenuation for the current iterator position  
auto att = _Attenuation();
//Calculate output using linear interpolation formula
out0 = (a0 + frac_ph * (b0 - a0)) * att;
out1 = (a1 + frac_ph * (b1 - a1)) * att;

As we’ve seen above the length of the loop is quantized and measured in halves of windows. Also we know that movement through the window, and eventually through the loop is described by _iterator while _play_head may move faster, equally fast or slower than _iterator. What this means is that  the actual portion of the audio played back will have different length depending on the speed. Here’s the example of the actual playback length without compensation:

Screenshot 2024-01-25 at 16.40.21.png

To keep the playback length the same, we’re adjusting length inversely to the _delta before quantizing. This is happening in SetLoop() method of the Looper:

auto new_length = static_cast<size_t>(loop_length * _buffer->Length() / _delta);

Fading out

The remaining part of the looper we’re going to look at is fading out behavior. The looper can operate in three modes:

  • one shot; playback stops after playing the loop once, no matter how long the pad is held.
  • infinite loop;
  • fade out on release; the loop plays as long as the pad is held. On release it fades out. The length of the fade can vary.

The mode is set in the SetRelease() method and applied in Process(). The most interesting one is fade out on release:

if (!_is_gate_open) {
  if (_mode == Mode::release) _volume -= _release_kof * _volume;
  if (_volume <= .02f) {
    _Stop();
    return;
  }
}

Notice that _release_kof is calculated in advance in the SetRelease() method. This reduces the amount of calculation needed in a time-critical context, which the Process() method is. The difference between SetRelease() and Process() in this sense is that the former is called from the loop() function whereas the latter is called from the AudioCallback().

<aside>💡 The setup() and loop() functions inside an .ino file are part of the normal execution flow, whereas AudioCallback() is invoked by an interrupt whenever a new portion of audio needs to be produced. As its name suggests, an interrupt pauses the normal execution flow, performs the necessary work, and then the normal execution flow continues. While code needs to be efficient in both contexts, the execution time of AudioCallback() is strictly limited compared to loop(), so whenever there are computations that can be moved outside and done in advance, it’s necessary to do so. When writing code it’s generally good to keep in mind from which root function it’s going to be called.

</aside>

Another thing to point out in the above code is _volume <= .02f as the stop condition. The threshold of 0.02 is chosen because _volume -= _release_kof * _volume; produces an exponential decrease of the volume: the lower the volume, the smaller each step. The decay is asymptotic, so the volume never actually reaches zero. As a result the looper could keep running for a long time while producing effectively no audible output. The threshold of 0.02 prevents this.
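The effect of the threshold can be explored with a short sketch. Both helpers are hypothetical: how the project actually derives _release_kof in SetRelease() is not shown here, so ReleaseCoefficient is just one plausible way to turn a fade length in samples into a coefficient. SamplesUntilStop assumes the coefficient is greater than 0 and at most 1.

```cpp
#include <cmath>
#include <cstddef>

constexpr float kStopThreshold = 0.02f; // same threshold as in the text

// Hypothetical helper: coefficient that brings the volume from 1 down to the
// threshold in `samples` steps of `volume -= k * volume`.
// Each step multiplies the volume by (1 - k), so (1 - k)^samples = threshold.
inline float ReleaseCoefficient(std::size_t samples) {
    const double ratio = std::pow(static_cast<double>(kStopThreshold),
                                  1.0 / static_cast<double>(samples));
    return static_cast<float>(1.0 - ratio);
}

// Counts how many steps the decay takes until the stop condition is met.
inline std::size_t SamplesUntilStop(float k) {
    float volume = 1.f;
    std::size_t n = 0;
    while (volume > kStopThreshold) {
        volume -= k * volume;
        ++n;
    }
    return n;
}
```

With ReleaseCoefficient(48000) the loop stops after roughly 48000 steps, i.e. about one second at 48K. Without the threshold the same loop would have to wait for the float volume to round all the way down to zero, which takes many times longer while nothing audible is produced.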

User interface