Tickle - how we designed a small and fast tween library for Unity
Published: 9 Aug 2025
I have recently started work on a lightweight and performant OSS tween library for Unity called Tickle.
Here is a preview of how you can move a transform using Tickle.
transform.LerpPosition(start, end, duration).Start();
In this article, I would like to talk about the design ideas behind Tickle, and a little about how they are implemented.
Why Tickle?
First and foremost, Tickle is here to improve the Unity developer experience when creating and sequencing tweens (i.e. animating object properties over time). Currently, there are several ways this could be done without Tickle:
- Using Unity's built-in animation systems (e.g., creating an "Animator" asset, or newer timeline-based tooling). This generates a lot of asset data even when only simple object movements are needed.
- Using Unity coroutines, which is a more elegant, code-centric approach compared to animation timelines. However, things get slow when a large number of coroutines are running.
IEnumerator Coroutine()
{
    float time = 0, duration = 10;
    while (time < duration)
    {
        transform.position = Vector3.Lerp(start, end, time / duration);
        yield return null;
        time += Time.deltaTime;
    }
}
StartCoroutine(Coroutine());
- Using third-party tweening libraries, such as DOTween, LeanTween, PrimeTween or LitMotion. These are code-centric solutions that abstract away much of the underlying mechanics (less repeated code) and are more performant than Unity coroutines.
// Using DOTween
transform.DOMove(end, duration);
So why would I want to write my own tween library instead of using the existing ones?
In Search of Simple Beauty
Let's take a look at these two types of keyboards. First, this is a 100% keyboard with all its bells and whistles. It is big and heavy, and not very portable. It has a lot of keys that we rarely use, but you can be assured that they're there when you need them.

Next, consider the Neo65. It is a 65% keyboard: small and minimalistic, with a very clean aesthetic. It contains only the keys that I frequently use, and none of those that are out of my mind most of the time. It frees up my desk space, as well as my mind space.

At the same time, it is highly customizable, both in hardware and in software-driven configuration. Since nothing is soldered together, I can easily swap out the hardware parts. As for keys that are missing on a 65% keyboard, I can easily map them to specific key combinations via software. It becomes a tool that fits me very, very tightly.

We love simplicity in design (which in itself requires a deep understanding of the requirements). Like the Neo65 keyboard, Tickle aims to be a tweening library that is fast and extensible, while keeping that minimalistic appeal.
There are already many tween libraries out there, but we want to create a library that
- Offers only a light set of APIs for the most intuitive and common features, so that it can be easily extended/customized to your own needs
- Does not contain compulsory external dependencies
- Is beautiful to read and easy to reason about, even at the small cost of a little GC allocation, and yet...
- ...could still be trusted for relatively high performance, which could optionally be made even better with Unity's Job System!
DOTween, PrimeTween and similar libraries are packed with many features that I don't use. The bloat from these libraries weighs on my mind when I know there are pieces of code sitting in my project that I will never use and may never even be aware of.
Meanwhile, LitMotion relies on the Burst compiler as an external dependency. While we appreciate the focus on performance, not all developers welcome the idea of installing an extra dependency just for tweening objects. Tickle seeks the performance gain of the Burst compiler, but keeps it as an optional dependency.
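To make that concrete, here is one common way an optional Burst path could be wired up. This is a sketch of the general pattern, not necessarily how Tickle actually gates it: a scripting define such as TICKLE_BURST (a hypothetical name) that only exists when the Burst package is installed, e.g. via version defines in an asmdef.

```csharp
// Sketch: optional Burst support via conditional compilation.
// Assumption: TICKLE_BURST is defined (e.g., through asmdef version defines)
// only when the com.unity.burst package is present in the project.
#if TICKLE_BURST
using Unity.Burst;
using Unity.Jobs;

[BurstCompile] // compiled by Burst when the package is available...
struct LerpJob : IJob
{
    public void Execute() { /* advance all lerps for this frame */ }
}
#endif
// ...otherwise the same per-frame update runs as plain managed C#.
```

With this pattern, projects without Burst compile and run the exact same library, just without the extra speed-up.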
Clean API intuitively described by the underlying data structure
First, an ITickle simply represents a lerp process, like this:
ITickle tickle = transform.LerpPosition(startPos, endPos, duration);
It is common for tween libraries to allow multiple such processes to be grouped together and run concurrently. You can also "chain" them together to run in a sequence (i.e., one after another). However, the syntax can get slightly overwhelming for someone new to the library.
Below is an example from DOTween:
UISequence.
Append(middleText.fontMaterial.DOFloat(0f, "_FaceDilate", 5f)).
AppendCallback(() => progressParent.gameObject.SetActive(true)).
Append(progressParent.materialForRendering.DOFade(1.0f, 2.0f)).
Append(middleText.fontMaterial.DOFloat(-1f, "_FaceDilate", 1.0f)).
AppendCallback(() => { for (int i = 0; i < arms.Length; i++) arms[i].gameObject.SetActive(true); }).
Join(leftGuide.rectTransform.DOAnchorPos(new Vector2(-770, -385), 3.0f).SetEase(Ease.Linear)).
Join(leftGuide.rectTransform.DOLocalRotate(new Vector3(0, 0, -90), 3.0f)).
AppendCallback(() => progressParent.gameObject.SetActive(false));
While Tickle still aims to support a similar API for those who are familiar with DOTween (by far the oldest and, hence, the most used of all tween libraries), there is another way to represent "concurrent sets" and "chained sequences" in a more explicit and intuitive manner to newcomers.
Let's start with representing a set of concurrent lerps, or ITickles. We can simply use an array to group them together as a TickleSet.
// An array of Tickles simply makes a "TickleSet"
TickleSet tickleSet = new ITickle[] {
transform.LerpPosition(startPos, endPos, duration),
transform.LerpScale(startSize, endSize, duration)
}.Start(); // Starts lerping all ITickles concurrently
And what if we want to chain TickleSets together into a sequence, such that each TickleSet plays one after another? We simply group the TickleSets together in an array, i.e., a 2D array of ITickles!
// Creating a TickleChain
TickleChain tickleChain = new ITickle[][] {
new ITickle[] { // TickleSet 1
transform.LerpPosition(startPos, endPos, duration),
transform.LerpScale(startSize, endSize, duration)
},
new ITickle[] { // TickleSet 2
transform.LerpRotation(startRotation, endRotation, duration)
}
}.Start(); // Starts playing the TickleSets in ordered sequence
Take note that you can effectively chain a single ITickle, rather than a TickleSet, by chaining a TickleSet of size 1, as shown in the example above.
If you wish to use an API that looks similar to DOTween, you can now do so with confidence gained from intuitively understanding what is lying under the hood.
var tickleSet = new TickleSet()
.Join(transform.LerpPosition(startPos, endPos, duration))
.Join(transform.LerpScale(startSize, endSize, duration));
var tickleChain = new TickleChain()
.Chain(tickleSet)
.Chain(transform.LerpScale(startSize, endSize, duration))
.OnComplete(() => Debug.Log("Finished with sequence"));
tickleChain.Start();
Hot-swappable wrapper around an efficient core
If you find my API design choices (such as the syntax for chaining and grouping lerps) too opinionated, you might be glad to know that, just like swapping the switches on a hot-swappable mechanical keyboard, you can replace the Tickle API layer with something of your own!
The Tickle API is actually just a wrapper around a lightweight core system that handles lerps for different data types.

To give you an idea, this is how a wrapper like the Tickle API may interface with the core "Lerp System".
using Tickle.Lerp;
using UnityEngine;

public class Tickle<T> : ITickle where T : unmanaged
{
    private int _lerpId;
    private T _target = default;
    private bool _isDone = false;
    private System.Action _onComplete;
    private T _start, _end;
    private float _duration;

    private void Start()
    {
        _lerpId = Lerp<T>.Create(ref _target, ref _isDone, _start, _end, _duration);
        Lerp<T>.Start(_lerpId);
    }

    private void Update()
    {
        Debug.Log(_target);
        if (_isDone)
            _onComplete?.Invoke();
    }
}
Note that the value to be lerped (i.e. _target) and the completion flag (i.e. _isDone) are passed by reference (using the ref keyword) to the underlying Lerp System, which modifies them directly. This way, the Lerp System never needs to call back into the wrapper layer to update values or signal completion, and hence remains agnostic to any wrappers that may sit on top of it.
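To sketch how a ref parameter can end up as a raw pointer inside the Lerp System: the Create body below is an illustrative assumption, not Tickle's actual implementation, and it assumes the caller keeps the referenced storage valid (e.g., pinned or in unmanaged memory) for the lifetime of the lerp.

```csharp
using System.Runtime.CompilerServices;

public static unsafe class LerpSystem<T> where T : unmanaged
{
    // Hypothetical Create: turns the wrapper's refs into raw pointers that
    // the core system can write through on every frame update.
    public static int Create(ref T target, ref bool isDone,
                             T start, T end, float duration)
    {
        T* targetPtr = (T*)Unsafe.AsPointer(ref target);
        bool* donePtr = (bool*)Unsafe.AsPointer(ref isDone);
        // ...store targetPtr/donePtr inside an internal Lerp<T> entry
        // and return its id so the wrapper can start/stop it later.
        return 0;
    }
}
```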
At the same time, we do not need to pass any delegates to the Lerp System for updating target values or to notify completion, which means we may fully work with unmanaged data in contiguous memory and avoid heap allocation. And this brings us to the next section...
Memory management strategies
One of the keys to the Lerp System's performance is in memory management. In general, we want to achieve zero heap allocation, keep data in contiguous memory to facilitate quicker search / iteration, and avoid overheads such as GC spikes or boxing.
Avoiding heap allocation and GC with unmanaged structs
Firstly, each lerp process is represented as an unmanaged Lerp<T> struct. An "unmanaged type" is a type that contains no references to managed objects, so the garbage collector never needs to track it. By avoiding heap allocation, we also avoid the overhead of garbage collection, which can become significant when dealing with many lerp processes.
We use a struct, because a class is always treated by the compiler as a managed type. However, we also need to take note that every field of the struct must itself be unmanaged in order for the struct to be considered unmanaged.
public unsafe struct Lerp<T> where T : unmanaged {
public int Id;
public T* Target;
public T Start;
public T End;
public float Duration;
public Ease EaseType;
public float ElapsedTime;
public bool IsRunning;
public bool IsDone;
private bool* _doneHandle;
}
Note that the generic type T of our lerp "target" is always unmanaged in practice anyway. For example, we only lerp primitive or value types like float, int, or UnityEngine.Vector3.
We also avoid storing easing functions as delegates, or working with delegates to update our target values and completion status. This is because delegates are managed objects, and containing one would force the Lerp struct to become managed. In place of delegates, we use an enum (i.e. Ease) to identify the easing function to apply during frame updates. And to update target values and completion status, we use raw pointers (i.e. T* Target and bool* _doneHandle).
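Dispatching on the enum can be as simple as a switch. The Evaluate helper and the specific curves below are illustrative (standard quadratic easings), not Tickle's actual code:

```csharp
public enum Ease { Linear, InQuad, OutQuad }

public static class Easing
{
    // Map the Ease enum to a curve value at normalized time t in [0, 1].
    // Because we branch on a plain enum, the Lerp struct never needs
    // to hold a managed delegate.
    public static float Evaluate(Ease ease, float t)
    {
        switch (ease)
        {
            case Ease.InQuad:  return t * t;        // accelerate in
            case Ease.OutQuad: return t * (2f - t); // decelerate out
            default:           return t;            // linear
        }
    }
}
```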
Avoiding cache misses with contiguous memory
Another benefit of working with unmanaged data is that we can store them in contiguous memory, which removes the overhead from cache misses. This is important for us, as we will be spending a lot of time iterating through (potentially) a lot of data.
public class LerpRunner<T> where T : unmanaged {
    // Note that Lerp<T> is an unmanaged struct
    private Lerp<T>[] _lerps;

    // Potentially iterating through MANY lerps on every frame
    public void UpdateFrame() {
        for (int i = 0; i < _lerps.Length; i++)
            _lerps[i].Update();
    }
}
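For completeness, here is a minimal sketch of what a single lerp's per-frame update could look like if it were specialized for float (the field names mirror the struct shown earlier, but the body itself is an assumption; the real generic version would need per-type interpolation):

```csharp
public unsafe struct LerpFloatSketch
{
    public float* Target;    // points at the wrapper's _target
    public float Start, End, Duration, ElapsedTime;
    public bool IsRunning;
    public bool* DoneHandle; // points at the wrapper's _isDone

    public void Update(float deltaTime)
    {
        if (!IsRunning) return;
        ElapsedTime += deltaTime;
        float t = ElapsedTime < Duration ? ElapsedTime / Duration : 1f;
        *Target = Start + (End - Start) * t; // linear interpolation
        if (t >= 1f)
        {
            IsRunning = false;
            *DoneHandle = true; // the wrapper polls this flag
        }
    }
}
```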
But first, let's talk about what a cache miss is.
When you declare an array of class instances, like this:
MyClass[] arr = new MyClass[1000];
you are creating a contiguous memory block of 1000 items ("contiguous" = sitting side-by-side in memory). However, note that what actually gets stored contiguously is an array of pointers, and not the MyClass instance objects themselves.
In contrast, the objects that these pointers point to are scattered across different locations in memory. This scattering depends on when each object was created and the availability of memory blocks during the program's runtime.

In order to speed up data processing, modern CPUs would load commonly used data from RAM into the CPU cache. Data that sits together side-by-side in RAM would be loaded as a chunk into CPU cache. This chunk is known as a cache line, and it usually has a fixed length.
However, since objects are scattered in an unordered manner throughout the RAM, "over-fetching" occurs, meaning unused data gets loaded into the CPU cache alongside what we actually need.
Let's illustrate this better by taking a look at Figure 5 again. Imagine that we are now iterating through the array of MyClass pointers, like this:
for(int i = 0; i < 1000; i++) {
var myClass = arr[i];
// Do something with myClass instance
}
The first entry "Ptr-0" would lead us to load the cache line [Obj-0, Obj-3, ...] into the CPU cache. Due to the limited size of the cache line, Obj-1 and Obj-2 are not fetched as they sit too far out in memory. We would only have a cache hit of Obj-0 in this fetch. Then, when we want to load Obj-1, we will find that it is not in the CPU cache (this is known as a cache miss), which will prompt the CPU to fetch a new cache line from the RAM again.
You would imagine that this defeats the purpose of having a CPU cache as we would be constantly reading from the RAM anyway when we are iterating through the array.
However, imagine that instead of storing an array of pointers, we directly store an array of the unmanaged data (which in our case would be the Lerp<T> structs), as below:
Lerp<T>[] arr = new Lerp<T>[1000];
for (int i = 0; i < 1000; i++) {
    // Access the element in place; copying the struct into a local
    // variable first would update the copy, not the array entry.
    arr[i].Update();
}
These data sit contiguously, that is, side by side in an ordered manner in RAM. So, when we load them into the CPU cache, we are guaranteed to load consecutive entries in a single cache line.

As consecutive data are fetched together into the CPU cache, the program can iterate through the array with far fewer fetches from RAM. This leads to a significantly faster runtime!
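If you want to feel the difference yourself, a rough micro-benchmark outside Unity along these lines typically shows the struct array iterating noticeably faster (the class/struct names are arbitrary, and exact numbers vary by hardware):

```csharp
using System;
using System.Diagnostics;

class PointClass   { public float X; }
struct PointStruct { public float X; }

static class LocalityDemo
{
    static void Main()
    {
        const int n = 1_000_000;
        var classes = new PointClass[n];
        var structs = new PointStruct[n];
        for (int i = 0; i < n; i++) classes[i] = new PointClass { X = i };
        for (int i = 0; i < n; i++) structs[i].X = i;

        var sw = Stopwatch.StartNew();
        float sum1 = 0;
        for (int i = 0; i < n; i++) sum1 += classes[i].X; // pointer chasing
        sw.Stop();
        Console.WriteLine($"class array:  {sw.Elapsed.TotalMilliseconds:F2} ms");

        sw.Restart();
        float sum2 = 0;
        for (int i = 0; i < n; i++) sum2 += structs[i].X; // contiguous reads
        sw.Stop();
        Console.WriteLine($"struct array: {sw.Elapsed.TotalMilliseconds:F2} ms");
    }
}
```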
This data-oriented approach to designing systems pays attention to spatial locality, and it is a common idea behind the design of many game engines. In fact, it is the idea behind Unity's ECS framework.
One extra thing to note is that, instead of storing our Lerp<T> structs in an array, we are actually using Unity's NativeArray, which is more compatible with Unity's Burst compiler / Job system.
NativeArray<Lerp<T>> lerps = new NativeArray<Lerp<T>>(64, Allocator.Persistent);
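One caveat with NativeArray is that its length is fixed and its memory must be disposed manually. A hypothetical grow-on-demand helper (not Tickle's actual code) could look like this:

```csharp
using Unity.Collections;

public class LerpStorage<T> where T : unmanaged
{
    private NativeArray<Lerp<T>> _lerps =
        new NativeArray<Lerp<T>>(64, Allocator.Persistent);

    // NativeArray cannot resize in place: allocate a bigger one,
    // copy the contents over, and dispose the old allocation.
    private void Grow()
    {
        var bigger = new NativeArray<Lerp<T>>(_lerps.Length * 2, Allocator.Persistent);
        NativeArray<Lerp<T>>.Copy(_lerps, bigger, _lerps.Length);
        _lerps.Dispose();
        _lerps = bigger;
    }

    public void Dispose() => _lerps.Dispose();
}
```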
Further work
This only scratches the surface of Tickle's technical implementation, but hopefully it is enough to convey our data-oriented and minimalistic approach to designing the core system and API.
There is definitely more work planned, for example optimizing data insertion, deletion and search with sparse sets, and adding support for user-defined easing functions and more! Perhaps that can be part of another blog post.
Thank you for reading! I only do this for personal learning and fun. But if you find this to be a helpful tool, do consider showing your support by dropping a small tip! Cheers!