Skip to main content

Command Palette

Search for a command to run...

A "perfect" data structure

Updated
8 min read
A "perfect" data structure
S

I’m a software developer and founder of Illucrum, a small IT company focused on helping businesses bring their ideas to life. I create websites and mobile applications that are easy to use, straightforward to maintain, and built with clean, well-structured code. I work closely with you to understand your goals and explain technical choices in a clear, simple language, so you always know what we’re building and why. The aim is to create something that works well, fits your needs, and avoids unnecessary complexity.

If you have an idea or a challenge to solve, let’s talk and explore what we can build together. Your project can start simple — and grow with your business.

Algorythms, data structures and maths. Those are the three legs of the tripod that holds programming. Today, let’s focus on data structures. You know how each one has it strengths and weakneses, each one has it’s own use cases. A “perfect” data structure should only have strengths and no weakneses, but it should also be useful in all situations. Which is of course ridiculous. There is just no way you can have a data structure that would be able to replace a regular list, just the same as it would be able to replace a tree. Although it’s worth mentioning that a binary tree can be implemented with an array. In this article, I’ll explore the posibilities of coming closer to the “perfect” data structure by trying to design one that would behave like a list while assuring a O(1) complexity for all relevant operations.

Algorythmic complexity

Although this article is not about complexity, let’s briefly explain what it is about. The complexity of an algorythm is used to predict how an algorythm will perform depending on the input size. This is very important for data structures that can hold millions of entries. So for example, if we assume a simple node list of which we only have direct access to the first node, if we want to add a new item to it at the beggining, no matter how long the list is, it will always cost the same ammount of resources, which is very good. But if we want to add a new item at the end, we would need to iterate through the whole thing. In this case the time needed to add this new item would be roughly directly proportional to the length of the list.

In some other situations, things are not always that simple. Sometimes the resources needed don’t depend only on the size of the input. Which is very common for sorting algorythms, for example. In those situations, normally the worst case scenario is the most relevant. And this is where the big-o notation comes in, as a way to represent the complexity of an algorythm in it’s worst case scenario. From the previous example of the node list, adding an item at the beggining presents constant complexity, which is O(1). This basically means, it always needs the same amount of resources, no matter how long the list is. Adding an element at the end, presents linear complexity, which is O(n), where n represents the lenght of the list.

The “perfect” list

Goals

First, let’s define the goals that the list I’m trying to describe should meet. The goal is to have a data structure on which we can perform these operations with O(1) complexity:

  • Retrieving any element given it’s index

  • Adding a new element on any position

  • Removing any element

As we can retrieve any element given it’s index, there should be of course no problem with iterating the list. So this goals should really cover it all.

Restrictions

This list will have one very important restriction. It will have a set maximum size. This is sadly necessary, I don’t see the possibility of avoiding this restriction.

Structure

It’s clear to me, that in a way, what I’m trying to achieve is to create something like an “indexed hash table”. This seems to be the most promising approach. To start with, let’s set the maximum size to 4 elements. This should give as a list big enough to see more clearly how it works, but also small enough to make it quite simple to simulate. The list will have the following components:

  • The buckets: there should be as many buckets as many elements the list can hold at the same time. In our case it will have 4 bucket. Each will have it’s own identification binary number assigned to it, so we will have buckets 00, 01, 10 and 11. The buckets will really behave like wrappers of the elements stored in the list, and they themselves will be stored in a hash map, using their identifiers as keys. This will assure that we can access any element given it’s bucket number with constant complexity.

  • A size variable: there should be a variable to keep track of the current size of the list. Nothing special about it.

  • The state number: the buckets hold the elements, but it’s important to have a way of keeping track of how the buckets are odered. As our list can hold at most 4 buckets, and each has a 2 bits long key, an 8 bit long number will be used , which will contain the four buckets identifiers concatenated, in the order in which the buckets are ordered. Let’s consider the least significant two digits represent the first bucket, the two most significant digits represent the last bucket on the list.

The size variable, is there mostly to make sure the inputs are valid. Most of the magic will happen with the state number, as by manipulating it the order of the buckets will be read and written.

So for example, let’s say we start with a list like this:

State numberSizeElements ordered from left to right
00 01 10 1131 - 2 - 4

This would mean we would have a list with three elements, bucket 11 holding value “1“, bucket 10 holding value “2“ and bucket 01 holding the value “3“, while the bucket 00 remains empty. So now let’s say we want to add a “3“ between the “2” and the “4”. We would put the value “3” into the empty bucket 00, and place it between the buckets 01 and 10. This would give us:

State numberSizeElements ordered from left to right
01 00 10 1141 - 2 - 3 - 4

Now, every bucket has a value attached to it. If we wanted to remove the first value (index 0), we would empty the bucket 11, and move it to the “end”. So our list should look like this:

State numberSizeElements ordered from left to right
11 10 00 1032 - 3 - 4

Retrieving any element given it’s index

This is fairly simple. It’s sufficient to first check if the given index is allowed. Obviously you can never get the third element from a list with only two elements in it. For that we have the size variable. Once the given input is checked, we need to check which bucket is located at the indicated index. We can do that using the following formula:

$$b = {s \bmod 2^{2*(i+1)} \over 2^{2*i}}$$

Where b is the bucket number, we of course ignore any decimals; s is the state number and i is the index.

This way, no matter how long the the list is, we can retrieve any element by “just“ calculating the proper bucket. So it presents O(1) complexity.

Adding a new element anywhere

First things first, we of course need to validate the input. After that we should get the last bucket, which will be always empty if the list is not full. This can be done using the previous formula and giving the index the value of 3. Once we have the bucket we can set it’s value. The we need then to modify the state number accordingly. To do so, we can use the following formula:

$$s\prime=s mod 2^{2i} + b * 2^{2i} + {{s/2^{2i}} * 2^{2(i+1)}}$$

Where s’ is the new state number, we only care about the binary eight least significant digits; s is the old state number, i is the index and b is the bucket we are using.

And of course we need to update the index.

This time it’s not as simple as before, as we have to perform two calculations, not just one. But still the algorythm is independent from the lenght of the list, giving it the desired complexity.

Removing any element

By now it should be more or less clear how this will look like. First we need the make sure that the index we are given is valid. Then, we will retrieve the bucket for that index in the same fashion that we did before, so we can empty it. And then we can use the following formula to put the bucket at the “end”:

$$s\prime=(s-smod2^{2*(i+1)}/2^2)+(smod2^{2i})+b2^6$$

Where s’ is the new state number, s is the old state number, i is the index and b is the bucket.

And last but not least, we of course need to update the index.

And also in this case, it’s enough to perform two calculations to remove any element with complexity O(1).

Final thoughts

This list idea, surely could be optimized. For example, as the number of states is finite, no calculations are really needed. It would be enough to map the whole thing, basically making a finit state machine. Nevertheless, this solution has a big drawback. And that is the maximum length.

Nowadays, we are using 64 bit processors. The biggest list could have at most 16 elements, each with a 4 bit id, which gives as a state number of 64 bits long. This would allow us to perform the calculations easily. Surely it’s possible to do maths with numbers longer than 64 bits on 64 bit processors, but something tells me that this wouldn’t really work fast enough. But maybe I’m mistaken.

On the other hand, designing finite state machines should always work quite well, but that would be very memory heavy. And of course it would also take a lot of time to design and then implement. But maybe it is worth a try?