# BucketList<T>

March 21, 2021

TL;DR:`BucketList<T>`

is a data structure that performs exceptionally well for a certain usage scenario:

`n`

(list size) is large.- Fast index-based access:
`list.GetItemAt(i)`

.- Fast index-based removal:
`list.RemoveAt(i)`

.In my use case, this reduced the runtime

from 7 hours to 17 seconds!

## An Interesting Problem

In December of last year I took part in solving the Advent of Code programming puzzles for the first time. Since then I became a bit obsessed with that and started solving also the older events going back until 2015.

Day 19 of 2016 presented an **interesting challenge**
that took me a while to solve (with a reasonable runtime) and led to the solution based
on this `BucketList<T>`

data structure.

The basic idea behind this problem is that we have a:

- Circular data structure.
- Maintain a
**current element**. - Find the element
**opposite**of the current element. - Advance the current element by one.
- Repeat until the list has only one element left.

## A poor solution

My first approach was using a `LinkedList`

which performed OK for small list sizes.
But for large lists (e.g. 3.001.330 was my puzzle input) the **performance was dismal**.

```
List size: 3001330.
END after 26696.1142087 seconds.
```

That's a runtime of more than **7 hours** š¬.

The problem was the **linear search** for locating the **opposite element** in the list that caused
an `O(nĀ²)`

complexity.

Using a simple `List<T>`

(instead of `LinkedList<T>`

) performed equally bad because
in that case the **remove** operation causes a runtime complexity of `O(nĀ²)`

.

To sum it up:

Data Structure | Lookup | Removal |
---|---|---|

`List<T>` |
`O(1)` |
`O(NĀ²)` |

`LinkedList<T>` |
`O(NĀ²)` |
`O(1)` |

## A better (by a lot) solution

So I needed a data structure with **linear or better performance** for:

- Indexing
- Removal

My solution was a **custom data structure** I called `BucketList`

(ha ha) which:

- maintains the elements in a
**list of lists** - that have the same maximum size (hence,
**buckets**).

This way, when performing an index-based lookup I have nearly constantĀ¹ `O(1)`

access to elements
and removal is only limited by the `List<T>`

performance of a single bucket.

Ā¹ Actually `O(k)`

where `k`

amounts to the number of buckets which should be negligibly low compared
to `n`

.

To sum it up:

Data Structure | Lookup | Remove |
---|---|---|

`BucketList<T>` |
`O(k)` |
`O(n)` |

`k`

: number of buckets

`n`

: bucket size (list size `N`

/ `k`

)

This brought the runtime down dramatically!

```
List size: 3001330.
END after 17.0722168 seconds.
```

**17 seconds** ... way better š¤©.

Here's the full source code for BucketList.

Note: A bucket size of

`25.000`

worked best in that specific scenario but your mileage may vary, of course, and depends on the problem input size.

### Caveats

- I only used
`BucketList<T>`

for this specific problem so far. - The runtime performance depends on the selection of a
**bucket size**for a given list size. - The
`InsertAt`

case is not yet considered. - I'm sure that this data structure exists already with a different name. However, I came up with this on my own when solving my problem, so give me that ;-).