It should be clear that the implementations in language libraries are based on but different from the abstract data types in computer science. This sometimes can be the source of confusion. For example, dynamic array is implemented in C++ STL as vector
, in Java as ArrayList
, and in C# as ArrayList
/List
. Specifically, the matching between ADT and C# classes:
ADT | C# Type |
---|---|
Array | System.Array |
Linked List | System.Collections.Generic.LinkedList<T> |
Dynamic Array | System.Collections.Generic.List<T> System.Collections.ArrayList |
Type implementations tend to be more powerful than the original ADTs as well. For example, counting the linked list ADT takes linear time but in C# LinkedList
the time cost is constant because the count is already cached.
This posts focuses on the three ADTs and also briefly covers when to use which specific type in C#.
Imagine we have a task to store five numbers. Which ADT and its corresponding class should we choose?
First, let’s consider array.
Array is a continuous block of memory to store multiple elements. It is arguably the most straightforward model. We put the numbers next to each other and that’s pretty much it. If we know the elements’ type (int
in our example), we know how many bytes each element takes. Therefore, the cost to fetch an element at a certain position n is constant: start from the memory block’s first byte, add n * size offset and there’s the target element.
If we know how much memory we need, i.e. how many elements to store, and we only need read operation, array works perfect.
What if we don’t know the number of elements? What if we want to add new elements and remove elements often?
Removing an element takes linear time as we need to shift all elements after the now vacant position to keep the memory block continuous. Otherwise we cannot retrieve element efficiently. Of course, in the best case we are removing the 5th (last) element and no shifting is required.
Adding a 6th element sounds not very different from removing. But remember that memory block is already allocated, so we cannot simply just take more memory for the new element. For all we know the next byte could already be in use. The only way to do this safely is to search for another continuous memory block available that can hold 6 elements and move the whole array there with the new elements added. This means adding element costs linear time, even when just adding to the end of the array.
It should be obvious that when the number of elements change often, using array is not optimal.
Dynamic array solves some of the problem by keeping redundant memory for possible adding operation. A common approach is to make the capacity twice as the current array size when a resizing happens. For example, when the 6th element is added to the initial array storing 5 numbers, we find an available memory block with the capacity to hold not just 6 elements but say 12. By doing this, if we keep adding elements, the resizing is not required until 12 elements are stored. This does not fully solve the moving problem, but just reducing its frequency by large is already a huge gain. Also note that dynamic array is not performing better when removal occurs, as the shifting is still necessary to keep the continuous block.
What if we do not bother keeping this continuous block of memory, but store the elements in a scattered manner? This is linked list.
Linked list takes an opposite approach to array. It does not require a continuous memory block, hence we can avoid the problem of elements shifting during removal and moving elements to larger memory block during adding. Here’s how linked list stores the 5 numbers. For each element, it does not only store its value, but also a pointer to its next element. If it’s a double-linked list, each element also stores a pointer to its previous element. This means each element is stored in different position in memory but we can always find the next (and previous, in double-linked list) element via the pointer.
Now adding and removing element becomes trivial. To remove an element, we do not need to shift all the elements after it because we don’t maintain a memory block anymore. Instead we update the pointers to let the previous element point to the next element, skipping the now removed element. (Again, for double-linked list the pointer to previous element is also updated) To add an element, we can simply store the new element in any memory available and update the pointer(s) to point to this new element. No moving is involved.
But the big problem of linked list comes up when you want to act a simple job like retrieve the third number. With array and dynamic array, you can get this job done in constant time with the index. But with linked list, the only way to find the third number is to start from the first number, then visit the second number, then the third, using the pointers. In other words, accessing element at a certain position takes linear time in linked list. Also, you might want to be cautious when your linked list contains large number of elements as each element storing its pointers will cost more memory.
So, here’s the take away: Array is good for scenarios where size is fixed and it has great reading performance. Linked list is good for scenarios where elements are removed or added frequently but its reading performance is not ideal. Dynamic array sits in between. It provides reading performance almost as good as an array but also does not require constant moving when elements are added.
On a side note, there is another variant of dynamic array. In this implementation, when an array’s capacity appears insufficient, instead of moving the array to a larger space, only new elements are stored to the new space with a pointer attached to the previous array to locate the new space. This may remind you of linked list.
Lastly, let’s talk about which you may consider to use in day to day C# coding.
In C#, everyone loves and are familiar with List
. It provides satisfying performance in most cases and its API is easy to use. If you don’t know much about the data being stored, List
is the safe option. Its performance should be good enough in most cases as it uses dynamic array under the hood.
Array
is frowned upon by some for various reasons, unless the developer is certain that the size will be fixed and the performance is crucial.
LinkedList
is often forgotten when some kind of list structure is required by C# developer. In fact, LinkedList
does not implement the IList
interface. Why? Because IList
represents a collection of objects that can be individually accessed by index. That simply is not how linked list ADT works. But LinkedList
could be helpful, as pointed out above, when insert (in the middle or prepend, as list
actually does append really well) and removal happens often and when elements are accessed mostly via enumerating instead of index.