Now that I discussed a bit about bits and boolean operators, I will now start a series of talks surrounding data structures. In this first "episode", we will focus on different types of lists. Throughout the next few posts, if you find a class name, unless otherwise specified, I will be referring to the Java class with that given name (for example, today you might hear quite often about classes such as ArrayList and LinkedList). Even if you don't use or know Java, read these posts anyway... the stuff contained in these should not be too Java-centered (in the sense that most of the stuff we will discuss should apply to other languages/platforms that contain similar classes). As with all my written e-mails, forum and blog posts, I will try to keep this as short as possible (FYI, more often than not, I fail miserably at this "keep it short" policy... just see the previous post as an example). This article is part of a series of at least 3 posts that you should expect in the following few days. If everything works out as planed, I will talk about lists (current), sets and maps (the last one is still in "draft" mode and topic might change drastically - but maps will definitely be covered at some point).
First of all, some of you might be wondering why I mentioned before the ArrayList class, but not the basic arrays from Java. Indeed, Java uses for it's ArrayList and Vector classes arrays as the means of storing the data you add in the list (in case you didn't already know, the Java Vector class is just a synchronized version of ArrayList - in other words, only Vector is thread safe between the two). They are of course, just wrappers that manage two of the array's properties: the array length and the used portion of the array. If for example you needed a list of numbers, you could just use an array, remember how big the array is (the capacity - in some languages this is also needed... in Java for example, the array length is contained in the array object itself, but in C it is not) and also how many of the array elements have you already filled (the list length). Besides managing these parameters, the ArrayList class also handles array re-allocation whenever necessary (if no more elements fit in the array currently used by the list, a bigger array is allocated and the elements are copied from the old array to the new one). This re-allocation step is actually the downside of using arrays as support for a list: if you need to re-allocate too often, you will get a slower run time (we will get to the pros and cons of ArrayList vs LinkedList after a brief review of the linked list approach).
With LinkedLists however, you only need to allocate space for a new object every time you need to add to the list. This is why adding to a linked list is always done in constant time, no matter how many elements are in the list (as opposed to array based lists, where adding an element to a full array can lead to linear time - depending on implementation of course). As an implementation detail, for each element in the list, the linked list also contains a "pointer" to the next element of the list (in Java, LinkedList also contains a pointer to the previous element, thus making it a doubly linked list).
Now, before we discuss when to use an array based list vs a linked list, we will list the cost of a few basic operations performed on lists (by 'n' we will denote the number of elements currently in the list and by 'k' a random position within the list):
The first thing you might notice is the "amortized" cost of the add operation. For those of you that are unfamiliar with the term, it's just a fancy way of saying "the average of the first n operations". This average, of course, depends on the so called 'grow policy' which is implemented in your array list implementation (by how much to expand the array if no more elements fit in it). The two most common policies are:
1) growing by a fixed amount (for example, grow by 10 more elements)
2) growing proportional to the existing number of elements (for example, always double the array size)
Out of the two, the second one is the most common, since it provides the costs mentioned above. The first policy gives O(n) amortized cost as well as worst case. However, always doubling the array size can be a waste of memory, therefore some implementations offer you the possibility to chose between the two (the 'Vector' class from Java, as well as 'vector' from the C++ STL to name a few).
| Operation | Position | ArrayList | LinkedList |
|---|---|---|---|
| ADD | start | O(n) | O(1) |
| k | O(n) | O(k) | |
| end | O(1) amortized O(n) worst case | O(1) | |
| GET | start | O(1) | O(1) |
| k | O(1) | O(k) | |
| end | O(1) | O(1) | |
| REMOVE | start | O(n) | O(1) |
| k | O(n-k) | O(k) | |
| end | O(1) | O(1) |
The first thing you might notice is the "amortized" cost of the add operation. For those of you that are unfamiliar with the term, it's just a fancy way of saying "the average of the first n operations". This average, of course, depends on the so called 'grow policy' which is implemented in your array list implementation (by how much to expand the array if no more elements fit in it). The two most common policies are:
1) growing by a fixed amount (for example, grow by 10 more elements)
2) growing proportional to the existing number of elements (for example, always double the array size)
Out of the two, the second one is the most common, since it provides the costs mentioned above. The first policy gives O(n) amortized cost as well as worst case. However, always doubling the array size can be a waste of memory, therefore some implementations offer you the possibility to chose between the two (the 'Vector' class from Java, as well as 'vector' from the C++ STL to name a few).
Based on the table above, you can deduce a few situations where an array based list should be used:
Also, a few rules you should consider when choosing a linked list instead of an array list (this is actually why I wrote this entire article... I know a lot of people that use ArrayList in Java regardless of the situation they use it in and forget that LinkedList performs better in these situations):
I feel that I should give more details about when to use array vs linked lists, but most of the situations that pop in my head right now are just derived from the rules above. Just to summarize:
ArrayList: random access, OK for reasonable (or known) amount of addLast operations (also performs well when add operations are countered by removeLast operations, hence making it a decent implementation for Stack), NOT recommended when you have a lot of add operations
LinkedList: addFirst and removeFirst support (making it the ideal choice for almost any Queue), constant speed for addLast operations (no matter how many such operations occur), NOT recommended for some algorithms that require random access to the list elements.
- when you need fast access to random elements from the list (some sort algorithms will work equally well on linked lists as on array lists if you can use iterators, but some, like HeapSort, may be a lot slower on linked lists - although there are ways to avoid this issue in this particular case)
- when you don't need too many ADD operations (or you know a reasonable upper bound of ADD operations); e.g, you need to construct a sublist of a given list containing only the elements with a certain property (just set the initial array list capacity to the original list size)
- when you have a reasonable balance between ADD and REMOVE operations (this is actually why the Stack class from Java actually extends Vector - therefore, making Stack an array based list)
Also, a few rules you should consider when choosing a linked list instead of an array list (this is actually why I wrote this entire article... I know a lot of people that use ArrayList in Java regardless of the situation they use it in and forget that LinkedList performs better in these situations):
- when you need a lot of add operations (or an unknown number of add operations, then you just need to iterate over the elements of the list)
- when you need add or remove operations at the beginning of the list (in fact, in Java, LinkedList implements the Queue and Deque interfaces - thus highlighting the fact that the addFirst and removeFirst operations can be used efficiently on this class)
I feel that I should give more details about when to use array vs linked lists, but most of the situations that pop in my head right now are just derived from the rules above. Just to summarize:
ArrayList: random access, OK for reasonable (or known) amount of addLast operations (also performs well when add operations are countered by removeLast operations, hence making it a decent implementation for Stack), NOT recommended when you have a lot of add operations
LinkedList: addFirst and removeFirst support (making it the ideal choice for almost any Queue), constant speed for addLast operations (no matter how many such operations occur), NOT recommended for some algorithms that require random access to the list elements.
I also wanted to talk a little about stacks and queues, since they are just special cases of lists, but most of it was already covered. A special mention goes to the fact that linked lists can be easily used for both stack and queue operations. Just a reminder: stacks are lists that use add and remove operations at the same end of the list (doesn't matter if this is the beginning or the end of the list... the important part is the "last in-first out" behavior) while queues have a "first in-first out" behavior which is accomplished by adding at one end, while removing from the other end.
As for when to use stacks and queues (and when to use regular lists), it's hard to formulate some rules. However, here's a couple of tips:
As for when to use stacks and queues (and when to use regular lists), it's hard to formulate some rules. However, here's a couple of tips:
- Stacks:
- recursion: if you can describe the algorithm easily using recursion, CONSIDER a stack, since the recursion mechanism in all programming languages uses stacks internally (it's not necessary to use stacks in this situation, but it is a high probability)
- parentheses: if you have any mathematical expression or "open elements" and "close elements" whose meaning depend on the relative positions (not only () {} [], but also XML-style elements); in this case, stacks are almost MANDATORY (except in the case when you have a single, simple open and close pair of elements (e.g, only one type of parentheses), when actually the height of the stack is sufficient
- Queues:
- "fewest number of steps" - this may sound a little too general, but I think this covers all the situations I ever used a queue; if you need the number of state transitions or you are just interested in the "first object" that has a specific property (starting from a set of objects and using the fewest number of transformations), then you will most likely need a queue
Well, I guess that about sums it up. If you want to remember anything from this post, remember this: for any two given solutions for a problem (in this case, lists), one of them will *always* have situations when it's better than the other. This, however, doesn't mean that if solution A has advantages over solution B, then solution B also has advantages over solution A. Sometimes, solution B is just worse than A on all counts... and should only be used for small cases (if it's easier to implement, of course) - e.g, heap sort vs selection sort (heap sort is faster, but harder to implement, so on small enough cases, selection sort can be considered).
Coming back to lists, try to remember that linked lists *also* exist, even if array based lists are ok for most situations (and especially remember when linked lists are better than array lists and use them accordingly).
If you have anything to add, feel free to leave a comment.
Coming back to lists, try to remember that linked lists *also* exist, even if array based lists are ok for most situations (and especially remember when linked lists are better than array lists and use them accordingly).
If you have anything to add, feel free to leave a comment.