Recently, together with my colleagues Parinya Chalermsook, Mayank Goswami, Kurt Mehlhorn, and Thatchaphol Saranurak, we published three papers about binary search trees. All three are available on ArXiv as preprints. In this post I briefly and informally describe some results and ideas from these papers.
Binary search trees (BST) are perhaps the simplest data structures that are not entirely trivial, and they form the basis of several more advanced data structures and algorithms. BSTs are part of every computer science curriculum, and there is a rich theory describing their properties. Surprisingly, many aspects of BSTs are still not understood, so they are an active field of research.
Searching a single element in a BST is straightforward, and the cost of the search depends on how far the searched element is from the root of the tree. If we search more than one element, we might want to change the tree after each search, to make future searches more efficient. What is the best way to transform the tree after each search? The splay tree of Sleator and Tarjan from 1983 gives one possible such strategy. Splay trees have many nice properties, for example, they are as good as the best static tree for a given sequence (“static” here means not changed between accesses, and I am sweeping some details under the rug here). Sleator and Tarjan conjectured that even more strongly, splay trees are as good as any “dynamic” tree, i.e. any sequence of changes in the tree between accesses. This last statement is the famous dynamic optimality conjecture. If true, this would be quite surprising: splay trees just “react” to each access, without knowing what accesses will come next. The conjecture says that knowing the future doesn’t help much. Even the best strategy for changing the tree (tailored to a given sequence) cannot beat splay trees (again, some details omitted here). The conjecture is wide open. For more information about such questions I recommend the survey of Iacono. Now our recent results.
Splay trees are considered quite mysterious: they execute some local transformations that just “happen to work”. We know that splay trees have many nice properties (including the static optimality mentioned before), but the proofs are a bit unintuitive. What is special about splay trees that gives them these properties? Are there other algorithms that have similar properties?
So we look at a general class of algorithms that preserve some properties of splay trees and relax others. Such an algorithm accesses an element in a BST, then transforms the search path into some tree, such that the accessed element becomes the root. We identify some simple conditions of this transformation that are sufficient to guarantee that the algorithm shares some of the nice properties of splay trees. We look at splay trees in a different way, that makes it obvious that they fulfill these conditions. We also show that some other known algorithms and heuristics fulfill these conditions, which explains why they are efficient. We also identify some new heuristics that fulfill the conditions but have not been studied before. Finally we ask, are our conditions necessary for a BST algorithm to be efficient? In a limited sense, we show that they are, but the results in this direction are not conclusive. See the paper for details.
I mentioned that splay tree is conjectured to be as good as any other BST algorithm, even those that can see into the future, for any possible sequence. However, for most “random” sequences, even the theoretically best BST algorithms are quite inefficient, making the question vacuous in that case. The interesting cases are those sequences that have “some useful structure” that BST algorithms can exploit. So, if we want to show that splay trees (or some other algorithm) are efficient, we need to show that they do well on such “structured” input.
The literature on dynamic optimality describes a number of such “useful structures”. One example is “dynamic finger”: loosely speaking, if successive searches are for values that are close to each other, then search should be efficient. A special case of this is “sequential access”: if we just search the values in increasing order from 1 to n, then search should be really efficient.
In this paper we describe a new kind of structure, that has been studied in depth in mathematics, but not in the context of BSTs. We show that search sequences should be executable efficiently by a BST algorithm if they are free of some fixed pattern. See this page for a description of patterns. This generalizes the “sequential access” mentioned before in a different direction, and it also includes other known structures as special case.
We explore this topic in detail in the paper, giving some general results, and some stronger results for special cases.
Again, studying input with some structure, we look here at a sequence of insert and delete operations into a BST, with the restriction that we can only delete or insert “at the two ends”, that is at values that are the current maximum or minimum of the values in the tree. Such sequences are called deque sequences. It was conjectured a long time ago by Tarjan, that splay trees can execute deque sequences very efficiently (in linear time overall). This is not yet known, but the known results are so close to this, that the difference is only theoretically interesting (which makes it even more interesting :). In this paper (and the previous one) we look at a different algorithm, introduced independently by Lucas and Munro a few years after splay trees, and later extended by Demaine, Harmon, Iacono, Kane, and Pătraşcu. We show that this algorithm (called Greedy BST) is almost linear on deque sequences. This “almost” again hides a small factor that is only of theoretical interest, although a little bit larger than the corresponding factor known for splay trees. The details are in the paper.