## Interval trees in EDS

### July 17, 2008

Finally, I have finished implementation of inteval trees. Less or more, I have followed the book Introduction to Algorithms, 2nd edition. My implementation is based on red-black trees, too. EDS is often asked for events in some time range (interval). In times before interval trees, we had to go through all events, generate recurrences and test, if the event is in this range. With interval tree, we can do it much faster.

The data in the tree (events in our case) are accessed by interval queries. Underlaying data structure is a red black tree with the low endpoint chosen as a key. To do efficient interval search, each node needs to maintain some additional data – maximum (**max**) and minimum (**min**) value of any interval endpoint stored in the subtree rooted at x. It is needed to update **min** and **max** fields when performing rotations, inserting and deleting new data.

Basic operations, like insert, delete and search, are simple, as we already know about red-black trees. They can be done in logarithmic time. The interesting operation is “search all”, which should return all events in given interval. To search for events, we start in the root of the tree and check if we should go left or right – the **min** and **max** fields of the left and right son help us to make correct decision. Sometimes it is needed to go in both directions. Therefore, time can be O( min(n, k lg n) ), where k is the number of events in output list and n is the total number of events.

If someone is interested in implementation, here are some differences between my implementation and book:

- The code in the book is keeping only
**min**field, not**max**(**max**field help us to make more effective decisions) - We need to store different events with the same interval and search by UID of the event
- We need to work with open-ended intervals (special value -1)

It seems like instead of using -1 as a special value, you could use the minimum and maximum range values of the integer type you use for time stamps. That would remove the need to handle some cases specially and would make the API and code easier to understand.

Why do you keep max? Is it to detect early whether a range does not have any events? In other words, to avoid navigating to the leaves of the tree when searching for time ranges with no events. If so, is the tree deep enough to make it worth keeping additional data on each node?

yes, I could use maximum range instead of special value -1. I wanted to avoid using maximum time value – would you prefer it ? I realized, that code would be easier to understand probably, but I had an opinion, that maximum time value can be different on different architectures, and perhaps this could be a problem ? do you think that I should use it ? i am not sure… I could change implementation, if you will convince me…

and you are right, again – you have described the goal of max variable very well. The height of the tree is O( log(number of events) ). it is difficult to predict how much this variable helps, but it cannot be more that by O ( log(number of events) ). in this case, i decided to prioritize speed in known speed-space dilemma. I think that I was aware of all pros and cons. What would be your decision in my place ?

thank you for great comments…