First results of optimization

July 17, 2008

In this post I will describe impact of new more efficient data structure on calendar part of EDS. The new data structure is our famous interval tree. Imagine, that we are maintainers of calendar data and enemy is asking us expensive questions about our data, mostly “give me events in some time range”. In the past, we took naive approach to answer questions. Now, we are doing more wisely.

On the attached picture is shown, how many events need to be scanned to evaluate query. Red color are queries BEFORE optimization – either 1271 or 346 items were scanned (I have two calendars with 1271 and 346 items and before optimization, all events are scanned to answer question). The green line shows, how many items need to be scanned after optimization.

Number of events scanned before and after optimization

Number of events scanned before and after optimization

One can notice, that there is green (optimized) query on the beginning that was doing scan through all events. This is query, in which no subexpression of from *occur-in-range* was
found – it was not question of form “give me all events in some time range”. Therefore, interval tree did not help us.

More results:

  • More numbers – how many took to evaluate query ?

    Comparison of performance - time to evaluate query

    Comparison of performance - time to evaluate query

  • How many items was tested (sum of 423 queries) ? For every tested event we had to generate recurrences.
Number of scanned events

Number of scanned events

Advertisements

8 Responses to “First results of optimization”

  1. Adam Petaccia said

    That’s quite impressive! Keep up the awesome work.

  2. Javier Kohen said

    Nice. Maybe I won’t be switching to GMail after all 😉

    Could you give some more detail about the first query?

  3. looks good! wanna have a beer at prague in august? 🙂

  4. Raphael Bosshard said

    That’s awesome. Simply awesome. Can’t wait for it. 😀

  5. slusnys said

    Thank you for your interest!

    I will write details about the queries in next post, hopefully today – I was not sure, how to assign time-data to query, so I would appreciate all comments…

  6. slusnys said

    These comments are very motivating, thank you 🙂

    Andre, if you will be here at Prague in August, we should meet and drink great czech beer, of course 🙂 But beware, czech beer is fantastic – you just can’t stop drinking it :-))

    let’s stay in touch – i’ll write you an email, see you…

  7. chenthill said

    great work!! To introduce myself, I am the current calendar maintainer. Am just curious to know where the implemention for the interval tree resides. The File backend maintains all the events as a flat ical file, where as there is also a ECalBackendCache (xml based) which other backends use. The xml based one is also a sequential cache storing all the events as strings which is bad. It would be good have it optimizied and used commonly in all backends to provide better abstraction.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: