The story

Evolution data server is process living (unnoticed if you are lucky 🙂 ) in your Desktop.  Besides other things, it takes care of calendar data. Clients (programs, that need calendar data – Evolution, clock applet…) send questions (called queries) to e-d-s. Typical query looks like “give me all events in some time range”.

At some point during Evolution’s development, developers informally determined that calendar queries to e-d-s were slow.  Nobody took the trouble of actually figuring out why they were slow, so someone added a simple cache instead. Now it’s time to do more optimization.

The task

The task is to find out, if the cache is useful ( taking into account, that it consumes some memory). That is, we have to answer these questions:

  1. Which queries get performed, and how often ?
  2. What is the lifetime of the cached queries?
  3. How long do queries take to be performed?
  4. Is the cache useful ?

Our approach

Together with my great mentor Federico, I have added some logging subsystem to to e-d-s. It logs each calendar query. I have implemented Python script, that can analyze this log and sort queries to various criteria and produce some gnuplot graphs. Following graphs were generated by the script in one of my typical workdays.

1. Which queries get performed and how often ?

First graph shows how  the cache size is changing (green line) during the day. The cache size is the number of different queries stored in cache. Query gets into cache, when some client wants e-d-s to execute (answer) it, and the query is not found in the cache yet. The peaks correspond to massive usage of Evolution (I was browsing through the calendar). After 13:00 I opened the Evolution and closed  it  before 14:00. The next two peaks correspond again to opening/closing Evolution. The red bar shows, how many queries e-d-s answered without executing it – results were already cached. Obviously, this number can be greater than cache size, because some queries are re-executed several times.

Ananalysis of cache

Ok, so we would like to know how many times were individual queries used. Are the queries used uniformly or small part of queries are used again and again ?

There were almost 370 different queries, from which 180 was not reused  while sitting in cache. The mostly used query was reused 9 times.

2. What is the lifetime of the cached queries ?

The ambition of the following graph is to gain insight into queries lifetime. As can be seen from the picture, there were about 320 different queries. Six queries lived more than 10000 secs, majority of them lived about 4000 secs and more than 100 queries lived just for several seconds. As can be noticed from the picture (if it is not too small), about 30 queries (at the rear of the graph) have negative lifetime values – these queries were not destroyed at all. The queries coming from alarm daemon fall into this category.

Liffetime of queries

3. How long do queries take to be performed ?

This is great opportunity to present another graph.

As can be seen from the graph, it depends on backend that we are using. In case of file backend (first two columns) the execution of query is really fast. In the last two columns there are values for google calendar backend. It takes 0.01 secs to execute the query in average and 0.1 sec in worst case.

4. Is the cache useful ?

So what do you think ?  In my opinion the cache is useful, mainly when we expect to work with remote calendars. Perhaps we should ask, if it is worth the memory, that it consumes ( I’m not trying to say that it consumes too much memory, I do not know, but some developers have this opinion ).

Advertisements

Hello friends,

so I’ve been accepted onto the Google Summer of Code 2008 program to work on optimizing calendar part of EDS. The main goal of this blog is to discuss my work (and life) with you. So, feel free to leave any comments.

Interesting thoughts about EDS, that directly apply to my work can be found at blogs of Ross, Philip and Federico.

You can see my commits at my github repository.

My first task is to analyze performance of calendar queries cache. These days I am preparing patches to log every calendar query. If you use calendar with Evolution, I would appreciate, if you could provide me e-d-s logs to analyze. Please, contact me, if you are interested.

As I live in Prague, I am looking forward to meeting at least some of you at Ubuntu Developer Summit next week.