EDS and it’s cache – analyze this

May 22, 2008

The story

Evolution data server is process living (unnoticed if you are lucky 🙂 ) in your Desktop.  Besides other things, it takes care of calendar data. Clients (programs, that need calendar data – Evolution, clock applet…) send questions (called queries) to e-d-s. Typical query looks like “give me all events in some time range”.

At some point during Evolution’s development, developers informally determined that calendar queries to e-d-s were slow.  Nobody took the trouble of actually figuring out why they were slow, so someone added a simple cache instead. Now it’s time to do more optimization.

The task

The task is to find out, if the cache is useful ( taking into account, that it consumes some memory). That is, we have to answer these questions:

  1. Which queries get performed, and how often ?
  2. What is the lifetime of the cached queries?
  3. How long do queries take to be performed?
  4. Is the cache useful ?

Our approach

Together with my great mentor Federico, I have added some logging subsystem to to e-d-s. It logs each calendar query. I have implemented Python script, that can analyze this log and sort queries to various criteria and produce some gnuplot graphs. Following graphs were generated by the script in one of my typical workdays.

1. Which queries get performed and how often ?

First graph shows how  the cache size is changing (green line) during the day. The cache size is the number of different queries stored in cache. Query gets into cache, when some client wants e-d-s to execute (answer) it, and the query is not found in the cache yet. The peaks correspond to massive usage of Evolution (I was browsing through the calendar). After 13:00 I opened the Evolution and closed  it  before 14:00. The next two peaks correspond again to opening/closing Evolution. The red bar shows, how many queries e-d-s answered without executing it – results were already cached. Obviously, this number can be greater than cache size, because some queries are re-executed several times.

Ananalysis of cache

Ok, so we would like to know how many times were individual queries used. Are the queries used uniformly or small part of queries are used again and again ?

There were almost 370 different queries, from which 180 was not reused  while sitting in cache. The mostly used query was reused 9 times.

2. What is the lifetime of the cached queries ?

The ambition of the following graph is to gain insight into queries lifetime. As can be seen from the picture, there were about 320 different queries. Six queries lived more than 10000 secs, majority of them lived about 4000 secs and more than 100 queries lived just for several seconds. As can be noticed from the picture (if it is not too small), about 30 queries (at the rear of the graph) have negative lifetime values – these queries were not destroyed at all. The queries coming from alarm daemon fall into this category.

Liffetime of queries

3. How long do queries take to be performed ?

This is great opportunity to present another graph.

As can be seen from the graph, it depends on backend that we are using. In case of file backend (first two columns) the execution of query is really fast. In the last two columns there are values for google calendar backend. It takes 0.01 secs to execute the query in average and 0.1 sec in worst case.

4. Is the cache useful ?

So what do you think ?  In my opinion the cache is useful, mainly when we expect to work with remote calendars. Perhaps we should ask, if it is worth the memory, that it consumes ( I’m not trying to say that it consumes too much memory, I do not know, but some developers have this opinion ).

Advertisements

3 Responses to “EDS and it’s cache – analyze this”

  1. kraptor said

    Great research work!

  2. eTM said

    If you repeat the tests with the cache entirely removed, maybe you gain some insight wether the cache really is useful or not.

  3. Javier Kohen said

    Hi. I know you just started with this, but I think you can improve these graphs substantially.

    It’s impossible to say how big an impact the cache is making unless you present relative numbers in relation to total number of queries and time spent in them. In other words, we don’t know how many cache-misses are happening in your test cases. Also you might want to make these tests documented and reproducible by others, because at some point you will want to verify that your changes will have a positive impact.

    On the other hand, your first graph has already sparked some ideas in my mind ― I gather that the cache only handles a significant number of queries a handful of times a day. In this case having it swap to disk after a moment of inactivity would have a big impact in its memory footprint. You could actually have two levels of cache: a very small memory-based one for more frequent queries (from the clock applet, perhaps?) and a larger disk-based to answer those peaks shown in the first graph.

    Cheers.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: