Intro to Bitmap Index

Intro to Bitmap Index: what is it and when to use

There are bunch of different indices in the database world, you are probably most familiar with B-tree index. Different kind of index will have great impact on performance, bitmap index can be very helpful in some use case.

 


Case

Suppose Columbia university has a following table for the database course:

screen-shot-2016-10-12-at-11-18-40-am

And suppose the course is so popular that millions of people take it, so the table is going to be huge.

Now we want to run the following SQL statement:

SELECT * FROM table WHERE Gender='male' AND Pass='false';

 


No index on ‘Pass’ and ‘Gender’ column

If we don’t use any index, the process will be scanning the whole table record by record million times and return the result set. But I’m sure we can do better than this.

 

If we don’t use any index, the process will be scanning the whole table record by record million times and return the result set. But I’m sure we can do better than this.

 

B-tree index on ‘Pass’ and ‘Gender’ column

So everyone knows index is going to help searching in database by sorting the entries. Is it going to faster if we put index on those columns we care about?

Not necessarily.

Index is not magic. If 50% of the records in table is ‘Gender=male’ and 50% of the records is ‘Pass=true’. Then the sql statement is going to return a very large dataset, maybe 30% of the whole data. In this case, the percentage of result set is considerable large compares to the original table, than the database optimizer will choose not to use the index even you have the index built.

Most probably, b-tree index is going to end up the same path as no index situation, scan the whole table record by record.

 

B-tree index on ‘Pass’ and ‘Gender’ column

So we know B-tree index is not going to help much on low cardinality situations. Here comes the use case of bitmap index.

We create one set of vectors for each of the column we care, in this case, ‘Gender’ and ‘Pass’. So now we have the 2 following set of vectors:

screen-shot-2016-10-12-at-11-48-36-am

When we try to execute SELECT * FROM table WHERE Gender='male' AND Pass='false';, it will do a vector AND operation:

screen-shot-2016-10-12-at-12-05-27-pm

Voila.

 


When to use bitmap index

  1. When the column you want to search on has low cardinality (gender, marriage status, etc).
  2. When the data in the column is kind of static, at least not updated frequently. This is because when we try to update a column with low cardinality, it might take a long time. During the lengthy time, the table is locked and the other user can’t do anything.

 

* One thing to notice is MySQL is not supporting bitmap index for now (the only database that is supporting bitmap index I know about is Oracle database) , it’s in there work log for future feature though: https://dev.mysql.com/worklog/task/?id=1524

Leave a Reply

Your email address will not be published. Required fields are marked *