
Tuesday, March 18, 2014

Increasing Read Speed with Accumulo

A few weeks ago I wanted to improve the performance of Accumulo 1.5 so that scanning through large tables for information would happen at a faster rate. I think I was getting around 200,000 entries per second before this change. With the following steps, I was able to increase the speed to almost 3,000,000 entries per second.
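To put those figures in perspective, here is a quick Python calculation of the speedup and of how long a full scan would take at each rate. The one-billion-entry table size is a hypothetical example; real scan rates depend on the data, the hardware, and any iterators involved.

```python
def scan_seconds(entries: int, entries_per_second: float) -> float:
    """Time to scan `entries` key/value pairs at a steady rate."""
    return entries / entries_per_second


BEFORE_RATE = 200_000    # entries/s, approximate rate before tuning
AFTER_RATE = 3_000_000   # entries/s, "almost" this rate after tuning

speedup = AFTER_RATE / BEFORE_RATE
table_size = 1_000_000_000  # hypothetical table of one billion entries

print(f"speedup: {speedup:.0f}x")                                       # speedup: 15x
print(f"before: {scan_seconds(table_size, BEFORE_RATE) / 60:.1f} min")  # before: 83.3 min
print(f"after:  {scan_seconds(table_size, AFTER_RATE) / 60:.1f} min")   # after:  5.6 min
```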

Stop Accumulo
I am able to stop the entire cluster by running the script inside the Accumulo home folder's "bin" directory.

Increase JVM Heap Space to Accommodate a Larger Index Cache
The tablet server heap space is defined in the file "" located in the Accumulo home folder's "conf" directory. At the bottom of this file you can see the tablet server -Xmx and -Xms settings, defined in an environment variable, "$ACCUMULO_TSERVER_OPTS". Depending on how much memory is available, you will want to increase these values to support the larger index cache configured in the next step. Here is my setting:
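As a rough illustration of the kind of edit involved, this Python sketch rewrites the -Xmx and -Xms flags inside an "$ACCUMULO_TSERVER_OPTS"-style string. The set_heap helper, the sample starting value, and the 4g size are hypothetical; choose a heap size that still leaves memory for the operating system and anything else running on the node.

```python
import re


def set_heap(opts: str, size: str) -> str:
    """Return `opts` with -Xmx and -Xms both set to `size` (e.g. "4g").

    Existing -Xmx/-Xms flags are replaced in place; missing ones are appended.
    """
    for flag in ("-Xmx", "-Xms"):
        pattern = re.compile(re.escape(flag) + r"\S+")
        # Use a function replacement so `size` is inserted literally.
        opts, count = pattern.subn(lambda _m: flag + size, opts)
        if count == 0:
            opts = f"{opts} {flag}{size}".strip()
    return opts


# Hypothetical starting value, for illustration only.
current = "${POLICY} -Xmx128m -Xms128m"
print(set_heap(current, "4g"))  # ${POLICY} -Xmx4g -Xms4g
```

Setting -Xms equal to -Xmx is a common choice for long-running servers, since the JVM then reserves the full heap up front instead of growing it under load.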

Increase Index Cache
In the Accumulo home folder's "conf" directory, you should also see a file called "accumulo-site.xml". Here you can define properties for the Accumulo cluster. I have set tserver.cache.index.size to 512M:
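"accumulo-site.xml" uses the Hadoop-style configuration layout: a "configuration" root element holding one "property" element per setting, each with a "name" and a "value". The Python sketch below uses the standard library's xml.etree.ElementTree to show that shape by adding or updating the index cache property in memory. The add_property helper is hypothetical; in practice you would simply edit the file by hand.

```python
import xml.etree.ElementTree as ET


def add_property(config: ET.Element, name: str, value: str) -> None:
    """Set `name` to `value` in a <configuration> element, adding it if absent."""
    for prop in config.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:  # tolerate a <property> missing its <value>
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    # No existing entry: append a new <property><name/><value/></property>.
    prop = ET.SubElement(config, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


config = ET.fromstring("<configuration></configuration>")
add_property(config, "tserver.cache.index.size", "512M")
print(ET.tostring(config, encoding="unicode"))
# <configuration><property><name>tserver.cache.index.size</name><value>512M</value></property></configuration>
```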


I have not had any issues with tablet server memory yet, so I believe this is a good fix.  Please provide feedback and comments below.
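One way to keep an eye on tablet server memory is to check that the block caches stay well inside the heap. This Python sketch does that arithmetic. The 50% ceiling is a hypothetical rule of thumb, not an Accumulo requirement, and the helper assumes binary units (M = 1024 x 1024 bytes), as the JVM uses for -Xmx.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}


def to_bytes(size: str) -> int:
    """Parse sizes like "512M" or "4g" (case-insensitive) into bytes."""
    size = size.strip().upper()
    if size and size[-1] in UNITS:
        return int(size[:-1]) * UNITS[size[-1]]
    return int(size)


def caches_fit(heap: str, *caches: str, max_fraction: float = 0.5) -> bool:
    """True if the listed caches together use at most `max_fraction` of the heap.

    `max_fraction` is an assumed safety margin; the rest of the heap is left
    for everything else the tablet server keeps in memory.
    """
    total = sum(to_bytes(c) for c in caches)
    return total <= max_fraction * to_bytes(heap)


print(caches_fit("4g", "512M"))          # True
print(caches_fit("1g", "512M", "128M"))  # False
```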

There are various other performance tweaks as well, such as NOT using LVM with CentOS/RHEL, and making sure any virtual machines in the cluster run in "Independent Disk Mode" so writes are flushed straight to disk.


  1. Nice description. Thanks. Love to see others in VA using Accumulo.

    1. Me too! Accumulo seems to have really gained some traction in the big data game. Last time I checked, Cloudera was planning to add an Accumulo role to their CDH5 stack. I believe it is currently in beta.