Monday, January 19, 2015

RabbitMQ handshake_timeout

Currently I am maintaining an OpenStack cluster deployed via Mirantis Fuel 5.1 (Icehouse). Things went well for a while, but at some point API requests for tasks such as creating an instance, creating a volume, or mounting a volume started to see long delays. The delays caused failures and regularly left OpenStack objects in an inconsistent state. This is very frustrating and difficult to diagnose, because errors show up all over the place.

The issue for us was the default vm.swappiness setting of 60 on CentOS 6. With the controllers swapping, a lot of messages took longer than the RabbitMQ default of 3 seconds, resulting in timeouts and failed requests.

As root on all OpenStack controllers:
# sysctl vm.swappiness=10
# swapoff /dev/mapper/os-swap
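
Note that both commands above only change the running system. A value set with sysctl on the command line is lost at reboot, and swap comes back at boot if the device is still listed in /etc/fstab. A minimal sketch of one way to persist the swappiness value follows, assuming the setting lives in /etc/sysctl.conf. The helper name set_sysctl_conf is hypothetical, not something Fuel or CentOS ships.

```shell
# Hypothetical helper (not part of Fuel or CentOS): set or update a
# "key = value" line in a sysctl-style config file.
# Usage: set_sysctl_conf KEY VALUE [FILE]   (FILE defaults to /etc/sysctl.conf)
set_sysctl_conf() {
    key="$1"
    value="$2"
    conf="${3:-/etc/sysctl.conf}"

    # Escape dots so "vm.swappiness" is matched literally in the patterns below.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')

    touch "$conf"
    if grep -q "^[[:space:]]*${key_re}[[:space:]]*=" "$conf"; then
        # Key already present: rewrite the existing line in place.
        tmp=$(mktemp)
        sed "s|^[[:space:]]*${key_re}[[:space:]]*=.*|${key} = ${value}|" "$conf" > "$tmp" \
            && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # Key absent: append a new line.
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Example, as root on each controller (left commented out on purpose):
#   set_sysctl_conf vm.swappiness 10
#   sysctl -p                        # reload /etc/sysctl.conf
#   cat /proc/sys/vm/swappiness      # confirm the running value
#   swapon -s                        # list active swap devices
```

The helper takes the file path as an optional third argument so it can be tried on a scratch file before touching /etc/sysctl.conf. To keep swap off after a reboot, the swap line in /etc/fstab would also need to be commented out.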

Additionally, it looks like Mirantis Fuel used LVM. LVM is a volume manager rather than a file system, but the extra layer is likely slower than ext4 directly on non-LVM partitioned disks.

Also make sure you have enough RAM to disable swap. More importantly, make sure you have enough RAM for your OpenStack controller.


Update: This has been added to Launchpad as a bug in 5.1, 6.0 and 6.1:
