I came across this article describing some of the common issues with DHCP: Improving DHCP Performance in OpenStack.
The article explains that the default lease duration is 120 seconds. To ensure DHCP clients (VMs) don't lose their IP addresses, the client daemon attempts a renewal about halfway through the lease, which works out to roughly every 60 seconds.
In my case, I have about 20 virtual machines hitting a single DHCP agent every 60 seconds.
VMs on a private OpenStack network will not be accessible from the public network during this time, causing periods of 'lock up' for users.
The fix is simply to increase the DHCP lease duration to something a bit more sane for your platform.
In my case, I chose 600 seconds, thus giving a nice 5-minute window between lease renewals.
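As a rough back-of-the-envelope check of why the longer lease helps, the sketch below estimates how many renewals a single DHCP agent sees per hour. It assumes every client renews at exactly half the lease time (real clients add some jitter), and `renewals_per_hour` is a made-up helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: estimate renewals per hour hitting one DHCP agent.
# Assumption: each VM renews at exactly half the lease duration.
renewals_per_hour() {
    vms=$1
    lease=$2
    renew_interval=$((lease / 2))
    echo $((vms * 3600 / renew_interval))
}

# Compare the default lease with the longer one for 20 VMs.
for lease in 120 600; do
    echo "lease=${lease}s -> $(renewals_per_hour 20 "$lease") renewals/hour"
done
```

Under that assumption, 20 VMs go from about 1200 renewals per hour to about 240.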
However, if you are running Juno with HA, Pacemaker will maintain a running DHCP agent on any host in your cluster. This can be confusing, since you will see the agent running on a node even though the service is 'stopped'.
The fix is just to update the config on your nodes, and then disable and re-enable the resource via Pacemaker.
# For each node in your cluster, from a controller as root run:
[root@node-54 ~]# ssh node-46 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
[root@node-54 ~]# ssh node-47 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
[root@node-54 ~]# ssh node-50 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
[root@node-54 ~]# ssh node-51 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
[root@node-54 ~]# ssh node-52 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
[root@node-54 ~]# ssh node-53 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
[root@node-54 ~]# ssh node-54 -C "echo dhcp_lease_duration = 600 >> /etc/neutron/dhcp_agent.ini"
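One thing to watch with `echo ... >>` is that it appends a duplicate line every time it is re-run. A minimal sketch of an idempotent alternative follows. `set_option` is a hypothetical helper (not part of neutron or any package), and it assumes a flat `key = value` file where the key appears at most once. The demo runs against a scratch file rather than the real /etc/neutron/dhcp_agent.ini.

```shell
#!/bin/sh
# Hypothetical helper: set "key = value" in an ini-style file without
# creating duplicates. Replaces an existing line, otherwise appends.
set_option() {
    file=$1; key=$2; value=$3
    if grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        # Rewrite through a temp file, then copy back so the original
        # file keeps its owner and permissions.
        sed -E "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "${file}.tmp" \
            && cat "${file}.tmp" > "$file" \
            && rm -f "${file}.tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a scratch file that starts with the default value.
demo=$(mktemp)
printf '[DEFAULT]\ndhcp_lease_duration = 120\n' > "$demo"
set_option "$demo" dhcp_lease_duration 600
set_option "$demo" dhcp_lease_duration 600   # second run changes nothing
cat "$demo"
```

On a real cluster the same logic would have to be run on each node (for example over ssh in a loop); that part is left out because it depends on your environment.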
Next, disable and then re-enable the resource, and check that the agent is alive and reports the new lease duration:
[root@node-54 ~]# pcs resource disable p_neutron-dhcp-agent
[root@node-54 ~]# pcs resource enable p_neutron-dhcp-agent
[root@node-54 ~]# neutron agent-list
+--------------------------------------+--------------------+---------------------+-------+----------------+---------------------------+
| id                                   | agent_type         | host                | alive | admin_state_up | binary                    |
+--------------------------------------+--------------------+---------------------+-------+----------------+---------------------------+
| 11a5d775-7f10-4856-b905-a6176e9cb689 | Open vSwitch agent | node-46.example.com | :-)   | True           | neutron-openvswitch-agent |
| 14b47084-d1f5-428c-8fed-746a5d5e743f | Metadata agent     | node-54.example.com | :-)   | True           | neutron-metadata-agent    |
| 311fe911-503d-498f-a027-72414c6c8664 | DHCP agent         | node-46.example.com | :-)   | True           | neutron-dhcp-agent        |
| 326c7dcc-03b0-4830-8a4e-6b58f3af2445 | Open vSwitch agent | node-51.example.com | :-)   | True           | neutron-openvswitch-agent |
| 3867b579-1bc2-4aee-b9b6-5bc21f61f514 | Open vSwitch agent | node-54.example.com | :-)   | True           | neutron-openvswitch-agent |
| 3de48cde-8dba-4042-9a04-baa9f3d411f9 | Open vSwitch agent | node-50.example.com | :-)   | True           | neutron-openvswitch-agent |
| 5b85f41e-2671-4bf6-8dc5-243a0ecb55b3 | L3 agent           | node-50.example.com | :-)   | True           | neutron-l3-agent          |
| 61b3c026-99e0-4c50-a440-339b7085d428 | L3 agent           | node-46.example.com | :-)   | True           | neutron-l3-agent          |
| 6e9b071d-5417-45a2-abf6-72b2691fd464 | Open vSwitch agent | node-47.example.com | :-)   | True           | neutron-openvswitch-agent |
| 91e124a6-6594-4fb4-b48c-06b697cbf437 | L3 agent           | node-54.example.com | :-)   | True           | neutron-l3-agent          |
| 9881ddce-f177-4120-9019-1fc26eee19ca | Open vSwitch agent | node-52.example.com | :-)   | True           | neutron-openvswitch-agent |
| 9e259924-80ce-44c8-9704-8cb8ed35d751 | Metadata agent     | node-50.example.com | :-)   | True           | neutron-metadata-agent    |
| c5b75fd5-f83b-4afb-a4ff-271c92d61695 | Open vSwitch agent | node-53.example.com | :-)   | True           | neutron-openvswitch-agent |
| cd99597e-ead6-4eb3-a729-4bc609955ee6 | Metadata agent     | node-46.example.com | :-)   | True           | neutron-metadata-agent    |
+--------------------------------------+--------------------+---------------------+-------+----------------+---------------------------+
[root@node-54 ~]# neutron agent-show 311fe911-503d-498f-a027-72414c6c8664
+---------------------+---------------------------------------------------------+
| Field               | Value                                                   |
+---------------------+---------------------------------------------------------+
| admin_state_up      | True                                                    |
| agent_type          | DHCP agent                                              |
| alive               | True                                                    |
| binary              | neutron-dhcp-agent                                      |
| configurations      | {                                                       |
|                     |      "subnets": 5,                                      |
|                     |      "use_namespaces": true,                            |
|                     |      "dhcp_lease_duration": 600,                        |
|                     |      "dhcp_driver": "neutron.agent.linux.dhcp.Dnsmasq", |
|                     |      "networks": 5,                                     |
|                     |      "ports": 39                                        |
|                     | }                                                       |
| created_at          | 2015-06-04 17:35:57                                     |
| description         |                                                         |
| heartbeat_timestamp | 2015-06-04 17:36:46                                     |
| host                | node-46.ccri.com                                        |
| id                  | 311fe911-503d-498f-a027-72414c6c8664                    |
| started_at          | 2015-06-04 17:35:57                                     |
| topic               | dhcp_agent                                              |
+---------------------+---------------------------------------------------------+
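To check the value without reading the whole table, the agent-show output can be filtered. This is a small sketch using only sed. `lease_from_agent_show` is a hypothetical helper name, and the sample input is a line taken from the output above rather than fetched live.

```shell
#!/bin/sh
# Hypothetical helper: print the dhcp_lease_duration value found on stdin.
lease_from_agent_show() {
    sed -n 's/.*"dhcp_lease_duration": *\([0-9][0-9]*\).*/\1/p'
}

# Sample line from the agent-show output. On a live system you would pipe
# the output of `neutron agent-show <agent-id>` into the function instead.
sample='| | "dhcp_lease_duration": 600, |'
printf '%s\n' "$sample" | lease_from_agent_show
```

If the function prints nothing, the agent did not report the option at all, which is worth checking before assuming the new value is in effect.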
The result is that the client will repeat DHCPREQUEST until the service is back up, and then start using the new lease time:
root@hannibal:~# tail -f /var/log/syslog
Jun 4 17:35:02 hannibal dhclient: DHCPACK of 192.168.111.95 from 192.168.111.2
Jun 4 17:35:02 hannibal dhclient: bound to 192.168.111.95 -- renewal in 51 seconds.
Jun 4 17:35:53 hannibal dhclient: DHCPREQUEST of 192.168.111.95 on eth0 to 192.168.111.2 port 67 (xid=0x50f4018e)
Jun 4 17:36:35 hannibal dhclient: message repeated 6 times: [ DHCPREQUEST of 192.168.111.95 on eth0 to 192.168.111.2 port 67 (xid=0x50f4018e)]
Jun 4 17:36:37 hannibal dhclient: DHCPNAK from 192.168.111.2 (xid=0x50f4018e)
Jun 4 17:36:37 hannibal dhclient: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3 (xid=0x75f1e5db)
Jun 4 17:36:37 hannibal dhclient: DHCPREQUEST of 192.168.111.95 on eth0 to 255.255.255.255 port 67 (xid=0x75f1e5db)
Jun 4 17:36:37 hannibal dhclient: DHCPOFFER of 192.168.111.95 from 192.168.111.2
Jun 4 17:36:37 hannibal dhclient: DHCPACK of 192.168.111.95 from 192.168.111.2
Jun 4 17:36:37 hannibal dhclient: bound to 192.168.111.95 -- renewal in 255 seconds.
Jun 4 17:40:52 hannibal dhclient: DHCPREQUEST of 192.168.111.95 on eth0 to 192.168.111.2 port 67 (xid=0x75f1e5db)
Jun 4 17:40:52 hannibal dhclient: DHCPACK of 192.168.111.95 from 192.168.111.2
Jun 4 17:40:52 hannibal dhclient: bound to 192.168.111.95 -- renewal in 211 seconds.
Jun 4 17:44:23 hannibal dhclient: DHCPREQUEST of 192.168.111.95 on eth0 to 192.168.111.2 port 67 (xid=0x75f1e5db)
Jun 4 17:44:23 hannibal dhclient: DHCPACK of 192.168.111.95 from 192.168.111.2
Jun 4 17:44:23 hannibal dhclient: bound to 192.168.111.95 -- renewal in 262 seconds.
Jun 4 17:45:01 hannibal CRON[4249]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 4 17:48:45 hannibal dhclient: DHCPREQUEST of 192.168.111.95 on eth0 to 192.168.111.2 port 67 (xid=0x75f1e5db)
Jun 4 17:48:45 hannibal dhclient: DHCPACK of 192.168.111.95 from 192.168.111.2
Jun 4 17:48:45 hannibal dhclient: bound to 192.168.111.95 -- renewal in 249 seconds.
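The renewal intervals can be pulled out of the log to confirm the change at a glance. The sketch below uses awk on lines in the dhclient format shown above. `renewal_intervals` is a hypothetical helper, and the sample lines are copied from the log rather than read from a live system.

```shell
#!/bin/sh
# Hypothetical helper: print the N from "renewal in N seconds." for each
# dhclient "bound to" line read on stdin.
renewal_intervals() {
    awk '/dhclient: bound to/ && /renewal in/ { print $(NF - 1) }'
}

# Two sample lines: one from before the change, one from after.
sample='Jun 4 17:35:02 hannibal dhclient: bound to 192.168.111.95 -- renewal in 51 seconds.
Jun 4 17:36:37 hannibal dhclient: bound to 192.168.111.95 -- renewal in 255 seconds.'
printf '%s\n' "$sample" | renewal_intervals
```

On the VM itself, the same function could be fed from the log file, for example `renewal_intervals < /var/log/syslog`.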
IMPORTANT!!!
"Attempting to work around these performance problems by significantly increasing IP lease time will cause a huge problem with respect to the release of IP addresses by neutron if your cloud loads dynamically change. By default, neutron will allocate an IP address to a VM for 24 hours, independent of the actual lease time. Also, by default, neutron will not release an IP address until 24 hours after an instance has been terminated."
- https://www.mirantis.com/blog/improving-dhcp-performance-openstack/