Lost AMI connection caused failover



Post by CRM User » Thu Apr 05, 2018 1:17 pm

This morning I found that my cluster had failed over to the backup data center during the night. Around that time, the HAAst log shows an error saying the remote peer lost its connection to the AMI. However, I can confirm that Asterisk was still running.

I sent you my Asterisk logs and HAAst logs by email. Can you confirm what caused the failover?
Account for questions transferred from CRM system

Re: Lost AMI connection caused failover

Post by Telium Support » Thu Apr 05, 2018 1:22 pm

Based on the Asterisk full log we received, it appears that your Asterisk process was hung for almost 30 seconds. The evidence: you have a number of plug-ins/dialplan add-ons that write log messages at least once per second, yet at 2:38am all messages stop for almost 30 seconds. Something was blocking IO/CPU to the Asterisk process.
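If you want to spot a silent period like this yourself, here is a rough sketch in Python. It assumes your log lines start with a timestamp in the form `[2018-04-05 02:38:01]`; the actual format depends on the `dateformat` setting in your logger.conf, so adjust the pattern and format string to match. The function name `find_gaps` and the 5 second threshold are illustrative choices only, not part of HAAst or Asterisk.

```python
import re
from datetime import datetime

# Assumed timestamp prefix, e.g. "[2018-04-05 02:38:01] NOTICE[...] ..."
# Change this (and the strptime format below) to match your logger.conf.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def find_gaps(lines, min_gap_seconds=5.0):
    """Return (previous_ts, next_ts, gap_seconds) for every silent period
    of at least min_gap_seconds between consecutive timestamped lines."""
    gaps = []
    prev = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue  # continuation line or a different format; skip it
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if prev is not None:
            delta = (ts - prev).total_seconds()
            if delta >= min_gap_seconds:
                gaps.append((prev, ts, delta))
        prev = ts
    return gaps
```

To run it against a file, pass the open file object, for example `find_gaps(open("/var/log/asterisk/full", errors="replace"))` (that path is a common default and may differ on your system). This only works as a hang indicator if something on your PBX logs frequently, as is the case here.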

Five seconds after the Asterisk process hung, HAAst deemed the peer to be non-responsive and initiated a failover. (This is correct behavior on the part of HAAst; something was going wrong on your PBX.)

You need to track down the root cause of Asterisk hanging for almost 30 seconds. Look for badly written backup scripts, IO- or CPU-intensive jobs scheduled for that time, etc. Grep through all of your system logs around that time for clues as to what else was happening on your system.

(Hint: Looking at your Asterisk log file, you appear to have added a new plug-in in the last 2 days.)
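As an alternative to plain grep for the log search suggested above, the following sketch pulls out every line that falls within a time window around the hang. It assumes traditional syslog timestamps at the start of each line (for example `Apr  5 02:38:10`), which carry no year, so the year is taken from the time you pass in. The function name `lines_near` and the 60 second window are made up for illustration.

```python
from datetime import datetime, timedelta


def lines_near(lines, center, window_seconds=60):
    """Return the lines whose leading syslog-style timestamp lies within
    window_seconds of `center` (a datetime)."""
    lo = center - timedelta(seconds=window_seconds)
    hi = center + timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        stamp = line[:15]  # assumed layout: "Apr  5 02:38:10"
        try:
            # Syslog stamps have no year, so borrow it from `center`.
            ts = datetime.strptime(f"{center.year} {stamp}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a parsable timestamp
        if lo <= ts <= hi:
            hits.append(line)
    return hits
```

You could call it once per log file, for example `lines_near(open("/var/log/syslog", errors="replace"), datetime(2018, 4, 5, 2, 38))`, where the file path is only an example and varies by distribution. Cron entries, backup jobs, or kernel messages that show up in the window are the first things to look at.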