Auto Remediation with Zabbix and Ansible Tower Part 2

In this part, we are going to set up Zabbix to detect when a new instance doesn't have the Zabbix agent installed and running, and to tell our Ansible Tower server to run the job template that installs and configures it.

Let's create our first trigger action in Zabbix. Navigate to Configuration > Actions, make sure the Event source is set to Triggers, and click Create action.

I am going to call it Install Linux Zabbix Agent and set the Type of calculation to And, so every condition we set must be met.

The conditions I want met are:

Host Group equals Discovered hosts and Linux_servers

Add a Trigger equals condition and click Select. Change the group to Template/Operating Systems, set the host to Template OS Linux, and finally select the trigger Zabbix agent on Template OS Linux is unreachable for 5 minutes.

Our conditions should now look like this:
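To make the And calculation concrete, here is a toy Python model of how these three conditions combine. It is not Zabbix code: the Event class and the helper functions are hypothetical, and the trigger is matched by its template trigger name for simplicity. What it shows is that with And, a host must be in both Discovered hosts and Linux_servers, so a server that is no longer in Discovered hosts does not match.

```python
# Toy model of a Zabbix action with "Type of calculation: And".
# Not Zabbix source code: Event and the helpers are hypothetical.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    host_groups: set[str]  # groups the problem host belongs to
    trigger: str           # template trigger name, simplified to a string


Condition = Callable[[Event], bool]


def host_group_equals(group: str) -> Condition:
    return lambda ev: group in ev.host_groups


def trigger_equals(name: str) -> Condition:
    return lambda ev: ev.trigger == name


def action_matches(conditions: list[Condition], ev: Event) -> bool:
    # "And": the action runs only when every condition is true.
    return all(cond(ev) for cond in conditions)


AGENT_TRIGGER = "Zabbix agent on Template OS Linux is unreachable for 5 minutes"
INSTALL_LINUX_AGENT = [
    host_group_equals("Discovered hosts"),
    host_group_equals("Linux_servers"),
    trigger_equals(AGENT_TRIGGER),
]

new_host = Event({"Discovered hosts", "Linux_servers"}, AGENT_TRIGGER)
old_host = Event({"Linux_servers"}, AGENT_TRIGGER)
print(action_matches(INSTALL_LINUX_AGENT, new_host))  # True
print(action_matches(INSTALL_LINUX_AGENT, old_host))  # False
```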

Next, we need to set the operation to run when the conditions are met. Click the Operations tab, and under Operations click New.

I set the Default operation step duration from 1 hour to 1 minute (1m or 60s) but set this to whatever time period you want.

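Change the Operation type from Send message to Remote command.

Under Target list, click New and select Current host.

Set Type to Custom script.

Make sure Execute on is set to Zabbix server. The agent on the new host isn't running yet, so the command has to run from the Zabbix server.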

In Commands, add a curl call to the Tower job template launch API, like my example below. Change the Tower URL to your Tower server, and replace the Bearer token with the application token you generated in the previous blog post. The job template ID in the URL must match the Tower job template you created to install the agent; mine is 123, so set yours accordingly. The limit is set to "{HOST.CONN}", a Zabbix macro that passes the server's IP address or DNS name to Tower; this is exactly why we want Zabbix as the dynamic source here. Finally, set credentials to the ID of your machine credential in Tower, found the same way as the job template ID. If you need two credentials, for example a machine credential with ID 11 and a vault credential with ID 7, it would look like "credentials": [11,7]. Tower only accepts limit and credentials at launch when the job template has Prompt on launch enabled for those fields; otherwise it ignores them.

/usr/bin/curl -k -H "Content-Type: application/json" -H "Authorization: Bearer i1YwMiZSmYv1bucwIRYmkHPct8TMmV" -X POST -d '{"limit": "{HOST.CONN}", "credentials": [11]}' https://tower.example.com/api/v2/job_templates/123/launch/

Then add a new condition: Event acknowledged equals Not Ack.

The result should look similar to this

Finally, click Add.

Make sure the action shows as Enabled.

Now any newly detected Linux server that doesn't have a running agent will automatically instruct Tower to install it for us.
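To see exactly what the remote command sends to Tower, here is a Python sketch of the same launch request as the curl command, built with the standard library's urllib.request. The Tower URL and token are placeholders, and 123 and 11 are the example IDs from above; the function only builds the request and does not send it.

```python
# Sketch: the same POST the curl command sends to Tower's job template
# launch endpoint. URL and token below are placeholders.
import json
import urllib.request


def build_launch_request(tower_url: str, token: str, template_id: int,
                         limit: str, credentials: list[int]) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/v2/job_templates/<id>/launch/."""
    url = f"{tower_url.rstrip('/')}/api/v2/job_templates/{template_id}/launch/"
    body = json.dumps({"limit": limit, "credentials": credentials}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Zabbix would substitute {HOST.CONN}; here we pass an example IP directly.
req = build_launch_request("https://tower.example.com", "YOUR_TOKEN", 123, "10.0.0.5", [11])
print(req.get_method(), req.full_url)
# POST https://tower.example.com/api/v2/job_templates/123/launch/
```

Passing the request to urllib.request.urlopen() would send it. Unlike curl -k, urlopen verifies the Tower server's TLS certificate by default, so a self-signed certificate would need to be trusted first.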

For Windows agents, set up the same action, but name it Install Windows Zabbix Agent.

The Host group conditions should equal Discovered hosts and Windows_Servers.

The trigger should come from Template/Operating Systems, with the host set to Template OS Windows and the trigger Zabbix agent on Template OS Windows is unreachable for 5 minutes.

In the operation's custom script, change the credential ID to your Windows credential.
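The Linux and Windows actions differ only in their host group, trigger, and credential. A hypothetical Python summary makes that explicit. The Windows credential ID 12 is a made-up placeholder, and curl_body shows that {HOST.CONN} stays a literal macro in the action configuration, which Zabbix substitutes when the command runs.

```python
# Hypothetical side-by-side view of the two actions. Credential ID 12 for
# Windows is a placeholder; 11 is the Linux example from this post.
import json

AGENT_ACTIONS = {
    "Install Linux Zabbix Agent": {
        "groups": ["Discovered hosts", "Linux_servers"],
        "trigger": "Zabbix agent on Template OS Linux is unreachable for 5 minutes",
        "credentials": [11],
    },
    "Install Windows Zabbix Agent": {
        "groups": ["Discovered hosts", "Windows_Servers"],
        "trigger": "Zabbix agent on Template OS Windows is unreachable for 5 minutes",
        "credentials": [12],  # placeholder: use your Windows credential ID
    },
}


def curl_body(action_name: str) -> str:
    """The -d payload for an action; {HOST.CONN} is left for Zabbix to fill in."""
    creds = AGENT_ACTIONS[action_name]["credentials"]
    return json.dumps({"limit": "{HOST.CONN}", "credentials": creds})


print(curl_body("Install Windows Zabbix Agent"))
# {"limit": "{HOST.CONN}", "credentials": [12]}
```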

Other thoughts for the agent install

In the playbook, when I install the agent on a discovered host, I also remove the host from the Discovered hosts group. Because our trigger conditions specifically look for the "Discovered hosts" group, this leaves me free to use a different job template when an agent stops talking on any server that isn't new. I also rename the host from the discovered IP address to its hostname. There is a potential problem with the rename: if the server has not been set up correctly and its hostname is localhost, the inventory will then contain a host called localhost, and future runs that use delegate_to: localhost will be redirected to that box instead of the Tower instance. So I put a conditional around these tasks, and this is what I have at the end of my agent playbook:

when: "'Discovered hosts' in group_names and 'localhost' not in ansible_hostname"

The when statement means these tasks only run if the host is in the Discovered hosts group and its hostname does not contain localhost.
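Jinja's in and not in behave like Python's here, so the condition can be checked as a plain Python function. This sketch (the function name is hypothetical) shows one side effect worth knowing: because ansible_hostname is a string, 'localhost' not in ansible_hostname is a substring test, so any hostname that merely contains localhost is skipped as well.

```python
# The playbook's when: condition as a Python function (name is hypothetical).
def should_update_host(group_names: list[str], ansible_hostname: str) -> bool:
    # group_names is a list, so this is a membership test...
    in_discovered = "Discovered hosts" in group_names
    # ...but ansible_hostname is a string, so this is a substring test.
    return in_discovered and "localhost" not in ansible_hostname


print(should_update_host(["Discovered hosts", "Linux_servers"], "web01"))  # True
print(should_update_host(["Linux_servers"], "web01"))                      # False
print(should_update_host(["Discovered hosts"], "localhost"))               # False
print(should_update_host(["Discovered hosts"], "mylocalhost01"))           # False
```

If you only want to skip hosts named exactly localhost, ansible_hostname != 'localhost' would be the stricter test.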

I also have an optional task that emails me when a discovered box turns out to have the hostname localhost, so I can fix it.
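The post doesn't show the playbook tasks that move and rename the host, so this is only one possible way to express that change: a Python sketch that builds the JSON-RPC body for the Zabbix API's host.update method. The host and group IDs are hypothetical, and the auth field in the request body matches Zabbix releases of this era; check the API documentation for your version. host.update replaces the host's whole group list, which is why the sketch sends every group except Discovered hosts.

```python
# Sketch: JSON-RPC body for Zabbix host.update that removes the host from
# "Discovered hosts" and renames it. All IDs here are hypothetical.
import json


def build_host_update(auth: str, hostid: str, group_ids: list[str],
                      discovered_group_id: str, new_name: str,
                      request_id: int = 1) -> str:
    # host.update replaces the group list, so keep every other group.
    keep = [{"groupid": gid} for gid in group_ids if gid != discovered_group_id]
    if not keep:
        # Zabbix requires a host to belong to at least one host group.
        raise ValueError("host must stay in at least one group")
    payload = {
        "jsonrpc": "2.0",
        "method": "host.update",
        "params": {"hostid": hostid, "host": new_name, "groups": keep},
        "auth": auth,
        "id": request_id,
    }
    return json.dumps(payload)


print(build_host_update("API_TOKEN", "10105", ["5", "15"], "5", "web01"))
```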

This ends part 2. Zabbix now detects new boxes, assigns them to the relevant Windows or Linux group, checks whether an agent is installed and running, and if not, automatically asks Ansible Tower to take care of it for us.

In the next part, we will explore a few auto-remediation jobs and how to set them up.
