What does it bring?
Opsgenie is the glue that holds alerts and incidents together, by allowing all tools to stay in sync during the life cycle of the alert or incident.
Ansible is the IT automation glue that holds configuration management and orchestration together.
We are rapidly increasing the number of services per head in the on-demand era, and both tools show no issues in meeting the challenge of an increased workload. Each has its own area of expertise, yet from a thousand-foot view it is clear that by having them work together we can achieve great things.
When an alert is generated, we can use the source of the alert to automatically trigger Ansible to run any manner of playbooks or roles, either for auto-remediation or to enrich the alert with facts.
When we break an alert down to its core basics, we typically have three items: Instance, Resource and Value.
For example, if we monitor a web server we could expect to get an alert like one of the following:
Service httpd has stopped on webserver01.example.com
This example clearly tells us the web service is in a stopped state. The reason it stopped is unknown at this point, and simply starting the service may not resolve it, so further information is needed. And if starting the service does resolve it, what caused the stop in the first place? We should document the cause to stop it happening again.
webserver01.example.com has less than 20% free disk space at /var/www/mysite
This example again looks clear: we are now below an acceptable value for disk space. But we don’t know why, or what to remove. It could be cached files, or maybe someone uploaded a large file to the location. Investigation is needed before we can decide what to do, be it deleting files or increasing the disk size.
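To make the Instance, Resource and Value breakdown concrete, here is a minimal sketch that pulls the three items out of the two example alerts above. The regular expressions and the `AlertFacts` name are assumptions made for illustration; they match only these two message shapes, and a real monitoring source will need its own patterns.

```python
import re
from typing import NamedTuple, Optional


class AlertFacts(NamedTuple):
    """The three core items of an alert (hypothetical helper type)."""
    instance: str
    resource: str
    value: str


# Hypothetical patterns covering only the two example messages in this post.
PATTERNS = [
    # "Service httpd has stopped on webserver01.example.com"
    re.compile(r"^Service (?P<resource>\S+) has (?P<value>\w+) on (?P<instance>\S+)$"),
    # "webserver01.example.com has less than 20% free disk space at /var/www/mysite"
    re.compile(
        r"^(?P<instance>\S+) has (?P<value>less than \d+% free disk space)"
        r" at (?P<resource>\S+)$"
    ),
]


def parse_alert(message: str) -> Optional[AlertFacts]:
    """Return the instance, resource and value, or None if nothing matches."""
    for pattern in PATTERNS:
        match = pattern.match(message.strip())
        if match:
            return AlertFacts(match["instance"], match["resource"], match["value"])
    return None
```

The instance tells Ansible which host to target, while the resource and value can help decide which playbook is worth running.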
How OEC can assist
Opsgenie Edge Connector (OEC) allows us to execute scripts automatically based on alert actions (Create, Acknowledge, Close, etc.) or to trigger custom actions. This means we can use Ansible to further enrich the alert payload: OEC initiates an Ansible playbook that gathers information from the instance or provider and reports it back to the alert ID in Opsgenie.
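As a rough sketch of the idea, a script run by OEC could map the alert action to a playbook and build the `ansible-playbook` command line. The payload field names (`action`, `alert.alertId`, `host`), the playbook paths and the `alert_id` variable are all assumptions for illustration, so check them against the payload your OEC integration actually delivers. Only the `--limit` and `--extra-vars` flags are standard `ansible-playbook` options.

```python
import json
from typing import Dict, List

# Hypothetical mapping from alert action to playbook path.
PLAYBOOKS: Dict[str, str] = {
    "Create": "playbooks/gather_facts.yml",
    "Acknowledge": "playbooks/enrich_alert.yml",
}


def build_command(payload: dict) -> List[str]:
    """Build the ansible-playbook argument list for an alert payload.

    Assumes the payload carries an "action", an "alert" object with an
    "alertId", and a "host" naming the affected instance (all hypothetical).
    """
    action = payload["action"]
    playbook = PLAYBOOKS.get(action)
    if playbook is None:
        raise ValueError(f"no playbook configured for action {action!r}")

    # Pass the alert ID to the playbook so it can report back to the alert.
    extra_vars = {"alert_id": payload["alert"]["alertId"]}
    return [
        "ansible-playbook",
        playbook,
        "--limit", payload["host"],  # restrict the run to the affected host
        "--extra-vars", json.dumps(extra_vars),
    ]
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Passing a list rather than a shell string avoids quoting problems with host names and JSON.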
We can send data back to the Opsgenie alert in 4 usable ways.
Details: these are key-value pairs and can be overwritten on subsequent runs, which is ideal for listing running services and updating the list to see whether a change happened. We can also use them as variables in additional Ansible playbook runs.
Notes: these stay static on the alert and cannot be deleted. They are great if you want information to be passed to other integration points, for example chat-ops tools like Slack or a ticketing tool like Service Desk.
Attachments: if we generate files such as CSV or HTML, we can upload them to be viewed. By leveraging Ansible-CMDB we can add full, rich data to the ticket for viewing.
Links: these are actually details, but they should be noted as a separate type, as they will redirect you to another endpoint.
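For the key-value pairs and the static text described above, a minimal sketch of the report-back step could look like the following. It assumes the Opsgenie Alert API exposes `POST /v2/alerts/{id}/details` and `POST /v2/alerts/{id}/notes` with `GenieKey` authorization, which matches my understanding of the public API but should be verified against the current documentation. The helper names are hypothetical, and the functions only build the requests so they can be inspected before anything is sent.

```python
import json
import urllib.request

# Assumed base URL of the Opsgenie Alert API (verify for your region/account).
API_BASE = "https://api.opsgenie.com/v2/alerts"


def build_request(alert_id: str, api_key: str, kind: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request against an alert sub-resource."""
    return urllib.request.Request(
        f"{API_BASE}/{alert_id}/{kind}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"GenieKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def details_request(alert_id: str, api_key: str, details: dict) -> urllib.request.Request:
    """Key-value pairs; sending the same key again overwrites its value."""
    return build_request(alert_id, api_key, "details", {"details": details})


def note_request(alert_id: str, api_key: str, note: str) -> urllib.request.Request:
    """Free-text annotation that stays on the alert."""
    return build_request(alert_id, api_key, "notes", {"note": note})
```

Sending is then a matter of `urllib.request.urlopen(request)`. An Ansible playbook could make the same calls with its own HTTP tooling instead; the Python form is used here only to show the shape of the requests.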
It doesn’t all have to be fully automated. Custom actions can be triggered from the portal or mobile application to run when you decide to run them, and can be restricted to certain users.
Opsgenie is a SaaS tool based in the cloud, yet many integration points could be on premises or in a private cloud. OEC does not require you to open up any ports to endpoints. All traffic to OEC is performed over HTTPS using TLS v1.2.
We can also make sure all playbooks executed by OEC are logged for audit and compliance needs.