Proactive monitoring with Monit + Sendgrid + Slack

Monitoring resources is essential for visibility into the health of your system. There are many software solutions out there, and one of my favourites is Nagios, but it requires an investment of time and knowledge.
When it comes to finding an agile and flexible solution, I would choose Monit. Its configuration syntax is simple, and it does not need a complex setup to get running.

What is Monit?

Monit is a small Open Source utility for managing and monitoring Unix systems.

What can Monit do?

When it detects a problem it can send you alerts, as most solutions do, but that is not the most important thing. Monit can also act when an error occurs: it can restart services, execute custom scripts, and so on. This makes Monit a proactive monitoring tool, which is why it has always been among my favourites.

With Monit you get out of the box:

  • Automatic email alerts at event triggers
  • Automatic process maintenance
  • Capability to act on out-of-bounds values for CPU, RAM, storage and more
  • Monitoring of running services, and the ability to start, kill or restart them
  • Web and CLI interfaces for status monitoring

This post does not aim to cover everything that can be done with Monit. The official documentation is thorough and numerous sites expand on it, so I will focus on how we use it here at Geoblink.

Send email alerts

Sending an email is Monit's default alert action, but what triggers it depends on how you configure it. For example, to send an alert when system memory usage is above 80%, we would write something like this:

if memory usage > 80% then alert

The alert is sent to the email address configured in the main section of the Monit configuration, which could look something like this:

set mail-format {from: monit-alerts@domain.com}
set alert sysadmin@domain.com

You can also send an alert to another email address for a particular event:

check host webserver with address app.domain.com
  if failed port 80 protocol http request "/" with timeout 1 seconds
    then alert webadmin@domain.com

Configuration to relay emails through Sendgrid

We delegate all email to Postfix, which is installed on every host and is responsible for sending it through Sendgrid. Postfix is a very popular open source Mail Transfer Agent (MTA) that routes and delivers email on Unix systems. We configured it to authenticate against Sendgrid and relay the mail, so we avoid maintaining a dedicated mail server.
Instead of Sendgrid you could use any similar service, such as Maildrop or Mailchimp, but in our case Azure offers Sendgrid, and I am very happy with it.
The relevant Postfix configuration looks like this:

relayhost = [smtp.sendgrid.net]:587
smtp_tls_security_level = encrypt
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:azure_user@azure.com:pass
smtp_sasl_security_options = noanonymous
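
To check a relay like this end to end, one option is to hand a test message to the local Postfix and see whether it arrives. The following is only a sketch: it assumes Postfix accepts mail on localhost port 25, the function names are made up for illustration, and the addresses are the example ones used earlier in this post.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Build a small plain-text message used only to exercise the relay."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Postfix relay test"
    message.set_content("If this arrives, the local Postfix is relaying through Sendgrid.")
    return message


def send_via_local_postfix(message, host="localhost", port=25):
    """Hand the message to the local MTA (assumed to listen on localhost:25)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.send_message(message)
```

Running `send_via_local_postfix(build_test_message("monit-alerts@domain.com", "sysadmin@domain.com"))` on a host and then watching the Postfix mail log should show whether the message was relayed.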

[Image: email alert sent by Monit]

An example of the alerts that reach us by email.

Send alerts to Slack

Instead of email, it is sometimes useful to receive alerts in the team's Slack channel.
For development and testing machines we just send emails, but for every production machine we also send Slack messages.

To send messages to Slack, start by setting up an incoming webhook integration in your Slack team. The steps are as follows:

  • Go to https://<your-team>.slack.com/apps/manage/custom-integrations
  • Click Incoming WebHooks -> Configuration
  • Select an existing channel or create a new one (e.g. #monit-alerts) – you can change it later
  • Click Add Incoming WebHooks integration
  • Copy the Webhook URL

Now we need a script that uses curl to POST a message to a channel on Slack.
At Geoblink we have developed 2slack, a small tool that we use for this purpose.
After installing 2slack, the Monit configuration is as follows:

if memory usage > 80% then alert
if memory usage > 80% then exec "/opt/bin/2slack"
  else if succeeded then exec "/opt/bin/2slack"

When that event fires, we see something like this in our channel:

[Image: Monit alert in the Slack channel]
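
2slack itself is not reproduced in this post. As a rough stand-in, here is a sketch of how a script called from `exec` could post to the webhook. It uses Python's urllib rather than curl or 2slack, `WEBHOOK_URL` is a placeholder for the URL copied in the steps above, and the message layout is an assumption of mine. The `MONIT_*` names are the environment variables Monit sets for the programs it executes.

```python
import json
import os
import urllib.request

# Placeholder (assumption): paste the Webhook URL copied from Slack here.
WEBHOOK_URL = ""


def build_payload(env):
    """Build the Slack message body from the MONIT_* variables Monit sets for exec."""
    host = env.get("MONIT_HOST", "unknown host")
    service = env.get("MONIT_SERVICE", "unknown service")
    event = env.get("MONIT_EVENT", "unknown event")
    description = env.get("MONIT_DESCRIPTION", "")
    text = f"[{host}] {service}: {event}"
    if description:
        text += f" ({description})"
    return {"text": text}


def post_to_slack(url, payload, timeout=5):
    """POST the payload as JSON to the incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Only post when a webhook URL has actually been filled in.
if __name__ == "__main__" and WEBHOOK_URL:
    post_to_slack(WEBHOOK_URL, build_payload(os.environ))
```

The URL is kept as a constant in the script because I would not rely on a custom environment variable being visible to a program started by Monit.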

Being proactive

Our monitoring system alerts the right people and the right teams, and it is able to fix what is broken. It not only sends alerts but also acts on them, and this is what makes it proactive.
In some cases a service restart would be enough to fix the problem:

check process nginx with pidfile /var/run/nginx.pid
   start program = "/etc/init.d/nginx start"
   stop program = "/etc/init.d/nginx stop"
   if failed port 80 protocol http then restart

In other cases, such as a lack of disk space, other actions are needed, for example adding more space or erasing some data. The latter can be done through a cleanup script:

check filesystem rootfs with path /
  if space usage > 90% then alert
  if space usage > 90% then exec "/opt/bin/cleanup"
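
The contents of `/opt/bin/cleanup` are not shown in this post. One possible shape for such a script, purely as an illustration, is to delete files older than a given age from a directory that is safe to prune. The directory path, the seven-day threshold and the function name below are all assumptions.

```python
import os
import time


def remove_old_files(directory, max_age_days, now=None):
    """Delete regular files in `directory` older than `max_age_days`; return removed paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    # Take a snapshot of the entries first so we do not delete while iterating.
    with os.scandir(directory) as entries:
        candidates = list(entries)
    removed = []
    for entry in candidates:
        # Only touch regular files; leave subdirectories and symlinks alone.
        if not entry.is_file(follow_symlinks=False):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.path)
    return sorted(removed)


if __name__ == "__main__":
    # Hypothetical target: a scratch directory that is safe to prune.
    target = "/var/tmp/app-cache"
    if os.path.isdir(target):
        remove_old_files(target, max_age_days=7)
```

Limiting the script to one known directory and to regular files keeps an automatic action like this from deleting more than intended.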

To conclude, Monit offers everything we need at Geoblink to monitor our infrastructure. With little investment in configuration it can automatically fix most problems and perform routine maintenance, and combined with Slack it delivers alerts where we want them.

Useful links:

  • Monit Documentation
  • Sendgrid
  • Postfix
  • Slack Webhooks
  • Geoblink2Slack
  • Monit Slack Notification

By José Beneyto