Heartbeat Monitor Configuration

This page details every option of the monitoring settings for Heartbeat Monitors. We do not recommend reading it start to finish - instead, use it as a reference when needed.

For details about alert settings - which are common to all monitor types - check this page.

For information about Heartbeat Monitors in general, check this page.

Basics

These are the only options that must be specified. Together, they configure the monitor's essential settings.

Monitor name

Default value: (blank)

Use this textbox to give your monitor a name. It is required, and any value up to 50 characters long is valid.

You should choose a name that makes sense to you, as it is shown throughout the console and on any alerts that are sent. The value has no other impact.

Note that this option is only present when creating a new monitor. To edit the name of an existing one, use the Rename panel on the General tab of the settings page.

Task duration

Default value: 5 minutes

Here, you should specify how long your task takes to complete.

This should be at least as long as the acceptable worst-case performance, and we recommend adding a grace period.

You should provide the duration using the textbox and dropdown. Any combination is allowed, so long as the equivalent number of seconds is:

a whole number (e.g. 0.5 minutes is allowed, 0.99 minutes is not)
no more than 86400 (24 hours, in seconds)
no less than the minimum allowed by your plan
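
The three conditions above can be checked before you submit a value. The following is a minimal sketch, not the service's own validation. The unit names in UNIT_SECONDS and the plan_minimum parameter are assumptions, since the dropdown contents and your plan's minimum are not listed on this page.

```python
from fractions import Fraction

# Assumed dropdown units; the real dropdown may offer a different set.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}
MAX_TASK_DURATION = 86400  # 24 hours, in seconds


def is_valid_duration(value: str, unit: str, plan_minimum: int,
                      maximum: int = MAX_TASK_DURATION) -> bool:
    """Return True if the textbox value and unit describe an allowed duration.

    plan_minimum is a hypothetical parameter standing in for the minimum
    number of seconds allowed by your plan.
    """
    # Fraction parses decimal strings exactly, which avoids float rounding.
    seconds = Fraction(value) * UNIT_SECONDS[unit]
    is_whole = seconds.denominator == 1
    return is_whole and plan_minimum <= seconds <= maximum


print(is_valid_duration("0.5", "minutes", plan_minimum=10))   # True
print(is_valid_duration("0.99", "minutes", plan_minimum=10))  # False
```

Here 0.5 minutes is exactly 30 seconds, while 0.99 minutes is 59.4 seconds and is therefore rejected.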

We only start processing the triggers once this amount of time has passed since the task was scheduled, and the task has still not checked in.

For example, if a monitor has the default task duration of 5 minutes and uses a crontab schedule of 0 3 * * * (meaning that it fires every day at 3 am), then we would consider it late if it hasn't completed by 3:05 am.

If the duration is too short, then there is a risk of false positives. Your task may simply be running slightly behind, due to any number of non-deterministic factors, and we could consider it as having failed where it would have succeeded if given a little longer.

If the duration is too long, then it would increase the delay before alerts are sent. We will always wait until at least the length of the duration before considering a task to have failed. Depending on the value of the triggers, there may even be additional time allowed before further action is taken.

Task schedule

Default value: Interval: 1 hour

This option determines when or how often the monitor checks on your task, and should exactly match the schedule with which the task executes.

To monitor tasks running on a regular schedule, enter a valid interval using the textbox and dropdown. Any combination is allowed, so long as the equivalent number of seconds is:

a whole number (e.g. 0.5 minutes is allowed, 0.99 minutes is not)
no more than 2592000 (30 days, in seconds)
no less than the minimum allowed by your plan

The monitor will activate as soon as the first heartbeat webhook is sent by your task, and will then run at every specified time interval thereafter.
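
As a rough illustration of the paragraph above, the times at which an interval monitor runs can be derived from the first heartbeat. This sketch assumes that the first run happens one full interval after that heartbeat. The function name interval_runs is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from itertools import islice
from typing import Iterator


def interval_runs(first_heartbeat: datetime,
                  interval: timedelta) -> Iterator[datetime]:
    """Yield each run time, starting one interval after the first heartbeat."""
    run = first_heartbeat
    while True:
        run += interval
        yield run


# Example: first heartbeat at 09:30 UTC with the default 1 hour interval.
first = datetime(2022, 8, 27, 9, 30, tzinfo=timezone.utc)
for run in islice(interval_runs(first, timedelta(hours=1)), 3):
    print(run.isoformat())
```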

Business and Enterprise accounts can also use crontab scheduling. For this, enter any valid crontab time expression into the textbox. The monitor will run according to this timetable. Crontab expressions are always evaluated against a clock in the UTC timezone, so you may need to adjust the hour portion accordingly.
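
Because crontab expressions are evaluated in UTC, a task that runs at a fixed local time needs its hour (and sometimes minute) shifted. The sketch below shows one way to do this for a simple daily schedule. The helper name is hypothetical, and it assumes a fixed UTC offset, so the expression would need updating whenever daylight saving time changes your offset.

```python
from datetime import datetime, timedelta, timezone


def daily_crontab_in_utc(local_hour: int, local_minute: int,
                         utc_offset: timedelta) -> str:
    """Convert a daily local run time into a UTC crontab expression.

    Only daily schedules are handled: day-of-month and day-of-week fields
    can shift across midnight and are left as wildcards here.
    """
    # The date is arbitrary; only the time of day matters for a daily job.
    local = datetime(2000, 1, 1, local_hour, local_minute,
                     tzinfo=timezone(utc_offset))
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


# A task that runs at 3 am in a UTC-5 timezone fires at 8 am UTC.
print(daily_crontab_in_utc(3, 0, timedelta(hours=-5)))  # 0 8 * * *
```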

You must provide exactly one of interval and crontab.

Using an interval of less than 60 seconds will cause the monitor to consume additional monitor credits. See this page for more details.

Triggers

This configuration panel is used to control how the monitor status should be updated after each check.

After triggering, as per the schedule, the monitor first waits for the length of time defined in the task duration. Next, it waits for the length of time of the shortest trigger.

If the monitored task has checked in, the monitor's internal status is set to Online. Otherwise, the status is updated to the trigger value. Depending on the configuration of the incidents panel, this may cause an incident to be opened and for alerts to be sent.

If the task still has not checked in by the time the next trigger is reached, the status is updated to that trigger's value. Note that this will only open an incident if one was not created previously (and if the incident configuration calls for one now), so no duplicate alerts are ever sent.
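
The status update described above can be sketched as a small function. This is an illustration of the documented behaviour, not the service's implementation. The function and parameter names are hypothetical, and lateness means the time elapsed since the end of the task duration.

```python
from datetime import timedelta


def next_status(current_status: str, checked_in: bool, lateness: timedelta,
                degraded_after: timedelta, down_after: timedelta) -> str:
    """Return the monitor's internal status after a check."""
    if checked_in:
        return "Online"
    # Down is tested first, so it wins whenever both thresholds have passed.
    if lateness >= down_after:
        return "Down"
    if lateness >= degraded_after:
        return "Degraded"
    # No trigger has fired yet, so the status is left unchanged.
    return current_status


# With the default triggers, a task that is 2 minutes late is Degraded.
print(next_status("Online", False, timedelta(minutes=2),
                  degraded_after=timedelta(minutes=1),
                  down_after=timedelta(minutes=5)))  # Degraded
```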

For example, if default values are used and the task schedule is a crontab of 0 3 * * * (meaning that it fires every day at 3 am), then the timeline for a failed task is:

3:00 am: The monitor starts waiting
3:05 am: The task duration has elapsed
3:06 am: The Degraded trigger fires, with the monitor status updated accordingly
3:10 am: The Down trigger fires, opening an incident, sending alerts, and updating the monitor status again

Down always takes priority over Degraded. If the Degraded trigger is set to a value greater than or equal to that of the Down trigger, only the Down trigger will ever fire.
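
The example timeline and the priority rule above can be reproduced with a short calculation. This is a sketch under the defaults stated on this page (task duration of 5 minutes, Degraded after 1 minute, Down after 5 minutes). The function name failure_timeline is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def failure_timeline(scheduled_at: datetime,
                     task_duration: timedelta = timedelta(minutes=5),
                     degraded_after: timedelta = timedelta(minutes=1),
                     down_after: timedelta = timedelta(minutes=5)):
    """List (time, event) pairs for a task that never checks in."""
    duration_end = scheduled_at + task_duration
    events = [(scheduled_at, "monitor starts waiting"),
              (duration_end, "task duration has elapsed")]
    # Degraded only fires when its value is strictly less than Down's.
    if degraded_after < down_after:
        events.append((duration_end + degraded_after, "Degraded trigger fires"))
    events.append((duration_end + down_after, "Down trigger fires"))
    return events


scheduled = datetime(2022, 8, 27, 3, 0, tzinfo=timezone.utc)
for when, what in failure_timeline(scheduled):
    print(when.strftime("%H:%M"), what)
# 03:00 monitor starts waiting
# 03:05 task duration has elapsed
# 03:06 Degraded trigger fires
# 03:10 Down trigger fires
```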

Down

Default: 5 minutes

The value of this trigger determines under what condition the monitor's internal status should be set to Down, when a task has failed to check in.

Specifically, it occurs if we have not received the appropriate heartbeat webhook by this amount of time after the end of the task duration.

A monitor accumulates downtime while its internal status is Down or Partial.

By default, the task must be at least 5 minutes late for it to be considered Down. This is usually a reasonable compromise between allowing occasional spikes to be tolerated and sending prompt alerts.

However, you can use the dropdown to adjust this threshold to suit your specific circumstances.

If the value is too short, then there is a risk of false positives. Your task may simply be running slightly behind, due to any number of non-deterministic factors, and we could consider it as having failed where it would have succeeded if given a little longer.

If the value is too long, then it would increase the delay before alerts are sent (if the incidents configuration allows this).

This trigger always takes priority over Degraded. If the Degraded trigger is greater than or equal to this trigger, only this trigger will ever fire.

Degraded

Default: 1 minute

The value of this trigger determines under what condition the monitor's internal status should be set to Degraded, when a task has failed to check in.

Specifically, it occurs if we have not received the appropriate heartbeat webhook by this amount of time after the end of the task duration.

By default, the task must be at least 1 minute late for it to be considered Degraded. As this status is not usually configured to trigger an incident and send alerts, it is reasonable for it to activate after only a short delay.

However, you can use the dropdown to adjust this threshold to suit your specific circumstances.

The Down trigger always takes priority over this. If this trigger is greater than or equal to the Down trigger, only the Down trigger will ever fire.

Incidents

This configuration panel is used to determine various incident behaviours in response to a change in the monitor's internal status.

As alerts are sent when an incident is opened, it also affects when you will be notified about any potential issues.

Automatically open an incident

Default: True, for Down

This setting determines if an incident is created automatically when the monitor reaches a certain status.

Alerts (if any, as configured separately) are sent when an incident is opened.

By default, it is set to open if the monitor has an internal status of Down. This means that alerts will be sent if your task has failed, but not if it is simply experiencing performance issues.

While for most users this is likely the right balance between false positives and false negatives, you are free to adjust it to better suit your needs, by selecting the relevant checkboxes. You may also wish to review the relevant trigger thresholds.

If you choose the Disable incident management for this monitor option, an incident will never be automatically opened. As this will effectively prevent any alerts being sent, it is not recommended in most cases. However, it could be useful for monitors that just gather metrics, or when other incident management services are already in place.
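
The decision described in this section can be summarised in a few lines. The sketch below is illustrative only. The function and parameter names are hypothetical, and open_for stands in for the statuses selected with the checkboxes.

```python
def should_open_incident(status: str,
                         open_for: frozenset = frozenset({"Down"}),
                         incident_management_enabled: bool = True,
                         incident_already_open: bool = False) -> bool:
    """Decide whether a status change opens a new incident."""
    # Disabling incident management means no incident is ever opened.
    if not incident_management_enabled:
        return False
    # An existing incident is never duplicated, so no repeat alerts are sent.
    if incident_already_open:
        return False
    return status in open_for


print(should_open_incident("Down"))      # True
print(should_open_incident("Degraded"))  # False
```

With the default setting, only a Down status opens an incident. Passing frozenset({"Down", "Degraded"}) would model selecting both checkboxes.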

Show on public status pages

Default: True

This option determines if automatically-opened incidents are added to public status pages by default.

It is always possible to show incidents (of any type) on public status pages by choosing this option from the incident details page, but by default we can also do this automatically. This allows your users to see the latest details immediately, and may reassure them that the issue is being dealt with.

However, if you wish to have stronger control over your status pages, you may deselect this checkbox. Any future automatically-opened incidents will not appear on status pages until manually added. Existing incidents will still show, until manually removed.

As with all status page messages, an incident will only ever appear on status pages that feature the associated monitor as part of their content.

This setting does not affect internal status pages, which always show all incidents relevant to their contents, to keep staff up-to-date with the latest developments.

This option only applies to incidents opened automatically and thus has no effect if Automatically open an incident is disabled.

Incidents opened manually from the dashboard allow the choice of where they should be shown, during the creation process. Incidents reported by team members are never shown on public status pages until confirmed or acknowledged manually.

Automatically resolve incidents

Default: True

This setting controls how automatically-opened incidents are resolved.

With this option enabled, as is the default, they are automatically resolved as soon as the monitor reaches an internal status of Online.

Only automatically-opened incidents are ever automatically resolved. This setting has no effect on those opened manually from the console, or reported by team members.

However, if you would rather ensure that all incidents raised by this monitor are only ever resolvable manually, you can uncheck this option.
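
The resolution rules above can be sketched as follows. This is a simplified model for illustration. The Incident class and the apply_status function are hypothetical and do not correspond to any real API.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    opened_automatically: bool
    resolved: bool = False


def apply_status(incident: Incident, status: str,
                 auto_resolve: bool = True) -> Incident:
    """Resolve the incident if the documented conditions are all met."""
    # Only automatically-opened incidents are resolved automatically, and
    # only once the monitor's internal status returns to Online.
    if auto_resolve and incident.opened_automatically and status == "Online":
        incident.resolved = True
    return incident


print(apply_status(Incident(opened_automatically=True), "Online").resolved)   # True
print(apply_status(Incident(opened_automatically=False), "Online").resolved)  # False
```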

Last updated on Saturday 27th August 2022