Hello Javier,
Thank you for the notice.
We have struggled with this feature in production since it was released. Disabling a webhook after only three failures can cause unintended outages: for example, a brief Kubernetes node restart can permanently cut off events. Worse, a human has to re-enable the webhook every time. Could you consider a more tolerant retry policy (e.g. exponential backoff with a longer failure window) before disabling the webhook?
In the meantime, are there any recommended best practices or alternate approaches to ensure webhook reliability in a production environment?
Best regards,
Thomas
Hi @thomas_achache,
I have talked with the team, and we agree that the following are worth evaluating:
- An exponential backoff with a longer failure window
- Notifications when your webhook is disabled so that you can immediately take action
- A means of automatically re-enabling your webhook
I suggest voting on this idea to help our Product team prioritize. Unfortunately, we don’t have alternate approaches at the moment beyond the monitoring you are doing on your end. If you suffer downtime or suspect your webhook might be invalidated, you can contact our Support team with the name of your company and app UID so they can check the status of your app’s webhooks for you. We will work to improve this in the future.
FYI, that idea is no longer active because it got merged, so I can’t vote on it.
I can’t express how flabbergasted I am at this policy. Not only is the limit ridiculously low, but the failure is completely silent. THEN, if you have a partnered/published app, you can’t even fix this yourself. You have to message the support team and wait.
This is going to cause us and our customers (who are also Front customers) massive headaches. I’m trying to do the uptime math, and I’m realizing that it’s basically impossible for most companies NOT to have an annual outage with webhooks.
If my math is right, a single user of the app doing just 3.5 actions per hour is enough to pretty much ensure we’ll have at least one annual outage with our webhooks.
- Four nines of uptime allows for 52.6 minutes of downtime annually
- If that downtime is one continuous block, requests arriving every ~17.1 minutes mean the 3rd consecutive failure hits just before the system recovers (at ~51.4 minutes)
Obviously, that example is a bit contrived, but I’m trying to show how little usage it takes to run into major uptime concerns. At something like 1 request per minute, you already need six nines of uptime to stay safe, and at 1 request per second even seven nines (about 3 seconds of downtime per year) is right at the edge.
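Here is the back-of-the-envelope check behind those numbers, assuming the whole annual downtime budget lands as one continuous block and requests arrive at a perfectly steady rate (both are simplifications, obviously):

```python
# Rough check: does one continuous downtime block of `downtime_min` minutes
# contain enough evenly spaced requests to rack up `threshold` consecutive
# webhook failures? Requests land every 60/rate minutes, so the Nth
# consecutive failure is guaranteed once N * interval fits inside the block.
def outage_guaranteed(requests_per_hour, downtime_min, threshold=3):
    interval_min = 60.0 / requests_per_hour
    return threshold * interval_min <= downtime_min

FOUR_NINES_MIN = 52.6  # 99.99% uptime -> ~52.6 minutes of downtime per year

# 3.5 requests/hour -> one request every ~17.1 minutes -> 3rd failure at ~51.4 min
print(outage_guaranteed(3.5, FOUR_NINES_MIN))  # True
```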
------
I get there are dev constraints here, but can we get this bumped up at all? Even at 10 consecutive failures, the uptime situation changes pretty drastically.
- Four nines puts us at ~11 requests per hour
- Five nines gets us to ~2 requests per minute
- Six nines gets us to ~20 requests per minute
- Seven nines gets us to ~3 requests per second
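For reference, here is how I got those numbers (same assumptions as before: one continuous downtime block, evenly spaced requests):

```python
# Largest steady request rate that avoids `threshold` consecutive failures
# inside one continuous downtime block of `downtime_min` minutes: we need
# threshold * (60 / rate) > downtime_min, i.e. rate < threshold * 60 / downtime_min.
def max_safe_rate_per_hour(downtime_min, threshold=10):
    return threshold * 60.0 / downtime_min

for nines, downtime_min in [(4, 52.6), (5, 5.26), (6, 0.526), (7, 0.0526)]:
    print(f"{nines} nines: ~{max_safe_rate_per_hour(downtime_min):.0f} requests/hour")
# 4 nines: ~11 requests/hour
# 5 nines: ~114 requests/hour  (~2/minute)
# 6 nines: ~1141 requests/hour (~19/minute)
# 7 nines: ~11407 requests/hour (~3/second)
```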
That’s not even considering the fact that a single successful request resets the window. It’s drastically easier to avoid 10 consecutive failures than just 3 failures.
Surely, if the goal is to kill off long running stale dev instances firing off bad requests, it’s okay to wait just a tiny bit longer to block them.
Hi @wesley_harding_conveyor,
Please vote on https://front.ideas.aha.io/ideas/PRD-I-7941, which is the idea that the other votes got merged into. The team is evaluating how this can be improved.
One point of clarification: webhooks that belong to published apps do not get disabled automatically, so if your app is intended for our App Store, it won’t have this problem. We recognize that many webhook apps in private customer instances serve a production purpose rather than a testing one, so the improvements should still be evaluated for those cases.
Thanks for the clarification!
FYI, I don’t have access to that link, either.
@wesley_harding_conveyor thanks for letting me know. Our Product team is investigating the configuration of the idea, but in the meantime I cast a proxy vote for you in the backend.
Hi @Javier,
We’ve experienced another production incident due to the webhook disabling policy. While adding notifications is a step in the right direction, it doesn’t fully address the issue.
The current failure window is too narrow, and it will eventually have to be extended, for example by disabling webhooks only after several hours without a 200 response, possibly combined with exponential backoff.
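For illustration only (the numbers below are made up, not a proposal for specific values), a doubling retry delay capped at 30 minutes already spans roughly three and a half hours of continuous failure across a dozen retries:

```python
# Illustrative only: a doubling retry delay, capped at 30 minutes, keeps a
# failing webhook in the retry loop for several hours before it would be disabled.
delay_s = 30      # delay before the first retry
elapsed_s = 0     # time since the initial failure
for attempt in range(1, 13):
    elapsed_s += delay_s
    print(f"retry {attempt:2d} at {elapsed_s / 3600:4.1f} h after the initial failure")
    delay_s = min(delay_s * 2, 30 * 60)  # double each time, cap at 30 minutes
# retry 12 lands ~3.5 hours after the initial failure
```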
I understand that implementing this change will take time, but in the meantime, it’s difficult to justify repeated interruptions to production workloads.
I’d strongly recommend completely rolling back the current disabling mechanism until a more balanced policy is in place. Please let us know what can be done.
Thanks,
Thomas
Hi @thomas_achache
Thanks for the thoughtful feedback. I’m really sorry to hear this has caused another production incident. We're working on improvements that include extending the retry window before disabling webhooks and adding proactive notifications. While we can’t fully roll back the current policy, I think we can find a solution that will meet your needs.