Outlook and Zendesk Scheduler Extension Loading issue
Incident Report for Cronofy
Postmortem

On Wednesday 12th July 2023, between 14:00 and 16:15 UTC, users of the Cronofy Scheduler extensions for Outlook and Zendesk were unable to access the extension. Other instances of the Scheduler, such as the Chrome Extension, integrations such as Greenhouse, and the web version of the Scheduler, continued to operate normally.

The underlying cause was that the Cronofy Outlook add-in and Zendesk App were not manually validated during the release of a change to the Scheduler extension.

In line with our principles, we are publishing this public post-mortem to explain why this happened, and what we will do to prevent this occurring again.

Timeline

Times are from Wednesday 12th July 2023, in UTC and rounded for clarity.

At 14:04 we deployed an update to our extensions. This had gone through our normal request and review process.

At 15:49 one of our customers reported that they were unable to use the Outlook add-in to create a scheduling request. The customer observed a spinning progress wheel, and the Scheduler form did not load.

At 15:54 our support engineers replicated the issue in their own Outlook add-in, and escalated the issue internally to our first responder.

At 16:08 our engineering team located the problem and identified the change that caused it.

At 16:10 we reverted the change and deployed the revert immediately. We then verified internally that this deployment corrected the problem and that the Zendesk and Outlook extensions were working again.

At 16:20 the customer confirmed that the issue was resolved.

Retrospective

We ask three primary questions in our retrospective:

  • Could we have resolved it sooner?
  • Could we have identified it sooner?
  • Could we have prevented it?

The root cause of this issue is twofold. Firstly, this area is difficult to cover with automated tests, as exercising it requires the extension to be loaded inside Outlook or Zendesk. Secondly, and more importantly, given that we know about the lack of automated tests, we failed to manually test this change to the extension's loading process using the Outlook add-in or Zendesk App. The Outlook and Zendesk versions of the extension use a different build process, in which the extension is loaded in a different way. This alternate loading method triggered a bug that did not exist in the other extensions.

Once we were made aware of this issue by our customer, we resolved it in under 30 minutes. We don’t feel we can improve our response time, but we see having to be notified by a customer as a failure.

From an identification perspective, we should have identified this ourselves by manually checking the Outlook or Zendesk extensions once we had deployed the change. We favour preventing the issue over earlier identification. In the future, we could have the extension trigger an event if the Scheduler form fails to load, and inform a separate errors service.
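
As an illustration only, the load-failure event described above might look something like the following TypeScript sketch. This is not our implementation: the names `watchSchedulerLoad`, `LoadFailureEvent`, `Reporter` and `Timer`, the payload fields, and the timeout value are all assumptions made for the example. The idea is that the extension starts a timer when it begins loading the Scheduler form, and reports a failure if the form has not signalled that it rendered before the timer fires.

```typescript
// Hypothetical payload; the field names are illustrative, not a real schema.
interface LoadFailureEvent {
  host: string;       // which extension host, e.g. "outlook" or "zendesk"
  elapsedMs: number;  // how long we waited before giving up
  reason: "timeout";
}

// How a failure is reported (e.g. a call to a separate errors service).
type Reporter = (event: LoadFailureEvent) => void;

// Injectable timer so the watchdog can be exercised without real delays.
type Timer = (callback: () => void, delayMs: number) => unknown;

// Starts a watchdog for one load attempt. Returns a function that the
// extension should call once the Scheduler form has rendered.
function watchSchedulerLoad(
  host: string,
  timeoutMs: number,
  report: Reporter,
  timer: Timer = (cb, ms) => setTimeout(cb, ms),
): () => void {
  let loaded = false;

  timer(() => {
    // Only report if the form never signalled a successful load.
    if (!loaded) {
      report({ host, elapsedMs: timeoutMs, reason: "timeout" });
    }
  }, timeoutMs);

  return () => {
    loaded = true;
  };
}
```

In use, an extension would call `watchSchedulerLoad("outlook", 10000, sendToErrorsService)` as it starts loading, where `sendToErrorsService` stands in for whatever reporting mechanism is chosen, and then call the returned function when the form appears. A spinner that never resolves would then produce an alert without relying on a customer report.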

We feel that, with some small improvements to the guidance we give our engineers, we can prevent an issue like this from happening again.

Actions to be taken

  • We will ensure that engineers are familiar with the differences between the extension build processes, making it clear which areas require manual testing. We will also cover what to be aware of when publishing changes that affect multiple different platforms at the same time.
  • We will create internal guidance listing all the extensions, and how to properly check each extension.
  • We will add a hint to our pull request template for changes to extension files. It will specifically remind the engineer creating the PR, and the engineers reviewing it, to examine the impact on all extensions.

We have considered adding more automated testing to this area of the solution, and we plan on discussing this in more detail within the department. Tests in this area have historically given a poor return on investment.

Further questions?

If you have any further questions, please contact us at support@cronofy.com

Posted Jul 13, 2023 - 16:05 UTC

Resolved
From 14:07 UTC to 16:08 UTC, Scheduler Extensions (such as Outlook and Zendesk) were not able to load the Scheduler, instead showing only the loading spinner.
Scheduler integrations (such as Greenhouse and Workday) were unaffected.

This was due to a deployed update that expected some data which is available on the Scheduler website but not in the extensions. The problem was not caught by our tests or QA before release. We apologise for any inconvenience and will be improving our processes to be more rigorous.
Posted Jul 12, 2023 - 15:00 UTC