Performance degraded & Syst...

Resolved
Mar 30, 2026 at 4:45pm UTC

Post-Mortem: Service Disruption – March 30, 2026

Duration: 8:00 AM – 4:36 PM CEST
Severity: Major
Affected: app.langdock.com
Status: Resolved

What Happened

On March 30, 2026, Langdock experienced a major service disruption caused by a platform-level degradation on Azure Container Apps (ACA). Containers that normally start in under 2 minutes took over 15 minutes during the incident, preventing our autoscaler from bringing new capacity online and leading to a full platform outage from approximately 11:30 AM CEST.
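To illustrate why slower container starts matter so much for autoscaling, here is a minimal back-of-the-envelope sketch. It assumes a constant amount of traffic above current capacity and that new containers only help once they have fully started. The function name `unserved_requests` and the figure of 50 excess requests per second are hypothetical; only the 2-minute and 15-minute startup times come from this report.

```python
def unserved_requests(excess_rps: float, startup_seconds: float) -> float:
    """Requests arriving above current capacity while new containers are still starting.

    Simplified model: the excess load is constant and new capacity
    contributes nothing until startup has finished.
    """
    if excess_rps < 0 or startup_seconds < 0:
        raise ValueError("excess_rps and startup_seconds must be non-negative")
    return excess_rps * startup_seconds


# Hypothetical excess load of 50 requests per second.
normal = unserved_requests(excess_rps=50, startup_seconds=2 * 60)     # 2-minute start
degraded = unserved_requests(excess_rps=50, startup_seconds=15 * 60)  # 15-minute start

print(f"normal start:   {normal:.0f} requests above capacity")
print(f"degraded start: {degraded:.0f} requests above capacity")
```

Under this simplified model the backlog grows linearly with startup time, so a 15-minute start leaves 7.5 times as many requests unserved as a 2-minute start for the same load.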

Timeline (CEST)

8:00 AM — Performance degradation detected
9:15 AM — Issue identified; hotfix deployed
9:43 AM — Performance stabilised, partial recovery
9:55 AM — Downstream issues discovered; investigation continues
10:45 AM — Core platform restored; workflows & file uploads still affected
11:30 AM — Full platform outage begins due to cascading downstream effects of increased ACA container startup times
12:11 PM — Platform taken fully offline for controlled reboot
3:34 PM — Platform restored and accepting traffic
4:36 PM — Workflows and file uploads fully restored
4:37 PM — Incident fully resolved

Root Cause

Azure Container Apps experienced a platform-level degradation affecting container startup times. The same container image started in under 30 seconds on Azure Kubernetes Service (AKS), confirming the issue was with ACA and not our application.

Resolution & Next Steps

We restored service by relaxing startup probes, raising the minimum container count, and temporarily disabling readiness/liveness probes. To prevent a recurrence, we are now implementing improved failover paths and startup monitoring, along with better baseline capacity provisioning.
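As a rough illustration of what relaxing a startup probe means, the sketch below estimates how long a container may take to start before a Kubernetes-style startup probe gives up on it. It assumes the common semantics of an initial delay plus one check per period until the failure threshold is reached; real platforms differ by a period or so and may cap these settings. The class `StartupProbe` and all probe values are hypothetical and are not taken from the production configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StartupProbe:
    """Hypothetical model of a Kubernetes-style startup probe."""

    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def budget_seconds(self) -> int:
        # Approximate time a container has to pass the probe before a restart.
        return self.initial_delay_seconds + self.period_seconds * self.failure_threshold

    def tolerates(self, startup_seconds: int) -> bool:
        # True if a container needing this long to start would not be restarted.
        return startup_seconds <= self.budget_seconds()


# Illustrative values only.
strict = StartupProbe(initial_delay_seconds=10, period_seconds=15, failure_threshold=10)
relaxed = StartupProbe(initial_delay_seconds=10, period_seconds=120, failure_threshold=10)

for name, probe in (("strict", strict), ("relaxed", relaxed)):
    print(
        f"{name}: budget {probe.budget_seconds()} s, "
        f"tolerates 2 min start: {probe.tolerates(120)}, "
        f"tolerates 15 min start: {probe.tolerates(900)}"
    )
```

In this model a probe sized for a 2-minute start would restart containers that need 15 minutes, which prevents them from ever becoming ready. Widening the probe budget trades slower detection of genuinely broken containers for tolerance of slow starts.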

We sincerely apologize for the disruption. Please reach out to support@langdock.com with any questions.

Updated
Mar 30, 2026 at 3:32pm UTC

We will follow up with a post-mortem on this incident here. Thank you for your patience today!

Updated
Mar 30, 2026 at 1:32pm UTC

We have completed the system reboot and the platform is accepting traffic again. We continue to monitor the platform status very closely.

Updated
Mar 30, 2026 at 12:43pm UTC

We are making progress with the full system reboot. Thanks for your patience!

Updated
Mar 30, 2026 at 12:12pm UTC

We continue with the full system reboot. Thanks for your patience!

Updated
Mar 30, 2026 at 11:46am UTC

We continue with the full system reboot. Thanks for your patience!

Updated
Mar 30, 2026 at 11:31am UTC

We continue with the full system reboot.

Updated
Mar 30, 2026 at 11:01am UTC

We continue with the full system reboot.

Updated
Mar 30, 2026 at 10:37am UTC

We continue to reboot the system.

Updated
Mar 30, 2026 at 10:11am UTC

The platform is currently completely shut down as we attempt a full system reboot.

Updated
Mar 30, 2026 at 9:37am UTC

We continue to investigate degraded performance and high error rates across our services.

Updated
Mar 30, 2026 at 8:45am UTC

We have stabilised the core platform. Most parts of the platform are available again, and we are now working to bring workflow executions and file uploads back.

Updated
Mar 30, 2026 at 7:55am UTC

We stabilised parts of the platform but discovered downstream issues with our LLM services. We continue to investigate.

Updated
Mar 30, 2026 at 7:43am UTC

We have deployed a fix. The performance is back to normal levels.

Updated
Mar 30, 2026 at 7:15am UTC

We have identified the underlying issue and are deploying a fix.

Created
Mar 30, 2026 at 6:00am UTC

Performance of the platform is currently degraded.