Performance degraded & System outage
Resolved
Mar 30, 2026 at 4:45pm UTC
Post-Mortem: Service Disruption – March 30, 2026
Duration: 8:00 AM – 4:36 PM CEST
Severity: Major
Affected: app.langdock.com
Status: Resolved
What Happened
On March 30, 2026, Langdock experienced a major service disruption caused by a platform-level degradation on Azure Container Apps (ACA). Containers that normally start in under 2 minutes took over 15 minutes during the incident, preventing our autoscaler from bringing new capacity online and leading to a full platform outage from approximately 11:30 AM CEST.
Timeline (CEST)
8:00 AM — Performance degradation detected
9:15 AM — Issue identified; hotfix deployed
9:43 AM — Performance stabilised, partial recovery
9:55 AM — Downstream issues discovered; investigation continues
10:45 AM — Core platform restored; workflows & file uploads still affected
11:30 AM — Full platform outage begins due to cascading downstream effects of increased ACA container startup times
12:11 PM — Platform taken fully offline for controlled reboot
3:34 PM — Platform restored and accepting traffic
4:36 PM — Workflows and file uploads fully restored
4:37 PM — Incident fully resolved
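The timeline above is given in CEST (UTC+2), while the status updates further down are stamped in UTC. The following is a minimal Python sketch for reconciling the two, assuming a fixed +2 hour offset for the whole day. The helper names `cest` and `to_utc` are illustrative only and not part of any Langdock tooling.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2. A fixed offset is enough here (assumption: no DST change
# occurred during the incident window).
CEST = timezone(timedelta(hours=2), "CEST")

def cest(hour: int, minute: int) -> datetime:
    """Build a timeline timestamp for March 30, 2026 in CEST."""
    return datetime(2026, 3, 30, hour, minute, tzinfo=CEST)

def to_utc(ts: datetime) -> str:
    """Format a timestamp as HH:MM UTC for comparison with the status updates."""
    return ts.astimezone(timezone.utc).strftime("%H:%M UTC")

# Timestamps taken from the timeline above.
degradation_start = cest(8, 0)
outage_start = cest(11, 30)
offline_for_reboot = cest(12, 11)
traffic_restored = cest(15, 34)
fully_restored = cest(16, 36)

print(to_utc(offline_for_reboot))          # 10:11 UTC
print(fully_restored - degradation_start)  # 8:36:00
print(traffic_restored - outage_start)     # 4:04:00
```

Under these assumptions, the 12:11 PM CEST shutdown lines up with the 10:11am UTC status update, and the full incident window spans 8 hours 36 minutes.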
Root Cause
Azure Container Apps experienced a platform-level degradation affecting container startup times. The same container image started in under 30 seconds on AKS, confirming the issue was with ACA and not our application.
Resolution & Next Steps
We relaxed startup probes, raised the minimum container count, and temporarily disabled readiness/liveness probes to restore service. To prevent a recurrence, we are now implementing improved failover paths, startup-time monitoring, and better baseline capacity provisioning.
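To illustrate why relaxing startup probes can matter in a situation like this: with Kubernetes-style probe semantics, which ACA health probes follow, a container is restarted if its startup probe does not succeed within roughly the initial delay plus the failure threshold multiplied by the probe period. The Python sketch below works through that arithmetic. All probe values are hypothetical and chosen for illustration; they are not Langdock's actual configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StartupProbe:
    """Kubernetes-style startup probe settings (values used below are hypothetical)."""
    initial_delay_s: int
    period_s: int
    failure_threshold: int

    def budget_s(self) -> int:
        """Approximate time a container has to start before it is restarted."""
        return self.initial_delay_s + self.period_s * self.failure_threshold

    def tolerates(self, startup_s: int) -> bool:
        """True if a container needing `startup_s` seconds fits within the budget."""
        return startup_s <= self.budget_s()

# Hypothetical probe sized for the normal "under 2 minutes" startup.
normal = StartupProbe(initial_delay_s=10, period_s=15, failure_threshold=10)
# Hypothetical relaxed probe with a much longer period between checks.
relaxed = StartupProbe(initial_delay_s=10, period_s=120, failure_threshold=10)

# Lower bound on the startup time reported for the incident (over 15 minutes).
slow_start_s = 15 * 60

print(normal.budget_s(), normal.tolerates(slow_start_s))    # 160 False
print(relaxed.budget_s(), relaxed.tolerates(slow_start_s))  # 1210 True
```

In this sketch, a probe tuned for normal startup would restart a container long before a 15-minute start could finish, so the container never becomes ready; widening the budget lets slow starts complete.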
We sincerely apologize for the disruption. Please reach out to support@langdock.com with any questions.
Affected services
Updated
Mar 30, 2026 at 3:32pm UTC
We will follow up with a post mortem on this incident here. Thank you for your patience today!
Updated
Mar 30, 2026 at 1:32pm UTC
We have completed the system reboot and the platform is accepting traffic again. We continue to monitor the platform status very closely.
Updated
Mar 30, 2026 at 12:43pm UTC
We are making progress with the full system reboot. Thanks for your patience!
Updated
Mar 30, 2026 at 12:12pm UTC
We continue with the full system reboot. Thanks for your patience!
Updated
Mar 30, 2026 at 11:46am UTC
We continue with the full system reboot. Thanks for your patience!
Updated
Mar 30, 2026 at 11:31am UTC
We continue with the full system reboot.
Updated
Mar 30, 2026 at 11:01am UTC
We continue with the full system reboot.
Updated
Mar 30, 2026 at 10:37am UTC
We continue to reboot the system.
Updated
Mar 30, 2026 at 10:11am UTC
The platform is currently completely shut down as we attempt a full system reboot.
Updated
Mar 30, 2026 at 9:37am UTC
We continue to investigate degraded performance and high error rates across our services.
Updated
Mar 30, 2026 at 8:45am UTC
We have stabilised the core platform. Most parts of the platform are available again, and we are now working to restore workflow executions and file uploads.
Updated
Mar 30, 2026 at 7:55am UTC
We stabilised parts of the platform but discovered downstream issues with our LLM services. We continue to investigate.
Updated
Mar 30, 2026 at 7:43am UTC
We have deployed a fix. Performance is back to normal levels.
Updated
Mar 30, 2026 at 7:15am UTC
We have identified the underlying issue and are deploying a fix.
Created
Mar 30, 2026 at 6:00am UTC
Performance of the platform is currently degraded.