VM Connectivity on POD Alpha

Incident Report for Swisscom DCS+

Postmortem

Dear Customer,

We recently experienced a service disruption affecting the network connectivity of some VMs on our DCS cloud platform.

The issue started on Monday 25.08.2025 at approximately 15:00 CEST and was caused by a bug in the underlying NSX virtual networking platform that led to blocked network ports on a subset of VMs. The affected servers could not establish network connectivity, which impacted both internal and internet traffic. The issue is known to Broadcom (Broadcom KB - Article ID: 405110). Our team resolved the problem through a carefully planned, graceful reboot of the NSX cluster nodes, which caused no further impact, followed by manually unblocking the ports of the remaining systems. After all network services had been verified, the platform was fully operational again on Tuesday 26.08.2025 by 16:30 CEST.

Together with Broadcom, we are continuing the root-cause analysis to understand the details of this incident and to implement measures that will prevent similar occurrences in the future.

We apologize for the inconvenience and thank you for your patience.

Posted Sep 02, 2025 - 10:20 CEST

Resolved

No further issues have been detected. Root-cause analysis is ongoing together with Broadcom. Closing this incident.
Posted Aug 29, 2025 - 14:27 CEST

Monitoring

The fix has been implemented for the affected VMs, and we have received user confirmations that the systems are functioning normally again. We continue to monitor the situation. Please feel free to reach out to us if you are still experiencing any issues.
Posted Aug 26, 2025 - 13:56 CEST

Identified

Our engineering team, in collaboration with Broadcom, has identified the immediate cause of the issue: VM ports in a blocked state on the NSX Manager. A fix is currently being rolled out to all affected virtual machines, and our support team is proactively contacting customers who have submitted tickets.

We are continuing our investigation to determine the root cause of the problem.
Posted Aug 26, 2025 - 13:33 CEST

Investigating

Dear Customer,

We have detected a partial disruption of DCS cloud connectivity. The following services are impacted:

A limited number of virtual machines (VMs) on POD Alpha have been experiencing connectivity issues since the afternoon of 25.08.2025, affecting both internal and external network traffic. If you suspect one of your VMs may be affected, please contact the DCS Service Desk for assistance.

Our engineering team is actively working on restoring full service. We will continue to update this incident with the latest information regarding the remediation process.

We sincerely apologize for the inconvenience and appreciate your understanding.

Kind regards,
Dynamic Computing Services Team
Posted Aug 26, 2025 - 09:34 CEST
This incident affected: Connectivity (Internet Connectivity).