We have MFS2600KI Compute Modules, all running ESXi 5.5. We've been seeing intermittent issues where a catastrophic error occurs and the blade reboots. It appears to be completely random and can happen on any of the 6 blades. Here is an example of the error:
ID: 2101
Type: IPMI
Detailed Description: A catastrophic error has occurred. The system has halted.
Cause: An uncorrectable memory error is often the cause.
Action: Check for other events that occurred near the same time which may help identify the cause or potential hardware failure.
Extra Data: Raw IPMI (hex): Gen:3000 Num:80 Type:07 EDir:83 ED1:a1 ED2:01 ED3:01
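In case it helps anyone decode the raw bytes in the Extra Data field above, here is a rough Python sketch that splits them into fields. It assumes (this is not confirmed for this log format) that EDir and ED1 follow the standard IPMI platform event layout: bit 7 of EDir is the event direction and bits 6:0 the event type code, and the low nibble of ED1 is the event offset. The function names are hypothetical. Under that assumption the direction bit comes out set, which would normally mean a deassertion, so the layout may not match exactly. The one thing that does look solid is that sensor type 07h is "Processor" in the IPMI sensor type table (memory is 0Ch), so the event may be logged against a processor sensor rather than a memory sensor.

```python
import re

def parse_raw_ipmi(raw):
    """Split 'Gen:3000 Num:80 ...' into a dict of integers parsed as hex."""
    return {key: int(value, 16)
            for key, value in re.findall(r"(\w+):([0-9a-fA-F]+)", raw)}

def decode_event(fields):
    """Split EDir and ED1 into sub-fields.

    ASSUMPTION: the bytes follow the standard IPMI platform event layout;
    the management module's log format may differ.
    """
    edir = fields["EDir"]
    ed1 = fields["ED1"]
    return {
        "sensor_number": fields["Num"],
        "sensor_type": fields["Type"],      # 0x07 = Processor in the IPMI spec
        "deassertion": bool(edir & 0x80),   # bit 7: event direction
        "event_type": edir & 0x7F,          # bits 6:0: event/reading type code
        "offset": ed1 & 0x0F,               # bits 3:0: event offset
        "ed2_kind": (ed1 >> 6) & 0x03,      # bits 7:6: what ED2 holds
        "ed3_kind": (ed1 >> 4) & 0x03,      # bits 5:4: what ED3 holds
    }

raw = "Gen:3000 Num:80 Type:07 EDir:83 ED1:a1 ED2:01 ED3:01"
event = decode_event(parse_raw_ipmi(raw))
for name, value in event.items():
    print(f"{name}: {value}")
```
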
The error message points to a possible memory issue, but Intel support has been unable to identify the exact cause. We've replaced one module completely, but the others are still throwing these errors. Has anyone seen this before, or does anyone know of a possible resolution?