HP EVA Cache Battery Failure Issue
The latest issue I have come across in our HP storage environment involves the storage controller cache battery modules. We recently had a module fail on one of our 8100 series EVAs. There can be up to four modules per controller. In our environment, we use two modules per controller.
A healthy set of modules looks like this:
Now, for the EVA we have problems with, it looks like this:
This problem started when this particular module failed. We received a replacement from HP and swapped it in. However, after a few days, it was marked as failed again. We received a second replacement from HP and swapped that in as well. A few days later, same result. When I contacted HP a third time, I explained what had occurred. In response, I received this notification:
This is just another somewhat oddball error of the kind we deal with on a regular basis. Now, on to the fix, which is to restart the affected controller. First, check in Command View which controller that is. In my case, it is Controller A (just follow the bang indicators).
A restart of the controller should be done during your change / maintenance window (all those years of ITIL ingrained in me!). To do so, you have a few choices.
The first is via Command View:
On the controller’s page, hit shutdown, then choose restart and select the controller (A/B).
The second is via the SSSU utility (installed as part of the Command View install):
Restart controller A, but not its peer controller:
RESTART "\Hardware\Rack 1\Enclosure 7\Controller A" NOALL_PEERS
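If you restart controllers on more than one EVA, it can help to generate the command text rather than retype it. Below is a minimal Python sketch that only builds the RESTART string; it does not talk to SSSU or the array. The function name build_restart_command and its parameters are my own invention for illustration, and the backslash-separated path layout (\Hardware\Rack N\Enclosure N\Controller X) and the ALL_PEERS counterpart to NOALL_PEERS are assumptions you should check against your own Command View tree and SSSU reference before use.

```python
def build_restart_command(rack: int, enclosure: int, controller: str,
                          all_peers: bool = False) -> str:
    """Build the text of an SSSU RESTART command (hypothetical helper).

    Assumes the controller lives at
    \\Hardware\\Rack <rack>\\Enclosure <enclosure>\\Controller <A|B>,
    which may differ on your array.
    """
    controller = controller.strip().upper()
    if controller not in ("A", "B"):
        raise ValueError("controller must be 'A' or 'B'")

    # Object path as it would appear in the Command View hardware tree (assumed layout).
    path = f"\\Hardware\\Rack {rack}\\Enclosure {enclosure}\\Controller {controller}"

    # NOALL_PEERS restarts only the named controller; ALL_PEERS is assumed
    # to be the option that restarts its peer as well.
    peers = "ALL_PEERS" if all_peers else "NOALL_PEERS"
    return f'RESTART "{path}" {peers}'


if __name__ == "__main__":
    # Prints: RESTART "\Hardware\Rack 1\Enclosure 7\Controller A" NOALL_PEERS
    print(build_restart_command(1, 7, "A"))
```

The output is plain text, so you can paste it into an SSSU session after you have selected the manager and storage system as usual.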
Note that when you restart a controller, if it is the master controller, the vdisks will transfer to the other controller without any downtime. In my experience, the EVAs are a touchy lot. I prefer the SSSU utility because it gives me a halfway decent command line interface. Pretty powerful too. I’ll be writing up a blog post discussing good uses for SSSU in the future.