Replace failed disk in VIOS

This article describes two common scenarios for replacing a failing local disk in a VIOS (Virtual I/O Server).

Failed disk in a VIO server that is used by VIO client(s)

In this scenario, the failing disk contains logical volumes (LVs) that back the rootvg of VIO clients. Each client's rootvg is mirrored to another disk presented by a second VIO server. This scenario is illustrated below:

 


Follow this procedure to replace the failing disk on the VIO server:
 
On VIOS, as padmin user:
 
Record the information that will be needed later to recreate devices and configuration:
$ lsdev -virtual
To get the name of the volume group that the failed disk belongs to:
$ lspv
To list the logical volumes on the disk:
$ lspv -lv <hdisk#>
To get details about the logical volumes, e.g. size (number of LPs):
$ lsvg -lv <VGname>
$ lsvg <VGname>
To get the LVs, VTD names, vhost numbers and virtual clients:
$ lsmap -all
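The lsmap listing is the part you will lean on most when recreating the VTDs, so it can help to reduce it to one line per mapping. The helper below is only a sketch: the function name summarize_lsmap is made up for this example, and it assumes the usual lsmap -all layout (a line starting with vhostN for each adapter, followed by VTD, LUN and Backing device lines for each mapping). It works on saved output, so it can be run on any machine with a POSIX shell and awk.

```shell
#!/bin/sh
# Hypothetical helper: summarize saved "lsmap -all" output.
# Prints one line per mapping: vhost, VTD, backing device, LUN.
# Assumes each mapping lists VTD and LUN before "Backing device".
summarize_lsmap() {
    awk '
        $1 ~ /^vhost[0-9]+$/ { vhost = $1 }   # adapter line opens a new block
        $1 == "VTD"          { vtd = $2 }     # virtual target device name
        $1 == "LUN"          { lun = $2 }     # LUN as seen by the client
        $1 == "Backing" && $2 == "device" {
            print vhost, vtd, $3, lun
        }
    ' "$1"
}
```

Capture the lsmap -all output in a file by whatever means your environment allows (for example terminal logging), then run summarize_lsmap on that file. Adapters without a virtual target device have no Backing device line and are skipped.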
On client(s):

Identify the affected disk(s), i.e. those backed by LVs on the bad disk on the VIOS:
# lscfg -vl <hdisk#>  (run for each virtual SCSI disk)
hdisk1           U9117.MMA.999999-V2-C12-T1-L8200000000000000  Virtual SCSI Disk Drive
Take note of the following:
V# - LPAR ID (this should be the LPAR ID of the affected VIOS)
C# - slot number
L# - LUN ID
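With many virtual disks it is easy to misread these fields, so a small helper can pull them out of the location code. This is a sketch only: parse_physloc is a hypothetical name, and it assumes the location code has the same form as in the sample above (unit, then -V, -C, -T and -L fields).

```shell
#!/bin/sh
# Hypothetical helper: split a virtual SCSI location code into its fields.
# Expected form: <unit>-V<lpar>-C<slot>-T<port>-L<lun>
# Prints nothing if the argument does not match that form.
parse_physloc() {
    printf '%s\n' "$1" | sed -n \
        's/^.*-V\([0-9][0-9]*\)-C\([0-9][0-9]*\)-T[0-9][0-9]*-L\([0-9A-Fa-f][0-9A-Fa-f]*\)$/lpar=\1 slot=\2 lun=\3/p'
}

# Location code taken from the lscfg sample above
parse_physloc "U9117.MMA.999999-V2-C12-T1-L8200000000000000"
# prints: lpar=2 slot=12 lun=8200000000000000
```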
# lspv
The affected disk may be listed as removed or missing depending on the failure.
# lsvg -p rootvg
Remove the bad disk from the mirror:
# unmirrorvg rootvg <hdisk#>
# reducevg rootvg <hdisk#>
# rmdev -dl <hdisk#>
 
On VIOS:
 
Remove all VTDs and LVs that reside on the failed disk:
$ rmvdev -vtd <VTDname> -rmlv
or

$ rmdev -dev <VTDname>
$ rmlv -f <LVname>
Check that all logical volumes have been removed from the bad disk:
$ lspv -lv <hdisk#>
Remove the disk from the respective volume group:
$ reducevg <VGname> <hdisk#>

Note: If the volume group consists of only this one disk, the whole VG must be removed from the ODM. In that case, use the following commands instead:

$ deactivatevg <VGname>
$ exportvg <VGname>
Replace the failed disk:

$ diagmenu

--> select “Task Selection”

               --> select “Hot Plug Task”

                         --> select “SCSI and SCSI RAID Hot Plug Manager”

                                    --> select “Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure”

Configure the new disk:
$ cfgdev
Add the new disk to the volume group or recreate the VG in case it was removed: 
$ extendvg <VGname> <new hdisk#>
or
$ mkvg -vg <VGname>  <new hdisk#>
Recreate the LVs with the same names and sizes that were recorded at the beginning:
$ mklv -lv <LVname> <VGname> <size>
Recreate the VTDs:
$ mkvdev -vdev <LVname> -vadapter <vhost#> -dev <VTDname>
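The sizes recorded earlier are in logical partitions (LPs), so it can be useful to convert them to megabytes before recreating the LVs, especially if the recreated VG ends up with a different PP size. The conversion is LPs multiplied by the PP size reported by lsvg. A minimal sketch follows; lv_size_mb is a hypothetical name, and the numbers in the example are illustrative, not taken from a real system.

```shell
#!/bin/sh
# Hypothetical helper: convert an LV size from logical partitions to MB.
# $1 = number of LPs (from "lsvg -lv <VGname>")
# $2 = PP size in MB  (from the "PP SIZE" field of "lsvg <VGname>")
lv_size_mb() {
    echo $(( $1 * $2 ))
}

# Illustrative values: 80 LPs in a VG with a 128 MB PP size
lv_size_mb 80 128   # prints 10240
```

Use the LPs column rather than the PPs column: for a mirrored LV the PP count includes every copy. Check the mklv documentation for your VIOS level to see which size units it accepts.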
 
On client(s):
Discover the new disk(s) and rebuild the mirror:
# cfgmgr
# extendvg rootvg <new hdisk#>
# mirrorvg rootvg <new hdisk#>
Build boot image on both mirrored disks (just in case):
# bosboot -ad /dev/<hdisk0>
# bosboot -ad /dev/<hdisk1>
Set bootlist:
# bootlist -m normal <list names of both hdisks>
# bootlist -m normal -o
  

Bad disk in rootvg of VIO server

The rootvg usually has some kind of disk protection, most often LVM mirroring across two disks. To replace a mirrored hdisk in the rootvg of a VIO server, you can use either VIO commands or AIX commands as root (to become root, use the oem_setup_env command). This example uses VIO commands, since this is the recommended way of managing a VIOS.

Break the mirror:

$ unmirrorios <hdisk#>   (where <hdisk#> is the bad disk)
Check whether any LVs remain on the bad disk:
$ lspv -lv <hdisk#>

If there are any (e.g. lg_dumplv, the dump device), migrate them to the other disk or remove them (the dump device can be recreated later):

$ migratepv -lv <LV> <bad_hdisk> <good_hdisk>
or
$ rmlv -f <LV>
Remove the failed disk from rootvg:
$ reducevg rootvg <bad_hdisk>
Use the “Hot Plug” procedure to replace the failed disk:

$ diagmenu

--> select “Task Selection”

               --> select “Hot Plug Task”

                         --> select “SCSI and SCSI RAID Hot Plug Manager”

                                    --> select “Replace/Remove a Device Attached to an SCSI Hot Swap Enclosure”

Configure the new disk:
$ cfgdev
Verify that the new disk came back with the same number as the previous one:
$ lspv

Add the new disk to rootvg and rebuild the mirror:
$ extendvg rootvg <hdisk#>
$ mirrorios -defer <hdisk#>  (note: without the -defer option, the VIO server is rebooted after mirroring completes)

Check bootlist to ensure that both disks are included as boot devices:

$ bootlist -mode normal -ls
hdisk0 blv=hd5
hdisk1 blv=hd5

Use the command below to include both disks if they do not show up in the bootlist:

$ bootlist -mode normal hdisk0 hdisk1
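If you keep the bootlist output in a file, the check can be scripted. The sketch below is not part of the VIOS command set: missing_from_bootlist is a hypothetical name, and it assumes the output format shown above, with the disk name as the first field of each line.

```shell
#!/bin/sh
# Hypothetical helper: list expected disks that are absent from saved
# "bootlist -mode normal -ls" output.
# $1 = file holding the saved output; remaining arguments = expected disks.
missing_from_bootlist() {
    file=$1
    shift
    for disk in "$@"; do
        # awk exits 0 only if some line has the disk name as its first field
        awk -v d="$disk" '$1 == d { found = 1 } END { exit !found }' "$file" \
            || echo "$disk"
    done
}
```

For example, missing_from_bootlist bootlist.out hdisk0 hdisk1 prints nothing when both disks are in the list, and prints the name of each disk that is missing otherwise.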


tags: disk replacement, VIOS


