DLPAR or Dynamic Logical Partitioning problems

DLPAR provides users the ability to dynamically add, remove or modify LPAR resources such as memory, CPU, or I/O devices.

The most common problem with DLPAR operations is related to RMC (Resource Monitoring and Control). Because DLPAR relies on an RMC connection between the HMC and the LPARs, make sure that the public network interface of your HMC is properly configured and that the HMC can reach your LPARs over the network (the HMC connection to the FSP (Flexible Service Processor) of the managed systems is not enough). If there are any firewalls between the HMC and the LPARs, check that port 657 (UDP and TCP) is open in both directions. Also make sure that RMC connections are allowed on the public interface of the HMC:

HMC Management --> Change Network Settings --> LAN adapters (choose the public one) --> Firewall settings

To start troubleshooting any problem with Dynamic Logical Partitioning, log in to the HMC restricted shell and check the output of the command:

# lspartition -dlpar

The output lists the logical partitions that are ready for DLPAR operations. If there is no output at all, either no LPAR can communicate with the HMC over the network, or there is a problem with the HMC itself. If you suspect the HMC, try rebooting it. Experience shows that many strange HMC problems can be solved by a reboot alone.

For an LPAR with working RMC communication, the output of lspartition -dlpar should look similar to this:

<#1> Partition:<1*9117-MMB*XXXXXXX, hostname, 10.30.23.15>

Active:<1>, OS:<AIX, 6.1, 6100-07-04-1216>, DCaps:<0x4ebf>, CmdCaps:<0x1b, 0x1b>, PinnedMem:<384>

Check that the DCaps value is higher than 0x0 and that the Active value is higher than 0. If either is not, perform the next steps on the LPAR on which you are trying to perform DLPAR operations.
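
The DCaps/Active check can also be scripted when the HMC manages many partitions. The following is only a sketch: dlpar_not_ready is a hypothetical helper name (not an HMC command), it assumes the two-line record layout shown in the sample above, and it is meant to run on a workstation against saved lspartition -dlpar output, since the HMC restricted shell may not allow it.

```shell
# Hypothetical helper (not part of the HMC): reads `lspartition -dlpar`
# output on stdin and prints every partition that is NOT ready for DLPAR,
# i.e. one whose Active value is 0 or whose DCaps value is 0x0.
dlpar_not_ready() {
    awk '
        # Remember the identity of the most recent "Partition:<...>" line.
        /Partition:</ {
            part = $0
            sub(/^.*Partition:</, "", part)
            sub(/>.*$/, "", part)
        }
        # The Active/DCaps line belongs to the partition seen last.
        /Active:</ {
            active = $0
            sub(/^.*Active:</, "", active)
            sub(/>.*$/, "", active)
            dcaps = $0
            sub(/^.*DCaps:</, "", dcaps)
            sub(/>.*$/, "", dcaps)
            if (active + 0 == 0 || dcaps ~ /^0x0+$/)
                printf "NOT READY: %s (Active=%s, DCaps=%s)\n", part, active, dcaps
        }
    '
}
```

Assuming the output was saved to a file named dlpar.txt (an example name), run: dlpar_not_ready < dlpar.txt. No output means that every listed partition passed both checks.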

Check the RMC connection to HMC using the following command:

# lsrsrc IBM.ManagementServer

If you are using AIX 7.1, type the following command instead:

# lsrsrc IBM.MCP

You should see something like this:

Resource Persistent Attributes for IBM.MCP

resource 1:

        MNName            = "10.30.23.15"

        NodeID            = 18194515442147552355

        KeyToken          = "hmc.localdomain"

        IPAddresses       = {"10.30.23.15"}

        ConnectivityNames = {"10.30.23.15"}

        HMCName           = "7042CR4*XXXXXXX"

        HMCIPAddr         = "10.30.23.10"

        HMCAddIPs         = "192.168.128.1"

        HMCAddIPv6s       = ""

        ActivePeerDomain  = ""

        NodeNameList      = {"Test"}
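
To compare the HMC address that the LPAR has registered with the address of the HMC you actually use, it can help to extract just that field. This is a minimal sketch: mcp_hmc_addrs is a hypothetical helper name, and it assumes the IBM.MCP attribute layout shown in the sample above (the older IBM.ManagementServer class uses different attribute names, so it would print nothing there).

```shell
# Hypothetical helper: reads `lsrsrc IBM.MCP` output on stdin and prints
# each HMCIPAddr value on its own line, without the surrounding quotes.
mcp_hmc_addrs() {
    awk -F'=' '
        # Match only the attribute named exactly HMCIPAddr.
        $1 ~ /^[ \t]*HMCIPAddr[ \t]*$/ {
            val = $2
            gsub(/[ \t"]/, "", val)   # strip blanks and double quotes
            print val
        }
    '
}
```

On the LPAR this would be used as: lsrsrc IBM.MCP | mcp_hmc_addrs. One line is printed per HMC resource; an empty result means that no HMC is registered.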

If you can see information about the HMC, it’s a good sign; if not, check the status of the main daemon IBM.DRM used for dynamic logical partitioning:

# lssrc -g rsct_rm

Subsystem         Group            PID          Status

 IBM.ServiceRM    rsct_rm                       inoperative

 IBM.DRM          rsct_rm                       inoperative

 IBM.ERRM         rsct_rm                       inoperative

 IBM.AuditRM      rsct_rm                       inoperative

 IBM.MgmtDomainRM rsct_rm                       inoperative

 IBM.HostRM       rsct_rm                       inoperative

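If it is in the inoperative state, note that sometimes, especially on AIX 7, this daemon is not active all the time but only when needed. You can restart it with the following commands, which stop the RMC subsystem and its resource managers (-z), add and start the subsystem again (-A), and enable remote client connections (-p):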

# /usr/sbin/rsct/bin/rmcctrl -z
# /usr/sbin/rsct/bin/rmcctrl -A
# /usr/sbin/rsct/bin/rmcctrl -p

Check its status again:

# lssrc -a | grep rsct

 ctrmc            rsct             5374166      active

 IBM.HostRM       rsct_rm          14156014     active

 IBM.ServiceRM    rsct_rm          5439716      active

 IBM.MgmtDomainRM rsct_rm          9240692      active

 IBM.DRM          rsct_rm          17301604     active

 ctcas            rsct                          inoperative

 IBM.ERRM         rsct_rm                       inoperative

 IBM.AuditRM      rsct_rm                       inoperative
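
When the same check has to be repeated on several LPARs, the status of a single subsystem can be read out of the lssrc output with a small filter. This is a sketch only: subsys_status is a hypothetical helper name, and it assumes that the status is the last column of the line, as in the listings above.

```shell
# Hypothetical helper: reads `lssrc` output on stdin and prints the status
# column (last field) of the subsystem named in $1, or "missing" if the
# subsystem is not listed at all.
subsys_status() {
    awk -v name="$1" '
        $1 == name { print $NF; found = 1 }
        END { if (!found) print "missing" }
    '
}
```

For example, lssrc -g rsct_rm | subsys_status IBM.DRM prints active or inoperative for the DLPAR daemon. Keep in mind that an inoperative IBM.DRM is not always an error, because the daemon may be started only on demand.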

If restarting the daemons does not change the output of lspartition -dlpar, you can try to reconfigure RMC with the recfgct command. In essence, this command recreates the RMC connection.

Before using the recfgct command, make sure that your server is not part of a CSM or GPFS cluster, because in that case it could cause you more trouble than a non-working DLPAR.

The full path of recfgct command is:

# /usr/sbin/rsct/install/bin/recfgct

Wait 5 to 10 minutes and check the RMC daemon again:

# lssrc -g rsct_rm

....

# lsrsrc IBM.ManagementServer

or

# lsrsrc IBM.MCP

Advice:

When you change system resources dynamically, do not forget to modify your LPAR profile accordingly: at the next boot, system resources are assigned according to the profile, which DLPAR operations do not change.


tags: DLPAR, RMC, RSCT


Comments:

Ravi

2013-11-22 07:15:42
Very good troubleshooting method.

Vineet Ohlyan

2014-01-31 04:43:58
"Make sure that your server is not part of CMS"? What do you mean by that, please?

LparBox.com Team

2014-01-31 13:09:09
Dear Vineet Ohlyan, thank you for noticing the typo there. It should be CSM (Cluster Systems Management) and is corrected now.

James Gu

2014-02-02 22:03:48
Another cause of connection issues could be that the HMC has logged the same IP address for another LPAR.

shameem

2015-04-23 00:28:21
Nice work

Igor

2016-04-20 06:12:02
Do you know how to solve an inactive RMC connection for systems in a CAA environment? I have cluster nodes that don't have active RMC with the new HMC (this HMC is a replacement for a failed unit and has the IP and hostname of the HMC that failed). The command lsrsrc IBM.MCP prints the NodeID and HMC name of the failed HMC, not of the new unit. According to IBM, the recfgct command should not be used for systems under CAA... Best regards.

Scott

2016-11-02 13:09:02
I have issues similar to those Igor mentioned. I have a Linux partition in a working LPM environment with dual VIOS and dual managed systems, but my LPAR running SLES 11 SP1 fails to migrate, and checking for RMC connections shows none present. I just rebooted my HMC to be sure, with the same result. I have opened a call with IBM and am waiting for them to call me back.
