This article mostly uses commands to show how to manage cluster resources. Before getting started, here is the current cluster status:
# pcs status
Cluster name: pacemaker_cluster
Last updated: Sat Aug 29 08:21:24 2015
Last change: Sat Aug 29 08:21:04 2015
Current DC: nodeA - partition with quorum
4 Nodes configured
18 Resources configured
Online: [ nodeA nodeB nodeC nodeD ]
Full list of resources:
fs11 (ocf::heartbeat:Filesystem): Started nodeA
fs12 (ocf::heartbeat:Filesystem): Started nodeA
fs13 (ocf::heartbeat:Filesystem): Started nodeB
fs14 (ocf::heartbeat:Filesystem): Started nodeB
fs15 (ocf::heartbeat:Filesystem): Started nodeC
fs16 (ocf::heartbeat:Filesystem): Started nodeC
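When there are many resources, it can help to filter the status listing by node. The following is a minimal sketch, not part of pcs itself: the function name resources_on_node is hypothetical, and it assumes resource lines end in "Started <node>" as in the listing above.

```shell
# List the IDs of resources reported as Started on a given node.
# Reads `pcs status` text on stdin; $1 is the node name.
resources_on_node() {
    # NF >= 3 guards against blank or very short lines before indexing fields
    awk -v node="$1" 'NF >= 3 && $(NF-1) == "Started" && $NF == node { print $1 }'
}
```

Typical use would be: pcs status | resources_on_node nodeA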
Create a resource
To create a new resource
# pcs resource create fs21 ocf:heartbeat:Filesystem params device=/dev/mapper/LUN21 directory=/lun21 fstype="xfs" fast_stop="no" force_unmount="safe" op stop on-fail=stop timeout=200 op monitor on-fail=stop timeout=200 OCF_CHECK_LEVEL=10
Delete a resource
To delete a resource
# pcs resource delete fs11
Manually Moving Resources Around the Cluster
To move a resource off its current node to any other node
nodeA# pcs resource move fs21
To move a resource to a particular node
nodeA# pcs resource move fs21 nodeC
To move a resource back to its original node
nodeA# pcs resource clear fs21
Note: You can also use the move command to move the resource back to its original node; however, that does not clear the constraint the earlier move command generated. It is therefore better to use 'resource clear' to return the resource to its normal state.
Note2: When a resource is moved, any other resources that have constraints tying them to it are moved as well.
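The move and clear steps above can be wrapped so the leftover constraint is easy to see before it is removed. This is only a sketch: the function names and the PCS variable are hypothetical, and it assumes a pcs version where `pcs constraint location` with no further arguments lists the location constraints.

```shell
PCS="${PCS:-pcs}"   # set PCS=echo to print the commands instead of running them

# Move a resource to a target node, then list the location constraints
# so the constraint added by the move is visible.
move_and_show() {
    $PCS resource move "$1" "$2" && $PCS constraint location
}

# Remove the constraint left behind by an earlier move (or ban).
release() {
    $PCS resource clear "$1"
}
```

For example, run move_and_show fs21 nodeC, confirm the resource has started on nodeC, and later run release fs21 so the cluster is free to place the resource normally again.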
Moving Resources Due to Failure
By default, there is no threshold defined, so a resource is not forced off a node because of its failure count; Pacemaker keeps trying to recover it on the same node. To define a threshold of 3, run
pcs resource meta fs21 migration-threshold=3
The command above defines the resource fs21 to move after 3 failures.
Note: after a resource moves due to failure, it will not run on the original node until the failcount is reset or the failure timeout is reached.
To set the threshold to 3 for all resources, so that every resource in the cluster moves after 3 failures
# pcs resource defaults migration-threshold=3
To show the current failcount
# pcs resource failcount show fs21
No failcounts for fs21
To clear the failcount, run
# pcs resource failcount reset fs21
Note: the threshold applies only to failures during normal operation (for example, a failed monitor), not to failures of start and stop operations.
Start failures cause the failcount to be set to INFINITY and thus always cause the resource to move immediately.
Stop failures are slightly different and crucial. If a resource fails to stop and STONITH is enabled, then the cluster will fence the node in order to be able to start the resource elsewhere. If STONITH is not enabled, then the cluster has no way to continue and will not try to start the resource elsewhere, but will try to stop it again after the failure timeout.
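The failcount commands above can be combined so that a reset is issued only when something is recorded. This is a minimal sketch under two assumptions: the function name and the PCS variable are hypothetical, and the "No failcounts for <id>" message shown earlier is taken to mean a clean state.

```shell
PCS="${PCS:-pcs}"   # the pcs binary; can be overridden for testing

# Reset the failcount of a resource only when one is recorded.
reset_if_failed() {
    rsc="$1"
    if $PCS resource failcount show "$rsc" | grep -q '^No failcounts'; then
        echo "$rsc: nothing to reset"
    else
        $PCS resource failcount reset "$rsc"
    fi
}
```

Calling reset_if_failed fs21 after fixing the underlying problem allows the resource to run on the original node again.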
Moving Resources Due to Connectivity Changes
Within the cluster, once a node has a connection issue with the other nodes, it will be fenced off (depending on the fencing property). But what about external connectivity? What happens if a node loses its external connectivity?
The solution is to create a ping resource, cloned so that it runs on every node, and configure a location constraint that moves a resource to a different node when connectivity is lost.
# pcs resource create ping ocf:pacemaker:ping dampen=5s multiplier=1000 host_list=<external ip or host> --clone
Then set a location constraint on each resource that should move away from a node that has lost connectivity
# pcs constraint location <resource id> rule score=-INFINITY pingd lt 1 or not_defined pingd
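Because the rule has to be attached to each resource separately, a small loop saves typing. This is a sketch only: the function name add_ping_rule and the PCS variable are hypothetical, and the rule text is the one shown above.

```shell
PCS="${PCS:-pcs}"   # set PCS=echo to print the commands instead of running them

# Attach the connectivity rule to every resource ID given as an argument.
add_ping_rule() {
    for rsc in "$@"; do
        $PCS constraint location "$rsc" rule score=-INFINITY pingd lt 1 or not_defined pingd || return 1
    done
}
```

For example, add_ping_rule fs21 fs22 applies the same rule to both resources and stops at the first command that fails.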
Disabling and Banning Cluster Resources
To stop a resource and prevent it from being started on any other node
# pcs resource disable <resource id>
To start the resource again and return it to its normal state
# pcs resource enable <resource id>
To ban a resource on a node
#pcs resource ban <resource id> [node]
If no node is specified, the resource is banned from the node it is currently running on.
To remove the ban constraint, run
#pcs resource clear <resource id>
To debug a resource start
#pcs resource debug-start <resource id>
Disabling a Monitor Operation
To disable monitor operation for a resource
# pcs resource update fs21 op monitor enabled="false"
To enable monitor operation for a resource
#pcs resource update fs21 op monitor enabled="true"
To stop monitoring a resource permanently, delete the monitor operation.
To set a resource to the unmanaged state. Unlike a deleted resource, an unmanaged resource stays in the cluster configuration, but Pacemaker does not manage it.
#pcs resource unmanage <resource id>
To set a resource to managed state
#pcs resource manage <resource id>
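A common use of the unmanaged state is to do some manual work on a resource and then hand it back to the cluster. The wrapper below is a sketch of that pattern; the function name with_unmanaged and the PCS variable are hypothetical.

```shell
PCS="${PCS:-pcs}"   # set PCS=echo to print the commands instead of running them

# Run a command while a resource is unmanaged, then hand it back to Pacemaker.
# The resource is returned to the managed state even if the command fails.
with_unmanaged() {
    rsc="$1"; shift
    $PCS resource unmanage "$rsc" || return 1
    "$@"
    rc=$?
    $PCS resource manage "$rsc"
    return "$rc"
}
```

For example, with_unmanaged fs21 xfs_info /lun21 runs the command while Pacemaker leaves fs21 alone, and returns that command's exit status.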