BLUE21neo: [CentOS7] Installing Pacemaker + corosync

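This post installs Pacemaker (corosync) on CentOS7 and checks its basic operation.

For installing and configuring Pacemaker (corosync), I referred to several web pages.
The Red Hat manual was especially helpful.

The test environment is as follows.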

[Environment]

Server 1
- OS: CentOS7.0
- IP address: 10.1.0.71
- Hostname: mysql01

Server 2
- OS: CentOS7.0
- IP address: 10.1.0.72
- Hostname: mysql02

1. Installing Pacemaker

Install pacemaker, corosync, and pcs with yum from the standard CentOS7.0 repositories.
Do this on both Server 1 and Server 2.

# yum install pacemaker corosync pcs

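pcs is the CLI tool for managing the cluster.
A cluster operation through pcs only needs to be run on one server that is a cluster member, so in this post I basically run pcs commands on Server 1.

2. OS settings

Confirm that SELinux and the firewall are disabled on Server 1 and Server 2.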

# getenforce
Disabled
# systemctl list-unit-files -t service | grep firewalld
firewalld.service                           disabled
# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

I want to use hostnames in the cluster configuration, so register the hostnames in /etc/hosts on Server 1 and Server 2.

10.1.0.71 mysql01
10.1.0.72 mysql02
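If you script this step, it helps to make it idempotent so that running it twice does not duplicate lines. The following is a minimal sketch of that idea. The helper name add_host_entry is hypothetical, and the example writes to a scratch file rather than the real /etc/hosts.

```shell
# Append "IP NAME" to a hosts-style file only if NAME is not already listed.
# add_host_entry is a hypothetical helper, not part of any package.
add_host_entry() {
    local file=$1 ip=$2 name=$3
    # Look for NAME as a whole field after the address column, skipping comment lines.
    if awk -v n="$name" '!/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Try it against a scratch file instead of the real /etc/hosts.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost\n' > "$hosts_file"
add_host_entry "$hosts_file" 10.1.0.71 mysql01
add_host_entry "$hosts_file" 10.1.0.72 mysql02
add_host_entry "$hosts_file" 10.1.0.72 mysql02   # second call changes nothing
cat "$hosts_file"
```

On a real server you would pass /etc/hosts as the first argument and run it as root.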

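Check connectivity in both directions between Server 1 and Server 2.

# ping -c1 mysql01
# ping -c1 mysql02

Confirm that you can log in over ssh in both directions between Server 1 and Server 2. On CentOS7, pcs itself talks to the pcsd daemon on each node rather than going through ssh, so this check is mainly for administrative convenience.

# ssh -l root mysql01
# ssh -l root mysql02

pcs commands authenticate as the hacluster user, which has no password set initially, so set one. Do this on both Server 1 and Server 2.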

# passwd hacluster

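3. Cluster configuration

Configure the cluster with pcs. First, start pcsd and enable it at boot. pcsd needs to be running on both Server 1 and Server 2.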

# systemctl start pcsd
# systemctl enable pcsd

Set up the trust relationship between Server 1 and Server 2 so that they can authenticate as the hacluster user. For [Password], enter the password set in step 2 above. Run this on Server 1 only.

# pcs cluster auth mysql01 mysql02
Username: hacluster
Password:
mysql01: Authorized
mysql02: Authorized

Set up a cluster named "cluster_mysql" with mysql01 and mysql02. Run this on Server 1 only.

# pcs cluster setup --name cluster_mysql mysql01 mysql02
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
mysql01: Succeeded
mysql02: Succeeded

Running this command creates /etc/corosync/corosync.conf on both Server 1 and Server 2. Its contents are as follows.

# cat /etc/corosync/corosync.conf
totem {
version: 2
secauth: off
cluster_name: cluster_mysql
transport: udpu
}

nodelist {
  node {
        ring0_addr: 10.1.0.71
        nodeid: 1
       }
  node {
        ring0_addr: 10.1.0.72
        nodeid: 2
       }
}

quorum {
provider: corosync_votequorum
two_node: 1
}

logging {
to_syslog: yes
}
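To compare the generated file on both servers quickly, you can pull out just the node addresses. This is a minimal sketch; list_ring0_addrs is a hypothetical helper, and the example reads a scratch copy of the nodelist shown above.

```shell
# Print the ring0_addr of every node in a corosync.conf-style file.
# list_ring0_addrs is a hypothetical helper name.
list_ring0_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }' "$1"
}

# Scratch copy of the nodelist section from this article.
conf=$(mktemp)
cat > "$conf" <<'EOF'
nodelist {
  node {
        ring0_addr: 10.1.0.71
        nodeid: 1
       }
  node {
        ring0_addr: 10.1.0.72
        nodeid: 2
       }
}
EOF
list_ring0_addrs "$conf"   # prints 10.1.0.71 and 10.1.0.72, one per line
```

On a cluster node you would run it as list_ring0_addrs /etc/corosync/corosync.conf and check that both servers print the same list.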

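With this configuration, Pacemaker logs go to /var/log/messages. /etc/corosync contains sample configuration files, so I think the logs are easier to read if you change the logging settings to write to a dedicated log file for Pacemaker.
You can change the settings by editing this file directly. Restart the cluster after changing it.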
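As a rough illustration of such a change, the sketch below prints a candidate logging section. The option names (to_logfile, logfile, to_syslog, timestamp) come from corosync.conf(5), but the log file path is my assumption, and print_logging_section is a hypothetical helper. Compare the result with the sample files under /etc/corosync before using it.

```shell
# Print a candidate logging section for corosync.conf.
# The default log file path is an assumption; pass another path as $1.
print_logging_section() {
    local logfile=${1:-/var/log/cluster/corosync.log}
    cat <<EOF
logging {
to_logfile: yes
logfile: ${logfile}
to_syslog: yes
timestamp: on
}
EOF
}

print_logging_section
```

Set to_syslog to no if you do not want the same messages in /var/log/messages as well. The section would replace the existing logging block on both servers.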

4. Starting the cluster and checking its status

Start the cluster (pacemaker + corosync) on Server 1 and Server 2.
Run the following command on Server 1.

# pcs cluster start --all
mysql01: Starting Cluster...
mysql02: Starting Cluster...

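To start and stop the cluster automatically with the OS, enable pacemaker and corosync on both Server 1 and Server 2.
Note that corosync reportedly fails to start automatically in some cases. If that happens, see the latter part of the linked page.
It started without problems in my environment.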

# systemctl enable pacemaker
# systemctl enable corosync

The cluster processes are shown below.
pacemaker and corosync are running on both Server 1 and Server 2.

# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
(... omitted ...)
root     23266     1  0 20:56 ?        00:00:00 /bin/sh /usr/lib/pcsd/pcsd start
root     23270 23266  0 20:56 ?        00:00:00 /bin/bash -c ulimit -S -c 0 >/dev/null 2>&1 ; /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.
root     23271 23270  0 20:56 ?        00:00:01 /usr/bin/ruby -I/usr/lib/pcsd /usr/lib/pcsd/ssl.rb
root     23762     2  0 21:05 ?        00:00:00 [kworker/0:0]
root     23995     2  0 21:07 ?        00:00:00 [kworker/1:0]
root     24950     1  2 21:24 ?        00:00:00 corosync
root     24965     1  0 21:24 ?        00:00:00 /usr/sbin/pacemakerd -f
haclust+ 24966 24965  0 21:24 ?        00:00:00 /usr/libexec/pacemaker/cib
root     24967 24965  0 21:24 ?        00:00:00 /usr/libexec/pacemaker/stonithd
root     24968 24965  0 21:24 ?        00:00:00 /usr/libexec/pacemaker/lrmd
haclust+ 24969 24965  0 21:24 ?        00:00:00 /usr/libexec/pacemaker/attrd
haclust+ 24970 24965  0 21:24 ?        00:00:00 /usr/libexec/pacemaker/pengine
haclust+ 24971 24965  0 21:24 ?        00:00:00 /usr/libexec/pacemaker/crmd
root     25015   626  0 21:25 ?        00:00:00 sleep 60
root     25017  2472  0 21:25 pts/0    00:00:00 ps -ef

Check the cluster status. Run this on Server 1.

# pcs status cluster
Cluster Status:
 Last updated: Mon May  4 11:19:05 2015
 Last change: Mon May  4 10:50:30 2015
 Stack: corosync
 Current DC: mysql01 (1) - partition with quorum
 Version: 1.1.12-a14efad
 2 Nodes configured
 0 Resources configured

Check the status of the cluster members (nodes). Run this on Server 1.

# pcs status nodes
Pacemaker Nodes:
 Online: mysql01 mysql02
 Standby:
 Offline:
# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.1.0.71)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.1.0.72)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 mysql01 (local)
         2          1 mysql02
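For a scripted health check, counting the members that report "joined" is often enough. This is a minimal sketch that reads corosync-cmapctl output from stdin; count_joined_members is a hypothetical helper, and the pattern assumes the key format shown above.

```shell
# Count members whose status is "joined" in corosync-cmapctl output on stdin.
# count_joined_members is a hypothetical helper name.
count_joined_members() {
    grep -c 'members\.[0-9]*\.status (str) = joined$'
}

# Sample lines copied from the output in this article.
sample='runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.1.0.71)
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.1.0.72)
runtime.totem.pg.mrp.srp.members.2.status (str) = joined'
printf '%s\n' "$sample" | count_joined_members   # prints 2
```

On a cluster node you would pipe the real command instead: corosync-cmapctl | count_joined_members.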

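Check the status of the cluster, nodes, resources, and daemons. Run this on Server 1. Note that the capture below was taken at a different time from the ones above (see its timestamps), and mysql02 appears as OFFLINE in it.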

# pcs status
Cluster name: cluster_mysql
Last updated: Mon May  4 12:29:19 2015
Last change: Mon May  4 11:01:42 2015
Stack: corosync
Current DC: mysql01 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
0 Resources configured


Online: [ mysql01 ]
OFFLINE: [ mysql02 ]

Full list of resources:


PCSD Status:
  mysql01: Online
  mysql02: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

5. Checking the cluster configuration

Check the cluster configuration. Run this on Server 1.

# crm_verify -L -V
   error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

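The errors appear because STONITH is not configured. As I understand it, STONITH is the fencing feature that forcibly powers a node's hardware off and on when Pacemaker detects split brain. The following document covers STONITH in detail.

HAクラスタをフェイルオーバ失敗から救おう！ ("Rescue your HA cluster from failover failures!", PDF, in Japanese)

STONITH is enabled by default, but I do not use it in this virtualized test environment, so disable it.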

# pcs property set stonith-enabled=false

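It is not reported as an error, but I also change the quorum setting. The following article covers quorum in detail.

Pacemakerを使いこなそう ("Mastering Pacemaker", in Japanese)

Because this is a two-node configuration, I set no-quorum-policy to ignore so that no special quorum handling kicks in even if split brain occurs.

# pcs property set no-quorum-policy=ignore

Check the properties.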

# pcs property
Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster_mysql
 dc-version: 1.1.12-a14efad
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false

No cluster restart is needed. The changes take effect immediately.
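If you want a script to confirm that the properties took effect, you can read single values out of the listing. This is a minimal sketch that parses "pcs property" style output from stdin; get_property is a hypothetical helper, and it assumes the "key: value" layout shown above.

```shell
# Print the value of one key from "pcs property" style output on stdin.
# get_property is a hypothetical helper name.
get_property() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Sample lines copied from the output in this article.
sample='Cluster Properties:
 cluster-infrastructure: corosync
 no-quorum-policy: ignore
 stonith-enabled: false'
printf '%s\n' "$sample" | get_property stonith-enabled   # prints false
```

On a cluster node you would pipe the real command instead: pcs property | get_property no-quorum-policy.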

6. Operating the cluster

Let's operate the cluster with pcs commands. Here I run the pcs commands on Server 1.

To stop the cluster, do the following.

# pcs cluster stop --all
mysql01: Stopping Cluster (pacemaker)...
mysql02: Stopping Cluster (pacemaker)...
mysql02: Stopping Cluster (corosync)...
mysql01: Stopping Cluster (corosync)...

To restart the cluster, stop it and then start it.

# pcs cluster stop --all && pcs cluster start --all

Put Server 2 into standby and check the status.

# pcs cluster standby mysql02
# pcs status nodes
Pacemaker Nodes:
 Online: mysql01
 Standby: mysql02
 Offline:

Bring Server 2 back online.

# pcs cluster unstandby mysql02
# pcs status nodes
Pacemaker Nodes:
 Online: mysql01 mysql02
 Standby:
 Offline:

Stop only Server 2.

# pcs cluster stop mysql02
mysql02: Stopping Cluster (pacemaker)...
mysql02: Stopping Cluster (corosync)...
# pcs status nodes
Pacemaker Nodes:
 Online: mysql01
 Standby:
 Offline: mysql02
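When these operations are scripted, it is handy to ask for the state of one node instead of reading the whole listing. This is a minimal sketch that parses "pcs status nodes" output from stdin; node_state is a hypothetical helper, and it only understands the three labels shown in this article (Online, Standby, Offline).

```shell
# Report whether NODE is Online, Standby, or Offline in "pcs status nodes" output on stdin.
# node_state is a hypothetical helper name; it prints Unknown if NODE is not listed.
node_state() {
    awk -v node="$1" '
        # Lines look like " Online: mysql01 mysql02"; field 1 is the state label.
        $1 ~ /^(Online|Standby|Offline):$/ {
            for (i = 2; i <= NF; i++)
                if ($i == node) { sub(/:$/, "", $1); print $1; found = 1; exit }
        }
        END { if (!found) print "Unknown" }
    '
}

# Sample copied from the output right after stopping mysql02.
sample='Pacemaker Nodes:
 Online: mysql01
 Standby:
 Offline: mysql02'
printf '%s\n' "$sample" | node_state mysql02   # prints Offline
```

On a cluster node you would pipe the real command instead: pcs status nodes | node_state mysql02.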

Sunday, October 18, 2015
