Upgrading Intel® EE for Lustre* 3.1.1.0 to Lustre* 2.10.x LTS and Intel® Manager for Lustre* 4.0
Introduction
This document describes how to upgrade an existing Lustre* server file system installation from Intel® EE for Lustre* version 3.1.1.0 running on the RHEL/CentOS 7 OS distribution to Lustre* 2.10 LTS and Intel® Manager for Lustre* version 4, also running on RHEL/CentOS 7.
CentOS is used for the examples. RHEL users will need to refer to Red Hat for instructions on enabling the High Availability add-on needed to install Pacemaker, Corosync and related support tools.
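For example, on a registered RHEL 7 system the High Availability add-on repository is typically enabled with subscription-manager. The repository name below is illustrative only and depends on the subscription; confirm the correct name with Red Hat documentation before use:
# List available repositories and locate the High Availability entry (shows the Repo ID line above the name)
subscription-manager repos --list | grep -i -B1 'high availability'
# Enable the repository (example name only; verify against the listing above)
subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms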
Risks
The process of upgrading Intel® EE for Lustre* to a newer Lustre* and Intel® Manager for Lustre* version requires careful consideration and planning. There will always be some disruption to services when major maintenance work is undertaken, although this can be contained and minimized.
The reference platform used throughout the documentation has been installed and is being managed by Intel® Manager for Lustre* (IML), but the methods for the Lustre* server components can be broadly applied to any approximately equivalent Lustre* server environment running the RHEL or CentOS OS.
Process Overview
- Upgrade the Intel® Manager for Lustre* manager server:
  - Backup the Intel® Manager for Lustre* server (IML manager) configuration and database
  - Install the latest version of the IML manager software for EL7
- Upgrade the metadata and object storage server pairs. For each HA pair:
  - Backup the server configuration for both machines
  - Upgrade the machines one at a time:
    - Failover all resources to a single node, away from the node being upgraded
    - Set the node to standby mode and disable the node in Pacemaker
    - Unload the Lustre* kernel modules
    - Unload the ZFS and SPL kernel modules, if used
    - Install the new Lustre* and Intel® Manager for Lustre* Agent software
    - Move the Pacemaker resources from the secondary node to the upgraded node
  - Update the secondary node in the same way
  - Re-enable the secondary node in the Pacemaker configuration
  - Re-balance the resources
Upgrade Intel® Manager for Lustre*
The first component in the environment to upgrade is the Intel® Manager for Lustre* server and software. The manager server upgrade can be conducted without any impact to the Lustre* file system services.
Backup the Existing Intel® Manager for Lustre* Configuration
- Backup the existing configuration. Prior to commencing the upgrade, it is essential that a backup of the existing configuration is completed. This will enable recovery of the original configuration in the event of a problem occurring during execution of the upgrade.
The following shell script can be used to capture the essential configuration information that is relevant to the Intel® Manager for Lustre* software itself:
#!/bin/sh
# EE Intel Manager for Lustre (IML) server backup script
BCKNAME=bck-$HOSTNAME-`date +%Y%m%d-%H%M%S`
BCKROOT=$HOME/$BCKNAME
mkdir -p $BCKROOT
tar cf - --exclude=/var/lib/chroma/repo \
/var/lib/chroma \
/etc/sysconfig/network \
/etc/sysconfig/network-scripts/ifcfg-* \
/etc/yum.conf \
/etc/yum.repos.d \
/etc/hosts \
/etc/passwd \
/etc/group \
/etc/shadow \
/etc/gshadow \
/etc/sudoers \
/etc/resolv.conf \
/etc/nsswitch.conf \
/etc/rsyslog.conf \
/etc/ntp.conf \
/etc/selinux/config \
/etc/ssh \
/root/.ssh \
| (cd $BCKROOT && tar xf -)

# IML Database
su - postgres -c "/usr/bin/pg_dumpall --clean" | /bin/gzip > $BCKROOT/pgbackup-`date +\%Y-\%m-\%d-\%H:\%M:\%S`.sql.gz

cd `dirname $BCKROOT`
tar zcf $BCKROOT.tgz `basename $BCKROOT`
- Copy the backup tarball to a safe location that is not on the server being upgraded.
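For example, the backup tarball could be copied to another host using scp (the destination host and directory below are placeholders; substitute a suitable backup location):
# Copy the backup off the manager server; "backuphost" and the target path are examples only
scp $BCKROOT.tgz admin@backuphost:/srv/backups/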
Note: This script is not intended to provide a comprehensive backup of the entire operating system configuration. It covers the essential components pertinent to Lustre* servers managed by Intel® Manager for Lustre* that are difficult to re-create if deleted.
Install the Intel® Manager for Lustre* Upgrade
The software upgrade process requires super-user privileges to run. Login as the root user or use sudo to elevate privileges as required.
- If upgrading from e.g. EL 7.3 to EL 7.4, run the OS upgrade first. For example:
yum clean all
yum update
reboot
Refer to the operating system documentation for details on the correct procedure for upgrading between minor OS releases.
- Download the latest Intel® Manager for Lustre* software from the project’s release page:
https://github.com/intel-hpdd/intel-manager-for-lustre/releases
- Extract the Intel® Manager for Lustre* bundle. For example:
cd $HOME
tar zxf iml-4.0.0.0.tar.gz
- As root, run the installer:
cd $HOME/iml-*
./install [-h] | [-r] [-p] [-d]
Note: For an explanation of the installation options, run the install script with the --help flag:
usage: install [-h] [--no-repo-check] [--no-platform-check] [--no-dbspace-check]

optional arguments:
  -h, --help            show this help message and exit
  --no-repo-check, -r
  --no-platform-check, -p
  --no-dbspace-check, -d
- The installation program will detect that there is a previous installation and will run the upgrade process.
- When the upgrade is complete, connect to the Intel® Manager for Lustre* service using a web browser and verify that the upgraded version has been installed and is running. Intel® Manager for Lustre* Agents running version EE 3.1 will still be able to communicate with the new version of the manager service. If the Intel® Manager for Lustre* manager browser window was left open during the upgrade process, the window must be reloaded with a hard refresh: hold the control key (Windows/Linux) or shift key (Mac) and hit the reload button. Alternatively, close the window or tab and open a fresh copy of the page.
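The installed manager packages can also be confirmed from the command line on the manager server. This is an optional, illustrative check that simply lists the installed manager packages and their versions:
# On the Intel® Manager for Lustre* server: list installed manager packages
rpm -qa 'chroma*' | sort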
Create Local Repositories for the Lustre* Packages
The Intel® Manager for Lustre* distribution does not include Lustre* software packages. These need to be acquired separately from the Lustre* project’s download server. The following instructions will establish the manager node as a YUM repository server for the network. The Intel® Manager for Lustre* server is automatically configured as a web server, so it is a convenient location for the repository mirrors.
An alternative strategy is to copy the repository definition from step 1 directly onto each Lustre* server and client (saving the file in /etc/yum.repos.d on each node), and skip the other steps that follow. This avoids creating a local repository on the manager node, and uses the Lustre* download servers directly to download packages.
Also note that the manager server distribution includes a default repository definition in /usr/share/chroma-manager/storage_server.repo. This can be copied onto each Lustre* server and client into /etc/yum.repos.d instead of using these instructions. The choice of method for distributing Lustre* onto the target nodes is a matter of preference and suitability for the target environment.
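As an illustration of the second alternative, the bundled repository definition could be pushed to a Lustre* server with scp (the server host name below is a placeholder; repeat for each Lustre* server and client):
# Run from the Intel® Manager for Lustre* server; "lustre-server-01" is an example host name
scp /usr/share/chroma-manager/storage_server.repo root@lustre-server-01:/etc/yum.repos.d/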
- Create a temporary YUM repository definition. This will be used to assist with the initial acquisition of the Lustre* and related packages.
cat >/tmp/lustre-repo.conf <<\__EOF
[lustre-server]
name=lustre-server
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-release/el7/server
exclude=*debuginfo*
gpgcheck=0

[lustre-client]
name=lustre-client
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-release/el7/client
exclude=*debuginfo*
gpgcheck=0

[e2fsprogs-wc]
name=e2fsprogs-wc
baseurl=https://downloads.hpdd.intel.com/public/e2fsprogs/latest/el7
exclude=*debuginfo*
gpgcheck=0
__EOF
Note: The above example references the latest Lustre* release available. To use a specific version, replace latest-release in the [lustre-server] and [lustre-client] baseurl variables with the version required, e.g., lustre-2.10.1. Always use the latest e2fsprogs package unless directed otherwise.
Note: With the release of Lustre* version 2.10.1, it is possible to use patchless kernels for Lustre* servers running LDISKFS. The patchless LDISKFS server distribution does not include a Linux kernel. Instead, patchless servers will use the kernel distributed with the operating system. To use patchless kernels for the Lustre* servers, replace the string server with patchless-ldiskfs-server at the end of the [lustre-server] baseurl string. For example:
baseurl=https://downloads.hpdd.intel.com/public/lustre/latest-release/el7/patchless-ldiskfs-server
Also note that the debuginfo packages are excluded in the example repository definitions. This is simply to cut down on the size of the download. It is usually a good idea to pull in these files as well, to assist with debugging of issues.
- Use the reposync command (distributed in the yum-utils package) to download mirrors of the Lustre* repositories to the manager server:
cd /var/lib/chroma/repo
reposync -c /tmp/lustre-repo.conf -n \
-r lustre-server \
-r lustre-client \
-r e2fsprogs-wc
- Create the repository metadata:
cd /var/lib/chroma/repo
for i in e2fsprogs-wc lustre-client lustre-server; do
(cd $i && createrepo .)
done
- Create a YUM repository definition file. The following script creates a file containing repository definitions for the Intel® Manager for Lustre* Agent software and the Lustre* packages downloaded in the previous section. Review the content and adjust according to the requirements of the target environment. Run the script on the upgraded Intel® Manager for Lustre* host:
hn=`hostname --fqdn`

cat >/var/lib/chroma/repo/Manager-for-Lustre.repo <<__EOF
[iml-agent]
name=Intel Manager for Lustre Agent
baseurl=https://$hn/repo/iml-agent/7
enabled=1
gpgcheck=0
sslverify = 1
sslcacert = /var/lib/chroma/authority.crt
sslclientkey = /var/lib/chroma/private.pem
sslclientcert = /var/lib/chroma/self.crt
proxy=_none_

[lustre-server]
name=lustre-server
baseurl=https://$hn/repo/lustre-server
enabled=1
gpgcheck=0
sslverify = 1
sslcacert = /var/lib/chroma/authority.crt
sslclientkey = /var/lib/chroma/private.pem
sslclientcert = /var/lib/chroma/self.crt
proxy=_none_

[lustre-client]
name=lustre-client
baseurl=https://$hn/repo/lustre-client
enabled=1
gpgcheck=0
sslverify = 1
sslcacert = /var/lib/chroma/authority.crt
sslclientkey = /var/lib/chroma/private.pem
sslclientcert = /var/lib/chroma/self.crt
proxy=_none_

[e2fsprogs-wc]
name=e2fsprogs-wc
baseurl=https://$hn/repo/e2fsprogs-wc
enabled=1
gpgcheck=0
sslverify = 1
sslcacert = /var/lib/chroma/authority.crt
sslclientkey = /var/lib/chroma/private.pem
sslclientcert = /var/lib/chroma/self.crt
proxy=_none_
__EOF
This file needs to be distributed to each of the Lustre* servers during the upgrade process to facilitate installation of the software.
Make sure that the $hn variable matches the host name that will be used by the Lustre* servers to access the Intel® Manager for Lustre* host.
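One way to confirm the host name choice is to check, from one of the Lustre* servers, that the name resolves and that the repository file is reachable over HTTPS. The host name admin.example.com below is a placeholder for the value assigned to $hn, and the check assumes the agent certificates are already present on the server (they are on hosts managed by Intel® Manager for Lustre*):
# On a Lustre* server; replace admin.example.com with the manager host name used in $hn
getent hosts admin.example.com
curl --head \
--cacert /var/lib/chroma/authority.crt \
--cert /var/lib/chroma/self.crt \
--key /var/lib/chroma/private.pem \
https://admin.example.com/repo/Manager-for-Lustre.repo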
Upgrade the Lustre* Servers
Lustre* server upgrades can be coordinated as either an online roll-out, leveraging the failover HA mechanism to migrate services between nodes and minimize disruption, or as an offline service outage, which has the advantage of usually being faster to deploy overall, with generally lower risk.
The upgrade procedure documented here shows how to execute the upgrade while the file system is online. It assumes that the Lustre* servers have been installed in pairs, where each server pair forms an independent high-availability cluster built on Pacemaker and Corosync. Intel® Manager for Lustre* deploys these configurations and includes its own resource agent for managing Lustre* assets, called ocf:chroma:Target. Intel® Manager for Lustre* can also configure STONITH agents to provide node fencing in the event of a cluster partition or loss of quorum.
This documentation will demonstrate how to upgrade a single Lustre* server HA pair. The process needs to be repeated for all servers in the cluster. It is possible to execute this upgrade while services are still online, with only minor disruption during critical phases. Nevertheless, where possible it is recommended that the upgrade operation is conducted during a planned maintenance window with the file system stopped.
In the procedure, “Node 1” or “first node” will be used to refer to the first server in each HA cluster to upgrade, and “Node 2” or “second node” will be used to refer to the second server in each HA cluster pair.
Upgrade one server at a time in each cluster pair, starting with Node 1, and make sure the upgrade is complete on one server before moving on to the second server in the pair.
The software upgrade process requires super-user privileges to run. Login as the root user or use sudo to elevate privileges as required to complete tasks.
Note: It is recommended that Lustre* clients are quiesced prior to executing the upgrade. This reduces the risk of disruption to the upgrade process itself. It is not usually necessary to unmount the Lustre* clients, although this is a sensible precaution.
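For example, a client mount can be released as a precaution before the server upgrade begins (the mount point /mnt/lustre is only an example; adjust to match the client configuration):
# On a Lustre* client (optional precaution)
umount /mnt/lustre
# or unmount every Lustre* file system mounted on the client:
umount -a -t lustre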
Prepare for the Upgrade
Upgrade one server at a time in each cluster pair, starting with Node 1, and make sure the upgrade is complete on one server before moving on to the second server in the pair.
- As a precaution, create a backup of the existing configuration for each server. The following shell script can be used to capture the essential configuration information that is relevant to Intel® Manager for Lustre* managed mode servers:
#!/bin/sh
BCKNAME=bck-$HOSTNAME-`date +%Y%m%d-%H%M%S`
BCKROOT=$HOME/$BCKNAME
mkdir -p $BCKROOT
tar cf - \
/var/lib/chroma \
/etc/sysconfig/network \
/etc/sysconfig/network-scripts/ifcfg-* \
/etc/yum.conf \
/etc/yum.repos.d \
/etc/hosts \
/etc/passwd \
/etc/group \
/etc/shadow \
/etc/gshadow \
/etc/sudoers \
/etc/resolv.conf \
/etc/nsswitch.conf \
/etc/rsyslog.conf \
/etc/ntp.conf \
/etc/selinux/config \
/etc/modprobe.d/iml_lnet_module_parameters.conf \
/etc/corosync/corosync.conf \
/etc/ssh \
/root/.ssh \
| (cd $BCKROOT && tar xf -)

# Pacemaker Configuration:
cibadmin --query > $BCKROOT/cluster-cfg-$HOSTNAME.xml

cd `dirname $BCKROOT`
tar zcf $BCKROOT.tgz `basename $BCKROOT`
Note: This is not intended to be a comprehensive backup of the entire operating system configuration. It covers the essential components pertinent to Lustre* servers managed by Intel® Manager for Lustre* that are difficult to re-create if deleted.
- Copy the backups for each server’s configuration to a safe location that is not on the servers being upgraded.
Online Upgrade
The upgrade procedure documented here shows how to execute the upgrade while the file system is online. It assumes that the Lustre* servers have been installed by Intel® Manager for Lustre* in pairs, where each server pair forms an independent high-availability cluster built on Pacemaker and Corosync.
Migrate the Resources on Node 1
- Login to node 1 and fail over all resources to the standby node. The easiest way to do this is to set the server to standby mode:
pcs cluster standby [<node name>]
Note: If the node name is omitted from the command, only the node currently logged into will be affected.
- Verify that the resources have moved to the second node and that the resources are running (all storage targets are mounted on the second node only):
# On node 2:
pcs resource show
df -ht lustre
- Disable the node that is being upgraded:
pcs cluster disable [<node name>]
Note: This command prevents the Corosync and Pacemaker services from starting automatically on system boot. The command is a simple and useful way to ensure that when the server is rebooted, it won’t try to join the cluster and disrupt services running on the secondary node. If the node name is omitted from the command, only the node currently logged into will be affected.
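As an optional check, confirm that the cluster services will no longer start automatically at boot on the node being upgraded; both services should report disabled:
# On node 1
systemctl is-enabled corosync pacemaker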
[If Needed] Upgrade the OS on Node 1
- Login to node 1.
- If upgrading from e.g. EL 7.3 to EL 7.4, run the OS upgrade first. For example:
yum clean all
yum update
reboot
Refer to the operating system documentation for details on the correct procedure for upgrading between minor OS releases.
Install the EPEL Repository Definition on Node 1
- Login to node 1.
- Install EPEL repository support:
yum -y install epel-release
Upgrade the Intel® Manager for Lustre* Agent on Node 1
- Login to node 1.
- Install the Intel® Manager for Lustre* COPR Repository definition, which contains some dependencies for the Intel® Manager for Lustre* Agent:
yum-config-manager --add-repo \
https://copr.fedorainfracloud.org/coprs/managerforlustre/manager-for-lustre/repo/epel-7/managerforlustre-manager-for-lustre-epel-7.repo
- Install the DNF project COPR Repository definition. DNF is a package manager, and is used as a replacement for YUM in many distributions, such as Fedora. It does not replace YUM in CentOS, but Intel® Manager for Lustre* does make use of some of the newer features in DNF for some of its tasks:
yum-config-manager --add-repo \
https://copr.fedorainfracloud.org/coprs/ngompa/dnf-el7/repo/epel-7/ngompa-dnf-el7-epel-7.repo
- Remove or rename the old Intel® Manager for Lustre* YUM repository definition:
mv /etc/yum.repos.d/Intel-Lustre-Agent.repo \
$HOME/Intel-Lustre-Agent.repo.bak
- Install the Intel® Manager for Lustre* Agent repository definition:
curl -o /etc/yum.repos.d/Manager-for-Lustre.repo \
--cacert /var/lib/chroma/authority.crt \
--cert /var/lib/chroma/self.crt \
--key /var/lib/chroma/private.pem \
https://<admin server>/repo/Manager-for-Lustre.repo
Replace <admin server> in the https URL with the appropriate Intel® Manager for Lustre* hostname (normally the fully-qualified domain name).
- Upgrade the Intel® Manager for Lustre* Agent and Diagnostics packages:
yum -y install chroma-\*
Note: The following warning during install / update is benign and can be ignored:
...
/var/tmp/rpm-tmp.MV5LGi: line 5: /selinux/enforce: No such file or directory
[]
{
  "result": null,
  "success": true
}
...
Upgrade the Lustre* Server Software on Node 1
- Stop ZED and unload the ZFS and SPL modules:
systemctl stop zed
rmmod zfs zcommon znvpair spl
- Remove the Lustre* ZFS and SPL packages, as they will be replaced with upgraded packages. The upgrade process will fail if these packages are not preemptively removed:
yum erase zfs-dkms spl-dkms \
lustre \
lustre-modules \
lustre-osd-ldiskfs \
lustre-osd-zfs \
lustre-osd-ldiskfs-mount \
lustre-osd-zfs-mount \
libzpool2 \
libzfs2
- Clean up the YUM cache to remove any residual information on old repositories and package versions:
yum clean all
Note: If an error similar to the following is observed when installing packages at any point in the process, re-run the yum clean all command, then re-try the install:
libzfs2-0.6.5.7-1.el7.x86_64.r FAILED
http://download.zfsonlinux.org/epel/7.3/x86_64/libzfs2-0.6.5.7-1.el7.x86_64.rpm: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
To address this issue please refer to the below knowledge base article

https://access.redhat.com/articles/1320623

If above article doesn't help to resolve this issue please create a bug on https://bugs.centos.org/
...
spl-dkms-0.6.5.7-1.el7.noarch. FAILED
http://download.zfsonlinux.org/epel/7.3/x86_64/spl-dkms-0.6.5.7-1.el7.noarch.rpm: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
libzpool2-0.6.5.7-1.el7.x86_64 FAILED
http://download.zfsonlinux.org/epel/7.3/x86_64/libzpool2-0.6.5.7-1.el7.x86_64.rpm: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
zfs-dkms-0.6.5.7-1.el7.noarch. FAILED
http://download.zfsonlinux.org/epel/7.3/x86_64/zfs-dkms-0.6.5.7-1.el7.noarch.rpm: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
Error downloading packages:
  spl-dkms-0.6.5.7-1.el7.noarch: [Errno 256] No more mirrors to try.
  libzpool2-0.6.5.7-1.el7.x86_64: [Errno 256] No more mirrors to try.
  zfs-dkms-0.6.5.7-1.el7.noarch: [Errno 256] No more mirrors to try.
  libzfs2-0.6.5.7-1.el7.x86_64: [Errno 256] No more mirrors to try.
- For systems that have both LDISKFS and ZFS OSDs:
Note: This is the configuration that Intel® Manager for Lustre* installs for all managed-mode Lustre* storage clusters. Use this configuration for the highest level of compatibility with Intel® Manager for Lustre*.
- Install the Lustre* e2fsprogs distribution:
yum --nogpgcheck --disablerepo=* --enablerepo=e2fsprogs-wc \
install e2fsprogs e2fsprogs-devel
- If the installed kernel-tools and kernel-tools-libs packages are at a higher revision than the patched kernel packages in the Lustre* server repository, they will need to be removed:
yum erase kernel-tools kernel-tools-libs
- Install the Lustre-patched kernel packages. Ensure that the Lustre* repository is picked for the kernel packages, by disabling the OS repos:
yum --nogpgcheck --disablerepo=base,extras,updates \
install \
kernel \
kernel-devel \
kernel-headers \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
- Install additional development packages. These are needed to enable support for some of the newer features in Lustre* – if certain packages are not detected by the Lustre* configure script when its DKMS package is installed, the features will not be enabled when Lustre* is compiled. Notable in this list are krb5-devel and libselinux-devel, needed for Kerberos and SELinux support, respectively.
yum install \
asciidoc audit-libs-devel automake bc \
binutils-devel bison device-mapper-devel elfutils-devel \
elfutils-libelf-devel expect flex gcc gcc-c++ git \
glib2 glib2-devel hmaccalc keyutils-libs-devel krb5-devel ksh \
libattr-devel libblkid-devel libselinux-devel libtool \
libuuid-devel libyaml-devel lsscsi make ncurses-devel \
net-snmp-devel net-tools newt-devel numactl-devel \
parted patchutils pciutils-devel perl-ExtUtils-Embed \
pesign python-devel redhat-rpm-config rpm-build systemd-devel \
tcl tcl-devel tk tk-devel wget xmlto yum-utils zlib-devel
- Ensure that a persistent hostid has been generated on the machine. If necessary, generate a persistent hostid (needed to help protect zpools against simultaneous imports on multiple servers). For example:
hid=`[ -f /etc/hostid ] && od -An -tx /etc/hostid|sed 's/ //g'`
[ "$hid" = `hostid` ] || genhostid
- Reboot the node:
reboot
- Install the metapackage that will install Lustre* and the LDISKFS and ZFS ‘kmod’ packages:
yum --nogpgcheck install lustre-ldiskfs-zfs
- Verify that the DKMS kernel modules for Lustre*, SPL and ZFS have installed correctly:
dkms status
All packages should have the status installed. For example:
# dkms status
lustre, 2.10.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
spl, 0.7.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
zfs, 0.7.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
- Load the Lustre* and ZFS kernel modules to verify that the software has installed correctly:
modprobe -v zfs
modprobe -v lustre
- For systems that use only LDISKFS OSDs:
- Upgrade the Lustre* e2fsprogs distribution:
yum --nogpgcheck \
--disablerepo=* --enablerepo=e2fsprogs-wc install e2fsprogs
- If the installed kernel-tools and kernel-tools-libs packages are at a higher revision than the patched kernel packages in the Lustre* server repository, they will need to be removed:
yum erase kernel-tools kernel-tools-libs
- Install the Lustre* patched kernel packages. Ensure that the Lustre* repository is picked for the kernel packages, by disabling the OS repos:
yum --nogpgcheck --disablerepo=base,extras,updates \
install \
kernel \
kernel-devel \
kernel-headers \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
- Reboot the node:
reboot
- Install the LDISKFS kmod and other Lustre* packages:
yum --nogpgcheck install \
kmod-lustre \
kmod-lustre-osd-ldiskfs \
lustre-osd-ldiskfs-mount \
lustre \
lustre-resource-agents
- Load the Lustre* kernel modules to verify that the software has installed correctly:
modprobe -v lustre
- For systems with only ZFS-based OSDs:
- Install the kernel packages that match the latest supported version for the Lustre* release:
yum install \
kernel \
kernel-devel \
kernel-headers \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
It may be necessary to specify the kernel package version number in order to ensure that a kernel that is compatible with Lustre* is installed. For example, Lustre* 2.10.1 has support for RHEL kernel 3.10.0-693.2.2.el7:
yum install \
kernel-3.10.0-693.2.2.el7 \
kernel-devel-3.10.0-693.2.2.el7 \
kernel-headers-3.10.0-693.2.2.el7 \
kernel-tools-3.10.0-693.2.2.el7 \
kernel-tools-libs-3.10.0-693.2.2.el7 \
kernel-tools-libs-devel-3.10.0-693.2.2.el7
Note: If the kernel-tools and kernel-tools-libs packages that have been installed on the host prior to running this command are at a higher revision than the kernel version supported by Lustre*, they will need to be removed first:
yum erase kernel-tools kernel-tools-libs
Refer to the Lustre* Changelog for the list of supported kernels.
- Install additional development packages. These are needed to enable support for some of the newer features in Lustre* – if certain packages are not detected by the Lustre* configure script when its DKMS package is installed, the features will not be enabled when Lustre* is compiled. Notable in this list are krb5-devel and libselinux-devel, needed for Kerberos and SELinux support, respectively.
yum install \
asciidoc audit-libs-devel automake bc \
binutils-devel bison device-mapper-devel elfutils-devel \
elfutils-libelf-devel expect flex gcc gcc-c++ git \
glib2 glib2-devel hmaccalc keyutils-libs-devel krb5-devel ksh \
libattr-devel libblkid-devel libselinux-devel libtool \
libuuid-devel libyaml-devel lsscsi make ncurses-devel \
net-snmp-devel net-tools newt-devel numactl-devel \
parted patchutils pciutils-devel perl-ExtUtils-Embed \
pesign python-devel redhat-rpm-config rpm-build systemd-devel \
tcl tcl-devel tk tk-devel wget xmlto yum-utils zlib-devel
- Ensure that a persistent hostid has been generated on the machine. If necessary, generate a persistent hostid (needed to help protect zpools against simultaneous imports on multiple servers). For example:
hid=`[ -f /etc/hostid ] && od -An -tx /etc/hostid|sed 's/ //g'`
[ "$hid" = `hostid` ] || genhostid
- Reboot the node:
reboot
- Install the packages for Lustre* and ZFS:
yum --nogpgcheck install \
lustre-dkms \
lustre-osd-zfs-mount \
lustre \
lustre-resource-agents \
zfs
- Load the Lustre* and ZFS kernel modules to verify that the software has installed correctly:
modprobe -v zfs
modprobe -v lustre
Start the Cluster Framework on Node 1
- Login to node 1 and start the cluster framework as follows:
pcs cluster start
- Take node 1 out of standby mode:
pcs cluster unstandby [<node 1>]
Note: If the node name is omitted from the command, the currently logged in node will be removed from standby mode.
- Verify that the node is active:
pcs status
Node 1 should now be in the online state.
Migrate the Lustre* Services to Node 1
- Verify that Pacemaker is able to identify node 1 as a valid target for hosting its preferred resources:
pcs resource relocate show
The command output should list the set of resources that would be relocated to node 1 according to the location constraints applied to each resource.
- Nominate a single cluster resource and move it back to node 1 to validate the upgraded node configuration:
pcs resource relocate run <resource name>
- Verify that the resource has started successfully on node 1:
pcs status
df -ht lustre
- Once the first resource has been successfully started on node 1, migrate all of the remaining resources away from node 2. The easiest way to do this is to set node 2 to standby mode, using the following command:
pcs cluster standby <node 2>
Note: If the node name is omitted from the command, the node currently logged into will be affected. To be completely safe, log into node 2 to run the command. When specifying the node, ensure that the node name exactly matches the name used in the Pacemaker configuration. If Pacemaker uses the FQDN of a host, then the FQDN must be used in the pcs commands.
Verify that the resources have moved to node 1 and that the resources are running (all storage targets are mounted on the second node only):
# On node 1: pcs resource show df -ht lustre
- Disable node 2 in the cluster framework:
pcs cluster disable <node 2>
Node 1 upgrade is complete.
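Before moving on to node 2, an optional sanity check on node 1 can confirm the running Lustre* version and the resource placement:
# On node 1 (illustrative check only)
lctl get_param version
pcs status
df -ht lustre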
[If Needed] Upgrade the OS on Node 2
- Login to node 2.
- If upgrading from e.g. EL 7.3 to EL 7.4, run the OS upgrade first. For example:
yum clean all
yum update
reboot
Refer to the operating system documentation for details on the correct procedure for upgrading between minor OS releases.
Install the EPEL Repository Definition on Node 2
- Login to node 2.
- Install EPEL repository support:
yum -y install epel-release
Upgrade the Intel® Manager for Lustre* Agent on Node 2
- Login to node 2.
- Install the Intel® Manager for Lustre* COPR Repository definition, which contains some dependencies for the Intel® Manager for Lustre* Agent:
yum-config-manager --add-repo \
https://copr.fedorainfracloud.org/coprs/managerforlustre/manager-for-lustre/repo/epel-7/managerforlustre-manager-for-lustre-epel-7.repo
- Install the DNF project COPR Repository definition. DNF is a package manager, and is used as a replacement for YUM in many distributions, such as Fedora. It does not replace YUM in CentOS, but Intel® Manager for Lustre* does make use of some of the newer features in DNF for some of its tasks:
yum-config-manager --add-repo \
https://copr.fedorainfracloud.org/coprs/ngompa/dnf-el7/repo/epel-7/ngompa-dnf-el7-epel-7.repo
- Remove or rename the old Intel® Manager for Lustre* YUM repository definition:
mv /etc/yum.repos.d/Intel-Lustre-Agent.repo \
$HOME/Intel-Lustre-Agent.repo.bak
- Install the Intel® Manager for Lustre* Agent repository definition:
curl -o /etc/yum.repos.d/Manager-for-Lustre.repo \
--cacert /var/lib/chroma/authority.crt \
--cert /var/lib/chroma/self.crt \
--key /var/lib/chroma/private.pem \
https://<admin server>/repo/Manager-for-Lustre.repo
Replace <admin server> in the https URL with the appropriate Intel® Manager for Lustre* hostname (normally the fully-qualified domain name).
- Upgrade the Intel® Manager for Lustre* Agent and Diagnostics packages:
yum -y install chroma-\*
Note: The following warning during install / update is benign and can be ignored:
...
/var/tmp/rpm-tmp.MV5LGi: line 5: /selinux/enforce: No such file or directory
[]
{
  "result": null,
  "success": true
}
...
Upgrade the Lustre* Server Software on Node 2
- Stop ZED and unload the ZFS and SPL modules:
systemctl stop zed
rmmod zfs zcommon znvpair spl
- Remove the Lustre* ZFS and SPL packages, as they will be replaced with upgraded packages. The upgrade process will fail if these packages are not preemptively removed:
yum erase zfs-dkms spl-dkms \
lustre \
lustre-modules \
lustre-osd-ldiskfs \
lustre-osd-zfs \
lustre-osd-ldiskfs-mount \
lustre-osd-zfs-mount \
libzpool2 \
libzfs2
- Clean up the YUM cache to remove any residual information on old repositories and package versions:
yum clean all
- For systems that have both LDISKFS and ZFS OSDs:
Note: This is the configuration that Intel® Manager for Lustre* installs for all managed-mode Lustre* storage clusters. Use this configuration for the highest level of compatibility with Intel® Manager for Lustre*.
- Install the Lustre* e2fsprogs distribution:
yum --nogpgcheck --disablerepo=* --enablerepo=e2fsprogs-wc \
install e2fsprogs e2fsprogs-devel
- If the installed kernel-tools and kernel-tools-libs packages are at a higher revision than the patched kernel packages in the Lustre* server repository, they will need to be removed:
yum erase kernel-tools kernel-tools-libs
- Install the Lustre-patched kernel packages. Ensure that the Lustre* repository is picked for the kernel packages, by disabling the OS repos:
yum --nogpgcheck --disablerepo=base,extras,updates \
install \
kernel \
kernel-devel \
kernel-headers \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
- Install additional development packages. These are needed to enable support for some of the newer features in Lustre* – if certain packages are not detected by the Lustre* configure script when its DKMS package is installed, the features will not be enabled when Lustre* is compiled. Notable in this list are krb5-devel and libselinux-devel, needed for Kerberos and SELinux support, respectively.
yum install asciidoc audit-libs-devel automake bc \
binutils-devel bison device-mapper-devel elfutils-devel \
elfutils-libelf-devel expect flex gcc gcc-c++ git glib2 \
glib2-devel hmaccalc keyutils-libs-devel krb5-devel ksh \
libattr-devel libblkid-devel libselinux-devel libtool \
libuuid-devel libyaml-devel lsscsi make ncurses-devel \
net-snmp-devel net-tools newt-devel numactl-devel \
parted patchutils pciutils-devel perl-ExtUtils-Embed \
pesign python-devel redhat-rpm-config rpm-build systemd-devel \
tcl tcl-devel tk tk-devel wget xmlto yum-utils zlib-devel
- Ensure that a persistent hostid has been generated on the machine. If necessary, generate a persistent hostid (needed to help protect zpools against simultaneous imports on multiple servers). For example:
hid=`[ -f /etc/hostid ] && od -An -tx /etc/hostid|sed 's/ //g'`
[ "$hid" = `hostid` ] || genhostid
- Reboot the node:
reboot
- Install the metapackage that will install Lustre* and the LDISKFS and ZFS ‘kmod’ packages:
yum --nogpgcheck install lustre-ldiskfs-zfs
- Verify that the DKMS kernel modules for Lustre*, SPL and ZFS have installed correctly:
dkms status
All packages should have the status installed. For example:
# dkms status
lustre, 2.10.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
spl, 0.7.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
zfs, 0.7.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
- Load the Lustre* and ZFS kernel modules to verify that the software has installed correctly:
modprobe -v zfs
modprobe -v lustre
- For systems that use only LDISKFS OSDs:
- Upgrade the Lustre* e2fsprogs distribution:
yum --nogpgcheck \
--disablerepo=* --enablerepo=e2fsprogs-wc install e2fsprogs
- If the installed kernel-tools and kernel-tools-libs packages are at a higher revision than the patched kernel packages in the Lustre* server repository, they will need to be removed:
yum erase kernel-tools kernel-tools-libs
- Install the Lustre* patched kernel packages. Ensure that the Lustre* repository is picked for the kernel packages, by disabling the OS repos:
yum --nogpgcheck --disablerepo=base,extras,updates \
install \
kernel \
kernel-devel \
kernel-headers \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
- Reboot the node:
reboot
- Install the LDISKFS kmod and other Lustre* packages:
yum --nogpgcheck install \
kmod-lustre \
kmod-lustre-osd-ldiskfs \
lustre-osd-ldiskfs-mount \
lustre \
lustre-resource-agents
- Load the Lustre* kernel modules to verify that the software has installed correctly:
modprobe -v lustre
- For systems with only ZFS-based OSDs:
- Install the kernel packages that match the latest supported version for the Lustre* release:
yum install \
kernel \
kernel-devel \
kernel-headers \
kernel-tools \
kernel-tools-libs \
kernel-tools-libs-devel
It may be necessary to specify the kernel package version number in order to ensure that a kernel that is compatible with Lustre* is installed. For example, Lustre* 2.10.1 has support for RHEL kernel 3.10.0-693.2.2.el7:
yum install \
kernel-3.10.0-693.2.2.el7 \
kernel-devel-3.10.0-693.2.2.el7 \
kernel-headers-3.10.0-693.2.2.el7 \
kernel-tools-3.10.0-693.2.2.el7 \
kernel-tools-libs-3.10.0-693.2.2.el7 \
kernel-tools-libs-devel-3.10.0-693.2.2.el7
Note: If the kernel-tools and kernel-tools-libs packages that have been installed on the host prior to running this command are at a higher revision than the kernel version supported by Lustre*, they will need to be removed first:
yum erase kernel-tools kernel-tools-libs
Refer to the Lustre* Changelog for the list of supported kernels.
- Install additional development packages. These are needed to enable support for some of the newer features in Lustre* – if certain packages are not detected by the Lustre* configure script when its DKMS package is installed, the features will not be enabled when Lustre* is compiled. Notable in this list are krb5-devel and libselinux-devel, needed for Kerberos and SELinux support, respectively.
yum install \
asciidoc audit-libs-devel automake bc \
binutils-devel bison device-mapper-devel elfutils-devel \
elfutils-libelf-devel expect flex gcc gcc-c++ git \
glib2 glib2-devel hmaccalc keyutils-libs-devel krb5-devel ksh \
libattr-devel libblkid-devel libselinux-devel libtool \
libuuid-devel libyaml-devel lsscsi make ncurses-devel \
net-snmp-devel net-tools newt-devel numactl-devel \
parted patchutils pciutils-devel perl-ExtUtils-Embed \
pesign python-devel redhat-rpm-config rpm-build systemd-devel \
tcl tcl-devel tk tk-devel wget xmlto yum-utils zlib-devel
- Ensure that a persistent hostid has been generated on the machine. If necessary, generate a persistent hostid (needed to help protect zpools against simultaneous imports on multiple servers). For example:
hid=`[ -f /etc/hostid ] && od -An -tx /etc/hostid|sed 's/ //g'`
[ "$hid" = `hostid` ] || genhostid
- Reboot the node:
reboot
- Install the packages for Lustre* and ZFS:
yum --nogpgcheck install \
lustre-dkms \
lustre-osd-zfs-mount \
lustre \
lustre-resource-agents \
zfs
- Verify that the DKMS kernel modules for Lustre*, SPL and ZFS have installed correctly:
dkms status
All packages should have the status installed. For example:
# dkms status
lustre, 2.10.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
spl, 0.7.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
zfs, 0.7.1, 3.10.0-693.2.2.el7_lustre.x86_64, x86_64: installed
- Load the Lustre* and ZFS kernel modules to verify that the software has installed correctly:
modprobe -v zfs
modprobe -v lustre
Start the Cluster Framework on Node 2
- Login to node 2 and start the cluster framework as follows:
pcs cluster start
- Take node 2 out of standby mode:
pcs cluster unstandby [<node 2>]
Note: If the node name is omitted from the command, the currently logged in node will be removed from standby mode.
- Verify that the node is active:
pcs status
Node 2 should now be in the online state.
Rebalance the Distribution of Lustre* Resources Across all Cluster Nodes
- Verify that Pacemaker is able to identify node 2 as a valid target for hosting its preferred resources:
pcs resource relocate show
The command output should list the set of resources that would be relocated to node 2 according to the location constraints applied to each resource.
- Nominate a single cluster resource and move it back to node 2 to validate the upgraded node configuration:
pcs resource relocate run <resource name>
- Verify that the resource has started successfully on node 2:
pcs status
df -ht lustre
- Relocate the remaining cluster resources to their preferred nodes:
pcs resource relocate run
Note: When the relocate run command is invoked without any resources in the argument list, all of the resources are evaluated and moved to their preferred nodes.
Node 2 upgrade is complete.
Repeat this procedure on all Lustre* server HA cluster pairs for the file system.
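Once every server pair has been upgraded, an optional end-to-end check from a Lustre* client confirms that all targets are mounted and reporting capacity, and shows the client-side Lustre* version:
# On a Lustre* client (optional verification)
lfs df -h
lctl get_param version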
* Other names and brands may be claimed as the property of others.