VERITAS CLUSTER SERVER™
- Support for a wide range of applications, SAN configurations and client/server configurations
- VERITAS Cluster Server™ protects a wide variety of applications in UNIX and Windows clusters ranging from a single database instance to large, multi-application clusters in networked storage environments.
- Using Cluster Server, administrators can construct availability clusters of up to 32 nodes, optimising existing resources while protecting critical applications.
- Administrators can define multiple policy-based failover scenarios to meet individual application uptime requirements.
- Administrators can manage availability and performance proactively with intelligent workload management and ad-hoc failover for administrative tasks.
Clusters
- A VCS cluster consists of multiple systems connected in various combinations to shared storage devices. All systems within a cluster have the same cluster ID, and are connected by redundant private networks over which they communicate by heartbeats, signals sent periodically from one system to another.
- Within a single VCS cluster, all nodes must run the same operating system.
Common Configurations
Asymmetric
- Application runs on a primary server
- Secondary, or "backup," server is present to take over when the primary fails.
- Backup server is passive, meaning it is not configured to perform any other functions.
In the following illustration, a database application is moved, or "failed over," from the primary to the backup. Notice the IP address used by the clients moves as well. If IP addresses were not moved, all clients would have to be updated on each server failover.
Symmetric
- Each server is configured to run a specific application or service
- Each server provides backup for its peer.
In the example below, the file server fails and its peer takes on both roles. Notice the surviving peer server now has two IP addresses assigned.
- When a server fails in this configuration, performance level remains acceptable for the short time it takes to restore the server.
Advanced Configurations
N + 1
- In advanced N + 1 configurations, an extra, or "spare," server in the cluster provides spare capacity only; it runs no applications during normal operation.
- When a primary server fails, the application restarts on the spare. When the original, primary server is repaired, it then becomes the spare server.
- This configuration eliminates the need for a second application failure to fail back the service group to the primary system.
- Any server can act as the spare to any other server.
- This allows clusters of eight or more nodes to use a single spare server.
- Cascading failover can also accommodate multiple server failures; however, this requires thorough testing and planning.
N-to-N
N-to-N clustering is at the core of HA architecture supporting multiple applications.
- N-to-N refers to multiple service groups running on multiple servers, with each service group capable of being failed over to different servers in the cluster.
- For example, consider a four-node cluster in which each node supports three critical database instances. If any node fails, each of its instances is restarted on a different node, ensuring no single node becomes overloaded.
- This configuration is a logical evolution of N + 1: it provides cluster standby capacity instead of a standby server.
- In an N-to-N configuration, cascading failover is also possible.
- A further benefit of an N-to-N configuration is that application compatibility problems surface early in testing, because every server must be able to host every service group.
- Configuring cascading failover is largely a matter of determining the additional load each server can handle; a hedged main.cf sketch of the failover priorities follows this list.
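As a hedged illustration (node and group names are hypothetical), the failover priorities behind an N-to-N layout can be expressed in main.cf through each service group's SystemList and AutoStartList attributes; lower SystemList values indicate higher priority:

group db1 (
    SystemList = { nodeA = 0, nodeB = 1, nodeC = 2, nodeD = 3 }
    AutoStartList = { nodeA }
    )

group db2 (
    SystemList = { nodeB = 0, nodeC = 1, nodeD = 2, nodeA = 3 }
    AutoStartList = { nodeB }
    )

Because each group prefers a different node and fails over in its own order, the load of a failed node is spread across the survivors rather than landing on a single standby.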
Cluster Communications
- Cluster communications ensure VCS is continuously aware of the status of each system's service groups and resources.
- They enable VCS to recognize which systems are active members of the cluster, which are joining or leaving the cluster, and which have failed.
- On each cluster system, agents monitor the status of resources and communicate the status to the high-availability daemon, "HAD."
- HAD then communicates the status on the local system to other systems in the cluster via the Group Membership Services/Atomic Broadcast protocol (GAB) and the Low Latency Transport (LLT).
Key Components of Cluster Communications
Group Membership Services/Atomic Broadcast (GAB)
- GAB is the mechanism for monitoring cluster memberships, tracking cluster state, and distributing the information to cluster systems.
- In VCS, cluster membership is defined as all systems configured with the same cluster ID interconnected via a pair of redundant heartbeat networks.
- During standard operation, all systems configured as part of the physical cluster during system installation are actively participating in cluster communications.
- Cluster membership enables VCS to dynamically track the entire cluster topology.
- Systems join a cluster by issuing a "Cluster Join" message during GAB startup.
- Cluster membership is maintained by the use of heartbeats, signals sent periodically from one system to another to verify the systems are active.
- When systems stop receiving heartbeats from a peer for the interval specified in the Heartbeat Timeout attribute, the system in question is marked DOWN and excluded from the cluster. Its applications are then migrated to the other systems.
- Cluster state refers to tracking the status of all resources and groups in the cluster. This is the function of the atomic broadcast capability of GAB. Atomic broadcast ensures all systems within the cluster are immediately notified of changes in resource status, cluster membership, and configuration.
- Atomic means all systems receive updates, or are "rolled back" to the previous state, much like a database atomic commit. If a failure occurs while transmitting status changes, GAB's atomicity ensures that upon recovery, all systems have the same information regarding the status of any monitored resource in the cluster.
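A quick way to confirm membership on a running node is the gabconfig -a command, which lists current GAB port memberships (port a is GAB itself, port h is the VCS engine, HAD); output formats vary by release:

# list current GAB port memberships on this system
gabconfig -a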
HAD
- The high-availability daemon, HAD, is the primary process running on each system and is sometimes referred to as the "VCS engine."
- It receives information from various agents regarding resources on the local system and forwards the information to each member system.
- It receives information from other cluster members, which it uses to update its own "view" of the cluster. HAD is monitored and, when required, restarted by a process called "hashadow," which also runs on each system in the cluster.
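To check that the engine and its watchdog are alive on a node, a minimal sketch (standard commands, output omitted) is:

# confirm the had and hashadow processes are running
ps -ef | egrep 'had|hashadow' | grep -v egrep
# summary of cluster, system, and group states as seen by HAD
hastatus -sum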
Low Latency Transport (LLT)
- LLT is the mechanism that provides communications between systems in a cluster.
- It provides fast, kernel-to-kernel communications, and monitors network connections.
- It serves as a replacement for the IP stack on systems, and runs directly on top of the Data Link Protocol Interface (DLPI) layer on UNIX, and the Network Driver Interface Specification (NDIS) on Windows. Using LLT rather than IP removes latency and overhead associated with the IP stack, and ensures that events such as state changes are reflected more quickly.
- LLT distributes ("load-balances") internode communication across the private network links; that is, cluster state information is distributed evenly across all private links (up to eight) to ensure performance and fault resilience. When a link fails, traffic is redirected to the remaining links (see the lltstat sketch after this list).
- LLT is also responsible for sending and receiving heartbeat traffic over network links. The frequency of heartbeats can be set in the file /etc/llttab. Heartbeats are used to determine the "health" of nodes in the cluster.
- LLT also informs GAB if communications to a peer are reliable or unreliable. A peer connection is reliable if more than one network link exists between them.
- LLT monitors multiple links and routes network traffic over the surviving links.
- For reliable communication to work, it is critical the networks fail independently. LLT supports multiple independent links between systems. Using different interfaces and connecting infrastructure reduces the risk of simultaneous link failure and increases overall reliability.
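The state of LLT links and peer nodes can be inspected with lltstat; the options below are standard, though the output format varies by release:

# show LLT node and link status for this system
lltstat -n
# verbose view: the state of every link to every peer node
lltstat -nvv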
LLT's Low Priority Link
- LLT can be configured to use a low-priority network link as a backup to standard heartbeat channels.
- Low-priority links are typically configured on the customer's public network or administrative network. The low-priority link is not used for cluster status traffic until it is the only remaining link.
- During standard operation, the low-priority link carries heartbeat traffic for cluster membership and link state maintenance only.
- The frequency of heartbeats decreases 50 percent to reduce network overhead. When the low-priority link is the sole network link remaining, LLT switches all cluster status traffic to the low-priority link.
- Upon repair of any configured private link, LLT returns cluster status traffic to the high-priority link.
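A low-priority link is declared in /etc/llttab with the link-lowpri directive, which takes the same arguments as link; the interface named below is illustrative:

# public or administrative network used only as a backup heartbeat path
link-lowpri qfe2 /dev/qfe:2 - ether - -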
Veritas Service Groups and Resources
Service Groups
- Service groups mark the primary difference between first and second-generation high-availability (HA) packages.
- Early systems used the entire server as the granularity level for failover: if an application failed, all applications were migrated to a second system.
- Second-generation HA packages, such as VCS, greatly increase the granularity of application control.
- This smaller container for applications and associated resources is called a service group.
For example, a service group for a Web application may consist of:
- disk groups on which Web pages are stored
- a volume built in the disk group
- a file system using the volume
- a database whose table spaces are files and whose rows contain page pointers
- network interface cards to export the Web service
- one or more IP addresses associated with the network cards
- the application program and associated code libraries
- VCS performs administrative operations on resources at the service group level, including starting, stopping, restarting, and monitoring.
- When a service group is brought online, all resources within the group are also brought online.
- When a failover occurs in VCS, resources never fail over individually: the entire service group containing the resource fails over as a unit.
- If there is more than one group defined on a server, one group may fail over without affecting the other groups.
Additionally:
- If a service group is to run on a particular server, all of the group's required resources must be available to the server.
- Resources within a service group have dependencies; that is, some resources, such as volumes, must be operational before other resources, such as the file system, can become operational.
Failover Groups
Failover groups are used for many application services, such as most databases and NFS servers. HAD ensures that a failover service group is online, partially online, or in any state other than OFFLINE (such as attempting to go online or attempting to go offline) on only one system at a time.
Parallel Groups
Parallel groups are used far less frequently than failover groups, and are more complex. They require applications that can be started safely on multiple systems, without threat of data corruption. They also require that applications running multiple instances allow all instances access to the same data.
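Whether a group is failover or parallel is controlled by the group's Parallel attribute in main.cf (0, the default, means failover; 1 means parallel). A hedged fragment with hypothetical names:

group appsg (
    SystemList = { nodeA = 0, nodeB = 1 }
    AutoStartList = { nodeA, nodeB }
    Parallel = 1
    )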
Resources and Resource Types
- Resources are hardware or software entities, such as network interface cards (NICs), IP addresses, applications, and databases, that are brought online, taken offline, or monitored by VCS.
- Each resource is identified by a unique name. Resources with similar characteristics are known collectively as a resource type; for example, two IP resources are both classified as type IP.
- How VCS starts and stops a resource is specific to the resource type. An IP resource is started by assigning the IP address to a NIC.
- Monitoring a resource means testing it to determine if it is online or offline.
- How VCS monitors a resource is also specific to the resource type. For example, an IP resource tests as online if its address is configured on the NIC.
Resource Categories
- There are three categories of resources in VCS: On-Off, On-Only, and Persistent.
- Most resources are On-Off, meaning VCS starts and stops them as required. For example, VCS assigns the IP address to the specified NIC and removes the assigned IP address when the associated service group is taken offline.
- Other resources may also be required by VCS and external applications; for example, NFS daemons. VCS requires NFS daemons to be running to export a file system.
- There may also be other file systems exported locally, outside VCS control. The NFS resource is an On-Only resource, meaning that VCS starts the daemons if required, but does not stop them if the associated service group is taken offline.
- An On-Only resource is brought online when required by VCS, but is not taken offline when the associated service group is taken offline.
- A Persistent resource cannot be brought online or taken offline, yet VCS requires the resource to be present in the configuration. For example, a NIC cannot be started or stopped, but it is required to configure an IP address.
- VCS monitors Persistent resources to ensure their status and operation.
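The category of a resource type is recorded in its type definition: the static Operations attribute in types.cf takes the value OnOff, OnOnly, or None (None corresponds to a Persistent resource). The sketch below uses a hypothetical type with its attribute list trimmed:

type MyDaemon (
    static str Operations = OnOnly
    static str ArgList[] = { PathName }
    str PathName
    )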
Resource Dependencies
- One of the most important concepts of the service group definition is resource dependencies.
- When a service group is brought online or taken offline, the resource dependencies within the group specify the order in which the resources are brought online and taken offline. For example, a VERITAS Volume Manager™ disk group must be imported before volumes in the disk group are started, and volumes must be started before file systems are mounted. Conversely, file systems must be unmounted before volumes are stopped, and volumes must be stopped before disk groups are deported.
- In VCS terminology, resources are categorized as parents or children, depending on how they are configured.
- Diagramming the relationship between them forms a graph. Parent resources appear at the top of the "arcs" that connect them to their child resources.
- Typically, child resources are brought online before parent resources, and parent resources are taken offline before child resources. Resources must adhere to the established order of dependency.
- The dependency graph is an easy way to document resource dependencies within a service group. The following figure shows a resource dependency graph for a cluster service.
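In main.cf, these dependencies are written as "requires" statements between the resources of a group. The fragment below is a hedged sketch for the earlier Web service group example; group, resource, and attribute values are hypothetical and attribute lists are trimmed:

group websg (
    SystemList = { nodeA = 0, nodeB = 1 }
    AutoStartList = { nodeA }
    )

    DiskGroup webdg (
        DiskGroup = webdg
        )

    Volume webvol (
        Volume = webvol
        DiskGroup = webdg
        )

    Mount webmnt (
        MountPoint = "/web"
        BlockDevice = "/dev/vx/dsk/webdg/webvol"
        FSType = vxfs
        )

    NIC webnic (
        Device = qfe0
        )

    IP webip (
        Device = qfe0
        Address = "10.1.1.10"
        )

    Application webapp (
        StartProgram = "/web/bin/start"
        StopProgram = "/web/bin/stop"
        MonitorProcesses = { httpd }
        )

    webvol requires webdg
    webmnt requires webvol
    webip requires webnic
    webapp requires webmnt
    webapp requires webip

Reading the "requires" lines bottom-up gives the online order: the disk group is imported before the volume starts, the volume starts before the file system is mounted, the NIC must be available before the IP address is assigned, and the application comes online last.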
Cluster Configuration and Operation
LLT
- To configure LLT, an /etc/llttab configuration file is set up on each system in the cluster.
- Each /etc/llttab file must specify the system's ID number, the network interfaces to use, and other directives.
- The following example shows a simple llttab with minimum directives.
set-node 1
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -
start
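For reference, a slightly fuller llttab might also identify the cluster and tune the heartbeat timers. Everything beyond set-node, link, and start is optional, and the values shown are illustrative:

# node ID of this system (unique within the cluster)
set-node 1
# cluster ID shared by every member of this cluster
set-cluster 100
# redundant private heartbeat links
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -
# optional: hundredths of a second without heartbeats before a peer is marked inactive
set-timer peerinact:1600
start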
GAB
- To configure GAB, an /etc/gabtab configuration file is set up on each system in the cluster.
- Each /etc/gabtab file must specify the number of systems in the cluster.
- The following example shows a simple gabtab:
gabconfig -c -n 2
- This configuration has no communication disks and expects two systems to join before VCS is seeded automatically.
Resource Configuration
/etc/VRTSvcs/conf/config/main.cf
/etc/VRTSvcs/conf/config/types.cf
/opt/VRTS/bin/AGENT/
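The overall shape of main.cf is roughly as follows; this is a minimal sketch with hypothetical names, and the service group and resource definitions (such as the Web group shown earlier) follow the system definitions:

include "types.cf"

cluster webclus (
    )

system nodeA (
    )

system nodeB (
    )

// service groups, resources, and "requires" statements follow here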
Starting VCS
- Start LLT (typically via the LLT startup script, which runs lltconfig -c to apply /etc/llttab); verify node and link status with lltstat -n.
- Start GAB (typically by running /etc/gabtab, i.e. gabconfig -c -n <number of systems>); verify port memberships with gabconfig -a.
- Start the VCS engine with hastart; verify with hastatus or hastatus -sum.
Stopping VCS
- Stop VCS with hastop (hastop -local stops the engine on the local system only; hastop -all stops it on all systems); confirm with hastatus.
Resource/Cluster Status / Manipulation
- hagrp: display and manipulate service groups (for example, bring online, take offline, or switch between systems)
- hares: display and manipulate individual resources
- haclus: display and modify cluster-wide attributes
- hasys: display and modify information about cluster systems
Veritas Troubleshooting and Logs
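The engine and agent logs are the usual starting point for troubleshooting. On most platforms they are written under /var/VRTSvcs/log; the paths below are common defaults and may differ by version:

/var/VRTSvcs/log/engine_A.log      (VCS engine, had, log)
/var/VRTSvcs/log/<Agent>_A.log     (one log per agent, for example IP_A.log)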
Agent Development and Testing
- Each VCS agent manages resources of a particular type within a highly available cluster environment.
- An agent typically brings resources online, takes resources offline, and monitors resources to determine their state.
- Agents packaged with VCS are referred to as bundled agents. Examples of bundled agents include Share, IP (Internet Protocol) and NIC (Network Interface Card) agents.
- Additional agents can be developed easily using the VCS agent framework included in the VCS package.
- A single VCS agent can monitor multiple resources of the same resource type on one host; for example, the NIC agent manages all NIC resources.
- When the VCS engine, HAD, comes up on a system, it automatically starts the required agents based on the configuration.
- When an agent is started, it "pulls" the necessary configuration information from HAD. It then periodically monitors its resources and updates HAD with their status.
- The agent also carries out online and offline commands received from HAD. If an agent crashes or hangs, HAD detects this and restarts the agent.
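Script-based agents built with the agent framework implement online, offline, monitor, and clean entry points as executables in the agent's directory. The monitor sketch below is illustrative only; by convention a script monitor entry point exits with 100 when the resource is offline and 110 (or 101 to 110, indicating confidence) when it is online:

#!/bin/sh
# Hypothetical monitor entry point: the resource is considered online
# when the named process is running. $1 is the resource name; the
# remaining arguments are the resource's ArgList attribute values.
ResName=$1
ProcName=$2

if ps -ef | grep -w "$ProcName" | grep -v grep > /dev/null 2>&1
then
    exit 110    # resource is online
else
    exit 100    # resource is offline
fi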