Introduction
Warm Standby Replication is a resilience and continuity strategy for systems running RabbitMQ clusters, particularly those set up within Kubernetes. It continuously replicates data, including schema definitions and messages, from a primary (upstream) RabbitMQ cluster to a standby (downstream) cluster. The core aim of Warm Standby Replication is to minimize downtime and data loss in the event of a failure in the primary cluster.
Note: The Warm Standby Replication (WSR) feature is available only in the commercial edition of RabbitMQ.
For the most up-to-date information and advanced configuration options, please refer to the Warm Standby Replication Tanzu docs.
Understanding Warm Standby Replication
Let’s visualize a scenario with two RabbitMQ clusters: Cluster-A (active, primary, or upstream), bustling with activity, and Cluster-B (passive, standby, or downstream), prepared for disaster recovery. This setup ensures business continuity and data resilience through warm replication and synchronization.
Behind-the-Scenes Connection and Synchronization
Cluster-B
Discreetly linked to Cluster-A, Cluster-B is an unseen yet crucial component of the disaster recovery strategy. It silently maintains a state of readiness by mirroring crucial schema and data from Cluster-A. Cluster-B is tasked with two primary synchronization duties.
(1) Schema Replication
It mirrors all changes made to the cluster’s schema, which includes configured virtual hosts and their related users, permissions, queues, and bindings. Notably, message contents are excluded from this replication.
(2) Selective Message Replication
Messages published to a specifically configured set of quorum queues are replicated, but they are stored aside, outside of the queues themselves.
Note: As of the GA releases dated February 16, 2024, message replication works exclusively with quorum queues.
Disaster Recovery and Cluster Promotion
(1) Promotion of Cluster-B
In the event of a catastrophic failure of Cluster-A, the responsibility falls to a human operator to designate Cluster-B as the new primary cluster by following Promoting the Downstream (Standby) Cluster for Disaster Recovery. This crucial action ensures the continuation of services with minimal disruption.
(2) Configurable Recovery
The promotion process applies predetermined configuration parameters, such as a specific time frame and any excluded virtual hosts. This ensures that messages from the configured queues within the defined time frame are accurately replayed and republished into the replicated queues of Cluster-B, now the new primary.
Post-Disaster Continuity
(1) Re-establishing the Standby Setup
Once Cluster-B ascends to the primary position, the focus shifts to creating and configuring new standby cluster(s) as downstream backups, ensuring the resilience cycle is perpetuated.
(2) Restoration and Role Reversal
When Cluster-A is brought back online and restored to operational status, it can seamlessly transition into the standby role, ready to step in should the need arise again, or it can optionally be restored as the primary cluster.
Getting Started with Configuration
Prerequisites
At least two RabbitMQ clusters deployed using the Cluster Operator, where one can be used as upstream and the other(s) as downstream(s). For detailed installation instructions, refer to:
- Installing VMware RabbitMQ on Kubernetes, if you’re working with VMware RabbitMQ.
Note: For effective disaster recovery, it is advisable to install the standby RabbitMQ cluster in a Kubernetes cluster located in a different zone or region. This geographic distribution helps the system remain operational even if one location experiences an outage or other disaster, enhancing the resilience and reliability of your infrastructure.
- Required operator privileges on the Kubernetes cluster for installation.
- Cluster Essentials for VMware Tanzu installed in the Kubernetes cluster.
- Carvel Tools installed.
- kubectl installed.
- kapp installed.
Assumptions
Before going further, let’s assume that the clusters are named rabbitmq-a (primary) and rabbitmq-b (standby), within namespaces rabbitmq-clusters-a and rabbitmq-clusters-b respectively.
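If those namespaces do not exist yet, they could be created up front. A minimal sketch; remember that the primary and standby typically live on two different Kubernetes clusters, so each Namespace is applied to its own cluster:

```yaml
# Applied to the Kubernetes cluster hosting the primary:
apiVersion: v1
kind: Namespace
metadata:
  name: rabbitmq-clusters-a
---
# Applied to the Kubernetes cluster hosting the standby:
apiVersion: v1
kind: Namespace
metadata:
  name: rabbitmq-clusters-b
```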
Summary
Below is a summary of what we will do next to configure replication on each cluster (primary and standby). These steps are repeated on both clusters, with the respective configurations:
- Configure additional plugins and additional config
- Create the replication User and Permissions
- Configure Schema Replication
- Configure Standby Replication
Finally, perform testing to make sure that replication works as expected.
Configure Primary Cluster for Warm Standby Replication
Let’s configure the Primary Cluster first, verify and then configure the Standby Cluster.
Configure additional plugins and additional config
Add the below plugins to the RabbitMQ cluster.
```yaml
spec:
  rabbitmq:
    additionalPlugins:
      - rabbitmq_stream
      - rabbitmq_stream_management
      - rabbitmq_schema_definition_sync
      - rabbitmq_schema_definition_sync_prometheus
      - rabbitmq_standby_replication
```
Similarly, add the below configuration to the additionalConfig section.
```yaml
spec:
  rabbitmq:
    additionalConfig: |
      schema_definition_sync.operating_mode = upstream
      standby.replication.operating_mode = upstream
      standby.replication.retention.size_limit.messages = 5000000000
      schema_definition_sync.downstream.default_amqps_port = 5672
      standby.replication.downstream.default_stream_protocol_port_without_tls = 5552
```
Make sure to apply these changes and wait for all the RabbitMQ nodes to be configured.
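Optionally, you can sanity-check that the changes landed and the rollout finished. A sketch, assuming kubectl access to the primary cluster and the Cluster Operator's default StatefulSet naming (cluster-name followed by -server):

```shell
# Confirm the plugin list is present in the spec
kubectl -n rabbitmq-clusters-a get rabbitmqcluster rabbitmq-a \
  -o jsonpath='{.spec.rabbitmq.additionalPlugins}'

# Wait for the nodes to finish rolling out the new configuration
kubectl -n rabbitmq-clusters-a rollout status statefulset/rabbitmq-a-server
```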
Create Replication User & Permissions
The command below applies a combined configuration: it creates the Kubernetes secret storing the RabbitMQ replicator user’s credentials, defines the RabbitMQ user itself, and establishes the necessary permissions for both schema and standby replication for each vhost in the primary cluster rabbitmq-a (namespace rabbitmq-clusters-a). To learn more about these users and permissions, see the Additional Info section at the end.
Note (1): It’s crucial that all these resources are created in the namespace where your RabbitMQ cluster is located.
Note (2): Make sure to replace <rabbitmq replicator password> with the right password.
Note (3): Add a separate permission for each vhost; in this example, only the default vhost / is considered.
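As one illustrative way (not a requirement) to produce a strong value for <rabbitmq replicator password>:

```shell
# Generate a random 24-character alphanumeric password for the replicator user.
RABBITMQ_REPLICATOR_PASSWORD=$(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 24)
echo "$RABBITMQ_REPLICATOR_PASSWORD"
```

Substitute the generated value into the stringData.password field.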
```shell
kapp deploy -a rabbitmq-replicator-a -y -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-replicator-secret
  namespace: rabbitmq-clusters-a
type: Opaque
stringData:
  username: rabbitmq-replicator-user
  password: <rabbitmq replicator password>
---
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: rabbitmq-replicator-user
  namespace: rabbitmq-clusters-a
spec:
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
  importCredentialsSecret:
    name: rabbitmq-replicator-secret
    namespace: rabbitmq-clusters-a
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: rabbitmq-replicator.rabbitmq-schema-definition-sync.all
  namespace: rabbitmq-clusters-a
spec:
  vhost: rabbitmq_schema_definition_sync
  userReference:
    name: rabbitmq-replicator-user
    namespace: rabbitmq-clusters-a
  permissions:
    write: ".*"
    configure: ".*"
    read: ".*"
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: rabbitmq-replicator.default.all
  namespace: rabbitmq-clusters-a
spec:
  vhost: "/"
  userReference:
    name: rabbitmq-replicator-user
    namespace: rabbitmq-clusters-a
  permissions:
    write: ".*"
    configure: ".*"
    read: ".*"
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
EOF
```
Now that the user and permissions are configured for the primary cluster, we can configure the replicators.
Configure Schema Replication
Create the SchemaReplication resource by running the command below in the primary cluster. Before executing, replace the 1.0.0.0 IP address with the external IP of the primary cluster’s Kubernetes service. You can obtain it with: kubectl get svc rabbitmq-a -n rabbitmq-clusters-a -o jsonpath="{.status.loadBalancer.ingress[0].ip}"
```shell
kapp deploy -a rabbitmq-schema-replication-a -y -f - <<EOF
apiVersion: rabbitmq.com/v1beta1
kind: SchemaReplication
metadata:
  name: rabbitmq-schema-replication
  namespace: rabbitmq-clusters-a
spec:
  endpoints: "1.0.0.0:5672"
  upstreamSecret:
    name: rabbitmq-replicator-secret
    namespace: rabbitmq-clusters-a
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
EOF
```
Now that schema replication is configured, we have the option to create Users and Vhosts for fine-grained access using the Topology operator. The example below creates the default vhost / using the Topology operator. The important thing to note here is the tag named standby_replication, which tells the standby replicator to pick up this vhost for replication.
```shell
kapp deploy -a rabbitmq-vhosts-a -y -f - <<EOF
apiVersion: rabbitmq.com/v1beta1
kind: Vhost
metadata:
  name: default
  namespace: rabbitmq-clusters-a
spec:
  name: "/"
  defaultQueueType: quorum
  tags:
    - default
    - "standby_replication"
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
EOF
```
Make sure that the default vhost / and any additional vhosts you added are created. This also confirms that the Topology operator is working as expected.
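If you also want to declare a workload queue up front, the Topology operator's Queue resource can be used. Since warm standby replication covers only quorum queues, declare it with type quorum (the queue name orders-queue here is illustrative):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: Queue
metadata:
  name: orders-queue
  namespace: rabbitmq-clusters-a
spec:
  name: orders-queue
  vhost: "/"
  type: quorum
  durable: true
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
```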
Configure Standby Replication
Create the StandbyReplication resource by running the command below in the primary cluster.
Note (1): operatingMode is set to upstream for the primary cluster.
Note (2): A replication policy must be configured for each vhost that needs message replication.
```shell
kapp deploy -a rabbitmq-standby-replication-a -y -f - <<EOF
apiVersion: rabbitmq.tanzu.vmware.com/v1beta1
kind: StandbyReplication
metadata:
  name: rabbitmq-standby-replication
  namespace: rabbitmq-clusters-a
spec:
  operatingMode: "upstream"
  upstreamModeConfiguration:
    replicationPolicies:
      - name: all-quorum-queues-in-default
        pattern: "^.*"
        vhost: "/"
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
EOF
```
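The pattern field in each replication policy is a regular expression matched against queue names in that vhost; "^.*" selects every queue. A quick local illustration of the matching (queue names hypothetical):

```shell
# Emulate how a policy pattern selects queue names.
matches() { echo "$1" | grep -Eq "$2" && echo selected || echo skipped; }

matches "orders.created" '^.*'        # the catch-all pattern selects everything
matches "orders.created" '^orders\.'  # a narrower policy would select this
matches "invoices.paid"  '^orders\.'  # ...but not this
```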
Now that both schema and standby replication are configured in the primary cluster, we can start configuring the standby cluster for replication.
Configure Standby Cluster for Warm Standby Replication
Let’s configure the Standby Cluster by following the instructions below.
Configure additional plugins and additional config
Add the below plugins to the RabbitMQ cluster.
```yaml
spec:
  rabbitmq:
    additionalPlugins:
      - rabbitmq_stream
      - rabbitmq_stream_management
      - rabbitmq_schema_definition_sync
      - rabbitmq_schema_definition_sync_prometheus
      - rabbitmq_standby_replication
```
Similarly, add the below configuration to the additionalConfig section.
```yaml
spec:
  rabbitmq:
    additionalConfig: |
      schema_definition_sync.operating_mode = downstream
      standby.replication.operating_mode = downstream
      schema_definition_sync.downstream.locals.users = ^default_user_
      schema_definition_sync.downstream.locals.global_parameters = ^standby
      schema_definition_sync.downstream.minimum_sync_interval = 15
      standby.replication.retention.size_limit.messages = 5000000000
```
Make sure to apply these changes and wait for all the RabbitMQ nodes to be configured.
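The locals patterns above tell the downstream which objects are local-only and must not be deleted or overwritten when the upstream schema is applied; for example, ^default_user_ protects the operator-generated default user. A quick local illustration of the matching (user names hypothetical):

```shell
# Emulate the locals.users check: names matching the regex stay local.
is_local() { echo "$1" | grep -Eq '^default_user_' && echo local || echo synced; }

is_local "default_user_ab12cd"       # operator-generated user: kept local
is_local "rabbitmq-replicator-user"  # follows the upstream schema
```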
Create Replication User & Permissions
This is very similar to the primary cluster setup above, except for the namespace and RabbitMQ cluster name. The command below applies a combined configuration: it creates the Kubernetes secret storing the RabbitMQ replicator user’s credentials, defines the RabbitMQ user itself, and establishes the necessary permissions for both schema and standby replication for each vhost in the standby cluster rabbitmq-b (namespace rabbitmq-clusters-b).
Note (1): It’s crucial that all these resources are created in the namespace where your RabbitMQ cluster is located.
Note (2): Make sure to replace <rabbitmq replicator password> with the right password.
Note (3): Add a separate permission for each vhost; in this example, only the default vhost / is considered.
```shell
kapp deploy -a rabbitmq-replicator-b -y -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-replicator-secret
  namespace: rabbitmq-clusters-b
type: Opaque
stringData:
  username: rabbitmq-replicator-user
  password: <rabbitmq replicator password>
---
apiVersion: rabbitmq.com/v1beta1
kind: User
metadata:
  name: rabbitmq-replicator-user
  namespace: rabbitmq-clusters-b
spec:
  rabbitmqClusterReference:
    name: rabbitmq-b
    namespace: rabbitmq-clusters-b
  importCredentialsSecret:
    name: rabbitmq-replicator-secret
    namespace: rabbitmq-clusters-b
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: rabbitmq-replicator.rabbitmq-schema-definition-sync.all
  namespace: rabbitmq-clusters-b
spec:
  vhost: rabbitmq_schema_definition_sync
  userReference:
    name: rabbitmq-replicator-user
    namespace: rabbitmq-clusters-b
  permissions:
    write: ".*"
    configure: ".*"
    read: ".*"
  rabbitmqClusterReference:
    name: rabbitmq-b
    namespace: rabbitmq-clusters-b
---
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: rabbitmq-replicator.default.all
  namespace: rabbitmq-clusters-b
spec:
  vhost: "/"
  userReference:
    name: rabbitmq-replicator-user
    namespace: rabbitmq-clusters-b
  permissions:
    write: ".*"
    configure: ".*"
    read: ".*"
  rabbitmqClusterReference:
    name: rabbitmq-b
    namespace: rabbitmq-clusters-b
EOF
```
Now that the user and permissions are configured for the standby cluster, we can configure the replicators.
Configure Schema Replication
Create the SchemaReplication resource by running the command below in the standby cluster. Before executing, replace the 1.0.0.0 IP address with the external IP of the active (primary) cluster’s Kubernetes service. You can obtain it by running the following against the primary Kubernetes cluster: kubectl get svc rabbitmq-a -n rabbitmq-clusters-a -o jsonpath="{.status.loadBalancer.ingress[0].ip}"
```shell
kapp deploy -a rabbitmq-schema-replication-b -y -f - <<EOF
apiVersion: rabbitmq.com/v1beta1
kind: SchemaReplication
metadata:
  name: rabbitmq-schema-replication
  namespace: rabbitmq-clusters-b
spec:
  endpoints: "1.0.0.0:5672"
  upstreamSecret:
    name: rabbitmq-replicator-secret
    namespace: rabbitmq-clusters-b
  rabbitmqClusterReference:
    name: rabbitmq-b
    namespace: rabbitmq-clusters-b
EOF
```
Configure Standby Replication
Create the StandbyReplication resource by running the command below in the standby cluster. Before executing, replace the 1.0.0.0 IP address with the external IP of the active (primary) cluster’s Kubernetes service, obtained as in the previous step.
Note (1): operatingMode is set to downstream for the standby cluster.
```shell
kapp deploy -a rabbitmq-standby-replication-b -y -f - <<EOF
apiVersion: rabbitmq.tanzu.vmware.com/v1beta1
kind: StandbyReplication
metadata:
  name: rabbitmq-standby-replication
  namespace: rabbitmq-clusters-b
spec:
  operatingMode: "downstream"
  downstreamModeConfiguration:
    endpoints: "1.0.0.0:5552"
    upstreamSecret:
      name: rabbitmq-replicator-secret
      namespace: rabbitmq-clusters-b
  rabbitmqClusterReference:
    name: rabbitmq-b
    namespace: rabbitmq-clusters-b
EOF
```
Now that all the configuration is complete, check the RabbitMQ logs for errors and fix any issues you find.
Testing
Schema Replication
Simply create a queue in the primary cluster under a configured vhost (the default vhost / in this case) and check whether it appears in the standby cluster. Allow a few seconds for the replicated schema to appear in the standby cluster.
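As a command-line sketch of that check, assuming the Cluster Operator's default pod naming (cluster-name followed by -server-0) and a queue created upstream under /; note that the downstream keeps replicated messages aside, so replicated queues may show as empty until promotion:

```shell
# Run against the standby Kubernetes cluster: list queues replicated to vhost "/"
kubectl -n rabbitmq-clusters-b exec rabbitmq-b-server-0 -- \
  rabbitmqctl list_queues -p / name type
```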
Message Replication
Once schema replication is tested successfully, publish some messages to the queue. Then follow the instructions in Verifying Warm Standby Replication is Configured Correctly to verify that standby replication is successful.
Additional Info (Optional Read)
Users & Permissions
As seen above, to set up replication between a primary and a standby cluster, we need certain secrets, users, and permissions properly established on both clusters. Below is a detailed walkthrough of each of them.
v1/Secret
To configure a RabbitMQ user with the required permissions for replication tasks within a Kubernetes environment, the first step is creating a Kubernetes secret. This secret securely stores the replication username and password, ensuring that sensitive information is handled in line with recommended practices for managing credentials in Kubernetes deployments. Refer to the YAML configuration above with apiVersion: v1 and kind: Secret for the exact definition.
rabbitmq.com/v1beta1/User
The next step is defining a RabbitMQ user specifically for replication tasks across RabbitMQ clusters. This user uses the previously created secret for authentication, ensuring secure replication operations. The configuration above with apiVersion: rabbitmq.com/v1beta1 and kind: User shows how to set up this user, which is created by the Messaging Topology Operator.
rabbitmq.com/v1beta1/Permission for Schema Replication
Next, assign the user the required schema replication permissions, which enable it to manage and synchronize schema definitions. The configuration with apiVersion: rabbitmq.com/v1beta1, kind: Permission, and name: rabbitmq-replicator.rabbitmq-schema-definition-sync.all establishes the permissions the replicator user needs for schema replication tasks, granting full (.*) write, configure, and read permissions on the rabbitmq_schema_definition_sync virtual host.
rabbitmq.com/v1beta1/Permission for Standby Replication
Next, assign the user the required standby replication permissions. These are critical for replicating data across clusters, targeting a particular virtual host. The configuration with apiVersion: rabbitmq.com/v1beta1, kind: Permission, and name: rabbitmq-replicator.default.all sets up full (.*) write, configure, and read permissions for the replicator user on the default virtual host /. If you have additional virtual hosts to replicate, you must create a kind: Permission for each, specifying the appropriate spec.vhost. This, too, leverages the Messaging Topology Operator to configure the permissions in the cluster.
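For instance, if a hypothetical additional vhost named orders were also replicated, its permission object might look like this (primary-cluster side shown; mirror it on the standby with the rabbitmq-b names):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: Permission
metadata:
  name: rabbitmq-replicator.orders.all
  namespace: rabbitmq-clusters-a
spec:
  vhost: "orders"
  userReference:
    name: rabbitmq-replicator-user
    namespace: rabbitmq-clusters-a
  permissions:
    write: ".*"
    configure: ".*"
    read: ".*"
  rabbitmqClusterReference:
    name: rabbitmq-a
    namespace: rabbitmq-clusters-a
```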
GitHub
TODO: Will be publishing a GitHub repo soon.
Hope you had fun coding!