This chapter describes the minimal steps necessary to set up a Redis HA environment using sentinels - processes responsible for ensuring that Zato servers have access to Redis nodes - along with instructions for configuring Zato to take advantage of an HA configuration.
While focused on Zato, many parts of the document can be used in other contexts as well.
Key concepts:
It should be emphasized that some requests may be rejected while a failover is in progress. Sentinels are a means of discovering Redis nodes in order to increase availability; they do not guarantee that Zato services can use Redis in a fire-and-forget manner.
By default, Zato uses Redis for statistics but the feature can be turned off, for instance, if users already integrate with NewRelic.
During development, everything can be set up on localhost provided that /etc/hosts contains the entries:
127.0.0.1 redis1
127.0.0.1 redis2
127.0.0.1 redis3
Redis 2.8.4 or newer is required.
The Redis environment will run on 3 hosts: redis1, redis2 and redis3.
TCP/IP resources used:
Address/port | Notes |
---|---|
redis1:16300 | node1 running on redis1, this will be the active (master) node initially |
redis1:16355 | sentinel1 running on redis1 |
redis2:26300 | node2 running on redis2, standby (slave) |
redis2:26355 | sentinel2 running on redis2 |
redis3:36300 | node3 running on redis3, standby (slave) |
redis3:36355 | sentinel3 running on redis3 |
6 configuration files are needed, 2 on each of the hosts: one for the Redis node and one for the sentinel. Initially, node1 will be active while node2 and node3 will be standby.
The configuration files have been uploaded to GitHub: node1.conf, node2.conf, node3.conf, sentinel1.conf, sentinel2.conf and sentinel3.conf. They should be treated as starting points and users are encouraged to adjust them so the configuration better fits their environments.
Each of the files should be saved in its own directory that must be writable by Redis processes. The config file itself must also be writable because Redis updates it at run-time - configuration files are not read-only.
For quick reference, node1.conf and sentinel1.conf files are as follows, respectively:
# Host and port we will listen for requests on
bind redis1
port 16300
# Database location
dir .
# Host and port we will listen for requests on
bind redis1
port 16355
#
# Our initial master is called redis-ha, there is a quorum of 2 sentinels
# needed for failover, after 10 seconds we declare a node to
# be down in our opinion, only 1 standby (slave) node at a time can resync with
# the new master after failover and we don't allow for the same master to be failed over
# more often than once in 3 minutes.
#
sentinel monitor redis-ha redis1 16300 2
sentinel down-after-milliseconds redis-ha 10000
sentinel parallel-syncs redis-ha 1
sentinel failover-timeout redis-ha 180000
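To make the units in the sentinel directives easier to check, here is a minimal sketch that reads them back from the configuration text. The function `parse_sentinel_conf` and the constant names are hypothetical and introduced only for illustration; this is not how Redis itself parses the file.

```python
# Hypothetical helper: reads "sentinel <directive> <master> <args...>" lines.
NUMERIC_DIRECTIVES = {"down-after-milliseconds", "parallel-syncs", "failover-timeout"}


def parse_sentinel_conf(text):
    """Return a dict of settings for each monitored master name."""
    masters = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Ignore non-sentinel directives such as bind and port
        if parts[0] != "sentinel" or len(parts) < 4:
            continue
        directive, master, args = parts[1], parts[2], parts[3:]
        entry = masters.setdefault(master, {})
        if directive == "monitor" and len(args) == 3:
            # sentinel monitor <name> <host> <port> <quorum>
            entry["host"] = args[0]
            entry["port"] = int(args[1])
            entry["quorum"] = int(args[2])
        elif directive in NUMERIC_DIRECTIVES:
            entry[directive] = int(args[0])
        else:
            entry[directive] = " ".join(args)
    return masters


# The sentinel1.conf directives from this chapter
SENTINEL1_CONF = """\
bind redis1
port 16355
sentinel monitor redis-ha redis1 16300 2
sentinel down-after-milliseconds redis-ha 10000
sentinel parallel-syncs redis-ha 1
sentinel failover-timeout redis-ha 180000
"""

settings = parse_sentinel_conf(SENTINEL1_CONF)["redis-ha"]
down_after_seconds = settings["down-after-milliseconds"] / 1000
failover_timeout_minutes = settings["failover-timeout"] / 60000
print(settings["quorum"], down_after_seconds, failover_timeout_minutes)
```

Both timing values are given in milliseconds, so 10000 corresponds to the 10 seconds and 180000 to the 3 minutes mentioned in the comment of sentinel1.conf.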
Start all nodes followed by all sentinels. Note replication messages in standby nodes. Also note sentinels confirming which node is currently master and discovering each other automatically (longer lines broken into separate ones for clarity).
Nodes:
redis1$ redis-server node1.conf
13 Jan 20:27:51.238 * The server is now ready to accept connections on port 16300
redis2$ redis-server node2.conf
[4636] 13 Jan 20:29:15.038 * The server is now ready to accept connections on port 26300
[4636] 13 Jan 20:29:15.039 * Connecting to MASTER redis1:16300
[...]
[4636] 13 Jan 20:29:15.110 * MASTER <-> SLAVE sync: Finished with success
redis3$ redis-server node3.conf
[4701] 13 Jan 20:30:53.650 * The server is now ready to accept connections on port 36300
[4701] 13 Jan 20:30:53.650 * Connecting to MASTER redis1:16300
[...]
[4701] 13 Jan 20:30:53.722 * MASTER <-> SLAVE sync: Finished with success
Sentinels:
redis1$ redis-server sentinel1.conf --sentinel
[4808] 13 Jan 20:34:59.681 # +monitor master redis-ha 127.0.0.1 16300 quorum 2
redis2$ redis-server sentinel2.conf --sentinel
[4839] 13 Jan 20:36:52.519 # +monitor master redis-ha 127.0.0.1 16300 quorum 2
[4839] 13 Jan 20:36:53.222 * +sentinel sentinel 127.0.0.1:16355 127.0.0.1 16355 \
@ redis-ha 127.0.0.1 16300
redis3$ redis-server sentinel3.conf --sentinel
[4953] 13 Jan 20:40:33.790 # +monitor master redis-ha 127.0.0.1 16300 quorum 2
[4953] 13 Jan 20:40:33.968 * +sentinel sentinel 127.0.0.1:16355 127.0.0.1 16355 @ \
redis-ha 127.0.0.1 16300
[4953] 13 Jan 20:40:34.560 * +sentinel sentinel 127.0.0.1:26355 127.0.0.1 26355 @ \
redis-ha 127.0.0.1 16300
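Once all six processes are started, it can be handy to confirm that each address from the TCP/IP resources table accepts connections. The sketch below only opens and closes a TCP connection; it does not speak the Redis protocol, so it cannot tell a master from a standby. The names `is_listening` and `check_endpoints` are hypothetical helpers.

```python
import socket

# Addresses and ports from the TCP/IP resources table in this chapter
ENDPOINTS = [
    ("redis1", 16300), ("redis1", 16355),
    ("redis2", 26300), ("redis2", 26355),
    ("redis3", 36300), ("redis3", 36355),
]


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors
        return False


def check_endpoints(endpoints, timeout=2.0):
    """Map 'host:port' to True or False for every endpoint given."""
    return {
        "{}:{}".format(host, port): is_listening(host, port, timeout)
        for host, port in endpoints
    }
```

Calling `check_endpoints(ENDPOINTS)` from a host that can resolve redis1, redis2 and redis3 should report True for all six entries when the environment is fully up.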
Edit server.conf of each server in a Zato cluster so the relevant keys in the kvdb stanza read as follows:
use_redis_sentinels=True
redis_sentinels=redis1:16355, redis2:26355, redis3:36355
redis_sentinels_master=redis-ha
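As a sanity check of the edited keys, the sketch below reads them with Python's standard `configparser` module and splits `redis_sentinels` into host/port pairs. It assumes the stanza appears in the file as a `[kvdb]` section; the function `read_sentinel_settings` is hypothetical and this is not necessarily how Zato itself loads server.conf.

```python
from configparser import ConfigParser

# Assumed layout: the kvdb stanza is an INI section named [kvdb]
SERVER_CONF_FRAGMENT = """\
[kvdb]
use_redis_sentinels=True
redis_sentinels=redis1:16355, redis2:26355, redis3:36355
redis_sentinels_master=redis-ha
"""


def read_sentinel_settings(conf_text):
    """Extract sentinel-related keys from the kvdb stanza."""
    # Interpolation is disabled so that any '%' in other values is left alone
    parser = ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    kvdb = parser["kvdb"]

    sentinels = []
    for item in kvdb.get("redis_sentinels", "").split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last colon so the port is always the final element
        host, _, port = item.rpartition(":")
        sentinels.append((host, int(port)))

    return {
        "enabled": kvdb.getboolean("use_redis_sentinels", fallback=False),
        "sentinels": sentinels,
        "master": kvdb.get("redis_sentinels_master"),
    }


sentinel_settings = read_sentinel_settings(SERVER_CONF_FRAGMENT)
print(sentinel_settings)
```

The master name must match the one used in the `sentinel monitor` directive of each sentinel, here redis-ha.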
Log into web-admin and navigate to the KVDB remote commands view in order to set a Redis key to a value that will be expected to exist after failover.
Stop node1 - the master - and note that after approximately 10 seconds (down-after-milliseconds) sentinels promote another node to the master role, here it was node3.
[4612] 13 Jan 22:20:46.303 # User requested shutdown...
[4612] 13 Jan 22:20:46.303 # Redis is now ready to exit, bye bye...
[4808] 13 Jan 22:20:57.501 # +switch-master redis-ha 127.0.0.1 16300 127.0.0.1 36300
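The length of the window without a master can be read off the timestamps in the log lines above. The following sketch computes it, assuming both lines were logged on the same day so that only the time of day needs comparing; `log_time` and `seconds_between` are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches the HH:MM:SS.mmm part of a Redis log line
_TIME_OF_DAY = re.compile(r"\b(\d{2}:\d{2}:\d{2}\.\d{3})\b")


def log_time(line):
    """Parse the time of day from a Redis log line."""
    match = _TIME_OF_DAY.search(line)
    if match is None:
        raise ValueError("no timestamp found in: {!r}".format(line))
    return datetime.strptime(match.group(1), "%H:%M:%S.%f")


def seconds_between(first_line, second_line):
    """Seconds elapsed between two log lines written on the same day."""
    return (log_time(second_line) - log_time(first_line)).total_seconds()


shutdown = "[4612] 13 Jan 22:20:46.303 # User requested shutdown..."
switch = "[4808] 13 Jan 22:20:57.501 # +switch-master redis-ha 127.0.0.1 16300 127.0.0.1 36300"
failover_seconds = seconds_between(shutdown, switch)
print(failover_seconds)
```

For the two lines shown, the gap is a little over 11 seconds, which is consistent with the 10-second down-after-milliseconds setting plus the time needed to agree on and promote a new master.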
In web-admin again, confirm that from Zato's perspective the connectivity to Redis still works - servers have now transparently reconnected to a new master and the key can be read.
Note in server logs error messages related to Redis, such as the following. This is expected as there is indeed a time window during which no connection to Redis is available - that is, after master goes down and before a new one is elected by sentinels.
ConnectionError: Error 111 connecting 127.0.0.1:16300. Connection refused.
<Greenlet at 0x7efef956fcd0: <bound method MoveToTargetQueues._move_to_target_queues of
<zato.server.service.internal.pubsub.MoveToTargetQueues object at 0x7efef8d6d150>>>
failed with ConnectionError
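Because requests can fail during this window, code that must not lose a Redis operation may want to retry it. Below is a minimal, generic sketch of a retry with exponential backoff. The helper `call_with_retry` is hypothetical and not part of Zato; which exception types to retry on depends on the client library in use, so they are a parameter (with redis-py, for example, its own `redis.exceptions.ConnectionError` would likely need to be passed in `retry_on`).

```python
import time


def call_with_retry(operation, retry_on=(ConnectionError,), attempts=5,
                    delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call operation() and retry on the given exceptions.

    Waits `delay` seconds after the first failure and multiplies the wait
    by `backoff` after each further failure. The last failure is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff


# Usage example with a stand-in operation that fails twice before succeeding
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Error 111 connecting 127.0.0.1:16300. Connection refused.")
    return "value"


waits = []
result = call_with_retry(flaky_read, sleep=waits.append)
print(result, waits)
```

With the defaults, the helper waits 1, 2, 4 and 8 seconds between its five attempts, 15 seconds in total, which is longer than the failover observed above. The values should be tuned to the down-after-milliseconds setting actually in use.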
Apr 05, 2018