From 09cd2d31203acac18e1e43f159c888c3b2cdfcac Mon Sep 17 00:00:00 2001
From: Justin Miron
Date: Fri, 18 Oct 2024 12:23:42 -0500
Subject: [PATCH] Make MASTERDOWN a retriable error in RedisCluster client

When clusters are running with `replica-serve-stale-data no`, replicas
return a MASTERDOWN error under two conditions:

1. The primary has failed and the replica is not serving requests.
2. The replica has just started and has not yet synced from the primary.

The first case is similar to a CLUSTERDOWN error and should be retriable
in the same way.

In the second case the request should be retried on the other available
nodes in the shard. Otherwise a percentage of the read requests to the
shard will fail.

Examples when `replica-serve-stale-data no` is enabled:

1. In a cluster using `ReadOnly` with a single read replica, every read
   request returns an error to the client, because MASTERDOWN is not a
   retriable error.
2. In a cluster using `RouteRandomly`, a percentage of the requests
   return errors to the client, depending on whether the affected
   replica was selected.
---
 error.go | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/error.go b/error.go
index 9b348193..db96fd82 100644
--- a/error.go
+++ b/error.go
@@ -63,6 +63,9 @@ func shouldRetry(err error, retryTimeout bool) bool {
 	if strings.HasPrefix(s, "READONLY ") {
 		return true
 	}
+	if strings.HasPrefix(s, "MASTERDOWN ") {
+		return true
+	}
 	if strings.HasPrefix(s, "CLUSTERDOWN ") {
 		return true
 	}
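
Note for reviewers: the check added here is a plain prefix match on the error text. The standalone sketch below illustrates that behavior outside the client. It is not the library's code: `isRetriableReply` and `retriablePrefixes` are hypothetical names, and the sample reply strings are only illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// retriablePrefixes is a hypothetical list used only for this sketch.
// It mirrors the three prefixes visible in the diff hunk; the real
// shouldRetry function checks other conditions that are not shown here.
var retriablePrefixes = []string{
	"READONLY ",
	"MASTERDOWN ",
	"CLUSTERDOWN ",
}

// isRetriableReply reports whether an error reply starts with one of the
// prefixes above. The trailing space in each prefix means the error code
// must be followed by a message, so a bare "MASTERDOWN" does not match.
func isRetriableReply(msg string) bool {
	for _, p := range retriablePrefixes {
		if strings.HasPrefix(msg, p) {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative reply strings, not captured from a real server.
	replies := []string{
		"MASTERDOWN Link with MASTER is down",
		"CLUSTERDOWN The cluster is down",
		"ERR unknown command",
	}
	for _, r := range replies {
		fmt.Println(isRetriableReply(r), r)
	}
}
```

With a check like this, a MASTERDOWN reply is classified like READONLY and CLUSTERDOWN, so the caller can retry the command instead of returning the error to the application.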