Notes on a Redis cluster connection timeout and its solution

NedK7


1. First, look at the logs

org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:271) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1062) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:919) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:94) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.lettuce.LettuceSetCommands.sMembers(LettuceSetCommands.java:149) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.connection.DefaultedRedisConnection.sMembers(DefaultedRedisConnection.java:805) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.core.DefaultSetOperations.lambda$members$10(DefaultSetOperations.java:214) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:222) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:189) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.core.DefaultSetOperations.members(DefaultSetOperations.java:214) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.data.redis.core.DefaultBoundSetOperations.members(DefaultBoundSetOperations.java:152) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    at org.springframework.session.data.redis.RedisSessionExpirationPolicy.cleanExpiredSessions(RedisSessionExpirationPolicy.java:132) ~[spring-session-data-redis-2.5.6.jar!/:2.5.6]
    at org.springframework.session.data.redis.RedisIndexedSessionRepository.cleanupExpiredSessions(RedisIndexedSessionRepository.java:424) ~[spring-session-data-redis-2.5.6.jar!/:2.5.6]
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.20.jar!/:5.3.20]
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95) [spring-context-5.3.20.jar!/:5.3.20]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_191]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_191]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_191]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_191]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_191]
    at java.lang.Thread.run(Thread.java:748) [na:1.8.0_191]
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 minute(s)
    at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59) ~[lettuce-core-6.1.8.RELEASE.jar!/:6.1.8.RELEASE]
    at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:246) ~[lettuce-core-6.1.8.RELEASE.jar!/:6.1.8.RELEASE]
    at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74) ~[lettuce-core-6.1.8.RELEASE.jar!/:6.1.8.RELEASE]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1060) ~[spring-data-redis-2.5.11.jar!/:2.5.11]
    ... 22 common frames omitted

2022-09-15 15:31:00.427  INFO 16828 --- [xecutorLoop-1-3] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:31:00.427  INFO 16828 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:31:00.428  WARN 16828 --- [ioEventLoop-8-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
2022-09-15 15:31:00.429  WARN 16828 --- [ioEventLoop-8-5] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
2022-09-15 15:31:00.606  INFO 16828 --- [xecutorLoop-4-7] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:31:00.608  WARN 16828 --- [ioEventLoop-9-3] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
2022-09-15 15:31:30.527  INFO 16828 --- [xecutorLoop-1-6] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:31:30.527  INFO 16828 --- [xecutorLoop-1-7] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:31:30.530  WARN 16828 --- [ioEventLoop-8-7] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
2022-09-15 15:31:30.532  WARN 16828 --- [ioEventLoop-8-6] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
2022-09-15 15:31:30.706  INFO 16828 --- [xecutorLoop-4-8] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:31:30.714  WARN 16828 --- [ioEventLoop-9-4] i.l.core.protocol.ConnectionWatchdog     : Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692
2022-09-15 15:32:00.627  INFO 16828 --- [xecutorLoop-1-2] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692
2022-09-15 15:32:00.627  INFO 16828 --- [xecutorLoop-1-4] i.l.core.protocol.ConnectionWatchdog     : Reconnecting, last destination was 192.168.0.163:6692

The key line in the log is: Cannot reconnect to [192.168.0.163:6692]: Connection refused: /192.168.0.163:6692 — the client cannot reach the node on port 6692 at all.
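"Connection refused" means nothing is listening on the port (the process is down), whereas a plain timeout usually points at the network or an overloaded node. A quick sketch for telling the two apart; `classify_conn_error` is a hypothetical helper, and the host/port come from the log above:

```shell
#!/usr/bin/env bash
# classify_conn_error: hypothetical helper that sorts a client error message
# into "process-down" (nothing listening) vs "network-or-overload" (timeout).
classify_conn_error() {
  case "$1" in
    *"Connection refused"*)    echo "process-down" ;;
    *"timed out"*|*"timeout"*) echo "network-or-overload" ;;
    *)                         echo "unknown" ;;
  esac
}

# Direct probe of the suspect node:
#   redis-cli -h 192.168.0.163 -p 6692 ping   # expect PONG if it is alive
classify_conn_error "Cannot reconnect to [192.168.0.163:6692]: Connection refused"
# -> process-down
```

Here the watchdog reports "Connection refused", so the next step is to check whether the redis-server process for 6692 is still running.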

*********************************************************************************************************************************************************************************************

2. Log in to the Redis cluster server and check the service status

[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ps -ef|grep redis
root      1718     1 78 09:36 ?        04:49:50 ./redis-server *:6695 [cluster]
root      1723     1 36 09:36 ?        02:12:52 ./redis-server *:6693 [cluster]
root      1724     1 94 09:36 ?        05:49:34 ./redis-server *:6694 [cluster]
root      1726     1 94 09:36 ?        05:48:41 ./redis-server *:6696 [cluster]
root      1727     1 35 09:36 ?        02:12:15 ./redis-server *:6691 [cluster]
root      2426     1  0 09:48 ?        00:01:16 ./redis-server *:6379
root     22212 22111  0 15:44 pts/4    00:00:00 grep --color=auto redis

As the process list shows, the Redis node on port 6692 is gone. Why did it die?
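When a node disappears without a clean shutdown, the kernel OOM killer is a prime suspect (which would also fit the memory findings below). Some places to look; the log paths are assumptions, adjust them to your system and to the `logfile` setting in redis.conf:

```shell
#!/usr/bin/env bash
# Places to check for the cause of a vanished redis-server process
# (paths and log names are assumptions; adjust to your system):
#   dmesg -T | grep -iE 'out of memory|oom-killer|killed process'
#   journalctl -k | grep -i redis
#   tail -n 100 /var/log/redis/redis-6692.log

# Tiny helper used below: does a kernel log line look like an OOM kill?
is_oom_kill() { grep -qiE 'out of memory|oom-killer|killed process' <<<"$1"; }

is_oom_kill "Out of memory: Kill process 1722 (redis-server) score 900" \
  && echo "looks like an OOM kill"
```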

*********************************************************************************************************************************************************************************************

3. Restart the Redis node on port 6692 and check its memory

[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6692 -c -a Tiye@123!
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.0.163:6692> info memory
# Memory
used_memory:5255993312
used_memory_human:4.90G
used_memory_rss:6011609088
used_memory_rss_human:5.60G
used_memory_peak:5255993312
used_memory_peak_human:4.90G
used_memory_peak_perc:100.00%
used_memory_overhead:23434632
used_memory_startup:1483784
used_memory_dataset:5232558680
used_memory_dataset_perc:99.58%
allocator_allocated:5225666392
allocator_active:6178586624
allocator_resident:6227939328
total_system_memory:33020043264
total_system_memory_human:30.75G
used_memory_lua:32768
used_memory_lua_human:32.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.18
allocator_frag_bytes:952920232
allocator_rss_ratio:1.01
allocator_rss_bytes:49352704
rss_overhead_ratio:0.97
rss_overhead_bytes:-216330240
mem_fragmentation_ratio:1.15
mem_fragmentation_bytes:786011688
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

The 6692 node's memory info shows used_memory at 4.9 GB, equal to its recorded peak (used_memory_peak_perc: 100.00%). Note that maxmemory is 0 (unlimited) with a noeviction policy, so nothing stops this node from growing until memory runs out — that memory exhaustion is why the node died.
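The `used_memory_peak_perc` field is simply current usage relative to the node's own recorded peak. A small sketch that recomputes it from raw `INFO memory` output (note that redis-cli emits CRLF line endings, hence the `tr -d '\r'`):

```shell
#!/usr/bin/env bash
# Recompute used_memory_peak_perc from raw `INFO memory` output:
# current used_memory as a percentage of used_memory_peak.
peak_pct() {
  tr -d '\r' | awk -F: '
    /^used_memory:/      { u = $2 }
    /^used_memory_peak:/ { p = $2 }
    END { printf "%.2f%%\n", 100 * u / p }'
}

# Fed the numbers from the 6692 node above:
printf 'used_memory:5255993312\nused_memory_peak:5255993312\n' | peak_pct
# -> 100.00%
```

A value of 100% on its own only says "usage is at its historical peak"; whether that is dangerous depends on the absolute number, which here is 4.9 GB.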

*********************************************************************************************************************************************************************************************

4. Now check the memory usage of the 6691 and 6693 nodes

[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6691 -c -a Tiye@123!
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.0.163:6691> info memory
# Memory
used_memory:312651792
used_memory_human:298.17M
used_memory_rss:317714432
used_memory_rss_human:303.00M
used_memory_peak:313941360
used_memory_peak_human:299.40M
used_memory_peak_perc:99.59%
used_memory_overhead:38853632
used_memory_startup:1483784
used_memory_dataset:273798160
used_memory_dataset_perc:87.99%
allocator_allocated:312643776
allocator_active:313520128
allocator_resident:322052096
total_system_memory:33020043264
total_system_memory_human:30.75G
used_memory_lua:32768
used_memory_lua_human:32.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:876352
allocator_rss_ratio:1.03
allocator_rss_bytes:8531968
rss_overhead_ratio:0.99
rss_overhead_bytes:-4337664
mem_fragmentation_ratio:1.02
mem_fragmentation_bytes:5103624
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:841560
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6693 -c -a Tiye@123!
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.0.163:6693> info memory
# Memory
used_memory:313492000
used_memory_human:298.97M
used_memory_rss:318951424
used_memory_rss_human:304.18M
used_memory_peak:314344552
used_memory_peak_human:299.78M
used_memory_peak_perc:99.73%
used_memory_overhead:38814608
used_memory_startup:1483784
used_memory_dataset:274677392
used_memory_dataset_perc:88.04%
allocator_allocated:313483968
allocator_active:314449920
allocator_resident:322850816
total_system_memory:33020043264
total_system_memory_human:30.75G
used_memory_lua:32768
used_memory_lua_human:32.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:965952
allocator_rss_ratio:1.03
allocator_rss_bytes:8400896
rss_overhead_ratio:0.99
rss_overhead_bytes:-3899392
mem_fragmentation_ratio:1.02
mem_fragmentation_bytes:5500440
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:821056
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0

Nodes 6691 and 6693 also show used_memory_peak_perc above 99% (i.e. each is near its own historical peak), yet their actual memory usage is under 300 MB — compared with 4.9 GB on 6692. Memory consumption is clearly not balanced across the cluster.
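Rather than logging into each node by hand, the whole cluster can be surveyed in one pass. Host, ports, and password are taken from this post; `--no-auth-warning` (redis-cli 6+) suppresses the password warning:

```shell
#!/usr/bin/env bash
# Print used_memory_human for every node in the cluster in one pass.
# Host/ports/password are the ones from this post; adjust to your cluster.
extract_human() { tr -d '\r' | awk -F: '/^used_memory_human:/ { print $2 }'; }

for port in 6691 6692 6693 6694 6695 6696; do
  mem=$(redis-cli -h 192.168.0.163 -p "$port" -a 'Tiye@123!' --no-auth-warning \
          info memory 2>/dev/null | extract_human)
  printf '%s\t%s\n' "$port" "${mem:-unreachable}"
done
```

Run periodically, a table like this makes a skewed node such as 6692 stand out long before it crashes.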

*********************************************************************************************************************************************************************************************

5. Workaround for the Redis timeout

The immediate fix was to flush the caches on the three master nodes 6691, 6692, and 6693. This is only a temporary workaround, though.

[root@iZ2ze0gm3scdypc0i15r8yZ redis-cluster]# ./redis-cli -h 192.168.0.163 -p 6691 -c -a Tiye@123!
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
192.168.0.163:6691> flushall
OK

*********************************************************************************************************************************************************************************************

6. Remaining problems and what I consider the best long-term fix

6.1 Two problems remain

  ① Memory consumption is not balanced across the cluster; ② restarting the Redis service does not clear the cache.

6.2 Proposed long-term fix

  Checking redis.conf, the maximum-memory setting is still the commented-out default: # maxmemory <bytes>

  The server has 32 GB of RAM in total. Based on past experience (on a previous project in Shanghai, setting maxmemory to a fixed value kept memory consumption balanced across every node in the cluster), each of the six Redis nodes can be given 4 GB, i.e.: maxmemory 4096M

Then restart the Redis cluster.
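The 4 GB figure can be sanity-checked as a budget: total RAM times a headroom fraction, divided by the node count (the 25% headroom for the OS, RSS overhead, and fork-based persistence is my assumption, not the author's). The limit can also be applied to a running node and persisted without a restart:

```shell
#!/usr/bin/env bash
# Per-node memory budget: total RAM * headroom fraction / node count.
# 32 GB with ~25% held back, split over 6 nodes, gives 4096 MB per node.
per_node_mb() { awk -v total="$1" -v nodes="$2" -v headroom="$3" \
                  'BEGIN { printf "%d", total * headroom / nodes }'; }

per_node_mb 32768 6 0.75   # -> 4096

# Alternative to editing redis.conf + restarting: set the limit on a live
# node and persist it (run against every node; try on a test cluster first):
#   redis-cli -h 192.168.0.163 -p 6691 -a 'Tiye@123!' config set maxmemory 4096mb
#   redis-cli -h 192.168.0.163 -p 6691 -a 'Tiye@123!' config rewrite
```

One caveat worth knowing: with maxmemory set and the policy still at noeviction, a node that hits the limit will start rejecting write commands with OOM errors instead of crashing. If the data is a rebuildable cache, an eviction policy such as allkeys-lru may be the better companion setting.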

 

Follow-up: keep observing whether memory usage balances out across the Redis cluster and whether the service goes down again.
