Skip to content

Commit afda007

Browse files
[ISSUE #15225] Fix regression of #9461: NPE in ConfigRpcTransportClient.checkListenCache (#15226)
* test(client): add regression test for NPE in checkListenCache when server returns unknown key Reproduces the same bug class fixed by #9461: when the server returns a changeKey in ConfigChangeBatchListenResponse that has been concurrently removed from the local cacheMap (e.g. user has just called removeListener), the unsafe pre-dereference cacheMap.get().get(changeKey).isInitializing() throws NPE. The exception is swallowed by the surrounding catch (Throwable) and only surfaces as "Execute listen config change error" log noise, but the batch processing of the remaining changeKeys is silently aborted. The new test ClientWorkerTest#testCheckListenCacheSkipsRemovedKey asserts that no NPE-induced batch abort happens when the server returns an unknown changeKey. Without the fix the test fails because the swallowed NPE leaves the local cache's isInitializing() flag unchanged. Assisted-by: Claude Code Signed-off-by: wushiyuanmaimob <wushiyuanwork@outlook.com> * fix(client): regression of NPE in ConfigRpcTransportClient.checkListenCache (re-introduces #9461) The original fix in #9461 (milestone 2.2.0, commit c8c64a1) moved the notify computation into refreshContentAndCheck so that the existing cache != null guard would protect both call sites. When the gRPC long- connection variant ConfigRpcTransportClient was introduced in 3.x, a fresh copy of refreshContentAndCheck(RpcClient, String, boolean) was added — its body has a defensive null-check, but the two callers re-introduced the unsafe pattern cacheMap.get().get(changeKey).isInitializing(), defeating the protection. Refactor refreshContentAndCheck(RpcClient, String, boolean) to refreshContentAndCheck(RpcClient, String) so notify = !cache.isInitializing() is computed inside the method, under the existing cache != null guard. Take a single cacheMap.get() snapshot to avoid duplicate volatile reads and replace containsKey + get double-lookup with a single get + null check. Assisted-by: Claude Code Signed-off-by: wushiyuanmaimob <wushiyuanwork@outlook.com> --------- Signed-off-by: wushiyuanmaimob <wushiyuanwork@outlook.com>
1 parent 1a5386f commit afda007

2 files changed

Lines changed: 107 additions & 15 deletions

File tree

client/src/main/java/com/alibaba/nacos/client/config/impl/ClientWorker.java

Lines changed: 12 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1048,11 +1048,17 @@ private ExecutorService ensureSyncExecutor(String taskId) {
10481048
}));
10491049
}
10501050

1051-
private void refreshContentAndCheck(RpcClient rpcClient, String groupKey, boolean notify) {
1052-
if (cacheMap.get() != null && cacheMap.get().containsKey(groupKey)) {
1053-
CacheData cache = cacheMap.get().get(groupKey);
1054-
refreshContentAndCheck(rpcClient, cache, notify);
1051+
private void refreshContentAndCheck(RpcClient rpcClient, String groupKey) {
1052+
Map<String, CacheData> cacheMapSnapshot = cacheMap.get();
1053+
if (cacheMapSnapshot == null) {
1054+
return;
1055+
}
1056+
CacheData cache = cacheMapSnapshot.get(groupKey);
1057+
if (cache == null) {
1058+
return;
10551059
}
1060+
boolean notify = !cache.isInitializing();
1061+
refreshContentAndCheck(rpcClient, cache, notify);
10561062
}
10571063

10581064
private void refreshContentAndCheck(RpcClient rpcClient, CacheData cacheData,
@@ -1174,10 +1180,7 @@ private boolean checkListenCache(Map<String, List<CacheData>> listenCachesMap)
11741180
changeConfig.getDataId(),
11751181
changeConfig.getGroup(), changeConfig.getTenant());
11761182
changeKeys.add(changeKey);
1177-
boolean isInitializing =
1178-
cacheMap.get().get(changeKey).isInitializing();
1179-
refreshContentAndCheck(rpcClient, changeKey,
1180-
!isInitializing);
1183+
refreshContentAndCheck(rpcClient, changeKey);
11811184
}
11821185

11831186
}
@@ -1188,10 +1191,7 @@ private boolean checkListenCache(Map<String, List<CacheData>> listenCachesMap)
11881191
cacheData.group,
11891192
cacheData.getTenant());
11901193
if (!changeKeys.contains(changeKey)) {
1191-
boolean isInitializing =
1192-
cacheMap.get().get(changeKey).isInitializing();
1193-
refreshContentAndCheck(rpcClient, changeKey,
1194-
!isInitializing);
1194+
refreshContentAndCheck(rpcClient, changeKey);
11951195
}
11961196
}
11971197
}

client/src/test/java/com/alibaba/nacos/client/config/impl/ClientWorkerTest.java

Lines changed: 95 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -595,9 +595,6 @@ public void receiveConfigInfo(String configInfo) {
595595
Map<String, CacheData> cacheDataMapMocked = Mockito.mock(Map.class);
596596
Mockito.when(cacheDataMapMocked.get(GroupKey.getKeyTenant(dataIdNormal, group, tenant)))
597597
.thenReturn(cacheNormal);
598-
Mockito.when(
599-
cacheDataMapMocked.containsKey(GroupKey.getKeyTenant(dataIdNormal, group, tenant)))
600-
.thenReturn(true);
601598

602599
Mockito.when(cacheDataMapMocked.values()).thenReturn(cacheDatas);
603600
AtomicReference<Map<String, CacheData>> cacheMapMocked =
@@ -643,6 +640,101 @@ public void receiveConfigInfo(String configInfo) {
643640

644641
}
645642

643+
/**
644+
* Regression test for NPE in ConfigRpcTransportClient.checkListenCache.
645+
*
646+
* <p>Reproduces the same bug class fixed by PR #9461 (issue: NullPointer judgement order):
647+
* if server returns a {@code changeKey} in {@link ConfigChangeBatchListenResponse} that no
648+
* longer exists in the local {@code cacheMap} (e.g. user has just called {@code removeListener}
649+
* concurrently), the call site previously did
650+
* {@code cacheMap.get().get(changeKey).isInitializing()} without null-check.
651+
*
652+
* <p>The NPE is swallowed by the surrounding {@code catch (Throwable)} so we cannot
653+
* detect it via {@code assertDoesNotThrow}. Instead we observe an indirect post-condition:
654+
* the buggy code aborts the {@code listenCaches} loop (which clears
655+
* {@code cacheData.isInitializing()}), whereas the fixed code skips the ghost key and
656+
* still executes that loop. So an unchanged {@code isInitializing()=true} after
657+
* {@code executeConfigListen} reveals the swallowed NPE.
658+
*
659+
* <p>Both former call sites at the buggy lines 1125 and 1136 share the same helper
660+
* {@code refreshContentAndCheck(RpcClient, String, boolean)}, so making either one safe
661+
* makes the other safe by construction; this test exercises the line-1125 path
662+
* (changedConfigs branch) and asserts the helper-level safety.
663+
*/
664+
@Test
665+
void testCheckListenCacheSkipsRemovedKey() throws Exception {
666+
Properties prop = new Properties();
667+
ConfigFilterChainManager filter = new ConfigFilterChainManager(new Properties());
668+
ConfigServerListManager agent = Mockito.mock(ConfigServerListManager.class);
669+
Mockito.when(agent.getName()).thenReturn("mocktest");
670+
final NacosClientProperties nacosClientProperties =
671+
NacosClientProperties.PROTOTYPE.derive(prop);
672+
ClientWorker clientWorker = new ClientWorker(filter, agent, nacosClientProperties);
673+
clientWorker.shutdown();
674+
675+
String group = "group-regression";
676+
String tenant = "tenant-regression";
677+
678+
// local cache contains key K_LOCAL
679+
String dataIdLocal = "dataIdLocal" + System.currentTimeMillis();
680+
CacheData localCache =
681+
normalNotConsistentCache(filter, agent.getName(), dataIdLocal, group, tenant);
682+
List<CacheData> cacheDatas = new ArrayList<>();
683+
cacheDatas.add(localCache);
684+
685+
// cacheMap.values() must return the local cache so executeConfigListen builds a non-empty
686+
// listenCachesMap. Production code under test only ever queries cacheMap.get(ghostKey),
687+
// which returns null by default (Mockito mock) — exactly modeling the concurrent
688+
// removeListener race we want to exercise.
689+
Map<String, CacheData> cacheDataMapMocked = Mockito.mock(Map.class);
690+
Mockito.when(cacheDataMapMocked.values()).thenReturn(cacheDatas);
691+
AtomicReference<Map<String, CacheData>> cacheMapMocked =
692+
Mockito.mock(AtomicReference.class);
693+
Mockito.when(cacheMapMocked.get()).thenReturn(cacheDataMapMocked);
694+
Field cacheMap = ClientWorker.class.getDeclaredField("cacheMap");
695+
cacheMap.setAccessible(true);
696+
cacheMap.set(clientWorker, cacheMapMocked);
697+
698+
// server response carries a "ghost" changeKey that is NOT in cacheMap (concurrent remove)
699+
String dataIdGhost = "dataIdGhost" + System.currentTimeMillis();
700+
ConfigChangeBatchListenResponse.ConfigContext ghostContext =
701+
new ConfigChangeBatchListenResponse.ConfigContext();
702+
ghostContext.setDataId(dataIdGhost);
703+
ghostContext.setGroup(group);
704+
ghostContext.setTenant(tenant);
705+
ConfigChangeBatchListenResponse response = new ConfigChangeBatchListenResponse();
706+
response.setChangedConfigs(Collections.singletonList(ghostContext));
707+
708+
RpcClient rpcClientInner = Mockito.mock(RpcClient.class);
709+
Mockito.when(rpcClientInner.isWaitInitiated()).thenReturn(true, false);
710+
rpcClientFactoryMockedStatic
711+
.when(() -> RpcClientFactory.createClient(anyString(), any(ConnectionType.class),
712+
any(GrpcClientConfig.class)))
713+
.thenReturn(rpcClientInner);
714+
Mockito.when(rpcClientInner.request(any(ConfigBatchListenRequest.class)))
715+
.thenReturn(response, response);
716+
717+
// sanity check: localCache starts as initializing.
718+
assertTrue(localCache.isInitializing(),
719+
"precondition: CacheData should start with isInitializing=true");
720+
721+
(clientWorker.getAgent()).executeConfigListen();
722+
723+
// After executeConfigListen finishes:
724+
// buggy 3.1.1 code: NPE is thrown at lines 1125 / 1136 because cacheMap.get().get(ghostKey)
725+
// returns null; the surrounding catch (Throwable) swallows it but aborts the
726+
// listenCaches loop, so cacheData.setInitializing(false) is never called and
727+
// localCache.isInitializing() is still true.
728+
// fixed code: ghost key is skipped (null-check inside refreshContentAndCheck shared by
729+
// both former call sites), the listenCaches loop runs to completion, and
730+
// localCache.isInitializing() is now false.
731+
assertFalse(localCache.isInitializing(),
732+
"ConfigRpcTransportClient.checkListenCache must clear isInitializing for the local "
733+
+ "cache even when server returns a ghost changeKey; "
734+
+ "if this assertion fails, the listen loop was likely aborted by an "
735+
+ "NPE in cacheMap.get().get(changeKey).isInitializing() (regression of #9461)");
736+
}
737+
646738
private CacheData discardCache(ConfigFilterChainManager filter, String envName, String dataId,
647739
String group,
648740
String tenant) {

0 commit comments

Comments
 (0)