Commit 0b5a5e8

api,agent,server,engine-schema: scalability improvements (#9840)
* api,agent,server,engine-schema: scalability improvements

Following changes and improvements have been added:

- Improvements in handling of PingRoutingCommand
    1. Added global config - `vm.sync.power.state.transitioning`, default value: true, to control syncing of power states for transitioning VMs. This can be set to false to prevent computation of transitioning state VMs.
    2. Improved VirtualMachinePowerStateSync to allow power state sync for host VMs in a batch
    3. Optimized scanning stalled VMs
- Added option to set worker threads for capacity calculation using config - `capacity.calculate.workers`
- Added caching framework based on Caffeine in-memory caching library, https://github.com/ben-manes/caffeine
- Added caching for account/use role API access; expiration after write can be configured using config - `dynamic.apichecker.cache.period`. If set to zero then there will be no caching. Default is 0.
- Added caching for account/use role API access with expiration after write set to 60 seconds.
- Added caching for some recurring DB retrievals
    1. CapacityManager - listing service offerings - beneficial in host capacity calculation
    2. LibvirtServerDiscoverer existing host for the cluster - beneficial for host joins
    3. DownloadListener - hypervisors for zone - beneficial for host joins
    5. VirtualMachineManagerImpl - VMs in progress - beneficial for processing stalled VMs during PingRoutingCommands
- Optimized MS list retrieval for agent connect
- Optimized finding ready systemvm template for zone
- Database retrieval optimisations - fix and refactor for cases where only IDs or counts are used, mainly for hosts and other infra entities. Also similar cases for VMs and other entities related to host concerning background tasks
- Changes in agent-agentmanager connection with NIO client-server classes
    1. Optimized the use of the executor service
    2. Refactored Agent class to better handle connections.
    3. Do SSL handshakes within worker threads
    5. Added global configs to control the behaviour depending on the infra. SSL handshake could be a bottleneck during agent connections. Configs - `agent.ssl.handshake.min.workers` and `agent.ssl.handshake.max.workers` can be used to control the number of new connections the management server handles at a time. `agent.ssl.handshake.timeout` can be used to set the number of seconds after which SSL handshake times out at MS end.
    6. On agent side, backoff and SSL handshake timeout can be controlled by agent properties. `backoff.seconds` and `ssl.handshake.timeout` properties can be used.
- Improvements in StatsCollection - minimize DB retrievals.
- Improvements in DeploymentPlanner allow for the retrieval of only desired host fields and fewer retrievals.
- Improvements in hosts connection for a storage pool. Added config - `storage.pool.host.connect.workers` to control the number of worker threads that can be used to connect hosts to a storage pool. Worker thread approach is followed currently only for NFS and ScaleIO pools.
- Minor improvements in resource limit calculations wrt DB retrievals

Signed-off-by: Abhishek Kumar <[email protected]>
Co-authored-by: Abhishek Kumar <[email protected]>
Co-authored-by: Abhishek Kumar <[email protected]>
Co-authored-by: Rohit Yadav <[email protected]>

* test1, domaindetails, capacitymanager fix
* test2 - agent tests
* capacitymanagertest fix
* change
* fix missing changes
* address comments
* revert marvin/setup.py
* fix indent
* use space in sql
* address duplicate
* update host logs
* revert e36c6a5
* fix npe in capacity calculation
* move schema changes to 4.20.1 upgrade
* build fix
* address comments
* fix build
* add some more tests
* checkstyle fix
* remove unnecessary mocks
* build fix
* replace statics
* engine/orchestration,utils: limit number of concurrent new agent connections
* refactor - remove unused
* unregister closed connections, monitor & cleanup
* add check for outdated vm filter in power sync
* agent: synchronize sendRequest wait

---------

Signed-off-by: Abhishek Kumar <[email protected]>
Co-authored-by: Rohit Yadav <[email protected]>
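The commit message describes caching with expiration after write, where a period of zero disables caching. The commit itself uses the Caffeine library for this; the sketch below is not that code. It is a plain-Java stand-in built on `ConcurrentHashMap` to illustrate the expire-after-write idea, and the class name `ExpiringCache`, the injected clock, and the cache key are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical expire-after-write cache; a stand-in for illustration, not CloudStack or Caffeine code. */
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long writtenAtNanos;

        Entry(V value, long writtenAtNanos) {
            this.value = value;
            this.writtenAtNanos = writtenAtNanos;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlNanos;
    private final LongSupplier nanoClock;

    /** periodSeconds mirrors a cache-period setting; 0 means "do not cache". */
    ExpiringCache(long periodSeconds, LongSupplier nanoClock) {
        this.ttlNanos = TimeUnit.SECONDS.toNanos(periodSeconds);
        this.nanoClock = nanoClock;
    }

    V get(K key, Function<K, V> loader) {
        if (ttlNanos <= 0) {
            return loader.apply(key); // caching disabled: always hit the loader
        }
        long now = nanoClock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.writtenAtNanos >= ttlNanos) {
            // Missing or expired: reload and remember when the value was written.
            // Two threads may both reload at once; that is acceptable for this sketch.
            entry = new Entry<>(loader.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L};
        int[] loads = {0};
        ExpiringCache<String, Boolean> cache = new ExpiringCache<>(60, () -> fakeNow[0]);
        Function<String, Boolean> loader = k -> { loads[0]++; return true; };

        cache.get("role-1:listHosts", loader); // loads
        cache.get("role-1:listHosts", loader); // served from cache
        fakeNow[0] = TimeUnit.SECONDS.toNanos(61);
        cache.get("role-1:listHosts", loader); // expired, loads again
        System.out.println("loader calls: " + loads[0]); // prints "loader calls: 2"
    }
}
```

Injecting the clock keeps the expiry logic testable without sleeping; in production code `System::nanoTime` would be the natural argument.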
1 parent ae2ffbe commit 0b5a5e8

138 files changed, +4420 -2111 lines changed

.python-version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.6
+3.10

agent/conf/agent.properties

Lines changed: 7 additions & 0 deletions
@@ -434,3 +434,10 @@ iscsi.session.cleanup.enabled=false
 
 # Implicit host tags managed by agent.properties
 # host.tags=
+
+# Timeout(in seconds) for SSL handshake when agent connects to server. When no value is set then default value of 30s
+# will be used
+#ssl.handshake.timeout=
+
+# Wait(in seconds) during agent reconnections. When no value is set then default value of 5s will be used
+#backoff.seconds=
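The two new properties fall back to defaults (30s and 5s, per the comments above) when no value is set. A minimal sketch of that fallback behaviour is shown here, assuming a plain `java.util.Properties` lookup; the class `AgentTimeouts` and the helper `intOrDefault` are hypothetical and are not the agent's actual parsing code. Treating non-numeric or non-positive values as "unset" is also an assumption of this sketch.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper showing default handling for the two new agent properties. */
final class AgentTimeouts {
    static final int DEFAULT_SSL_HANDSHAKE_TIMEOUT_SECONDS = 30; // from the agent.properties comment
    static final int DEFAULT_BACKOFF_SECONDS = 5;                // from the agent.properties comment

    /** Returns the property as a positive int, or the fallback when it is missing, blank, or invalid. */
    static int intOrDefault(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value > 0 ? value : fallback; // assumption: non-positive values are ignored
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // One property set explicitly, the other left commented out as in the shipped file.
        props.load(new StringReader("ssl.handshake.timeout=45\n#backoff.seconds=\n"));

        System.out.println(intOrDefault(props, "ssl.handshake.timeout",
                DEFAULT_SSL_HANDSHAKE_TIMEOUT_SECONDS)); // prints 45
        System.out.println(intOrDefault(props, "backoff.seconds",
                DEFAULT_BACKOFF_SECONDS));               // prints 5
    }
}
```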

agent/src/main/java/com/cloud/agent/Agent.java

Lines changed: 418 additions & 369 deletions
Large diffs are not rendered by default.

agent/src/main/java/com/cloud/agent/AgentShell.java

Lines changed: 33 additions & 24 deletions
@@ -16,29 +16,6 @@
 // under the License.
 package com.cloud.agent;
 
-import com.cloud.agent.Agent.ExitStatus;
-import com.cloud.agent.dao.StorageComponent;
-import com.cloud.agent.dao.impl.PropertiesStorage;
-import com.cloud.agent.properties.AgentProperties;
-import com.cloud.agent.properties.AgentPropertiesFileHandler;
-import com.cloud.resource.ServerResource;
-import com.cloud.utils.LogUtils;
-import com.cloud.utils.ProcessUtil;
-import com.cloud.utils.PropertiesUtil;
-import com.cloud.utils.backoff.BackoffAlgorithm;
-import com.cloud.utils.backoff.impl.ConstantTimeBackoff;
-import com.cloud.utils.exception.CloudRuntimeException;
-import org.apache.commons.daemon.Daemon;
-import org.apache.commons.daemon.DaemonContext;
-import org.apache.commons.daemon.DaemonInitException;
-import org.apache.commons.lang.math.NumberUtils;
-import org.apache.commons.lang3.BooleanUtils;
-import org.apache.commons.lang3.StringUtils;
-import org.apache.logging.log4j.Logger;
-import org.apache.logging.log4j.LogManager;
-import org.apache.logging.log4j.core.config.Configurator;
-
-import javax.naming.ConfigurationException;
 import java.io.File;
 import java.io.FileNotFoundException;
 import java.io.IOException;
@@ -53,6 +30,31 @@
 import java.util.Properties;
 import java.util.UUID;
 
+import javax.naming.ConfigurationException;
+
+import org.apache.commons.daemon.Daemon;
+import org.apache.commons.daemon.DaemonContext;
+import org.apache.commons.daemon.DaemonInitException;
+import org.apache.commons.lang.math.NumberUtils;
+import org.apache.commons.lang3.BooleanUtils;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.logging.log4j.LogManager;
+import org.apache.logging.log4j.Logger;
+import org.apache.logging.log4j.core.config.Configurator;
+
+import com.cloud.agent.Agent.ExitStatus;
+import com.cloud.agent.dao.StorageComponent;
+import com.cloud.agent.dao.impl.PropertiesStorage;
+import com.cloud.agent.properties.AgentProperties;
+import com.cloud.agent.properties.AgentPropertiesFileHandler;
+import com.cloud.resource.ServerResource;
+import com.cloud.utils.LogUtils;
+import com.cloud.utils.ProcessUtil;
+import com.cloud.utils.PropertiesUtil;
+import com.cloud.utils.backoff.BackoffAlgorithm;
+import com.cloud.utils.backoff.impl.ConstantTimeBackoff;
+import com.cloud.utils.exception.CloudRuntimeException;
+
 public class AgentShell implements IAgentShell, Daemon {
     protected static Logger LOGGER = LogManager.getLogger(AgentShell.class);
 
@@ -406,7 +408,9 @@ public void init(String[] args) throws ConfigurationException {
 
         LOGGER.info("Defaulting to the constant time backoff algorithm");
         _backoff = new ConstantTimeBackoff();
-        _backoff.configure("ConstantTimeBackoff", new HashMap<String, Object>());
+        Map<String, Object> map = new HashMap<>();
+        map.put("seconds", _properties.getProperty("backoff.seconds"));
+        _backoff.configure("ConstantTimeBackoff", map);
     }
 
     private void launchAgent() throws ConfigurationException {
@@ -455,6 +459,11 @@ public void launchNewAgent(ServerResource resource) throws ConfigurationExceptio
         agent.start();
     }
 
+    @Override
+    public Integer getSslHandshakeTimeout() {
+        return AgentPropertiesFileHandler.getPropertyValue(AgentProperties.SSL_HANDSHAKE_TIMEOUT);
+    }
+
     public synchronized int getNextAgentId() {
         return _nextAgentId++;
     }

agent/src/main/java/com/cloud/agent/IAgentShell.java

Lines changed: 2 additions & 0 deletions
@@ -70,4 +70,6 @@ public interface IAgentShell {
     String getConnectedHost();
 
     void launchNewAgent(ServerResource resource) throws ConfigurationException;
+
+    Integer getSslHandshakeTimeout();
 }

agent/src/main/java/com/cloud/agent/properties/AgentProperties.java

Lines changed: 7 additions & 0 deletions
@@ -811,6 +811,13 @@ public Property<Integer> getWorkers() {
      */
     public static final Property<String> HOST_TAGS = new Property<>("host.tags", null, String.class);
 
+    /**
+     * Timeout for SSL handshake in seconds
+     * Data type: Integer.<br>
+     * Default value: <code>null</code>
+     */
+    public static final Property<Integer> SSL_HANDSHAKE_TIMEOUT = new Property<>("ssl.handshake.timeout", null, Integer.class);
+
     public static class Property <T>{
         private String name;
         private T defaultValue;

agent/src/test/java/com/cloud/agent/AgentShellTest.java

Lines changed: 7 additions & 0 deletions
@@ -362,4 +362,11 @@ public void updateAndGetConnectedHost() {
 
         Assert.assertEquals(expected, shell.getConnectedHost());
     }
+
+    @Test
+    public void testGetSslHandshakeTimeout() {
+        Integer expected = 1;
+        agentPropertiesFileHandlerMocked.when(() -> AgentPropertiesFileHandler.getPropertyValue(Mockito.eq(AgentProperties.SSL_HANDSHAKE_TIMEOUT))).thenReturn(expected);
+        Assert.assertEquals(expected, agentShellSpy.getSslHandshakeTimeout());
+    }
 }
Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+package com.cloud.agent;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertSame;
+import static org.junit.Assert.assertTrue;
+import static org.mockito.Mockito.any;
+import static org.mockito.Mockito.doReturn;
+import static org.mockito.Mockito.doThrow;
+import static org.mockito.Mockito.mock;
+import static org.mockito.Mockito.eq;
+import static org.mockito.Mockito.times;
+import static org.mockito.Mockito.verify;
+import static org.mockito.Mockito.when;
+
+import java.io.IOException;
+import java.net.InetSocketAddress;
+
+import javax.naming.ConfigurationException;
+
+import org.apache.logging.log4j.Logger;
+import org.junit.Before;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.mockito.junit.MockitoJUnitRunner;
+import org.springframework.test.util.ReflectionTestUtils;
+
+import com.cloud.resource.ServerResource;
+import com.cloud.utils.backoff.impl.ConstantTimeBackoff;
+import com.cloud.utils.nio.Link;
+import com.cloud.utils.nio.NioConnection;
+
+@RunWith(MockitoJUnitRunner.class)
+public class AgentTest {
+    Agent agent;
+    private AgentShell shell;
+    private ServerResource serverResource;
+    private Logger logger;
+
+    @Before
+    public void setUp() throws ConfigurationException {
+        shell = mock(AgentShell.class);
+        serverResource = mock(ServerResource.class);
+        doReturn(true).when(serverResource).configure(any(), any());
+        doReturn(1).when(shell).getWorkers();
+        doReturn(1).when(shell).getPingRetries();
+        agent = new Agent(shell, 1, serverResource);
+        logger = mock(Logger.class);
+        ReflectionTestUtils.setField(agent, "logger", logger);
+    }
+
+    @Test
+    public void testGetLinkLogNullLinkReturnsEmptyString() {
+        Link link = null;
+        String result = agent.getLinkLog(link);
+        assertEquals("", result);
+    }
+
+    @Test
+    public void testGetLinkLogLinkWithTraceEnabledReturnsLinkLogWithHashCode() {
+        Link link = mock(Link.class);
+        InetSocketAddress socketAddress = new InetSocketAddress("192.168.1.100", 1111);
+        when(link.getSocketAddress()).thenReturn(socketAddress);
+        when(logger.isTraceEnabled()).thenReturn(true);
+
+        String result = agent.getLinkLog(link);
+        System.out.println(result);
+        assertTrue(result.startsWith(System.identityHashCode(link) + "-"));
+        assertTrue(result.contains("192.168.1.100"));
+    }
+
+    @Test
+    public void testGetAgentNameWhenServerResourceIsNull() {
+        ReflectionTestUtils.setField(agent, "serverResource", null);
+        assertEquals("Agent", agent.getAgentName());
+    }
+
+    @Test
+    public void testGetAgentNameWhenAppendAgentNameIsTrue() {
+        when(serverResource.isAppendAgentNameToLogs()).thenReturn(true);
+        when(serverResource.getName()).thenReturn("TestAgent");
+
+        String agentName = agent.getAgentName();
+        assertEquals("TestAgent", agentName);
+    }
+
+    @Test
+    public void testGetAgentNameWhenAppendAgentNameIsFalse() {
+        when(serverResource.isAppendAgentNameToLogs()).thenReturn(false);
+
+        String agentName = agent.getAgentName();
+        assertEquals("Agent", agentName);
+    }
+
+    @Test
+    public void testAgentInitialization() {
+        Runtime.getRuntime().removeShutdownHook(agent.shutdownThread);
+        when(shell.getPingRetries()).thenReturn(3);
+        when(shell.getWorkers()).thenReturn(5);
+        agent.setupShutdownHookAndInitExecutors();
+        assertNotNull(agent.selfTaskExecutor);
+        assertNotNull(agent.outRequestHandler);
+        assertNotNull(agent.requestHandler);
+    }
+
+    @Test
+    public void testAgentShutdownHookAdded() {
+        Runtime.getRuntime().removeShutdownHook(agent.shutdownThread);
+        agent.setupShutdownHookAndInitExecutors();
+        verify(logger).trace("Adding shutdown hook");
+    }
+
+    @Test
+    public void testGetResourceGuidValidGuidAndResourceName() {
+        when(shell.getGuid()).thenReturn("12345");
+        String result = agent.getResourceGuid();
+        assertTrue(result.startsWith("12345-" + ServerResource.class.getSimpleName()));
+    }
+
+    @Test
+    public void testGetZoneReturnsValidZone() {
+        when(shell.getZone()).thenReturn("ZoneA");
+        String result = agent.getZone();
+        assertEquals("ZoneA", result);
+    }
+
+    @Test
+    public void testGetPodReturnsValidPod() {
+        when(shell.getPod()).thenReturn("PodA");
+        String result = agent.getPod();
+        assertEquals("PodA", result);
+    }
+
+    @Test
+    public void testSetLinkAssignsLink() {
+        Link mockLink = mock(Link.class);
+        agent.setLink(mockLink);
+        assertEquals(mockLink, agent.link);
+    }
+
+    @Test
+    public void testGetResourceReturnsServerResource() {
+        ServerResource mockResource = mock(ServerResource.class);
+        ReflectionTestUtils.setField(agent, "serverResource", mockResource);
+        ServerResource result = agent.getResource();
+        assertSame(mockResource, result);
+    }
+
+    @Test
+    public void testGetResourceName() {
+        String result = agent.getResourceName();
+        assertTrue(result.startsWith(ServerResource.class.getSimpleName()));
+    }
+
+    @Test
+    public void testUpdateLastPingResponseTimeUpdatesCurrentTime() {
+        long beforeUpdate = System.currentTimeMillis();
+        agent.updateLastPingResponseTime();
+        long updatedTime = agent.lastPingResponseTime.get();
+        assertTrue(updatedTime >= beforeUpdate);
+        assertTrue(updatedTime <= System.currentTimeMillis());
+    }
+
+    @Test
+    public void testGetNextSequenceIncrementsSequence() {
+        long initialSequence = agent.getNextSequence();
+        long nextSequence = agent.getNextSequence();
+        assertEquals(initialSequence + 1, nextSequence);
+        long thirdSequence = agent.getNextSequence();
+        assertEquals(nextSequence + 1, thirdSequence);
+    }
+
+    @Test
+    public void testRegisterControlListenerAddsListener() {
+        IAgentControlListener listener = mock(IAgentControlListener.class);
+        agent.registerControlListener(listener);
+        assertTrue(agent.controlListeners.contains(listener));
+    }
+
+    @Test
+    public void testUnregisterControlListenerRemovesListener() {
+        IAgentControlListener listener = mock(IAgentControlListener.class);
+        agent.registerControlListener(listener);
+        assertTrue(agent.controlListeners.contains(listener));
+        agent.unregisterControlListener(listener);
+        assertFalse(agent.controlListeners.contains(listener));
+    }
+
+    @Test
+    public void testCloseAndTerminateLinkLinkIsNullDoesNothing() {
+        agent.closeAndTerminateLink(null);
+    }
+
+    @Test
+    public void testCloseAndTerminateLinkValidLinkCallsCloseAndTerminate() {
+        Link mockLink = mock(Link.class);
+        agent.closeAndTerminateLink(mockLink);
+        verify(mockLink).close();
+        verify(mockLink).terminated();
+    }
+
+    @Test
+    public void testStopAndCleanupConnectionConnectionIsNullDoesNothing() {
+        agent.connection = null;
+        agent.stopAndCleanupConnection(false);
+    }
+
+    @Test
+    public void testStopAndCleanupConnectionValidConnectionNoWaitStopsAndCleansUp() throws IOException {
+        NioConnection mockConnection = mock(NioConnection.class);
+        agent.connection = mockConnection;
+        agent.stopAndCleanupConnection(false);
+        verify(mockConnection).stop();
+        verify(mockConnection).cleanUp();
+    }
+
+    @Test
+    public void testStopAndCleanupConnectionCleanupThrowsIOExceptionLogsWarning() throws IOException {
+        NioConnection mockConnection = mock(NioConnection.class);
+        agent.connection = mockConnection;
+        doThrow(new IOException("Cleanup failed")).when(mockConnection).cleanUp();
+        agent.stopAndCleanupConnection(false);
+        verify(mockConnection).stop();
+        verify(logger).warn(eq("Fail to clean up old connection. {}"), any(IOException.class));
+    }
+
+    @Test
+    public void testStopAndCleanupConnectionValidConnectionWaitForStopWaitsForStartupToStop() throws IOException {
+        NioConnection mockConnection = mock(NioConnection.class);
+        ConstantTimeBackoff mockBackoff = mock(ConstantTimeBackoff.class);
+        mockBackoff.setTimeToWait(0);
+        agent.connection = mockConnection;
+        when(shell.getBackoffAlgorithm()).thenReturn(mockBackoff);
+        when(mockConnection.isStartup()).thenReturn(true, true, false);
+        agent.stopAndCleanupConnection(true);
+        verify(mockConnection).stop();
+        verify(mockConnection).cleanUp();
+        verify(mockBackoff, times(3)).waitBeforeRetry();
+    }
+}
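The last test above stubs `isStartup()` to return `true, true, false` and then expects exactly three calls to `waitBeforeRetry()`. One loop shape consistent with those numbers is a wait-then-check (`do`/`while`) loop, sketched below. The `Agent.java` diff is not rendered on this page, so this is an inference from the test, not the actual implementation; `StopWaiter` and `waitUntilStopped` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a wait-then-check loop that matches the test's expected call counts. */
final class StopWaiter {
    /**
     * Backs off at least once, then keeps backing off while the connection
     * still reports that it is in startup. Returns the number of waits performed.
     */
    static int waitUntilStopped(BooleanSupplier isStartup, Runnable waitBeforeRetry) {
        int waits = 0;
        do {
            waitBeforeRetry.run(); // stand-in for the backoff algorithm's waitBeforeRetry()
            waits++;
        } while (isStartup.getAsBoolean());
        return waits;
    }

    public static void main(String[] args) {
        boolean[] answers = {true, true, false}; // same sequence the test stubs for isStartup()
        int[] index = {0};
        int waits = waitUntilStopped(() -> answers[index[0]++], () -> { });
        System.out.println("waits: " + waits); // prints "waits: 3"
    }
}
```

A check-then-wait (`while`) loop would produce only two waits for the same sequence, which is why the test's `times(3)` points to the wait-first form.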
