Skip to content

Commit 1fd167f

Browse files
authored
Pause and resume-or-terminate (#829)
* Pause management Add API to ILogManager to support Pause/Resume/Wait * Android wrapper and tests * Change expected values for PauseManager * Documentation (use-after-free) * Repair git merge damage
1 parent abc7281 commit 1fd167f

File tree

17 files changed

+600
-28
lines changed

17 files changed

+600
-28
lines changed

docs/use-after-free.md

+30
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
# How to Not Crash on Shutdown
2+
3+
In any concurrent application, there is a race on shutdown between the destruction of SDK components and threads that may be calling SDK methods. This can occur both with static destructors (e.g., the static `LogManager` instance) and destructors for heap-allocated log manager instances. The most common symptom is a read-dereference crash in a `LogEvent` call stack caused by a read-after-free of the internal state of a log manager (less frequently one may see a read-after-free in the background threads of the a log manager; `LogEvent` is typically called more frequently than these background tasks, so it predominates).
4+
5+
As an interim solution, the SDK has added the `PauseActivity` and `WaitPause` methods, and modified the `FlushAndTeardown` method. Applications can reduce or eliminate use-after-free crashes by managing SDK lifetime and using these methods on shutdown.
6+
7+
## PauseActivity and WaitPause
8+
9+
The `PauseActivity` and `WaitPause` methods provide a reliable mechanism to completely quiesce the SDK prior to shutdown. While the SDK is quiesced, calls to `LogEvent`, `Flush`, and `UploadNow` will have no effect (they will return immediately without doing anything), and the SDK's background threads will not run.
10+
11+
`PauseActivity` starts the process of quiescing the SDK. It does not abort calls-in-progress from either other application threads into SDK methods, or from SDK background jobs. The SDK will quiesce once those calls in progress complete.
12+
13+
`WaitPause` waits for these calls to complete. It will return once the SDK is completely quiesced. The time required is potentially unbounded (as with any concurrent operation) though in practice the bound should be on the order of the time required to complete synchronous writes to the persistent database.
14+
15+
For shutdown, the suggested sequence of SDK calls is:
16+
17+
- `Flush` to persist all current telemetry to persistent storage.
18+
- `PauseActivity` to start the quiesce.
19+
- `WaitPause` to ensure that the quiesce completed.
20+
- proceed with other shutdown activity as needed.
21+
- ideally use `join` or similar mechanisms to quiesce all application threads before shutdown.
22+
- Since the application should be quiesced, use `SIGKILL` or similar immediate-exit mechanisms to avoid firing destructors and freeing heap at exit, thus avoiding use-after-free crashes on application exit.
23+
24+
## SDK Lifetime Management
25+
26+
The final two steps in that suggested sequence (`join` and `SIGKILL`) should ensure that the SDK outlives potential callers. The `join` step requires potentially unbounded time to complete, and the time in practice depends on application architecture. Thus the suggestion to use `SIGKILL` to terminate execution, possibly before the `join` completes. The immediate-exit effects are no more disruptive than use-after-free crashes, should you be worried about in-flight write operations at exit.
27+
28+
## FlushAndTeardown
29+
30+
The pull request that introduces `PauseActivity` adds calls to `PauseActivity` followed by `WaitPause` in the existing `FlushAndTeardown` method. If one wishes to use this backstop to quiesce the SDK, the application should call `Flush` before `FlushAndTeardown`. This looks redundant (doesn't FlushAndTeardown actually flush, you might ask), but despite the method name `FlushAndTeardown`, the call to `Flush` is required to ensure persistence before the `PauseActivity` quiesce.

lib/android_build/app/src/androidTest/java/com/microsoft/applications/events/maesdktest/SDKUnitNativeTest.java

+159
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,165 @@ public void useAppContext() {
9292
assertEquals("com.microsoft.applications.events.maesdktest", appContext.getPackageName());
9393
}
9494

95+
static class Waiter implements Runnable {
96+
@Override
97+
public void run() {
98+
LogManager.waitPause();
99+
}
100+
}
101+
102+
@Test
103+
public void notPausedInitially() {
104+
System.loadLibrary("native-lib");
105+
System.loadLibrary("maesdk");
106+
107+
Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
108+
HttpClient client = new HttpClient(appContext);
109+
OfflineRoom.connectContext(appContext);
110+
111+
final String token =
112+
"0123456789abcdef0123456789abcdef-01234567-0123-0123-0123-0123456789ab-0123";
113+
114+
ILogger logger = LogManager.initialize(token);
115+
assertThat(logger, isA(ILogger.class));
116+
Thread t = new Thread(new Waiter());
117+
t.start();
118+
try {
119+
t.join(5);
120+
} catch (InterruptedException e) {
121+
e.printStackTrace();
122+
fail("Waiter interrupted");
123+
}
124+
assertThat(t.isAlive(), is(false));
125+
assertThat(LogManager.startActivity(), is(true));
126+
LogManager.endActivity();
127+
}
128+
129+
@Test
130+
public void pausingWillWait() {
131+
System.loadLibrary("native-lib");
132+
System.loadLibrary("maesdk");
133+
134+
Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
135+
HttpClient client = new HttpClient(appContext);
136+
OfflineRoom.connectContext(appContext);
137+
138+
final String token =
139+
"0123456789abcdef0123456789abcdef-01234567-0123-0123-0123-0123456789ab-0123";
140+
141+
ILogger logger = LogManager.initialize(token);
142+
assertThat(logger, isA(ILogger.class));
143+
assertThat(LogManager.startActivity(), is(true));
144+
LogManager.pauseActivity();
145+
Thread t = new Thread(new Waiter());
146+
t.start();
147+
try {
148+
t.join(5);
149+
} catch (InterruptedException e) {
150+
e.printStackTrace();
151+
fail("Waiter interrupted");
152+
}
153+
assertThat(t.isAlive(), is(true));
154+
LogManager.endActivity();
155+
try {
156+
t.join(5);
157+
} catch (InterruptedException e) {
158+
e.printStackTrace();
159+
fail("Waiter interrupted");
160+
}
161+
assertThat(t.isAlive(), is(false));
162+
LogManager.resumeActivity();
163+
}
164+
165+
class InstanceWaiter implements Runnable {
166+
ILogManager logManager;
167+
168+
public InstanceWaiter(ILogManager x) {
169+
logManager = x;
170+
}
171+
172+
@Override
173+
public void run() {
174+
logManager.waitPause();
175+
}
176+
}
177+
178+
@Test
179+
public void pauseOnInstance() {
180+
System.loadLibrary("native-lib");
181+
System.loadLibrary("maesdk");
182+
183+
Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
184+
HttpClient client = new HttpClient(appContext);
185+
OfflineRoom.connectContext(appContext);
186+
187+
final String token =
188+
"0123456789abcdef0123456789abcdef-01234567-0123-0123-0123-0123456789ab-0123";
189+
190+
ILogConfiguration custom = LogManager.logConfigurationFactory();
191+
custom.set(LogConfigurationKey.CFG_STR_PRIMARY_TOKEN, token);
192+
custom.set(LogConfigurationKey.CFG_INT_MAX_TEARDOWN_TIME, (long) 5);
193+
custom.set(LogConfigurationKey.CFG_STR_FACTORY_NAME, "AThing");
194+
195+
final ILogManager secondaryManager = LogManagerProvider.createLogManager(custom);
196+
assertThat(secondaryManager.startActivity(), is(true));
197+
secondaryManager.pauseActivity();
198+
Thread t = new Thread(new InstanceWaiter(secondaryManager));
199+
t.start();
200+
try {
201+
t.join(5);
202+
} catch (InterruptedException e) {
203+
e.printStackTrace();
204+
fail("Waiter interrupted");
205+
}
206+
assertThat(t.isAlive(), is(true));
207+
secondaryManager.endActivity();
208+
try {
209+
t.join(5);
210+
} catch (InterruptedException e) {
211+
e.printStackTrace();
212+
fail("Waiter interrupted");
213+
}
214+
assertThat(t.isAlive(), is(false));
215+
secondaryManager.resumeActivity();
216+
}
217+
218+
@Test
219+
public void resumingAlsoEnds() {
220+
System.loadLibrary("native-lib");
221+
System.loadLibrary("maesdk");
222+
223+
Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
224+
HttpClient client = new HttpClient(appContext);
225+
OfflineRoom.connectContext(appContext);
226+
227+
final String token =
228+
"0123456789abcdef0123456789abcdef-01234567-0123-0123-0123-0123456789ab-0123";
229+
230+
ILogger logger = LogManager.initialize(token);
231+
assertThat(logger, isA(ILogger.class));
232+
assertThat(LogManager.startActivity(), is(true));
233+
LogManager.pauseActivity();
234+
Thread t = new Thread(new Waiter());
235+
t.start();
236+
try {
237+
t.join(5);
238+
} catch (InterruptedException e) {
239+
e.printStackTrace();
240+
fail("Waiter interrupted");
241+
}
242+
assertThat(t.isAlive(), is(true));
243+
LogManager.resumeActivity();
244+
try {
245+
t.join(5);
246+
} catch (InterruptedException e) {
247+
e.printStackTrace();
248+
fail("Waiter interrupted");
249+
}
250+
assertThat(t.isAlive(), is(false));
251+
LogManager.endActivity();
252+
}
253+
95254
@Test
96255
public void runNativeTests() {
97256
System.loadLibrary("native-lib");

lib/android_build/maesdk/src/main/java/com/microsoft/applications/events/ILogManager.java

+6
Original file line numberDiff line numberDiff line change
@@ -68,4 +68,10 @@ public interface ILogManager extends AutoCloseable {
6868
public boolean registerPrivacyGuard();
6969

7070
public boolean unregisterPrivacyGuard();
71+
72+
public void pauseActivity();
73+
public void resumeActivity();
74+
public void waitPause();
75+
public boolean startActivity();
76+
public void endActivity();
7177
}

lib/android_build/maesdk/src/main/java/com/microsoft/applications/events/LogManager.java

+6-1
Original file line numberDiff line numberDiff line change
@@ -932,5 +932,10 @@ public static boolean unregisterPrivacyGuard() {
932932
// we should let LogManager remove it when it d'tors.
933933
return PrivacyGuard.isInitialized() && nativeUnregisterPrivacyGuardOnDefaultLogManager();
934934
}
935-
}
936935

936+
public static native void pauseActivity();
937+
public static native void resumeActivity();
938+
public static native void waitPause();
939+
public static native boolean startActivity();
940+
public static native void endActivity();
941+
}

lib/android_build/maesdk/src/main/java/com/microsoft/applications/events/LogManagerProvider.java

+31
Original file line numberDiff line numberDiff line change
@@ -293,5 +293,36 @@ public boolean registerPrivacyGuard() {
293293
public boolean unregisterPrivacyGuard() {
294294
return PrivacyGuard.isInitialized() && nativeUnregisterPrivacyGuard(nativeLogManager);
295295
}
296+
297+
protected native void nativePauseActivity(long nativeLogManager);
298+
protected native void nativeResumeActivity(long nativeLogManager);
299+
protected native void nativeWaitPause(long nativeLogManager);
300+
protected native boolean nativeStartActivity(long nativeLogManager);
301+
protected native void nativeEndActivity(long nativeLogManager);
302+
303+
@Override
304+
public void pauseActivity() {
305+
nativePauseActivity(nativeLogManager);
306+
}
307+
308+
@Override
309+
public void resumeActivity() {
310+
nativeResumeActivity(nativeLogManager);
311+
}
312+
313+
@Override
314+
public void waitPause() {
315+
nativeWaitPause(nativeLogManager);
316+
}
317+
318+
@Override
319+
public boolean startActivity() {
320+
return nativeStartActivity(nativeLogManager);
321+
}
322+
323+
@Override
324+
public void endActivity() {
325+
nativeEndActivity(nativeLogManager);
326+
}
296327
}
297328
}

lib/api/LogManagerImpl.cpp

+79-9
Original file line numberDiff line numberDiff line change
@@ -364,6 +364,8 @@ namespace MAT_NS_BEGIN
364364

365365
void LogManagerImpl::FlushAndTeardown()
366366
{
367+
PauseActivity();
368+
WaitPause();
367369
LOG_INFO("Shutting down...");
368370
LOCKGUARD(m_lock);
369371
if (m_alive)
@@ -754,7 +756,7 @@ namespace MAT_NS_BEGIN
754756

755757
void LogManagerImpl::ResetLogSessionData()
756758
{
757-
if (m_logSessionDataProvider)
759+
if (m_logSessionDataProvider)
758760
{
759761
m_logSessionDataProvider->ResetLogSessionData();
760762
}
@@ -868,25 +870,93 @@ namespace MAT_NS_BEGIN
868870
{
869871

870872
LOCKGUARD(m_lock);
871-
if (GetSystem())
873+
if (GetSystem())
872874
{
873875
// cleanup pending http requests
874-
GetSystem()->cleanup();
875-
876+
GetSystem()->cleanup();
877+
876878
// cleanup log session ( UUID)
877879
if (m_logSessionDataProvider)
878880
{
879881
m_logSessionDataProvider->DeleteLogSessionData();
880882
}
881-
883+
882884
// cleanup offline storage ( this will also cleanup retry queue for http requests
883-
if (m_offlineStorage)
884-
{
885-
m_offlineStorage->DeleteAllRecords();
885+
if (m_offlineStorage)
886+
{
887+
m_offlineStorage->DeleteAllRecords();
886888
}
887889
}
888890
return STATUS_SUCCESS;
889891
}
892+
893+
// Pause/Resume interface
894+
895+
void LogManagerImpl::PauseActivity()
896+
{
897+
std::unique_lock<std::mutex> lock(m_pause_mutex);
898+
if (m_pause_state != PauseState::Active) {
899+
return;
900+
}
901+
902+
if (m_pause_active_count == 0)
903+
{
904+
m_pause_state = PauseState::Paused;
905+
// notify under lock because a waiting thread
906+
// may destroy objects (including this object)
907+
m_pause_cv.notify_all();
908+
}
909+
else
910+
{
911+
m_pause_state = PauseState::Pausing;
912+
}
913+
}
914+
915+
void LogManagerImpl::ResumeActivity()
916+
{
917+
std::unique_lock<std::mutex> lock(m_pause_mutex);
918+
if (m_pause_state == PauseState::Active) {
919+
return;
920+
}
921+
m_pause_state = PauseState::Active;
922+
m_pause_cv.notify_all();
923+
}
924+
925+
void LogManagerImpl::WaitPause()
926+
{
927+
std::unique_lock<std::mutex> lock(m_pause_mutex);
928+
if (m_pause_state != PauseState::Pausing) {
929+
return;
930+
}
931+
m_pause_cv.wait(lock, [this]() -> bool {
932+
return m_pause_state != PauseState::Pausing;
933+
});
934+
}
935+
936+
bool LogManagerImpl::StartActivity()
937+
{
938+
std::unique_lock<std::mutex> lock(m_pause_mutex);
939+
if (m_pause_state != PauseState::Active) {
940+
return false;
941+
}
942+
m_pause_active_count += 1;
943+
return true;
944+
}
945+
946+
void LogManagerImpl::EndActivity()
947+
{
948+
std::unique_lock<std::mutex> lock(m_pause_mutex);
949+
if (m_pause_active_count == 0) {
950+
return;
951+
}
952+
m_pause_active_count -= 1;
953+
if (m_pause_active_count > 0) {
954+
return;
955+
}
956+
if (m_pause_state == PauseState::Pausing) {
957+
m_pause_state = PauseState::Paused;
958+
m_pause_cv.notify_all();
959+
}
960+
}
890961
}
891962
MAT_NS_END
892-

0 commit comments

Comments
 (0)