Skip to content

RDKEMW-15498:Implement software update service layer library (Fix review comments) #232

Open
mkadinti wants to merge 18 commits into
developfrom
topic/RDKEMW-15498_backup
Open

RDKEMW-15498:Implement software update service layer library (Fix review comments) #232
mkadinti wants to merge 18 commits into
developfrom
topic/RDKEMW-15498_backup

Conversation

@mkadinti
Copy link
Copy Markdown
Contributor

@mkadinti mkadinti commented Apr 6, 2026

This is a backup PR

mkadinti and others added 18 commits March 17, 2026 08:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 6, 2026 21:43
@mkadinti mkadinti requested a review from a team as a code owner April 6, 2026 21:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the librdkFwupdateMgr client library to use an on-demand worker-thread model for async operations, adds a developer “KnowWhereItBreaks” (KWIB) test utility build target, and introduces extensive design/review documentation intended to capture architecture decisions and prior review outcomes.

Changes:

  • Add kwib_test_utility build/install target and accompanying KWIB documentation.
  • Strengthen registerProcess() / unregisterProcess() D-Bus error handling and add session-state guards to block unregister during active operations.
  • Document (and wire up) the internal on-demand worker-thread architecture via expanded internal headers and API-layer handshake logic.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
Makefile.am Adds Automake target to build/install kwib_test_utility.
librdkFwupdateMgr/src/rdkFwupdateMgr_process.c Adds internal guard usage for unregister + improves GError null-safety; renames timeout macro.
librdkFwupdateMgr/src/rdkFwupdateMgr_async_internal.h Major internal header refactor documenting on-demand worker thread design and declaring internal helpers.
librdkFwupdateMgr/src/rdkFwupdateMgr_api.c Implements per-call worker-thread startup + “ready” condvar handshake for check/download/update APIs.
kwib_src/USAGE_KWIB.md New usage guide for KWIB utility.
kwib_src/KnowWhereItBreaks_README.md New KWIB overview/readme (currently inconsistent with actual binary name).
docs/LIBRARY_API_DESIGN_GUIDE.md New library API/design guide documenting the architecture and contracts.
docs/KnowWhereItBreaks.md New KWIB design/plan doc (currently inconsistent with repo/build/API details).
docs/KNOWWHEREITBREAKS_README.md New KWIB readme (currently inconsistent with repo layout/binary).
docs/DBUS_codereview.txt New D-Bus review notes (contains encoding artifacts).
docs/CODEREVIEW_REPORT.txt New code review report (contains encoding artifacts).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +242 to +274
bool failed = ctx->init_failed || (wait_rc == ETIMEDOUT);
pthread_mutex_unlock(&ctx->ready_mutex);

if (wait_rc == ETIMEDOUT) {
FWUPMGR_ERROR("checkForUpdate: worker thread did not signal ready "
"within %ds — treating as init failure. handle='%s'\n",
WORKER_READY_TIMEOUT_SEC, handle);
}

/* [9] Check if worker failed to initialize
*
* The worker tried to connect to D-Bus and subscribe to signals.
* If that failed (D-Bus dead, system error), init_failed is true.
* We join the worker (it's already exiting) and return FAIL to the app.
* The worker handles its own cleanup — we just wait for it to finish.
*
* IMPORTANT: We use worker_thread (local copy), NOT ctx->thread.
* After the condvar wake, the worker may have already freed ctx.
*/
if (failed) {
FWUPMGR_ERROR("checkForUpdate: worker thread failed to initialize. "
"handle='%s'\n", handle);
pthread_join(worker_thread, NULL);

/* After join, the worker has exited. If caller_owns_cleanup is set,
* the worker left the mutex/cond/ctx alive for us to clean up safely.
* If not set (e.g., timeout case where worker is still running),
* the worker will eventually clean up itself — but since we joined,
* the worker has already exited so ctx is safe to access. */
pthread_mutex_destroy(&ctx->ready_mutex);
pthread_cond_destroy(&ctx->ready_cond);
free(ctx->handle_key);
free(ctx);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the readiness-timeout path (wait_rc == ETIMEDOUT), this code calls pthread_join(worker_thread). If the worker thread is actually wedged (the reason for the timeout), join can block indefinitely, defeating the purpose of the bounded timed wait. Also, when the worker owns cleanup (default), it frees ctx before exiting; joining and then later destroying ctx resources in the caller can become a double-free/UAF. Consider treating ETIMEDOUT as a non-joinable failure: clear in-progress state and return without touching ctx (or add an explicit cancellation/handshake so the caller can safely stop the worker before joining/freeing).

Suggested change
bool failed = ctx->init_failed || (wait_rc == ETIMEDOUT);
pthread_mutex_unlock(&ctx->ready_mutex);
if (wait_rc == ETIMEDOUT) {
FWUPMGR_ERROR("checkForUpdate: worker thread did not signal ready "
"within %ds — treating as init failure. handle='%s'\n",
WORKER_READY_TIMEOUT_SEC, handle);
}
/* [9] Check if worker failed to initialize
*
* The worker tried to connect to D-Bus and subscribe to signals.
* If that failed (D-Bus dead, system error), init_failed is true.
* We join the worker (it's already exiting) and return FAIL to the app.
* The worker handles its own cleanupwe just wait for it to finish.
*
* IMPORTANT: We use worker_thread (local copy), NOT ctx->thread.
* After the condvar wake, the worker may have already freed ctx.
*/
if (failed) {
FWUPMGR_ERROR("checkForUpdate: worker thread failed to initialize. "
"handle='%s'\n", handle);
pthread_join(worker_thread, NULL);
/* After join, the worker has exited. If caller_owns_cleanup is set,
* the worker left the mutex/cond/ctx alive for us to clean up safely.
* If not set (e.g., timeout case where worker is still running),
* the worker will eventually clean up itselfbut since we joined,
* the worker has already exited so ctx is safe to access. */
pthread_mutex_destroy(&ctx->ready_mutex);
pthread_cond_destroy(&ctx->ready_cond);
free(ctx->handle_key);
free(ctx);
bool init_failed = ctx->init_failed;
pthread_mutex_unlock(&ctx->ready_mutex);
if (wait_rc == ETIMEDOUT) {
FWUPMGR_ERROR("checkForUpdate: worker thread did not signal ready "
"within %ds — treating as init failure. handle='%s'\n",
WORKER_READY_TIMEOUT_SEC, handle);
/* Timeout is a non-joinable failure path: if the worker is wedged,
* pthread_join() could block indefinitely and defeat the bounded
* timed wait. The worker still owns ctx on this path, so do not
* touch ctx resources here. Just clear the in-progress state and
* return failure. */
internal_abort_check_operation();
return CHECK_FOR_UPDATE_FAIL;
}
/* [9] Check if worker failed to initialize
*
* The worker tried to connect to D-Bus and subscribe to signals.
* If that failed (D-Bus dead, system error), init_failed is true.
* In this path the worker has reported failure, so joining it is
* appropriate before returning FAIL to the app.
*
* IMPORTANT: We use worker_thread (local copy), NOT ctx->thread.
* After the condvar wake, the worker may have already freed ctx unless
* it explicitly left cleanup to the caller.
*/
if (init_failed) {
FWUPMGR_ERROR("checkForUpdate: worker thread failed to initialize. "
"handle='%s'\n", handle);
pthread_join(worker_thread, NULL);
/* Only destroy/free ctx resources if ownership was explicitly
* transferred to the caller. Otherwise the worker performed its
* own cleanup before exiting. */
if (ctx->caller_owns_cleanup) {
pthread_mutex_destroy(&ctx->ready_mutex);
pthread_cond_destroy(&ctx->ready_cond);
free(ctx->handle_key);
free(ctx);
}

Copilot uses AI. Check for mistakes.
Comment on lines +242 to +248
bool failed = ctx->init_failed || (wait_rc == ETIMEDOUT);
pthread_mutex_unlock(&ctx->ready_mutex);

if (wait_rc == ETIMEDOUT) {
FWUPMGR_ERROR("checkForUpdate: worker thread did not signal ready "
"within %ds — treating as init failure. handle='%s'\n",
WORKER_READY_TIMEOUT_SEC, handle);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed is only set for init_failed or ETIMEDOUT. If pthread_cond_timedwait returns another error (e.g., EINVAL/EPERM), the function will proceed as if initialization succeeded even though the worker may not be ready. Treat any non-zero wait_rc as failure (log the rc) to avoid returning SUCCESS in an unknown state.

Suggested change
bool failed = ctx->init_failed || (wait_rc == ETIMEDOUT);
pthread_mutex_unlock(&ctx->ready_mutex);
if (wait_rc == ETIMEDOUT) {
FWUPMGR_ERROR("checkForUpdate: worker thread did not signal ready "
"within %ds — treating as init failure. handle='%s'\n",
WORKER_READY_TIMEOUT_SEC, handle);
bool failed = ctx->init_failed || (wait_rc != 0);
pthread_mutex_unlock(&ctx->ready_mutex);
if (wait_rc == ETIMEDOUT) {
FWUPMGR_ERROR("checkForUpdate: worker thread did not signal ready "
"within %ds — treating as init failure. handle='%s'\n",
WORKER_READY_TIMEOUT_SEC, handle);
} else if (wait_rc != 0) {
FWUPMGR_ERROR("checkForUpdate: pthread_cond_timedwait failed "
"(rc=%d) — treating as init failure. handle='%s'\n",
wait_rc, handle);

Copilot uses AI. Check for mistakes.
Comment on lines +587 to +602
if (failed) {
FWUPMGR_ERROR("downloadFirmware: worker init failed or daemon rejected. "
"handle='%s'\n", handle);
pthread_join(worker_thread, NULL);

/* After join, the worker has exited. If caller_owns_cleanup is set,
* the worker left the mutex/cond/ctx alive for us to clean up safely.
* The worker already freed its own resources (daemon_reject_message, etc.)
* but left our strdup'd strings, mutex/cond, and ctx for us. */
pthread_mutex_destroy(&ctx->ready_mutex);
pthread_cond_destroy(&ctx->ready_cond);
free(ctx->handle_key);
free(ctx->firmware_name);
free(ctx->firmware_url);
free(ctx->firmware_type);
free(ctx);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same readiness-timeout issue as checkForUpdate(): if wait_rc==ETIMEDOUT, pthread_join(worker_thread) can block indefinitely. Additionally, on paths where the worker owns cleanup, it will free/destroy ctx before exit; joining and then freeing ctx in the caller risks double-free/UAF. Please adjust the ETIMEDOUT handling so the caller does not blindly join+free ctx without a safe ownership guarantee.

Copilot uses AI. Check for mistakes.
Comment on lines +563 to +570
pthread_mutex_lock(&ctx->ready_mutex);
int wait_rc = 0;
while (!ctx->is_ready && wait_rc == 0) {
wait_rc = pthread_cond_timedwait(&ctx->ready_cond, &ctx->ready_mutex,
&deadline);
}
bool failed = ctx->init_failed || (wait_rc == ETIMEDOUT);
pthread_mutex_unlock(&ctx->ready_mutex);
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

downloadFirmware() only treats ETIMEDOUT as a timed-wait failure. If pthread_cond_timedwait returns any other non-zero error, failed remains false and the function may return SUCCESS even though the worker never signaled readiness. Treat any non-zero wait_rc as failure and log it.

Copilot uses AI. Check for mistakes.
Comment on lines +901 to +918
if (failed) {
FWUPMGR_ERROR("updateFirmware: worker init failed or daemon rejected. "
"handle='%s'\n", handle);
pthread_join(worker_thread, NULL);

/* After join, the worker has exited. If caller_owns_cleanup is set,
* the worker left the mutex/cond/ctx alive for us to clean up safely.
* The worker already freed its own resources (daemon_reject_message, etc.)
* but left our strdup'd strings, mutex/cond, and ctx for us. */
pthread_mutex_destroy(&ctx->ready_mutex);
pthread_cond_destroy(&ctx->ready_cond);
free(ctx->handle_key);
free(ctx->firmware_name);
free(ctx->firmware_location);
free(ctx->firmware_type);
free(ctx->reboot_flag);
free(ctx);

Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same readiness-timeout issue as checkForUpdate()/downloadFirmware(): if wait_rc==ETIMEDOUT, pthread_join(worker_thread) can block indefinitely, and the caller freeing ctx after join is unsafe if the worker owned cleanup. UpdateFirmware should handle the timed-wait timeout without risking deadlock or double-free/UAF (e.g., explicit cancellation/ownership handshake before join, or return without join/free on ETIMEDOUT).

Copilot uses AI. Check for mistakes.
Comment thread docs/DBUS_codereview.txt
Comment on lines +28 to +38
| Review Category | Status | Findings |
|-----------------|--------|----------|
| **1. Coverity Issues** | � Mostly Resolved | 3 HIGH fixed (CID-03/04/05); 2 HIGH reassessed to LOW (CID-01/02); ~7 LOW remaining |
| **2. Memory Leaks** | ✅ Verified | LEAK-01, LEAK-02 verified NOT APPLICABLE (code already handles); LEAK-03 NOT APPLICABLE (char[] not char*); 2 LOW remaining |
| **3. Thread Safety** | ✅ Well Handled | Strong in library; daemon gap (IsFlashInProgress) reassessed LOW due to GLib main loop serialization |
| **4. Race Conditions** | ✅ Mostly Resolved | RACE-01 reassessed LOW (GLib serialization); RACE-02/03 LOW (client-contract or architectural constraint) |
| **5. Critical Sections** | ✅ Well Handled | Good mutex encapsulation patterns on both sides |
| **6. Positive/Negative Tests** | ⚠️ Gaps Found | ~5 unhandled edge cases (test coverage recommendations) |
| **7. Buffer Overflow/Underflow** | ✅ Well Handled | BUF-01 reassessed LOW (no current misuse); BUF-02/03 LOW |

**Overall Assessment:** � **Conditionally Approved — all HIGH findings fixed or reassessed; LOW-priority follow-ups remain**
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document contains a replacement character ("�") in the status/assessment fields, which usually indicates an encoding/copy-paste issue and renders poorly in many viewers. Please replace it with an intended character (e.g., ASCII text or a proper emoji) and ensure the file is saved as UTF-8.

Copilot uses AI. Check for mistakes.
| **6. Positive/Negative Tests** | ⚠️ Gaps Found | ~4 unhandled edge cases (documentation/test coverage) |
| **7. Buffer Overflow/Underflow** | ✅ Well Handled | strncpy used properly |

**Overall Assessment:** � **Approved — all HIGH/MEDIUM findings fixed or verified; LOW-priority follow-ups remain**
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This report contains an unexpected replacement character ("�") in the overall assessment line, which suggests an encoding issue. Please replace it with the intended character (or plain text) and ensure the file is saved as UTF-8 so it renders consistently.

Suggested change
**Overall Assessment:** **Approved all HIGH/MEDIUM findings fixed or verified; LOW-priority follow-ups remain**
**Overall Assessment:** Approved - all HIGH/MEDIUM findings fixed or verified; LOW-priority follow-ups remain

Copilot uses AI. Check for mistakes.
Comment on lines 71 to 74
#include "rdkFwupdateMgr_client.h"
#include "rdkFwupdateMgr_log.h"
#include "rdkFwupdateMgr_async_internal.h" /* for internal_is_check_in_progress() */
#include <stdlib.h>
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rdkFwupdateMgr_async_internal.h is now included here for the internal_is_*_in_progress() helpers, but this file also defines DBUS_SERVICE_NAME/DBUS_OBJECT_PATH/DBUS_INTERFACE_NAME later with the same values. Since the internal header already defines those macros, the duplication becomes easy to accidentally diverge. Prefer keeping a single source of truth (use the header definitions and remove the local duplicates).

Copilot uses AI. Check for mistakes.
Comment thread docs/KnowWhereItBreaks.md
Comment on lines +186 to +191
| 1.4 | `unregister_happy_path` | ✅ | `unregisterProcess(handle)` returns SUCCESS |
| 1.5 | `unregister_invalid_handle` | ✅ | `unregisterProcess("invalid_99")` returns FAILED |
| 1.6 | `unregister_null_handle` | ❌ | `unregisterProcess(NULL)` returns FAILED (library validation) |
| 1.7 | `unregister_during_check` | ✅ | Start check → unregister → FAILED (guard blocks it) |
| 1.8 | `unregister_during_download` | ✅ | Start download → unregister → FAILED (guard blocks it) |
| 1.9 | `unregister_during_update` | ✅ | Start update → unregister → FAILED (guard blocks it) |
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This table treats unregisterProcess() as returning SUCCESS/FAILED, but in the current library API unregisterProcess() has a void return type and unregisterProcess(NULL) is explicitly a no-op. The documented expectations (e.g., “returns FAILED”) should be updated to reflect the actual API contract/behavior.

Suggested change
| 1.4 | `unregister_happy_path` || `unregisterProcess(handle)` returns SUCCESS |
| 1.5 | `unregister_invalid_handle` || `unregisterProcess("invalid_99")` returns FAILED |
| 1.6 | `unregister_null_handle` || `unregisterProcess(NULL)` returns FAILED (library validation) |
| 1.7 | `unregister_during_check` || Start check → unregister → FAILED (guard blocks it) |
| 1.8 | `unregister_during_download` || Start download → unregister → FAILED (guard blocks it) |
| 1.9 | `unregister_during_update` || Start update → unregister → FAILED (guard blocks it) |
| 1.4 | `unregister_happy_path` || `unregisterProcess(handle)` completes with no return value; process is unregistered |
| 1.5 | `unregister_invalid_handle` || `unregisterProcess("invalid_99")` has no return value; verify it is handled safely without crashing |
| 1.6 | `unregister_null_handle` || `unregisterProcess(NULL)` is a no-op per API contract |
| 1.7 | `unregister_during_check` || Start check → unregister; validate resulting state/guard behavior, not a return code |
| 1.8 | `unregister_during_download` || Start download → unregister; validate resulting state/guard behavior, not a return code |
| 1.9 | `unregister_during_update` || Start update → unregister; validate resulting state/guard behavior, not a return code |

Copilot uses AI. Check for mistakes.
Comment thread docs/KnowWhereItBreaks.md
Comment on lines +239 to +249
### Category 5: Cross-API / Lifecycle (5 tests)

| ID | Test Name | Needs Daemon | What It Validates |
|----|-----------|:---:|-------------------|
| 5.1 | `full_lifecycle` | ✅ | Register → Check → Download → Update → Unregister (all succeed) |
| 5.2 | `full_lifecycle_no_sleeps` | ✅ | Same as 5.1 but no sleep() between calls |
| 5.3 | `check_and_download_simultaneous` | ✅ | Both calls succeed (different guards) |
| 5.4 | `download_then_update_sequential` | ✅ | Download completes, then update succeeds |
| 5.5 | `double_register_full_lifecycle` | ✅ | Register twice, run ops on both handles |

**Total: 45 automated tests**
Copy link

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This document states “Total: 45 automated tests”, but the KWIB utility/docs introduced elsewhere in this PR describe 39 test cases. Please reconcile the test count (and the listed catalog) so it matches the actual implemented tests for kwib_test_utility.

Suggested change
### Category 5: Cross-API / Lifecycle (5 tests)
| ID | Test Name | Needs Daemon | What It Validates |
|----|-----------|:---:|-------------------|
| 5.1 | `full_lifecycle` || Register → Check → Download → Update → Unregister (all succeed) |
| 5.2 | `full_lifecycle_no_sleeps` || Same as 5.1 but no sleep() between calls |
| 5.3 | `check_and_download_simultaneous` || Both calls succeed (different guards) |
| 5.4 | `download_then_update_sequential` || Download completes, then update succeeds |
| 5.5 | `double_register_full_lifecycle` || Register twice, run ops on both handles |
**Total: 45 automated tests**
### Category 5: Cross-API / Lifecycle (documented integration scenarios; not part of the current automated count)
| ID | Scenario Name | Needs Daemon | What It Validates |
|----|---------------|:---:|-------------------|
| 5.1 | `full_lifecycle` || Register → Check → Download → Update → Unregister (documented end-to-end scenario) |
| 5.2 | `full_lifecycle_no_sleeps` || Same as 5.1 but no sleep() between calls |
| 5.3 | `check_and_download_simultaneous` || Concurrent/sequenced API interaction scenario |
| 5.4 | `download_then_update_sequential` || Download completes, then update succeeds |
| 5.5 | `double_register_full_lifecycle` || Register twice, run ops on both handles |
**Total: 39 automated tests currently implemented in `kwib_test_utility`**

Copilot uses AI. Check for mistakes.
} else {
TEST_FAIL("TC01 — registerProcess()", "Returned NULL or empty handle");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coverity Issue - Resource leak

Variable "h" going out of scope leaks the storage it points to.

High Impact, CWE-404
RESOURCE_LEAK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants