Draft: Refactor internal workflows #52

dgkf-roche · 2025-02-04T17:49:59Z

This is a pretty big redesign of the internals. The design objectives were to:

Unify the "plan" (data.frame) and "design" (task_graph) so that they both leverage a graph structure.
Avoid pre-calculation for data that can be easily re-derived
- Example: package "alias" and "custom" fields are now redundant with the graph structure. We can determine which is the dev and release versions based on their dependencies, no longer needs to be stored in node.
- Example: formatting functions used instead of aliases for user-facing communication
Aim to make each task node self-contained so that it can be executed directly
- Example: instead of deriving lib path locations based on class (task_get_install_lib()), the lib path method is declared when the install task node is created

Use Case

The code that has been refactored is really only for a central, primary use case so far:

run("./pkg")

# or, broken down into its component parts
g <- plan_rev_dep_checks("./pkg")  # declarative "high-level" tasks
g <- task_graph_create(g)  # complete graph with dependent tasks
run(checker$new(g))

Naming conventions changes:

check_design => checker
checks_df, tasks_df => task_graph (plan, but want to consolidate these still)
package_spec => pkg_origin
task_spec => task

Non-goals:

Graph performance. For now I'd prefer to have a focus on data structures that represent the problem best, and we can worry about performance optimizations after we know what will best articulate the problem.

I'll highlight important bits of code in comments

dgkf-roche · 2025-02-04T17:53:15Z

R/plan.R

+plan_rev_dep_dev_check <- function(path, revdep, repos) {
+  origin <- pkg_origin_local(path = path)
+
+  tasks <- list(
+    make_unique_task(seed = "dev", check_task(
+      origin = pkg_origin_repo(package = revdep, repos = repos),
+      env = DEFAULT_R_CMD_CHECK_ENVVARS,
+      args = DEFAULT_R_CMD_CHECK_ARGS,
+      build_args = DEFAULT_R_CMD_BUILD_ARGS
+    )),
+    install_task(origin = origin)
+  )
+
+  planned(sequence_graph(
+    name = hashes(tasks),
+    task = tasks,
+    task_type = lapply(tasks, function(task) class(task)[[1]]),
+    package = c(revdep, origin$package)
+  ))
+}


Planning a reverse-dependency check requires just two declarative nodes: running the R CMD check and installing the appropriate version of the dependency (CRAN or local dev version).

When constructing a full set of rev dep checks, we create pairs (release + dev) of these types of sub-graphs and then merge them into a single task graph.

note: I'm trying to simplify this further, for now just focus on the tasks <- list(...) part

dgkf-roche · 2025-02-04T17:55:45Z

R/task.R

+install_task <- function(
+  origin,
+  type = getOption("pkgType"),
+  INSTALL_opts = NULL,
+  lib = lib_loc_default(),
+  ...
+) {
+  task <- task(origin = origin, ...)
+  task$type <- type
+  task$INSTALL_opts <- INSTALL_opts
+  task$lib <- lib
+  class(task) <- c("install_task", class(task))
+  task
+}


An install_task already contains information about the library it is intended to install into. This could be a file path when one is already known, but is more likely a method of selecting a lib path such as lib_loc_default()

lib_loc_default is simply a structure(list(), class = c("lib_loc_default", "lib_loc")) used for dispatch when deriving libraries.

dgkf-roche · 2025-02-04T18:17:32Z

R/task_graph.R

+  # for each check task in the plan, build a dependency tree and merge it
+  # into the existing check task subtree
+  plan_neighborhoods <- lapply(plan_neighborhoods, function(nh) {
+    subtree <- igraph::induced_subgraph(plan, nh)
+    deps <- dep_tree(nh[[1]]$task)
+    igraph::reverse_edges(deps)
+
+    subtree <- graph_project(
+      x = deps,
+      onto = subtree,
+      where = c("name" = "package")
+    )
+
+    # set missing dependencies to be installed from repo
+    missing_task <- is.na(igraph::V(subtree)$task)
+    igraph::V(subtree)$task[missing_task] <- lapply(
+      igraph::V(subtree)$package[missing_task],
+      function(package) {
+        origin <- try_pkg_origin_repo(package = package, repos = repos)
+        install_task(origin = origin)
+      }
+    )
+
+    # re-hash tasks as vertex names (populate missing vertex names)
+    igraph::V(subtree)$name <- vcapply(igraph::V(subtree)$task, hash)
+    igraph::V(subtree)$task_type <- vcapply(
+      igraph::V(subtree)$task,
+      function(task) class(task)[[1]]
+    )
+
+    subtree
+  })


This is a a big component of the new strategy. When we "plan" a set of checks, we don't really care about the details of where packages come from or how dependencies are managed. This function completes the graph with all the task dependencies that need to be performed before the plan can be executed.

To accomplish this, we look at each check task, derive its dependency tree and then project that dependency tree back onto the existing planned tasks.

Let's imagine a situation where we are planning to run a rev dep check for pkgA with a dev version our package, pkgB. In this case, we plan to run R CMD check where our dependency is installed using the local dev source code. When we derive the dependency tree, we want to make sure that this dev dependency continues to use the existing task that has planned its installation.

graph_project projects a set of vertices and edges onto another graph based on one or more attributes. If a vertex already exists, that vertex is retained and all the edges from both graphs are merged.

flowchart TB rdcA["revdep check {pkgA}"] instB["install dep {pkgB} from source"] pkgA["{pkgA}"] pkgB["{pkgB}"] pkgX["{pkgX}"] pkgY["{pkgY}"] pkgZ["{pkgZ}"] subgraph plan direction TB rdcA --> instB end subgraph deps["{pkgA} dependencies"] direction TB pkgA --> pkgB pkgA --> pkgX pkgB --> pkgY pkgB --> pkgZ end

Loading

becomes

flowchart TB rdcA["revdep check {pkgA}"] instB["install dep {pkgB} from source"] instX["install dep {pkgX} from CRAN"] instY["install dep {pkgY} from CRAN"] instZ["install dep {pkgZ} from CRAN"] subgraph task graph direction TB rdcA --> instB rdcA --> instX instB --> instY instB --> instZ end

Loading

dgkf-roche · 2025-02-04T18:19:08Z

R/task_graph.R

 }

+#' @export
+plot.task_graph <- function(x, ...) {


Added some visualization tools to help with debugging. I have aspirations to use the viewer pane to have an actively updating graph as the execution is happening, but nowhere near that feature yet.

dgkf-roche and others added 23 commits September 3, 2024 14:03

wip: refactor updates

bf7d19e

Merge branch 'main' into meta-28-refactor

c6dad75

refactoring

647d55c

revert hashing changes

11e54c0

chore: refactor package_source => pkg_origin

1ce51e5

chore: refactor checks_df => plan

b3ac7e5

chore: rename check_design => checker

97b8dbd

chore: rebuilding docs

c7633c4

wip: refactor task_edges_df, prioritize early DESCRIPTION modification

c83059f

wip: reworking task graph building

51f7072

wip: more work towards libpaths refactor

ed60341

wip: task hierarchy

c6d8535

wip: plan as declarative task graph

30258f9

wip: continuing task graph refactor

1e84ac7

Restart CI

4836f08

rebasing changes

68e3dcf

wip: continuing task graph refactor

fb1f367

wip: getting reporters working

4e438d8

wip: getting reporters working

99d7746

produce dependency tree instead of dependency list

958204d

fix: igraph df attribute issues

0bdbe9f

fix: remove unnecessary attr copying

8bc46b1

fix: removed edge type from plot

b662dbf

dgkf-roche commented Feb 4, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: Refactor internal workflows #52

Draft: Refactor internal workflows #52

dgkf-roche commented Feb 4, 2025 •

edited

Loading

dgkf-roche Feb 4, 2025

dgkf-roche Feb 4, 2025

dgkf-roche Feb 4, 2025

dgkf-roche Feb 4, 2025

Draft: Refactor internal workflows #52

Are you sure you want to change the base?

Draft: Refactor internal workflows #52

Conversation

dgkf-roche commented Feb 4, 2025 • edited Loading

Use Case

Naming conventions changes:

Non-goals:

dgkf-roche Feb 4, 2025

Choose a reason for hiding this comment

dgkf-roche Feb 4, 2025

Choose a reason for hiding this comment

dgkf-roche Feb 4, 2025

Choose a reason for hiding this comment

dgkf-roche Feb 4, 2025

Choose a reason for hiding this comment

dgkf-roche commented Feb 4, 2025 •

edited

Loading