You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
In #3482 (parent issue of this sub-issue), we showed that passing execution records from CPU to GPU takes 80% of APC trace gen time, and in #3489, we skipped passing execution records for original instructions whose columns are all optimized away. #3482 also discussed the idea that for original instructions whose columns are mostly optimized away, we can create execution hints from existing constraints that derive these columns from execution records of other original instructions, which this sub-issue focuses on.
Overall Idea
Derived witgen will be conducted at the last step of APC generation after optimization, and will be a static analysis of post-optimization constraints.
Specific Steps
Represent system of post-optimization constraints as a DAG:
Each column will be a node.
For simplification purpose, let's only consider constraints of the form A = B * C (* ...), whose left hand side is unique and right hand side can contain arbitrary number of columns. Here we can say that B and C will "uniquely" derive A. For each of these type of constraints, draw a directed edge from each node on the right hand side to the node on the left hand side. In the example above, we draw two edges B -> A and C -> A.
It's not guaranteed that the graph is acyclical, but I'm hoping that a cyclical graph suggests that not all constraints are fully optimized away, so the optimizer should mostly output constraints that form a DAG.
Determine sequence of nodes that can be derived with the graph/DAG
All nodes will be marked as "known" at the start, and in a loop we will incrementally remove nodes from the "known pool" that can be derived from other nodes in the "known pool" and document the removal. At the end, the reverse of the removal order will be the derived order, because it's guaranteed that each removal/derive step is only based on nodes from the "known pool".
In each loop run, we remove nodes with the least incremental cost and most incremental benefit, where "cost" and "benefit" are defined as follows.
Cost of removing a node: by removing a node we might lose the chance of removing other nodes, because some other nodes in the "known pool" might depend on it. The cost of removing node A can be roughly calculated as the number of non-A nodes that the out going edges of node A can reach. This accounts for potential cyclical graph.
Benefit of removing a node: by removing a node we might be able to remove a record, and the record/node relationship has been extensively studied in [Execution] APC subs stats #3484, which shows that 85% of records have 4 or fewer columns left after optimization. The benefit of removing a node is larger if there are fewer nodes left in the record it resides in.
The cost and benefit of removing a node is always incremental and needs to be recalculated after removing each node. For example, a node A might help determine node B and only node B, so it has a cost of 1 at the start, but if node B were removed at some loop, node A will have zero cost. As another example, node A and node B might both correspond to the same record, but after removing node B, the benefit of removing node A becomes bigger.
We might need some "parameterized function" of cost and benefit to determine the priority of node removal.
Edge case: in each loop zero-cost node can be immediately removed regardless of benefit. Concretely, this means that a node has no outgoing edge, so removing it won't reduce the ability to remove any other nodes.
Simplified algorithm: we can instead remove nodes based on just the cost, and will remove greater number of nodes, though it's not guaranteed that the removed nodes will come from records with the least number of nodes left post optimization. We can also remove nodes based on just the benefit with the reverse trade off.
Determine the records that can be removed
In the prior step, once we loop till no more node can be removed, we can match the removed nodes against the post optimization nodes of each record, to determine which records can be removed (basically those whose post optimization nodes are all removed).
Then, we filter the sequence of derived columns to only those who appear in the removed records.
Finally, we ship the indices of removed records and sequence of derived columns to trace gen.
Constraint Solving
So far when using constraints of the form A = B * C (* ...) to construct the graph/DAG, we only add edges B -> A and C -> A, but because our GPU stack machine that evaluates derived column expressions has INVERSE as an opcode, we can also evaluate B if A and C are know or C if A and B are known. I'm not sure what should I include as an edge in this case, as it'll basically be 6 directed edges among A, B, and C (basically as many edges are there can be among 3 nodes).
To further extend this scenario, we can consider constraints with arbitrary number of nodes on BOTH sides of the equation. This will further complicate the graph, as each constraint will generate edges that goes from every single node to every single other nodes.
Therefore, once we extend the constraints to consider, maybe a better cost algorithm is the number constraints a node appears in? However, we then encounter the problem that a node appears in multiple constraints simply due to the degree limit, not because that it's "more important" in deriving other columns.
There's the other scenario that a node appears twice on one side of the equation. For example A * (A - 1) = B * C, and I'm not sure if A should be deemed as "solvable" from B and C, as technically it'll have two solutions, but maybe only one unique solution in the field in some cases?
Another idea is to use constraints before optimization as well, but only filtered to those constraints all of whose nodes still exist post optimization. This hopefully gives "raw constraints" with easier relationships among nodes compared to the post optimization constriants, which already undergoes optimization steps like "combining up to a certain degree", which yields more composite constraints.
Many of the issues encountered here feels very much like the witness solver we had in Powdr legacy, so I'd ask for some inputs from @chriseth.
@Schaeff: Let me know what you think :)
In #3482 (parent issue of this sub-issue), we showed that passing execution records from CPU to GPU takes 80% of APC trace gen time, and in #3489, we skipped passing execution records for original instructions whose columns are all optimized away. #3482 also discussed the idea that for original instructions whose columns are mostly optimized away, we can create execution hints from existing constraints that derive these columns from execution records of other original instructions, which this sub-issue focuses on.
Overall Idea
Derived witgen will be conducted at the last step of APC generation after optimization, and will be a static analysis of post-optimization constraints.
Specific Steps
A = B * C (* ...), whose left hand side is unique and right hand side can contain arbitrary number of columns. Here we can say that B and C will "uniquely" derive A. For each of these type of constraints, draw a directed edge from each node on the right hand side to the node on the left hand side. In the example above, we draw two edgesB -> AandC -> A.node Acan be roughly calculated as the number of non-A nodes that the out going edges ofnode Acan reach. This accounts for potential cyclical graph.subsstats #3484, which shows that 85% of records have 4 or fewer columns left after optimization. The benefit of removing a node is larger if there are fewer nodes left in the record it resides in.Constraint Solving
A = B * C (* ...)to construct the graph/DAG, we only add edgesB -> AandC -> A, but because our GPU stack machine that evaluates derived column expressions hasINVERSEas an opcode, we can also evaluate B if A and C are know or C if A and B are known. I'm not sure what should I include as an edge in this case, as it'll basically be 6 directed edges among A, B, and C (basically as many edges are there can be among 3 nodes).A * (A - 1) = B * C, and I'm not sure ifAshould be deemed as "solvable" from B and C, as technically it'll have two solutions, but maybe only one unique solution in the field in some cases?