Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dominator nodes only store immediate dominator #5137

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

JohnDumbell
Copy link
Contributor

This moves our current algorithm closer to Coopers and has the benefit of massively reduced memory usage if there are a large amount of dominator nodes in a single method.

This does mean that any operation that needs to know anything about the state of the dominator nodes beyond the immediate then needs to walk back to find it, but with the improvements of the underlying map by @smowton it'll likely still be faster than the original implementation.

Currently up to make sure CI passes before opening for reviews.

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting better -- main missing bit is the postorder numbering; rather than force cfgt into a particular order we should probably just assign postorder numbers in cfg_dominators_templatet::nodet and use that number for the intersect algorithm (which depends on being able to judge topological ordering quickly by comparing numbers)

@@ -467,7 +467,7 @@ static java_bytecode_convert_methodt::method_offsett get_common_dominator(
const auto &dominator_nodeidx=
dominator_analysis.cfg.entry_map.at(v->var.start_pc);
const auto &this_var_doms=
dominator_analysis.cfg[dominator_nodeidx].dominators;
dominator_analysis.dominated_by(dominator_nodeidx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the name dominated_by is the opposite of what we want? This should be the set of dominators of dominator_nodeidx

@@ -76,7 +74,45 @@ class cfg_dominators_templatet
/// Note by definition all program points dominate themselves.
bool dominates(T lhs, const nodet &rhs_node) const
{
return rhs_node.dominators.count(lhs);
if (!rhs_node.dominator)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove, duplicates the while condition


/// Returns a set of T that are the dominators of start_node. Only use if you
/// need the entire set of dominators from node to root.
std::set<T> dominated_by(T start_node) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about instead of returning a std::set we provide an iterator, then dominates can use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind it, but it'd need to be lazily-evaluated. Do we have anything like that right now I can copy/directly use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just got done writing a custom iterator, so I'll make a quick draft and post it


/// Returns a set of T that are the dominators of start_node. Only use if you
/// need the entire set of dominators from node to root.
std::set<T> dominated_by(T start_node) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, rename dominators (but see iterator comment -- dominators should return an iterable with a begin() and end() which walk over the tree)

// in-place intersection. not safe to use set_intersect
while(n_it!=node.dominators.end() && o_it!=other.end())
auto edge_index = *other.dominator;
while(potential_dominator != edge_index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the intersect function from the Cooper paper -- pull it out into a static function to make the correspondence with the paper clearer

@@ -133,44 +133,43 @@ void sese_region_analysist::compute_sese_regions(
++it)
{
// Only look for regions starting at nontrivial CFG edges:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated whitespace change


// Ideally I would use `optionalt<std::size_t>` here, but it triggers a
// GCC-5 bug.
std::size_t closest_exit_index = dominators.cfg.size();
for(const auto &possible_exit : instruction_postdoms)
std::size_t current_index = *instruction_postdom;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideal use site for a dominators iterator

@JohnDumbell JohnDumbell force-pushed the jd/feature/dom_cooper_improvement branch 7 times, most recently from 528caba to 52f2caa Compare October 8, 2019 09:37
@JohnDumbell JohnDumbell force-pushed the jd/feature/dom_cooper_improvement branch 4 times, most recently from 2276b24 to ada97ce Compare October 8, 2019 17:16
@JohnDumbell JohnDumbell force-pushed the jd/feature/dom_cooper_improvement branch 2 times, most recently from 0de344e to 8f51eef Compare October 9, 2019 11:21
@codecov-io
Copy link

codecov-io commented Oct 9, 2019

Codecov Report

Merging #5137 into develop will decrease coverage by 0.03%.
The diff coverage is 97.46%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5137      +/-   ##
===========================================
- Coverage    67.12%   67.09%   -0.04%     
===========================================
  Files         1149     1149              
  Lines        94194    94073     -121     
===========================================
- Hits         63229    63116     -113     
+ Misses       30965    30957       -8
Flag Coverage Δ
#cproversmt2 42.74% <97.05%> (+0.04%) ⬆️
#regression 63.58% <96.2%> (-0.05%) ⬇️
#unit 31.99% <94.59%> (+0.09%) ⬆️
Impacted Files Coverage Δ
src/analyses/dependence_graph.cpp 89.36% <100%> (ø) ⬆️
src/analyses/sese_regions.cpp 100% <100%> (ø) ⬆️
src/goto-analyzer/unreachable_instructions.cpp 85.81% <100%> (ø) ⬆️
src/goto-instrument/full_slicer.cpp 94.68% <100%> (-0.22%) ⬇️
...mc/src/java_bytecode/java_local_variable_table.cpp 96.58% <100%> (ø) ⬆️
src/analyses/cfg_dominators.h 97.64% <97.26%> (-2.36%) ⬇️
src/json/json_interface.cpp 71.42% <0%> (-11.43%) ⬇️
src/util/xml.cpp 60.46% <0%> (-1.17%) ⬇️
src/solvers/smt2/smt2_solver.cpp 81.87% <0%> (-1.14%) ⬇️
src/solvers/smt2/smt2_parser.cpp 65.43% <0%> (-1.13%) ⬇️
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db4c76b...8f51eef. Read the comment docs.

@JohnDumbell JohnDumbell requested a review from smowton October 9, 2019 12:33
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 8f51eef).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/131132137
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

@JohnDumbell
Copy link
Contributor Author

Now good to review. Kudos to @smowton for chunks of this.

Copy link
Contributor

@owen-mc-diffblue owen-mc-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm part-way through reviewing this

Copy link
Contributor

@owen-mc-diffblue owen-mc-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You should add a comment saying where you got the algorithm from, i.e. figure 3 in "A Simple, Fast Dominance Algorithm" by Cooper, Harvey and Kennedy. This means you don't have to explain why it does what you want it to.
  2. You haven't added any tests. Is this sufficiently tested? You're making quite a big change.

@JohnDumbell
Copy link
Contributor Author

@owen-mc-diffblue Applied all comments. There's mention of the paper in the class documentation, that good enough? (I have added your comment about intersect diagram though) Otherwise there are enough existing tests to cover this so that no new ones should be needed. Probably.

@JohnDumbell JohnDumbell force-pushed the jd/feature/dom_cooper_improvement branch from 8f51eef to 5573904 Compare October 15, 2019 13:05
Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmarks?

get_reverse_postordered_instructions(std::size_t entry_node_index) const;

/// Walks up the dominators of left/right nodes until it finds one that is
/// common to both sides. Used to work out a the common immediate dominator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a the common immediate dominator" -> "the least common dominator" (i.e. the dominator that is common to both, and which dominates no other common dominator)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 'least common denominator' a known phrase? It reads really awkwardly otherwise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the go-to phrase is in fact "nearest common dominator"

if(node.postorder_index != nodet::NODE_NOT_VISITED)
return;

// Otherwise set that we've processed it, and put children on the stack.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't put the children, we put the individual node (along with initialising that node's child iterators), as the method name suggests

std::size_t next_postorder_index = 0;
std::vector<stack_entryt> stack;

auto place_node_on_stack_if_not_visited = [this,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if you can find a way to please clang-format that's less odd than this


std::vector<T> order;

// Note that this will only select and order nodes that have been processed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that have been processed -> that are reachable from the entry point

@@ -115,6 +246,26 @@ class cfg_dominators_templatet
protected:
void initialise(P &program);
void fixedpoint(P &program);

/// Goal of this is to assign post-order numbering to each node in the tree.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Goal of this is to assign" -> "Assigns"

void assign_postordering(std::size_t start_node_index);

/// Sort our CFG nodes in reverse post-order. Just gets every node that we've
/// processed (anything not NODE_NOT_VISITED) then order from highest to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

order -> orders

while(n_it!=node.dominators.end() && o_it!=other.end())
// compute intersection of predecessors
auto &edges = (post_dom ? node.out : node.in);
if(edges.size() != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is still pointless (the for-loop will terminate immediately if there are no edges)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caused exceptions at one point when attempting to get the in/out iterators, hopefully not the case anymore.

@JohnDumbell JohnDumbell force-pushed the jd/feature/dom_cooper_improvement branch from 5573904 to b7dd30d Compare October 15, 2019 14:39
@smowton
Copy link
Contributor

smowton commented Oct 15, 2019

Looks good except needs benchmarks. If in doubt ask @hannes-steffenhagen-diffblue for the benchmarking he did when trying to replace the dominators algorithm with Tarjan's.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 5573904).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/131999602
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: b7dd30d).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/132007774
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Change existing API calls to use the new immediate-dominator version
of dominator analysis.
@JohnDumbell JohnDumbell force-pushed the jd/feature/dom_cooper_improvement branch from b7dd30d to 533c35a Compare October 16, 2019 09:23
Copy link
Contributor

@allredj allredj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️
This PR failed Diffblue compatibility checks (cbmc commit: 533c35a).
Build URL: https://travis-ci.com/diffblue/test-gen/builds/132137431
Status will be re-evaluated on next push.
Common spurious failures include: the cbmc commit has disappeared in the mean time (e.g. in a force-push); the author is not in the list of contributors (e.g. first-time contributors); compatibility was already broken by an earlier merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some intermediate comments from a couple days ago, sorry if these are out of date now

@@ -12,15 +12,16 @@ Author: Georg Weissenbacher, [email protected]
#ifndef CPROVER_ANALYSES_CFG_DOMINATORS_H
#define CPROVER_ANALYSES_CFG_DOMINATORS_H

#include <set>
#include <cassert>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that these are sorted we might as well remove this header, <cassert> shouldn't be needed anywhere.

{
}

private:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reorder this so it doesn't keep switching between private and public things?

{
advance();
return *this;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚫

The logic for pre- and post increment is the wrong way round. operator++() should advance() then return *this, operator++(int) should save *this, advance() and then return the saved iterator.

const auto &next_dominator = current_node.dominator;
INVARIANT(
current_node.postorder_index ==
cfg_dominatorst::nodet::NODE_NOT_VISITED ||

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you wrap this up in something like is_reachable(), or to be more honest is_potentially_reachable?

}

/// Returns true if program point \p lhs dominates \p rhs.
/// Note by definition all program points dominate themselves.
bool dominates(T lhs, T rhs) const
{
return dominates(lhs, get_node(rhs));
const auto rhs_dominators = dominators(rhs);
return std::any_of(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very surprised there isn't something like std::contains for this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants