-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for discounted properties in MDPs and DTMCs #621
base: master
Are you sure you want to change the base?
Conversation
This is absolutely great. I have two questions:
Best, |
Yes, convergence is detected using the Bellman residual, see The error messages are not implemented, I will take care of it! I'm happy about any pointers where such problems might occur. |
Thanks @sjunges for the review! I incorporated your comments now. |
Thanks! @tquatmann do you want to have a look before merging? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the great work!
- Do we (have to) enforce somewhere that the discount factor is strictly between 0 and 1? For both formula types (total and cumulative formulas)? I might have missed this in my review
- I'd suggest to add more tests, in particular in
FormulaParserTest
,DtmcPrctlModelCheckerTest
, andSchedulerGenerationMdpPrctlModelCheckerTest
. The existing test also doesn't considerDiscountedCumulativeRewardFormula
s
|
||
namespace storm { | ||
namespace logic { | ||
class DiscountedCumulativeRewardFormula : public CumulativeRewardFormula { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that .isCumulativeRewardFormula()
is also true forDiscountedCumulativeRewardFormula
s.
Does this have any unintended effects? (e.g. places like multi-objective model checking, where the discounted formula would be treaded as if it were undiscounted)? Maybe it's safer to also override .isCumulativeRewardFormula()
to false
here, so things will rather fail instead of silently dropping the discount factor.
Also, this probably needs to override gatherUsedVariables
to make sure that variables in the discount factor are catched.
Same for TotalRewardFormulas
optionalRewardAccumulation = f.getRewardAccumulation(); | ||
} | ||
return std::static_pointer_cast<Formula>( | ||
std::make_shared<DiscountedCumulativeRewardFormula>(f.getDiscountFactor(), bounds, timeBoundReferences, optionalRewardAccumulation)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to apply substitution function on the discount factor
@@ -24,6 +24,7 @@ class ExpressionSubstitutionVisitor : public CloneVisitor { | |||
virtual boost::any visit(RewardOperatorFormula const& f, boost::any const& data) const override; | |||
virtual boost::any visit(BoundedUntilFormula const& f, boost::any const& data) const override; | |||
virtual boost::any visit(CumulativeRewardFormula const& f, boost::any const& data) const override; | |||
virtual boost::any visit(DiscountedCumulativeRewardFormula const& f, boost::any const& data) const override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also needs a case for the DiscountedTotalRewardFormula
(to apply substitution on the discount factor)
FormulaInformation result; | ||
result.setContainsCumulativeRewardFormula(true); | ||
for (unsigned i = 0; i < f.getDimension(); ++i) { | ||
if (f.getTimeBoundReference(i).isRewardBound()) { | ||
result.setContainsRewardBoundedFormula(true); | ||
} | ||
} | ||
return result; | ||
} | ||
|
||
boost::any FormulaInformationVisitor::visit(DiscountedTotalRewardFormula const&, boost::any const&) const { | ||
return FormulaInformation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't these need a result.setContainsDiscountFormula(true)
?
DiscountingHelper(storm::storage::SparseMatrix<ValueType> const& A, ValueType discountFactor); | ||
DiscountingHelper(storm::storage::SparseMatrix<ValueType> const& A, ValueType discountFactor, bool trackScheduler); | ||
|
||
void setUpViOperator() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler to makesetUpViOperator
private and call it within solveWithDiscountedValueIteration
?
// Initialize result to the zero vector. | ||
std::vector<ValueType> result(transitionMatrix.getRowGroupCount(), storm::utility::zero<ValueType>()); | ||
|
||
auto multiplier = storm::solver::MultiplierFactory<ValueType>().create(env, transitionMatrix); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not use the discountFactor
, right?
This PR adds support for checking cumulative and total reward properties with a given discount factor on MDPs and DTMCs.
R=? [Cdiscount=_factor_]