-
Notifications
You must be signed in to change notification settings - Fork 273
Instrument strchr to use the string refinement solver #5200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
int main() | ||
{ | ||
int i; | ||
char *test = i ? "fo\0tis" : "notfoti"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be a test for the case when a character can’t be found as well, e.g. in this case 't'
may or may not be found depending on i
.
Note, this currently fails CI on Travis on the following test:
Specifically fails to find:
Instead:
I guess different platform, different way of printing null? |
For handling byte-extract expression, string-constants etc., in places that expect string arrays. Also slightly extend the definition of char-type.
into a new c-specific axiom-generating function. Basically, the addition encodes the existence of terminating-zero and the properties this entails.
by 1: associating the `src` with a new infinite array, 2: associating the array with its length variable, 3: applying the c_index_of function 4: translating the returned index to the expected char-pointer/NULL.
extracting string arrays from byte-expressions and string constants, hiding the `symex::args` renaming, etc.
using the string-refinement solver.
from offset 0, i.e. the whole array should be the result.
d6f02c0
to
0879ad0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat! I have a number of suggestions of ways we can make the quite dense string solver code a bit more readable, happy to help with applying them.
I would like to see the coverage report as there ends up being lots of new code, and I don't really know when it would be triggered, so would be good to check we're testing it all. Hopefully will just appear, but maybe needs a rebase?
STRING_REFINEMENT_MAX_CHAR_WIDTH; | ||
|
||
if(type.id() == ID_signedbv) | ||
return to_signedbv_type(type).get_width() <= 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 Magic number here should have some explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation of the function needs to be updated, and does it make sense to have signed characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romainbrenguier char
is allowed to be either signed
or unsigned
in C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the magic is not needed at all.
|
||
const auto &byte_extract_expr = to_byte_extract_expr(expr); | ||
const auto &offset = byte_extract_expr.offset(); | ||
PRECONDITION(offset.id() == ID_constant); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❔ how come this is a precondition? Seems like should be documented at least
const auto &byte_extract_expr = to_byte_extract_expr(expr); | ||
const auto &offset = byte_extract_expr.offset(); | ||
PRECONDITION(offset.id() == ID_constant); | ||
if(offset.id() != ID_constant) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Unreachable? You've just asserted this isn't true
@@ -192,3 +200,68 @@ array_exprt interval_sparse_arrayt::concretize( | |||
size, it == entries.end() ? default_value : it->second); | |||
return array; | |||
} | |||
|
|||
exprt maybe_byte_extract_expr(const exprt &expr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ What's "maybe" about this? Would returning an optional make sense here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it always returns some expr
. But it acts as identity for expressions that are not byte-extract
.
|
||
exprt maybe_remove_string_exprs(const exprt &expr) | ||
{ | ||
return [&]() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ What's the purpose of this lambda...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fairly sure there isn’t one. Probably happened by accident during a refactor.
assertion->source_location.set_property_class("string"); | ||
assertion->source_location.set_comment( | ||
"zero-termination of string argument of strchr"); | ||
auto new_aux_symbol = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Don't you have a function that does this?
@@ -41,16 +41,19 @@ class incorrect_source_program_exceptiont : public cprover_exception_baset | |||
void string_instrumentation( | |||
symbol_tablet &, | |||
message_handlert &, | |||
goto_programt &); | |||
goto_programt &, | |||
size_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ naming variables in the header (particularly ones where the type tells you nothing) helps users of the function to figure out what they should be passing in
@@ -7,6 +7,8 @@ Author: Romain Brenguier, romain.brenguier@diffblue.com | |||
\*******************************************************************/ | |||
|
|||
#include "array_pool.h" | |||
#include "string_refinement_util.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⛏️ etc.
in the commit message is less than ideal, perhaps it should be multiple commits, but should almost certainly list all the changes
if( | ||
rhs.id() == ID_symbol && | ||
has_prefix( | ||
id2string(to_symbol_expr(rhs).get_identifier()), "symex::args::")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hard coded string seems like a bad idea, is it defined anywhere?
@@ -632,6 +632,17 @@ optionalt<exprt> get_subexpression_at_offset( | |||
return typecast_exprt(expr, target_type_raw); | |||
} | |||
|
|||
if( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚫 Please add why you need to do this (particularly in light of the comment yesterday about perf cost of byte_extract
attempt_assign_length_from_type( | ||
to_array_string_expr(if_expr.false_case()), | ||
length_of_array, | ||
symbol_generator); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that doesn't seem correct, the length could be assigned twice with different value, while it only has to be equal to one of these depending on the condition
STRING_REFINEMENT_MAX_CHAR_WIDTH; | ||
|
||
if(type.id() == ID_signedbv) | ||
return to_signedbv_type(type).get_width() <= 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
documentation of the function needs to be updated, and does it make sense to have signed characters?
Codecov Report
@@ Coverage Diff @@
## develop #5200 +/- ##
===========================================
+ Coverage 67.43% 67.51% +0.07%
===========================================
Files 1157 1157
Lines 95354 95536 +182
===========================================
+ Hits 64306 64504 +198
+ Misses 31048 31032 -16
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not the expert on the string solver so perhaps it is not my place to say but this doesn't feel like a particularly good design for supporting a large number of string functions. My experience suggests that you want a minimal number of (ideally orthogonal) primitives that are supported by the solver and the rest of the complexity and modelling can be in the library models. That is a lot easier to edit, test (given an executable form of the primitives), understand and modify.
if(options.get_bool_option("string-abstraction")) | ||
string_instrumentation(goto_model, log.get_message_handler()); | ||
if( | ||
options.get_bool_option("string-abstraction") || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am getting mixed up but I thought the "string-abstraction" options did something very different? Maybe that was in goto-instrument.
Yes, I agree it would be better to avoid adding new primitives but rather reuse the existing ones and use models to deal with the differences between languages. |
When trying this branch out, I noticed the following example on this branch causes an invariant to be hit: #include <assert.h>
#include <string.h>
void testFunction() {
int size;
__CPROVER_assume(size >= 10 && size <= 100);
char *str = malloc(sizeof(char) * size);
for(int i = 0; i < size-1; ++i)
{
char nondet_char;
str[i] = nondet_char;
}
str[size - 1] = '\0';
char* result = strchr(str, 'z');
assert(result - str != 15);
} |
@martin-cs and @romainbrenguier there's a draft PR (#5239) which implements |
Reusing and extending the Java-based
indexOf
. The string refinement assumes that certain associations are made between the char-pointer input, (some) array string (, and it's length). There are also assumption about exactly what kinds of expressions can be passed to the string solver representing string. We added some special handling forbyte-extract
expressions andstring-constants
. These modification should not change the java part.strchr
choice is just for demonstration, the intent is to implement all/most of thestring.h
functions.