Instrument strchr to use the string refinement solver #5200

xbauch · 2019-12-05T15:11:50Z

Reusing and extending the Java-based indexOf. The string refinement assumes that certain associations are made between the char-pointer input, (some) array string (, and it's length). There are also assumption about exactly what kinds of expressions can be passed to the string solver representing string. We added some special handling for byte-extract expressions and string-constants. These modification should not change the java part.

strchr choice is just for demonstration, the intent is to implement all/most of the string.h functions.

Each commit message has a non-empty body, explaining why the change was made.
Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
My commit message includes data points confirming performance improvements (if claimed).
My PR is restricted to a single feature or bugfix.
White-space or formatting changes outside the feature-related changed lines are in commits of their own.

hannes-steffenhagen-diffblue · 2019-12-05T17:37:07Z

regression/cbmc/string-refinement-strchr2/test.c

+int main()
+{
+  int i;
+  char *test = i ? "fo\0tis" : "notfoti";


There should be a test for the case when a character can’t be found as well, e.g. in this case 't' may or may not be found depending on i.

thk123 · 2020-01-30T16:52:22Z

Note, this currently fails CI on Travis on the following test:

Running string-refinement-strchr2/test.desc  [FAILED]

Specifically fails to find:

^\[main.assertion.\d+\] line \d+ assertion first_o != NULL: SUCCESS$

Instead:

[main.assertion.1] line 10 assertion first_o != ((void *)0): SUCCESS

I guess different platform, different way of printing null?

For handling byte-extract expression, string-constants etc., in places that expect string arrays. Also slightly extend the definition of char-type.

into a new c-specific axiom-generating function. Basically, the addition encodes the existence of terminating-zero and the properties this entails.

by 1: associating the `src` with a new infinite array, 2: associating the array with its length variable, 3: applying the c_index_of function 4: translating the returned index to the expected char-pointer/NULL.

extracting string arrays from byte-expressions and string constants, hiding the `symex::args` renaming, etc.

using the string-refinement solver.

from offset 0, i.e. the whole array should be the result.

thk123

Neat! I have a number of suggestions of ways we can make the quite dense string solver code a bit more readable, happy to help with applying them.

I would like to see the coverage report as there ends up being lots of new code, and I don't really know when it would be triggered, so would be good to check we're testing it all. Hopefully will just appear, but maybe needs a rebase?

thk123 · 2020-01-30T16:54:47Z

src/solvers/strings/string_refinement_util.cpp

+           STRING_REFINEMENT_MAX_CHAR_WIDTH;
+
+  if(type.id() == ID_signedbv)
+    return to_signedbv_type(type).get_width() <= 8;


🚫 Magic number here should have some explanation

documentation of the function needs to be updated, and does it make sense to have signed characters?

@romainbrenguier char is allowed to be either signed or unsigned in C.

Turns out the magic is not needed at all.

thk123 · 2020-01-30T16:55:35Z

src/solvers/strings/string_refinement_util.cpp

+
+  const auto &byte_extract_expr = to_byte_extract_expr(expr);
+  const auto &offset = byte_extract_expr.offset();
+  PRECONDITION(offset.id() == ID_constant);


❔ how come this is a precondition? Seems like should be documented at least

thk123 · 2020-01-30T16:55:57Z

src/solvers/strings/string_refinement_util.cpp

+  const auto &byte_extract_expr = to_byte_extract_expr(expr);
+  const auto &offset = byte_extract_expr.offset();
+  PRECONDITION(offset.id() == ID_constant);
+  if(offset.id() != ID_constant)


❓ Unreachable? You've just asserted this isn't true

thk123 · 2020-01-30T16:56:48Z

src/solvers/strings/string_refinement_util.cpp

@@ -192,3 +200,68 @@ array_exprt interval_sparse_arrayt::concretize(
    size, it == entries.end() ? default_value : it->second);
  return array;
 }
+
+exprt maybe_byte_extract_expr(const exprt &expr)


❓ What's "maybe" about this? Would returning an optional make sense here?

Well it always returns some expr. But it acts as identity for expressions that are not byte-extract.

thk123 · 2020-01-30T16:57:25Z

src/solvers/strings/string_refinement_util.cpp

+
+exprt maybe_remove_string_exprs(const exprt &expr)
+{
+  return [&]() {


❓ What's the purpose of this lambda...

I am fairly sure there isn’t one. Probably happened by accident during a refactor.

thk123 · 2020-02-07T14:04:07Z

src/goto-programs/string_instrumentation.cpp

-  assertion->source_location.set_property_class("string");
-  assertion->source_location.set_comment(
-    "zero-termination of string argument of strchr");
+  auto new_aux_symbol =


❓ Don't you have a function that does this?

thk123 · 2020-02-07T14:04:52Z

src/goto-programs/string_instrumentation.h

@@ -41,16 +41,19 @@ class incorrect_source_program_exceptiont : public cprover_exception_baset
 void string_instrumentation(
  symbol_tablet &,
  message_handlert &,
-  goto_programt &);
+  goto_programt &,
+  size_t);


⛏️ naming variables in the header (particularly ones where the type tells you nothing) helps users of the function to figure out what they should be passing in

thk123 · 2020-02-07T14:05:41Z

src/solvers/strings/array_pool.cpp

@@ -7,6 +7,8 @@ Author: Romain Brenguier, romain.brenguier@diffblue.com
 \*******************************************************************/

 #include "array_pool.h"
+#include "string_refinement_util.h"


⛏️ etc. in the commit message is less than ideal, perhaps it should be multiple commits, but should almost certainly list all the changes

thk123 · 2020-02-07T14:07:15Z

src/solvers/strings/string_refinement.cpp

+    if(
+      rhs.id() == ID_symbol &&
+      has_prefix(
+        id2string(to_symbol_expr(rhs).get_identifier()), "symex::args::"))


hard coded string seems like a bad idea, is it defined anywhere?

thk123 · 2020-02-07T14:08:36Z

src/util/pointer_offset_size.cpp

@@ -632,6 +632,17 @@ optionalt<exprt> get_subexpression_at_offset(
    return typecast_exprt(expr, target_type_raw);
  }

+  if(


🚫 Please add why you need to do this (particularly in light of the comment yesterday about perf cost of byte_extract

romainbrenguier · 2020-02-07T14:16:17Z

src/solvers/strings/array_pool.cpp

+    attempt_assign_length_from_type(
+      to_array_string_expr(if_expr.false_case()),
+      length_of_array,
+      symbol_generator);


that doesn't seem correct, the length could be assigned twice with different value, while it only has to be equal to one of these depending on the condition

romainbrenguier · 2020-02-07T14:19:31Z

src/solvers/strings/string_refinement_util.cpp

+           STRING_REFINEMENT_MAX_CHAR_WIDTH;
+
+  if(type.id() == ID_signedbv)
+    return to_signedbv_type(type).get_width() <= 8;


documentation of the function needs to be updated, and does it make sense to have signed characters?

majakusber · 2020-02-07T15:32:16Z

Codecov Report

Merging #5200 into develop will increase coverage by 0.07%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop    #5200      +/-   ##
===========================================
+ Coverage    67.43%   67.51%   +0.07%     
===========================================
  Files         1157     1157              
  Lines        95354    95536     +182     
===========================================
+ Hits         64306    64504     +198     
+ Misses       31048    31032      -16

Flag	Coverage Δ
#cproversmt2	`42.55% <2.97%> (-0.13%)`	⬇️
#regression	`64.05% <87.12%> (+0.09%)`	⬆️
#unit	`31.90% <13.36%> (-0.05%)`	⬇️

Impacted Files	Coverage Δ
src/goto-programs/string_instrumentation.cpp	`28.49% <0.00%> (+28.49%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cccdfd9...0879ad0. Read the comment docs.

thk123 · 2020-02-07T15:38:51Z

Looks like two big areas not currently exercised:

maybe_byte_extract_expr never gets a non-zero offset (here)
Any of the new code for string_instrumentationt::do_strchr (here)

Not sure exactly what they'd require (possibly the 2nd one is just the code isn't needed?)

martin-cs

I'm not the expert on the string solver so perhaps it is not my place to say but this doesn't feel like a particularly good design for supporting a large number of string functions. My experience suggests that you want a minimal number of (ideally orthogonal) primitives that are supported by the solver and the rest of the complexity and modelling can be in the library models. That is a lot easier to edit, test (given an executable form of the primitives), understand and modify.

martin-cs · 2020-02-07T16:56:02Z

src/cbmc/cbmc_parse_options.cpp

-  if(options.get_bool_option("string-abstraction"))
-    string_instrumentation(goto_model, log.get_message_handler());
+  if(
+    options.get_bool_option("string-abstraction") ||


Maybe I am getting mixed up but I thought the "string-abstraction" options did something very different? Maybe that was in goto-instrument.

romainbrenguier · 2020-02-10T10:28:42Z

@martin-cs

I'm not the expert on the string solver so perhaps it is not my place to say but this doesn't feel like a particularly good design for supporting a large number of string functions. My experience suggests that you want a minimal number of (ideally orthogonal) primitives that are supported by the solver and the rest of the complexity and modelling can be in the library models. That is a lot easier to edit, test (given an executable form of the primitives), understand and modify.

Yes, I agree it would be better to avoid adding new primitives but rather reuse the existing ones and use models to deal with the differences between languages.

thk123 · 2020-02-11T15:32:55Z

When trying this branch out, I noticed the following example on this branch causes an invariant to be hit:

#include <assert.h>
#include <string.h>

void testFunction() {
  int size;
  __CPROVER_assume(size >= 10 && size <= 100);
  char *str = malloc(sizeof(char) * size);
  for(int i = 0; i < size-1; ++i)
  {
    char nondet_char;
    str[i] = nondet_char;
  }
  str[size - 1] = '\0';

  char* result = strchr(str, 'z');
  assert(result - str != 15);
}

xbauch · 2020-02-21T17:17:58Z

@martin-cs and @romainbrenguier there's a draft PR (#5239) which implements strchr by modelling the functionality inside string.h library and then instrumenting it to call the index-of directly. Have a look if you think that approach would be preferable to separating Java and C implementation. Note that there are some incompatible assumptions about types of things (array-size-type, char-type, etc.) between Java and C, so some changes to the originally java-only primitives are necessary.

hannes-steffenhagen-diffblue reviewed Dec 5, 2019

View reviewed changes

hannes-steffenhagen-diffblue and others added 6 commits February 7, 2020 13:19

Add c-specific refinement utils

941eed4

For handling byte-extract expression, string-constants etc., in places that expect string arrays. Also slightly extend the definition of char-type.

Extend index-of for c

8d82639

into a new c-specific axiom-generating function. Basically, the addition encodes the existence of terminating-zero and the properties this entails.

Implement strchr in terms of index-of

3f76f13

by 1: associating the `src` with a new infinite array, 2: associating the array with its length variable, 3: applying the c_index_of function 4: translating the returned index to the expected char-pointer/NULL.

Handle the c-specific cases in string refinement

80c88cc

extracting string arrays from byte-expressions and string constants, hiding the `symex::args` renaming, etc.

Add tests for strchr

ddf72d1

using the string-refinement solver.

Add case for byte-extracting arrays

0879ad0

from offset 0, i.e. the whole array should be the result.

xbauch force-pushed the feature/c-string-solver branch from d6f02c0 to 0879ad0 Compare February 7, 2020 13:24

xbauch marked this pull request as ready for review February 7, 2020 13:25

xbauch requested review from kroening, martin-cs, peterschrammel, pkesseli, romainbrenguier, smowton and tautschnig as code owners February 7, 2020 13:25

thk123 suggested changes Feb 7, 2020

View reviewed changes

romainbrenguier reviewed Feb 7, 2020

View reviewed changes

thk123 mentioned this pull request Feb 7, 2020

Unit tests for simplifying boolean and cast expressions #5229

Merged

5 tasks

martin-cs reviewed Feb 7, 2020

View reviewed changes

xbauch mentioned this pull request Feb 21, 2020

Model strchr inside string.h #5239

Draft

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Instrument strchr to use the string refinement solver #5200

Instrument strchr to use the string refinement solver #5200

xbauch commented Dec 5, 2019

hannes-steffenhagen-diffblue Dec 5, 2019

thk123 commented Jan 30, 2020

thk123 left a comment

thk123 Jan 30, 2020

romainbrenguier Feb 7, 2020

hannes-steffenhagen-diffblue Feb 7, 2020

xbauch Feb 7, 2020

thk123 Jan 30, 2020

thk123 Jan 30, 2020

thk123 Jan 30, 2020

xbauch Feb 7, 2020

thk123 Jan 30, 2020

hannes-steffenhagen-diffblue Feb 7, 2020

thk123 Feb 7, 2020

thk123 Feb 7, 2020

thk123 Feb 7, 2020

thk123 Feb 7, 2020

thk123 Feb 7, 2020

romainbrenguier Feb 7, 2020

romainbrenguier Feb 7, 2020

majakusber commented Feb 7, 2020 •

edited

Loading

thk123 commented Feb 7, 2020

martin-cs left a comment

martin-cs Feb 7, 2020

romainbrenguier commented Feb 10, 2020

thk123 commented Feb 11, 2020

xbauch commented Feb 21, 2020

Instrument strchr to use the string refinement solver #5200

Are you sure you want to change the base?

Instrument strchr to use the string refinement solver #5200

Conversation

xbauch commented Dec 5, 2019

Choose a reason for hiding this comment

thk123 commented Jan 30, 2020

thk123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

majakusber commented Feb 7, 2020 • edited Loading

Codecov Report

thk123 commented Feb 7, 2020

martin-cs left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

romainbrenguier commented Feb 10, 2020

thk123 commented Feb 11, 2020

xbauch commented Feb 21, 2020

majakusber commented Feb 7, 2020 •

edited

Loading