Skip to content

Rework memory allocation failure checks and options #5401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

danpoe
Copy link
Contributor

@danpoe danpoe commented Jun 30, 2020

  • Each commit message has a non-empty body, explaining why the change was made.
  • Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • n/a My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

Copy link
Contributor

@NlightNFotis NlightNFotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far, only one question

@@ -58,7 +58,7 @@ void goto_check(
" --no-built-in-assertions ignore assertions in built-in library\n" \
" --enum-range-check checks that all enum type expressions have values in the enum range\n" /* NOLINT(whitespace/line_length) */ \
" --pointer-primitive-check checks that all pointers in pointer primitives are valid or null\n" /* NOLINT(whitespace/line_length) */ \
" --allocate-size-check checks that not more memory is allocated than can be addressed by cbmc" /* NOLINT(whitespace/line_length) */
<< help_entry("--allocate-size-check", "checks that not more memory is allocated than can be addressed by cbmc") /* NOLINT(whitespace/line_length) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the help_entry function. Where was it introduced? Is there a CBMC equivalent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's defined in parse_options.cpp. I've introduced it a few weeks ago. It's supposed to make writing the help entries easier, by automatically wrapping the description at 80 characters, and indenting it by 30 characters.

Copy link
Collaborator

@martin-cs martin-cs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this works and is a good implementation of the idea but I don't see why it is needed. To my mind, having the modelling of malloc, including the checks in the library makes way more sense. Why is this PR an improvement?

@danpoe
Copy link
Contributor Author

danpoe commented Jun 30, 2020

@martin-cs: The PR essentially does two things:

  1. It fixes the issue raised by @hannes-steffenhagen-diffblue's on the previous PR (see Add malloc-may-fail option to goto-check #5212)
  2. The checking of the allocated size (flag --allocate-size-check) is now done for __CPROVER_allocate(), which is used by the models. This means the models are simpler, the check is only present when the flag is given, and it will also work for plain __CPROVER_allocate() statements. That this is implemented in goto_checkt is consistent with how other primitives like __CPROVER_r_ok, etc. are handled.

Copy link
Contributor

@smowton smowton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should reword the PR, this isn't a refactor, which I would expect to change no behaviour. Approving to remove my codeowner block (I don't mind either way), but you should certainly win the argument with @martin-cs before merging :)

@@ -56,7 +57,8 @@ void goto_check(
" --nan-check check floating-point for NaN\n" \
" --no-built-in-assertions ignore assertions in built-in library\n" \
" --enum-range-check checks that all enum type expressions have values in the enum range\n" /* NOLINT(whitespace/line_length) */ \
" --pointer-primitive-check checks that all pointers in pointer primitives are valid or null\n" /* NOLINT(whitespace/line_length) */
" --pointer-primitive-check checks that all pointers in pointer primitives are valid or null\n" /* NOLINT(whitespace/line_length) */ \
" --allocate-size-check checks that not more memory is allocated than can be addressed by cbmc" /* NOLINT(whitespace/line_length) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checks that no individual allocation exceeds CBMC's addresss space limit? After all total alloc can still be very large

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// NOLINTNEXTLINE(whitespace/line_length)
" --malloc-may-fail allow malloc calls to return a null pointer\n"
<< help_entry(
"--allocate-size-null",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overly-large-allocation-returns-null? The other option would then be overly-large-allocation-asserts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@danpoe danpoe changed the title Refactor memory allocation failure checks and options Reword memory allocation failure checks and options Jul 1, 2020
@danpoe danpoe changed the title Reword memory allocation failure checks and options Rework memory allocation failure checks and options Jul 1, 2020
@danpoe danpoe added the aws Bugs or features of importance to AWS CBMC users label Jul 1, 2020
@martin-cs
Copy link
Collaborator

Is #5121 the right issue? I don't see the link here.

I think the model way of checking these is better because it is more flexible. It can be turned on and off during analysis, the limits can be varied, etc. But I appreciate that this is just my opinion. More importantly than where it is, having all of the modelling / instrumentation for memory allocation in one place is a good thing IMHO.

@danpoe
Copy link
Contributor Author

danpoe commented Jul 1, 2020

Ah, it's #5212. I think it depends on whether one views this as a modelling issue or not. The flag --allocate-size-check checks that an implementation limit of cbmc isn't hit. The most straightforward way of doing this is to instrument a precondition before every __CPROVER_allocate() invocation. On the other hand, whether malloc() can nondeterministically return null is more of a modelling question, and the code for this is still part of the model.

@martin-cs
Copy link
Collaborator

I appreciate the distinction and it is a useful thing to note.

#5210 looks more relevant to me.

Sorry to be awkward but this feels like something that needs a bit more discussion. If this is checking an internal limit of the 'base + offset' pointer model so that we don't generate spurious models then...

A. Why should it be an option? Why would you want it turned off?
B. Wouldn't 'assert then assume' be the right way to handle this because we can't reasonably proceed with an allocation that is too large.
C. Doesn't this also affect global (or even stack allocated!) objects above a certain size?

@danpoe
Copy link
Contributor Author

danpoe commented Jul 1, 2020

Yes, #5210 is the first of the two related PRs. The issue though that this one fixes was pointed out by @hannes-steffenhagen-diffblue on #5212.

A. If you know the limit will not be hit you don't need to turn on the option (or if you're using cbmc for bug finding). Currently cbmc adds no checks by default, and all of them need to be explicitely enabled (e.g., --pointer-check, etc.)
B. Not necessarily. It's done like that for all goto_checkt checks, e.g. array bounds checks, even though one could question how much sense it makes to continue to execute after an out of bounds access.
C. Yes, I think it also affects those.

There's also other cbmc primitives that have preconditions for which checks need to be explicitely enabled (see http://cprover.diffblue.com/memory-primitives.html).

@danpoe danpoe force-pushed the refactor/malloc branch from 252be8c to fa4b305 Compare July 2, 2020 13:12
danpoe added 5 commits July 2, 2020 14:13
The option adds an assertion to check that not more memory is allocated
with __CPROVER_allocate() than cbmc can address. Since the primitive is
used by the models of malloc(), etc., this also guarantees that those
function don't allocate more memory than can be addressed.
Adapt the help messages of all the tools to work with the macro
HELP_GOTO_CHECK which is now using help_entry() for the entry for
--overly-large-allocation-check
Adapt the malloc failure regression tests to use the new options
--overly-large-allocation-check, --overly-large-allocation-returns-null,
and --allocate-may-fail
Due to the changes to the malloc models, less assertions are now
generated by default. Thus, adapt the regexes `<n> of <m> failed` in the
test.desc files
@danpoe danpoe force-pushed the refactor/malloc branch from fa4b305 to 02172cc Compare July 2, 2020 13:15
Copy link
Contributor

@chrisr-diffblue chrisr-diffblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good to me, but I have two requests before I'm happy to approve :-)

  1. Get an 'OK' from @martin-cs
  2. Update the user docs - I think some of the removed command line options were possibly documented, if so, they'll need removing from the docs too, plus add user docs for the new options too.

@danpoe
Copy link
Contributor Author

danpoe commented Jul 3, 2020

@martin-cs Would you be ok with this going in for now? I agree it's not a perfect or complete solution, it just would help us as it fixes an issue for us.

@martin-cs
Copy link
Collaborator

@danpoe Sorry for the lag. I think it is possible to sort this out but let's just check we are talking about the same thing. This program shows the problem:

#define BIG_NUMBER   // something bigger than 1 << (sizeof(void*)*CHAR_BITS - (--object-bits n))

char big_array[BIG_NUMBER];
char other_array[256];

int main (int argc, char **argv) {
  char *b = big_array;
  char *c = other_array;

  *(b + (BIG_NUMBER - 1) = '\0';  // Should modify big_array not other_array

  char *p = malloc(BIG_NUMBER);
  char *q = malloc(256);

  *(p + (BIG_NUMBER - 1)) = '\0'; // Should modify the first allocated block, not the second one

  return 0;
}

So the root cause of the problem is the split between object and offset in the representation of pointers in the default back-end? If the offset is too high then it can overflow into the object, resulting in incorrect answers.

The patch you are proposing will add a command-line option to CBMC which will insert assertions everywhere __CPROVER_alloc is used. When the flag is used it will indicate when the program is able to dynamically allocate a block of memory that /might/ have this problem.

Have I understood the problem?

  1. Ideally we need to solve the actual root of the problem which is the overflow (or underflow) from offset to object. The back-end effectively already tracks when a pointer is becoming invalid as it is one of the overflow bits in the adder when modifying the offset. It would take a little thought and care to expose this to the front-end / instrumentation and you would need to decide if the problem was A. creating an invalid pointer or using an invalid pointer, B. whether making the pointer valid again is possible.

  2. A quicker fix would be to prevent objects that are too large being created assuming that bounds checks are used. This doesn't actually solve the problem as offset overflow can modify a pointer so that it is still in-bounds for another object. Regardless, this should be possible by checking global and local variables and putting assert/assume before calls to __CPROVER_alloc. As it is an internal function and only used in a few place I think this makes more sense to do in the modelled C library. As long as --object-bits is converted into a global variable this should be a trivial patch.

  3. I think there is an important question about what the default should be. I can't see a reason not to have this on by default. Ordered compare with constants, particularly powers of 2, are about as cheap as you can get. If it is on by default then in the model makes more sense to me.

HTH

@danpoe
Copy link
Contributor Author

danpoe commented Jul 16, 2020

Yes, I agree with your analysis, the allocation of a block for which not all parts can be addressed (in the realm of cbmc) is not in itself a problem. So yes the best fix would probably be to catch the pointer overflows instead.

It turns out that --pointer-overflow-check only checks for overflows of the complete bitvector, as opposed to overflows of the offset part only. The offset part does not overflow into the the object bits part though but wraps around.

Btw, if the result of pointer arithmetic points outside the pointed-to memory block, the behavior is undefined (except pointing to one past the last element which is allowed). See, e.g., here: https://blog.regehr.org/archives/1395.

So a possible solution would be:

  • add an always on check which verifies that the offset part of a pointer does not overflow

  • modify --pointer-overflow-check such that it verifies that the result of pointer arithmetic does not point outside the pointed-to object

@martin-cs
Copy link
Collaborator

I like the proposed course of action. Thanks for giving this the extra consideration. Limitations in how we handle pointers should not conceal bugs, so I like the always on check for offset overflow. I can't see what the use of checking the overflow of a whole pointer is, so modifying --pointer-overflow-check sounds good to me. The proposed semantics seem good to me because it catches an extra kind of UB. I think having it as an option still makes sense because I can see cases (sort of like 1.B. above) where people assume that any kind of pointer arithmetic is fine as long as they only access in the object.

@martin-cs
Copy link
Collaborator

@danpoe Might your suggestion also address #5096 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants