Cache z3 results #464

iirekm · 2025-02-20T14:02:40Z

z3 can be slow, very slow when codebase grows. On-disk caching would help a lot during development (only new things would be computed).
I tried to do ad-hoc caching with the following wrapper around z3:

#!/bin/bash -e

# Cache directory
CACHE_DIR="$HOME/.z3cache"
mkdir -p "$CACHE_DIR"

# Check if SMT file is provided
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <smt-file>" >&2
    exit 1
fi

SMT_FILE="$1"

# Ensure the file exists
if [ ! -f "$SMT_FILE" ]; then
    echo "Error: File '$SMT_FILE' not found!"  >&2
    exit 1
fi

# Generate a hash of the SMT file for caching
FILE_HASH=$(sha256sum "$SMT_FILE" | awk '{print $1}')
CACHE_FILE="$CACHE_DIR/$FILE_HASH.result"

# Check if result is already cached
if [ -f "$CACHE_FILE" ]; then
    echo "[CACHE HIT] Returning cached result."  >&2
    cat "$CACHE_FILE"
    exit 0
fi

cat "$SMT_FILE" >"$CACHE_DIR/$FILE_HASH.smt"

# Run Z3 and cache the result
echo "[CACHE MISS] Running Z3..." >&2
"$0".orig "$SMT_FILE" | tee "$CACHE_FILE"

It almost works, but the problem is that the variable identifiers change from run to run (e.g. it's p_x_uint256_9fdce11_00 one run, and p_x_uint256_7721c25_00 another). How to make those identifiers deterministic? From what piece of code do they come from?

In long term a better solution could be made, e.g. with popular joblib.Memory library.

The text was updated successfully, but these errors were encountered:

iirekm · 2025-02-20T19:19:47Z

I made an experimental pull request for that. It needs lots of improvements, but in general - it works.

0xkarmacoma · 2025-02-21T18:27:17Z

I see, the idea is to build a global cache that works across runs. That's a cool idea! diskcache in particular seems really well suited for that (and I see it has a method to ensure the cache doesn't grow too big)

the random-look parts in identifiers comes from the uuid library, they change across runs because it is based on the current time. We may be able to seed it or switch to a deterministic id generation mechanism, wdyt @daejunpark ?

iirekm · 2025-02-22T11:32:23Z

The random part also comes from .to_smt2() method (replacing with .sexpr() seems to fix it), and also from somewhere else in the code (I couldn't figure out where, but Z3 queries generated from run to run do differ sometimes). The final result is that the current code in PR does improve speed in some examples (e.g. ArithTest), but even worsens in other (due to unresolved all sources of randomness, and probably the fact that Z3 is called even thousands of times per run so the diskcache library becomes performance bottleneck itself).

iirekm added the enhancement New feature or request label Feb 20, 2025

iirekm mentioned this issue Feb 20, 2025

[WIP] Caching #464 #465

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache z3 results #464

Cache z3 results #464

iirekm commented Feb 20, 2025

iirekm commented Feb 20, 2025

0xkarmacoma commented Feb 21, 2025

iirekm commented Feb 22, 2025

Cache z3 results #464

Cache z3 results #464

Comments

iirekm commented Feb 20, 2025

iirekm commented Feb 20, 2025

0xkarmacoma commented Feb 21, 2025

iirekm commented Feb 22, 2025