Generate Kani Metrics (#235)

carolynzech · web-flow · commit b4fa4fd78ff6 · 2025-01-14T16:10:01.000Z
Add a new "Kani Metrics" workflow (that runs every week) that calls `./scripts/run-kani --run metrics`, then creates a pull request to this repository with the computed metrics. See [here](https://github.com/carolynzech/verify-rust-std/pull/38) for an example of what the pull request will look like. Callouts: - This is a separate workflow from the Kani workflow because it is a cronjob instead of running on every pull request. I thought the plots would be too noisy if we ran on every PR. - See the "Notes" section of `kani_std_analysis.py` for other callouts. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
diff --git a/.github/workflows/kani-metrics.yml b/.github/workflows/kani-metrics.yml
@@ -0,0 +1,42 @@
+name: Kani Metrics Update
+
+on:
+  schedule:
+    - cron: '0 0 * * 0'  # Run at 00:00 UTC every Sunday
+  workflow_dispatch:
+
+defaults:
+  run:
+    shell: bash
+
+jobs:
+  update-kani-metrics:
+    runs-on: ubuntu-latest
+    
+    steps:
+    - name: Checkout Repository
+      uses: actions/checkout@v4
+      with:
+        submodules: true
+    
+    # The Kani metrics collection uses a Python script (kani_std_analysis.py), so make sure Python is installed
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.x'
+
+    - name: Compute Kani Metrics
+      run: ./scripts/run-kani.sh --run metrics --path ${{github.workspace}}
+
+    - name: Create Pull Request
+      uses: peter-evans/create-pull-request@v7
+      with:
+        commit-message: Update Kani metrics
+        title: 'Update Kani Metrics'
+        body: |
+          This is an automated PR to update Kani metrics.
+          
+          The metrics have been updated by running `./scripts/run-kani.sh --run metrics`.
+        branch: update-kani-metrics
+        delete-branch: true
+        base: main
diff --git a/.gitignore b/.gitignore
@@ -46,7 +46,6 @@ package-lock.json
 ## Kani
 *.out
 
-
 # Added by cargo
 #
 # already existing elements were commented out
diff --git a/scripts/kani-std-analysis/kani_std_analysis.py b/scripts/kani-std-analysis/kani_std_analysis.py
@@ -0,0 +1,307 @@
+#!/usr/bin/env python3
+
+import argparse
+from collections import defaultdict
+import csv
+import json
+import sys
+from datetime import datetime
+import matplotlib.pyplot as plt
+import matplotlib.dates as mdates
+from matplotlib.ticker import FixedLocator
+
+# Output metrics about Kani's application to the standard library by:
+#   1. Postprocessing results from running `kani list`, which collects data about Kani harnesses and contracts.
+#   2. Postprocessing results from std-analysis.sh, which outputs metrics about the standard library unrelated to Kani (e.g., a list of the functions in each crate).
+#   3. Comparing the output of steps #1 and #2 to compare the Kani-verified portion of the standard library to the overall library,
+#      e.g., what fraction of unsafe functions have Kani contracts.
+
+# Notes:
+# - This script assumes that std-analysis.sh and `kani list` have already run, since we expect to invoke this script from `run-kani.sh`.
+# - The results are architecture-dependent: the standard library has functions that are only present for certain architectures, 
+#   and https://github.com/model-checking/verify-rust-std/pull/122 has Kani harnesses that only run on x86-64.
+# - The total functions under contract are not necessarily equal to the sum of unsafe and safe functions under contract.
+#   We've added a few functions (three, as of writing) with contracts to this fork, e.g. ffi::c_str::is_null_terminated.
+#   Since std-analysis.sh runs on the standard library given a toolchain (not this fork), it doesn't know about this function and therefore can't classify it as safe or unsafe.
+#   But `kani list` runs on this fork, so it can still see it and add it to the total functions under contract.
+# - See #TODOs for known limitations.
+
+# Process the results from Kani's std-analysis.sh script for each crate.
+# TODO For now, we just handle "core", but we should process all crates in the library.
+class GenericSTDMetrics():
+    def __init__(self, results_dir):
+        self.results_directory = results_dir
+        self.unsafe_fns_count = 0
+        self.safe_abstractions_count = 0
+        self.safe_fns_count = 0
+        self.unsafe_fns = []
+        self.safe_abstractions = []
+        self.safe_fns = [] 
+
+        self.read_std_analysis()
+        
+    # Read {crate}_overall_counts.csv
+    # and return the number of unsafe functions and safe abstractions
+    def read_overall_counts(self):
+        file_path = f"{self.results_directory}/core_scan_overall.csv"
+        with open(file_path, 'r') as f:
+            csv_reader = csv.reader(f, delimiter=';')
+            counts = {row[0]: int(row[1]) for row in csv_reader if len(row) >= 2}
+        self.unsafe_fns_count = counts.get('unsafe_fns', 0)
+        self.safe_abstractions_count = counts.get('safe_abstractions', 0)
+        self.safe_fns_count = counts.get('safe_fns', 0)
+
+    # Read {crate}_scan_functions.csv
+    # and return an array of the unsafe functions and the safe abstractions
+    def read_scan_functions(self):
+        expected_header_start = "name;is_unsafe;has_unsafe_ops"
+        file_path = f"{self.results_directory}/core_scan_functions.csv"
+
+        with open(file_path, 'r') as f:
+            csv_reader = csv.reader(f, delimiter=';', quotechar='"')
+            
+            # The row parsing logic below assumes the column structure in expected_header_start,
+            # so assert that is how the header begins before continuing
+            header = next(csv_reader)
+            header_str = ';'.join(header[:3])
+            if not header_str.startswith(expected_header_start):
+                print(f"Error: Unexpected CSV header in {file_path}")
+                print(f"Expected header to start with: {expected_header_start}")
+                print(f"Actual header: {header_str}")
+                sys.exit(1)
+
+            for row in csv_reader:
+                if len(row) >= 3:
+                    name, is_unsafe, has_unsafe_ops = row[0], row[1], row[2]
+                    # An unsafe function is a function for which is_unsafe=true
+                    if is_unsafe.strip() == "true":
+                        self.unsafe_fns.append(name)
+                    else:
+                        assert is_unsafe.strip() == "false" # sanity check against malformed data
+                        self.safe_fns.append(name)
+                        # A safe abstraction is a safe function with unsafe ops
+                        if has_unsafe_ops.strip() == "true":
+                            self.safe_abstractions.append(name)
+
+    def read_std_analysis(self):
+        self.read_overall_counts()
+        self.read_scan_functions()
+
+        # Sanity checks
+        if len(self.unsafe_fns) != self.unsafe_fns_count:
+            print(f"Number of unsafe functions does not match core_scan_functions.csv")
+            print(f"UNSAFE_FNS_COUNT: {self.unsafe_fns_count}")
+            print(f"UNSAFE_FNS length: {len(self.unsafe_fns)}")
+            sys.exit(1)
+
+        if len(self.safe_abstractions) != self.safe_abstractions_count:
+            print(f"Number of safe abstractions does not match core_scan_functions.csv")
+            print(f"SAFE_ABSTRACTIONS_COUNT: {self.safe_abstractions_count}")
+            print(f"SAFE_ABSTRACTIONS length: {len(self.safe_abstractions)}")
+            sys.exit(1)
+        
+        if len(self.safe_fns) != self.safe_fns_count:
+            print(f"Number of safe functions does not match core_scan_functions.csv")
+            print(f"SAFE_FNS_COUNT: {self.safe_fns_count}")
+            print(f"SAFE_FNS length: {len(self.safe_fns)}")
+            sys.exit(1)
+
+# Process the results of running `kani list` against the standard library,
+# i.e., the Kani STD metrics for a single date (whichever day this script is running).
+class KaniListSTDMetrics():
+    def __init__(self, kani_list_filepath):
+        self.total_fns_under_contract = 0
+        # List of (fn, has_harnesses) tuples, where fn is a function under contract and has_harnesses=true iff the contract has at least one harness
+        self.fns_under_contract = []
+        self.expected_kani_list_version = "0.1"
+
+        self.read_kani_list_data(kani_list_filepath)
+
+    def read_kani_list_data(self, kani_list_filepath):
+        try:
+            with open(kani_list_filepath, 'r') as f:
+                kani_list_data = json.load(f)
+        except FileNotFoundError:
+            print(f"Kani list JSON file not found.")
+            sys.exit(1)
+        
+        if kani_list_data.get("file-version") != self.expected_kani_list_version:
+            print(f"Kani list JSON file version does not match expected version {self.expected_kani_list_version}")
+            sys.exit(1)
+
+        for contract in kani_list_data.get('contracts', []):
+            func_under_contract = contract.get('function', '')
+            has_harnesses = len(contract.get('harnesses', [])) > 0
+            self.fns_under_contract.append((func_under_contract, has_harnesses))
+
+        self.total_fns_under_contract = kani_list_data.get('totals', {}).get('functions-under-contract', 0)
+
+# Generate metrics about Kani's application to the standard library over time
+# by reading past metrics from metrics_file, then computing today's metrics.
+class KaniSTDMetricsOverTime():
+    def __init__(self, metrics_file):
+        self.dates = []
+        self.unsafe_metrics = ['total_unsafe_fns', 'unsafe_fns_under_contract', 'verified_unsafe_fns_under_contract']
+        self.safe_abstr_metrics = ['total_safe_abstractions', 'safe_abstractions_under_contract', 'verified_safe_abstractions_under_contract']
+        self.safe_metrics = ['total_safe_fns', 'safe_fns_under_contract', 'verified_safe_fns_under_contract']
+        # The keys in these dictionaries are unsafe_metrics, safe_abstr_metrics, and safe_metrics, respectively; see update_plot_metrics()
+        self.unsafe_plot_data = defaultdict(list)
+        self.safe_abstr_plot_data = defaultdict(list)
+        self.safe_plot_data = defaultdict(list)
+
+        self.date = datetime.today().date()
+        self.metrics_file = metrics_file
+
+        self.read_historical_data()
+
+    # Read historical data from self.metrics_file and initialize the date range.    
+    def read_historical_data(self):
+        try:
+            with open(self.metrics_file, 'r') as f:
+                all_data = json.load(f)["results"]
+                self.update_plot_metrics(all_data)
+        except FileNotFoundError:
+            all_data = {}
+        
+        self.dates = [datetime.strptime(data["date"], '%Y-%m-%d').date() for data in all_data]
+        self.dates.append(self.date)
+        
+        print(f"Dates are {self.dates}\n")
+    
+    # TODO For now, we don't plot how many of the contracts have been verified, since the line overlaps with how many are under contract.
+    # If we want to plot this data, we should probably generate separate plots.
+    # Update the plot data with the provided data.
+    def update_plot_metrics(self, all_data):
+        for data in all_data:
+            for metric in self.unsafe_metrics:
+                if not metric.startswith("verified"):
+                    self.unsafe_plot_data[metric].append(data[metric])
+        
+            for metric in self.safe_abstr_metrics:
+                if not metric.startswith("verified"):
+                    self.safe_abstr_plot_data[metric].append(data[metric])
+            
+            for metric in self.safe_metrics:
+                if not metric.startswith("verified"):
+                    self.safe_plot_data[metric].append(data[metric])
+
+    # Read output from kani list and std-analysis.sh, then compare their outputs to compute Kani-specific metrics
+    # and write the results to {self.metrics_file}
+    def compute_metrics(self, kani_list_filepath, analysis_results_dir):
+        # Process the `kani list` and `std-analysis.sh` data
+        kani_data = KaniListSTDMetrics(kani_list_filepath)
+        generic_metrics = GenericSTDMetrics(analysis_results_dir)
+
+        print("Comparing kani-list output to std-analysis.sh output and computing metrics...")
+
+        (unsafe_fns_under_contract, verified_unsafe_fns_under_contract) = (0, 0)
+        (safe_abstractions_under_contract, verified_safe_abstractions_under_contract) = (0, 0)
+        (safe_fns_under_contract, verified_safe_fns_under_contract) = (0, 0)
+
+        for (func_under_contract, has_harnesses) in kani_data.fns_under_contract:
+            if func_under_contract in generic_metrics.unsafe_fns:
+                unsafe_fns_under_contract += 1
+                if has_harnesses:
+                    verified_unsafe_fns_under_contract += 1
+            if func_under_contract in generic_metrics.safe_abstractions:
+                safe_abstractions_under_contract += 1
+                if has_harnesses:
+                    verified_safe_abstractions_under_contract += 1
+            if func_under_contract in generic_metrics.safe_fns:
+                safe_fns_under_contract += 1
+                if has_harnesses:
+                    verified_safe_fns_under_contract += 1
+
+        # Keep the keys here in sync with unsafe_metrics, safe_metrics, and safe_abstr_metrics
+        data = {
+            "date": self.date,
+            "total_unsafe_fns": generic_metrics.unsafe_fns_count,
+            "total_safe_abstractions": generic_metrics.safe_abstractions_count,
+            "total_safe_fns": generic_metrics.safe_fns_count,
+            "unsafe_fns_under_contract": unsafe_fns_under_contract,
+            "verified_unsafe_fns_under_contract": verified_unsafe_fns_under_contract,
+            "safe_abstractions_under_contract": safe_abstractions_under_contract,
+            "verified_safe_abstractions_under_contract": verified_safe_abstractions_under_contract,
+            "safe_fns_under_contract": safe_fns_under_contract,
+            "verified_safe_fns_under_contract": verified_safe_fns_under_contract,
+            "total_functions_under_contract": kani_data.total_fns_under_contract,
+        }
+
+        self.update_plot_metrics([data])
+
+        # Update self.metrics_file so that these results are included in our historical data for next time
+        with open(self.metrics_file, 'r') as f:
+            content = json.load(f)
+            content["results"].append(data)
+        
+        with open(self.metrics_file, 'w') as f:
+            json.dump(content, f, indent=2, default=str)
+        
+        print(f"Wrote metrics data for date {self.date} to {self.metrics_file}")    
+
+    # Make a single plot with specified data, title, and filename
+    def plot_single(self, data, title, outfile):
+        plt.figure(figsize=(14, 8))
+
+        colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#946F7bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
+        
+        for i, (metric, values) in enumerate(data.items()):
+            color = colors[i % len(colors)]
+            plt.plot(self.dates, values, 'o-', color=color, label=metric, markersize=6)
+            
+            # Mark each point on the line with the y value
+            for x, y in zip(self.dates, values):
+                plt.annotate(str(y), 
+                            (mdates.date2num(x), y), 
+                            xytext=(0, 5), 
+                            textcoords='offset points',
+                            ha='center', 
+                            va='bottom',
+                            color=color,
+                            fontweight='bold')
+
+        plt.title(title)
+        plt.xlabel("Date")
+        plt.ylabel("Count")
+        
+        # Set x-axis to only show ticks for the dates we have
+        plt.gca().xaxis.set_major_locator(FixedLocator(mdates.date2num(self.dates)))
+        plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
+        
+        plt.gcf().autofmt_xdate()
+        plt.tight_layout()
+        plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
+        
+        plt.savefig(outfile, bbox_inches='tight', dpi=300)
+        plt.close()
+
+        print(f"PNG graph generated: {outfile}")
+
+    def plot(self):
+        self.plot_single(self.unsafe_plot_data, title="Contracts on Unsafe Functions in core", outfile="core_unsafe_metrics.png")
+        self.plot_single(self.safe_abstr_plot_data, title="Contracts on Safe Abstractions in core", outfile="core_safe_abstractions_metrics.png")
+        self.plot_single(self.safe_plot_data, title="Contracts on Safe Functions in core", outfile="core_safe_metrics.png")
+
+def main():
+    parser = argparse.ArgumentParser(description="Generate metrics about Kani's application to the standard library.")
+    parser.add_argument('--metrics-file', 
+                    type=str, 
+                    default="metrics-data.json", 
+                    help="Path to the JSON file containing metrics data (default: metrics-data.json)")
+    parser.add_argument('--kani-list-file', 
+                    type=str, 
+                    default="kani-list.json", 
+                    help="Path to the JSON file containing the Kani list data (default: kani-list.json)")
+    # The default is /tmp/std_lib_analysis/results because, as of writing, that's always where std-analysis.sh outputs its results.
+    parser.add_argument('--analysis-results-dir', 
+                    type=str, 
+                    default="/tmp/std_lib_analysis/results", 
+                    help="Path to the folder file containing the std-analysis.sh results (default: /tmp/std_lib_analysis/results)")
+    args = parser.parse_args()
+
+    metrics = KaniSTDMetricsOverTime(args.metrics_file)
+    metrics.compute_metrics(args.kani_list_file, args.analysis_results_dir)
+    metrics.plot()
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/kani-std-analysis/metrics-data.json b/scripts/kani-std-analysis/metrics-data.json
@@ -0,0 +1,17 @@
+{
+  "results": [
+    {
+      "date": "2025-01-01",
+      "total_unsafe_fns": 6987,
+      "total_safe_abstractions": 1704,
+      "total_safe_fns": 14666,
+      "unsafe_fns_under_contract": 144,
+      "verified_unsafe_fns_under_contract": 132,
+      "safe_abstractions_under_contract": 41,
+      "verified_safe_abstractions_under_contract": 41,
+      "safe_fns_under_contract": 77,
+      "verified_safe_fns_under_contract": 77,
+      "total_functions_under_contract": 224
+    }
+  ]
+}
diff --git a/scripts/kani-std-analysis/requirements.txt b/scripts/kani-std-analysis/requirements.txt
@@ -0,0 +1 @@
+matplotlib
diff --git a/scripts/run-kani.sh b/scripts/run-kani.sh

Original file line number	Diff line number	Diff line change
`@@ -46,7 +46,6 @@ package-lock.json`
`46`	`46`	`## Kani`
`47`	`47`	`*.out`
`48`	`48`
`49`		`-`
`50`	`49`	`# Added by cargo`
`51`	`50`	`#`
`52`	`51`	`# already existing elements were commented out`