Commit f19fc80

Update Leaderboard
1 parent facd641 commit f19fc80

3 files changed: +316 −79 lines

README.md

+40 −67
@@ -15,6 +15,7 @@
 - 📊 Automatically produces **meaningful metrics** for in-depth assessment and comparison.
 
 ### [New] TypeEvalPy Autogen
+
 - 🤖 **Autogenerates code snippets** and ground truth to scale the benchmark based on the original `TypeEvalPy` benchmark.
 - 📈 The autogen benchmark now contains:
   - **Python files**: 7121
@@ -30,72 +31,44 @@
 | [HiTyper](https://github.com/JohnnyPeng18/HiTyper) | [Pytype](https://github.com/google/pytype) |
 | [Scalpel](https://github.com/SMAT-Lab/Scalpel/issues) | [TypeT5](https://github.com/utopia-group/TypeT5) |
 | [Type4Py](https://github.com/saltudelft/type4py) | |
-| [GPT-4](https://openai.com/research/gpt-4) | |
+| [GPT](https://openai.com) | |
 | [Ollama](https://ollama.ai) | |
 
 ---
 
----
-
 ## 🏆 TypeEvalPy Leaderboard
 
-Below is a comparison showcasing exact matches across different tools, coupled with `top_n` predictions for ML-based tools.
-
-| Rank | 🛠️ Tool | Top-n | Function Return Type | Function Parameter Type | Local Variable Type | Total |
-| ---- | ------- | ----- | -------------------- | ----------------------- | ------------------- | ----- |
-| 1 | **[HeaderGen](https://github.com/secure-software-engineering/HeaderGen)** | 1 | 186 | 56 | 322 | 564 |
-| 2 | **[Jedi](https://github.com/davidhalter/jedi)** | 1 | 122 | 0 | 293 | 415 |
-| 3 | **[Pyright](https://github.com/microsoft/pyright)** | 1 | 100 | 8 | 297 | 405 |
-| 4 | **[HiTyper](https://github.com/JohnnyPeng18/HiTyper)** | 1<br>3<br>5 | 163<br>173<br>175 | 27<br>37<br>37 | 179<br>225<br>229 | 369<br>435<br>441 |
-| 5 | **[HiTyper (static)](https://github.com/JohnnyPeng18/HiTyper)** | 1 | 141 | 7 | 102 | 250 |
-| 6 | **[Scalpel](https://github.com/SMAT-Lab/Scalpel/issues)** | 1 | 155 | 32 | 6 | 193 |
-| 7 | **[Type4Py](https://github.com/saltudelft/type4py)** | 1<br>3<br>5 | 39<br>103<br>109 | 19<br>31<br>31 | 99<br>167<br>174 | 157<br>301<br>314 |
-
-_<sub>(Auto-generated based on the the analysis run on 20 Oct 2023)</sub>_
-
----
-
-## 🏆🤖 TypeEvalPy LLM Leaderboard
-
-Below is a comparison showcasing exact matches for LLMs.
-
-| Rank | 🛠️ Tool | Function Return Type | Function Parameter Type | Local Variable Type | Total |
-| ---- | ------- | -------------------- | ----------------------- | ------------------- | ----- |
-| 1 | **[GPT-4](https://openai.com/research/gpt-4)** | 225 | 85 | 465 | 775 |
-| 2 | **[Finetuned:GPT 3.5](https://platform.openai.com/docs/models/gpt-3-5-turbo)** | 209 | 85 | 436 | 730 |
-| 3 | **[codellama:13b-instruct](https://huggingface.co/docs/transformers/model_doc/code_llama)** | 199 | 75 | 425 | 699 |
-| 4 | **[GPT 3.5 Turbo](https://platform.openai.com/docs/models/gpt-3-5-turbo)** | 188 | 73 | 429 | 690 |
-| 5 | **[codellama:34b-instruct](https://huggingface.co/docs/transformers/model_doc/code_llama)** | 190 | 52 | 425 | 667 |
-| 6 | phind-codellama:34b-v2 | 182 | 60 | 399 | 641 |
-| 7 | codellama:7b-instruct | 171 | 72 | 384 | 627 |
-| 8 | dolphin-mistral | 184 | 76 | 356 | 616 |
-| 9 | codebooga | 186 | 56 | 354 | 596 |
-| 10 | llama2:70b | 168 | 55 | 342 | 565 |
-| 11 | **[HeaderGen](https://github.com/secure-software-engineering/HeaderGen)** | 186 | 56 | 321 | 563 |
-| 12 | wizardcoder:13b-python | 170 | 74 | 317 | 561 |
-| 13 | llama2:13b | 153 | 40 | 283 | 476 |
-| 14 | mistral:instruct | 155 | 45 | 250 | 450 |
-| 15 | mistral:v0.2 | 155 | 45 | 248 | 448 |
-| 16 | vicuna:13b | 153 | 35 | 260 | 448 |
-| 17 | vicuna:33b | 133 | 29 | 267 | 429 |
-| 18 | **[Jedi](https://github.com/davidhalter/jedi)** | 122 | 0 | 293 | 415 |
-| 19 | **[Pyright](https://github.com/microsoft/pyright)** | 100 | 8 | 297 | 405 |
-| 19 | wizardcoder:7b-python | 103 | 48 | 254 | 405 |
-| 20 | llama2:7b | 140 | 34 | 216 | 390 |
-| 21 | **[HiTyper](https://github.com/JohnnyPeng18/HiTyper)** | 163 | 27 | 179 | 369 |
-| 22 | wizardcoder:34b-python | 140 | 43 | 178 | 361 |
-| 23 | orca2:7b | 117 | 27 | 184 | 328 |
-| 24 | vicuna:7b | 131 | 17 | 172 | 320 |
-| 25 | orca2:13b | 113 | 19 | 166 | 298 |
-| 26 | **[Scalpel](https://github.com/SMAT-Lab/Scalpel/issues)** | 155 | 32 | 6 | 193 |
-| 27 | **[Type4Py](https://github.com/saltudelft/type4py)** | 39 | 19 | 99 | 157 |
-| 28 | tinyllama | 3 | 0 | 23 | 26 |
-| 29 | phind-codellama:34b-python | 5 | 0 | 15 | 20 |
-| 30 | codellama:13b-python | 0 | 0 | 0 | 0 |
-| 31 | codellama:34b-python | 0 | 0 | 0 | 0 |
-| 32 | codellama:7b-python | 0 | 0 | 0 | 0 |
-
-_<sub>(Auto-generated based on the the analysis run on 14 Jan 2024)</sub>_
+Below is a comparison showcasing exact matches across different tools and LLMs on the Autogen benchmark.
+
+| Rank | 🛠️ Tool | Function Return Type | Function Parameter Type | Local Variable Type | Total |
+| ---- | ------- | -------------------- | ----------------------- | ------------------- | ----- |
+| 1 | **[mistral-large-it-2407-123b](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407)** | 16701 | 728 | 57550 | 74979 |
+| 2 | **[qwen2-it-72b](https://huggingface.co/Qwen/Qwen2-72B-Instruct)** | 16488 | 629 | 55160 | 72277 |
+| 3 | **[llama3.1-it-70b](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct)** | 16648 | 580 | 54445 | 71673 |
+| 4 | **[gemma2-it-27b](https://huggingface.co/google/gemma-2-27b-it)** | 16342 | 599 | 49772 | 66713 |
+| 5 | **[codestral-v0.1-22b](https://huggingface.co/mistralai/Codestral-22B-v0.1)** | 16456 | 706 | 49379 | 66541 |
+| 6 | **[codellama-it-34b](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf)** | 15960 | 473 | 48957 | 65390 |
+| 7 | **[mistral-nemo-it-2407-12.2b](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)** | 16221 | 526 | 48439 | 65186 |
+| 8 | **[mistral-v0.3-it-7b](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3)** | 16686 | 472 | 47935 | 65093 |
+| 9 | **[phi3-medium-it-14b](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)** | 16802 | 467 | 45121 | 62390 |
+| 10 | **[llama3.1-it-8b](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)** | 16125 | 492 | 44313 | 60930 |
+| 11 | **[codellama-it-13b](https://huggingface.co/meta-llama/CodeLlama-13b-Instruct-hf)** | 16214 | 479 | 43021 | 59714 |
+| 12 | **[phi3-small-it-7.3b](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)** | 16155 | 422 | 38093 | 54670 |
+| 13 | **[qwen2-it-7b](https://huggingface.co/Qwen/Qwen2-7B-Instruct)** | 15684 | 313 | 38109 | 54106 |
+| 14 | **[HeaderGen](https://github.com/ashwinprasadme/headergen)** | 14086 | 346 | 36370 | 50802 |
+| 15 | **[phi3-mini-it-3.8b](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)** | 15908 | 320 | 30341 | 46569 |
+| 16 | **[phi3.5-mini-it-3.8b](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)** | 15763 | 362 | 28694 | 44819 |
+| 17 | **[codellama-it-7b](https://huggingface.co/meta-llama/CodeLlama-7b-Instruct-hf)** | 13779 | 318 | 29346 | 43443 |
+| 18 | **[Jedi](https://github.com/davidhalter/jedi)** | 13160 | 0 | 15403 | 28563 |
+| 19 | **[Scalpel](https://github.com/SMAT-Lab/Scalpel/issues)** | 15383 | 171 | 18 | 15572 |
+| 20 | **[gemma2-it-9b](https://huggingface.co/google/gemma-2-9b-it)** | 1611 | 66 | 5464 | 7141 |
+| 21 | **[Type4Py](https://github.com/saltudelft/type4py)** | 3143 | 38 | 2243 | 5424 |
+| 22 | **[tinyllama-1.1b](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)** | 1514 | 28 | 2699 | 4241 |
+| 23 | **[mixtral-v0.1-it-8x7b](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)** | 3235 | 33 | 377 | 3645 |
+| 24 | **[phi3.5-moe-it-41.9b](https://huggingface.co/microsoft/Phi-3.5-MoE-instruct)** | 3090 | 25 | 273 | 3388 |
+| 25 | **[gemma2-it-2b](https://huggingface.co/google/gemma-2-2b-it)** | 1497 | 41 | 1848 | 3386 |
+
+_<sub>(Auto-generated based on the analysis run on 30 Aug 2024)</sub>_
 
 ---
 
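The columns in the new leaderboard count exact matches between a tool's inferred types and the benchmark's ground truth, split into the three annotation categories. As a rough illustration only, here is a minimal Python sketch of such a tally; the JSON layout and field names (`file`, `line_number`, `name`, `category`, `type`) are assumptions for illustration, not TypeEvalPy's actual schema:

```python
import json

# Minimal sketch of an exact-match tally. The record layout
# (file / line_number / name / category / type) is hypothetical --
# TypeEvalPy's real ground-truth schema may differ.

CATEGORIES = ("function_returns", "function_parameters", "local_variables")

def site_key(entry):
    # Identify an annotation site by file, line, and annotated name.
    return (entry["file"], entry["line_number"], entry.get("name"))

def exact_matches(ground_truth, predictions):
    """Count sites where the top prediction equals the expected type."""
    predicted = {site_key(e): e["type"] for e in predictions}
    counts = {c: 0 for c in CATEGORIES}
    for gt in ground_truth:
        pred = predicted.get(site_key(gt))
        if pred and gt["category"] in counts and pred[0] == gt["type"][0]:
            counts[gt["category"]] += 1  # top-1 prediction matches ground truth
    counts["total"] = sum(counts[c] for c in CATEGORIES)
    return counts

if __name__ == "__main__":
    with open("ground_truth.json") as f_gt, open("tool_output.json") as f_out:
        print(exact_matches(json.load(f_gt), json.load(f_out)))
```

A "Total" column like the one above is then just the sum of the three per-category counts.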
@@ -125,15 +98,16 @@ Each results folder will have a timestamp, allowing you to easily track and comp
 Here is how the auto-generated CSV tables relate to the paper's tables:
 
 - **Table 1** in the paper is derived from three auto-generated CSV tables:
-  - `paper_table_1.csv` - details Exact matches by type category.
-  - `paper_table_2.csv` - lists Exact matches for 18 micro-benchmark categories.
-  - `paper_table_3.csv` - provides Sound and Complete values for tools.
 
+  - `paper_table_1.csv` - details Exact matches by type category.
+  - `paper_table_2.csv` - lists Exact matches for 18 micro-benchmark categories.
+  - `paper_table_3.csv` - provides Sound and Complete values for tools.
 
 - **Table 2** in the paper is based on the following CSV table:
-  - `paper_table_5.csv` - shows Exact matches with top_n values for machine learning tools.
+  - `paper_table_5.csv` - shows Exact matches with top_n values for machine learning tools.
+
+Additionally, there are CSV tables that are _not_ included in the paper:
 
-Additionally, there are CSV tables that are *not* included in the paper:
 - `paper_table_4.csv` - containing Sound and Complete values for 18 micro-benchmark categories.
 - `paper_table_6.csv` - featuring Sensitivity analysis.
 </details>
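To inspect those CSV tables programmatically, a minimal pandas sketch; the results folder name below is hypothetical, so substitute the timestamped directory your run actually produced:

```python
import pandas as pd

# Hypothetical path: replace with the timestamped results folder from your run.
results_dir = "results_<timestamp>"

# Per the mapping above: Table 1 of the paper draws on paper_table_1.csv
# through paper_table_3.csv; Table 2 draws on paper_table_5.csv.
exact_by_category = pd.read_csv(f"{results_dir}/paper_table_1.csv")
top_n_ml_tools = pd.read_csv(f"{results_dir}/paper_table_5.csv")

print(exact_by_category.head())
print(top_n_ml_tools.head())
```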
@@ -260,7 +234,6 @@ To generate an extended version of the original TypeEvalPy benchmark to include
    cd autogen
    ```
 
-
 2. **Execute the Generation Script**
 
    Run the following command to start the generation process:
@@ -270,7 +243,7 @@ To generate an extended version of the original TypeEvalPy benchmark to include
    ```
 
    This will generate a folder in the repo root containing the autogen benchmark, named with the current date.
-
+
 ---
 
 ### 🤝 Contributing
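Regarding the generation step in the last hunk: since the script writes a date-stamped folder to the repo root, you can locate the most recent run with a small sketch like this; the `*autogen*` glob pattern is only a guess at the naming scheme, so adjust it to whatever your run produces:

```python
from pathlib import Path

# The README states generation writes a date-stamped folder to the repo root.
# "*autogen*" is an assumed pattern; change it to match the actual folder name.
candidates = sorted(
    (p for p in Path(".").glob("*autogen*") if p.is_dir()),
    key=lambda p: p.stat().st_mtime,
)
latest = candidates[-1] if candidates else None
print(f"Most recent autogen benchmark folder: {latest}")
```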
