Commit 3f11a97

chg: [RELEASE] Updated documentation and CHANGELOG.
1 parent 347b742 commit 3f11a97

4 files changed, +36 -11 lines changed

CHANGELOG.md

+18
```diff
@@ -1,5 +1,23 @@
 # Changelog
 
+## Release 1.0.0 (2025-02-25)
+
+### News
+
+- Introduced a new trainer to automatically classify vulnerabilities based on their descriptions,
+  even when CVSS scores are unavailable.
+- Added CVSS parsing to the dataset generation script.
+
+### Changes
+
+- Refactored the project structure for better organization.
+- Improved CPE parsing.
+- Enhanced the dataset generation script.
+- Optimized the trainer for text generation on vulnerability descriptions.
+- Improved command-line argument parsing.
+- Improved the process of pushing the tokenizer and trainer to Hugging Face.
+
+
 ## Release 0.5.1 (2025-02-22)
 
 Fixed configuration module name.
```
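The release notes do not state how severity labels are obtained for the new classification trainer. The sketch below is illustrative only: it assumes labels follow the qualitative severity rating scale of the CVSS v3.x specification (None 0.0, Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). The function name `severity_from_cvss` is hypothetical and does not come from VulnTrain.

```python
# Hypothetical helper (not part of VulnTrain): map a CVSS v3.x base score
# to the qualitative severity rating defined in the CVSS v3.x specification.
def severity_from_cvss(score: float) -> str:
    """Return the CVSS v3.x qualitative rating for a base score in [0.0, 10.0]."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:  # base scores have one decimal, so this covers 0.1 to 3.9
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


if __name__ == "__main__":
    for s in (0.0, 3.9, 5.3, 7.5, 9.8):
        print(s, severity_from_cvss(s))
```

Under this assumption, entries that already carry a CVSS base score could supply labelled training examples, and the trained classifier would then assign a severity to descriptions that have no score.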

README.md

+8 -6
````diff
@@ -18,14 +18,16 @@ Check out the datasets and models on Hugging Face:
 
 ## Usage
 
-Various types of commands are available:
+Three types of commands are available:
 
 - **Dataset generation**: Create and prepare datasets.
-- **Model training**: Train models on the prepared datasets.
-- **Model validation**: Evaluate the performance of the trained model.
+- **Model training**: Train models using the prepared datasets.
+  - Train a model for text generation to assist in writing vulnerability descriptions.
+  - Train a model to classify vulnerabilities by severity.
+- **Model validation**: Assess the performance of trained models.
 
 
-### Generate datasets
+### Dataset generation
 
 Authenticate to HuggingFace:
 
@@ -45,7 +47,7 @@ Then ensures that the kvrocks database of Vulnerability-Lookup is running.
 Creation of datasets:
 
 ```bash
-$ vulntrain-dataset-generation --sources cvelistv5 --nb-rows 10000 --upload --repo-id CIRCL/vulnerability-dataset-10k
+$ vulntrain-dataset-generation --sources cvelistv5 --nb-rows 10000 --repo-id CIRCL/vulnerability-dataset-10k
 Generating train split: 9999 examples [00:00, 177710.74 examples/s]
 DatasetDict({
     train: Dataset({
@@ -65,7 +67,7 @@ README.md: 100%|█████████████████████
 ```
 
 
-### Train
+### Model training
 
 #### Training for text generation
 
````

pyproject.toml

+1 -1
```diff
@@ -5,7 +5,7 @@ build-backend = "poetry.core.masonry.api"
 
 [project]
 name = "VulnTrain"
-version = "0.5.1"
+version = "1.0.0"
 description = "Generate datasets amd models based on vulnerabilities data from Vulnerability-Lookup."
 authors = [
     {name = "Cédric Bonhomme",email = "[email protected]"}
```

vulntrain/datasets/create_dataset.py

+9 -4
```diff
@@ -126,11 +126,16 @@ def main():
         help="Comma-separated list of sources (cvelistv5, github)",
     )
     parser.add_argument(
-        "--upload", action="store_true", help="Upload dataset to Hugging Face"
+        "--repo-id",
+        dest="repo_id",
+        default="",
+        help="The name of the repository you want to push your object to. It should contain your organization name when pushing to a given organization.",
     )
-    parser.add_argument("--repo-id", required=False, help="Hugging Face repository ID")
     parser.add_argument(
-        "--commit-message", default="", help="Commit message when publishing"
+        "--commit-message",
+        dest="commit_message",
+        default="",
+        help="Commit message when publishing",
     )
     parser.add_argument(
         "--nb-rows", type=int, default=0, help="Number of rows in the dataset"
@@ -150,7 +155,7 @@ def main():
     )
     print(dataset_dict)
 
-    if args.upload:
+    if args.repo_id:
         if args.commit_message:
             # dataset_dict.push_to_hub(args.repo_id, commit_message=args.commit_message, token=hf_token)
             dataset_dict.push_to_hub(args.repo_id, commit_message=args.commit_message)
```
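With this change the `--upload` flag is gone: a non-empty `--repo-id` is what triggers the push, and the empty default means the dataset is only generated locally. The sketch below reproduces that logic in isolation using only `argparse`. `build_parser` and `push_kwargs` are hypothetical names, the help texts are shortened, and the handling of an empty commit message is an assumption, since the hunk does not show that branch.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the two options as they appear after this commit (help texts shortened).
    parser = argparse.ArgumentParser(prog="vulntrain-dataset-generation")
    parser.add_argument(
        "--repo-id",
        dest="repo_id",
        default="",
        help="Hugging Face repository to push to (empty: no upload)",
    )
    parser.add_argument(
        "--commit-message",
        dest="commit_message",
        default="",
        help="Commit message when publishing",
    )
    return parser


def push_kwargs(args: argparse.Namespace) -> Optional[dict]:
    """Hypothetical helper: keyword arguments for DatasetDict.push_to_hub, or None to skip."""
    # An empty --repo-id (the default) now means "do not upload",
    # which is what omitting the removed --upload flag used to express.
    if not args.repo_id:
        return None
    kwargs = {"repo_id": args.repo_id}
    # Assumption: forward a commit message only when one was given,
    # so the library's default message is used otherwise.
    if args.commit_message:
        kwargs["commit_message"] = args.commit_message
    return kwargs


if __name__ == "__main__":
    parser = build_parser()
    print(push_kwargs(parser.parse_args([])))
    print(push_kwargs(parser.parse_args(["--repo-id", "CIRCL/vulnerability-dataset-10k"])))
```

`dest="repo_id"` is what argparse would derive from `--repo-id` anyway, so the explicit `dest` only documents the attribute name. The returned dictionary could be unpacked into `dataset_dict.push_to_hub(**kwargs)`, whose first parameter is `repo_id`.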
