
Releases: vulnerability-lookup/VulnTrain

Release 1.2.0

11 Mar 07:31
v1.2.0
d405b7d

Changes

  • Dataset generation: CVSS scores are now extracted from GitHub and PySec security advisories.
  • Dataset generation: CVSS scores, CPEs, the title, and the description (summary) are now extracted from CSAF documents.
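
The CSAF extraction can be sketched as follows. This is a minimal illustration assuming a CSAF 2.0 JSON document already parsed into a dict; the field names follow the CSAF 2.0 schema, but the helper name `extract_from_csaf` is illustrative, not VulnTrain's actual function:

```python
def extract_from_csaf(doc: dict) -> dict:
    """Pull the title, summary, CVSS base scores, and CPEs out of a
    CSAF 2.0 document (illustrative sketch, not VulnTrain's code)."""
    out = {
        "title": doc.get("document", {}).get("title"),
        "summary": None,
        "cvss": [],
        "cpe": [],
    }
    # The summary is a document-level note with category "summary".
    for note in doc.get("document", {}).get("notes", []):
        if note.get("category") == "summary":
            out["summary"] = note.get("text")
    # Each vulnerability may carry one or more score objects.
    for vuln in doc.get("vulnerabilities", []):
        for score in vuln.get("scores", []):
            cvss = score.get("cvss_v3") or score.get("cvss_v2")
            if cvss and "baseScore" in cvss:
                out["cvss"].append(cvss["baseScore"])
    # CPEs live in the product tree's full_product_names entries.
    for fpn in doc.get("product_tree", {}).get("full_product_names", []):
        cpe = fpn.get("product_identification_helper", {}).get("cpe")
        if cpe:
            out["cpe"].append(cpe)
    return out
```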

Release 1.1.0

27 Feb 07:44
v1.1.0
c94d3d0

News

  • Trainers: Support for roberta-base for the text classifier, with improved
    settings for TrainingArguments.
  • Validators: Added a validator for severity classification.

Release 1.0.0

25 Feb 07:40
v1.0.0
3f11a97

News

  • Introduced a new trainer to automatically classify vulnerabilities based on their descriptions,
    even when CVSS scores are unavailable.
  • Added CVSS parsing to the dataset generation script.
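
Where CVSS scores are available, training labels for such a classifier can be derived with the standard qualitative rating scale from the CVSS v3.1 specification. A minimal sketch (the function name is illustrative, not VulnTrain's actual code):

```python
def cvss_to_severity(base_score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating,
    per the CVSS v3.1 specification (None / Low / Medium / High / Critical)."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {base_score}")
    if base_score == 0.0:
        return "None"
    if base_score <= 3.9:
        return "Low"
    if base_score <= 6.9:
        return "Medium"
    if base_score <= 8.9:
        return "High"
    return "Critical"
```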

Changes

  • Refactored the project structure for better organization.
  • Improved CPE parsing.
  • Enhanced the dataset generation script.
  • Optimized the trainer for text generation on vulnerability descriptions.
  • Improved command-line argument parsing.
  • Improved the process of pushing the tokenizer and trainer to Hugging Face.

Release 0.5.1

21 Feb 23:02
v0.5.1
2a250c1

Fixed configuration module name.

Release 0.5.0

21 Feb 22:43
v0.5.0
6aaa31f

Added support for a configuration file.

Release 0.4.0

21 Feb 17:19
v0.4.0
d922d3a

The dataset generation step now uses data from GitHub Advisories, and the VulnExtractor cleans the summary and details fields.
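
A cleaning step of this kind typically strips markup remnants and normalizes whitespace before the text is used for training. A plausible sketch (hypothetical helper, not the actual VulnExtractor code):

```python
import re

def clean_text(text: str) -> str:
    """Normalize advisory text: drop inline HTML tags and collapse
    runs of whitespace (illustrative sketch of a cleaning step)."""
    text = re.sub(r"<[^>]+>", " ", text)  # strip HTML tag remnants
    text = re.sub(r"\s+", " ", text)      # collapse whitespace runs
    return text.strip()
```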

Release 0.3.0

20 Feb 22:31
v0.3.0
35918af

News

Dataset generation: allow specifying a commit message when uploading to Hugging Face.

Validation: Added a simple validation script for a model optimized for text generation. The script
can pull a model and send tasks to it via a Pipeline.

Changes

Training step: added a choice of models: gpt2, distilgpt2, meta-llama/Llama-3.3-70B-Instruct, and distilbert-base-uncased.

Various improvements to command-line parsing.

Release 0.2.0

20 Feb 06:55
v0.2.0
e169baf

News

  • Added a trainer.
  • Experimenting with distilbert-base-uncased (AutoModelForMaskedLM) and gpt2 (AutoModelForCausalLM),
    with the goal of generating text.

Changes

  • Various improvements to the dataset generator; added a command-line parser.

Release 0.1.0

19 Feb 15:27
v0.1.0
bb53ba1

First release, with upload of datasets to Hugging Face.

Datasets are built from NIST data, with enrichment from FKIE and vulnrichment.