
Dockerfile Processor

Overview

This repository contains a Python-based tool for analyzing and processing Dockerfiles. For each Dockerfile it generates a user-style question and writes the result to JSONL for model training or other downstream applications. The tool uses the Hugging Face Inference API to query a language model with prompts built from the Dockerfile content.


Features

  • Automated Dockerfile Analysis: Parses and validates Dockerfiles for processing.
  • Hugging Face Integration: Queries mistralai/Mistral-7B-Instruct-v0.3 through the Inference API to generate a response for each constructed prompt.
  • Error Handling & Retry Mechanism: Handles API failures with retry logic and logs failures for later review.
  • Logging: Tracks success and failure statistics in both console output and log files.
  • JSONL Output: Generates well-structured JSONL files with system-user interactions for each Dockerfile.

File Structure

.
├── dockerfiles
│   └── sources-gold       # Directory containing input Dockerfiles
├── data
│   └── dockerfiles.jsonl  # Output file storing processed data in JSONL format
├── logs
│   ├── success.log        # Logs filenames successfully processed
│   └── failure.log        # Logs filenames that failed processing
├── .env                   # Environment variables (e.g., API_TOKEN)
├── main.py                # Main Python script for processing Dockerfiles
├── README.md              # Repository documentation (this file)
└── requirements.txt       # Python dependencies

How It Works

  1. Dockerfile Parsing:

    • Reads Dockerfiles from the dockerfiles/sources-gold directory.
    • Validates each file with the dockerfile library to confirm it can be parsed.
  2. Prompt Generation:

    • Constructs a prompt based on the content of the Dockerfile.
    • Sends the prompt to the Hugging Face Inference API for processing.
  3. Response Handling:

    • Validates and cleans the model's response.
    • Retries up to a defined limit if the response is invalid or empty.
  4. Output Generation:

    • Creates a JSONL entry with the Dockerfile content and the generated user question.
    • Logs each file's success or failure into separate log files.
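
The pipeline above can be sketched roughly as follows. This is a minimal illustration rather than the exact contents of main.py: the helper names, the use of the requests library, and the retry count of 3 are assumptions; the directory paths, the model name, and the overall flow come from this README.

```python
# Rough sketch of the processing loop; helper names and the retry count
# are illustrative assumptions, not the actual code in main.py.
import json
import os

import dockerfile  # pip install dockerfile
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.3"
HEADERS = {"Authorization": f"Bearer {os.environ.get('API_TOKEN', '')}"}  # token from .env
MAX_RETRIES = 3  # assumed retry limit

def build_prompt(content: str) -> str:
    # Assumed prompt template: ask the model for a user-style question
    # that would lead to this Dockerfile.
    return (
        "Given the following Dockerfile, write the question a user might ask "
        f"to have it generated:\n\n{content}"
    )

def query_model(prompt: str) -> str:
    # Retry on transient API failures or empty responses.
    for _ in range(MAX_RETRIES):
        try:
            resp = requests.post(API_URL, headers=HEADERS,
                                 json={"inputs": prompt}, timeout=60)
            resp.raise_for_status()
            text = resp.json()[0].get("generated_text", "").strip()
            if text:
                return text
        except (requests.RequestException, ValueError, KeyError, IndexError, TypeError):
            pass
    raise RuntimeError("no valid response after retries")

def process(path: str) -> None:
    with open(path) as f:
        content = f.read()
    dockerfile.parse_string(content)  # raises if the Dockerfile is invalid
    question = query_model(build_prompt(content))
    entry = {
        "text": (
            "System: You are a Dockerfile generator.\n\n"
            f"User: {question}\n\nAssistant: {content}"
        )
    }
    with open("data/dockerfiles.jsonl", "a") as out:
        out.write(json.dumps(entry) + "\n")

for name in sorted(os.listdir("dockerfiles/sources-gold")):
    try:
        process(os.path.join("dockerfiles/sources-gold", name))
        log_path = "logs/success.log"
    except Exception:
        log_path = "logs/failure.log"
    with open(log_path, "a") as log:
        log.write(name + "\n")
```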

Usage

Prerequisites

  1. Python 3.8+
  2. Install dependencies:
    pip install -r requirements.txt
  3. Set up the .env file with your Hugging Face API token:
    API_TOKEN=your_hugging_face_api_token
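
The script reads this token from the environment at startup. A minimal sketch, assuming the python-dotenv package is used to load .env (the actual loading code in main.py may differ):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv (assumed)

load_dotenv()                       # reads .env in the working directory
API_TOKEN = os.getenv("API_TOKEN")  # variable name taken from this README
```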
    

Running the Script

Execute the main script to process Dockerfiles:

python main.py

Outputs

  • Processed Data:
    • Saved in data/dockerfiles.jsonl as structured JSONL.
  • Logs:
    • Successful files: logs/success.log
    • Failed files: logs/failure.log

Example JSONL Entry

{
  "text": "System: You are a Dockerfile generator.\n\nUser: Create a Dockerfile using...\n\nAssistant: FROM alpine:3.10\nRUN ..."
}
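
Each line of the output file is an independent JSON object, so it can be consumed one record at a time. A small illustrative reader (not part of this repository):

```python
import json

# Load the generated examples, e.g. to inspect them before training.
with open("data/dockerfiles.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} examples")
print(records[0]["text"][:120])  # preview the first entry
```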

Contributing

  1. Fork the repository.
  2. Create a new branch:
    git checkout -b feature-branch
  3. Make your changes and commit them:
    git commit -m "Add new feature"
  4. Push to your branch:
    git push origin feature-branch
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Contact

For questions or feedback, please create an issue in this repository.