AI-Driven Test Selection on Pull Request acceptance tests #244

Open
srbarrios opened this issue Feb 16, 2025 · 1 comment
Labels: AI, Medium Sized Project (175 hours), Uyuni

srbarrios commented Feb 16, 2025

Project Title

AI-Driven Test Selection on Pull Request acceptance tests

Description

Large test suites can slow down CI/CD pipelines, leading to longer feedback loops and inefficient resource usage. This project aims to leverage machine learning (ML) to predict which tests should be executed based on recent code changes, commit history, past test failures, and code coverage.

By analyzing this data, we can train an ML model to prioritize high-risk tests and reduce overall test execution time. The goal is to cut the execution time of the Pull Request acceptance tests by running only the most relevant tests, as selected by this ML model.
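
As a rough illustration only (not from the issue itself), the kind of per-test feature row such a model might consume could look like the sketch below; the column names, values, and the `pandas` layout are assumptions, not the project's actual schema.

```python
import pandas as pd

# Illustrative only: one row per (pull request, test) pair, with features
# derived from commit history, past failures, and coverage overlap.
example_row = pd.DataFrame([{
    "test_id": "features/example.feature",  # hypothetical test identifier
    "files_changed": 12,                    # files touched by the PR
    "touches_covered_code": 1,              # 1 if the PR modifies code this test covers
    "past_failure_rate": 0.18,              # fraction of recent runs where the test failed
    "days_since_last_failure": 3,
    "label_should_run": 1,                  # training label, e.g. "test failed on this change"
}])
print(example_row)
```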

This project is a continuation of our current work in the Uyuni project, which will be presented at SeleniumConf 2025.

The project will involve:

  • Extracting commit history to identify impacted files (see the sketch below).
  • Analyzing test execution history to track past failures.
  • Processing code coverage data to map tests to code changes.
  • Training a machine learning model (e.g., Random Forest, XGBoost) to recommend which tests to run.
  • Integrating the trained model into our GitHub Actions to dynamically select tests for each Pull Request.

This approach ensures that tests are executed intelligently, reducing test cycle time while maintaining high test coverage.
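
A minimal sketch of the commit-history step mentioned above, assuming plain `git log` output is enough to identify impacted files; the revision range and the `changed_files` helper are illustrative, not the project's actual tooling.

```python
import subprocess
from collections import Counter

def changed_files(rev_range="origin/master..HEAD"):
    """Count how often each file was touched in the given revision range."""
    out = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line.strip() for line in out.splitlines() if line.strip())

# Example: the files most frequently modified by the commits under review
for path, count in changed_files().most_common(10):
    print(count, path)
```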

Deliverables

  • Data extraction scripts for:
    • Commit history from Git (files changed, commit messages)
    • Test execution logs (pass/fail results, error messages)
    • JaCoCo code coverage reports (a parsing sketch follows this list)
  • A trained ML model that predicts which tests should run based on commit history and past test results.
  • Integration with GitHub Actions to automate test selection.
  • Comprehensive documentation on setup, training, and deployment of the model.
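
As a hedged illustration of the JaCoCo deliverable, the sketch below walks a JaCoCo XML report and lists classes with at least one covered line; the report path and the `covered_classes` helper are placeholders, not Uyuni's actual build layout.

```python
import xml.etree.ElementTree as ET

def covered_classes(jacoco_xml="jacocoTestReport.xml"):  # placeholder path
    """Yield fully qualified class names that have at least one covered line."""
    root = ET.parse(jacoco_xml).getroot()
    for package in root.iter("package"):
        for cls in package.iter("class"):
            for counter in cls.findall("counter"):
                if counter.get("type") == "LINE" and int(counter.get("covered", "0")) > 0:
                    yield cls.get("name").replace("/", ".")
                    break

if __name__ == "__main__":
    for name in covered_classes():
        print(name)
```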

Mentor

Oscar Barrios (@srbarrios)

Skills Required

  • Ruby (for test framework integration).
  • Machine Learning Basics (feature engineering, model training).
  • Python + Scikit-Learn/Pandas (for ML model development).
  • GitHub Actions (to integrate test selection into pipelines).
  • Cucumber/Selenium Testing (understanding of automated tests).

Skill Level

  • Medium – Requires knowledge of Ruby (for test integration) and basic ML concepts (training and using models).
  • Prior experience with CI/CD and automated testing is a plus.

Project Size

Medium-Sized Project (160 hours)

Get Started

Data Sources:

  • Uyuni Git Commit History: Includes code changes, affected files, and commit messages.
  • Test Execution Logs: Stores past test results, including failures, execution time, and errors. This content will be made publicly available through a web server hosted on AWS; for now, only the Cucumber reports are published there.
  • Code Coverage Data: Tracks which tests touch specific parts of the codebase. This is available in a Redis database and is already used by our GitHub Actions (a query sketch follows this list).
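
The issue does not describe the Redis schema, so the snippet below is only a sketch of how a file-to-tests mapping might be queried with `redis-py`; the key pattern, the `tests_covering` helper, and the example path are all placeholders.

```python
import redis

# Assumption: one Redis set per source file whose members are the tests that
# exercise it. The actual schema used by the Uyuni GitHub Actions may differ.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def tests_covering(source_file):
    """Return the tests recorded against a changed source file."""
    return r.smembers(f"coverage:{source_file}")  # placeholder key pattern

for test in sorted(tests_covering("java/code/src/example/SomeClass.java")):  # hypothetical path
    print(test)
```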

Steps

  • Data Collection & Preprocessing
    • Extract Commit History
    • Extract Test Execution History
    • Extract Code Coverage Data
  • Use Python + scikit-learn to train a model on all the collected data (a minimal training and selection sketch follows this list).
    • Prepare Training Data
    • Train a Classification Model
  • Integrate it as a GitHub Action that runs on every Pull Request.
  • Enhance the current GitHub Action using the ML model.
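
A minimal end-to-end sketch of the training and selection steps, assuming a prepared feature table like the one illustrated earlier; the feature names, the threshold, and the choice of Random Forest are illustrative, not the project's final design.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

FEATURES = ["files_changed", "touches_covered_code",
            "past_failure_rate", "days_since_last_failure"]

def train(history: pd.DataFrame) -> RandomForestClassifier:
    """Fit a classifier on historical (PR, test) rows labelled with past outcomes."""
    X_train, X_test, y_train, y_test = train_test_split(
        history[FEATURES], history["label_should_run"],
        test_size=0.2, random_state=42)
    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))
    return model

def select_tests(model, candidates: pd.DataFrame, threshold=0.3):
    """Return the test ids whose predicted failure probability exceeds the threshold."""
    probs = model.predict_proba(candidates[FEATURES])[:, 1]
    return candidates.loc[probs >= threshold, "test_id"].tolist()
```

A GitHub Actions step could then call `select_tests` for the current Pull Request and pass the resulting list to the Cucumber runner, which is the integration described in the last two steps.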

Useful links

srbarrios (Author) commented

@ddemaio can you add the Uyuni label on this issue? Thanks!

@srbarrios srbarrios changed the title AI-Driven Test Selection for Faster CI Pipelines AI-Driven Test Selection on Pull Request acceptance tests Feb 17, 2025
@ddemaio ddemaio added Uyuni Medium Sized Project Medium sized project is 175 hours AI Has elements of AI development associated with the project labels Feb 17, 2025