Welcome, contributors 👋
We’re thrilled that you want to contribute to Android Malware Detection with Machine Learning, an open-source AI-powered project designed to identify and classify malicious Android applications using machine learning and explainable AI (XAI).
Your ideas, code, and enthusiasm help strengthen the security ecosystem. 🧠💪
- Project README: README.md
- Main entry points: main.py
- Feature extraction: FeatureExtractionModule/FeatureExtraction.py
- Getting started
- Fork and clone
- Development setup
- Branching and commits
- Pull request process
- Code style and quality
- Testing
- Areas you can contribute to
- Reporting issues
- Security and responsible disclosure
- Community and support
- Need help?
Before you make changes, read README.md to understand the goals and architecture. The most relevant files and folders for contributors are:
- FeatureExtractionModule/FeatureExtraction.py: static feature extraction logic
- MachineLearningModule/MachineLearningFlow.py: end-to-end ML flow and orchestration
- MachineLearningModule/Classifiers/: classifier implementations
- Datasets/: CSV files used for experiments (e.g. Drebin_v1.csv)
- ModelEvaluation/: evaluation utilities and metrics
- Util/: helper functions used across modules
If you are new to the repository, begin with small doc fixes, tests or simple bug fixes to learn the codebase.
Fork this repo using the Fork button on GitHub.
Clone your fork locally:
git clone https://github.com/<your-username>/AndroidMalwareDetection.git
cd AndroidMalwareDetection
Add the upstream remote:
git remote add upstream https://github.com/VarnitKumar/AndroidMalwareDetection.git
Keep your fork updated regularly:
git fetch upstream
git merge upstream/main
Recommended steps (Windows example):
- Create a virtual environment and activate it:
  python -m venv venv
  venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Run a quick smoke test to ensure the environment works. For example:
  python main.py
Notes:
- If you add packages, update requirements.txt and mention them in your PR.
- For heavy experiments, run them on a machine with sufficient RAM (4 GB+ recommended).
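After installing dependencies, a short script can confirm that everything listed in requirements.txt is present in the active environment. The sketch below is illustrative only: the helper names are hypothetical and not part of the repository, and it assumes a plain requirements.txt with one name-based requirement per line (no URLs or editable installs).

```python
"""Report requirements.txt entries that are not installed (illustrative sketch)."""
from __future__ import annotations

import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text: str) -> list[str]:
    """Return the distribution names found in requirements-style text.

    Comments, blank lines, and option lines (such as "-r other.txt") are skipped.
    Assumption: every remaining line starts with a package name.
    """
    names = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The name is the leading run of letters, digits, ".", "_" and "-".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def find_missing(names: list[str]) -> list[str]:
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.exists():
        missing = find_missing(parse_requirement_names(requirements.read_text()))
        print("Missing packages:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found; run this from the repository root.")
```

The check only confirms that a package is installed, not that its version satisfies the pin, so `pip install -r requirements.txt` remains the authoritative step.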
- Branch names: feature/<short>, fix/<short>, chore/<short>, experiment/<name>.
- Make focused commits and write clear commit messages.
- Use verbs and a short scope prefix when useful.
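The branch naming convention can be checked mechanically. This is a minimal sketch, not a tool shipped with the repository: the function name is hypothetical, and the rule that the part after the slash is a lowercase slug of letters, digits, dots, underscores, and hyphens is an assumption, since the guide only fixes the prefixes.

```python
import re

# Prefixes come from the branch naming convention in this guide.
# Assumption: the part after "/" is a lowercase slug (letters, digits, ".", "_", "-").
BRANCH_PATTERN = re.compile(r"(feature|fix|chore|experiment)/[a-z0-9][a-z0-9._-]*")


def is_valid_branch_name(name: str) -> bool:
    """Return True if the name is an allowed prefix followed by a slug."""
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("feature/shap-plots", "hotfix/urgent", "fix/"):
        print(candidate, "->", is_valid_branch_name(candidate))
```

To check the branch you are on, pass in the output of `git rev-parse --abbrev-ref HEAD`.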
Commit message format:
<type>: <short summary>
Common types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- test: Adding/updating tests
- refactor: Code improvement without changing functionality
- style: Code style or formatting
Examples:
- feat: add permission-based extractor
- fix: avoid division by zero in Normalisation
- feat: integrate SHAP explainability module
Include a concise description in the commit body when more context helps reviewers.
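The `<type>: <short summary>` format is simple enough to validate before committing. The following sketch shows one way to do it. The function name is hypothetical, and the 72-character limit is an assumption based on common practice, not a rule stated in this guide. Since the guide lists "common" types, an unknown type is reported as a warning-style message, not as a hard failure.

```python
"""Check the first line of a commit message (illustrative sketch, not a repository tool)."""
from __future__ import annotations

import re

# Types listed in this guide; treat the list as a starting point, not a hard rule.
COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "style")
# Assumption: 72 characters is a common soft limit; this guide does not set one.
MAX_SUMMARY_LENGTH = 72
SUMMARY_PATTERN = re.compile(r"^(?P<type>[a-z]+): (?P<summary>\S.*)$")


def check_commit_summary(message: str) -> list[str]:
    """Return a list of problems found in the first line of a commit message."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = SUMMARY_PATTERN.match(first_line)
    if match is None:
        return ["first line should look like '<type>: <short summary>'"]
    problems = []
    commit_type = match.group("type")
    if commit_type not in COMMIT_TYPES:
        problems.append(f"unknown type '{commit_type}'")
    if len(first_line) > MAX_SUMMARY_LENGTH:
        problems.append(f"first line is longer than {MAX_SUMMARY_LENGTH} characters")
    return problems


if __name__ == "__main__":
    for example in ("feat: add permission-based extractor", "added stuff"):
        print(repr(example), "->", check_commit_summary(example) or "ok")
```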
When you open a PR against main, include:
- A short title and 1–2 paragraph summary of the change and why it matters.
- Steps to reproduce or test locally (commands + expected output).
- Any dataset or model artifacts required for testing (or a subset/sample).
- Notes about performance differences or new dependencies.
- Links to related issues or discussion threads.
PR checklist (maintainers will look for):
- Code compiles / scripts run without errors for the described steps
- Tests for new behavior or a clear explanation why tests are not required
- Updated docs or README when behavior changes
- No secrets or large binary files committed (use links or artifact storage)
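For the last checklist item, a quick scan of the working tree can catch large files before they are committed. This is a sketch under stated assumptions: the 5 MB threshold and the skipped directory names are illustrative choices, not project policy, and the script inspects files on disk, not what is staged in git.

```python
"""List unusually large files in a working tree (illustrative sketch)."""
from __future__ import annotations

import os

# Assumption: directories that should never be committed and can be skipped.
SKIPPED_DIRS = {".git", "venv", "__pycache__"}


def find_large_files(root: str, max_bytes: int = 5 * 1024 * 1024) -> list[tuple[str, int]]:
    """Return (path, size) pairs for files under root larger than max_bytes, largest first."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [name for name in dirnames if name not in SKIPPED_DIRS]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # The file vanished or is unreadable; ignore it.
            if size > max_bytes:
                large.append((path, size))
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / (1024 * 1024):.1f} MB  {path}")
```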
- Follow PEP 8 for Python. Use meaningful names and small functions.
- Add or update docstrings for public functions and classes.
- Prefer type hints for public APIs.
- If you introduce a new module, include a short module-level docstring describing responsibilities.
- If you use a linter (recommended: flake8), include configuration in the repo or list commands in your PR.
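As an illustration of these guidelines, here is a small function written with a module-level docstring, type hints, and a docstring for the public API. The module and function are hypothetical and do not exist in the repository; they are only meant to show the expected style.

```python
"""Normalisation helpers (hypothetical module, shown only to illustrate style)."""
from __future__ import annotations

from typing import Sequence


def min_max_normalise(values: Sequence[float]) -> list[float]:
    """Scale values linearly into the range [0.0, 1.0].

    Args:
        values: Numeric feature values for one column.

    Returns:
        A new list with the same length as ``values``. If every value is
        identical, all outputs are 0.0, which avoids dividing by zero.
    """
    if not values:
        return []
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        return [0.0 for _ in values]
    return [(value - lowest) / span for value in values]
```

Returning zeros for a constant column is one possible design choice; raising an error or dropping the column are reasonable alternatives, as long as the docstring states which one applies.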
- Add unit tests for any new logic or bug fixes. Place tests under a tests/ directory.
- Run tests with pytest:
  pip install pytest
  python -m pytest -q
- For data-heavy tests, include small synthetic or sampled CSVs to keep CI fast.
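A test can build its own tiny CSV so that it never loads a full dataset. The sketch below shows the pattern using only the standard library. The column names and label values are illustrative and are not the actual schema of Drebin_v1.csv, and the helper name is hypothetical. pytest collects the function because its name starts with `test_`.

```python
"""Example test that builds a tiny synthetic CSV instead of loading a full dataset."""
import csv
import os
import tempfile


def write_synthetic_csv(path: str, n_rows: int = 6) -> None:
    """Write a small CSV with two binary feature columns and a label column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumption: illustrative column names, not the real dataset schema.
        writer.writerow(["SEND_SMS", "INTERNET", "label"])
        for index in range(n_rows):
            # Alternate labels so both classes are present.
            label = "malware" if index % 2 == 0 else "benign"
            writer.writerow([index % 2, 1, label])


def test_synthetic_csv_has_both_classes():
    """The generated file has the requested row count and both labels."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "sample.csv")
        write_synthetic_csv(path, n_rows=6)
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle))
    assert len(rows) == 6
    assert {row["label"] for row in rows} == {"malware", "benign"}
```

In a real test, the generated file would be passed to the code under test (for example, a loader or preprocessing step) instead of being read back directly.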
- Feature extractors: permission analysis, API calls, intent filters.
- Preprocessing: missing value handling, normalization, imbalance correction.
- Models: add classifiers, hyperparameter tuning, or cross-validation improvements.
- Explainability: integrate or improve SHAP/LIME analyses and visualizations.
- Testing & CI: add tests and a minimal GitHub Actions workflow (optional).
- Documentation: improve README examples, usage guides, and notebooks.
If you'd like to work on a larger feature, open an issue with a short design proposal so maintainers can provide feedback first.
When creating an issue, include:
- A descriptive title
- Steps to reproduce (commands, exact file names)
- Expected vs actual behavior
- Error messages / tracebacks and relevant log excerpts
- Minimal dataset or sample input if possible
Label issues clearly (bug, enhancement, question) to help triage.
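Environment details often help reproduce a bug, especially since setup differs across operating systems. As a convenience, a snippet like the one below prints a short summary that can be pasted into an issue. It is only a sketch; the function name is hypothetical and the guide does not require this information.

```python
"""Print a short environment summary to paste into a bug report (illustrative sketch)."""
import platform
import sys


def environment_summary() -> str:
    """Return the Python version, implementation, OS, and interpreter path as text."""
    lines = [
        f"Python: {platform.python_version()}",
        f"Implementation: {platform.python_implementation()}",
        f"OS: {platform.system()} {platform.release()}",
        f"Executable: {sys.executable}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_summary())
```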
If you find a security vulnerability (for example, accidentally committed credentials or a data exfiltration path), do not create a public issue. Instead, contact the repository owner(s) directly or report it privately through a GitHub security advisory.
- For help getting started, open an issue with the help-wanted label.
- If you want maintainers to review a proposed design before implementation, open an issue titled "RFC: short description" and outline the approach.
- Join discussions on GitHub or reach out via email (if provided in the repo).
- Be respectful, collaborative, and constructive.
- Support other contributors and share knowledge.
- Use issues and PRs for discussions, not personal communication.
- Avoid spam or irrelevant comments.
- Ask in the Discord server (check README).
- If you’re stuck on a task, open/continue the discussion on the Issue itself.