Hub Model Search

Aggregate models from the Hugging Face Hub based on search scenarios.

TODO

Work on a generic important_models.yaml search scenario covering the models for which we want CSP support and great documentation.
Work on a generic recommended_models.yaml to provide recommendations to CSPs for their curated catalogs.
Add a component that can pull the best models from leaderboards (there doesn't seem to be programmatic access).
Add a component that can list Merve's collections using HfApi (complicated).

Features

Provider-specific Model Selection
- Compatibility checks for multiple cloud providers (GCP, AWS, Azure)
Flexible Search Scenarios
- YAML-based configuration for search scenarios
- configs/important_models.yaml lists the models for which we want great documentation across all our CSPs.
- configs/recommended_models.yaml lists the models that we think should be added to our CSP catalogs.
- Create your own.

Installation

Clone the repository:

git clone [repository-url]

Install dependencies:

pip install -r requirements.txt

Required dependencies:

huggingface-hub: For accessing the Hugging Face model hub
pandas: For data processing and CSV output

Configuration

The tool uses YAML configuration files located in the configs/ directory:

search_scenarios.yaml: example search scenarios
providers/: Provider-specific compatibility rules
- gcp.yaml: Google Cloud Platform configuration (Deploy to Google Cloud rules)
- aws.yaml: Amazon Web Services configuration (Deploy to Sagemaker rules)
- azure.yaml: Microsoft Azure configuration (Azure HF Collection limitations)

Logic

src:

config.py: Defines the classes used to load the provider and search scenario config files.
providers.py: Defines one class per provider. These classes hold the model compatibility rules.
searcher.py: Defines the class that queries the Hub for each search query, determines model compatibility, and saves the results.

main.py takes as input a list of providers and a search scenario config file, and outputs the results.

Search Scenarios Configuration

Each scenario in your scenarios YAML file requires:

sort: Field to sort results by (e.g., "downloads", "trendingScore")
direction: Sort direction (-1 for descending, 1 for ascending)

Optional parameters:

tasks: List of Hugging Face tasks to search for
tags: List of tags to filter models
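The required and optional fields above can be checked before any search runs. The following is a minimal sketch, not the tool's actual loader: the `SearchScenario` class and `parse_scenario` function are hypothetical names, and the input is assumed to be the plain mapping you would get after loading the YAML file.

```python
from dataclasses import dataclass, field

# Sort directions described above: -1 for descending, 1 for ascending.
ALLOWED_DIRECTIONS = (-1, 1)


@dataclass
class SearchScenario:
    """Hypothetical container for one scenario entry (not the tool's own class)."""
    name: str
    sort: str
    direction: int
    tasks: list = field(default_factory=list)
    tags: list = field(default_factory=list)


def parse_scenario(name, raw):
    """Validate one scenario mapping and return a SearchScenario."""
    # `sort` and `direction` are required; `tasks` and `tags` are optional.
    missing = [key for key in ("sort", "direction") if key not in raw]
    if missing:
        raise ValueError(f"scenario {name!r} is missing required keys: {missing}")
    if raw["direction"] not in ALLOWED_DIRECTIONS:
        raise ValueError(f"scenario {name!r}: direction must be -1 or 1")
    return SearchScenario(
        name=name,
        sort=raw["sort"],
        direction=raw["direction"],
        tasks=list(raw.get("tasks") or []),
        tags=list(raw.get("tags") or []),
    )


scenario = parse_scenario("finance", {"sort": "downloads", "direction": -1, "tags": ["finance"]})
```

Failing early on a missing `sort` or an invalid `direction` keeps a typo in the config from surfacing later as an opaque search error.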

Example scenario configuration:

finance:
  sort: "downloads"
  direction: -1
  tasks:
    - "text-classification"
    - "text-generation"
  tags:
    - "finance"
    - "fintech"

When both tasks and tags are specified, the tool performs searches for each combination of task and tag.
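To illustrate how the combinations expand, here is a small sketch using `itertools.product`. The `expand_queries` helper is a hypothetical name, and the handling of a scenario with only tasks or only tags (a single `None` placeholder for the missing side) is an assumption, since the behavior described above covers only the case where both are given.

```python
from itertools import product


def expand_queries(tasks=None, tags=None):
    """Return one query dict per (task, tag) combination."""
    # Assumption: a missing list contributes a single None placeholder,
    # so a scenario with only tasks (or only tags) still yields queries.
    task_options = list(tasks) if tasks else [None]
    tag_options = list(tags) if tags else [None]
    return [
        {"task": task, "tag": tag}
        for task, tag in product(task_options, tag_options)
    ]


# The finance example: 2 tasks x 2 tags -> 4 searches.
queries = expand_queries(
    tasks=["text-classification", "text-generation"],
    tags=["finance", "fintech"],
)
for query in queries:
    print(query)
```

With the finance scenario, this produces four searches, one for each pairing of a task with a tag.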

Usage

Basic usage:

python main.py --provider gcp,aws,azure --search_scenario_file configs/search_scenario.yaml

Command Line Arguments

--provider: Comma-separated list of providers (gcp,aws,azure)
--search_scenario_file: Path to the YAML file that defines the search scenarios
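A parser for these two arguments could look roughly like the sketch below. It is an illustration rather than the contents of main.py: the helper names, the choice to mark both arguments as required, and the rejection of unknown providers are all assumptions.

```python
import argparse

# Providers named in this README.
SUPPORTED_PROVIDERS = ("gcp", "aws", "azure")


def parse_providers(value):
    """Split a comma-separated provider string and reject unknown names."""
    providers = [p.strip().lower() for p in value.split(",") if p.strip()]
    unknown = [p for p in providers if p not in SUPPORTED_PROVIDERS]
    if not providers or unknown:
        raise argparse.ArgumentTypeError(
            f"expected a comma-separated subset of {SUPPORTED_PROVIDERS}, got {value!r}"
        )
    return providers


def build_parser():
    parser = argparse.ArgumentParser(description="Search the Hub by scenario.")
    # Assumption: both arguments are treated as required in this sketch.
    parser.add_argument("--provider", type=parse_providers, required=True,
                        help="Comma-separated list of providers (gcp,aws,azure)")
    parser.add_argument("--search_scenario_file", required=True,
                        help="Path to the search scenario YAML file")
    return parser


args = build_parser().parse_args(
    ["--provider", "gcp,aws", "--search_scenario_file", "configs/search_scenario.yaml"]
)
print(args.provider, args.search_scenario_file)
```

Parsing the provider list inside a `type=` callable means an unsupported name is reported as a normal usage error instead of failing later during the search.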

Examples

Search trending models for GCP:

python main.py --provider gcp --search_scenario_file configs/trending.yaml

Get finance-specific models for AWS:

python main.py --provider aws --search_scenario_file configs/finance.yaml

Run a search scenario across providers:

python main.py --provider gcp,aws --search_scenario_file configs/search_scenario.yaml

Output

The tool generates a consolidated CSV file in the output/ directory with:

Model ID
Provider Compatibility (for each provider)
Downloads
Likes
Tags
Task
Search Parameters Used (task and tag that found the model)
Pipeline Compatibility
Library Name
Search Scenario
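As a rough picture of that layout, the sketch below writes one row with a column per provider. It uses the standard library `csv` module instead of pandas to stay dependency-free, and the column names, the `build_columns` and `write_results` helpers, and the sample row are all made-up placeholders; the real file's headers may differ.

```python
import csv
import io


def build_columns(providers):
    """Hypothetical column layout: one compatibility column per provider."""
    return (
        ["model_id"]
        + [f"{provider}_compatible" for provider in providers]
        + ["downloads", "likes", "tags", "task", "search_task", "search_tag",
           "pipeline_compatible", "library_name", "search_scenario"]
    )


def write_results(rows, providers, handle):
    """Write result rows as CSV; fields missing from a row are left empty."""
    writer = csv.DictWriter(handle, fieldnames=build_columns(providers), restval="")
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row for illustration only, not real Hub data.
rows = [{
    "model_id": "example-org/example-model",
    "gcp_compatible": True,
    "aws_compatible": False,
    "downloads": 0,
    "likes": 0,
    "tags": "finance;fintech",
    "task": "text-classification",
    "search_task": "text-classification",
    "search_tag": "finance",
    "search_scenario": "finance",
}]

buffer = io.StringIO()
write_results(rows, ["gcp", "aws"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```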

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request


pagezyhf/hub-model-search
