FreeAlphaRadar ingests financial, alternative and qualitative data from entirely zero-cost public feeds and produces a ranked, transparently scored list of high-potential, under-the-radar companies, ready for an investment committee discussion. Every score is fully drillable: raw value → normalised z-score → weighted contribution.
This project is for educational purposes only and is not intended for real trading or investment.
| Principle | What it means here |
|---|---|
| 100% free data | yfinance, SEC EDGAR, Yahoo Finance news: key-less and free (patents need a free, optional PatentsView or Lens key). |
| Zero configuration | No .env, no secrets, no registration. streamlit run and go. |
| Works fully offline | First launch seeds a SQLite cache with sample data; every fetcher falls back to cache on failure. |
| Totally transparent | A waterfall chart decomposes every company's score, factor by factor. |
| Zero-shot ML | Pre-trained FinBERT + PCA/K-Means clustering. Graceful lexicon fallback when offline. |
| Self-discovering | An optional offline job scans all ~8,000 SEC filers and regenerates the watchlist with the ranked top under-the-radar names. |
git clone https://github.com/chizkidd/freealpharadar.git
cd freealpharadar
pip install -r requirements.txt # slim runtime, uses lexicon sentiment
streamlit run streamlit_app.py

That's it. The app seeds a local sample cache on first launch, so the dashboard is populated immediately, even with no internet. Click Refresh Data & Re-score to pull live data from the free sources.
Optional: real FinBERT. The slim install uses a deterministic lexicon sentiment backend (fast, dependency-light). To enable the pre-trained FinBERT model and the XGBoost skeleton, also run `pip install -r requirements-ml.txt` (~3.9 GB). The app behaves identically either way; only the sentiment engine is swapped.
Customising the screening universe. The default tickers live in `universe.txt` (one or more per line, `#` comments allowed). Edit it to change the default screen with no code changes. By default it's a hand-curated "under-the-radar deep tech" list; you can also type any tickers into the sidebar to override it for a session, or opt into the auto-discovery job below, which regenerates `universe.txt` from a market-wide scan.
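The file format described above (several tickers per line, `#` comments) is simple enough to illustrate with a minimal sketch. This is not the project's loader: the function name `parse_universe`, the choice to accept both commas and whitespace as separators, the upper-casing and the de-duplication are all assumptions made for illustration, and the tickers are placeholders.

```python
from typing import List


def parse_universe(text: str) -> List[str]:
    """Parse universe.txt-style text into an ordered, de-duplicated ticker list.

    Hypothetical helper: separators and normalisation rules are assumptions.
    """
    tickers: List[str] = []
    seen = set()
    for line in text.splitlines():
        # Drop everything after '#', so full-line and trailing comments both work.
        line = line.split("#", 1)[0]
        # Accept commas or whitespace between tickers on the same line.
        for token in line.replace(",", " ").split():
            ticker = token.upper()
            if ticker not in seen:
                seen.add(ticker)
                tickers.append(ticker)
    return tickers


# Placeholder tickers, not real recommendations.
sample = """
# deep tech names
AAA BBB
ccc, AAA   # trailing comment
"""
print(parse_universe(sample))  # → ['AAA', 'BBB', 'CCC']
```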
make install # slim runtime deps (lexicon sentiment)
make install-ml # + FinBERT + XGBoost (optional, heavyweight)
make install-dev # runtime + pytest/black/isort
make install-warehouse # + duckdb/pyarrow for the discovery job
make run # launch the Streamlit dashboard
make scorer # batch scorer (warms the cache)
make seed # (re)seed the offline sample dataset
make warehouse # build the bulk SEC fundamentals store (needs network)
make discover # scan all filers → promote ranked top-15 (needs network)
make test # offline test-suite (no network)
make lint / make format # black + isort (check / apply)
make docker-up # run via docker compose
make clean # remove caches/build artefacts

Dependencies are split so the cloud install stays lean; each layer is additive.
| File | Installs | When |
|---|---|---|
| requirements.txt | streamlit, pandas, plotly, yfinance, scikit-learn… | always (the only file Streamlit Cloud installs) |
| requirements-ml.txt | transformers, torch, xgboost (~3.9 GB) | optional: real FinBERT + XGBoost locally |
| requirements-dev.txt | pytest, black, isort, pre-commit | tests / contributing |
| requirements-warehouse.txt | duckdb, pyarrow | the optional market-wide discovery job |
35 factors across four groups, each z-normalised within the current universe and combined via configurable sidebar sliders (equal-weight by default).
| Group | Example factors |
|---|---|
| Disruption & Moat | Patent growth rate, R&D intensity, founder-led flag, product moat (manual), culture (manual) |
| Growth & Momentum | 3-yr revenue CAGR, ~5y & ~10y revenue CAGR + full-cycle margin trend (from full SEC XBRL history), gross-margin expansion, price momentum, employee growth, customer-concentration risk |
| Valuation & Inefficiency | EV/Gross-Profit, P/E, P/B, short interest, institutional ownership, insider activity, Piotroski F-score |
| Qualitative Flags | News-headline sentiment (Yahoo + FinBERT), FinBERT controversy score, key-person dependency, regulatory-risk count |
Under-the-radar lens: factors like institutional ownership are oriented so that lower is more attractive. Low institutional ownership is the hallmark of a genuinely overlooked name with room to re-rate.
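The drill-down from raw value to z-score to weighted contribution can be sketched in a few lines. This is an illustrative sketch, not the project's engine: the function names, the use of the population standard deviation, the normalisation of weights to sum to one, and the two example factors and tickers are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, Set


def zscores(values: Dict[str, float]) -> Dict[str, float]:
    """Z-normalise one factor across the current universe."""
    data = list(values.values())
    mu, sigma = mean(data), pstdev(data)
    if sigma == 0:
        # Every company has the same value: the factor carries no signal.
        return {ticker: 0.0 for ticker in values}
    return {ticker: (v - mu) / sigma for ticker, v in values.items()}


def contributions(
    raw: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    lower_is_better: Set[str],
) -> Dict[str, Dict[str, float]]:
    """Return ticker -> {factor -> weighted contribution to the total score}."""
    total_weight = sum(weights.values())
    out: Dict[str, Dict[str, float]] = {}
    for factor, by_ticker in raw.items():
        # Flip the sign for factors where a lower raw value is more attractive.
        sign = -1.0 if factor in lower_is_better else 1.0
        for ticker, z in zscores(by_ticker).items():
            out.setdefault(ticker, {})[factor] = sign * z * weights[factor] / total_weight
    return out


# Placeholder universe with two factors and equal weights.
raw = {
    "revenue_cagr": {"AAA": 0.30, "BBB": 0.10, "CCC": 0.20},
    "institutional_ownership": {"AAA": 0.20, "BBB": 0.80, "CCC": 0.50},
}
weights = {"revenue_cagr": 1.0, "institutional_ownership": 1.0}
parts = contributions(raw, weights, lower_is_better={"institutional_ownership"})
scores = {ticker: sum(c.values()) for ticker, c in parts.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # → ['AAA', 'CCC', 'BBB']
```

Each entry in `parts["AAA"]` is one bar of a waterfall chart, and the bars sum to the company's total score.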
- Radar Screen: Plotly scatter of Score vs. Market Cap, coloured by sector, with hover company cards and live filters for sector, market-cap range and score threshold.
- Deep Dive: financial trajectory, SEC risk-factor excerpt with FinBERT sentiment highlighting, patent timeline + top assignees, a Yahoo Finance news feed with headline-sentiment bars, and a waterfall of the score breakdown.
- Watchlist: save/remove companies (SQLite), then Check for Changes to re-score and write a changelog to `watchlist_changes/<ticker>_<date>.txt`, with deltas shown inline.
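As a rough illustration of the Check for Changes step, the sketch below compares two score dictionaries and writes the deltas to a file named after the `watchlist_changes/<ticker>_<date>.txt` pattern. The helper names, the line format and the ISO date format are assumptions; the real changelog layout may differ.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Dict, List, Optional


def format_changes(old: Dict[str, float], new: Dict[str, float]) -> List[str]:
    """One line per factor whose value moved between two scoring runs."""
    lines = []
    for factor in sorted(set(old) | set(new)):
        before = old.get(factor, 0.0)
        after = new.get(factor, 0.0)
        delta = after - before
        if abs(delta) > 1e-9:
            lines.append(f"{factor}: {before:.2f} -> {after:.2f} ({delta:+.2f})")
    return lines


def write_changelog(
    ticker: str,
    old: Dict[str, float],
    new: Dict[str, float],
    out_dir: Path = Path("watchlist_changes"),
    today: Optional[date] = None,
) -> Path:
    """Write the deltas to <out_dir>/<ticker>_<date>.txt (ISO date assumed)."""
    today = today or date.today()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{ticker}_{today.isoformat()}.txt"
    path.write_text("\n".join(format_changes(old, new)) + "\n", encoding="utf-8")
    return path


# Demo in a temporary directory so nothing is left on disk.
old_scores = {"total": 0.50, "momentum": 0.10}
new_scores = {"total": 0.75, "momentum": 0.10}
with tempfile.TemporaryDirectory() as tmp:
    path = write_changelog(
        "AAA", old_scores, new_scores,
        out_dir=Path(tmp) / "watchlist_changes", today=date(2024, 1, 2),
    )
    written_name = path.name
    written_text = path.read_text(encoding="utf-8")
print(written_name)  # → AAA_2024-01-02.txt
```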
| Source | Library / endpoint | Used for |
|---|---|---|
| yfinance | yfinance | Prices, income/balance/cash-flow statements, short interest, ownership |
| SEC EDGAR | data.sec.gov JSON + EDGAR archives (no key) | Risk factors, MD&A, business description, Form 4 insider data, XBRL facts |
| PatentsView / Lens | search.patentsview.org or api.lens.org (free API key/token, optional) | Patent counts, assignees, titles over time |
| Yahoo Finance news | yfinance Ticker.news (no key) | Recent news headlines; sentiment scored by FinBERT/lexicon, plus volume |
| Manual CSV | optional upload | Employee/culture/product-moat signals; gracefully ignored if absent |
Everything is cached in SQLite with per-source TTLs (configurable via env vars)
so free-tier rate limits are respected and the app remains usable offline. For
resilience from datacenter/CI IPs, yfinance uses a browser-impersonating
curl_cffi session and falls back to Stooq for prices (with a SEC
shares × price market-cap approximation) when Yahoo returns nothing.
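The caching behaviour described above (per-source TTLs, with a fallback to the cache when a fetch fails) can be sketched with the standard-library `sqlite3` module. The class, table and function names here are hypothetical, and the real schema in `database.py` is not shown in this README.

```python
import json
import sqlite3
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Tiny SQLite-backed cache; schema and names are illustrative only."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "key TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
        )

    def get(self, key: str, max_age: Optional[float] = None) -> Optional[Any]:
        row = self.db.execute(
            "SELECT payload, fetched_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        payload, fetched_at = row
        if max_age is not None and time.time() - fetched_at > max_age:
            return None  # entry exists but is older than the TTL
        return json.loads(payload)

    def put(self, key: str, value: Any) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (key, json.dumps(value), time.time()),
        )
        self.db.commit()


def fetch_with_fallback(
    cache: TTLCache, key: str, fetcher: Callable[[], Any], ttl_seconds: float
) -> Optional[Any]:
    """Serve fresh cache hits, refetch when stale, fall back to stale data on error."""
    fresh = cache.get(key, max_age=ttl_seconds)
    if fresh is not None:
        return fresh
    try:
        value = fetcher()
    except Exception:
        # Network or rate-limit failure: a stale entry beats no data at all.
        return cache.get(key)
    cache.put(key, value)
    return value


def working_fetcher() -> dict:
    return {"price": 1}


def failing_fetcher() -> dict:
    raise RuntimeError("simulated network failure")


cache = TTLCache()
first = fetch_with_fallback(cache, "AAA:fundamentals", working_fetcher, ttl_seconds=60)
# A negative TTL forces the entry to count as stale, so the failing fetcher runs.
second = fetch_with_fallback(cache, "AAA:fundamentals", failing_fetcher, ttl_seconds=-1)
print(first == second)  # → True
```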
freealpharadar/
├── streamlit_app.py  # Streamlit entrypoint (stateless, cached)
├── run_scorer.py  # Batch scorer for cron / GitHub Actions
├── universe.txt  # Editable default screening universe
├── manual_upload_template.csv  # Optional manual-signals template
├── colab_setup.ipynb  # One-click Colab/Kaggle launcher
├── Dockerfile / docker-compose.yml
├── requirements.txt  # Pinned for reproducibility
├── freealpharadar/
│   ├── config.py  # Zero-config settings (no secrets)
│   ├── database.py  # SQLite cache + watchlist + scores
│   ├── pipeline.py  # Per-company data orchestration
│   ├── service.py  # ingest → enrich → score pipeline
│   ├── sample_data.py  # Deterministic offline sample data
│   ├── watchlist.py  # Re-scoring + changelog writing
│   ├── fetchers/  # yfinance, SEC, patents, Yahoo news, manual CSV
│   ├── scoring/  # factors, normalisation, engine
│   ├── ml/  # FinBERT, clustering, XGBoost skeleton, enrich
│   ├── ui/  # radar screen, deep dive, watchlist, sidebar
│   ├── warehouse/  # optional bulk SEC fundamentals (DuckDB/Parquet)
│   ├── discovery/  # optional market-wide screen → ranked top-N
│   └── utils/  # logging
├── discoveries/  # dated auto-discovery reports
├── tests/  # offline pytest suite (no network)
└── data/sample/  # canned sample dataset
- FinBERT (`ProsusAI/finbert`) classifies SEC risk-section and news headline sentiment into a per-company controversy score. If the model can't be downloaded (offline / constrained runtime), a deterministic finance lexicon backend takes over with an identical API.
- PCA + K-Means clusters companies on financial ratios so you can spot names that behave unlike their nominal peers.
- XGBoost breakout model is a thin, optional skeleton: supply a `ticker,label` CSV of historical breakouts to train a re-ranker; with no labels (the default) the system uses rule-based scoring with zero degradation.
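The "identical API" fallback for sentiment might look roughly like the sketch below. It is a sketch under stated assumptions: the word lists are toy placeholders, the score formula and the `controversy_score` aggregation (share of negative texts) are invented for illustration, and the helper names are hypothetical. The only third-party call is the `transformers` `pipeline("text-classification", ...)` factory, which is attempted only when `prefer_model=True`.

```python
from typing import Callable, Dict, List

# Toy word lists; a real finance lexicon would be far larger.
POSITIVE = {"growth", "record", "beat", "strong", "upgrade"}
NEGATIVE = {"lawsuit", "recall", "fraud", "decline", "downgrade", "probe"}

Backend = Callable[[List[str]], List[Dict[str, object]]]


def lexicon_sentiment(texts: List[str]) -> List[Dict[str, object]]:
    """Deterministic fallback returning pipeline-style {'label', 'score'} dicts."""
    results = []
    for text in texts:
        words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        if pos == neg:
            label = "neutral"
        else:
            label = "positive" if pos > neg else "negative"
        # Crude confidence: net hits divided by total hits (0.0 when balanced).
        score = abs(pos - neg) / max(pos + neg, 1)
        results.append({"label": label, "score": score})
    return results


def get_sentiment_backend(prefer_model: bool = True) -> Backend:
    """Return FinBERT if it can be loaded, otherwise the lexicon backend."""
    if prefer_model:
        try:
            from transformers import pipeline  # optional heavyweight dependency

            return pipeline("text-classification", model="ProsusAI/finbert")
        except Exception:
            pass  # not installed, offline, or out of memory: fall through
    return lexicon_sentiment


def controversy_score(texts: List[str], backend: Backend) -> float:
    """Assumed aggregation: the share of texts labelled negative."""
    results = backend(texts)
    if not results:
        return 0.0
    return sum(r["label"] == "negative" for r in results) / len(results)


headlines = [
    "Record growth beat estimates",
    "Regulator opens fraud probe",
    "Company holds annual meeting",
]
backend = get_sentiment_backend(prefer_model=False)  # force the offline path
labels = [r["label"] for r in backend(headlines)]
print(labels)  # → ['positive', 'negative', 'neutral']
```

Because both backends accept a list of strings and return a list of label/score dictionaries, the calling code does not need to know which one is active.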
There are zero secrets, and the only dependencies Streamlit Cloud installs
are the slim requirements.txt (the heavyweight ML stack stays in the optional
requirements-ml.txt), so the build is fast and reliable on the free tier.
- Go to share.streamlit.io → sign in with GitHub → Create app → Deploy a public app from GitHub.
- Repository: `chizkidd/freealpharadar` · Branch: `main` · Main file path: `streamlit_app.py`.
- In the App URL field, set the subdomain to `freealpharadar` so the public URL becomes `https://freealpharadar.streamlit.app`. (The subdomain must be globally unique; if it is taken, choose another. You can also change it later under App settings → General → Custom subdomain.)
- Click Deploy, and that's it. No Advanced-settings or Python-version changes are needed: `requirements.txt` floats its compiled dependencies to whatever prebuilt wheel matches the interpreter Streamlit Cloud picks (3.11–3.14), so the build never falls back to a slow/failing source compile. No secrets.
- The app paints instantly from the committed prewarm snapshot (or the deterministic sample if none), so the Radar Screen is populated on first load. News is fetched lazily per company when you open a Deep Dive.
Streamlit Cloud's filesystem is ephemeral: the SQLite cache and any watchlist changelogs reset when the container restarts. Sample data re-seeds automatically, so the app is always populated. For a persistent, pre-warmed cache use the scheduled refresh below.
docker compose up --build # http://localhost:8501

If you outgrow Streamlit Cloud's ~1 GB free tier (e.g. you want a larger universe or the FinBERT ML stack on by default), the same app runs unchanged on a Hugging Face Docker Space (~16 GB RAM, less aggressive sleeping):
- Create a Space → SDK: Docker → push this repo (it auto-builds the committed `Dockerfile`).
- Tell the Space which port the app serves by adding this to the Space's `README.md` YAML front-matter: `app_port: 8501`.
- (Optional) add `FAR_PATENTSVIEW_API_KEY` or `FAR_LENS_API_TOKEN` as Space secrets to enable the Patents tab. Everything else stays key-less.
.github/workflows/scheduler.yml runs daily (and on-demand via the Actions
tab) to keep the hosted app showing live data with no refresh wait:
- `run_scorer.py --no-ml --export-snapshot` fetches fresh data for the default universe and writes the warmed cache to a committed JSON snapshot, `data/prewarm_cache.json`.
- The workflow commits that snapshot to the deploy branch, which triggers a Streamlit Cloud redeploy.
- On boot the app calls `seed_from_snapshot()` (see `streamlit_app.py`) to load the snapshot into its SQLite cache, so the Radar Screen renders real, recent figures immediately. With no snapshot it falls back to the synthetic sample seed.
Why a JSON snapshot rather than the SQLite file? Streamlit Cloud's filesystem
is ephemeral (the DB resets on restart) and binary SQLite makes messy git
diffs, so the committed, diff-friendly JSON is the durable hand-off between the
scheduler and the app. No secrets are required for any of this. To pre-warm
locally on demand: python run_scorer.py --export-snapshot.
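The snapshot hand-off can be pictured with a small sketch: load a JSON file into a SQLite table, and fall back to sample data if the file is missing or unreadable. This is not the body of `seed_from_snapshot()`; the function name `seed_cache`, the flat key-to-payload snapshot layout and the table schema are assumptions made for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Any, Dict


def seed_cache(db: sqlite3.Connection, snapshot_path: Path, sample: Dict[str, Any]) -> str:
    """Load the JSON snapshot into SQLite, or the sample data if that fails.

    Returns which source was used: "snapshot" or "sample".
    """
    db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    try:
        entries = json.loads(snapshot_path.read_text(encoding="utf-8"))
        if not isinstance(entries, dict):
            raise ValueError("snapshot must be a JSON object")
        source = "snapshot"
    except (OSError, ValueError):
        # Missing file or malformed JSON: use the deterministic sample instead.
        entries, source = sample, "sample"
    db.executemany(
        "INSERT OR REPLACE INTO cache VALUES (?, ?)",
        [(key, json.dumps(value)) for key, value in entries.items()],
    )
    db.commit()
    return source


SAMPLE = {"AAA": {"score": 0.1}}  # placeholder sample data

with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "prewarm_cache.json"
    db = sqlite3.connect(":memory:")
    first = seed_cache(db, snapshot, SAMPLE)  # no snapshot file exists yet
    snapshot.write_text(json.dumps({"AAA": {"score": 1.5}}), encoding="utf-8")
    second = seed_cache(db, snapshot, SAMPLE)  # snapshot now overrides the sample
    row = db.execute("SELECT payload FROM cache WHERE key = 'AAA'").fetchone()
stored = json.loads(row[0])
print(first, second)  # → sample snapshot
```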
Beyond the curated list, FreeAlphaRadar can scan the entire market and pick its own watchlist: "from all ~8,000 SEC filers, the top-10 under-the-radar names, ranked best→worst." It's a two-stage offline funnel (never runs inside the live app) built on free SEC bulk data:
- Warehouse (`freealpharadar/warehouse/`): downloads SEC's free Financial Statement Data Sets (quarterly, 2009–present, every XBRL filer incl. delisted) into a gitignored DuckDB/Parquet store.
- Stage 1, bulk screen (`discovery/screen.py`): one DuckDB pass over all filers computing cheap fundamentals (revenue CAGR, margin trend, R&D intensity) with under-the-radar gates (small/mid revenue scale + sustained growth) → ~100-name shortlist.
- Stage 2, full scoring (`discovery/discover.py`): runs the existing 35-factor pipeline on the shortlist, applies a market-cap ceiling, and ranks → top-10.
- Promote: rewrites `universe.txt`, regenerates the prewarm snapshot, and writes `discoveries/<date>.md`. The live app then shows the self-discovered names.
pip install -r requirements.txt -r requirements-warehouse.txt # duckdb, pyarrow
make warehouse # build the store (needs network)
python -m freealpharadar.discovery run --top 10

It runs weekly via .github/workflows/discover.yml (or on-demand), committing
the refreshed universe.txt + snapshot + report, so the deployed app's list
updates itself. Heavyweight by design (multi-GB store, needs network), opt-in,
and entirely outside the app's hot path.
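The Stage 1 gates are easiest to see on a toy example. The project runs this screen as a DuckDB pass; the sketch below swaps in plain Python over in-memory lists purely to show the logic. The revenue bounds, the 15% growth threshold and the function names are hypothetical, and the CAGR formula is the standard (end / start)^(1 / years) - 1.

```python
from typing import Dict, List, Optional, Tuple


def cagr(start: float, end: float, years: int) -> Optional[float]:
    """Compound annual growth rate, or None when it is undefined."""
    if start <= 0 or end <= 0 or years <= 0:
        return None
    return (end / start) ** (1.0 / years) - 1.0


def stage1_screen(
    filers: Dict[str, List[float]],
    min_revenue: float = 10e6,   # hypothetical lower scale gate
    max_revenue: float = 2e9,    # hypothetical upper scale gate
    min_cagr: float = 0.15,      # hypothetical growth gate
    shortlist_size: int = 100,
) -> List[Tuple[str, float]]:
    """Keep small/mid-scale growers, ranked by revenue CAGR (highest first).

    `filers` maps a name to annual revenues ordered oldest to newest.
    """
    rows = []
    for name, revenues in filers.items():
        if len(revenues) < 2:
            continue  # not enough history to measure growth
        growth = cagr(revenues[0], revenues[-1], len(revenues) - 1)
        if growth is None:
            continue
        if min_revenue <= revenues[-1] <= max_revenue and growth >= min_cagr:
            rows.append((name, growth))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:shortlist_size]


# Placeholder filers: one small grower, one mega-cap, one flat company.
filers = {
    "SMALLGROW": [50e6, 70e6, 100e6, 140e6],
    "MEGACAP": [50e9, 60e9, 75e9, 90e9],
    "FLAT": [80e6, 82e6, 81e6, 83e6],
}
shortlist = stage1_screen(filers)
print([name for name, _ in shortlist])  # → ['SMALLGROW']
```

Only the shortlist would then go through the more expensive full scoring in Stage 2.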
make test # FAR_OFFLINE=1 pytest: no network, fully reproducible
make lint # black --check + isort --check
pre-commit install # black, isort, hygiene hooks on every commit

50+ tests run fully offline (FAR_OFFLINE=1, no network):
- `tests/test_scoring.py`: normalisation, all 35 factors (incl. the SEC long-horizon ones), the engine, the FinBERT lexicon fallback, the universe loader, and the end-to-end offline pipeline on canned data.
- `tests/test_warehouse.py`: the bulk loader, DuckDB store, Stage-1 screen gates and discovery promotion against a synthetic SEC ZIP fixture (auto-skipped if `duckdb`/`pyarrow` aren't installed).
There is nothing you must configure. A few knobs are exposed as environment variables for convenience:
| Variable | Default | Purpose |
|---|---|---|
| FAR_OFFLINE | false | Force cache/sample-only mode (no network). |
| FAR_DB_PATH | data/freealpharadar.sqlite | SQLite cache location. |
| FAR_TTL_FUNDAMENTALS / _SEC / _PATENTS / _NEWS | 1d / 1w / 1w / 12h | Per-source cache TTLs (seconds). |
| FAR_CONCURRENCY / FAR_MAX_RETRIES / FAR_HTTP_TIMEOUT | 5 / 4 / 30 | Async fetch concurrency, retries, per-request timeout. |
| FAR_FINBERT_MODEL | ProsusAI/finbert | HuggingFace sentiment model id. |
| FAR_SEC_USER_AGENT | research UA | Identifies you to SEC EDGAR (set a real contact to reduce throttling). |
| FAR_PATENTSVIEW_API_KEY | (unset) | Optional free PatentsView key; enables the Patents tab (lowest-friction for a US universe). For the GitHub Actions to include patents, add it as a repo Actions secret of the same name. |
| FAR_LENS_API_TOKEN | (unset) | Optional Lens.org token; alternative, global patent provider. The fetcher uses PatentsView if its key is set, else Lens, else skips patents. Set either to enable the Patents tab. |
| FAR_LOG_LEVEL | INFO | Logging verbosity. |
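A zero-config settings loader along these lines could read a few of the variables in the table with their documented defaults. The `Settings` dataclass, the `load_settings` name and the set of strings treated as "true" are assumptions for illustration; `config.py` itself is not shown here. The patent-provider precedence (PatentsView, then Lens, then none) follows the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    offline: bool
    db_path: str
    concurrency: int
    max_retries: int
    http_timeout: float
    patent_provider: Optional[str]


def _as_bool(raw: str) -> bool:
    # Assumed convention: these strings count as "true", everything else as "false".
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read FAR_* variables, falling back to the documented defaults."""
    if env.get("FAR_PATENTSVIEW_API_KEY"):
        provider: Optional[str] = "patentsview"
    elif env.get("FAR_LENS_API_TOKEN"):
        provider = "lens"
    else:
        provider = None  # no key set: patents are skipped
    return Settings(
        offline=_as_bool(env.get("FAR_OFFLINE", "false")),
        db_path=env.get("FAR_DB_PATH", "data/freealpharadar.sqlite"),
        concurrency=int(env.get("FAR_CONCURRENCY", "5")),
        max_retries=int(env.get("FAR_MAX_RETRIES", "4")),
        http_timeout=float(env.get("FAR_HTTP_TIMEOUT", "30")),
        patent_provider=provider,
    )


# Passing an explicit mapping keeps the demo independent of the real environment.
defaults = load_settings({})
custom = load_settings({"FAR_OFFLINE": "1", "FAR_LENS_API_TOKEN": "placeholder"})
print(defaults.concurrency, custom.offline, custom.patent_provider)  # → 5 True lens
```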
FreeAlphaRadar is a research and educational tool. The bundled sample data is synthetic and the live data is provided "as is" from third-party public sources. Nothing here is investment advice. Do your own due diligence.
MIT; see LICENSE.