Click above to explore the full interactive HTML report with all embedded visualizations
- View Complete Results
- Quick Start
- Executive Summary
- Dataset Description
- Analysis Architecture
- Methodology
- Visualizations & Results
- Technical Implementation
- Statistical Rigor
- Installation & Usage
- Reproducibility
- Future Research Directions
Click above to explore the full interactive HTML report with all embedded visualizations
Self-contained with base64-encoded images |
Works in any modern browser |
No external resources required |
Report Contents:
- Executive summary with key findings and metrics
- All 8 publication-quality visualizations embedded inline
- Detailed methodology documentation with mathematical formulas
- Statistical validation and hypothesis testing results
- Complete analytical framework documentation
- Responsive design optimized for all screen sizes
- Professional styling suitable for stakeholder presentations
Alternative Viewing Options:
- Download HTML Report for offline viewing
- Browse Image Directory for individual high-resolution PNG files (6000+ pixels wide, 300 DPI)
- Scroll Down to see all visualizations embedded in this README with detailed explanations
Each visualization includes detailed panel-by-panel explanations, statistical interpretations, and key insights documented below in the Visualizations & Results section.
This repository presents a production-level cryptocurrency market analysis that demonstrates mastery across multiple data science disciplines. The project intentionally moves beyond conventional exploratory analysis to showcase sophisticated techniques applicable to any time-series financial data, from clinical trials to market dynamics.
The analysis integrates four distinct methodological frameworks, each implemented with rigor and attention to statistical validity:
Statistical Inference Framework The foundation of this analysis rests on classical statistical methods, implemented with attention to assumptions and validity:
This framework establishes that cryptocurrency returns violate normality assumptions (excess kurtosis, non-zero skewness), necessitating robust methods throughout the analysis. Financial Econometrics Modern Portfolio Theory serves as the optimization backbone:
The optimization demonstrates that diversification reduces risk by 30-40% compared to individual assets, with Sharpe ratios improving from 0.2-0.8 (individual) to 1.2-1.5 (optimized portfolio). |
Machine Learning Implementation Unsupervised learning reveals patterns invisible to traditional analysis:
Clustering reveals 3-4 natural groups: low-risk stable assets, moderate-risk balanced assets, high-risk volatile assets, and unique outliers. Network Science Application Graph-theoretic methods uncover market interconnections:
Network density of 0.31 indicates moderate integration with diversification opportunities remaining, while hub nodes (Bitcoin, Ethereum) show systematic importance. |
Methodological Rigor Every analytical choice is justified and validated. Normality tests inform method selection. Cluster validation ensures meaningful groupings. Sensitivity analysis confirms robustness. This isn't exploratory analysis - it's a complete analytical framework with statistical validation at every step. |
Visual Communication Custom color palette designed for accessibility and professionalism. Multi-panel dashboards tell complete stories in single images. Three-tier information architecture serves 10-second, 60-second, and deep-dive audiences. Static visualizations for storytelling, interactive for exploration. |
Production Quality 300 DPI publication-ready images. Self-contained HTML reports with no dependencies. Fully documented, reproducible code with fixed random seeds. Complete methodology suitable for peer review. Professional deliverables ready for stakeholders. |
This dataset contains hourly cryptocurrency and stock market data collected from CoinGecko One of the world's largest independent cryptocurrency data aggregators |
||||
|
|
|
|
|
Primary Source: This dataset contains hourly cryptocurrency and stock market data collected from CoinGecko, a leading cryptocurrency data aggregator providing comprehensive market information across thousands of digital assets.
Distribution Platform: Kaggle Cryptocurrency Historical Prices Dataset
Data Collection Method: Automated API retrieval from CoinGecko's public endpoints at hourly intervals
Structure: Time series panel data with hourly frequency across multiple cryptocurrency assets
Coverage: 111,746 observations representing continuous hourly measurements
Format: CSV (Comma-Separated Values) with UTF-8 encoding
About CoinGecko: CoinGecko is one of the world's largest independent cryptocurrency data aggregators, tracking over 10,000+ cryptocurrencies across 600+ exchanges. The platform provides:
- Real-time price data
- Trading volume metrics
- Market capitalization calculations
- Historical price archives
- API access for programmatic data retrieval
This dataset leverages CoinGecko's robust data infrastructure, ensuring high-quality, verified market information suitable for rigorous statistical analysis and machine learning applications.
The hourly frequency strikes an optimal balance between:
- High-Frequency Analysis: Captures intraday volatility and momentum effects
- Pattern Detection: Identifies hourly trading patterns and volume cycles
- Manageable Volume: 111K observations suitable for comprehensive statistical analysis
- Long-Term Trends: Multi-year span enables trend analysis and regime detection
This granularity enables both micro-level pattern recognition (intraday effects) and macro-level trend analysis (long-term market dynamics).
Column | Data Type | Description | Analytical Purpose |
---|---|---|---|
timestamp |
DateTime | Hourly observation timestamp in UTC timezone | Time series indexing, temporal pattern analysis, seasonality detection, trend identification |
name |
String | Cryptocurrency full name (e.g., Bitcoin, Ethereum) | Asset identification, categorical grouping, panel data structure, cross-asset comparison |
symbol |
String | Standard trading symbol (e.g., BTC, ETH) | Concise asset labeling in visualizations, lookup key for external data integration |
price_usd |
Float | Spot price in US Dollars at observation time | Primary variable for return calculation, trend analysis, volatility measurement, portfolio valuation |
vol_24h |
Float | 24-hour rolling trading volume in USD | Liquidity assessment, volume-price divergence analysis, market activity indicator |
total_vol |
Float | Cumulative volume traded to date | Market maturity indicator, adoption tracking, cumulative liquidity measure |
chg_24h |
Float | 24-hour percentage price change | Short-term momentum indicator, volatility proxy, mean reversion analysis |
chg_7d |
Float | 7-day percentage price change | Medium-term trend indicator, weekly momentum, correlation with market cycles |
market_cap |
Float | Market capitalization in USD (price × circulating supply) | Asset size classification, systematic importance, portfolio weighting, market dominance tracking |
Initial Assessment:
- Missing values: Identified via
.isnull()
analysis, patterns documented - Outliers: Multi-method detection (IQR, Z-score, Isolation Forest) with conservative thresholds
- Data types: All fields validated and converted from string representations to appropriate numeric types
- Temporal consistency: Verified continuous hourly sequence with gap detection
Transformation Pipeline:
1. Type Conversion & Cleaning # Remove currency formatting
price_usd = remove_symbols(['
---
## Analysis Architecture
### Phase 1: Data Preparation & Feature Engineering
**Data Loading & Validation**
- Automated schema validation and type checking
- Missing value detection and quantification
- Temporal consistency verification
- Duplicate observation identification and removal
**Feature Engineering Pipeline** Raw Data → Type Conversion → Missing Value Handling → Feature Generation → Validation
W = [Σ a(i) × x(i)]² / [Σ (x(i) - x̄)²]
Y(t) = T(t) × S(t) × R(t)
SMA(t, n) = (1/n) Σ(i=0 to n-1) P(t-i)
WCSS = Σ(k=1 to K) Σ(x∈C(k)) ||x - μ(k)||²
s(i) = [b(i) - a(i)] / max{a(i), b(i)}
z = (x - μ) / σ
VE(j) = λ(j) / Σ λ(i)
Loading(i,j) = corr(X(i), PC(j)) × √λ(j)
s(x, n) = 2^(-E[h(x)] / c(n))
E(R(p)) = Σ w(i) × E(R(i))
σ²(p) = Σ Σ w(i) × w(j) × σ(i,j)
max SR = [E(R(p)) - R(f)] / σ(p) subject to: Σ w(i) = 1 (fully invested) w(i) ≥ 0 (no short selling)
SR = (R̄ - R(f)) / σ
SR(annual) = SR(hourly) × √(24 × 365)
DD(t) = [P(t) - P(max)] / P(max) MDD = min(DD(t))
VaR(α) = -Quantile(Returns, α)
A(i,j) = 1 if |ρ(i,j)| > threshold A(i,j) = 0 otherwise
C(d)(i) = deg(i) / (n - 1)
C(b)(i) = Σ(s≠i≠t) [σ(st)(i) / σ(st)]
C(c)(i) = (n - 1) / Σ(j≠i) d(i,j)
Cumulative Return(t) = ∏[1 + R(i)] for i=1 to t
Drawdown(t) = [Price(t) - RunningMax(t)] / RunningMax(t)
Sharpe Ratio = (Return - Risk_Free_Rate) / Volatility
CV = (Standard Deviation / Mean) × 100%
Normalized(t) = [Price(t) / Price(0)] × 100
WCSS(k) = Σ(clusters) Σ(points in cluster) distance²(point, centroid)
Silhouette(i) = [b(i) - a(i)] / max(a(i), b(i))
Loading(feature, PC) = Correlation(feature, PC) × √λ
C_d(i) = neighbors(i) / (n - 1)
C_b(i) = Σ [shortest_paths_through(i) / total_shortest_paths]
C_c(i) = (n - 1) / Σ distance(i, j)
ρ(t, window) = Correlation[R1(t-window:t), R2(t-window:t)]
Σ w(i) = 1 (fully invested) w(i) ≥ 0 (no short selling)
Z(t) = [R(t) - μ] / σ
Score ∈ [-0.5, 0.5] Lower scores = more anomalous
Cumulative(t) = Σ(τ=0 to t) Anomaly(τ)
Raw CSV → Validation → Type Conversion → Feature Engineering → Analysis-Ready DataFrame
Rationale:
Dark Theme Implementation Background: #0F172A (dark slate)
Axes: #1E293B (medium slate)
Grid: #334155 (light slate, low alpha)
Text: #F1F5F9 (off-white) Benefits:
Resolution & Export
Plotly Implementation Five interactive visualizations providing dynamic exploration:
Technical Details Layout configuration:
- Template: 'plotly_dark' (matches color scheme)
- Hovermode: 'closest' or 'x unified'
- Font family: 'Segoe UI' (readable)
- Autosize: True (responsive)
- Margin: dict(l=60, r=60, t=80, b=60) Normality Tests
Results: All assets reject normality (p < 0.001), justifying:
Clustering Validation
PCA Validation
Anomaly Detection Validation
Sensitivity Analysis
Stability Analysis
Assumption Checking
Hardware
Software
1. Clone Repository git clone https://github.com/Cazzy-Aporbo/Crypto-Analysis.git
cd Crypto-Analysis 2. Create Virtual Environment (Recommended) # Create virtual environment
python -m venv crypto_env
# Activate (macOS/Linux)
source crypto_env/bin/activate
# Activate (Windows)
crypto_env\Scripts\activate 3. Install Dependencies pip install --upgrade pip
pip install -r requirements.txt 4. Verify Installation python -c "import pandas, numpy, matplotlib, sklearn; print('Installation successful')" Start Jupyter Notebook jupyter notebook Open Analysis Notebook
Expected Runtime
Total execution time: Approximately 10-15 minutes Update File Paths Modify in Step 2 of notebook: # Data location
file_path = '/your/path/to/cryptocurrency.csv'
# Output directory
output_dir = '/your/path/to/output_images' Customize Analysis Parameters # Top N cryptocurrencies to analyze
N_CRYPTOS = 10
# Clustering parameters
K_CLUSTERS = 4
# PCA components
N_COMPONENTS = 3
# Anomaly detection contamination
CONTAMINATION = 0.05
# Correlation network threshold
CORR_THRESHOLD = 0.5 Directory Structure After Execution
File Specifications
Viewing Results Option 1: View HTML Report (Recommended) # Open comprehensive HTML report with all visualizations
open output_images/Cryptocurrency_Analysis_Report.html The HTML report contains all visualizations, executive summary, methodology, and key findings in a professionally formatted document. Option 2: View Individual Images # Navigate to output directory
cd output_images
# View specific visualization
open 01_market_overview_dashboard.png Option 3: Interactive Plotly Visualizations Run the Jupyter notebook to view 5 interactive Plotly charts:
These display directly in the notebook and support zoom, pan, hover, and data exploration. All stochastic processes use fixed random seeds for reproducibility: np.random.seed(42)
random_state = 42 # in scikit-learn functions Stochastic Components:
Dependencies specified with version constraints:
Ensures:
MD5 checksum for data validation: md5sum cryptocurrency.csv
# Expected: [insert MD5 hash] Export complete environment: # Conda
conda env export > environment.yml
# pip
pip freeze > requirements.txt Document system specifications: import sys, platform
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"Processor: {platform.processor()}") Time Series Forecasting
Machine Learning Classification
Sentiment Analysis
On-Chain Metrics
Value at Risk (VaR)
Stress Testing
Streaming Data Pipeline
Dashboard Development
Granger Causality Testing
Structural Break Detection
Dynamic Rebalancing
Risk Budgeting
All analysis is complete. All visualizations and reports are available in the repository.
Note: This repository contains completed analysis with all results. You can view all visualizations immediately without running any code. The Jupyter notebook is provided for reproducibility and customization. If you use this work in academic or professional contexts, please cite: @misc{aporbo2025crypto,
author = {Aporbo, Cazandra},
title = {Advanced Cryptocurrency Market Analysis: A Comprehensive Data Science Approach},
year = {2025},
publisher = {GitHub},
url = {https://github.com/Cazzy-Aporbo/Crypto-Analysis}
} Data Source Attribution: @misc{coingecko2025,
author = {{CoinGecko}},
title = {CoinGecko Cryptocurrency Data API},
year = {2025},
url = {https://www.coingecko.com/}
} Dataset Citation: This analysis uses hourly cryptocurrency market data collected from CoinGecko and distributed via Kaggle. CoinGecko is a leading cryptocurrency data aggregator providing comprehensive market information across thousands of digital assets. This project is licensed under the MIT License - see the LICENSE file for details. MIT License Summary:
Cazandra Aporbo GitHub: @Cazzy-Aporbo For questions, suggestions, or collaboration opportunities:
Data Source
Methodological Foundations
Open Source Community
Academic Influences
Investment Risk Warning This analysis is provided for educational and research purposes only. It does not constitute financial advice, investment recommendations, or professional guidance of any kind. Key Points:
Limitations:
Use at your own risk. The author assumes no liability for losses incurred from any use of this analysis or its methodologies. To enable the direct HTML link above, you need to activate GitHub Pages: One-Time Setup (Takes 30 seconds):
That's it! After 1-2 minutes, your HTML report will be live at:
Note: Replace the link in the README with your actual GitHub Pages URL once activated. Document Version: 1.0 What's Included:
This repository demonstrates advanced data science capabilities through rigorous analysis of cryptocurrency markets. All methodologies are fully documented, reproducible, and suitable for academic or professional portfolio presentation., ',']) price_usd = pd.to_numeric(errors='coerce') assert (price_usd > 0).all()
Raw Data → Type Conversion → Missing Value Handling → Feature Generation → Validation
W = [Σ a(i) × x(i)]² / [Σ (x(i) - x̄)²]
Y(t) = T(t) × S(t) × R(t)
SMA(t, n) = (1/n) Σ(i=0 to n-1) P(t-i)
WCSS = Σ(k=1 to K) Σ(x∈C(k)) ||x - μ(k)||²
s(i) = [b(i) - a(i)] / max{a(i), b(i)}
z = (x - μ) / σ
VE(j) = λ(j) / Σ λ(i)
Loading(i,j) = corr(X(i), PC(j)) × √λ(j)
s(x, n) = 2^(-E[h(x)] / c(n))
E(R(p)) = Σ w(i) × E(R(i))
σ²(p) = Σ Σ w(i) × w(j) × σ(i,j)
max SR = [E(R(p)) - R(f)] / σ(p) subject to: Σ w(i) = 1 (fully invested) w(i) ≥ 0 (no short selling)
SR = (R̄ - R(f)) / σ
SR(annual) = SR(hourly) × √(24 × 365)
DD(t) = [P(t) - P(max)] / P(max) MDD = min(DD(t))
VaR(α) = -Quantile(Returns, α)
A(i,j) = 1 if |ρ(i,j)| > threshold A(i,j) = 0 otherwise
C(d)(i) = deg(i) / (n - 1)
C(b)(i) = Σ(s≠i≠t) [σ(st)(i) / σ(st)]
C(c)(i) = (n - 1) / Σ(j≠i) d(i,j)
Y(t) = Trend(t) × Seasonal(t) × Residual(t)
Cumulative Return(t) = ∏[1 + R(i)] for i=1 to t
Drawdown(t) = [Price(t) - RunningMax(t)] / RunningMax(t)
Sharpe Ratio = (Return - Risk_Free_Rate) / Volatility
CV = (Standard Deviation / Mean) × 100%
Normalized(t) = [Price(t) / Price(0)] × 100
WCSS(k) = Σ(clusters) Σ(points in cluster) distance²(point, centroid)
Silhouette(i) = [b(i) - a(i)] / max(a(i), b(i))
Loading(feature, PC) = Correlation(feature, PC) × √λ
C_d(i) = neighbors(i) / (n - 1)
C_b(i) = Σ [shortest_paths_through(i) / total_shortest_paths]
C_c(i) = (n - 1) / Σ distance(i, j)
ρ(t, window) = Correlation[R1(t-window:t), R2(t-window:t)]
Σ w(i) = 1 (fully invested) w(i) ≥ 0 (no short selling)
Z(t) = [R(t) - μ] / σ
Score ∈ [-0.5, 0.5] Lower scores = more anomalous
Cumulative(t) = Σ(τ=0 to t) Anomaly(τ)
Raw CSV → Validation → Type Conversion → Feature Engineering → Analysis-Ready DataFrame
Rationale:
Dark Theme Implementation Background: #0F172A (dark slate)
Axes: #1E293B (medium slate)
Grid: #334155 (light slate, low alpha)
Text: #F1F5F9 (off-white) Benefits:
Resolution & Export
Plotly Implementation Five interactive visualizations providing dynamic exploration:
Technical Details Layout configuration:
- Template: 'plotly_dark' (matches color scheme)
- Hovermode: 'closest' or 'x unified'
- Font family: 'Segoe UI' (readable)
- Autosize: True (responsive)
- Margin: dict(l=60, r=60, t=80, b=60) Normality Tests
Results: All assets reject normality (p < 0.001), justifying:
Clustering Validation
PCA Validation
Anomaly Detection Validation
Sensitivity Analysis
Stability Analysis
Assumption Checking
Hardware
Software
1. Clone Repository git clone https://github.com/Cazzy-Aporbo/Crypto-Analysis.git
cd Crypto-Analysis 2. Create Virtual Environment (Recommended) # Create virtual environment
python -m venv crypto_env
# Activate (macOS/Linux)
source crypto_env/bin/activate
# Activate (Windows)
crypto_env\Scripts\activate 3. Install Dependencies pip install --upgrade pip
pip install -r requirements.txt 4. Verify Installation python -c "import pandas, numpy, matplotlib, sklearn; print('Installation successful')" Start Jupyter Notebook jupyter notebook Open Analysis Notebook
Expected Runtime
Total execution time: Approximately 10-15 minutes Update File Paths Modify in Step 2 of notebook: # Data location
file_path = '/your/path/to/cryptocurrency.csv'
# Output directory
output_dir = '/your/path/to/output_images' Customize Analysis Parameters # Top N cryptocurrencies to analyze
N_CRYPTOS = 10
# Clustering parameters
K_CLUSTERS = 4
# PCA components
N_COMPONENTS = 3
# Anomaly detection contamination
CONTAMINATION = 0.05
# Correlation network threshold
CORR_THRESHOLD = 0.5 Directory Structure After Execution
File Specifications
Viewing Results Option 1: View HTML Report (Recommended) # Open comprehensive HTML report with all visualizations
open output_images/Cryptocurrency_Analysis_Report.html The HTML report contains all visualizations, executive summary, methodology, and key findings in a professionally formatted document. Option 2: View Individual Images # Navigate to output directory
cd output_images
# View specific visualization
open 01_market_overview_dashboard.png Option 3: Interactive Plotly Visualizations Run the Jupyter notebook to view 5 interactive Plotly charts:
These display directly in the notebook and support zoom, pan, hover, and data exploration. All stochastic processes use fixed random seeds for reproducibility: np.random.seed(42)
random_state = 42 # in scikit-learn functions Stochastic Components:
Dependencies specified with version constraints:
Ensures:
MD5 checksum for data validation: md5sum cryptocurrency.csv
# Expected: [insert MD5 hash] Export complete environment: # Conda
conda env export > environment.yml
# pip
pip freeze > requirements.txt Document system specifications: import sys, platform
print(f"Python version: {sys.version}")
print(f"Platform: {platform.platform()}")
print(f"Processor: {platform.processor()}") Time Series Forecasting
Machine Learning Classification
Sentiment Analysis
On-Chain Metrics
Value at Risk (VaR)
Stress Testing
Streaming Data Pipeline
Dashboard Development
Granger Causality Testing
Structural Break Detection
Dynamic Rebalancing
Risk Budgeting
All analysis is complete. All visualizations and reports are available in the repository.
Note: This repository contains completed analysis with all results. You can view all visualizations immediately without running any code. The Jupyter notebook is provided for reproducibility and customization. If you use this work in academic or professional contexts, please cite: @misc{aporbo2025crypto,
author = {Aporbo, Cazandra},
title = {Advanced Cryptocurrency Market Analysis: A Comprehensive Data Science Approach},
year = {2025},
publisher = {GitHub},
url = {https://github.com/Cazzy-Aporbo/Crypto-Analysis}
} This project is licensed under the MIT License - see the LICENSE file for details. MIT License Summary:
Cazandra Aporbo GitHub: @Cazzy-Aporbo For questions, suggestions, or collaboration opportunities:
Data Source
Methodological Foundations
Open Source Community
Academic Influences
Investment Risk Warning This analysis is provided for educational and research purposes only. It does not constitute financial advice, investment recommendations, or professional guidance of any kind. Key Points:
Limitations:
Use at your own risk. The author assumes no liability for losses incurred from any use of this analysis or its methodologies. To enable the direct HTML link above, you need to activate GitHub Pages: One-Time Setup (Takes 30 seconds):
That's it! After 1-2 minutes, your HTML report will be live at:
Note: Replace the link in the README with your actual GitHub Pages URL once activated. Document Version: 1.0 What's Included:
This repository demonstrates advanced data science capabilities through rigorous analysis of cryptocurrency markets. All methodologies are fully documented, reproducible, and suitable for academic or professional portfolio presentation. |