chenjingen-jane/gsoc-prep

🚀 GSoC 2026: Multi-Track Preparation Journey

Welcome! This repository documents my technical evolution as I prepare for GSoC 2026. I am currently focused on Cybersecurity, with parallel tracks in Big Data and AI.

🛤️ Strategy & Tracks

🛡️ Track A: Cyber Security (Active)

  • Target: OWASP Python Honeypot
  • Status: Analyzing core architecture, environment constraints, and log data for anomalies.

📊 Track B: Big Data & Infra (Active)

  • Target: Distributed systems and high-throughput data processing
  • Status: Explored Kafka message acknowledgments and log integrity concepts.

🤖 Track C: AI & LLM Integration (Active)

  • Goal: AI-driven log anomaly detection and analysis using Isolation Forest.

📈 Consolidated Progress Dashboard

| Date | Track | Focus Topic | Key Accomplishments | Detailed Log |
| --- | --- | --- | --- | --- |
| Dec 22 | 🛠️ Foundation | Environment & Git | Mastered Linux CLI, Git Rebase, and PDB theory. | Day 1 |
| Dec 23 | 🛡️ Security | Codebase Tracing | Verified ohp.py entry point; Traced startup flow via PDB; Identified dependency constraints. | Day 2 |
| Jan 1 | 🛡️/📊/🤖 Multi-Track | Security, Big Data & AI | ✅ Docker handled Windows environment limitations; ✅ Learned Kafka ACK importance for log reliability; ✅ Fetched honeypot logs and applied Isolation Forest; ✅ Handled KeyError and empty DataFrame issues; ✅ Generated screenshots for logs and anomalies. | Day 3 |

🛠️ Technical Insights (Key Milestones)

🛡️ Security Track

Jan 1, 2026

  • Docker Advantage 🐳: Running the honeypot in Docker avoided the dependency and version conflicts that broke the native Windows setup.
  • Observation 👀: Running the honeypot without Docker triggered ModuleNotFoundError and Flask/Werkzeug conflicts.
  • Resolution ✅: Recognized that missing external services (Elasticsearch) are environment constraints, not bugs.

Dec 23–24, 2025 (OWASP Python Honeypot – Reference Work)

  • Dynamic Tracing 🕵️‍♀️: Moved from static code reading to live execution tracing using PDB.
  • Logic Verification 🔍: Identified that the exit on a failed Elasticsearch connection is a deliberate sys.exit(1), not a random crash.
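
The fail-fast behaviour described above can be illustrated with a small sketch. This is not the honeypot's actual code: the function names, the default host and port (127.0.0.1:9200), and the plain TCP probe are assumptions chosen so the example runs with the standard library only.

```python
import socket
import sys


def elasticsearch_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def require_elasticsearch(host: str = "127.0.0.1", port: int = 9200) -> None:
    """Exit with status 1 when the (assumed) Elasticsearch endpoint is down."""
    if not elasticsearch_reachable(host, port):
        print(f"Elasticsearch is not reachable at {host}:{port}", file=sys.stderr)
        sys.exit(1)
```

Calling `require_elasticsearch()` at startup makes a missing service show up as a clear exit code rather than a stack trace later on, which is why such an exit reads as a design choice when traced in PDB.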

📊 Big Data Track

Jan 1, 2026

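  • Kafka ACK Mechanism 📬: Learned that producer acknowledgments (the acks setting) control when a send counts as successful; with acks=all, the broker confirms only after all in-sync replicas have the message, which guards against lost log messages.
  • Impact 💡: Understanding ACK guarantees helps maintain log integrity when feeding data into AI models.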
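
As a rough illustration of the acknowledgment setting, the sketch below builds producer options for the third-party kafka-python package and waits for the broker's acknowledgment of one log line. The helper names, the broker address, the retry count, and the timeout are assumptions for this example, not part of the repository.

```python
def reliable_producer_config(bootstrap_servers: str = "localhost:9092") -> dict:
    """Keyword arguments for kafka-python's KafkaProducer, favouring durability."""
    return {
        "bootstrap_servers": bootstrap_servers,  # assumed broker address
        "acks": "all",  # wait until all in-sync replicas have the record
        "retries": 5,   # retry transient send failures (assumed value)
    }


def send_log_line(topic: str, line: str, config: dict, timeout: float = 10.0):
    """Send one log line and block until the broker acknowledges it."""
    # Imported lazily so the config helper works without the package installed.
    from kafka import KafkaProducer  # third-party: pip install kafka-python

    producer = KafkaProducer(**config)
    try:
        future = producer.send(topic, line.encode("utf-8"))
        # Raises if the send fails or no acknowledgment arrives in time.
        return future.get(timeout=timeout)
    finally:
        producer.close()
```

Blocking on every send trades throughput for certainty; a higher-volume pipeline would more likely attach callbacks to the returned future instead.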

🤖 AI Track

Jan 1, 2026

  • Data Pipeline 🗂️: Fetched honeypot logs from Elasticsearch into a pandas DataFrame.

  • Feature Engineering 🔧:

    • Converted event to numeric labels.
    • Converted timestamp to numeric for modeling.
  • Anomaly Detection 🕵️‍♂️: Applied Isolation Forest on numeric features to flag anomalies.

  • Bugs & Solutions 🐞:

    • KeyError: 'event_type' → column did not exist, switched to event.
    • Empty DataFrame initially → added a synthetic attack_log to test workflow.
  • Visualization 📸: Screenshots captured:

    • day3_df_preview.png: preview of loaded logs.
    • day3_anomaly.png: rows flagged as anomalies.
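
The AI-track steps above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: `hits_to_frame`, `flag_anomalies`, the `event_code` and `ts_seconds` columns, and the contamination value are hypothetical, and it assumes each log carries `event` and `timestamp` fields and that pandas and scikit-learn are installed. The guard at the top turns the empty-DataFrame and missing-column cases into one explicit error instead of a later KeyError.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def hits_to_frame(response: dict) -> pd.DataFrame:
    """Flatten an Elasticsearch search response into one row per hit."""
    hits = response.get("hits", {}).get("hits", [])
    return pd.DataFrame([hit.get("_source", {}) for hit in hits])


def flag_anomalies(df: pd.DataFrame, contamination: float = 0.1) -> pd.DataFrame:
    """Add numeric features and an `anomaly` column (-1 = anomaly, 1 = normal)."""
    missing = {"event", "timestamp"} - set(df.columns)
    if df.empty or missing:
        raise ValueError(f"no usable rows or missing columns: {sorted(missing)}")

    out = df.copy()
    # Map each distinct event name to an integer code.
    out["event_code"] = out["event"].astype("category").cat.codes
    # Convert timestamps to seconds since the Unix epoch.
    ts = pd.to_datetime(out["timestamp"], utc=True)
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    out["ts_seconds"] = (ts - epoch).dt.total_seconds()

    # Fixed random_state keeps the run reproducible.
    model = IsolationForest(contamination=contamination, random_state=42)
    out["anomaly"] = model.fit_predict(out[["event_code", "ts_seconds"]])
    return out
```

With a search response in hand, `flag_anomalies(hits_to_frame(response))` returns the original rows plus an `anomaly` column in which -1 marks the rows the model treats as outliers.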

📬 Contact & Open Source Profile

  • Current Status: Multi-track progress on OWASP Honeypot logs, Kafka concepts, and AI anomaly detection.
  • GitHub: chenjingen-jane

"Reading code is like reading a map; debugging is like walking the path."
