Welcome! This repository documents my technical evolution as I prepare for GSoC 2026. I am currently focused on Cybersecurity, with parallel tracks in Big Data and AI.
- Target: OWASP Python Honeypot
- Status: Analyzing core architecture, environment constraints, and log data for anomalies.
- Target: Distributed systems and high-throughput data processing
- Status: Explored Kafka message acknowledgments and log integrity concepts.
- Goal: AI-driven log anomaly detection and analysis using Isolation Forest.
| Date | Track | Focus Topic | Key Accomplishments | Detailed Log |
|---|---|---|---|---|
| Dec 22 | Foundation | Environment & Git | Mastered Linux CLI, Git Rebase, and PDB theory. | Day 1 |
| Dec 23 | Security | Codebase Tracing | Verified `ohp.py` entry point; traced startup flow via PDB; identified dependency constraints. | Day 2 |
| Jan 1 | Multi-Track | Security, Big Data & AI | Docker handled Windows environment limitations; learned Kafka ACK importance for log reliability; fetched honeypot logs and applied Isolation Forest; handled `KeyError` and empty DataFrame issues; generated screenshots for logs and anomalies. | Day 3 |
Jan 1, 2026
- Docker Advantage: Running in Docker resolved the dependency and version issues that broke the native Windows setup.
- Observation: Running the honeypot without Docker triggered `ModuleNotFoundError` and Flask/Werkzeug conflicts.
- Resolution: Recognized that missing external services (Elasticsearch) are environment constraints, not bugs.
Dec 23-24, 2025 (OWASP Python Honeypot: Reference Work)
- Dynamic Tracing: Moved from static code reading to live execution tracing using `PDB`.
- Logic Verification: Identified that the `elasticsearch` connection failure is an intentional `sys.exit(1)` design, not a random crash.
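
To illustrate the fail-fast behavior noted above, here is a minimal sketch of a startup check that exits with status 1 when a required service is unavailable. It is not the honeypot's actual code: `require_service` and the `ping` callable are hypothetical names standing in for a real client health check.

```python
import sys


def require_service(ping, name):
    """Exit with status 1 if a required service cannot be reached.

    `ping` is any zero-argument callable that returns True when the
    service responds (a hypothetical stand-in for a real client check).
    """
    try:
        reachable = bool(ping())
    except Exception as exc:  # connection errors differ between client libraries
        print(f"{name} check raised {exc!r}", file=sys.stderr)
        reachable = False
    if not reachable:
        print(f"{name} is not reachable; exiting.", file=sys.stderr)
        sys.exit(1)  # deliberate fail-fast; raises SystemExit(1)


# Demonstrate the exit path without terminating the interpreter.
try:
    require_service(lambda: False, "elasticsearch")
except SystemExit as exc:
    print("exit code:", exc.code)
```

Under PDB, stepping into a check like this makes it clear that the process ends through an explicit `sys.exit(1)` call rather than an unhandled exception.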
Jan 1, 2026
- Kafka ACK Mechanism: Learned that producer acknowledgments (ACKs) confirm that the broker has stored a log message, so messages are not silently lost.
- Impact: Understanding ACK guarantees helps maintain log integrity when feeding data into AI models.
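
The idea behind acknowledgments can be shown with a toy in-memory simulation. This is not the Kafka client API: `ToyBroker`, `send_with_ack`, and `send_fire_and_forget` are hypothetical names, and the "broker" simply drops messages at random to mimic an unreliable link.

```python
import random


class ToyBroker:
    """In-memory stand-in for a broker; NOT the Kafka API."""

    def __init__(self, drop_rate, seed=0):
        self.log = []
        self._rng = random.Random(seed)
        self._drop_rate = drop_rate

    def append(self, message):
        """Store the message and return True (ack), or drop it and return False."""
        if self._rng.random() < self._drop_rate:
            return False
        self.log.append(message)
        return True


def send_with_ack(broker, message, max_retries=10):
    """Resend until the broker acknowledges or the retries run out."""
    for _ in range(max_retries + 1):
        if broker.append(message):
            return True
    return False


def send_fire_and_forget(broker, message):
    """Send once and ignore the result, so losses go unnoticed."""
    broker.append(message)


events = [f"event-{i}" for i in range(100)]

lossy = ToyBroker(drop_rate=0.3, seed=1)
for event in events:
    send_fire_and_forget(lossy, event)

reliable = ToyBroker(drop_rate=0.3, seed=1)
delivered = [send_with_ack(reliable, event) for event in events]

print("stored without acks:", len(lossy.log))
print("stored with acks and retries:", len(reliable.log))
```

In this toy model a dropped message is never stored, so retries cannot create duplicates. A real system also has to handle the case where the message was stored but the acknowledgment itself was lost.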
Jan 1, 2026
- Data Pipeline: Fetched honeypot logs from Elasticsearch into a pandas DataFrame.
- Feature Engineering:
  - Converted `event` to numeric labels.
  - Converted `timestamp` to numeric for modeling.
- Anomaly Detection: Applied Isolation Forest on numeric features to flag anomalies.
- Bugs & Solutions:
  - `KeyError: 'event_type'`: column did not exist, switched to `event`.
  - Empty DataFrame initially: added a synthetic `attack_log` to test workflow.
- Visualization: Screenshots captured:
  - `day3_df_preview.png`: preview of loaded logs.
  - `day3_anomaly.png`: rows flagged as anomalies.
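
The Day 3 workflow can be sketched roughly as follows. This is not the repository's actual script: the helper `flag_anomalies`, the event names, and the sample records are assumptions made for illustration, and the logs are passed in as a plain list of dicts instead of being fetched from Elasticsearch.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_anomalies(records, contamination=0.1, seed=42):
    """Return a DataFrame of log records with a boolean `anomaly` column."""
    df = pd.DataFrame(records)
    if df.empty:
        # Mirrors the empty-DataFrame problem: there is nothing to model.
        raise ValueError("no log records; add synthetic records to test the workflow")

    # Check column names up front instead of hitting a bare KeyError later.
    missing = {"event", "timestamp"} - set(df.columns)
    if missing:
        raise KeyError(f"missing columns {sorted(missing)}; available: {list(df.columns)}")

    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    timestamps = pd.to_datetime(df["timestamp"], utc=True)
    features = pd.DataFrame({
        # Map each distinct event name to an integer label.
        "event_code": pd.factorize(df["event"])[0],
        # Represent each timestamp as seconds since the Unix epoch.
        "ts_seconds": (timestamps - epoch).dt.total_seconds(),
    })

    model = IsolationForest(contamination=contamination, random_state=seed)
    # fit_predict returns -1 for anomalies and 1 for normal points.
    df["anomaly"] = model.fit_predict(features) == -1
    return df


# Hypothetical sample data: routine events plus one synthetic attack record.
normal = [
    {"event": "http_request", "timestamp": f"2026-01-01T10:{minute:02d}:00"}
    for minute in range(20)
]
attack_log = {"event": "ssh_bruteforce", "timestamp": "2027-06-01T03:00:00"}

result = flag_anomalies(normal + [attack_log])
print(result[result["anomaly"]])
```

Encoding `event` as integer labels is the simplest option but imposes an arbitrary ordering on the categories; one-hot encoding is a common alternative when there are many event types.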
- Current Status: Multi-track progress on OWASP Honeypot logs, Kafka concepts, and AI anomaly detection.
- GitHub: chenjingen-jane
"Reading code is like reading a map; debugging is like walking the path."