Application of machine learning to the GDAX orderbook, using a stacked bidirectional LSTM/GRU model to predict new support and resistance levels on a 15-minute basis. Currently under heavy development.
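The exact layer configuration lives in the notebooks and the 'model_saved' exports; purely as a rough illustration of the model family, a minimal Keras sketch of a stacked bidirectional LSTM/GRU regressor (layer widths, sequence length, feature count, and the two-value output are placeholder assumptions, not the trained model's values):

```python
# Minimal sketch of a stacked bidirectional LSTM/GRU regressor (illustrative only).
# Sequence length, feature count, layer widths, and the 2-value output are assumptions.
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, GRU, Dropout, Dense

TIMESTEPS, FEATURES = 900, 4   # e.g. per-second orderbook features over 15 minutes (assumed)

model = Sequential([
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(TIMESTEPS, FEATURES)),
    Dropout(0.2),
    Bidirectional(GRU(32)),
    Dropout(0.2),
    Dense(2),                  # e.g. predicted support and resistance levels
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```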
General project stack (APIs/data structures/tooling):
- Pandas, MongoDB, Git LFS, Feather, Keras, Tensorflow, Scikit-Learn
- See requirements.txt for full list of required packages
pip install -r requirements.txt
Latest notebook file(s) with project code:
9_data_pipeline_development.ipynb:
- Development and optimization of the data pipeline from the MongoDB instance through to ML model pretraining (a minimal MongoDB-to-DataFrame sketch follows this list)
- Removal of deprecated packages + base package version upgrades (e.g. Pandas)
- Groundwork for an automation pipeline: automated hourly data scraping, cycling, and model training via a segregated instance or a live online model
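Purely as a rough illustration of the MongoDB-to-DataFrame step, a minimal sketch assuming a local MongoDB instance and hypothetical database/collection/field names; the notebook's actual pipeline differs:

```python
# Minimal sketch: pull scraped l2update documents out of MongoDB into a pandas
# DataFrame for preprocessing. Database, collection, and field names are assumptions.
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
collection = client["gdax"]["l2update"]          # hypothetical database/collection names

# Project out only the fields needed for feature engineering.
cursor = collection.find({}, {"_id": 0, "time": 1, "product_id": 1, "changes": 1})
df = pd.DataFrame(list(cursor))

df["time"] = pd.to_datetime(df["time"])
df = df.sort_values("time").reset_index(drop=True)
print(df.head())
```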
8_program_structure_improvement.ipynb:
- Further refinement of the program structure
- Reworked function scope/structure and added functions for common operations
- Parsing of raw data into four separate l2update segments (four consecutive 15-minute l2update segments)
- Machine learning model refinement & training + model structure updates
- Refinement of data storage format/file type
- Researching/testing implementations to move away from .csv due to file I/O limitations
- Research into Pickle/h5py/msgpack/feather for saving DataFrame contents to disk is underway (see the serialization sketch after this section)
- Currently broken:
- API calls from the gdax-python API for candlestick data for the first timestamp of each l2update_15min file (1 hour of l2 updates split into four 15-minute increments) are currently not working; see pull requests
- Implementation of historical candlestick/OHLC data from 'gdax-ohlc-import' is a work in progress
The HDF5 format would be preferable to msgpack for data storage, were it not for how HDF5 handles object references during I/O operations
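As a rough, hedged comparison of the candidate on-disk formats (not the project's final choice), a minimal sketch writing the same DataFrame with pandas' built-in serializers; the file names are placeholders, and to_msgpack is omitted because it was deprecated and removed in newer pandas releases:

```python
# Minimal sketch: compare on-disk formats for a scraped l2update DataFrame.
# File names are placeholders; pandas' to_msgpack is omitted because it was
# deprecated/removed in newer pandas releases (feather/parquet are the usual replacements).
import pandas as pd

df = pd.read_csv("l2update_15min_1.csv")                            # assumed existing CSV export

df.to_pickle("l2update_15min_1.pkl")                                # Pickle
df.reset_index(drop=True).to_feather("l2update_15min_1.feather")    # Feather (requires pyarrow)
df.to_hdf("l2update_15min_1.h5", key="l2update", mode="w")          # HDF5 (requires PyTables)

# Reading back:
df_feather = pd.read_feather("l2update_15min_1.feather")
print(df_feather.head())
```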
6_raw_dataset_update.ipynb:
- Notebook file used to scrape/update raw_data in both MongoDB and csv format (1 hour of websocket data from GDAX)
- L2 snapshot + L2 updates without the overhead of the Match data response (the test data does include Match data, which adds significant I/O overhead); a hedged scrape sketch follows below
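Purely as a rough illustration of the scraping step, a minimal sketch that subclasses the gdax-python WebsocketClient to store level2 messages in MongoDB; the product, database, and collection names are assumptions, the channels attribute depends on the submodule version checked out, and the notebook's actual scraper differs:

```python
# Minimal sketch: subscribe to the GDAX level2 feed via the gdax-python submodule
# and store snapshot/l2update messages in MongoDB. Product, database, and collection
# names are assumptions; the exact WebsocketClient options depend on the submodule version.
import time
import gdax
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017/")["gdax"]["raw_scrape"]  # hypothetical

class L2Scraper(gdax.WebsocketClient):
    def on_open(self):
        self.url = "wss://ws-feed.gdax.com/"
        self.products = ["BTC-USD"]
        self.channels = ["level2"]        # assumes the client exposes a channels attribute

    def on_message(self, msg):
        if msg.get("type") in ("snapshot", "l2update"):
            collection.insert_one(msg)

ws = L2Scraper()
ws.start()
time.sleep(60 * 60)   # scrape for 1 hour
ws.close()
```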
- 'gdax-python' and 'gdax-ohlc-import' are repositories imported as Git Submodules:
- After cloning the main project repository, the following command is required to ensure that the submodule repository contents are pulled/present:
git submodule update --init --recursive
- The .gitmodules file holds the submodule parameters
- 'model_saved' folder:
- Contains .json and .h5 files for current and previous TensorFlow/Keras models (trained model and model weight export/import; a minimal export/import sketch follows below)
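As a rough illustration of what those files contain, a minimal sketch of the standard Keras JSON-architecture + HDF5-weights export/import pattern; the file names and the stand-in model here are placeholders, not the repo's actual exports:

```python
# Minimal sketch: export/import a Keras model as .json (architecture) + .h5 (weights).
# File names and the stand-in model are placeholders, not the repo's actual exports.
from keras.models import Sequential, model_from_json
from keras.layers import Dense

model = Sequential([Dense(2, input_shape=(8,))])   # stand-in model for illustration
model.compile(optimizer="adam", loss="mse")

# Export
with open("model_saved/model.json", "w") as f:
    f.write(model.to_json())
model.save_weights("model_saved/model_weights.h5")

# Import
with open("model_saved/model.json") as f:
    restored = model_from_json(f.read())
restored.load_weights("model_saved/model_weights.h5")
restored.compile(optimizer="adam", loss="mse")
```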
- 'documentation' folder:
- 'rds_ml_yu_01b_revised.pptx' is a PowerPoint presentation summarizing the key technical components, scope, and limitations of this project.
- 'design_mockup' folder:
- Contains diagrams, drawings, and notes used in the process of model and project design during prototyping, testing, and expansion.
- 'design_explanation' folder:
- Contains 8 pages of detailed explanations and diagrams regarding both project/model structure and design.
- 'previous_revisions' folder:
- Contains previous/outdated versions of the readme documentation and PowerPoint presentations for this project
- 'saved_charts' folder:
- Output of generate_chart() for a candlestick chart with visualized autogenerated support and resistance from autoSR() (a simplified support/resistance sketch follows this list)
- Screenshot of model layer structure in text format
- Graphviz output of model layer structure
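autoSR() itself is adapted from nakulnayyar/SupResGenerator (credited below). Purely as a simplified illustration of the general idea, and not the repo's actual implementation, a minimal sketch deriving support/resistance levels from rolling extremes of OHLC data; column names and the window size are assumptions:

```python
# Minimal, simplified illustration of deriving support/resistance levels from OHLC
# candles via rolling extremes. This is NOT the repo's autoSR(); column names and
# the window size are assumptions.
import pandas as pd

def simple_sr_levels(ohlc: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """Return rolling support (recent low) and resistance (recent high) levels."""
    out = ohlc.copy()
    out["support"] = out["low"].rolling(window, min_periods=1).min()
    out["resistance"] = out["high"].rolling(window, min_periods=1).max()
    return out

# Example usage with dummy candles:
candles = pd.DataFrame({
    "high": [105, 107, 106, 110, 108],
    "low":  [100, 102, 101, 104, 103],
})
print(simple_sr_levels(candles, window=3)[["support", "resistance"]])
```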
- 'test_data' folder:
- Contains only 10 minutes of scraped data, for testing, development, and model input prototyping (snapshot + l2 response updates)
- 'raw_data' folder:
- 1 hour of scraped data (snapshot + l2 response updates)
- l2update_15min_1-4: 1 hour of l2 updates split into four 15-minute increments (a minimal splitting sketch follows this list)
- mongo_raw.json: 1 hour of scraped data from the gdax-python API websocket in raw mongoDB format
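Purely as a rough illustration of how such a split can be produced (the timestamp column and file naming are assumptions, not the notebook's exact code), a minimal pandas sketch cutting one hour of l2 updates into four 15-minute segments:

```python
# Minimal sketch: split 1 hour of l2update rows into four 15-minute segments.
# The 'time' column and the input/output file names are assumptions.
import pandas as pd

df = pd.read_csv("raw_data/l2update.csv", parse_dates=["time"]).sort_values("time")

start = df["time"].iloc[0].floor("15min")
for i in range(4):
    lo = start + pd.Timedelta(minutes=15 * i)
    hi = lo + pd.Timedelta(minutes=15)
    segment = df[(df["time"] >= lo) & (df["time"] < hi)]
    segment.to_csv(f"raw_data/l2update_15min_{i + 1}.csv", index=False)
```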
- 'raw_data_10h' folder:
- 10 hours of scraped data:
- l2update_10h, request_log_10h, and snapshot_asks/bids_10h
- 10 hours of scraped data in raw mongoDB export (JSON): mongo_raw_10h.json
- Data in .msg (MessagePack) format is currently experimental, being tested as an alternative to the .csv format for I/O operations
- 'archived_ipynb' folder:
- Contains previous Jupyter Notebook files used in the construction, design, and prototyping of components of this project.
- Jupyter Notebook (.ipynb) files 1-5 & 7
- Each successive notebook was used to construct and test, at each "stage", whether a project of this scope would even be technically feasible.
- Successive numbered notebooks are generally iterative, improving on the previous notebook files for this project.
### Academic papers referenced
- How to Construct Deep Recurrent Neural Networks
- Training and Analysing Deep Recurrent Neural Networks
- Where to Apply Dropout in Recurrent Neural Networks for Handwriting Recognition?
- Dropout improves Recurrent Neural Networks for Handwriting Recognition
- Speech Recognition with Deep Recurrent Neural Networks
- Recurrent Dropout without Memory Loss
- Deep Stacked Bidirectional and Unidirectional LSTM Recurrent Neural Network for Network-wide Traffic Speed Prediction
### Repositories checked out as Git Submodules
- gdax-python
- gdax-ohlc-import (Currently not fully used/implemented)
### Licenses
- gdax-orderbook-ml: BSD-3 Licensed, Copyright (c) 2018 Timothy Yu
- gdax-python: MIT Licensed, Copyright (c) 2017 Daniel Paquin
- gdax-ohlc-import: MIT Licensed, Copyright (c) 2018 Arthur Koziel
- autoSR() function adapted from nakulnayyar/SupResGenerator, Copyright (c) 2016 Nakul Nayyar (https://github.com/nakulnayyar/SupResGenerator)