## DOIs of the Artifacts

| Artifact | DOI |
| --- | --- |
| Java corpus | https://doi.org/10.7488/ds/1690 |
| C corpus | https://doi.org/10.5281/zenodo.3628775 |
| Python corpus | https://doi.org/10.5281/zenodo.3628784 |
| Java, pre-processed | https://doi.org/10.5281/zenodo.3628665 |
| C, pre-processed | https://doi.org/10.5281/zenodo.3628638 |
| Python, pre-processed | https://doi.org/10.5281/zenodo.3628636 |
| Trained models | https://doi.org/10.5281/zenodo.3628628 |
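
The Zenodo-hosted artifacts can also be fetched programmatically. Below is a minimal sketch that resolves one record through the public Zenodo REST API and downloads its files with `requests`. The record ID used here (3628628, the trained models) is taken from the DOI above; the JSON field layout is an assumption based on the current Zenodo API and may need adjusting.

```python
# Sketch: download the files of one Zenodo artifact (here the trained models,
# record 3628628 from the table above). Assumes the public Zenodo REST API at
# https://zenodo.org/api/records/<id> and its current JSON layout; swap in the
# record ID of any other Zenodo artifact from the table.
import pathlib
import requests

RECORD_ID = "3628628"  # suffix of https://doi.org/10.5281/zenodo.3628628
OUT_DIR = pathlib.Path("artifacts") / RECORD_ID
OUT_DIR.mkdir(parents=True, exist_ok=True)

record = requests.get(f"https://zenodo.org/api/records/{RECORD_ID}", timeout=30)
record.raise_for_status()

for entry in record.json().get("files", []):
    name = entry["key"]           # file name as stored on Zenodo (assumed field)
    url = entry["links"]["self"]  # direct download link (assumed field)
    print(f"downloading {name} ...")
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(OUT_DIR / name, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
```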

## Code used to run experiments

Codeprep library (for vocabulary study): https://github.com/giganticode/codeprep

Open-vocabulary Neural LM: https://github.com/mast-group/OpenVocabCodeNLM

## Paper

If you use the artifacts, please cite the paper:

@article{karampatsis2020big,
 title={Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code},
 author={Karampatsis, Rafael-Michael and Babii, Hlib and Robbes, Romain and Sutton, Charles and Janes, Andrea},
 journal={arXiv preprint arXiv:2003.07914},
 year={2020}
}