## DOIs of the Artifacts

| Artifact | DOI |
| --- | --- |
| Java corpus | https://doi.org/10.7488/ds/1690 |
| C corpus | https://doi.org/10.5281/zenodo.3628775 |
| Python corpus | https://doi.org/10.5281/zenodo.3628784 |
| Java, pre-processed | https://doi.org/10.5281/zenodo.3628665 |
| C, pre-processed | https://doi.org/10.5281/zenodo.3628638 |
| Python, pre-processed | https://doi.org/10.5281/zenodo.3628636 |
| Trained models | https://doi.org/10.5281/zenodo.3628628 |
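
The Zenodo-hosted artifacts can also be fetched programmatically. Below is a minimal sketch that resolves one record through the public Zenodo REST API and downloads its files with `requests`. The record ID used here (3628628, the trained models) is taken from the DOI above; the JSON field layout is an assumption based on the current Zenodo API and may need adjusting.

```python
# Sketch: download the files of one Zenodo artifact (here the trained models,
# record 3628628 from the table above). Assumes the public Zenodo REST API at
# https://zenodo.org/api/records/<id> and its current JSON layout; swap in the
# record ID of any other Zenodo artifact from the table.
import pathlib
import requests

RECORD_ID = "3628628"  # suffix of https://doi.org/10.5281/zenodo.3628628
OUT_DIR = pathlib.Path("artifacts") / RECORD_ID
OUT_DIR.mkdir(parents=True, exist_ok=True)

record = requests.get(f"https://zenodo.org/api/records/{RECORD_ID}", timeout=30)
record.raise_for_status()

for entry in record.json().get("files", []):
    name = entry["key"]           # file name as stored on Zenodo (assumed field)
    url = entry["links"]["self"]  # direct download link (assumed field)
    print(f"downloading {name} ...")
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(OUT_DIR / name, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
```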

## Code used to run experiments

Codeprep library (for vocabulary study): https://github.com/giganticode/codeprep

Open-vocabulary Neural LM: https://github.com/mast-group/OpenVocabCodeNLM

## Paper

If you use the artifacts, please cite the paper:

@article{karampatsis2020big,
 title={Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code},
 author={Karampatsis, Rafael-Michael and Babii, Hlib and Robbes, Romain and Sutton, Charles and Janes, Andrea},
 journal={arXiv preprint arXiv:2003.07914},
 year={2020}
}