\documentclass[12pt,a4paper]{article}
\begin{document}
%\title{Proposal Title}
%\author{Your name}
%\date{}
%\maketitle
\noindent{\bf Project}: Analysis of Competitive Codebases
\vspace*{.5cm}
\noindent{\bf Student Names}: Aashish Kumar Jayant, Animesh Baranawal, Shikhar Bharadwaj
\vspace*{.5cm}
\noindent{\bf Problem Statement}: To analyze the correlation between coding style and coding proficiency, and to determine whether coding styles show regional variation.
\vspace*{.5cm}
\noindent{\bf Data sets}: Data for the top 1000 performers in 5 Division-2 contests were scraped from Codeforces, a competitive coding platform. The resulting data set contains $\sim 18000$ code submissions.
\vspace*{.5cm}
\noindent{\bf Abstract}: Coding proficiency is an important employability indicator in the software field. Codeforces is a popular platform on which users practice coding skills through regular contests, and it assigns every user a proficiency rating based on contest performance. Analyzing how coding style relates to this rating can help give novices better feedback on which code structures to use and which APIs to use more often.
\vspace*{.5cm}
\noindent{\bf Approach}: Approaches taken, in order of increasing complexity:
a) Extracting simple features such as the number of function calls, declared variables, and macros.
b) Using the tree edit distance between the abstract syntax trees of two different codebases to measure coding style similarity.
c) Extracting low-level features (the tokens used) and high-level features (code structure, via a doc2vec embedding of the abstract syntax tree).
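
\noindent For approach (b), one common formalization, following Zhang and Shasha (first reference below), defines the tree edit distance between two ordered labeled trees $T_1$ and $T_2$ as the minimum total cost of a sequence of node edits that transforms one into the other. The symbols $\delta$, $\gamma$ and $S$ are notation assumed for this sketch, and the edit costs actually used in the project are not stated here:
\[
\delta(T_1, T_2) = \min_{S \,:\, T_1 \to T_2} \; \sum_{s \in S} \gamma(s),
\]
where $S$ ranges over sequences of node insertions, deletions and relabelings that turn $T_1$ into $T_2$, and $\gamma(s) \ge 0$ is the cost of the edit $s$. With unit costs, $\delta$ is the smallest number of edits needed, so a smaller value suggests more similar syntactic structure.
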
\vspace*{.5cm}
\noindent{\bf Conclusion}: We analyze the correlation between coding style and coding proficiency, and whether coding styles vary across regions. We find differences in coding style across regions but no significant correlation between coding proficiency and coding style. We also offer possible explanations of how the features used help in determining the correlations under study.
\vspace*{.5cm}
\noindent{\bf References}:
1. Zhang, Kaizhong, and Dennis Shasha. ``Simple fast algorithms for the editing distance between trees and related problems.'' SIAM Journal on Computing 18.6 (1989): 1245--1262.
2. Lau, Jey Han, and Timothy Baldwin. ``An empirical evaluation of doc2vec with practical insights into document embedding generation.'' arXiv preprint arXiv:1607.05368 (2016).
\end{document}