This is the replication package for our literature study.
It contains the files used to generate graphs and to classify elements. We also describe the files hosted on Google Drive here.
Here you find the following files:
Literature Study: contains all the raw data for the literature study.

- Query: All papers we identified through our search query. A legend at the bottom of the sheet explains the row colors; a colored row means the paper was excluded later in the process, which is more relevant for us than for you. The sheet includes the selection criteria and whether we accepted each paper.
- Snowballing Forward Iteration 1: All papers we identified from the kept papers of the Query sheet through forward snowballing. The sheet includes the selection criteria and whether we accepted each paper.
- Snowballing Back Iteration 1: All papers we identified from the kept papers of the Query sheet through backward snowballing.
- Data Extraction Back: For backward snowballing, this sheet includes the selection criteria and whether we accepted each paper. The other sheet was too messy.
- RQ1_Languages: Extracted information for RQ1.
- RQ2_XLLs: Extracted information for RQ2.
- RQ3_Methods: Extracted information for RQ3.
- RQ4_Requirements: Extracted information for RQ4.

Originally, we also planned to assess the quality of the papers. For some papers, that information is in Quantitative_Final_Try_Query and Qualitative_Final_Try_Query. It is neither complete nor used in the paper.
- LiteratureStudyGraph: Graph used to show the high-level process of the literature study.
- rq2.drawio: Graph used to form the categories in the paper. We started with one bubble per row and then unified them iteratively.
- rq3.drawio: Graph used to form the categories in the paper. It started without colors; we iteratively grouped the bubbles through coloring.
- rq4.drawio: Graph used to form the categories in the paper, following basically the same process as the others. Note that the tab abstraction contains the graph used in the paper, while classification contains our detailed version.
Note that all IDs in the graph correspond to the row number in the corresponding Literature Study tab.
On the top level, you find Jupyter notebooks that contain the code needed to transform data and generate plots. We describe them in detail below.
Next to them, the folders have the following purpose:
- data: 1:1 export of the RQ1 and RQ4 data from the Literature Study file, so that it can be processed here.
- generated: Data generated by the notebooks; not modified by the authors. For RQ1, this also contains the other two graphs mentioned in the literature study.
- annotated_from_generated: Manual classification of the languages, based on the generated file in generated/rq1. This data exists only here, not in Google Drive.
- util: Two scripts used to adjust the generated LaTeX files to our liking (styling and comments only).
Order of execution:
1. rq1_studiedLanguages: Parses the raw file from Google Drive, performs some sanity checks, and outputs the list of XLLs used later.
2. rq1_edgeGeneration: Transforms the XLLs into a properly annotated dataframe, using the content of annotated_from_generated.
3. rq1_buildHeatMap: Generates the heatmaps used in the paper.
Note that you can change the variable category in rq1_edgeGeneration to change the
grouping of the heatmap created in the next step.
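As a hedged illustration of what switching the category does, the sketch below groups edge records by a configurable attribute. The field names and values here are invented placeholders, not taken from the actual notebook:

```python
from collections import Counter

# Hypothetical edge records; the fields ("source", "target", "paradigm")
# and their values are placeholders, not the data of rq1_edgeGeneration.
edges = [
    {"source": "C", "target": "Python", "paradigm": "imperative"},
    {"source": "C", "target": "Rust", "paradigm": "imperative"},
    {"source": "Java", "target": "Kotlin", "paradigm": "object-oriented"},
]

# Changing `category` (e.g. to "source") changes which attribute forms
# the rows of the heatmap built in the next notebook.
category = "paradigm"
cell_counts = Counter((edge[category], edge["target"]) for edge in edges)
```

Each counter entry then corresponds to one heatmap cell for the chosen grouping.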
Also note that we originally planned to include some descriptive statistics in the paper, and to create a hierarchical edge bundling graph in R instead of the heatmap. Much of the code you see is a remnant of those plans, so there may be unused files and structures. Sorry for the confusion.
This script translates rq2.drawio into Python code: we manually transferred all the IDs from the graph into Python sets.
The script then generates a clean textual representation of those sets, which we pass to
deepvenn.com to generate our Venn diagram.
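A minimal sketch of that generation step, with invented set names and IDs. It assumes DeepVenn accepts pasted lists where the first line names the set and each following line is one element; treat the exact format as an assumption:

```python
# Placeholder ID sets; in the real notebook these were transferred
# manually from rq2.drawio, where IDs are row numbers in the
# Literature Study spreadsheet.
named_sets = {
    "CategoryA": {3, 17, 42},
    "CategoryB": {17, 23},
}

def to_deepvenn_text(sets):
    """Render each set as a block: the set name, then one ID per line."""
    blocks = []
    for name, ids in sets.items():
        blocks.append("\n".join([name] + [str(i) for i in sorted(ids)]))
    return "\n\n".join(blocks)

print(to_deepvenn_text(named_sets))
```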
Note that we experimented with the granularity of the diagram. The code therefore, again, expresses more detail than the paper shows, because we found that Venn diagrams become unreadable with too much detail.
We were too lazy to type out all the requirements in DrawIO by hand. Therefore, we generated the information in a format DrawIO accepts as input, uploaded it there, and created our graph from it.
This notebook only performs that data transformation.
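As a sketch of what such a transformation could look like: diagrams.net's CSV import reads `#`-prefixed configuration lines followed by a plain CSV header and one row per node. The requirement labels and the style string below are placeholders, not the ones actually used:

```python
import csv
import io

# Placeholder requirement labels; the real data comes from the RQ4 export.
requirements = ["R1: example requirement", "R2: another requirement"]

# Configuration lines for the CSV import; the style string here is an
# assumption, not the styling we actually applied.
config = [
    "# label: %name%",
    "# style: rounded=1;whiteSpace=wrap;html=1",
]

buf = io.StringIO()
buf.write("\n".join(config) + "\n")
writer = csv.writer(buf)
writer.writerow(["name"])  # CSV header: one column holding the node label
for req in requirements:
    writer.writerow([req])

drawio_csv = buf.getvalue()
print(drawio_csv)
```

The resulting text can be pasted into DrawIO's CSV import dialog to create one node per requirement.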