# Archive HR datasets {#appendixF}
```{r archive_HR_datasets, include=FALSE, purl=FALSE}
knitr::opts_chunk$set(tidy = FALSE, out.width = '\\textwidth')
# This bit of code is a bug fix on asis blocks, which we use to show/not show LC solutions, which are written like markdown text. In theory, it shouldn't be necessary for knitr versions <=1.11.6, but I've found I still need to for everything to knit properly in asis blocks. More info here:
# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr
library(knitr)
knit_engines$set(asis = function(options) {
if (options$echo && options$eval) knit_child(text = options$code)
})
```
This appendix describes the datasets used in this companion book.
## Gender Pay Gap {#gender_pay_gap}
The Gender Pay Gap dataset comes from the "Glassdoor Research" website. It contains the salary details for a hypothetical employer with 1,000 employees, spread across 10 job roles and 5 company departments.
The dataset can be accessed using:
"https://glassdoor.box.com/shared/static/beukjzgrsu35fqe59f7502hruribd5tt.csv"
Here are sample rows from this dataset:
```{r pay_gap_data, echo=FALSE}
data <- read.csv("https://glassdoor.box.com/shared/static/beukjzgrsu35fqe59f7502hruribd5tt.csv", stringsAsFactors=FALSE) # N = 1000 total observations
knitr::kable(head(data), "html")
```
## Overhead value analysis {#overhead}
## HR Service Desk {#service_desk_data}
There are two publicly available datasets on the HR service desk:
"https://www.ibm.com/communities/analytics/watson-analytics-blog/it-help-desk/"
"https://www.kaggle.com/lyndonsundmark/service-request-analysis/data"
The IBM dataset can be accessed using:
"https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-IT-Help-Desk.xlsx"
Here are sample rows from this dataset:
<!-- The following 5 lines are not working, so I commented them. Hendrik -->
<!-- require(gdata) -->
<!-- servicedesk <- read.xls("https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-IT-Help-Desk.xlsx", sheet = 1, header = TRUE, method="csv") -->
<!-- knitr::kable(head(servicedesk), "html") -->
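
The commented-out `read.xls()` call above tries to read the workbook straight from the URL. One possible workaround, sketched here under assumptions, is to download the file to a temporary location first and then read it with `readxl::read_excel()`. The helper name `read_remote_xlsx` is hypothetical, the `readxl` package is assumed to be installed, and the URL may no longer resolve, so the usage lines are left as comments.

```r
# Sketch only: `read_remote_xlsx` is a hypothetical helper, not part of the book's code.
# It assumes the readxl package is installed and that the URL is still reachable.
read_remote_xlsx <- function(url, sheet = 1) {
  tmp <- tempfile(fileext = ".xlsx")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file when done
  # mode = "wb" keeps the binary workbook intact on Windows.
  download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  as.data.frame(readxl::read_excel(tmp, sheet = sheet))
}

# Usage (not run here):
# servicedesk <- read_remote_xlsx(
#   "https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-IT-Help-Desk.xlsx"
# )
# knitr::kable(head(servicedesk), "html")
```

Downloading first avoids depending on a reader that can open remote files directly, and `readxl` needs no external Perl or Java installation.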
## HR recruitment, selection and performance data {#HRrecruitment}
This is a large dataset of selected HR applicants and performance data, purchased from the Data and Sons website. Row count: 1,312,450.
IMPORTANT: this file was generated solely for pedagogical purposes. Due to the method of generation (R: BinOrdNonNor), it should NOT be used for research purposes. Note that the files will need to be joined in order to fully explore most relevant questions. This was intentionally left to the students to do as an exercise in order to further develop relevant skills. Selection Data Description provides a description of the variables contained in each of the remaining files.
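
The note above says the files must be joined before most questions can be explored. As a minimal sketch of what such a join could look like in base R, the example below uses two small made-up data frames. The object names and the columns `applicant_id`, `test_score`, and `rating` are hypothetical and are not taken from the actual files.

```r
# Hypothetical stand-ins for two of the files; names and values are illustrative only.
applicants <- data.frame(
  applicant_id = c(1, 2, 3, 4),
  test_score   = c(71, 85, 64, 90)
)
performance <- data.frame(
  applicant_id = c(2, 4, 5),
  rating       = c(3.5, 4.2, 2.8)
)

# Inner join: keep only applicants that appear in both files.
hired <- merge(applicants, performance, by = "applicant_id")

# Left join: keep every applicant, with NA where no performance record exists.
all_applicants <- merge(applicants, performance, by = "applicant_id", all.x = TRUE)
```

Which join is appropriate depends on the question: an inner join suits analyses of people with performance records, while a left join keeps unselected applicants visible.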
Dataset Terms & Conditions: Creative Commons Attribution-ShareAlike 4.0 International Public License
A preview of the dataset is available at:
"https://www.dataandsons.com/dataset/preview/90"
## Job classification
The Job classification dataset comes from a blog article by Lyndon Sundmark. It contains the salary details for a hypothetical employer with 1,000 employees, spread across 10 job roles and 5 company departments.
The dataset can be accessed using:
https://onedrive.live.com/?authkey=%21ABv-gHg5jVluYpc&cid=4EF2CCBEDB98D0F5&id=4EF2CCBEDB98D0F5%216440&parId=4EF2CCBEDB98D0F5%216433&o=OneUp
The dataset contains the following columns:
`ID`, `JobFamily`, `JobFamilyDescription`, `JobClass`, `JobClassDescription`, `PayGrade`, `EducationLevel`, `Experience`, `OrgImpact`, `ProblemSolving`, `Supervision`, `ContactLevel`, `FinancialBudget`, `PG`
## Absenteeism at work
The Absenteeism at work dataset can be accessed from the UC Irvine Machine Learning Repository. The dataset allows for several new combinations of attributes and attribute exclusions, or the modification of the attribute type (categorical, integer, or real), depending on the purpose of the research.
The dataset can be accessed using:
https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work
The database was created with records of absenteeism at work from July 2007 to July 2010 at a courier company in Brazil.
Creators, original owners, and donors: Andrea Martiniano (PhD student), Ricardo Pinto Ferreira (PhD student), and Renato Jose Sassi (Prof. Doctor).
Universidade Nove de Julho - Postgraduate Program in Informatics and Knowledge Management.
Address: Rua Vergueiro, 235/249 Liberdade, Sao Paulo, SP, Brazil. Zip code: 01504-001.
Website: http://www.uninove.br/curso/informatica-e-gestao-do-conhecimento/
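
The description of the Absenteeism at work dataset mentions changing attribute types and excluding attributes. The sketch below illustrates both operations on a small made-up data frame. The columns `reason_code` and `hours` are hypothetical placeholders and are not the names used in the real file.

```r
# Made-up data; column names are illustrative and not taken from the dataset.
absence <- data.frame(
  reason_code = c(23L, 7L, 23L, 11L),  # integer codes for a categorical attribute
  hours       = c(4L, 8L, 2L, 16L)
)

# Integer -> categorical: treat the codes as labels rather than quantities.
absence$reason_code <- factor(absence$reason_code)

# Integer -> real: allow fractional values in later calculations.
absence$hours <- as.numeric(absence$hours)

# Attribute exclusion: drop a column that is not needed for the analysis.
absence_subset <- absence[, setdiff(names(absence), "hours"), drop = FALSE]
```

Converting coded attributes to factors keeps modelling functions from treating the codes as numeric magnitudes.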