Pulling bmarketing package after finishing Day 2 training. #4

Open
wants to merge 77 commits into
base: master
77 commits
0ec1dc7
added data type corrections
May 8, 2019
e0bc84a
Changing some factor variables to numeric.
nandakallugjeri May 8, 2019
25e8689
2nd commit.
nandakallugjeri May 8, 2019
ef6e54a
r
nandakallugjeri May 8, 2019
4cc55db
test
nandakallugjeri May 8, 2019
be82170
reverting to patrick version of file.
nandakallugjeri May 8, 2019
04dbffc
first converting to char then to numeric
May 8, 2019
823a371
adding as.character()
nandakallugjeri May 8, 2019
05d47e2
t
nandakallugjeri May 8, 2019
18110fd
Initial commit for the new package.
nandakallugjeri May 8, 2019
e56f235
removing bmarketing2
nandakallugjeri May 8, 2019
cd22bc4
added package folders and moved R scripts to /R anda csv to /data
sibidev May 8, 2019
2daea2e
update path
sibidev May 8, 2019
14fe2d5
cleanimport function
sibidev May 8, 2019
501cbb1
Delete cleanimport.R
sibidev May 8, 2019
b4c438a
newly created model functions
May 8, 2019
0b27ae2
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
May 8, 2019
28e4474
transformation functions added
DianaRalcheva May 8, 2019
51eae2b
small changes
sibidev May 8, 2019
4aa6c6c
added description
sibidev May 8, 2019
d730b5a
added documentation for model functions
May 8, 2019
5d0ff71
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
May 8, 2019
68b6798
adding "Depends" to description file
May 8, 2019
35142dd
Adding Cleaning Functions.
nandakallugjeri May 8, 2019
8748735
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
nandakallugjeri May 8, 2019
07e3bc8
adding imports
May 8, 2019
d47f9a0
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
May 8, 2019
13aa4e0
Delete bmarketing.R
sibidev May 8, 2019
d36e051
oveview changes
May 8, 2019
df8001e
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
May 8, 2019
50c6d3a
Bug fixed, checking if trere are NAs in target variable
MarkoBarzic May 8, 2019
9f14bd8
tests
sibidev May 8, 2019
2f43cd7
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
sibidev May 8, 2019
c5fc973
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
MarkoBarzic May 8, 2019
8ca351d
chNGE Var to x
DianaRalcheva May 9, 2019
3c8e635
fixing var again
DianaRalcheva May 9, 2019
e4c3bf3
transformation documentation fix
DianaRalcheva May 9, 2019
e6b3117
exporting the transformation functions
DianaRalcheva May 9, 2019
478ca4a
export added for dtree functions
May 9, 2019
693af98
namespace conflict
May 9, 2019
204ef56
fixed issue #3 : other cases than numeric/factors also handled
sibidev May 9, 2019
a17745c
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
May 9, 2019
c2cb943
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
May 9, 2019
1e18a3c
fixed tests for checkNA
sibidev May 9, 2019
2bb6310
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
sibidev May 9, 2019
3ee2bc6
merge transform
DianaRalcheva May 9, 2019
f765c72
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
DianaRalcheva May 9, 2019
1ea8592
dtree export added
DianaRalcheva May 9, 2019
2cf757a
namespace export for dtree added
DianaRalcheva May 9, 2019
8366c32
cleanData function fixed. #4
nandakallugjeri May 9, 2019
4e85038
NAs can be replaced by mean
sibidev May 9, 2019
6b1d52e
added standardize function
sibidev May 9, 2019
bccd130
generating sensible README file
May 9, 2019
9c693cd
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
sibidev May 9, 2019
1427daf
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
sibidev May 9, 2019
7b6a5a9
export of clean functions added
DianaRalcheva May 9, 2019
e00d110
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
DianaRalcheva May 9, 2019
febe940
Adding documantation for CleanData
nandakallugjeri May 9, 2019
b1c5237
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
nandakallugjeri May 9, 2019
c9e4963
added standardize
May 9, 2019
0b5d8e1
CleanData update: imputing MISSING values to categorical variables
MarkoBarzic May 9, 2019
9d5d816
dataClean, added warning that imputation took place.
MarkoBarzic May 9, 2019
ce1ae53
included tests for replacing NAs
sibidev May 9, 2019
e993df8
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
sibidev May 9, 2019
8e9a3eb
formatting
sibidev May 9, 2019
8faa81a
lot of changes
May 9, 2019
aea7888
added logit
May 9, 2019
2953fa6
auto added imports
sibidev May 9, 2019
fce556f
Merge branch 'master' of https://github.com/nandakallugjeri/bmarketing
sibidev May 9, 2019
51408d5
import glm
sibidev May 9, 2019
94196c9
description
May 9, 2019
1d31ff8
dtree change
May 9, 2019
e18995e
change in dtree - repairing the function
May 9, 2019
12adba1
updating the documentation
May 9, 2019
56631b4
changing logit, small changes in README and DESCRIPTION files
May 9, 2019
592afd9
adding plot
May 9, 2019
6b654e1
adding plot, hopefully for real
May 9, 2019
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
.Rproj.user
.Rhistory
.RData
.Ruserdata
18 changes: 18 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,18 @@
Package: bmarketing
Title: Decision Tree and Logistic Regression Models for Marketing Data
Version: 0.0.1
Authors@R: 
    person(given = "Group 2",
           family = "GoIT DS",
           role = c("aut", "cre"),
           email = "[email protected]",
           comment = c(ORCID = "YOUR-ORCID-ID"))
Description: Fits a decision tree model to predict whether a customer
    signs up for a term deposit. Also provides helper functions for data
    cleaning and variable transformation.
License: MIT + file LICENSE
Encoding: UTF-8
LazyData: true
Depends:
    rpart.plot
Imports:
    rpart,
    stats
RoxygenNote: 6.1.1
2 changes: 2 additions & 0 deletions LICENSE
@@ -0,0 +1,2 @@
YEAR: 2019
COPYRIGHT HOLDER: patrick sibetz
18 changes: 18 additions & 0 deletions NAMESPACE
@@ -0,0 +1,18 @@
# Generated by roxygen2: do not edit by hand

export(checkNA)
export(cleanData)
export(dtree)
export(dtreeperf)
export(dtreeplot)
export(dtreepredict)
export(dtreesummary)
export(logit)
export(standardize)
export(trans)
export(translog)
importFrom(rpart,rpart)
importFrom(stats,glm)
importFrom(stats,na.omit)
importFrom(stats,predict)
importFrom(stats,sd)
22 changes: 22 additions & 0 deletions R/checkNA.R
@@ -0,0 +1,22 @@
#' Check a data frame for missing values
#'
#' Counts the rows that contain at least one NA value in any column.
#'
#' @param ds A data frame.
#'
#' @return
#' Called for its side effect: prints a message stating how many rows contain
#' NAs, or that none were found.
#'
#' @export
checkNA <- function(ds) {
  # Rows dropped by na.omit() are exactly the rows with at least one NA.
  n_na_rows <- nrow(ds) - nrow(na.omit(ds))

  if (n_na_rows == 0) {
    print("No rows with NAs found in the data frame")
  } else {
    print(paste("There are", n_na_rows, "rows having NAs"))
  }
}
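The count that `checkNA()` prints is `nrow(ds) - nrow(na.omit(ds))`, i.e. the number of rows with at least one NA. A minimal sketch on hypothetical toy data (the data frame `toy` and the helper `count_na_rows()` are illustrative, not part of the package) that computes the same number with `complete.cases()` and adds the per-column counts the function itself does not report:

```r
# Hypothetical toy data frame: rows 2 and 3 each contain one NA
toy <- data.frame(age = c(30, NA, 45, 52),
                  job = c("admin", "services", NA, "retired"))

# Same quantity checkNA() derives from nrow(ds) - nrow(na.omit(ds))
count_na_rows <- function(ds) sum(!complete.cases(ds))

count_na_rows(toy)               # 2
nrow(toy) - nrow(na.omit(toy))   # 2
colSums(is.na(toy))              # age 1, job 1
```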

68 changes: 68 additions & 0 deletions R/cleanData.R
@@ -0,0 +1,68 @@
#' Check and clean the quality of the data
#'
#' Checks that the target variable exists and contains no NA values. If
#' \code{replaceNAs = TRUE}, NAs in numeric columns are replaced with the column
#' mean and NAs in all other columns with the category "MISSING". Otherwise,
#' every column in which more than half of the values are NA is removed.
#'
#' @param ds A data frame.
#' @param targetVar Name of the target variable in \code{ds} (a string).
#' @param replaceNAs \code{TRUE} or \code{FALSE}: whether to impute NAs instead
#'   of removing mostly-NA columns.
#'
#' @return
#' The cleaned data frame. Warnings describe every imputed or removed column.
#'
#' @importFrom stats sd predict na.omit
#'
#' @export
cleanData <- function(ds, targetVar, replaceNAs = FALSE) {

  # First, check that the target variable actually exists in the data frame.
  if (!targetVar %in% colnames(ds))
    stop(paste(targetVar, "variable not part of the dataframe passed"))

  # Second, check whether the target variable contains any NA values.
  # `[[` is required: ds$targetVar would look for a column literally named "targetVar".
  if (anyNA(ds[[targetVar]])) {
    stop("Missing Value found in the target column")
  } else {
    print("Target Variable looks clean. No NA values")
  }

  # Third, if replaceNAs = TRUE, impute NAs: the column mean for numeric
  # columns, a "MISSING" category for all other columns.
  if (replaceNAs) {
    for (i in seq_len(ncol(ds))) {
      if (!anyNA(ds[[i]])) next
      if (is.numeric(ds[[i]])) {
        ds[[i]][is.na(ds[[i]])] <- mean(ds[[i]], na.rm = TRUE)
        warning(paste(colnames(ds)[i], "has been imputed with mean values."))
      } else {
        # A factor only accepts values from its levels, so add "MISSING" first.
        if (is.factor(ds[[i]]))
          levels(ds[[i]]) <- c(levels(ds[[i]]), "MISSING")
        ds[[i]][is.na(ds[[i]])] <- "MISSING"
        warning(paste("For variable", colnames(ds)[i], "category MISSING was defined."))
      }
    }
  } else {
    checkNA(ds)

    # Fourth, remove every column in which more than half of the values are NA.
    flag <- logical(ncol(ds))

    for (i in seq_len(ncol(ds))) {
      na_share <- mean(is.na(ds[[i]]))
      if (na_share > 0.5) {
        warning(paste(colnames(ds)[i], "has more than half NA's, and was excluded from the sample"))
        flag[i] <- TRUE
      } else if (na_share > 0) {
        warning(paste(colnames(ds)[i], "has NA values!"))
      }
    }

    ds <- ds[!flag]
  }

  # Return the cleaned data frame.
  return(ds)

}
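Two R details decide whether the checks in `cleanData()` behave as documented: how a column is looked up by a name stored in a variable, and how a new value is assigned into a factor. A minimal illustration on hypothetical toy data (`toy`, `targetVar`, and `job` are illustrative names):

```r
# Hypothetical toy data: target "y" has one NA, "job" is a factor with one NA
toy <- data.frame(y   = c("yes", NA, "no"),
                  job = factor(c("admin", NA, "retired")))
targetVar <- "y"

# `$` does not evaluate its argument: this looks for a column literally
# named "targetVar", finds none, and returns NULL
is.null(toy$targetVar)        # TRUE
# `[[` evaluates targetVar, so this inspects the column "y"
anyNA(toy[[targetVar]])       # TRUE

# Assigning a value outside a factor's levels yields NA (with a warning),
# so the "MISSING" category has to be added as a level first
job <- toy$job
levels(job) <- c(levels(job), "MISSING")
job[is.na(job)] <- "MISSING"
as.character(job)             # "admin" "MISSING" "retired"
```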
66 changes: 66 additions & 0 deletions R/dtree.R
@@ -0,0 +1,66 @@
#' dtree
#' @description Fits a decision tree with \code{rpart::rpart()}, using every
#'   other column of the input data as a predictor.
#' @param udata Input data for the model.
#' @param target Name of the target variable in the input data (a string),
#'   for example \code{target = "y"}.
#' @return A fitted \code{rpart} decision tree model.
#' @importFrom rpart rpart
#'
#' @export
dtree <- function(udata, target) {
  rpart(as.formula(paste(target, "~ .")), data = udata, model = TRUE)
}


#' dtreesummary
#' @description Summarizes a fitted decision tree model.
#' @param dt_model The fitted model to summarize.
#' @return Called mainly for its side effect: \code{summary()} prints a
#'   detailed description of the tree.
#'
#' @export
#'
dtreesummary<-function(dt_model){
summary(dt_model)
}

#' dtreeplot
#' @description Plots a fitted decision tree model.
#' @param dt_model The fitted model to plot.
#' @return Called for its side effect of drawing the tree with
#'   \code{rpart.plot::rpart.plot()}.
#'
#' @export
#'
dtreeplot<-function(dt_model) {
rpart.plot(dt_model)
}

#' dtreepredict
#' @description Predicts the class of each row of new data with a fitted
#'   decision tree.
#' @param dt_model The fitted model used to generate the predictions.
#' @param predictdata The data to score.
#' @return A factor of predicted classes, one per row of \code{predictdata}.
#'
#' @export
#'
dtreepredict<-function(dt_model,predictdata){

predictions <- predict(dt_model, predictdata, type = "class")
predictions
}


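#' dtreeperf
#' @description Computes the accuracy of a model's predictions.
#' @param target The actual values of the target variable.
#' @param predictions The predicted values of the target variable.
#' @return The accuracy: the share of \code{predictions} equal to
#'   \code{target}, between 0 and 1.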
#'
#' @export
#'
dtreeperf<-function(target,predictions){
accuracy<-mean(target == predictions)
accuracy
}
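A minimal end-to-end sketch of the fit, predict, and score workflow that `dtree()`, `dtreepredict()`, and `dtreeperf()` wrap. The bmarketing data is not reproduced on this page, so this assumes the `kyphosis` data set that ships with rpart as a stand-in; the 70/30 split and the seed are arbitrary choices. Each step calls rpart directly, mirroring the function bodies above:

```r
library(rpart)

# Arbitrary 70/30 train/test split of the kyphosis data (stand-in data set)
set.seed(42)
idx   <- sample(nrow(kyphosis), size = round(0.7 * nrow(kyphosis)))
train <- kyphosis[idx, ]
test  <- kyphosis[-idx, ]

# Same call as dtree(train, "Kyphosis")
fit <- rpart(as.formula(paste("Kyphosis", "~ .")), data = train, model = TRUE)

# Same call as dtreepredict(fit, test): a factor of predicted classes
pred <- predict(fit, test, type = "class")

# Same computation as dtreeperf(test$Kyphosis, pred)
acc <- mean(test$Kyphosis == pred)
acc
```

Scoring on held-out rows rather than on the training data gives a less optimistic accuracy estimate; `dtreeperf()` itself accepts either.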



16 changes: 16 additions & 0 deletions R/logit.R
@@ -0,0 +1,16 @@
#' logit
#' @description Fits a logistic regression model (a binomial GLM) of the
#'   target variable on all other columns of the input data.
#'
#' @param udata Input data for the model.
#' @param target Name of the target variable in the input data (a string).
#'
#' @return A fitted \code{glm} logit model.
#'
#' @importFrom stats glm
#'
#' @export
logit <- function(udata, target) {
  glm(as.formula(paste(target, "~ .")), data = udata, family = "binomial")
}
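The package does not ship a prediction helper for `logit()`. A minimal sketch of one way to turn such a model into class predictions, assuming the built-in `mtcars` data as a stand-in (binary target `am`) and a hypothetical 0.5 probability cut-off; the model call mirrors the body of `logit()`:

```r
# Same call as logit(udata, "am") on a stand-in data set
target <- "am"
udata  <- mtcars[, c("am", "wt", "hp")]
fit <- glm(as.formula(paste(target, "~ .")), data = udata, family = "binomial")

# type = "response" returns fitted probabilities P(am = 1)
prob <- predict(fit, newdata = udata, type = "response")

# Hypothetical 0.5 cut-off to turn probabilities into 0/1 classes
pred <- as.integer(prob > 0.5)

# Accuracy, computed the same way as dtreeperf()
acc <- mean(pred == udata$am)
acc
```

The cut-off is a design choice: with an unbalanced target such as term-deposit sign-ups, a threshold other than 0.5 may suit better.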


45 changes: 45 additions & 0 deletions R/transform.R
@@ -0,0 +1,45 @@
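#' translog
#'
#' A simple natural-log transformation.
#'
#' @param x A non-negative numeric vector.
#'
#' @return \code{log(x)}; zeros map to \code{-Inf}.
#'
#' @examples
#' translog(exp(rnorm(7)))
#'
#' @export
translog <- function(x) {
  if (!is.numeric(x)) stop("Input must be numeric!")
  if (any(x < 0, na.rm = TRUE)) stop("Input must not be negative!")
  # Return the value directly: ending with `x <- log(x)` would return it invisibly.
  log(x)
}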

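#' trans
#'
#' A class transformation, which converts numeric data to a factor and a
#' factor back to numeric.
#'
#' @param x Numeric or factor data.
#'
#' @return A factor if \code{x} is numeric, a numeric vector if \code{x} is a
#'   factor.
#'
#' @export
trans <- function(x) {
  if (is.numeric(x)) {
    as.factor(x)
  } else if (is.factor(x)) {
    # Convert via the labels; as.numeric() alone would return the level codes.
    as.numeric(as.character(x))
  } else {
    stop("Input must be numeric or factor!")
  }
}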


#' standardize
#'
#' Standardizes numeric data to zero mean and unit standard deviation (z-scores).
#'
#' @param x Numeric data.
#'
#' @return \code{(x - mean(x)) / sd(x)}.
#'
#' @export
standardize <- function(x){
if(!is.numeric(x)) stop("Input must be numeric!")
x <- (x - mean(x)) / sd(x)
x
}
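Two R behaviors explain how the transformation functions above are written: a function whose last expression is an assignment returns its value invisibly, and `as.numeric()` applied directly to a factor returns level codes rather than the labels. A minimal illustration; `log_assign`, `log_return`, `f`, and `v` are hypothetical names used only here:

```r
# Two hypothetical log transforms that differ only in their last expression
log_assign <- function(x) { x <- log(x) }   # value returned invisibly
log_return <- function(x) { log(x) }        # value returned visibly

withVisible(log_assign(1))$visible          # FALSE: nothing prints at the console
withVisible(log_return(1))$visible          # TRUE

# Factor to numeric: as.numeric() alone returns the internal level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                               # 1 2 1
as.numeric(as.character(f))                 # 10 20 10

# z-scores as computed by standardize(): mean 4, sd 2
v <- c(2, 4, 6)
(v - mean(v)) / sd(v)                       # -1 0 1
```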

44 changes: 37 additions & 7 deletions README.Rmd
@@ -4,20 +4,50 @@ output: github_document

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# bmarketing

[![Travis Build Status](https://travis-ci.org/Quantargo/bmarketing.svg?branch=master)](https://travis-ci.org/Quantargo/bmarketing)
[![Coverage Status](https://img.shields.io/codecov/c/github/Quantargo/bmarketing/master.svg)](https://codecov.io/github/Quantargo/bmarketing?branch=master)
<!-- badges: start -->
<!-- badges: end -->

## Overview
The goal of the bmarketing package is to fit a decision tree model and use it to generate predictions for a provided dataset. It can also clean the dataset before the model is fitted or predictions are made.

The bmarketing dataset
## Installation


You can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("nandakallugjeri/bmarketing")
```
## Functions

Function documentation is available through `?functionname`. For example, run
`?checkNA` to see the documentation for the `checkNA()` function.

- `checkNA`
- `cleanData`
- `dtree`
- `dtreeplot`
- `dtreesummary`
- `dtreepredict`
- `dtreeperf`
- `translog`
- `trans`
- `standardize`
- `logit`

```{r}
library(bmarketing)
mytree<-dtree(bmarketing,"y")
dtreeplot(mytree)
```

<!-- TODO: Change README to make it more descriptive, add examples, etc. -->

61 changes: 53 additions & 8 deletions README.md
@@ -1,14 +1,59 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

[![Travis Build
Status](https://travis-ci.org/Quantargo/bmarketing.svg?branch=master)](https://travis-ci.org/Quantargo/bmarketing)
[![Coverage
Status](https://img.shields.io/codecov/c/github/Quantargo/bmarketing/master.svg)](https://codecov.io/github/Quantargo/bmarketing?branch=master)
# bmarketing

## Overview
<!-- badges: start -->

The bmarketing
dataset
<!-- badges: end -->

<!-- TODO: Change README to make it more descriptive, add examples, etc. -->
The goal of the bmarketing package is to fit a decision tree model and
use it to generate predictions for a provided dataset. It can also clean
the dataset before the model is fitted or predictions are made.

## Installation

You can install the development version from
[GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("nandakallugjeri/bmarketing")
```

## Functions

Function documentation is available through `?functionname`. For
example, run `?checkNA` to see the documentation for the `checkNA()`
function.

- `checkNA`
- `cleanData`
- `dtree`
- `dtreeplot`
- `dtreesummary`
- `dtreepredict`
- `dtreeperf`
- `translog`
- `trans`
- `standardize`
- `logit`

``` r
library(bmarketing)
#> Loading required package: rpart.plot
#> Loading required package: rpart
mytree<-dtree(bmarketing,"y")
dtreeplot(mytree)
```

<img src="man/figures/README-unnamed-chunk-2-1.png" width="100%" />