DSCI401
========================================================
author: Dr. Gregory J. Matthews, Ph.D.
date: Today
autosize: true
A Question
===
[***What is Data Science?***](https://mdsr-book.github.io/mdsr2e/ch-prologue.html#what-is-data-science)
What is Data Science?
====
1. It's a science!
- "a rigorous discipline combining elements of statistics and computer science, with roots in mathematics"
2. Domain knowledge
- Business, Public Health, Sports, Astronomy, etc.
3. Convert data to information
- "the distinction between data and information is the raison d'être of data science. Data scientists are people who are interested in converting the data that is now abundant into actionable information that always seems to be scarce."
What is this class?
===
Question: What are we going to do in this class?

Answer: Everything.
Languages
========================================================
What languages are we going to use?
- [R](https://www.r-project.org/)
- [Python](https://www.python.org/)
IDEs
========================================================
- [RStudio](https://www.rstudio.com/)
- [Google Colab](https://colab.research.google.com)
Reproducible Research
========================================================
- Source Control - [GitHub](https://github.com)
- Reproducible documentation - [R Markdown](https://rmarkdown.rstudio.com/), [knitr](https://www.r-project.org/nosvn/pandoc/knitr.html), etc.
Topics
============
1\. [Basics of R](https://github.com/gjm112/DSCI401/blob/main/R/Basics_of_R.R) and [Python](https://colab.research.google.com/drive/1yMjxaCJkCxVzF30IcqI6K9C8n2lKjqUm)
1a\. Python Objects: [Sets and Dictionaries](https://colab.research.google.com/drive/1nmWys8jdgJfpGyZsmJr-bdMtxkufClEl), [Strings and Tuples](https://colab.research.google.com/drive/1J87N_1NerOLBLz4zxFL42nIsj5GmoNEt), [Pandas](https://colab.research.google.com/drive/1N22wXpWLv3SKZ2WxU1Ntjgv63LkUz01G#scrollTo=lQ6cEWgiry5S)
1b\. [R Objects: Vectors, Matrices, Arrays, Lists, and Data Frames](https://github.com/gjm112/DSCI401/blob/main/R/vectors_lists_dataframes.R)
2\. [Data Wrangling - Single Table](https://github.com/gjm112/DSCI401/blob/main/R/Data_wrangling_on_one_table.Rmd)
3a\. [Data Visualization in R](https://github.com/gjm112/DSCI401/blob/main/R/Data_visualizations.Rmd)
3b\. [Data Visualization in Python](https://colab.research.google.com/drive/1-LbqqSotHupNGd507kn9bP4Ywv4M3hXM)
4\. [Data Wrangling - Multiple Tables](https://github.com/gjm112/DSCI401/blob/main/R/Data_wrangling_on_multiple_tables.Rmd)
5\. [Working with tidy data](https://github.com/gjm112/DSCI401/blob/main/R/tidy_data.Rmd)
6\. Data Ethics
Topics
============
7\. Statistical concepts
- What is statistics?
- Sampling Distributions
- Bootstrapping
- Statistical Modeling
8\. Intro to Predictive Modeling
9\. Supervised and Unsupervised Learning
10\. Simulation
Topics
============
12\. Interactive Data Visualization
- Shiny
- plotly
- matplotlib
- seaborn
13\. SQL and databases