# Data Science for Business
Most of the material here comes from this [book](https://www.amazon.com/Data-Science-Business-Data-Analytic-Thinking-ebook/dp/B00E6EQ3Xs).

## Main structure of data science in the workplace
The main structure is the Cross-Industry Standard Process for Data Mining (CRISP-DM).
See [wiki](https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining).

At the center of data mining is the automated discovery of patterns, knowledge, and regularities from data.

### Classic Tasks

* Classification
* Regression
* Similarity matching
* Clustering
* Co-occurrence grouping
* Profiling
* Link prediction
* Data reduction
* Causal modeling
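
Two of these tasks fit in a few lines of code. The following is a minimal sketch in plain Python, not a method taken from the book: similarity matching via cosine similarity over purchase counts, and co-occurrence grouping by counting item pairs that appear in the same basket. All data and the names `customers`, `baskets`, `most_similar`, and `co_occurrence` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy data: purchase counts per product category.
customers = {
    "alice": {"books": 5, "music": 1, "games": 0},
    "bob": {"books": 4, "music": 2, "games": 0},
    "carol": {"books": 0, "music": 1, "games": 6},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(name, data):
    """Similarity matching: the other entity whose profile is closest to `name`."""
    others = [other for other in data if other != name]
    return max(others, key=lambda other: cosine_similarity(data[name], data[other]))


# Hypothetical market baskets (one set of items per transaction).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"bread", "milk"},
    {"beer", "bread", "chips"},
]


def co_occurrence(transactions):
    """Co-occurrence grouping: count how often each pair of items is bought together."""
    pairs = Counter()
    for basket in transactions:
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(basket), 2):
            pairs[pair] += 1
    return pairs


print(most_similar("alice", customers))  # bob
print(co_occurrence(baskets).most_common(3))
```

Cosine similarity compares the mix of categories rather than overall purchase volume, which is one common choice for similarity matching; other distance measures would work too.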

### Phases of CRISP-DM

* Business Understanding: formulate the business problem as one or more unambiguous data mining problems.
    * What exactly do we want to do?
    * How exactly would we do it?
    * What parts of this use scenario constitute possible data mining models?
* Data Understanding
    * How reliable is the data for our task?
    * What is the cost of getting the data?
    * How does the data affect our approach? Superficially similar tasks can call for quite different approaches because different data are available.
    * Together, business understanding and data understanding determine the possible solutions.
* Data Preparation
    * Creative, sensible, and business-minded variable crafting
    * Systematic data processing and cleaning
    * Pay special attention to data leakage: a variable in the historical data that gives information about the target but would not actually be available when the decision has to be made.
* Modeling
    * The most technical and scientific part; the other phases are more of an art (half-joking).
* Evaluation: in the business context, not just in the lab.
    * Quantitative and qualitative assessments
    * Stakeholder considerations: pros and cons
    * Comprehensibility of the model, or how to make the model more comprehensible
    * Do this *before deployment*.
* Deployment
    * How susceptible is the model to changes in the behaviour of the data source?
    * The deployed model is often re-implemented by developers for the production system, so it is advisable to include them in data science projects early.
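
The data leakage warning under Data Preparation can be made concrete. A minimal sketch of two common guards in plain Python, assuming a hypothetical churn record with made-up feature timestamps: drop features whose values only become known after the decision date, and fit preprocessing statistics (here, a simple standardizer) on the training split only. The helper names `drop_leaky_features`, `fit_scaler`, and `transform` are illustrative and do not come from the book or any library.

```python
import statistics
from datetime import date


# Guard 1: keep only features whose values are known before the decision date.
def drop_leaky_features(record, recorded_on, decision_date):
    """Return the subset of `record` recorded strictly before `decision_date`."""
    return {name: value for name, value in record.items()
            if recorded_on[name] < decision_date}


# Hypothetical churn example: the cancellation call is logged *after* the
# date on which churn must be predicted, so using it would leak the outcome.
record = {"tenure_months": 14, "monthly_spend": 42.0, "called_to_cancel": 1}
recorded_on = {
    "tenure_months": date(2023, 1, 1),
    "monthly_spend": date(2023, 1, 1),
    "called_to_cancel": date(2023, 3, 15),
}
safe = drop_leaky_features(record, recorded_on, decision_date=date(2023, 2, 1))
print(safe)  # {'tenure_months': 14, 'monthly_spend': 42.0}


# Guard 2: split first, then fit preprocessing statistics on the training
# split only, so nothing about the held-out data leaks into training.
def fit_scaler(values):
    """Return (mean, sample standard deviation) of the training values."""
    return statistics.mean(values), statistics.stdev(values)


def transform(values, mean, sd):
    """Standardize values using statistics fitted on the training split."""
    return [(v - mean) / sd for v in values]


spend = [40.0, 42.0, 38.0, 55.0, 61.0, 47.0]  # hypothetical monthly spend
# A fixed split keeps the sketch short; in practice use a random or time-based split.
train, test = spend[:4], spend[4:]
mu, sd = fit_scaler(train)             # fitted on train only
train_scaled = transform(train, mu, sd)
test_scaled = transform(test, mu, sd)  # reuses the training statistics
```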

### Side Remark: Managing a data science team
* Data science projects are exploratory by nature; they are closer to research and development than to engineering.
* They iterate on approaches and strategy rather than on software designs.
* Outcomes are far less certain.
* The results of each step can change the understanding of the problem.
* Do not engineer solutions directly for deployment: most of the effort should go into analytical tests, pilot studies, and throwaway prototypes that reduce risk.
* When building a data science team, the most important qualities are:
    * Formulating problems well
    * Making reasonable assumptions in the face of ill-structured problems
    * Prototyping solutions quickly
    * Designing experiments that represent good investments
    * Analyzing the results
    * NOT traditional software engineering expertise

### Related Skills
* Statistics
* Database querying
* Data warehousing
* Machine learning / applied statistics / pattern recognition
* Answering business questions with these techniques:
    * Who are the most profitable customers? *Database querying.*
    * Is there really a difference between the profitable customers and the average customer? *Hypothesis testing.*
    * But who really are these customers? Can I characterize them? *Find patterns that differentiate profitable customers from unprofitable ones.*
    * Will some particular new customer be profitable? How much revenue should I expect this customer to generate? *Predictive modeling.*
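
The first two questions can be sketched with Python's standard library, assuming a hypothetical `orders` table (customer, profit) in an in-memory SQLite database. The query answers "who are the most profitable customers?"; for "is there really a difference?", a two-sided permutation test on per-order profit is used here as one possible hypothesis test (a t-test is another common choice). All table names, customers, and numbers are made up.

```python
import random
import sqlite3

# Hypothetical toy data: one row per order, with the profit it generated.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, profit REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 120.0), ("ann", 80.0), ("ben", 15.0), ("cat", 60.0),
     ("cat", 70.0), ("dan", 5.0), ("eve", 10.0), ("eve", 12.0)],
)

# Q1: "Who are the most profitable customers?" -> a database query.
top = conn.execute(
    """SELECT customer, SUM(profit) AS total
       FROM orders
       GROUP BY customer
       ORDER BY total DESC
       LIMIT 2"""
).fetchall()
print(top)  # [('ann', 200.0), ('cat', 130.0)]

# Q2: "Is there really a difference?" -> compare per-order profit of the
# top customers with everyone else.
top_names = {name for name, _ in top}
rows = conn.execute("SELECT customer, profit FROM orders").fetchall()
conn.close()
top_profits = [profit for name, profit in rows if name in top_names]
rest_profits = [profit for name, profit in rows if name not in top_names]


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means between a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)


p = permutation_test(top_profits, rest_profits)
print(f"p-value ~ {p:.3f}")
```

With this toy data only 2 of the 70 possible ways to split the eight orders into two groups of four are as extreme as the observed one, so the estimated p-value comes out around 0.03. With real data the answer depends on sample sizes and variance.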

### Summary
* Several fields of study are closely related to data science; each task type serves a different purpose and has an associated set of solution techniques.
* Data scientists combine these components.
* A successful data science project involves an intelligent compromise between what the data can do and the project's goals.
* Keep in mind how the data mining results will be used, and let this inform the data mining process itself.