DSChallangeRepo
diff --git a/CapitalOne/Part1/codetest_test.txt b/CapitalOne/Part1/codetest_test.txt
Lines changed: 1001 additions & 0 deletions
diff --git a/CapitalOne/Part1/codetest_train.txt b/CapitalOne/Part1/codetest_train.txt
Lines changed: 5001 additions & 0 deletions
diff --git a/CapitalOne/codetest_instructions.txt b/CapitalOne/codetest_instructions.txt
Lines changed: 106 additions & 0 deletions
diff --git a/CivisAnalytics/CivisTest.docx.docx b/CivisAnalytics/CivisTest.docx.docx
8.36 KB
diff --git a/Mattersight/documents-export-2015-11-15/CA_Crimes_Data Dictionary.csv b/Mattersight/documents-export-2015-11-15/CA_Crimes_Data Dictionary.csv
Lines changed: 1 addition & 0 deletions
diff --git a/Mattersight/documents-export-2015-11-15/EXAMPLEOUTPUT_March_2015_Rankings_Last_First_v1.csv b/Mattersight/documents-export-2015-11-15/EXAMPLEOUTPUT_March_2015_Rankings_Last_First_v1.csv
Lines changed: 25 additions & 0 deletions
diff --git a/Mattersight/documents-export-2015-11-15/Mattersight Data Squire Assessment Directions_20150917.docx b/Mattersight/documents-export-2015-11-15/Mattersight Data Squire Assessment Directions_20150917.docx
14.2 KB
diff --git a/Mattersight/documents-export-2015-11-15/SQL Quiz.docx b/Mattersight/documents-export-2015-11-15/SQL Quiz.docx
13.2 KB
@@ -0,0 +1,106 @@
+--------------------------------------------------------------------------------
+|                  Capital One Labs Coding Challenge                           | 
+--------------------------------------------------------------------------------
+The purpose of this test is to test your ability to write software to collect, 
+normalize, store, analyze and visualize “real world” data. The test is designed 
+to take about four hours, but it is not timed. Please try to deliver your 
+results within 24 hours.
+
+You may also use any tools or software on your computer, or that are freely 
+available on the Internet. We prefer that you use simpler tools to more complex 
+ones and that you are “lazy” in the sense of using third party APIs and 
+libraries as much as possible. (However, use of obscure, undocumented “black 
+box” libraries is discouraged.)
+
+Do as much as you can, as well as you can. Prefer efficient, elegant solutions. 
+Prefer scripted analysis to unrepeatable use of GUI tools. For data security and
+transfer time reasons, you have been given a relatively small data file. Prefer 
+solutions that do not require the full data set to be stored in memory.
+
+There is certainly no requirement that you have previous experience working on 
+this kind of problem, or that you be able to finish everything. Rather, we are
+looking for an ability to research and select the appropriate tools for an
+open-ended problem and implement something meaningful. We are also interested
+in your ability to work on a team, which means considering how to package and
+deliver your results in a way that makes it easy for us to review them.
+Undocumented code and data dumps are virtually useless; commented code and a
+clear writeup with elegant visuals are ideal. Also consider how asking targeted
+questions to members of our team may allow you to get more done in less time.
+
+
+--------------------------------------------------------------------------------
+|         Code Test Part 1: Model building on a synthetic dataset              | 
+--------------------------------------------------------------------------------
+
+We have provided two tab-delimited files along with these instructions:
+
+    - codetest_train.txt: 5,000 records x 254 features + 1 target (~18.0MB)
+    - codetest_test.txt : 1,000 records x 254 features            (~ 3.6MB)
+
+These two synthetic datasets were generated using the same underlying data 
+model. Your goal is to build a predictive model using the data in the training 
+dataset to predict the withheld target values from the test set. 
+
+You may use any tools available to you for this task. Ultimately, we will
+assess predictive accuracy on the test set using the mean squared error metric.
+You should return to us the following:
+
+    - A 1,000 x 1 text file containing 1 prediction per line for each record
+        in the test dataset.
+
+    - A brief writeup describing the techniques you used to generate the
+        predictions. Details such as important features and your estimates of 
+        predictive performance are helpful here, though not strictly 
+        necessary.
+
+    - (Optional) An implementable version of your model. What this would look
+        like largely depends on the methods you used, but could include things
+        like source code, a pickled Python object, a PMML file, etc. Please
+        do not include any compiled executables. If you choose not to submit
+        this, please ensure your modeling methods are adequately described 
+        in the writeup.
+
+
+--------------------------------------------------------------------------------
+|                       Code Test Part 2: Baby Names!                          |
+--------------------------------------------------------------------------------
+
+In this section, you will acquire and analyze a real dataset on baby name 
+popularity provided by the Social Security Administration. To warm up, we will 
+ask you a few simple questions that can be answered by inspecting the data.
+
+A) Descriptive analysis
+
+The data can be downloaded in zip format from:
+http://www.ssa.gov/oact/babynames/state/namesbystate.zip
+
+1.  Please describe the format of the data files. Can you identify any 
+    limitations or distortions of the data?
+2.  What is the most popular name of all time? (Of either gender.)
+3.  What is the most gender ambiguous name in 2013? 1945?
+4.  Of the names represented in the data, find the name that has had the largest 
+    percentage increase in popularity since 1980. Largest decrease?
+5.  Can you identify names that may have had an even larger increase or decrease 
+    in popularity?
+
+
+B) Onward to Insight!
+
+What insight can you extract from this dataset? Feel free to combine the baby 
+names data with other publicly available datasets or APIs, but be sure to include 
+code for accessing any alternative data that you use.
+
+This is an open-ended question and you are free to answer as you see fit. In 
+fact, we would love it if you find an interesting way to look at the data that 
+we haven't thought of! 
+
+Please provide us with both your code and an informative writeup of your 
+results. The code should be in a runnable form. Do not assume that we have a 
+copy of the data set or that we are familiar with the build procedures for your 
+chosen language.  
+
+If you do not have time to implement your solution, a detailed, actionable 
+description of how you would attack the problem would also count in your favor.
+
+
+                                  Good luck!
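Questions 3 and 4 of Part 2 leave "gender ambiguous" and "percentage increase in popularity" undefined, so any answer depends on the definitions chosen. The sketch below shows one possible reading, not the expected solution. It assumes each record is a `(state, sex, year, name, count)` tuple, which the instructions do not specify. It scores ambiguity as the minority-sex share of a name's births, and it measures popularity as a name's share of all recorded births in a year. The function names and the sample rows are made up for illustration.

```python
from collections import defaultdict


def totals_by_year(rows):
    """Aggregate (state, sex, year, name, count) rows into {year: {name: {sex: n}}}."""
    totals = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for _state, sex, year, name, count in rows:
        totals[int(year)][name][sex] += int(count)
    return totals


def most_ambiguous(totals, year):
    """Return the name whose female/male split is closest to 50/50 in `year`.

    Assumed definition: score = min(F, M) / (F + M), so 0.5 is a perfect split.
    Ties are broken in favour of the name with more births.
    """
    def key(item):
        _name, by_sex = item
        f, m = by_sex.get("F", 0), by_sex.get("M", 0)
        return (min(f, m) / (f + m), f + m)

    return max(totals[year].items(), key=key)[0]


def percent_change(totals, start, end):
    """Percent change in each name's share of recorded births from `start` to `end`.

    Only names present in both years are returned; a name missing from either
    year has no defined percentage change under this definition.
    """
    def shares(year):
        counts = {n: sum(by_sex.values()) for n, by_sex in totals[year].items()}
        total = sum(counts.values())
        return {n: c / total for n, c in counts.items()}

    old, new = shares(start), shares(end)
    return {n: 100.0 * (new[n] - old[n]) / old[n] for n in old if n in new}


# Made-up sample rows, not taken from the SSA files.
sample_rows = [
    ("CA", "F", "1980", "Avery", "10"),
    ("CA", "M", "1980", "Avery", "30"),
    ("CA", "F", "1980", "Mary", "60"),
    ("CA", "F", "2013", "Avery", "45"),
    ("CA", "M", "2013", "Avery", "55"),
    ("TX", "F", "2013", "Mary", "100"),
]

if __name__ == "__main__":
    totals = totals_by_year(sample_rows)
    print(most_ambiguous(totals, 2013))
    print(percent_change(totals, 1980, 2013))
```

Because `percent_change` skips names that are absent in either year, it also hints at question 5: names that first appear or vanish between the two years may have changed more than any name this calculation can report.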
@@ -0,0 +1 @@
+Column,Meaning
+Community.Area,A community area in Chicago
+Week,Week in the year (starting on Sunday)
+Year,Year
+Weeek,Week in the year (starting on Sunday)
+Crimes,Reported crimes for the community area for a given week in a given year
+Crimes.LastWeek,Reported crimes for the community area for the previous week of a given year
+Arrests.LastWeek,Number of arrests for the community area for the previous week of a given year
+Domestics.LastWeek,Number of domestic crimes reported for the community area for the previous week of a given year
+Month,Calendar month
+MinDay,"Smallest number calendar day in the specified week (e.g. if week starts on Feb 7 and ends Feb 13, MinDay=7)"
+MaxDay,"Largest number calendar day in the specified week (e.g. if week starts on Feb 7 and ends Feb 13, MaxDay=13)"
+CommonCrimes.LastWeek,Number of common crimes (i.e. crimes with codes representing greater than 33% of all reported crimes) reported for the community area for the previous week of a given year
 
@@ -0,0 +1,25 @@
+Community.Area,Week,PREDICTED.RANK
+1,1,2
+2,1,3
+3,1,4
+4,1,5
+5,1,6
+6,1,1
+1,2,3
+2,2,4
+3,2,5
+4,2,6
+5,2,1
+6,2,2
+1,3,4
+2,3,5
+3,3,6
+4,3,1
+5,3,2
+6,3,3
+1,4,5
+2,4,6
+3,4,1
+4,4,2
+5,4,3
+6,4,4
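In the example output above, each week's `PREDICTED.RANK` values form a permutation of 1 to 6 across the six community areas. The following sketch shows one way such a file could be produced from per-area predictions. It assumes rank 1 goes to the area with the highest predicted crime count and that ties are broken by area id; the assessment directions are in a .docx file not shown here, so both rules are guesses. The function names and the input dictionary are hypothetical.

```python
import csv
import io
from collections import defaultdict


def rank_within_week(predictions):
    """Turn {(area, week): predicted_crimes} into sorted (area, week, rank) rows.

    Assumption: the highest predicted value in a week gets rank 1, and ties
    are broken by the smaller area id.
    """
    by_week = defaultdict(list)
    for (area, week), value in predictions.items():
        by_week[week].append((area, value))

    rows = []
    for week in sorted(by_week):
        ordered = sorted(by_week[week], key=lambda av: (-av[1], av[0]))
        ranks = {area: i for i, (area, _value) in enumerate(ordered, start=1)}
        for area in sorted(ranks):
            rows.append((area, week, ranks[area]))
    return rows


def to_csv(rows):
    """Render the rows with the header used in the example output file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Community.Area", "Week", "PREDICTED.RANK"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up predictions for three areas over two weeks.
    demo = {(1, 1): 10.0, (2, 1): 30.0, (3, 1): 20.0,
            (1, 2): 5.0, (2, 2): 1.0, (3, 2): 9.0}
    print(to_csv(rank_within_week(demo)))
```

Rows are emitted week by week with areas in ascending order, matching the layout of the example file.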