| 1 | +-------------------------------------------------------------------------------- |
| 2 | +| Capital One Labs Coding Challenge | |
| 3 | +-------------------------------------------------------------------------------- |
| 4 | +The purpose of this test is to assess your ability to write software to collect, |
| 5 | +normalize, store, analyze and visualize “real world” data. The test is designed |
| 6 | +to take about four hours, but it is not timed. Please try to deliver your |
| 7 | +results within 24 hours. |
| 8 | + |
| 9 | +You may use any tools or software on your computer, or that are freely |
| 10 | +available on the Internet. We prefer that you use simpler tools to more complex |
| 11 | +ones and that you are “lazy” in the sense of using third party APIs and |
| 12 | +libraries as much as possible. (However, use of obscure, undocumented “black |
| 13 | +box” libraries is discouraged.) |
| 14 | + |
| 15 | +Do as much as you can, as well as you can. Prefer efficient, elegant solutions. |
| 16 | +Prefer scripted analysis to unrepeatable use of GUI tools. For data security and |
| 17 | +transfer time reasons, you have been given a relatively small data file. Prefer |
| 18 | +solutions that do not require the full data set to be stored in memory. |
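
As a minimal sketch of the "not stored in memory" preference above, a tab-delimited file can be processed one row at a time. The function name `running_mean`, the presence of a header row, the column name `target`, and the treatment of blank cells as missing values are all assumptions made for illustration; they are not stated in the instructions.

```python
import csv
import io


def running_mean(lines, column):
    """Stream tab-delimited rows and return the mean of one named column.

    Only one row is held in memory at a time. Blank or non-numeric cells
    are skipped (an assumption about how missing values appear).
    Returns None if no numeric value was found.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    count, mean = 0, 0.0
    for row in reader:
        try:
            value = float(row.get(column, ""))
        except (TypeError, ValueError):
            continue  # missing or non-numeric cell
        count += 1
        mean += (value - mean) / count  # incremental mean update
    return mean if count else None


# Tiny in-memory stand-in for a real file opened with open(path, newline="").
sample = io.StringIO("target\tf_0\n1.0\t0.5\n3.0\t\n")
print(running_mean(sample, "target"))  # 2.0
```

The same loop shape works for any statistic that can be updated incrementally, so memory use stays constant as the file grows.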
| 19 | + |
| 20 | +There is certainly no requirement that you have previous experience working on |
| 21 | +this kind of problem, or that you be able to finish everything. Rather, we are |
| 22 | +looking for an ability to research and select the appropriate tools for an |
| 23 | +open-ended problem and implement something meaningful. We are also interested in your |
| 24 | +ability to work on a team, which means considering how to package and deliver |
| 25 | +your results in a way that makes it easy for us to review them. Undocumented |
| 26 | +code and data dumps are virtually useless; commented code and a clear writeup |
| 27 | +with elegant visuals are ideal. Also consider how asking targeted questions to |
| 28 | +members of our team may allow you to get more done in less time. |
| 29 | + |
| 30 | + |
| 31 | +-------------------------------------------------------------------------------- |
| 32 | +| Code Test Part 1: Model building on a synthetic dataset | |
| 33 | +-------------------------------------------------------------------------------- |
| 34 | + |
| 35 | +We have provided two tab-delimited files along with these instructions: |
| 36 | + |
| 37 | + - codetest_train.txt: 5,000 records x 254 features + 1 target (~18.0MB) |
| 38 | + - codetest_test.txt : 1,000 records x 254 features (~ 3.6MB) |
| 39 | + |
| 40 | +These two synthetic datasets were generated using the same underlying data |
| 41 | +model. Your goal is to build a predictive model using the data in the training |
| 42 | +dataset to predict the withheld target values from the test set. |
| 43 | + |
| 44 | +You may use any tools available to you for this task. Ultimately, we will |
| 45 | +assess predictive accuracy on the test set using the mean squared error metric. |
| 46 | +You should return to us the following: |
| 47 | + |
| 48 | + - A 1,000 x 1 text file containing 1 prediction per line for each record |
| 49 | + in the test dataset. |
| 50 | + |
| 51 | + - A brief writeup describing the techniques you used to generate the |
| 52 | + predictions. Details such as important features and your estimates of |
| 53 | + predictive performance are helpful here, though not strictly |
| 54 | + necessary. |
| 55 | + |
| 56 | + - (Optional) An implementable version of your model. What this would look |
| 57 | + like largely depends on the methods you used, but could include things |
| 58 | + like source code, a pickled Python object, a PMML file, etc. Please |
| 59 | + do not include any compiled executables. If you choose not to submit |
| 60 | + this, please ensure your modeling methods are adequately described |
| 61 | + in the writeup. |
| 62 | + |
| 63 | + |
| 64 | +-------------------------------------------------------------------------------- |
| 65 | +| Code Test Part 2: Baby Names! | |
| 66 | +-------------------------------------------------------------------------------- |
| 67 | + |
| 68 | +In this section, you will acquire and analyze a real dataset on baby name |
| 69 | +popularity provided by the Social Security Administration. To warm up, we will |
| 70 | +ask you a few simple questions that can be answered by inspecting the data. |
| 71 | + |
| 72 | +A) Descriptive analysis |
| 73 | + |
| 74 | +The data can be downloaded in zip format from: |
| 75 | +http://www.ssa.gov/oact/babynames/state/namesbystate.zip |
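
Once the archive has been downloaded, it can be read without unpacking it to disk. The sketch below assumes the zip holds one `.TXT` file per state and that each line has five comma-separated fields in the order state, sex, year, name, count; that layout should be verified against the actual files. `iter_records` is a hypothetical helper name.

```python
import csv
import io
import zipfile


def iter_records(zip_source):
    """Yield (state, sex, year, name, count) tuples from the archive.

    zip_source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            if not member.upper().endswith(".TXT"):
                continue  # skip any readme or other non-data file
            with archive.open(member) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.reader(text):
                    if len(row) != 5:
                        continue  # ignore lines that do not match the assumed layout
                    state, sex, year, name, count = row
                    yield state, sex, int(year), name, int(count)
```

Because the function is a generator, records are produced one at a time and the full data set never needs to sit in memory.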
| 76 | + |
| 77 | +1. Please describe the format of the data files. Can you identify any |
| 78 | + limitations or distortions of the data? |
| 79 | +2. What is the most popular name of all time? (Of either gender.) |
| 80 | +3. What is the most gender-ambiguous name in 2013? In 1945? |
| 81 | +4. Of the names represented in the data, find the name that has had the largest |
| 82 | + percentage increase in popularity since 1980. Largest decrease? |
| 83 | +5. Can you identify names that may have had an even larger increase or decrease |
| 84 | + in popularity? |
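
Questions 3 and 4 leave "gender ambiguous" and "popularity" undefined, so any answer has to pick a definition. The sketch below is one possible choice, not the intended one: ambiguity is scored as min(F, M) / (F + M), so 0.5 means an even split, and popularity is a name's share of all recorded births in a year. Records are assumed to be (state, sex, year, name, count) tuples, and both function names are hypothetical.

```python
from collections import defaultdict


def gender_ambiguity(records, year):
    """Return {name: score} for one year; 0.5 is an even F/M split, 0.0 is one-sided."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for _state, sex, rec_year, name, count in records:
        if rec_year == year and sex in ("F", "M"):
            totals[name][sex] += count
    return {
        name: min(c["F"], c["M"]) / (c["F"] + c["M"])
        for name, c in totals.items()
        if c["F"] + c["M"] > 0
    }


def percent_change(records, start_year, end_year):
    """Return {name: percent change in share of births} between two years.

    Names missing from either year are skipped, because a percent change
    from or to an unrecorded count is undefined.
    """
    counts = {start_year: defaultdict(int), end_year: defaultdict(int)}
    for _state, _sex, year, name, count in records:
        if year in counts:
            counts[year][name] += count
    start_total = sum(counts[start_year].values())
    end_total = sum(counts[end_year].values())
    changes = {}
    if not start_total or not end_total:
        return changes
    for name, start_count in counts[start_year].items():
        end_count = counts[end_year].get(name)
        if not start_count or not end_count:
            continue
        start_share = start_count / start_total
        end_share = end_count / end_total
        changes[name] = 100.0 * (end_share - start_share) / start_share
    return changes
```

Each function makes a single pass over `records`, so a generator has to be recreated (or materialized as a list) before it is passed to a second call. Very rare names can score as highly ambiguous or show extreme percent changes, so a minimum-count filter may be worth adding.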
| 85 | + |
| 86 | + |
| 87 | +B) Onward to Insight! |
| 88 | + |
| 89 | +What insight can you extract from this dataset? Feel free to combine the baby |
| 90 | +names data with other publicly available datasets or APIs, but be sure to include |
| 91 | +code for accessing any alternative data that you use. |
| 92 | + |
| 93 | +This is an open-ended question and you are free to answer as you see fit. In |
| 94 | +fact, we would love it if you find an interesting way to look at the data that |
| 95 | +we haven't thought of! |
| 96 | + |
| 97 | +Please provide us with both your code and an informative writeup of your |
| 98 | +results. The code should be in a runnable form. Do not assume that we have a |
| 99 | +copy of the data set or that we are familiar with the build procedures for your |
| 100 | +chosen language. |
| 101 | + |
| 102 | +If you do not have time to implement your solution, a detailed, actionable |
| 103 | +description of how you would attack the problem would also count in your favor. |
| 104 | + |
| 105 | + |
| 106 | + Good luck! |