Log transform for US$ variables #13
Comments
I think in my pre-pummeler attempt at this I did |
I don't understand... 1+sign(x)? I just looked through the codebook more carefully. Most (all?) of these are truncated below ("Rounded & bottom-coded"), so I think something like my solution actually makes sense. Sure, it won't be a normal distribution, but if we're featurizing using KDE then it'll just have a weird bump in the lower tail. Of course, my solution doesn't work when x = min(x), so I guess now I'm proposing:
|
I was a little off before: what I want is |
OK, finally went through case-by-case using the sampled data. Here are the only two monetary variables that I found that can actually be negative:
So maybe we just do categorical variables for whether INTP/SEMP are non-zero? But I still don't know what transform to use for positive / negative. Here are our two proposals, neither looks great: |
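For the "categorical variables for whether INTP/SEMP are non-zero" idea, here is a rough sketch of what that could look like. It assumes the data sits in a pandas DataFrame with columns named `INTP` and `SEMP`. The helper name `add_nonzero_flags`, the `_nonzero` suffix, and treating missing values as zero are all made up for illustration, not something the existing code does.

```python
import pandas as pd

def add_nonzero_flags(df, columns=("INTP", "SEMP")):
    """Return a copy of df with a 0/1 indicator column per monetary variable.

    Hypothetical helper: the name, the "_nonzero" suffix, and the choice to
    count missing values as zero are assumptions for this sketch.
    """
    out = df.copy()
    for col in columns:
        # 1 when the amount is a gain or a loss, 0 when it is zero or missing
        out[col + "_nonzero"] = (out[col].fillna(0) != 0).astype(int)
    return out

# Toy rows, not real survey data
sample = pd.DataFrame({"INTP": [0, -200, 1500], "SEMP": [0, 0, 30000]})
flagged = add_nonzero_flags(sample)
```

This only captures zero vs. non-zero; it does not answer which transform to use for the positive and negative amounts themselves.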
Update: forgot about Also what's |
IIRC |
Here are the variables I think we should log transform, all representing income/wages/etc.
Only issue is that some of these variables can be negative (for losses). So I guess the transformation for those should be x = log(x - min(x)) or something?
Once we figure that out it should be easy to put this into get_dummies.
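To make the `log(x - min(x))` idea concrete, here is a minimal NumPy sketch. It shows that the plain version gives `-inf` at the minimum (since `log(0)` is undefined), and one possible workaround: adding a positive offset before taking the log. The function name `shifted_log` and the default `offset=1.0` are assumptions for illustration only, not a transform anyone has settled on.

```python
import numpy as np

def shifted_log(x, offset=1.0):
    """Compute log(x - min(x) + offset) elementwise.

    Hypothetical helper: the offset (assumed > 0) keeps the smallest value
    finite. With offset=1.0 the minimum maps to log(1) = 0.
    """
    x = np.asarray(x, dtype=float)
    return np.log(x - x.min() + offset)

# Toy values including a loss; not real survey data
values = np.array([-500.0, 0.0, 1200.0, 45000.0])

# Plain log(x - min(x)): the minimum becomes log(0) = -inf
with np.errstate(divide="ignore"):
    naive = np.log(values - values.min())

shifted = shifted_log(values)
```

Note that the result depends on the minimum of whatever data is passed in, so the minimum would need to be computed once and reused if the same transform has to be applied to other samples.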