Don't need map() for summarizing models #33
To me, this is beautiful, simple and logical. Split the dataset by cyl. For each part, run the following steps: fit the model, extract the model fit, and finally bind it all together, while preserving the name of the split variable in the output. Oh, and all the output one could need.
library(dplyr)
library(broom)
mtcars %>%
group_by(cyl) %>%
group_map(.f=~lm(mpg ~ wt, data=.x) %>% glance()) %>%
bind_rows(.id = "cyl")
#> # A tibble: 3 × 13
#> cyl r.squared adj.r…¹ sigma stati…² p.value df logLik AIC BIC devia…³
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0.509 0.454 3.33 9.32 0.0137 1 -27.7 61.5 62.7 99.9
#> 2 2 0.465 0.357 1.17 4.34 0.0918 1 -9.83 25.7 25.5 6.79
#> 3 3 0.423 0.375 2.02 8.80 0.0118 1 -28.7 63.3 65.2 49.2
#> # … with 2 more variables: df.residual <int>, nobs <int>, and abbreviated
#> # variable names ¹adj.r.squared, ²statistic, ³deviance
Created on 2022-08-06 by the reprex package (v2.0.1) |
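One detail in the printed output above: the cyl column reads 1, 2, 3 rather than 4, 6, 8, because group_map() returns an unnamed list and bind_rows(.id = ...) then falls back to list positions. As a minimal sketch of one way to keep the real labels (base R only; the names `parts` and `r2` are illustrative and not from the thread), split() names each piece after its group:

```r
# Split mtcars by cyl; split() names the pieces "4", "6", "8"
parts <- split(mtcars, mtcars$cyl)

# Fit mpg ~ wt within each piece and extract R-squared
r2 <- sapply(parts, function(d) summary(lm(mpg ~ wt, data = d))$r.squared)

# One row per group, with the actual cyl labels preserved
data.frame(cyl = names(r2), r.squared = unname(r2))
```

This only extracts R-squared; it is meant to show where the labels come from, not to replace the glance() output.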
A point I make in the essay is that intermediate steps are GOOD for beginning coders, the group my essay focuses on. |
I'm a tidyverse advocate. But I have to concede that
by(
mtcars,
mtcars$cyl,
\(x) summary(lm(mpg ~ wt, data = x))
)
And use an S3 class to make a list with customized print methods (like If you want the results in a Anyway, here is the code:
by(
mtcars,
mtcars$cyl,
\(x) {
res <- summary(lm(mpg ~ wt, data = x))
c(
res[c("r.squared", "adj.r.squared")],
"fstatistic"=res$fstatistic[1],
"p-value"=res$coefficients[2, "Pr(>|t|)"]
)
}
) |> array2DF()
# output
# mtcars$cyl r.squared adj.r.squared fstatistic.value p-value
# 1 4 0.5086326 0.4540362 9.316233 0.01374278
# 2 6 0.4645102 0.3574122 4.337245 0.09175766
# 3 8 0.4229655 0.3748793 8.795985 0.01179281 |
The admisc::using(
warpbreaks,
coef(lm(breaks ~ wool)),
split.by = tension
)
# (Intercept) woolB
# L 44.556 -16.333
# M 24.000 4.778
# H 24.556 -5.778 |
People keep forgetting the central point of my Tidyverse Skeptic essay: The Tidyverse is an awful environment for R learners who lack coding background. This discussion here, which debates whether one complex Tidyverse solution is better than another, ignores that basic fact. |
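Right, but the evidence you point to in support of your claim in this section of the readme is (purposefully?) misleading. Shouldn't you want to correct that? Your "evidence" here is specifically that in order to use the tidyverse to solve this problem, you must use 3 "different" [1] map() functions.
"The R learner here must learn two different FP map functions for this particular example. This is an excellent example of Tidy's cognitive overload problem."
That is incorrect. You need not use any. In fact, it is not recommended that you use any.
Footnotes
1. This is also misleading in two ways. First, map() and map_dbl() have the exact same interface and differ only in what they return, so there is very little to learn here, by design. Second, even if you did have to learn map(), you don't need map_dbl() there! Just use map() and get your output as a list!
|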
May I ask that you not use words like "purposely"? Among other things, it
seems to imply that I have some sort of enmity towards the Tidyverse
people, which is certainly not the case. I try to dispel that notion in the
introduction.
Your saying that it is recommended not to use map functions is quite
interesting, and seems to be at odds with everything I've heard the
Tidyverse people say, including Hadley, as well as general functional
programming tenets. Sounds like you might have references that I might look
at. Please provide links, thanks.
…On Mon, Feb 24, 2025, 1:43 PM Eric R. Scott ***@***.***> wrote:
Right, but the evidence you point to to support your claim in this section
of the readme is (purposefully?) misleading. Shouldn't you want to correct
that? Your "evidence" here is specifically that in order to use the
tidyverse to solve this problem, you *must* use 3 "different" [1]
map() functions.
The R learner here must learn two different FP map functions for this
particular example. This is an excellent example of Tidy's cognitive
overload problem.
That is incorrect. You need not use any. In fact, it is not recommended
that you use any.
Footnotes
1. This is also misleading in two ways. First, map(), and map_dbl() have
the exact same interface and only differ in what they return so there is
very little to learn here, by design. Second, even if you did *have*
to learn map(), you don't need map_dbl() there! Just use map() and get
your output as a list!
|
Not sure I understand, because the |
My own view -- different people have different views -- is that in teaching R learners who lack prior coding background, one should keep it as simple as possible. Abstractions that are second nature to experienced coders are not easy for such learners to understand, let alone use. So to me, just because a construct is part of base-R does not mean it is appropriate for these learners. |
If the problem is that some FP approaches are too complex for beginners because overly complicated abstractions demand that you bend your mind into a tesseract, then yes, I totally agree: overly complex abstractions (such as the ones in the purrr package) should be avoided for beginners. If the problem is that FP is abstract, then by stretching that argument we can say that everything is abstract. Let me put another argument. Take the dataset
For the whole country:
For every region:
This is a very basic task that you would learn in every introductory course on Stata (and SPSS and every other statistical package). How do you do the same task in R? For the whole country it is pretty straightforward:
But now we need to repeat the same for every region. How should you teach this to a beginner?
a)
b)
c)
I totally agree that c) is total mind-bending confusion for any beginner. I would never recommend teaching this to a non-coder. But, feel free to disagree, I think a) is far easier than b) for a non-coder. a) is almost the same as the Stata code. The most complex part is the use of
I'm a sociologist and a non-coder myself, and I have taught a couple of sociologists, so I can tell, from my experience, that a social scientist who is a beginner in R and a non-coder can manage the level of complexity of option a). |
Someone once said, "Programming is the management and design of abstraction." I totally agree. But of course there are various levels of abstraction. You may not find your example (b) to be aesthetically pleasing, but I submit that it involves the least amount of abstraction. |
Maybe you are right. Maybe it is a mistake to try to teach R the same way as Stata or SPSS. And maybe it is better to teach option b) first and option a) later as a time-saving trick. I think that has some benefits, like introducing the concept of loops. A better approach could be to teach loops first, named functions later, anonymous functions with the full
But I still think that anonymous functions, pipes and dplyr verbs are a kind of necessary evil. My life is a lot easier with them. I can code faster and I can understand the code when I read it several months later. And I can spend more time doing what a sociologist is supposed to do: analyse data through a sociological reference framework. |
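The teaching order proposed in the comment above (loops first, then named functions, then anonymous functions) can be sketched on a small grouped task. This is only an illustration under stated assumptions: the dataset and code from the original comment did not survive, so mtcars and the mean of mpg per cyl stand in for them, and the names `means_loop`, `group_mean`, `means_named` and `means_anon` are hypothetical.

```r
# Step 1: an explicit loop over the groups
means_loop <- numeric(0)
for (k in sort(unique(mtcars$cyl))) {
  means_loop[as.character(k)] <- mean(mtcars$mpg[mtcars$cyl == k])
}

# Step 2: the per-group work moved into a named function
group_mean <- function(d) mean(d$mpg)
means_named <- sapply(split(mtcars, mtcars$cyl), group_mean)

# Step 3: the same function written inline as an anonymous function
means_anon <- sapply(split(mtcars, mtcars$cyl), function(d) mean(d$mpg))
```

All three produce the same named vector, so a learner can check each new construct against the loop they already understand.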
As far as I know, Stata and SPSS are not programming languages, so a reasonable comparison is not possible. Instead, they are very much like the way dplyr is taught to beginners -- do a few simple use cases, then do tons of examples using those use cases. For experienced coders, FP solutions can be more compact and clearer than a loop. |
I still find that this line has the absolute least amount of abstraction:
It is clear, it is straightforward, and easy to use. |
I find that teaching R to non-coders is easier when jumping straight to the best practices. Going first to base R and only later to the tidyverse often creates frustration: learners feel they wasted time earlier, or that there are multiple ways of doing things. Loops are actually really easy for people to grasp. I do agree that closures and anonymous functions are tricky, but once learned they can be used in many places.
In the same vein: why learn == equivalence and is.na when set-membership testing with %in% (albeit slightly slower) handles missingness without shooting oneself in the foot? It is easier to stick to case_when than ifelse/if_else, since the former allows easy extension.
Non-programmers dislike learning more than necessary, or multiple ways to solve a problem. That's why I often end up jumping to the tidyverse.
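The point about equality versus set membership (`%in%` in R) can be seen on a three-element vector. A minimal sketch in base R; the vector `x` is made up for illustration:

```r
x <- c(1, NA, 3)

# `==` propagates the missing value, so the comparison itself contains NA
x == 1        # TRUE NA FALSE

# Subsetting with that result carries an NA along
x[x == 1]     # 1 NA

# `%in%` always returns TRUE or FALSE, so the NA element is simply not matched
x %in% 1      # TRUE FALSE FALSE
x[x %in% 1]   # 1
```

With `==`, the learner also needs is.na() to clean up the result; with `%in%`, that extra step is unnecessary.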
|
You don't need to. The fact that a language offers dozens of possibilities is a sign of flexibility, not a teaching weakness.
If having a single, simple way to solve things is the issue, I suggest going back to SPSS. But I am totally against locking beginners into a single mindset that would give the false impression that it is the (only) "standard". |
If one is looking at the long term, in which former beginners now tackle more complex settings, sometimes there is no good way to avoid loops. So for those who think beginners should be equipped with "advanced" tools, one such tool is loops. The tidyverse people, on the other hand, tell learners that loops are Bad Things. My own view is that we should aim to quickly bring beginners up to a level where they can handle real problems. I believe that in general, FP slows down this process, even though one can point to special cases in which FP might be clearer. It may be a good idea to bring in FP a little bit at a time. |
TidyverseSkeptic/README.md
Line 641 in a8ad4dd
This works fine:
The problem here is not having to learn new paradigms just to do it, it's that you can't easily save intermediate steps because summarize wants the right hand side to be a vector, not a model object.
For example, the following code errors:
And to get it to work, you have to start dealing with list-columns, which is a whole thing:
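The code blocks from the original post did not survive here, so the following is not the poster's code. It is a minimal sketch of what the list-column workaround can look like, assuming dplyr 1.1.0 or later (for pick()) and using mtcars with mpg ~ wt as in the rest of the thread; the names `fits` and `out` are illustrative.

```r
library(dplyr)

# Wrapping the model in list() gives summarize() a length-1 list per group,
# so `model` becomes a list-column holding one lm object per cyl value.
# pick(everything()) is assumed to require dplyr >= 1.1.0.
fits <- mtcars %>%
  group_by(cyl) %>%
  summarize(model = list(lm(mpg ~ wt, data = pick(everything()))))

# Each stored model must then be unpacked element by element
out <- fits %>%
  mutate(r.squared = sapply(model, function(m) summary(m)$r.squared))

out
```

The intermediate result is saved, but the learner now has to understand list(), list-columns, and an apply-style function to get anything back out, which is the cost the post describes.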