Skip to content

Less expensive lazy seq forcing #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Conversation

jjl
Copy link

@jjl jjl commented Mar 18, 2017

Additionally this change should also force delays.

@clojureman
Copy link
Owner

Hi,
thank you for addressing the efficiency of the eagerization.
I am very open for better ways than using pr-str.
Maybe you could suggest another way of solving it?
Unfortunately the present pull request will not ensure eagerization.
This can be shown by counter-example:

(instance? clojure.lang.IPending
           (first (doall (for [x [1]]
                              (for [x [0]]
                                   x))))) 
=> true

Again, let me emphasize that I'm very glad to see a pull request and hope to see more.

@jjl
Copy link
Author

jjl commented Mar 19, 2017

You mean essentially that it doesn't force recursively? That can be fixed...

@jjl
Copy link
Author

jjl commented Mar 19, 2017

On reflection, this doesn't cover all the cases, back to the drawing board again...

@clojureman
Copy link
Owner

My first thought was to use walk, but I have seen it fail in scenarios where it should not.
Therefore I do not trust walk in the general case.
However, I have not dived sufficiently deep into the problem.

The dream would be to have an eagerize function that could be called multiple times on the same structure and only traverse it once.

@jjl
Copy link
Author

jjl commented Mar 19, 2017

I'll have a think and a play and see what I come up with.

@clojureman
Copy link
Owner

We'll have to figure out some smart way of testing it

@kurt-o-sys
Copy link

kurt-o-sys commented Jul 21, 2017

Not sure, but would this eager-function make any sense to replace the pr-str (making a slight change to one of the earlier propositions):

(defn eager [s]
  (if (instance? #?(:clj clojure.lang.IPending :cljs cljs.core.IPending) s)
    (cond
     (map? s) (reduce
               #(merge {(first %2) (eager (second %2))} %1)
               {} s)
     (seq? s) (doall (map eager s))
     :else (force s))
  s))

Other thought: @clojureman d'you remember in which case postwalk didn't do well?

@jjl
Copy link
Author

jjl commented Jul 21, 2017

postwalk won't work if a lazy thing is contained by a java object that clojure doesn't recognise as being a sequence of some kind.

@kurt-o-sys
Copy link

right... so the code I posted above will probably not work either?

@jjl
Copy link
Author

jjl commented Jul 21, 2017

No, in fact I don't think it's generally solveable. The str hack isn't even guaranteed to work, it's just significantly more likely to. If you were to put an IPending in a java object whose tostring didn't actually print that value, it would not be realised.

@kurt-o-sys
Copy link

kurt-o-sys commented Jul 22, 2017

Allright, so to summarize:

  1. The (only?) problem is 'wrapped `IPending``
  2. pr-str doesn't solve the problem of 'wrapped IPending'
  3. code above doesn't solve the problem of 'wrapped IPending'

If both don't solve the wrapped IPending problem - having things like 'it is less likely, well, that's just asking for problems some day - why not be a little more pragmatic and add a kind of warning or restriction to the docs. Just document in which cases it doesn't work and use the optimized version? (Neither guarantee to serialize a 'wrapped IPending', so I'd prefer the optimized one anyway).

Other questions:

  1. how often would it be a problem? I mean, if it's very unlikely to be an issue, it certainly shouldn't be a big issue, as long as it's well-documented, right? What's a (real-case) scenario? (What would be a test-case?)
  2. would there be a possibility to add a kind of meta-data/hints to make it work on tagged (Java) objects?
  3. am I right that if one would use reflection to check all the attributes, one should be able to spot all IPending? (I don't think it'd be a good idea to do so, I'm just trying to fully get what the real-case scenario problem would be and how it could conceptually be solved)

@jjl
Copy link
Author

jjl commented Jul 22, 2017

More or less. The current fix is probabilistic. It will work so long as you use only objects which happen to do the right thing under stringification (which is most clojure data!)

Personally I don't think we should care so much about laziness, because laziness is the devil and there are already so many circumstances under which it can go wrong that what's one more?

Answers:

  1. If you never use lazy data or things that don't stringify their attributes properly, it will never be a problem
  2. No, however you can implement a protocol for one and have it default to pr-str
  3. I considered the reflection approach, but performance will take a helluva dive if we go down that route...

@kurt-o-sys
Copy link

Right... agreed.
So, if laziness is a problem anyway (certainly wrapped laziness), wouldn't an optimized 'eagerization' not be the way to go (replacing pr-str)?

@jjl
Copy link
Author

jjl commented Jul 22, 2017

That depends upon the user. I wasn't prepared to make that call for someone else's project :)

@kurt-o-sys
Copy link

Right. So would you consider the 'eagerization'-function as something to be more pluggable (choosing between pr-str, of the function above, or ..., or a combination thereof)?

Anyway, as a reminder for myself (and maybe others may be interested as well), this is how one can reproduce it (not checking it with postwalk etc, eager being the function above):

boot.user> (deftype Test [s])
boot.user.Test
boot.user> (def o (boot.user.Test. (repeatedly 1000 #(rand-int 42))))
#'boot.user/o
boot.user> (realized? (.-s o))
false
boot.user> (pr-str o)
"#object[boot.user.Test 0x6f6c7d72 \"boot.user.Test@6f6c7d72\"]"
boot.user> (realized? (.-s o))
false
boot.user> (eager o)
#object[boot.user.Test 0x6f6c7d72 "boot.user.Test@6f6c7d72"]
boot.user> (realized? (.-s o))
false
boot.user> (.-s o)
(26 35 3 38 ...)
boot.user> (realized? (.-s o))
true

@jjl
Copy link
Author

jjl commented Jul 22, 2017

It's not my call, it's not my repository. The only thing approaching a solution I came up with was to have a protocol or multimethod and default to pr-str if we don't have a better option, but if it fails, it will fail silently, which rather defeats the purpose.

My ultimate conclusion is that laziness in clojure is mostly a bad idea and you should default to eagerness in the general case.

@kurt-o-sys
Copy link

Right. Makes sense to me. So, instead of having an efficient eager-function, the advice would be to 'eagerize' everything yourself, so to avoid the (rather inefficient) pr-str. It may be a good idea to document it?

@jjl
Copy link
Author

jjl commented Jul 22, 2017

The library has already made that choice for you. If you don't agree with it, either fork it or use one of the alternatives that make a different choice.

As it happens, I no longer agree with it as a decision, so it's probably time to close this issue.

@jjl jjl closed this Jul 22, 2017
@clojureman
Copy link
Owner

tnx, both of you for the comments.
Since dynamic bindings are not carried around for lazy evaluation, we are forced to eagerize if we want to reliably reason about our code.
The question is how to best eagerize. Your idea of pluggable eagerization makes sense, I'll give it some thought.
As for the possible eagerization strategies, some benefit could be gained from using meta-data to mark already eagerized data, but again it's up the application to decide, as it could break code that relies heavily on meta-data.

I could change the function manage-with-courage from private to public if that would help you.
It is like manage, but without eagerization

@kurt-o-sys
Copy link

Hey... we're all convinced about the eagerize-step. The question is more how to do it and how to handle the 'impossible to capture' lazy seqs (see my example above). When it comes to making it pluggable, here are 2 ways I can think of:

  1. make it a multimethod, so one could provide an own implementation. Add a (first?) parameter to manage to tell what method should be used.
  2. pass in a eager-function directly into manage.

It all together would look like this (for the case of passing an eagerize-fn, similar for using multimethods):

(defn- eagerize
   ... ;;default implementation
  )

(defn manage-dispatch
  "Takes an eagerize-function, a function f and an \"inlined\" map of conditions and keywords.
  Returns a function in which these conditions are managed.
  The returned function can safely be called from another thread
  than the one in which it was created.
  f is allowed to be lazy, but the result must be finite, as it will
  always be fully realized. In other words: manage returns an eager function."

  [eager-fn f & restarts]
  (apply manage-with-courage (eager-fn f) restarts))

;; use a default the manage, don't break compatability
(def manage (partial manage-dispatch eagerize))

Also, I'd add some stuff to the docs, about the cases it won't work, and examples of eagerize-functions with postwalk etc.

@didibus
Copy link

didibus commented Jul 23, 2017 via email

@kurt-o-sys
Copy link

kurt-o-sys commented Jul 23, 2017

If a function is public, well... making your own version would mean redefining it, right? I don't see the point of this - or, well, please provide me an example of how this makes more sense than using function passing or multimethods.

I don't think it's necessary to make an own version of manage-with-courage. The function that matters is the eagerize function. That's the one that may (or may not) have expensive eagerization. I don't see how making a function public is an advantage over passing in the important function that would change (or using multimethods, for that matter). That's the whole point of passing the right function in some way or another: you can choose which eagerization to use - or none if you know it's all fine.

In the code I propose, you still have the default, or you can choose to add another eagerization.

@jjl
Copy link
Author

jjl commented Jul 24, 2017

I think @didibus' point was that manage-with-courage being made public would allow you to create your own alternative under a different name, either in a library or your application, not overwriting anything.

It's a practical choice. There is nothing to stop you from implementing eagerize on top of it. Personally, I don't care for eagerisation because I think laziness in clojure is a terrible idea.

@kurt-o-sys
Copy link

kurt-o-sys commented Jul 24, 2017

OK... I know there's nothing to stop me from forking, or implementing eagerize on top of a function that's called somewhere else in a library. It just makes the library less useful. tbh, I think you do care about eagerisation, 'cause that's why you forked and implemented your own version. It's pretty clear that the discussion is about eagerisation and nothing else, actually. To avoid a dozen of forks of the same library, it makes more sense to me to let people choose how to eagerize, if that's the variable part (which it is).

So, I still would prefer to have this library being flexible enough to have a custom eager function - that's really not hard to implement, and would allow for everyone who likes so to avoid pr-str, and choose e.g. to use post-walk, e.g.: (manage-with #(walk/postwalk %) f :odd ...) instead of (manage f :odd ...). This would allow you to do something like: (my-manage (partial manage-with #(walk/postwalk %))).

It can be documented how the default eager-function works, and why this default is chosen. It can also be documented what alternatives there may be (or some alternatives might even be embedded in the library, again, documenting the trade-offs).

I agree laziness can be a terrible idea in many cases, and I don't expect a lot of issues with it either (especially laziness embedded in an object). This has been discussed already: if one avoids laziness, one would prefer an efficient eager-function, not pr-str. This is why making the eager-function, and only that part, extendable/pluggable/variable makes sense to me. I mainly wonder if the maintainer of this lib agrees or not. He seemed to have interest in a pluggable eagerization, so there's hope (for me, at least).

@jjl
Copy link
Author

jjl commented Jul 24, 2017

I forked it because it looked interesting and the pr-str caught my eye when perusing the implementation. I'm just a casual contributor in this case, not a user with a stake in its future, sorry to disappoint.

Might I suggest a fresh issue?

@kurt-o-sys
Copy link

Oh, I'm not dissapointed... I fully agree with you! I have no stake in its future either. I just think it's a bit weird if people start to fork because about 1 or 2 lines of code are well... up to the user of the lib to decide.
There's already a new pull request, implementing both multimethods and function-passing :).
#8

@clojureman
Copy link
Owner

clojureman commented Jul 24, 2017

Tnx I'll make a decision within a week and release a new version

@didibus
Copy link

didibus commented Jul 24, 2017

Doesn't it work to just capture the binding in a closure to solve the lazy issue? Like this:

(fn [& args]
    (binding [*-special-condition-handlers-* (merge *-special-condition-handlers-* restarts)]
(let [*-special-condition-handlers-* *-special-condition-handlers-*]
      (apply f args)))))

Just a thought.

@didibus
Copy link

didibus commented Jul 24, 2017

Nevermind, this works in simpler cases. The lexical scope won't reach far enough for condition to see it I believe.

I still vote for manage-with-courage to be made public. I would change its name to manage! at the same time.

I'm ok with manage also being modified to be configurable about its eagerization scheme though, both are not mutually exclusive.

In practice, I don't think that lazyness is an issue. In fact, forced eagerization can be surprising to someone knowingly being lazy. Even Clojure's try/catch can not handle lazyness in that way. If someone wants to restart lazy conditions, he should call manage from inside the lazy construct.

@kurt-o-sys
Copy link

Agreed on changing the name of manage-with-courage. Not sure if adding a ! is really the way to go. Naming's hard...
Other comments: see #8

@clojureman
Copy link
Owner

Actually a ! (bang) suggests mutability, so maybe it would confuse some people.
How about a (manage-with ...) function for custom eagerization and (manage-with-courage ...) for the daring uneagerized calls?

@kurt-o-sys
Copy link

:)
manage-with-courage is a bit funny... why not manage-eager (or something similar)?
I like manage-with (or even manage-as) for custom eagerization.

@clojureman would you go for multimethods or function passing for custom eagerization? (My personal preference is the multimethod route, since it's looks easier to me to embed some different strategies - see #8 )

@clojureman
Copy link
Owner

clojureman commented Jul 24, 2017

See #8 (comment) for my concern about a global setting for customisation.
Instead doing a
(def my-manage (partial manage-with my-wonderful-eagerizer))
should be ok for most uses I think?

@kurt-o-sys
Copy link

kurt-o-sys commented Jul 24, 2017

see #8 (comment) - where does the idea of a global setting comes from? I guess nobody wants that. Looks pretty terrible to me.

@didibus
Copy link

didibus commented Jul 25, 2017

The clojure style guide says:

The names of functions/macros that are not safe in STM transactions should end with an exclamation mark (e.g. reset!)

Lazyness is not safe inside an STM transaction for the same reason it isn't in special.

I've also seen ! used more often to simply mean "be careful with this function". The use of it for purity is not consistent.

That's just my two cents. I'll be okay with any name.

@kurt-o-sys
Copy link

@didibus Good point! My first thought was to use transact! as well, but I didn't use it 'cause I should use it for 'functions that mutate' only. And true as well: pragmatism > dogmatism.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants