Less expensive lazy seq forcing #2

jjl · 2017-03-18T11:36:57Z

Additionally this change should also force delays.

clojureman · 2017-03-19T13:46:49Z

Hi,
thank you for addressing the efficiency of the eagerization.
I am very open for better ways than using pr-str.
Maybe you could suggest another way of solving it?
Unfortunately the present pull request will not ensure eagerization.
This can be shown by counter-example:

(instance? clojure.lang.IPending
           (first (doall (for [x [1]]
                              (for [x [0]]
                                   x))))) 
=> true

Again, let me emphasize that I'm very glad to see a pull request and hope to see more.

jjl · 2017-03-19T13:53:31Z

You mean essentially that it doesn't force recursively? That can be fixed...

jjl · 2017-03-19T14:04:23Z

On reflection, this doesn't cover all the cases, back to the drawing board again...

clojureman · 2017-03-19T14:23:05Z

My first thought was to use walk, but I have seen it fail in scenarios where it should not.
Therefore I do not trust walk in the general case.
However, I have not dived sufficiently deep into the problem.

The dream would be to have an eagerize function that could be called multiple times on the same structure and only traverse it once.

jjl · 2017-03-19T14:26:08Z

I'll have a think and a play and see what I come up with.

clojureman · 2017-03-19T14:35:20Z

We'll have to figure out some smart way of testing it

kurt-o-sys · 2017-07-21T00:27:40Z

Not sure, but would this eager-function make any sense to replace the pr-str (making a slight change to one of the earlier propositions):

(defn eager [s]
  (if (instance? #?(:clj clojure.lang.IPending :cljs cljs.core.IPending) s)
    (cond
     (map? s) (reduce
               #(merge {(first %2) (eager (second %2))} %1)
               {} s)
     (seq? s) (doall (map eager s))
     :else (force s))
  s))

Other thought: @clojureman d'you remember in which case postwalk didn't do well?

jjl · 2017-07-21T05:48:22Z

postwalk won't work if a lazy thing is contained by a java object that clojure doesn't recognise as being a sequence of some kind.

kurt-o-sys · 2017-07-21T19:56:26Z

right... so the code I posted above will probably not work either?

jjl · 2017-07-21T20:23:00Z

No, in fact I don't think it's generally solveable. The str hack isn't even guaranteed to work, it's just significantly more likely to. If you were to put an IPending in a java object whose tostring didn't actually print that value, it would not be realised.

kurt-o-sys · 2017-07-22T07:42:57Z

Allright, so to summarize:

The (only?) problem is 'wrapped `IPending``
pr-str doesn't solve the problem of 'wrapped IPending'
code above doesn't solve the problem of 'wrapped IPending'

If both don't solve the wrapped IPending problem - having things like 'it is less likely, well, that's just asking for problems some day - why not be a little more pragmatic and add a kind of warning or restriction to the docs. Just document in which cases it doesn't work and use the optimized version? (Neither guarantee to serialize a 'wrapped IPending', so I'd prefer the optimized one anyway).

Other questions:

how often would it be a problem? I mean, if it's very unlikely to be an issue, it certainly shouldn't be a big issue, as long as it's well-documented, right? What's a (real-case) scenario? (What would be a test-case?)
would there be a possibility to add a kind of meta-data/hints to make it work on tagged (Java) objects?
am I right that if one would use reflection to check all the attributes, one should be able to spot all IPending? (I don't think it'd be a good idea to do so, I'm just trying to fully get what the real-case scenario problem would be and how it could conceptually be solved)

jjl · 2017-07-22T08:07:19Z

More or less. The current fix is probabilistic. It will work so long as you use only objects which happen to do the right thing under stringification (which is most clojure data!)

Personally I don't think we should care so much about laziness, because laziness is the devil and there are already so many circumstances under which it can go wrong that what's one more?

Answers:

If you never use lazy data or things that don't stringify their attributes properly, it will never be a problem
No, however you can implement a protocol for one and have it default to pr-str
I considered the reflection approach, but performance will take a helluva dive if we go down that route...

kurt-o-sys · 2017-07-22T08:33:49Z

Right... agreed.
So, if laziness is a problem anyway (certainly wrapped laziness), wouldn't an optimized 'eagerization' not be the way to go (replacing pr-str)?

jjl · 2017-07-22T08:38:20Z

That depends upon the user. I wasn't prepared to make that call for someone else's project :)

kurt-o-sys · 2017-07-22T18:29:11Z

Right. So would you consider the 'eagerization'-function as something to be more pluggable (choosing between pr-str, of the function above, or ..., or a combination thereof)?

Anyway, as a reminder for myself (and maybe others may be interested as well), this is how one can reproduce it (not checking it with postwalk etc, eager being the function above):

boot.user> (deftype Test [s])
boot.user.Test
boot.user> (def o (boot.user.Test. (repeatedly 1000 #(rand-int 42))))
#'boot.user/o
boot.user> (realized? (.-s o))
false
boot.user> (pr-str o)
"#object[boot.user.Test 0x6f6c7d72 \"boot.user.Test@6f6c7d72\"]"
boot.user> (realized? (.-s o))
false
boot.user> (eager o)
#object[boot.user.Test 0x6f6c7d72 "boot.user.Test@6f6c7d72"]
boot.user> (realized? (.-s o))
false
boot.user> (.-s o)
(26 35 3 38 ...)
boot.user> (realized? (.-s o))
true

jjl · 2017-07-22T19:17:21Z

It's not my call, it's not my repository. The only thing approaching a solution I came up with was to have a protocol or multimethod and default to pr-str if we don't have a better option, but if it fails, it will fail silently, which rather defeats the purpose.

My ultimate conclusion is that laziness in clojure is mostly a bad idea and you should default to eagerness in the general case.

kurt-o-sys · 2017-07-22T19:36:31Z

Right. Makes sense to me. So, instead of having an efficient eager-function, the advice would be to 'eagerize' everything yourself, so to avoid the (rather inefficient) pr-str. It may be a good idea to document it?

jjl · 2017-07-22T20:03:30Z

The library has already made that choice for you. If you don't agree with it, either fork it or use one of the alternatives that make a different choice.

As it happens, I no longer agree with it as a decision, so it's probably time to close this issue.

clojureman · 2017-07-23T08:28:03Z

tnx, both of you for the comments.
Since dynamic bindings are not carried around for lazy evaluation, we are forced to eagerize if we want to reliably reason about our code.
The question is how to best eagerize. Your idea of pluggable eagerization makes sense, I'll give it some thought.
As for the possible eagerization strategies, some benefit could be gained from using meta-data to mark already eagerized data, but again it's up the application to decide, as it could break code that relies heavily on meta-data.

I could change the function manage-with-courage from private to public if that would help you.
It is like manage, but without eagerization

kurt-o-sys · 2017-07-23T18:21:14Z

Hey... we're all convinced about the eagerize-step. The question is more how to do it and how to handle the 'impossible to capture' lazy seqs (see my example above). When it comes to making it pluggable, here are 2 ways I can think of:

make it a multimethod, so one could provide an own implementation. Add a (first?) parameter to manage to tell what method should be used.
pass in a eager-function directly into manage.

It all together would look like this (for the case of passing an eagerize-fn, similar for using multimethods):

(defn- eagerize
   ... ;;default implementation
  )

(defn manage-dispatch
  "Takes an eagerize-function, a function f and an \"inlined\" map of conditions and keywords.
  Returns a function in which these conditions are managed.
  The returned function can safely be called from another thread
  than the one in which it was created.
  f is allowed to be lazy, but the result must be finite, as it will
  always be fully realized. In other words: manage returns an eager function."

  [eager-fn f & restarts]
  (apply manage-with-courage (eager-fn f) restarts))

;; use a default the manage, don't break compatability
(def manage (partial manage-dispatch eagerize))

Also, I'd add some stuff to the docs, about the cases it won't work, and examples of eagerize-functions with postwalk etc.

didibus · 2017-07-23T19:12:14Z

If manage-with-courage is made public, everyone is free to use it to make their own eagerized version, and the advantage over polymorphism is that you can also choose to use it as is, if you know you're not in a situation where laziness will break or you already have an eager collection.

…

On Sun, Jul 23, 2017, 11:21 kurtosys ***@***.***> wrote: Hey... we're all convinced about the eagerize-step. The question is more how to do it and how to handle the 'impossible to capture' lazy seqs (see my example above). When it comes to making it pluggable, here are 2 ways I can think of: 1. make it a multimethod, so one could provide an own implementation. Add a (first?) parameter to manage to tell what method should be used. 2. pass in a eager-function directly into manage. It all together would look like this (for the case of passing an eagerize-fn, similar for using multimethods): (defn- eagerize ... ;;default implementation ) (defn manage-dispatch "Takes an eagerize-function, a function f and an \"inlined\" map of conditions and keywords. Returns a function in which these conditions are managed. The returned function can safely be called from another thread than the one in which it was created. f is allowed to be lazy, but the result must be finite, as it will always be fully realized. In other words: manage returns an eager function." [eager-fn f & restarts] (apply manage-with-courage (eager-fn f) restarts)) ;; use a default the manage, don't break compatability (def manage (partial manage-dispatch eagerize)) Also, I'd add some stuff to the docs, about the cases it won't work, and examples of eagerize-functions with postwalk etc. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#2 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAktxEVzVJTr6g64YBP4DfrpEua61y2qks5sQ48bgaJpZM4MhYt3> .

kurt-o-sys · 2017-07-23T19:45:12Z

If a function is public, well... making your own version would mean redefining it, right? I don't see the point of this - or, well, please provide me an example of how this makes more sense than using function passing or multimethods.

I don't think it's necessary to make an own version of manage-with-courage. The function that matters is the eagerize function. That's the one that may (or may not) have expensive eagerization. I don't see how making a function public is an advantage over passing in the important function that would change (or using multimethods, for that matter). That's the whole point of passing the right function in some way or another: you can choose which eagerization to use - or none if you know it's all fine.

In the code I propose, you still have the default, or you can choose to add another eagerization.

jjl · 2017-07-24T10:04:06Z

I think @didibus' point was that manage-with-courage being made public would allow you to create your own alternative under a different name, either in a library or your application, not overwriting anything.

It's a practical choice. There is nothing to stop you from implementing eagerize on top of it. Personally, I don't care for eagerisation because I think laziness in clojure is a terrible idea.

kurt-o-sys · 2017-07-24T10:15:57Z

OK... I know there's nothing to stop me from forking, or implementing eagerize on top of a function that's called somewhere else in a library. It just makes the library less useful. tbh, I think you do care about eagerisation, 'cause that's why you forked and implemented your own version. It's pretty clear that the discussion is about eagerisation and nothing else, actually. To avoid a dozen of forks of the same library, it makes more sense to me to let people choose how to eagerize, if that's the variable part (which it is).

So, I still would prefer to have this library being flexible enough to have a custom eager function - that's really not hard to implement, and would allow for everyone who likes so to avoid pr-str, and choose e.g. to use post-walk, e.g.: (manage-with #(walk/postwalk %) f :odd ...) instead of (manage f :odd ...). This would allow you to do something like: (my-manage (partial manage-with #(walk/postwalk %))).

It can be documented how the default eager-function works, and why this default is chosen. It can also be documented what alternatives there may be (or some alternatives might even be embedded in the library, again, documenting the trade-offs).

I agree laziness can be a terrible idea in many cases, and I don't expect a lot of issues with it either (especially laziness embedded in an object). This has been discussed already: if one avoids laziness, one would prefer an efficient eager-function, not pr-str. This is why making the eager-function, and only that part, extendable/pluggable/variable makes sense to me. I mainly wonder if the maintainer of this lib agrees or not. He seemed to have interest in a pluggable eagerization, so there's hope (for me, at least).

jjl · 2017-07-24T10:29:27Z

I forked it because it looked interesting and the pr-str caught my eye when perusing the implementation. I'm just a casual contributor in this case, not a user with a stake in its future, sorry to disappoint.

Might I suggest a fresh issue?

kurt-o-sys · 2017-07-24T10:33:31Z

Oh, I'm not dissapointed... I fully agree with you! I have no stake in its future either. I just think it's a bit weird if people start to fork because about 1 or 2 lines of code are well... up to the user of the lib to decide.
There's already a new pull request, implementing both multimethods and function-passing :).
#8

clojureman · 2017-07-24T10:57:50Z

Tnx I'll make a decision within a week and release a new version

didibus · 2017-07-24T17:58:25Z

Doesn't it work to just capture the binding in a closure to solve the lazy issue? Like this:

(fn [& args]
    (binding [*-special-condition-handlers-* (merge *-special-condition-handlers-* restarts)]
(let [*-special-condition-handlers-* *-special-condition-handlers-*]
      (apply f args)))))

Just a thought.

didibus · 2017-07-24T18:35:01Z

Nevermind, this works in simpler cases. The lexical scope won't reach far enough for condition to see it I believe.

I still vote for manage-with-courage to be made public. I would change its name to manage! at the same time.

I'm ok with manage also being modified to be configurable about its eagerization scheme though, both are not mutually exclusive.

In practice, I don't think that lazyness is an issue. In fact, forced eagerization can be surprising to someone knowingly being lazy. Even Clojure's try/catch can not handle lazyness in that way. If someone wants to restart lazy conditions, he should call manage from inside the lazy construct.

kurt-o-sys · 2017-07-24T19:54:24Z

Agreed on changing the name of manage-with-courage. Not sure if adding a ! is really the way to go. Naming's hard...
Other comments: see #8

clojureman · 2017-07-24T20:00:48Z

Actually a ! (bang) suggests mutability, so maybe it would confuse some people.
How about a (manage-with ...) function for custom eagerization and (manage-with-courage ...) for the daring uneagerized calls?

kurt-o-sys · 2017-07-24T20:08:11Z

:)
manage-with-courage is a bit funny... why not manage-eager (or something similar)?
I like manage-with (or even manage-as) for custom eagerization.

@clojureman would you go for multimethods or function passing for custom eagerization? (My personal preference is the multimethod route, since it's looks easier to me to embed some different strategies - see #8 )

clojureman · 2017-07-24T20:13:45Z

See #8 (comment) for my concern about a global setting for customisation.
Instead doing a
(def my-manage (partial manage-with my-wonderful-eagerizer))
should be ok for most uses I think?

kurt-o-sys · 2017-07-24T20:26:39Z

see #8 (comment) - where does the idea of a global setting comes from? I guess nobody wants that. Looks pretty terrible to me.

didibus · 2017-07-25T04:13:38Z

The clojure style guide says:

The names of functions/macros that are not safe in STM transactions should end with an exclamation mark (e.g. reset!)

Lazyness is not safe inside an STM transaction for the same reason it isn't in special.

I've also seen ! used more often to simply mean "be careful with this function". The use of it for purity is not consistent.

That's just my two cents. I'll be okay with any name.

kurt-o-sys · 2017-07-25T07:18:50Z

@didibus Good point! My first thought was to use transact! as well, but I didn't use it 'cause I should use it for 'functions that mutate' only. And true as well: pragmatism > dogmatism.

Less expensive lazy seq forcing

4e86c8d

force lazy seqs recursively

da11922

Recursive eagerification

080d553

jjl closed this Jul 22, 2017

kurt-o-sys mentioned this pull request Jul 24, 2017

user-defined eagerization #8

Closed

Less expensive lazy seq forcing #2

Less expensive lazy seq forcing #2

Uh oh!

Conversation

jjl commented Mar 18, 2017

Uh oh!

clojureman commented Mar 19, 2017

Uh oh!

jjl commented Mar 19, 2017

Uh oh!

jjl commented Mar 19, 2017

Uh oh!

clojureman commented Mar 19, 2017

Uh oh!

jjl commented Mar 19, 2017

Uh oh!

clojureman commented Mar 19, 2017

Uh oh!

kurt-o-sys commented Jul 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jjl commented Jul 21, 2017

Uh oh!

kurt-o-sys commented Jul 21, 2017

Uh oh!

jjl commented Jul 21, 2017

Uh oh!

kurt-o-sys commented Jul 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jjl commented Jul 22, 2017

Uh oh!

kurt-o-sys commented Jul 22, 2017

Uh oh!

jjl commented Jul 22, 2017

Uh oh!

kurt-o-sys commented Jul 22, 2017

Uh oh!

jjl commented Jul 22, 2017

Uh oh!

kurt-o-sys commented Jul 22, 2017

Uh oh!

jjl commented Jul 22, 2017

Uh oh!

clojureman commented Jul 23, 2017

Uh oh!

kurt-o-sys commented Jul 23, 2017

Uh oh!

didibus commented Jul 23, 2017 via email

Uh oh!

kurt-o-sys commented Jul 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jjl commented Jul 24, 2017

Uh oh!

kurt-o-sys commented Jul 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jjl commented Jul 24, 2017

Uh oh!

kurt-o-sys commented Jul 24, 2017

Uh oh!

clojureman commented Jul 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

didibus commented Jul 24, 2017

Uh oh!

didibus commented Jul 24, 2017

Uh oh!

kurt-o-sys commented Jul 24, 2017

Uh oh!

clojureman commented Jul 24, 2017

Uh oh!

kurt-o-sys commented Jul 24, 2017

Uh oh!

clojureman commented Jul 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kurt-o-sys commented Jul 24, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

kurt-o-sys commented Jul 21, 2017 •

edited

Loading

kurt-o-sys commented Jul 22, 2017 •

edited

Loading

kurt-o-sys commented Jul 23, 2017 •

edited

Loading

kurt-o-sys commented Jul 24, 2017 •

edited

Loading

clojureman commented Jul 24, 2017 •

edited

Loading

clojureman commented Jul 24, 2017 •

edited

Loading

kurt-o-sys commented Jul 24, 2017 •

edited

Loading