Selected patches from Calibre #245
This seems to have broken 2.6 badly. Huh.
Current coverage is 89.15%
@@             master   #245   diff @@
==========================================
  Files            51     50     -1
  Lines          6817   6726    -91
  Methods           0      0
  Messages          0      0
  Branches       1316   1307     -9
==========================================
- Hits           6172   5996   -176
- Misses          485    559    +74
- Partials        160    171    +11
Oh, right, this is
How do you suggest I override the application of attributes for html and body tags in my builder? Without those patches, it would require overriding the entire getPhases() method in html5parser.py. Remember that the problem those patches are solving is that there can be multiple If you don't want to merge gsnedders/html5lib-python@a2d2e05, then how do you suggest I replace the stream input class? The one in html5lib is too slow. The only alternative I can see is monkey patching, which is less than optimal for obvious reasons.
@kovidgoyal I'll take a look at dealing with html/body attributes later (I'm literally about to board a plane). When it comes to the input stream, if it yields good perf increases when given a byte/unicode object, we should just specialise them in html5lib.
Sure, you are welcome to take the input stream class from calibre for dealing with unicode objects. It is faster because it avoids wrapping the unicode in StringIO. And it actually implements tracking of positions. For my use case, that is important, since I need line and col numbers.
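For readers following along, here is a minimal sketch of the idea described above: reading characters directly from an in-memory unicode string (no StringIO wrapper) while tracking line and column. This is not calibre's or html5lib's actual class; the name PositionTrackingStream and its methods are hypothetical and chosen only for illustration.

```python
class PositionTrackingStream:
    """Hypothetical reader over an in-memory str that tracks line and column."""

    def __init__(self, text):
        # Normalise newlines once up front so position tracking only
        # has to look for "\n".
        self.text = text.replace("\r\n", "\n").replace("\r", "\n")
        self.offset = 0
        self.line = 1   # 1-based line number
        self.col = 0    # characters consumed on the current line

    def char(self):
        """Return the next character, or None at end of input."""
        if self.offset >= len(self.text):
            return None
        c = self.text[self.offset]
        self.offset += 1
        if c == "\n":
            self.line += 1
            self.col = 0
        else:
            self.col += 1
        return c

    def position(self):
        """Return (line, col) of the last character consumed."""
        return self.line, self.col


stream = PositionTrackingStream("ab\ncd")
for _ in range(4):
    stream.char()
print(stream.position())
```

Indexing the string directly keeps the per-character cost to one bounds check and one index operation, which is presumably where the speed-up over a file-like wrapper would come from; whether that holds in practice would need measuring.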
82b971c to 1f04a3f
76bf242 to 761f3ab
See #119. CC @kovidgoyal.
This cherry-picks a few things from https://github.com/gsnedders/html5lib-python/commits/calibre-patches, which was a complete set of Calibre's patches from November 2013. https://github.com/kovidgoyal/calibre/commits/master/src/html5lib has very little changed in it since then, primarily a move to 0.999999-dev and a separate downstream fix for 0c551c9.
So, of those on that branch…
True/False cases it's likely slower, therefore failing at its stated goal, as it results in more byte code and POP_JUMP_IF_FALSE and POP_JUMP_IF_TRUE special-case the condition being True or False (oddly, they don't specialise None, though it is in PyObject_IsTrue; if that makes any notable performance difference then I'd suggest fixing that in CPython).