Generate syntax map externally, feed to plugin #332
Comments
Just as a benchmark for the kind of speed we're talking about, I took the source code for a 100k word book that has about 2k cross-reference links and fed it through Pandoc: 12.1 seconds I had to run it on loops of 100 to even get a reasonable estimate of the time elapsed, and even then it took me a while to convince myself it wasn't just dumping some cached result and exiting. |
Really interesting find! How much work do you estimate it would take to extend the parser so it can handle some of the Pandoc extensions to CommonMark? I was looking at the CommonMark spec the other day and it really is very lean. Is the output of pulldown-cmark comparable with Pandoc's? |
The HTML they generated is not 100% compatible. The pulldown-cmark parser didn't know what to do with my inline SILE code and just output it wrapped in Right now extending the parser it uses might be a little beyond me. It is pretty low-level, byte-by-byte parsing. However, the high-level idiomatic Rust interface it provides when it is done is a dream to work with. It does optionally cover more than the CommonMark spec (footnotes, GitHub-flavored tables, task lists, strikethrough, etc.), so in theory it should be possible to add more optional extensions. I'm not sure bringing it around to legacy Pandoc-flavored markup is in the cards though. Adding extensions for extra features maybe, but changing the existing ones to reflect the flavor variations, probably not. The main thing that caught my attention is not its rendered output but the combination of the speed plus the easy access to the source map data from a library that would be relatively simple to build our own backend on. |
Really, as long as we can get a good representation of the structure of documents, we can supplement anything it's missing on the client side. |
We don't even need to call the library, the binary has an option to emit the events and ranges it detects:
It is possible to process this output and feed vim syntax highlighting with |
I realize we could parse that text dump (and maybe as a proof of concept it's worth tinkering with), but I think it will be an order of magnitude faster to write ourselves a small library that parses exactly what we need in Rust and spoon-feeds it to Lua inside vim in a way that needs as little post-processing as possible to refresh the highlights. |
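To make the "as little post-processing as possible" idea concrete, here is a minimal std-only Rust sketch of one piece such a library might contain: turning a byte range from a source map into per-line spans. The names `LineSpan`, `line_starts`, and `range_to_spans` are hypothetical, and the span shape (0-indexed line, byte columns, end-exclusive) is an assumption chosen to resemble the arguments of Neovim's `nvim_buf_add_highlight`. It is not code from pulldown-cmark or from this plugin.

```rust
/// One highlight span on a single line (hypothetical type for illustration).
/// `line` is 0-indexed; columns are byte offsets within the line, end-exclusive.
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct LineSpan {
    pub line: usize,
    pub col_start: usize,
    pub col_end: usize,
}

/// Byte offset at which each line of `src` starts.
pub fn line_starts(src: &str) -> Vec<usize> {
    let mut starts = vec![0];
    for (i, b) in src.bytes().enumerate() {
        if b == b'\n' {
            starts.push(i + 1);
        }
    }
    starts
}

/// Split the byte range [start, end) of `src` into one span per touched line.
/// Newline bytes themselves are never included in a span.
pub fn range_to_spans(src: &str, start: usize, end: usize) -> Vec<LineSpan> {
    let starts = line_starts(src);
    let end = end.min(src.len());
    let mut spans = Vec::new();
    if start >= end {
        return spans;
    }
    // Index of the line that contains `start`.
    let mut line = match starts.binary_search(&start) {
        Ok(i) => i,
        Err(i) => i - 1, // safe: starts[0] == 0, so i >= 1 here
    };
    while line < starts.len() && starts[line] < end {
        let line_begin = starts[line];
        // End of this line's content, excluding the trailing newline if any.
        let line_end = if line + 1 < starts.len() {
            starts[line + 1] - 1
        } else {
            src.len()
        };
        let s = start.max(line_begin);
        let e = end.min(line_end);
        if s < e {
            spans.push(LineSpan {
                line,
                col_start: s - line_begin,
                col_end: e - line_begin,
            });
        }
        line += 1;
    }
    spans
}
```

With spans in this shape, the editor side would only need to loop over them and add one highlight per span, whichever transport (Lua module or RPC) ends up carrying them.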
I haven't worked with Rust at all, but if you are willing to go in that direction I think it's worth exploring. I can help out with some glue on the vim side. From what I gather from the event dump, it should be enough to get all the |
It looks like there are a couple of different architectures we could go with. I'm not sure I can rightly describe them all, much less pick what's best. An interesting note for now is that there seems to be a way to use RPC calls directly from Rust to add/remove highlights in a Neovim instance. If using the msgpack-rpc interface isn't what we want to do (or we don't want to limit this to Neovim), the other main alternative I see is using mlua to create a module |
This is just a version of The
In neovim we could listen to document changes events, and keep something like a shadow version of the documents that could be parsed asynchronously. Slices are difficult because partial markdown might not parse correctly. For example, in this case:
vs.
we might get a list in the first case and a codeblock in the second. |
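A minimal sketch of the "shadow version" idea, in std-only Rust. The `ShadowBuffer` type and its methods are hypothetical names, and the change shape (0-indexed, end-exclusive line range plus replacement lines) is an assumption modelled on the line-change notifications Neovim can send for an attached buffer. Because slices may parse differently out of context, the sketch hands back the whole text for re-parsing rather than just the changed part.

```rust
/// A line-oriented shadow copy of an editor buffer (hypothetical type).
#[derive(Debug, Default)]
pub struct ShadowBuffer {
    lines: Vec<String>,
}

impl ShadowBuffer {
    /// Build the shadow copy from the buffer's full text.
    pub fn from_text(text: &str) -> Self {
        ShadowBuffer {
            lines: text.lines().map(String::from).collect(),
        }
    }

    /// Replace the lines in the half-open range [first, last) with `replacement`.
    /// Out-of-bounds indices are clamped to the current line count.
    pub fn apply_change(&mut self, first: usize, last: usize, replacement: &[&str]) {
        let last = last.min(self.lines.len());
        let first = first.min(last);
        self.lines
            .splice(first..last, replacement.iter().map(|s| s.to_string()))
            .for_each(drop); // discard the removed lines
    }

    /// Full document text, ready to hand to a parser on a background thread.
    pub fn text(&self) -> String {
        self.lines.join("\n")
    }
}
```

Re-parsing the full text on every change is only reasonable if the parser is as fast as the benchmark earlier in the thread suggests; that trade-off is the assumption this sketch rests on.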
I got a basic module working with Buggers. What this suggests is that maybe this road leads to insanity. If the plugin has to be compiled against the version of Lua that people's editors were compiled against (i.e. they need matching header files) we're going down a road to purgatory. We'll never come out alive. We might be able to limit ourselves to Neovim, but we can't force people to recompile their Neovim to use a version of Lua we support. Unless it turns out I'm missing something there, it looks like the RPC route is the way to go. There is some effort to make RPC work with Vim 8 too. We don't have to use the highlight command there is an RPC binding for to |
I just want to say
(Sorry for the non-links! I haven't been able to figure out how to reference a repo/org rather than an issue!) |
@bpj Thanks for the feedback.
|
@alerque Did you make more progress? This morning I made a little Python module using pyo3, but I didn't get very far because I have no Rust experience (I did manage to make the module consume CommonMark and spit out HTML, so it wasn't a total loss :p) |
I did get somewhere actually. I got the What version of Lua is built into your nvim? I still suspect the RPC method is the right way to do this, but this is an interesting learning experience for sure. |
I'm running Arch too (using the neovim-git package), so 2.0.5. |
I'm hacking in the
LUA_INC=/usr/include/luajit-2.0/ LUA_LIB=/usr/lib LUA_LIB_NAME=luajit-5.1 LUA_LINK=dynamic cargo build
After building (once) I've just been symlinking the module into the project root directory to load in Neovim:
ln -s target/debug/libvim_pandoc_syntax.so
Then from Neovim:
:lua s = require("libvim_pandoc_syntax")
:lua print(s.render_html("some *markdown* string"))
Messing with passing other data back now. I'm in Gitter if you're around. |
@alerque Vim-pandoc works in Termux but it uses a lot more storage than I feel I can spare just to get some more bells and whistles. I use it on my tablet where I've got more storage. So JGM deems his PEG markdown parsers to be failures? Doesn't really surprise me! I've been thinking all day about reimplementing my Perl search/replace plugin in Python. If you drop regular Vim I'll probably do it! However I also found out that the Termux |
Sorry I meant to say that the termux nvim build barfs on plugins requiring python 3 while the vim-python build doesn't. I haven't checked on my laptop yet but I imagine nvim is/can be built with py 3 support there. As for running Pandoc in Termux last I heard Pandoc doesn't build on ARM. |
@bpj This didn't sound right to me. I looked into it and the Perl snafu seems to have just been a case of temporary breakage during refactoring. Lots of things didn't work when they were first bringing up the basics after the great dead code purge, and getting Perl back online seems to have taken longer than most features. The good news is Perl is back on the menu (or will be in the next release if you're on stable release packages; it works for me now in the current Git master). |
@alerque a small update on this: I can now use plugins which use python3 (but not python2) in nvim on termux ( I still can't use Perl in either vim or nvim on termux simply because their builds of both are done without Perl, and it doesn't seem like they are going to change their mind! :) So it seems I'll have to do that rewrite of my regex substitution plugin using the python regex module. Makes an old Perl hacker's heart bleed but seems like a good way of getting my feet wet in writing python plugins. |
New idea. While mucking around with LPEG and various CommonMark implementations for #327, I ran into the pulldown-cmark project. Given I've been learning Rust lately, the speed at which this thing works got my attention.
I know there are plugins out there that use external scripts or binaries to inform their work. For example vim-clap uses a Rust backend as a data provider to its fuzzy search operations. Many other plugins use Python or Lua data providers.
We know we can use a Lua based PEG grammar and feed the syntax highlighter data. What's to stop us from using a much faster backend (and an implementation type the author of CommonMark doesn't think is impossible) that is already CommonMark compliant to generate a source map and use that to inform our plugin?
Obviously asking Pandoc would have been preferable (see #300), but as of yet Pandoc doesn't keep a source map, and pulldown-cmark does. If Pandoc is moving towards CommonMark and this already is 100% CommonMark compliant, is there a reason not to go down this road?
Asking for a friend.