Converting thousands of snippets #7410

I have large JSON files containing markdown snippets. I want to convert each snippet to LaTeX.

I could invoke pandoc in a loop to process them, but that is slow.

I could artificially concatenate the snippets into a markdown file and then convert, but I would need to parse the output back into JSON.

Are there any alternatives? If not, I think this would be a valuable feature to have.
Comments
I'm sure the fastest way is using the Haskell API, but how slow is it really to invoke pandoc once per snippet?
This is for a user-facing application; 100 snippets per second could be considered slow, imho. I believe most of the time is spent loading pandoc between loop iterations (maybe some caching is done?). I explored keeping pandoc in memory with a Python subprocess, but there is no way to tell pandoc that the end of the input has been reached.
Yes, I'm sure the process overhead is substantial. Something compiled from Haskell that iterates over your JSON and does the conversions using Pandoc as a library should be able to sidestep all that and chew through large inputs pretty fast.
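For concreteness, here is a minimal sketch of such a program. It assumes the snippets arrive on stdin as a JSON array of strings and uses the aeson package for the JSON handling; both of those details are illustrative assumptions, not anything prescribed in this thread:

```haskell
-- Sketch: read a JSON array of markdown snippets from stdin and write
-- a JSON array of the corresponding LaTeX fragments to stdout.
-- Assumes the aeson package; the input shape is an assumption.
import qualified Data.Aeson           as A
import qualified Data.ByteString.Lazy as BL
import           Data.Text            (Text)
import           Text.Pandoc

-- One markdown snippet to one LaTeX fragment, with default options.
convert :: Text -> PandocIO Text
convert md = readMarkdown def md >>= writeLaTeX def

main :: IO ()
main = do
  input <- BL.getContents
  case A.decode input :: Maybe [Text] of
    Nothing       -> error "expected a JSON array of strings on stdin"
    Just snippets -> do
      -- Pandoc is loaded once; only the per-snippet work repeats.
      latex <- runIOorExplode (mapM convert snippets)
      BL.putStr (A.encode latex)
```

Because the executable starts only once, the per-snippet cost is reduced to parsing and rendering.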
Using the Haskell API seems to be the right approach. I do not have any experience in Haskell though. Hence the feature request: I believe it would be beneficial to be able to run pandoc in a request/reply setup to avoid the process overhead.
What do you expect a "request/reply setup" to actually look like? Running as a daemon listening on a TCP port, with an API you can send questions to and get answers from? If so, I'd suggest that's actually a cool idea, but I would be opposed to it being included in pandoc itself.

My suggestion is that such a "feature" should be housed in its own project: an app that uses Pandoc's libraries but also handles the necessary TCP port or socket setup and provides a REST or similar API that maps onto Pandoc's own. Such a project could iterate much faster than Pandoc itself, release on its own cadence, and experiment freely until the ergonomics are right, without committing to long-term support of whatever shipped in a Pandoc release. It would also keep Pandoc's already onerous dependency tree and compile times down, make both parts easier to maintain and document, and so on.
I have some corporate constraints and need to support multiple platforms. In some instances ports are blocked; in other instances I need to support Windows (sockets would not work). I had a REPL in mind, along the lines sketched below:
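A minimal sketch of the kind of request/reply loop meant here, written against the pandoc Haskell library. The framing convention (a line containing only ASCII EOT ends one request) is invented purely for illustration; any unambiguous end-of-input marker would do:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: a long-running process that reads markdown requests on stdin
-- and writes LaTeX replies on stdout. A line consisting of the single
-- character '\EOT' (an arbitrary, made-up sentinel) ends each request.
import qualified Data.Text    as T
import qualified Data.Text.IO as TIO
import           System.IO    (hFlush, isEOF, stdout)
import           Text.Pandoc

main :: IO ()
main = loop
  where
    loop = do
      done <- isEOF
      if done
        then pure ()
        else do
          snippet <- readRequest []
          latex <- runIOorExplode $
            readMarkdown def snippet >>= writeLaTeX def
          TIO.putStrLn latex
          TIO.putStrLn "\EOT"   -- reply delimiter, mirroring the request
          hFlush stdout
          loop
    -- Accumulate lines until the sentinel line arrives.
    readRequest acc = do
      line <- TIO.getLine
      if line == "\EOT"
        then pure (T.unlines (reverse acc))
        else readRequest (line : acc)
```

The point is only that pandoc stays resident between requests; the client knows a reply is complete when it sees the same sentinel.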
Maybe there are easier ways to achieve this. If not, I guess your idea is the only practical option.
I've never seen an environment that blocks localhost ports, so I still think my suggestion would fit (you don't have to expose a port to other machines, and it doesn't have to be a privileged port). That being said, a REPL would certainly be another option, but the mechanics of query/response and the design of the API would be largely the same as for a port- or socket-based listener. All three could be provided by the same project. I still hold that none of the three fit well inside this project, but they would make a great standalone project based on the library version.
Then a REST API running on localhost is probably the better approach. I imagine it would need to be developed in Haskell, compiled into an executable, and support some basic parameters like host, port, and logging. The real issue on my side is the lack of Haskell experience...
I think such a server would be surprisingly easy to code.
For the use case of "thousands of snippets", you'd probably want the REST API to accept a batch of conversions in a single POST request; otherwise the networking overhead will be similar to the process overhead.
Here's an example to get you started. It may need some customization for your needs:
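A minimal sketch in the same spirit, assuming the wai and warp packages; this is an illustrative reconstruction, not the exact example referred to above. It accepts markdown in a POST body and replies with LaTeX:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: a tiny localhost conversion server. POST a markdown document
-- as the request body; the response body is the LaTeX rendering.
-- The port and the single-document protocol are illustrative choices.
import qualified Data.Text.Lazy           as TL
import qualified Data.Text.Lazy.Encoding  as TLE
import           Network.HTTP.Types       (status200)
import           Network.Wai
import           Network.Wai.Handler.Warp (run)
import           Text.Pandoc

app :: Application
app req respond = do
  body <- strictRequestBody req
  latex <- runIOorExplode $ do
    doc <- readMarkdown def (TL.toStrict (TLE.decodeUtf8 body))
    writeLaTeX def doc
  respond $ responseLBS status200
              [("Content-Type", "text/plain; charset=utf-8")]
              (TLE.encodeUtf8 (TL.fromStrict latex))

main :: IO ()
main = run 3030 app  -- 3030 is an arbitrary localhost port
```

Per the previous comment, a production version would accept a JSON array of snippets per request, so that one round trip amortizes the network overhead over a whole batch.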
@mb21 that's a good thought; I've added a batch endpoint.
Wow, that's actually pretty cool, and it has the potential to be a game changer for applications that do a lot of small conversions and currently use other libraries just to stay lightweight. @fmoralesc We might consider this as a way to access the AST using the new CommonMark parser with source positions, to speed up syntax highlighting! (cf. vim-pandoc-syntax #300)
@jgm That was quick, thank you!
@jgm Could the batch conversion be built into the pandoc executable itself?
No, I think it's better not to build into pandoc things that can easily be done with shell scripting.
@jgm Performance, not convenience, would be the main draw. I did some measurements using a collection of about 100 Markdown files, totaling about 3000 lines of text. The following loop takes 3.5 seconds on my computer:

```
for md in *.md; do pandoc --from gfm --to json "$md" >/dev/null; done
```

By contrast, sending the same documents as pre-prepared JSON to the batch endpoint is dramatically faster.

The slowdown could be eliminated, without incurring the complexity of starting an HTTP server in the background, if the pandoc executable itself could accept a batch of documents in a single invocation.
yes, that's why jgm kindly put together https://github.com/jgm/pandoc-server
that's just moving the complexity from one place to another though...
For local use, a pipeline is less complex and more reliable than a web server: no temp files, no sockets, no HTTP. I can write my own batch driver on top of the pandoc library, of course.
I think the proposed feature adds too much complexity and confusion.
Fair enough. I'm satisfied with this outcome, and will write a separate tool.
There is now a pandoc-server in the pandoc repository itself.
Excellent!