htmlToText/convert functions much slower than fromString #265
Thank you for this report. This is the first time I've received feedback on package performance. It has nothing to do with the function name. Had I preserved … Could you please perform your measurements with versions … Please use empty options or simple …
I have now run the tests. We use this library when exporting data to Excel, so I have a sample of 12,120 strings that get processed. Please find the results below: Note: I used `{ wordwrap: false }` as you suggested.
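Thank you. Could you please repeat the test for versions …

```js
const { compile } = require('html-to-text');

// Build the converter once, outside the loop, and reuse it for every sample.
const compiledConvert = compile({ wordwrap: false });
for (const sample of samples) {
  const stringData = compiledConvert(sample.htmlData);
  // ...
}
```

instead of

```js
const { convert } = require('html-to-text');

// convert() processes the options again on every call.
for (const sample of samples) {
  const stringData = convert(sample.htmlData, { wordwrap: false });
  // ...
}
```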
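The suggestion above amounts to moving per-options setup out of the hot loop. The sketch below illustrates that idea in plain JavaScript without html-to-text. `compileSketch`, `convertSketch` and the `separator` option are hypothetical stand-ins, not the library's internals, and the tag stripping is deliberately naive. It only counts how often the setup step runs in each style.

```javascript
// Counts how many times the hypothetical setup step runs.
let setupRuns = 0;

// Hypothetical "compile" step: prepares reusable state for one options object.
function compileSketch(options = {}) {
  setupRuns += 1;
  const tagRe = /<[^>]*>/g; // prepared once per compile call
  const separator = options.separator ?? " "; // hypothetical option
  return (html) => html.replace(tagRe, separator).replace(/\s+/g, " ").trim();
}

// Convenience wrapper that repeats the setup on every call, like convert(html, options).
function convertSketch(html, options) {
  return compileSketch(options)(html);
}

const samples = ["<p>Hello</p>", "<b>World</b>", "<i>again</i>"];

// Per-call style: one setup per input string.
setupRuns = 0;
const viaConvert = samples.map((s) => convertSketch(s, {}));
const convertSetups = setupRuns;

// Compile-once style: one setup shared by all input strings.
setupRuns = 0;
const compiled = compileSketch({});
const viaCompile = samples.map((s) => compiled(s));
const compileSetups = setupRuns;

console.log(viaConvert, convertSetups, viaCompile, compileSetups);
```

Both styles produce the same strings; the difference is that the per-call style pays the setup cost once per sample, which adds up over thousands of strings.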
It works now, thank you :) I will close this issue.
Awesome. I'm glad to hear it works so well after all. After introducing selectors, … Because it got so much smarter compared to version 5.1.1, getting this close in performance is a really good achievement, I think. Thank you for working with me on this question; now I have these valuable measurements from a real-world use case.
Updating from 5.1.1 to 9.0.0 makes the HTML-to-text conversion 4 times slower.
In 5.1.1 I was using the `fromString` function, which is now deprecated.
In 9.0.0 I have tried both the `convert` and `htmlToText` functions, and both are 4 times slower than the previous `fromString` method.
The call in our code is:
```js
const htmlToText = require('html-to-text'); // 5.1.1 API

// formatUnorderedList and DNR are defined elsewhere in our code.
data = htmlToText.fromString(data, {
  wordwrap: false,
  noLinkBrackets: true,
  preserveNewlines: true,
  unorderedListItemPrefix: "*",
  format: {
    // Render links as [text](href).
    anchor: function (elem, fn, options) {
      var h = fn(elem.children, options);
      return "[" + h + "](" + elem.attribs.href + ")";
    },
    unorderedList: function (elem, fn, options) {
      return formatUnorderedList(elem, fn, options, false);
    },
    text: function (elem, options) {
      if (elem.next && elem.data) {
        // If the next element is an email address (parsed as a tag), return it
        // with a special marker so the < > are not removed.
        return elem.next.name && elem.next.name.indexOf("@") !== -1 ?
          elem.data + " <" + elem.next.name + DNR + ">" :
          elem.data;
      } else {
        return elem.data;
      }
    }
  }
});
```
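For comparison, a rough mapping of the anchor and list options above onto the 9.x `selectors`/`formatters` style might look like the sketch below. It is based on the html-to-text 9.x README, not a verified port: the `markdownLink` formatter name is hypothetical, and the custom `unorderedList` and `text` formatters (which depend on `formatUnorderedList` and `DNR`) are not ported.

```javascript
// Sketch of 9.x-style options mirroring part of the 5.1.1 call (assumed mapping).
const options = {
  wordwrap: false,
  preserveNewlines: true,
  formatters: {
    // Hypothetical formatter name: renders <a> as [text](href).
    // Because it writes its own brackets, noLinkBrackets has no equivalent here.
    markdownLink: function (elem, walk, builder, formatOptions) {
      const href = (elem.attribs && elem.attribs.href) || "";
      builder.addInline("[");
      walk(elem.children, builder);
      builder.addInline("](" + href + ")");
    },
  },
  selectors: [
    // Route anchors to the custom formatter instead of the built-in one.
    { selector: "a", format: "markdownLink" },
    // Roughly corresponds to 5.1.1's unorderedListItemPrefix: "*".
    { selector: "ul", options: { itemPrefix: "*" } },
  ],
};
```

Passing this object to `compile` once and reusing the returned function in the export loop keeps the per-call overhead low, as discussed above.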