I am not sure if this is a problem with the PDF itself, but when initialize in lib/pdf/reader/page_layout.rb maps mean_character_width over @runs, the width of some runs is extremely low (less than 1e-15). Taking the median of those results then yields an abnormal number of columns.
I've made a workaround in this fork and it now works for those PDFs: kodius@6b232e9
The PDF that is causing these issues for me is this one: dorset.pdf
Specifically pages 31 and 39-50, so those that are mostly blank or contain images.
It should break at page 31 with no error message. Debugging deeper shows that it actually fails to allocate memory: col_count is simply too high when to_s (same as the .text method) in page_layout.rb does the following:
page = row_count.times.map { |i| " " * col_count }
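To illustrate the failure mode, here is a minimal sketch. It assumes col_count is derived by dividing the page width by the median character width, which is my reading of page_layout.rb and not a verbatim copy of it. The helper names, the 612-point page width and the `MIN_PLAUSIBLE_WIDTH` threshold are all hypothetical, and the guard is not necessarily what the linked fork does.

```ruby
# Hypothetical threshold (in points) below which a run width is treated as noise.
MIN_PLAUSIBLE_WIDTH = 1e-3

# Upper median of a non-empty list of numbers.
def median(values)
  sorted = values.sort
  sorted[sorted.size / 2]
end

# Hypothetical guarded estimate: ignore implausibly small widths before
# taking the median, and fall back to 0 columns if nothing usable remains.
def estimate_col_count(page_width, widths)
  usable = widths.select { |w| w >= MIN_PLAUSIBLE_WIDTH }
  return 0 if usable.empty?

  (page_width / median(usable)).floor
end

# Two near-zero widths outvote the single sane one, so the median is tiny.
widths = [1e-16, 2e-16, 5.0]

naive   = (612.0 / median(widths)).floor    # on the order of 10**18 columns
guarded = estimate_col_count(612.0, widths) # => 122
```

With the naive value, `" " * col_count` asks for a string of roughly 10**18 bytes per row, which would explain the allocation failure. Whether filtering, clamping, or a default width is the right fix is a design decision for the library.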