Easiest way to get form fields from a pdf #497
I'm confident that pdf-reader can deserialize the data you're after, but unfortunately I'm not personally very familiar with PDF forms or how the fields or data are stored. The PDF spec says the optional
Maybe filtering the
I see there's also an
Something like this:

```diff
diff --git a/lib/pdf/reader.rb b/lib/pdf/reader.rb
index 22aea3d..8c3266b 100644
--- a/lib/pdf/reader.rb
+++ b/lib/pdf/reader.rb
@@ -142,6 +142,12 @@ def metadata
     end
   end
+  # Return a Hash with interactive form details from this file. Not always present
+  #
+  def acroform
+    @objects.deref_hash(root[:AcroForm])
+  end
```

Would allow:

```ruby
PDF::Reader.open("somefile.pdf") do |pdf|
  puts pdf.acroform
end
```
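Building on the `acroform` idea: the PDF spec stores form fields as a tree rather than a flat list. `/Fields` holds the top-level fields, a field can have `/Kids`, each field's `/T` is its partial name, and its `/V` is its value. Joining the partial names with dots gives the fully qualified name. Below is a minimal sketch in plain Ruby. It assumes the dictionaries arrive as Ruby Hashes with Symbol keys, which is how pdf-reader returns them. `acroform_fields` and `collect_field` are hypothetical helper names, not pdf-reader API. The sketch also ignores inherited attributes, such as a `/V` set on an ancestor field.

```ruby
# Walk an AcroForm field tree and return { "fully.qualified.name" => value }.
# `deref` is any callable that resolves an indirect reference to its object;
# the default identity lambda is enough for plain, already-resolved Hashes.
def acroform_fields(acroform, deref = ->(obj) { obj })
  result = {}
  fields = deref.call(acroform[:Fields]) || []
  fields.each { |field| collect_field(deref.call(field), nil, deref, result) }
  result
end

def collect_field(field, parent_name, deref, result)
  partial = field[:T]
  # Kids without /T (e.g. the widgets of a radio button group) are not
  # separate fields, so a field's name only grows when /T is present.
  name = if partial.nil?
           parent_name
         elsif parent_name.nil?
           partial.to_s
         else
           "#{parent_name}.#{partial}"
         end

  kids = (deref.call(field[:Kids]) || []).map { |kid| deref.call(kid) }
  named_kids = kids.select { |kid| kid.key?(:T) }

  if named_kids.empty?
    # Terminal field: record its value (nil when the field was left empty).
    result[name] = deref.call(field[:V]) if name
  else
    named_kids.each { |kid| collect_field(kid, name, deref, result) }
  end
end
```

With pdf-reader, one option (untested here) is to pass `->(o) { reader.objects.deref(o) }` as `deref`, and to reach the AcroForm dictionary through `reader.objects.deref(reader.objects.trailer[:Root])[:AcroForm]`, which avoids patching the gem. Depending on how a file encodes them, names and values may still need text decoding.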
Gotcha, filtering to look at only the
On one pdf, I was able to just look at the
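Get all fields from a file using the low-level API:

```ruby
require "pdf-reader"
require "active_support/all" # provides `pluck` and `compact_blank` (Rails 6.1+)

fields_from_pdf_form = PDF::Reader.new(file).pages.map do |page| # `file`: path to the PDF
  page.objects.deref!(page.attributes[:Annots])&.pluck(:T) # each annotation's /T (field name)
end.flatten.compact_blank
```

UPDATE: this doesn't return all fields, because it skips radio button groups. They are still present inside the `Annots`, though. I need to find a way to collect them.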
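This may explain the missing radio buttons. Under the PDF spec, a field with several widget annotations (the usual layout for a radio button group) keeps its `/T` on the field dictionary. Each widget annotation only points back to that field through `/Parent`, so reading `:T` directly from each annotation yields nothing for those widgets. Below is a minimal sketch that walks up `/Parent` until it finds a name. `field_names_from_annots` is a hypothetical helper, and it returns partial (`/T`) names only, just as the original snippet did.

```ruby
# Collect form field names from a page's /Annots array.
# `deref` resolves indirect references; the default identity lambda is enough
# when the annotations are already plain Ruby Hashes.
def field_names_from_annots(annots, deref = ->(obj) { obj })
  names = (deref.call(annots) || []).map do |annot|
    annot = deref.call(annot)
    # Only widget annotations belong to form fields; links, comments, etc. are skipped.
    next unless annot[:Subtype] == :Widget

    # A widget of a multi-widget field (e.g. one radio button in a group) has
    # no /T of its own, so walk up /Parent until a named field is found.
    field = annot
    field = deref.call(field[:Parent]) while field[:T].nil? && field[:Parent]
    field[:T]
  end
  # Several widgets can share one field, so drop nils and duplicates.
  names.compact.uniq
end
```

With pdf-reader, this could be called per page as `field_names_from_annots(page.attributes[:Annots], ->(o) { page.objects.deref(o) })`. The per-page results can then be flattened and de-duplicated.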
Hello everybody, I came up with this script to extract acrofields:
This seems to work. I thought it might be useful for you as well. Keep up the good work everybody :-)
I'm trying to parse standard documents like W-9 forms (https://www.irs.gov/pub/irs-pdf/fw9.pdf). I want to extract the name, which is the first form field that someone fills in. What's the easiest way to do that?
I've tried doing:
When I look at `result` for a bunch of different filled-in W-9s, there doesn't seem to be a single structure in the `result` variable that I can use to figure out the name. I know the name is always going to be the first form field; is there an easy way to search for that?
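As far as I know, the PDF format does not guarantee that the order of fields in a form matches their position on the page. Selecting a field by its fully qualified name is therefore likely sturdier than taking "the first field." This thread doesn't give the W-9's actual field names, so you would need to list them once from a sample form and then match on the one you want. Below is a minimal sketch. It assumes the fields have already been collected into a Hash of name => value. `find_field_value` is a hypothetical helper, and the `f1_1`-style names in the example are made-up placeholders.

```ruby
# Return the first non-blank value among fields whose fully qualified name
# matches `pattern`, scanning `fields` (a Hash of name => value) in order.
# Returns nil when no matching field was filled in.
def find_field_value(fields, pattern)
  fields.each do |name, value|
    next unless name.to_s.match?(pattern)

    text = value.to_s.strip
    return text unless text.empty?
  end
  nil
end
```

Matching on a pattern instead of an exact string gives some tolerance for prefixes such as form or page names that can differ between versions of the same form.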