Skip to content

A simple routine to download and parse all PDF attachments in a Zotero library.

License

Notifications You must be signed in to change notification settings

nbreznau/zotero_library_parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zotero Library Parser

A tool to import all saved PDF files from a cloud-synced Zotero library, then auto parse them with Python and save as utf8 .txt files plus a .csv with meta-data.

It uses (pyzotero) and (pdfminer) to import and parse the pdfs.

The parsed texts are saved in the folder "texts" (create it if you don't have it). They are named after their Zotero IDs, which is also a variable in the meta-data. The meta-data contains all entries in the Zotero library but the folder "texts" only contains those entires that have a PDF attachment that is longer than 200 characters and did not have an exit exception during the parsing.

Background

This parser is a simple derivative of coding work done as part of the (bibfilter app), developed in the German Science Foundation funded project "The Reciprocal Relationship of Public Opinion and Social Policy" (BR 5423/2-1).

Its intended application is to parse files for the (Replication Wiki) to help train a machine to update the RepWiki and to use for teaching purposes for students learning to do natural language processing.

User Notes

The coding here is basic. The file get_pdfs_zot.py can be run after installing the necessary packages in Python - these dependencies are listed in commented out pip install command lines at the top of the file. For this to run properly the user needs to create a file 'keys.py'. This is where the Zotero group library name and API key are stored, in addition to sub-folder names if desired. There is a 'key_example.py' file to demonstrate how to create the 'keys.py' file.

Debugging

Ensure that items in Zotero are not themselves PDFs and have all necessary information in them.

Depending on the way you put studies into Zotero, the sub-collection variables may not fill in properly. Sometimes a sub-sub-collection appears, but not he higher level sub-collection. If this is the case, you may have to add sub-sub-collection items to the next level up sub-collection, even if these appear in your Zotero.

About

A simple routine to download and parse all PDF attachments in a Zotero library.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages