-- PURPOSE --

SHORT:
Convert Dublin Core metadata stored in text files into a machine-readable
 JSON object to be used by other software.

LONG:
Assist in cataloguing batches of similar or series-based items from a
collection by:
 -decreasing the complexity of logging each individual item/issue
 -minimizing repetitive typing and template updating
 -combining the information that is shared across collections

-- USAGE INSTRUCTIONS --

1. CREATE TEXT FILES
Create a text file of the basic metadata for each issue in the collection.
2. EDIT CONFIG FILE(S)
Edit the settings to include any and all shared metadata applicable to all
 of the issues in the given batch (e.g. language, publisher, etc.).
3. RUN SCRIPT
Run the script to create a JSON object from each of these text files.
4. CHECK AND UTILIZE JSON OBJECTS
Using the resulting JSON objects is out of scope for this program.
5. CLEAN UP OR REFERENCE TEXT FILES
After completing the above tasks, the text files can be discarded as
 no longer needed, or kept as a quick reference to the metadata
 alongside wherever you store the files themselves.

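The workflow above can be sketched in a few lines of Python. Everything here is illustrative: the `tag: value` text-file layout, the tag names, and the inline config dictionary are assumptions for the sketch, not the script's actual formats.

```python
import json

# Hypothetical shared settings, as might live in a config file (step 2).
# Field names follow Dublin Core element names.
config = {"language": "en", "publisher": "Example Press"}

# Hypothetical per-issue text file (step 1): one "tag: value" pair per
# line; a tag repeated across lines collects into a list of values.
issue_text = """\
title: Example Magazine, Vol. 3 No. 1
date: 1952-01
contributor: J. Smith
contributor: A. Jones
"""

def text_to_record(text, shared):
    """Parse tag: value lines and merge in the shared batch metadata."""
    record = dict(shared)
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag in record and tag not in shared:
            # Repeated tag in this issue: promote to a list of values.
            existing = record[tag]
            record[tag] = (existing if isinstance(existing, list)
                           else [existing]) + [value]
        else:
            record[tag] = value
    return record

record = text_to_record(issue_text, config)
print(json.dumps(record, indent=2))
```

In practice the script would loop this over every text file in the batch (step 3) and write one JSON object per issue.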
-- NOTES --

UPLOADING ORIGINAL FILES
-If you'd like to upload the original files alongside cataloguing them,
 include the filename with its tag and ensure that the files are located
 in the same folder as the digital item you are trying to upload.

COLLECTION/BATCH DETERMINATION
-How you batch the uploads is left entirely up to the user. For our
 purposes, I plan to batch by calendar year (~15-25 issues), but
 organizing by volume or some other scheme would work just as well.

-- BACKGROUND --
This program was written to assist with my work cataloguing and uploading
 a collection of magazines into our DSpace database. I found the web
 interface we were using laborious and inefficient, and the OCR scan
 very helpful, though occasionally faulty.

Sticking to the existing workflow, I had to input each name one by one,
 and choose between constantly re-typing names or using a template and
 removing the entries that were not relevant issue to issue. Both
 options introduced the potential for many errors, took a lot of time,
 and failed to leverage the OCR sufficiently.

My alternative protocol is to input very basic data (e.g. contributors
 and article titles) into a text file according to some basic heading
 types, in bulk (all issues in a given collection).

This basic script is meant as the first step in the move from an OCR-ed
 PDF into a metadata-holding format usable by both humans and machines.

-- RATIONALE --

JSON (and not MARC)
 -growing universal standard for encoded machine-readable data
 -easily transferable and lightweight
 -human (and non-professional) readability
 -integration with web (and micro-) services
 -easy to learn/intuit with little to no prerequisite knowledge
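As a concrete illustration of these points (field names and values invented for the example), a single catalogued issue might serialize to:

```json
{
  "title": "Example Magazine, Vol. 3 No. 1",
  "language": "en",
  "publisher": "Example Press",
  "contributor": ["J. Smith", "A. Jones"]
}
```

Even a non-specialist can read and hand-edit this directly, which is not true of a MARC record.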
CONFIG: //TODO
 -portability/customization (many collection-specific tweaks)
 -ease of use (decreases prerequisites for use)
 -speed (faster to tweak a config than to fork and edit code)
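Since the config layer is still marked TODO, the following is only a sketch of what a per-batch config might contain, assuming a JSON config format (the real format is up to the implementation):

```json
{
  "language": "en",
  "publisher": "Example Press",
  "type": "Text"
}
```

Fields set here would be merged into every issue's JSON object in the batch, so they are typed once per batch rather than once per issue.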

-- LINKS --
Dublin Core documentation: http://dublincore.org/documents/dces/