Commit c964f05

initial commit with README and IDEAS
0 parents  commit c964f05

3 files changed

+167
-0
lines changed

1990-v123n04.txt

+64
@@ -0,0 +1,64 @@
Title
Volume 123, Number 04
1990-10-23
Subtitle

-ARTICLES-
Article 1 Name
This is the Second Article
The Third Article is One Which has a Very Long Title: And a Subtitle to Boot

-CONTRIBUTORS-
EDITOR IN CHIEF
Editor InChief
MANAGING EDITOR
Manage N.G. Editor
EXECUTIVE EDITOR
Exec U. Tiv
WRITERS
Ernest Hemingway
Charles Dickens
Stephen King
Jane Austin
Ed Poe
Mark Twain
James Joyce
NEWS EDITOR
Edward R. Murrow
NEWS WRITERS
James Agee
Christiane Amanpour
James Baldwin
ENTERTAINMENT EDITOR
Alan Coren
ENTERTAINMENT WRITERS
Russel Baker
SPORTS EDITOR
John Anderson
SPORTS WRITERS
Matt Barrie
Chris Berman
Max Bretos
Nicole Briscoe
ILLUSTRATOR
Herbert Block
ART DIRECTOR
Leonardo DaVinci
PRODUCTION MANAGERS
Pro Duct Manager
PRODUCTION STAFF
John Doe
PHOTO EDITOR
Jane Doe
CHIEF PHOTOGRAPHER
Robert Capa
PHOTOGRAPHERS
Ansel Adams
Henri Cartier-Bresson
Dorothea Lange
ADVERTISING MANAGER
Don Draper
BUSINESS MANAGER
Andrew Carnegie
ADVISOR
Benjamin Franklin
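
The sample file above suggests a simple layout, and one way it might be read is sketched below in Python. This is not the script from the repository (the commit contains none). It assumes the first four lines are title, volume/number, date, and subtitle, that `-ARTICLES-` and `-CONTRIBUTORS-` open sections, and that an all-capitals line inside the contributors section is a role heading. The function name `parse_issue` and the JSON key names are hypothetical.

```python
import json


def parse_issue(text):
    """Parse one issue file into a dict (assumed layout, see the note above)."""
    lines = [line.strip() for line in text.splitlines()]
    # Assumed header: title, volume/number, date, subtitle on the first four lines.
    header = (lines[:4] + [""] * 4)[:4]
    issue = {
        "title": header[0],
        "volume": header[1],
        "date": header[2],
        "subtitle": header[3],
        "articles": [],
        "contributors": {},
    }
    section = None
    role = None
    for line in lines[4:]:
        if not line:
            continue  # blank lines only separate sections
        if line == "-ARTICLES-":
            section = "articles"
        elif line == "-CONTRIBUTORS-":
            section = "contributors"
        elif section == "articles":
            issue["articles"].append(line)
        elif section == "contributors":
            if line.isupper():
                # Role headings such as "SPORTS WRITERS" are written in capitals.
                role = line
                issue["contributors"].setdefault(role, [])
            elif role is not None:
                issue["contributors"][role].append(line)
    return issue


sample = """Title
Volume 123, Number 04
1990-10-23
Subtitle

-ARTICLES-
Article 1 Name
This is the Second Article

-CONTRIBUTORS-
EDITOR IN CHIEF
Editor InChief
WRITERS
Ernest Hemingway
Charles Dickens
"""

issue = parse_issue(sample)
print(json.dumps(issue, indent=2))
```

A contributor name typed entirely in capitals would be mistaken for a role heading under this rule, so a fixed list of known roles may be a safer choice in practice.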

IDEAS.txt

+26
@@ -0,0 +1,26 @@
STORE COMMON NAMES
-names are often shared among adjacent issues, and spelling mistakes
can be an unfortunately common occurrence.
-keep a hidden list of previously used names and raise a passive flag
when a new name looks like a misspelling or misprint of one of them.
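
A minimal sketch of the name check, assuming the list of known names is an ordinary Python list and using the standard library's `difflib` for fuzzy matching. The function name `flag_suspect_names` and the 0.85 cutoff are illustrative choices, not part of the project.

```python
import difflib


def flag_suspect_names(names, known_names, cutoff=0.85):
    """Return (name, closest_known) pairs for names that look like misspellings."""
    flags = []
    for name in names:
        if name in known_names:
            continue  # exact match: nothing to flag
        close = difflib.get_close_matches(name, known_names, n=1, cutoff=cutoff)
        if close:
            flags.append((name, close[0]))
    return flags


# Hypothetical list of names seen in earlier issues.
known = ["Jane Austen", "Mark Twain", "James Joyce"]
flags = flag_suspect_names(["Jane Austin", "Mark Twain", "Ed Poe"], known)
print(flags)
```

Names with no close match (here "Ed Poe") are treated as new contributors rather than flagged, which keeps the flag passive as the idea describes.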

ERROR LOGGING/FLAGGING
-flag/hold files that look like they may have

MAKE IT A REST WEB SERVICE

-set it up so that you can send an HTTP POST request
with the contents of one of the text files in the body
and have it return the JSON object

-write a script to automate sending each individual file so that
no one has to touch the code at all
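
The web service idea might look roughly like the following, using only Python's standard library. The sketch uses POST rather than GET because the file contents travel in the request body. `text_to_record` is a placeholder standing in for the real conversion, and `ConvertHandler` is a hypothetical name.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def text_to_record(text):
    """Placeholder conversion: the first non-blank line becomes the title."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return {"title": lines[0] if lines else "", "line_count": len(lines)}


class ConvertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = json.dumps(text_to_record(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


# Demo: start the server on a free local port, POST one small file, then stop.
server = HTTPServer(("127.0.0.1", 0), ConvertHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]
request = urllib.request.Request(
    url, data=b"Title\nVolume 123, Number 04\n", method="POST"
)
# An empty ProxyHandler makes sure the local request bypasses any system proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(request) as response:
    result = json.loads(response.read().decode("utf-8"))
server.shutdown()
server.server_close()
print(result)
```

The automation script from the second bullet would then be a loop over the text files that sends each one with the same kind of request.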

CONVERT TO INSTALLABLE PACKAGE

-convert to an installable package that you could install with
'sudo apt-get install __' and point it at a repository
containing all the .txt files, and the text config file locations

-move some of the config properties into bash flags and/or commands
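
Moving config properties into command-line flags could be sketched with Python's `argparse`. Every flag name below (`--config`, `--language`, `--publisher`) and the default `config.txt` are assumptions made for illustration; the project does not define a command-line interface yet.

```python
import argparse


def build_parser():
    """Build a hypothetical command-line interface for the converter."""
    parser = argparse.ArgumentParser(
        description="Convert issue text files to JSON (hypothetical CLI)."
    )
    parser.add_argument("source", help="directory containing the .txt files")
    parser.add_argument(
        "--config", default="config.txt", help="path to the shared config file"
    )
    parser.add_argument(
        "--language", default=None, help="override the language from the config"
    )
    parser.add_argument(
        "--publisher", default=None, help="override the publisher from the config"
    )
    return parser


# Parse an example argument list instead of sys.argv so the sketch is repeatable.
args = build_parser().parse_args(["issues/1990", "--language", "en"])
print(args.source, args.config, args.language, args.publisher)
```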

README.md

+77
@@ -0,0 +1,77 @@
-- PURPOSE --

SHORT:
Convert Dublin Core metadata stored in text files to a machine-readable
JSON object to be used by other software.

LONG:
Assist in cataloguing batches of similar or series-based items from a
collection by:
-decreasing the complexity of logging each individual item/issue
-minimizing repetitive typing and template updating
-combining the information that is shared across collections

-- USAGE INSTRUCTIONS --

1. CREATE TEXT FILE
Create a text file of the basic metadata for each issue in the collection.
2. EDIT CONFIG FILE/s
Edit the settings to include any and all shared metadata applicable to all
of the issues in the given batch (e.g. language, publisher).
3. RUN SCRIPT
Run the script to create a JSON object from each of these text files.
4. CHECK AND UTILIZE JSON OBJECTS
Using the actual JSON objects is out of scope for this program.
5. CLEAN UP OR REFERENCE TEXT FILES
After completion of the above tasks, the text files can be discarded as
irrelevant, or used as a quick reference to the metadata info alongside
where you're storing the files themselves.
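
Steps 2 and 3 above combine shared batch settings with each issue's own data. A minimal sketch of that merge follows, assuming both are plain dictionaries. The config format is not defined in this commit, so the `shared` values and the name `merge_shared` are hypothetical.

```python
import json

# Hypothetical shared settings for one batch; the real config format is
# not defined in this commit.
shared = {"language": "en", "publisher": "Example Press"}


def merge_shared(issue, shared):
    """Return a new record: shared batch values first, per-issue values win."""
    record = dict(shared)
    record.update(issue)
    return record


issue = {"title": "Title", "date": "1990-10-23"}
record = merge_shared(issue, shared)
print(json.dumps(record, indent=2))
```

Letting per-issue values override the shared ones means a single unusual issue can be handled in its text file without touching the batch config.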

-- NOTES --

UPLOADING ORIGINAL FILES
-If you'd like to upload the original files alongside cataloguing them,
include the filename with its tag and ensure that they are located in
the same folder as the digital item you are trying to upload.

COLLECTION/BATCH DETERMINATION
-How the uploads are batched is left entirely open to the user. For our
purposes, I plan to batch by calendar year (~15-25 issues), but
organization by volume or some other grouping would work too.
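
Batching by calendar year could be sketched as below, assuming every issue file is named like the sample `1990-v123n04.txt`, with a four-digit year at the start. The function name `batch_by_year` and the extra filenames in the example are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def batch_by_year(filenames):
    """Group filenames by their leading four-digit year (assumed naming)."""
    batches = defaultdict(list)
    for name in filenames:
        year = Path(name).name[:4]
        if year.isdigit():
            batches[year].append(name)  # files without a leading year are skipped
    return {year: sorted(names) for year, names in sorted(batches.items())}


batches = batch_by_year(
    ["1990-v123n04.txt", "1990-v123n03.txt", "1991-v124n01.txt", "IDEAS.txt"]
)
print(batches)
```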

-- BACKGROUND --
This program was written to assist with my work in cataloguing and uploading
a collection of magazines into our DSpace database. I found working with
the web interface we were using laborious and inefficient, and the OCR
scan very helpful, though occasionally faulty.

Sticking with the web interface, I had to input each name one by one and
choose between constantly re-typing names or using a template and
removing the names that were not relevant from issue to issue. Both options
introduced the potential for a lot of errors, took a lot of time, and
failed to leverage the OCR sufficiently.

My alternative protocol suggests that very basic data (e.g. contributors and
article titles) be entered into a text file in bulk (all issues in a given
collection), according to some basic heading types.

This basic script is meant as the first step in the move from an OCR-ed PDF
into a metadata holding format usable by both
61+
62+
-- RATIONALE --
63+
64+
JSON (and not MARC)
65+
-growing universal standard for encoded machine-readable data
66+
-easily transferable and lightweight
67+
-human (and non-professional) readability
68+
-integration with web (and micro-) services
69+
-easy to learn/intuit with little to no pre-requisite knowledge
70+
CONFIG: //TODO
71+
-portability/customization (many collection-specific tweaks)
72+
-ease of use (decrease pre-requisites for use)
73+
-speed (faster to tweak a config than to fork and edit code)
74+
75+
76+
-- LINKS --
77+
Dublin Core documentation: http://dublincore.org/documents/dces/
