I’ve been interning for six months now at a library/archive in a museum. Since this was on the side, I haven’t felt right about publicizing it much until some of my contributions showed up online. However, this particular project was intriguing and got me digging through a 1915 directory.
My task is to match 1915 newspaper clippings about historic houses “in Bridgeport and the surrounding vicinity” to actual addresses. It’s surprising how many houses don’t even have a town mentioned. Instead you’d get vague descriptions like “down the street from Honeysuckle and up on Sunspot Hill.” My supervisor pulled the directory for me and showed me how to cross-reference streets to people’s names.
First, I’d skim the article to see if they laid it out: city or address. If not, I’d then jump to the end of the article where the author usually placed the name of the current owner. Then I’d leaf through the directory (always starting with Bridgeport then Stratford) to see if I could find that person’s name. If I’m lucky, there they are and the address vaguely relates in some way to the article.
Then there is the tricky one. This house belonged to Mr. Hatchback for 30 years, but he died last year. His estate sold the house to Mrs. Owens. I can find Mrs. Owens in the directory but does she live there? The directory is actually for 1914, so maybe not. Another one: the house was moved from place A to place B. Then another building was moved to the original location. Where is the house located now? Oh, right. It’s down by the river. Which river? No name.
Am I still stuck? Time to break out my smartphone. If I’m in luck, I’ll find it right away. Otherwise, I’m downloading PDFs of articles related to that area to scan for information. The State Librarian had a nice census done on buildings around that time period, so that’s somewhat helpful. Unfortunately, the addresses are not included so it’s just “Hudson House.” If I’m at this point, I then go to Google Maps and see if I can tell from the article photo if the contemporary building is the same one. Oftentimes it is not, but a few times I’ve struck gold. Most of the houses are from around 1800, but some are rumored to be from closer to the 1600s.
Once I had an answer or my best guess, I’d write it on a piece of paper and lay it on top of the photocopy. If I felt unconfident, I’d mark that address with a question mark.
I finished with sorting the newspaper pages into a folder for each issue. The next step was to “bind” the pages together. Unfortunately, Adobe does not make this a simple task. Therefore the following are all the steps involved in getting the pages in the correct order in each issue:
- Look in the —- folder to see which was the last issue worked on
- Then go up a level and click on year 1925. Now select the folder that follows the last one completed.
- Click on the first pdf in that folder.
- Now go to Documents > Insert Pages > then navigate to this folder and select the second PDF > Click “Select”
- Just click “OK.” You just inserted the second page after the first one in the file.
- Click Documents > Insert Pages
- Now, to get the rest of the pages in the final PDF in correct order, you have to right click on the white background in the open dialog
- Select Arrange Icons > Name
- The files will now be in reverse order. Look at the two files on the bottom of the list. These two files have already been added to the PDF. So you want:
- Left-click on the third file from the bottom, then press and hold the Shift key and select the file at the very top of the list.
- Hit Select
- This time, you need to change the number next to the Page radio button. Change it from 1 to 2
- Click OK
- Scroll/click through the document while keeping an eye on the page numbers. It should be in numerical order now.
- Go back to the top of the page.
- Click File > Save As and give the “bound PDF” the name of whatever the issue date is in the format of YYYY.MM.DD
- Then make sure you save the file in the —— folder
- Repeat ad nauseam
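The fiddly part of the steps above is getting the pages into the right order and naming the result. Here is a minimal Python sketch of just that part. It does not touch Acrobat or merge anything; it only works out the page order and the bound file name, and the actual merge would still happen in Acrobat or a PDF library. The function name `binding_plan` is hypothetical, and it assumes each issue folder is named in yyyy-mm-dd format and that the page file names sort correctly by name (for example, zero-padded page numbers).

```python
from pathlib import Path


def binding_plan(issue_folder):
    """Return (page PDFs in ascending name order, bound file name) for one issue."""
    folder = Path(issue_folder)
    # Sort ascending by file name, ignoring case, so page 1 comes first.
    pages = sorted(
        (p for p in folder.iterdir() if p.suffix.lower() == ".pdf"),
        key=lambda p: p.name.lower(),
    )
    # Assumption: the folder is named yyyy-mm-dd; the bound PDF is named YYYY.MM.DD.
    bound_name = folder.name.replace("-", ".") + ".pdf"
    return pages, bound_name
```

If the page numbers are not zero-padded, a plain name sort would put page 10 before page 2, so the sort key would need to change.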
Between yesterday and today, I put together 184 issues, which is roughly half of this collection. My internship is now over, but I’ll resume work on this collection next week as a practicum student with my co-leader, another practicum student.
Today was a continuation of day 7. My boss solved the problem of combining multiple OCRed PDFs into one file which retains their OCR text. Tomorrow I’ll post about that process! I did find some interesting things in the newspaper articles. I sent a lot of them to my boss, but I tweeted about a few of my discoveries here:
- Nice: In May 1926, it was claimed that 40% of women college graduates were atheists.
- “The best looking, most polite, and well behaved people to be seen in Europe were Americans.” ~Leonard B. Hurley, 10/7/1926
- Scanning these 1925-27 newspapers, you can see the hope of no more war as well as the rise of American arrogance.
The best part of the day though was going out to lunch with my lovely boyfriend.
I got caught up in my work so I did not have time to write about what I did on days 5 and 6 last week. At this point, what I mostly remember is scanning in girls’ books from the late 1800s-1930s and then trying to color correct them. The digital projects lab has overhead fluorescent lighting with no windows. I tried my best, but was only able to get through about seven books because I simply could not get the scanned image and the actual book cover to match.
This week my supervisor is on vacation, so I am trying to complete some tasks he left for me. However, I had to leave early to go import the commencement video I shot last Thursday. Below is a description of the task and what I’ve done so far to tackle it. Suggestions are most welcome!
About 11 years of student newspapers were microfilmed and then recently digitized. The company we outsourced the digitization to made every page an individual PDF. In addition, every page has a JPEG2000 image, a TIF image, and a TXT file. I need to rename these 2000+ files. I will then have to combine the pages of each issue.
- Copy the PDFs into a separate working folder I created, so that the original master copies are preserved.
- Open Adobe Bridge in order to view the PDFs so I can see where one issue ends and where another begins.
- In my folder, I create a folder for the year and then within it a folder for the day that newspaper was printed in yyyy-mm-dd format.
- The third window I open (alongside Adobe Bridge and the yyyy-mm-dd folder) is the folder of master copies.
- First, I double click on the front page of a newspaper in order to find out the date.
- Second, I create a folder in my working copy folder with that date.
- Third, I then highlight all the file types (PDF, JPEG2000, TIF, TXT) for those pages and copy them.
- Fourth, I then paste these files into my working copy folder.
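For anyone who would rather script the copy-and-paste steps above, here is a rough Python sketch using only the standard library. It is a sketch under assumptions, not what I actually did: the function name `copy_issue`, the file extensions in `EXTENSIONS`, and the idea that all four files for a page share the same base name are all guesses. You still have to look at the front page yourself to find the date and decide which pages belong to the issue.

```python
import shutil
from datetime import date
from pathlib import Path

# Assumed extensions for the four file types delivered per page.
EXTENSIONS = {".pdf", ".jp2", ".tif", ".txt"}


def copy_issue(master_dir, working_dir, issue_date, page_stems):
    """Copy every file type for the given pages into working_dir/yyyy/yyyy-mm-dd."""
    target = Path(working_dir) / str(issue_date.year) / issue_date.isoformat()
    target.mkdir(parents=True, exist_ok=True)
    wanted = set(page_stems)
    copied = []
    for src in sorted(Path(master_dir).iterdir()):
        # Assumption: all files for one page share the same base name.
        if src.stem in wanted and src.suffix.lower() in EXTENSIONS:
            shutil.copy2(src, target / src.name)  # copy, never move: masters stay intact
            copied.append(src.name)
    return copied
```

A call would look something like `copy_issue("masters", "working", date(1925, 10, 3), ["0001", "0002"])`, where the folder names and page names are placeholders.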
Today I completed the task from last week. Since the work did not require much thought, I was able to maintain a lovely conversation with another woman in the room. My supervisor was at a meeting but came back just as she left and I finished my work.
I was then told about the project I’ll be co-leading with a UNC practicum student. My supervisor let me get a head start on it today by studying the metadata from other digital projects about student newspapers. I researched the problem by first googling “digitalized student newspapers” but then realized Penn State had a list of historical newspaper collections which I then checked. Next I did a quick keyword search in CONTENTdm’s Collection of Collections to look for more repositories. My discoveries are as follows:
- Most university digital projects do not bother to write much metadata for their student newspaper collections.
- The software used includes CONTENTdm, Olive Software (OCLC), and DSpace (one site); many sites used a platform whose name starts with an “a” (which was not easy to navigate).
- Some digital projects did not even bother to write any metadata and just uploaded the newspapers as PDFs that had not been OCRed.
- Carroll University had the most extensive metadata which included the editor, staff writers, photographers, and article topics.
My work today was pretty much a mirror image of yesterday’s work. I finished the Alumnae Magazine entries and since my supervisor was at a meeting, I started in on the student magazine. When my supervisor got back, he informed me that someone had already started working on that project, so I had to delete about 100 folders (4 per year) of content.
I look forward to Monday though because we’re having a meeting about the big project I’ll be working on with another student!
Today was a bit of a mix thanks to my supervisor having other things to tend to and tours being led through the digital projects lab. When things settled down, my supervisor asked me for my thoughts on what metadata fields would make this one collection more useful to student researchers. I suggested:
- Short description of content for each book.
- OCR text to be added so it could be searched.
- A better design layout since the giant “Click folder for access” image which leads the student off-site is off-putting.
We then discussed another project that he wants me to help lead, and how to handle it. That makes me nervous but the atmosphere in the lab leaves me with a very warm feeling. David is very supportive of his interns and student workers. We can chat a bit, make each other laugh, and never are we made to feel ashamed to ask questions. It is the sort of environment that I hope to foster in my own digital projects lab someday.
Next, he gave me some “grunt work” (which I don’t mind doing since I tend to be meticulous!) of downloading the OCR text and JPEGs that the Internet Archive produced for us. I completed 55 years today and nearly each year had four folders within it. The work was as follows:
- Right click on the magazine link and open it in a new tab.
- Click on “Access this file” to check to make sure that the year in the title matched the actual files.
- Click on the full URL of the object, then on a link that takes you to a list of all files that make up the object.
- In the download folder, make a folder for each year and then a folder for each object.
- Most magazines were by month, so to make it easy to organize them, I used a two digit code which corresponds to the month’s order. For example, October is 10 so the folder name would be “10 October.”
- Right click and save the OCR text and the zipped images to its specific folder.
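The two-digit month code in the folder names is easy to generate if you ever script this. A tiny Python sketch follows; the function name `month_folder_name` is made up for illustration.

```python
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def month_folder_name(month):
    """Return a folder name such as '10 October' for month number 10."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # Zero-pad to two digits so the folders sort in calendar order by name.
    return f"{month:02d} {MONTHS[month - 1]}"
```

The zero padding is the point: without it, "10 October" would sort ahead of "2 February" in a file browser.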
I did quality control on 2610 entries today. That’s not bad for a first day!
The Digital Projects lab has been working on a women veterans project for years. My job yesterday was to go through thousands of entries and edit the titles to conform to a set pattern. The main challenge was that I am unfamiliar with World War II and military terminology, so I was not always sure if something was a proper noun or not. I made a mistake at one point and then had to re-sort the data in order to find the entries that I needed to correct. However, that process only took half an hour.
What’s the high tech way of flattening old paper?
First you take a trash can and put an upside-down flower pot in it. Then pour a gallon or two of hot water around it. Next you take two square pieces of foam board and put them on top of the upside-down flower pot so that their eight corners do not line up. Finally, you put the wrinkled, curled, warped paper on top of the foam board and then put the lid of the trash can on. Leave this for 24 hours. The next day, you carefully place your now damp paper between some heavy sheets of paper, then place heavy books on top of that. Leave this for a week!
How about describing a collection?
You’re given an archival file box with a bunch of unmarked folders in it and eighteen documents inside. You look through the study aid book to get an idea of how to write up a new temporary study inventory. Carefully, you go through the documents and describe them as correspondence, newspapers, reports, printed material, etc. Then you decide the inclusive and bulk dates of the materials. Inclusive dates run from the earliest document to the newest, while bulk dates cover the years in which most of the documents were created. Next you write up a short description of what these documents are about, decide on a filing system, and describe what is going in which folder under what heading (e.g. correspondence, printed material, etc.).
Then as you are filling out the folder labels, you find some more documents between the pages of a book. You go back to square one. As you read the obituary of the person whose documents you have, you realize that the stuff between the pages of the book probably belonged to their younger brother. Cue the mystery springing to life.
While working at the digitization internship today, I was describing the contents of an 1893 literary society group at the State Normal and Industrial School (what would later become UNCG). These girls, 117 years ago, had an initiation ritual which involved branding a cow in the face with the letter “C” while chanting in Latin.
I kid you not.
Other highlights from their constitution involved fining girls for speaking or writing out of turn and making them sign the constitution while swearing oaths of loyalty and secrecy. The log of their minutes reveals that they were in a heated fight with the other literary society. First, they tried to suck up to the school’s president by naming their society after him. He thought it was nice at first and then got creeped out and made them change it. Second, members were expected to stalk and harass new students and members of the faculty into joining their group and not the other one. In fact, they petitioned the school to implement a strict rule that students couldn’t change from one literary group to another. Once they had made their choice, that was it.
Once the constitution and minutes are posted online I’ll add some links to this post. In the meantime, check out Beyond Books and Buildings. These are the plain, boring versions, which I’ve converted to PDFs (some OCRed); I am now writing metadata for them in CONTENTdm. Ah, that reminds me, the other programs I’m using in conjunction with CONTENTdm are Photoshop and Adobe Bridge. Previously, I had only ever heard of Photoshop, so it feels pretty good to be learning some other programs to use with my work this summer.