Digitization Internship Day 9 & 10

I finished sorting the newspaper pages into a folder for each issue. The next step was to “bind” the pages together. Unfortunately, Adobe does not make this a simple task, so here are all the steps involved in getting the pages of each issue into the correct order:

  1. Look in the —- folder to see which was the last issue worked on
  2. Then go up a level and click on year 1925. Now select the folder that follows the last one completed.
  3. Click on the first pdf in that folder.
  4. Now go to Documents > Insert Pages > then navigate to this folder and select the second PDF > Click “Select”
  5. Just click “OK.” You just inserted the second page after the first one in the file.
  6. Click Documents > Insert Pages
  7. Now, to get the rest of the pages in the final PDF in correct order, you have to right click on the white background in the open dialog
  8. Select Arrange Icons > Name
  9. The files will now be in reverse order. Look at the two files on the bottom of the list. These two files have already been added to the PDF, so skip them.
  10. Left-click on the third file from the bottom, then press and hold the Shift key and select the file at the very top of the list.
  11. Hit Select
  12. This time, you need to change the number next to the Page radio button. Change it from 1 to 2
  13. Click OK
  14. Scroll/click through the document while keeping an eye on the page numbers. It should be in numerical order now.
  15. Go back to the top of the page.
  16. Click File > Save As and give the “bound PDF” the name of whatever the issue date is in the format of YYYY.MM.DD
  17. Then make sure you save the file in the —— folder
  18. Repeat ad nauseam
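
For anyone who would rather script the ordering than click through it, here is a minimal sketch of the planning half of the steps above: list the page PDFs in an issue folder in name order and work out the YYYY.MM.DD filename for the bound file. It does not merge anything; that part would need a PDF library or Acrobat itself. The function name `plan_binding` and the file names in the usage note are hypothetical, and the sketch assumes the page files are named so that alphabetical order matches page order (for example, zero-padded page numbers).

```python
from datetime import datetime
from pathlib import Path


def plan_binding(issue_folder, issue_date):
    """Return (pages in name order, output filename) for one issue.

    issue_date must already be in YYYY.MM.DD form, e.g. "1925.10.07".
    Assumption: sorting by filename puts the pages in the right order.
    """
    # Raises ValueError if the date is not in YYYY.MM.DD format.
    datetime.strptime(issue_date, "%Y.%m.%d")

    # Only the page PDFs are bound; other file types are ignored.
    pages = sorted(Path(issue_folder).glob("*.pdf"), key=lambda p: p.name)
    return pages, f"{issue_date}.pdf"
```

Calling `plan_binding(folder, "1925.10.07")` on a folder holding `page_001.pdf`, `page_002.pdf`, and `page_003.pdf` returns those three paths in that order, plus the name `1925.10.07.pdf`.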

Between yesterday and today, I put together 184 issues, which is roughly half of this collection. My internship is now over, but I’ll resume work on this collection next week as a practicum student with my co-leader, another practicum student.

Digitization Internship Day 8

Today was a continuation of day 7. My boss solved the problem of combining multiple OCRed PDFs into one file which retains their OCR text. Tomorrow I’ll post about that process! I did find some interesting things in the newspaper articles. I sent a lot of them to my boss, but I tweeted about a few of my discoveries here:

  • Nice: In May 1926, it was claimed that 40% of women college graduates were atheists.
  • “The best looking, most polite, and well behaved people to be seen in Europe were Americans.” ~Leonard B. Hurley, 10/7/1926
  • Scanning these 1925-27 newspapers, you can see the hope of no more war as well as the rise of American arrogance.

The best part of the day though was going out to lunch with my lovely boyfriend.

Digitization Internship II Day 7

I got caught up in my work so I did not have time to write about what I did on days 5 and 6 last week. At this point, what I mostly remember is scanning in girls’ books from the late 1800s-1930s and then trying to color correct them. The digital projects lab has overhead fluorescent lighting with no windows. I tried my best, but was only able to get through about seven books because I simply could not get the scanned image and the actual book cover to match.

This week my supervisor is on vacation, so I am trying to complete some tasks he left for me. However, I had to leave early to go import the commencement video I shot last Thursday. Below is a description of the task and what I’ve done so far to tackle it. Suggestions are most welcome!

About 11 years of student newspapers were microfilmed and then recently digitized. The company we outsourced the digitization to made every page an individual PDF. In addition, every page has a JPEG2000 image, a TIF image, and a TXT file. I need to rename these 2000+ files. I will then have to combine the pages of each issue.

My efforts:

  • Copy the PDFs into a separate working folder I created, in order to preserve the original master copies.
  • Open Adobe Bridge in order to view the PDFs so I can see where one issue ends and where another begins.
  • In my folder, I create a folder for the year and then within it a folder for the day that newspaper was printed in yyyy-mm-dd format.
  • The third window I open (alongside Adobe Bridge and the yyyy-mm-dd folder) shows the master copies.
  • First, I double click on the front page of a newspaper in order to find out the date.
  • Second, I create a folder in my working copy folder with that date.
  • Third, I then highlight all the file types (PDF, JPEG2000, TIF, TXT) for those pages and copy them.
  • Fourth, I then paste these files into my working copy folder.
  • Repeat.
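
Since suggestions are welcome, here is a minimal sketch of how the copy step above might be scripted once the pages of an issue have been identified by eye. Everything named here is an assumption for illustration: the function `copy_issue`, the file extensions (`.pdf`, `.jp2`, `.tif`, `.txt`), and the idea that the four files for a page share one base name. Finding where an issue starts and ends still has to be done by looking at the front pages.

```python
import shutil
from datetime import date
from pathlib import Path

# Assumed extensions for the four file types delivered per page.
EXTENSIONS = (".pdf", ".jp2", ".tif", ".txt")


def copy_issue(master_dir, working_dir, issue_date, page_stems):
    """Copy every file type for the given pages into working/yyyy/yyyy-mm-dd.

    page_stems are the base filenames (no extension) of the issue's pages.
    The master copies are only read, never moved or changed.
    """
    issue_dir = Path(working_dir) / f"{issue_date:%Y}" / issue_date.isoformat()
    issue_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for stem in page_stems:
        for ext in EXTENSIONS:
            src = Path(master_dir) / f"{stem}{ext}"
            if src.exists():  # skip quietly if a file type is missing
                dest = issue_dir / src.name
                shutil.copy2(src, dest)
                copied.append(dest)
    return copied
```

For example, `copy_issue(master, working, date(1925, 10, 7), ["0001", "0002"])` would fill `working/1925/1925-10-07` with whichever of the eight matching files exist in the master folder.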

Digitization Internship II Day 4

Today I completed the task from last week. Since the work did not require much thought, I was able to maintain a lovely conversation with another woman in the room. My supervisor was at a meeting but came back just as she left and I finished my work.

I was then told about the project I’ll be co-leading with a UNC practicum student. My supervisor let me get a head start on it today by studying the metadata from other digital projects about student newspapers. I researched the problem by first googling “digitalized student newspapers” but then realized Penn State had a list of historical newspaper collections, which I then checked. Next I did a quick keyword search in CONTENTdm’s Collection of Collections to look for more repositories. My discoveries are as follows:

  • Most university digital projects do not bother to write much metadata for their student newspaper collections.
  • The software used is: CONTENTdm, Olive Software (OCLC), one DSpace site, and many used one that starts with an “a” (which was not easy to navigate)
  • Some digital projects did not even bother to write any metadata and just uploaded the newspapers as PDFs that had not been OCRed.
  • Carroll University had the most extensive metadata which included the editor, staff writers, photographers, and article topics.

Digitization Internship II Day 3

My work today was pretty much a mirror image of yesterday’s work. I finished the Alumnae Magazine entries and since my supervisor was at a meeting, I started in on the student magazine. When my supervisor got back, he informed me that someone had already started working on that project, so I had to delete about 100 folders (4 per year) of content.

I look forward to Monday though because we’re having a meeting about the big project I’ll be working on with another student!

Digitization Internship II Day 2

Today was a bit of a mix thanks to my supervisor having other things to tend to and tours being led through the digital projects lab. When things settled down, my supervisor asked me for my thoughts on what metadata fields would make this one collection more useful to student researchers. I suggested:

  • Short description of content for each book.
  • OCR text to be added so it could be searched.
  • A better design layout since the giant “Click folder for access” image which leads the student off-site is off-putting.

We then discussed another project that he wants me to help lead, and how to handle it. That makes me nervous but the atmosphere in the lab leaves me with a very warm feeling. David is very supportive of his interns and student workers. We can chat a bit, make each other laugh, and we are never made to feel ashamed to ask questions. It is the sort of environment that I hope to foster in my own digital projects lab someday.

Next, he gave me some “grunt work” (which I don’t mind doing since I tend to be meticulous!) of downloading the OCR text and JPEGs that the Internet Archive produced for us. I completed 55 years today and nearly every year had four folders within it. The work was as follows:

  • Right click on the magazine link and open it in a new tab.
  • Click on “Access this file” to check to make sure that the year in the title matched the actual files.
  • Click on the full URL of the object, then on a link that takes you to a list of all files that make up the object.
  • In the download folder, make a folder for each year and then a folder for each object.
  • Most magazines were by month, so to make it easy to organize them, I used a two digit code which corresponds to the month’s order. For example, October is 10 so the folder name would be “10 October.”
  • Right click and save the OCR text and the zipped images to the object’s specific folder.
  • Repeat.
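
The month-folder naming above is easy to get wrong by hand, so here is a minimal sketch of the convention. The function name `month_folder_name` is hypothetical; the format (two-digit month number, a space, then the month name) comes straight from the “10 October” example. The zero padding is what keeps the folders in calendar order when sorted by name.

```python
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def month_folder_name(month):
    """Return a folder name like "10 October" for month number 1-12."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # Zero-pad so that "03 March" sorts before "10 October".
    return f"{month:02d} {MONTH_NAMES[month - 1]}"


print(month_folder_name(10))  # prints "10 October"
```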

Digitization Internship II Day 1

I did quality control on 2610 entries today. That’s not bad for a first day!

The Digital Projects lab has been working on a women veterans project for years. My job yesterday was to go through thousands of entries and edit the titles to conform to a set pattern. The main challenge was that I am unfamiliar with World War II and military terminology, so I was not always sure if something was a proper noun or not. I made a mistake at one point and then had to re-sort the data in order to find the entries that I needed to correct. However, that process only took half an hour.

In praise of an offline life

From a post I left in response to a classmate on our class’ Blackboard discussion board:

I realized that I’m suffering musical ADHD from having such easy access to music online (thanks, Pandora and YouTube!). My parents bought me my first tape player in 1996 and gave me only one tape (Deana Carter’s first album–which is amazing!). I nearly broke it from flipping it over and over, for hours, days, months on end. I loved sticking my finger into the little jagged hole and winding the tape back. It was an experience all its own. Tangible, real.

Now, I have access to an unimaginable array of music for every mood, for every earworm, so much that I’ll never make my way through every single by every artist I’m interested in. And you know what? I’m not any happier for it.

With regard to paintings, the Mona Lisa is a great example, especially when you realize that the painting is TINY! People assume when they see it on TV that it’s like... 3×5″ or something. And you can’t replace the feeling of “holy crap” when you go to the Smithsonian and you see these paintings which are wider and taller than your house. You stare in slack-jawed wonder as you try to figure out how the painter did them and, even more puzzling, how they were moved around!

You lose these very essential parts of experiencing the world when you do it only online. It’s no more the “real version” than looking at someone else’s vacation photos and trying to pretend that you were there and these photos are your memories.

Objects are more than just the image or the content that can be shared online. They are also the experience.