Just a few tricks I picked up while editing a large-ish (56,095-line) EAD file in Notepad++. I wanted to pass them along in case they’re helpful to anyone.
Personal name notes were added to the file level in a finding aid. Near the end of the project, it was determined that these would do better being somewhere else (for reasons we won’t get into here).
Removing these names through Archivists’ Toolkit would be a tedious job for our processors. It involves activating each file-level record, switching to the ‘Names & Subjects’ tab, deleting the name and saving the record. And, of course, AT would probably freeze a few times during the process, making sure everyone’s good and miserable.
I wanted to see if I could help.
Removing text within tags
Exporting the EAD from AT, I open it in Notepad++. Following the helpful answers from this Stack Overflow question, I use
to get the personal names, replace them with nothing, and
to find all corporate names, replacing them again with nothing.
Because I only want to do that for the lower levels of the finding aid, I select only the content within the <dsc></dsc> tags. Find the opening <dsc>. Start the selection (Edit -> Begin/End Select). Find the </dsc>. End the selection (Edit -> Begin/End Select again). Ensure the “In selection” option is checked in the Replace box, and the above steps will leave the collection-level access control names where they are.
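For anyone who would rather script this step than click through it, here is a rough Python sketch of the same idea: blank out the name elements, but only inside the <dsc> block. The two patterns are my own illustration, not the expressions from the Stack Overflow answers, and the function name and sample record are made up. It assumes the names are tagged <persname> and <corpname>, that the file has a single, un-nested <dsc>, and that name elements do not nest inside each other.

```python
import re

# Hypothetical patterns for illustration; not the expressions used in the post.
# Matches a whole <persname>...</persname> or <corpname>...</corpname> element.
NAME_TAGS = re.compile(r"<(persname|corpname)\b[^>]*>.*?</\1>", re.DOTALL)
# Matches the <dsc>...</dsc> block (assumes one un-nested <dsc> per file).
DSC_BLOCK = re.compile(r"<dsc\b.*?</dsc>", re.DOTALL)


def strip_names_in_dsc(ead_text: str) -> str:
    """Remove persname/corpname elements, but only inside <dsc>...</dsc>."""
    return DSC_BLOCK.sub(lambda m: NAME_TAGS.sub("", m.group(0)), ead_text)


# Made-up sample: one collection-level name, two file-level names.
sample = (
    "<ead><controlaccess><persname>Doe, Jane</persname></controlaccess>"
    "<dsc><c01><controlaccess><persname>Smith, John</persname>"
    "<corpname>Acme Co.</corpname></controlaccess></c01></dsc></ead>"
)
cleaned = strip_names_in_dsc(sample)
```

Like the find-and-replace, this only deletes the name elements themselves, so any <controlaccess> wrapper that held nothing else is left behind empty. Work on a copy of the export.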
I don’t, however, want to just throw away all the names (even though I’m not sure I want them all at the collection level). To save them, I step away from Notepad++ and go back to my browser.
Following this tutorial from a few years ago, I used the Scraper Chrome extension to select an added name. I modified the XPath as needed (see screenshot), and exported the full list of names (1,963) to a Google Spreadsheet.
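If a browser extension isn't an option, the same list could probably be pulled with a short script instead. This is a sketch using Python's standard library rather than the Scraper extension I actually used; it is run against the original export, before the names are removed. The function names and the sample record are invented for illustration, and it assumes the names sit in <persname> and <corpname> elements under a single <dsc>.

```python
import csv
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def collect_names(ead_xml: str, wanted=("persname", "corpname")):
    """Return (element, text) pairs for name elements found under <dsc>."""
    root = ET.fromstring(ead_xml)
    rows = []
    for elem in root.iter():
        if local_name(elem.tag) != "dsc":
            continue
        # Walk everything below this <dsc> in document order.
        for child in elem.iter():
            if local_name(child.tag) in wanted:
                text = "".join(child.itertext()).strip()
                rows.append((local_name(child.tag), text))
    return rows


def to_csv(rows) -> str:
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["element", "name"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample record using the EAD 2002 namespace.
sample = (
    '<ead xmlns="urn:isbn:1-931666-22-9">'
    "<controlaccess><persname>Doe, Jane</persname></controlaccess>"
    "<dsc><c01><controlaccess><persname>Smith, John</persname>"
    "<corpname>Acme Co.</corpname></controlaccess></c01></dsc></ead>"
)
rows = collect_names(sample)
csv_text = to_csv(rows)
```

The resulting CSV text can be saved to a file and imported into a spreadsheet. Matching on the local tag name means the script should work whether or not the export declares a namespace.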
I then delete the collection from Archivists’ Toolkit and import the modified EAD.
This 10-15 minute project saved a lot of work and hassle for the processors. We’re looking into ArchivesSpace, and are likely to start using it in production in the near(ish) future. Until then (at least) it’s often handy to have non-Archivists’ Toolkit ways of manipulating EAD data quickly.