Text manipulation to find all the citations in a document

Update: There is an easier way to do this here.

I'm thankful to have learned text manipulation in my digital humanities course. I lost track of all the citations in my thesis, so I wrote a little algorithm to export all of them (and the few characters before each citation) into a new document. Nerd skills FTW!

There’s probably a better way to do this, but a quick and dirty trick was all I needed to make sure I had my citations.

1) First, copy your document and paste it into a plain-text editor (not Word). Save the file as “data.txt”.
2) Create a new, empty file called “output.txt”.
3) In data.txt, find and replace every ( with <citation> and every ) with </citation>
4) Open Terminal on your Mac.
5) Paste this line into the terminal (don’t hit enter yet):

grep -E -o ".{0,10}<citation>.{0,50}"

6) Once you have pasted that line in, drag and drop data.txt from its folder to the end of the line, making sure there is a space before the file path. (Don’t hit enter yet.)

It should now look something like this:

grep -E -o ".{0,10}<citation>.{0,50}" …/…/data.txt

7) Add a > sign at the end of that line, then drag and drop output.txt from its folder to the end of the line. It should now look something like this:

grep -E -o ".{0,10}<citation>.{0,50}" …/…/data.txt > …/…/output.txt

8) Hit enter.

9) Now open output.txt. It should contain a bunch of citations. It will require some cleaning up because:
a) The command grabs up to 10 characters before each opening bracket and up to 50 characters after it. This is so that it catches cases where you wrote the author’s name followed by the year in round brackets (for example: Hinton (2003)).
b) Sometimes, if you have cited several people in a row (for example: (Czaykowska-Higgins 2009; Shulist 2013; Yamada 2014)), the 50-character window can cut the citation short. Go back to your original file and search your document for that line.
c) Obviously, it will pick up more than just citations, because every ‘(’ counts as a match. But rest assured that every ‘(’ in your document, with the text just before it, should show up in the output.txt document.
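
If you would rather not edit data.txt by hand, the find-and-replace and grep steps above can be sketched as one small shell function. This is only a rough sketch under a few assumptions: the name extract_citations and its two optional arguments are made up for illustration, and sed is standing in for the text editor's find and replace. The grep pattern is the same one used above, with the 10 and 50 turned into adjustable numbers.

```shell
#!/bin/sh
# Sketch only: extract_citations is a hypothetical helper name,
# not something that ships with grep or sed.
# Usage: extract_citations FILE [CHARS_BEFORE] [CHARS_AFTER]
extract_citations() {
    file=$1
    before=${2:-10}   # characters kept before each "(" (default 10)
    after=${3:-50}    # characters kept after each "(" (default 50)

    # sed tags every "(" and ")" the same way the manual find and
    # replace does, without modifying the file itself. grep -o then
    # prints only the matching part of each line.
    sed -e 's|(|<citation>|g' -e 's|)|</citation>|g' "$file" |
        grep -E -o ".{0,$before}<citation>.{0,$after}"
}
```

After pasting the function into the terminal, something like `extract_citations data.txt > output.txt` should give the same result as the steps above. For the long multi-author citations mentioned in (b), a wider window such as `extract_citations data.txt 10 120 > output.txt` may catch more of each citation.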
