I think the toughest thing about building a corpus is that if you are working alone, you are the one who has to input all of the data. For what I want to do, this stage actually takes the most time.
For the project that I presented at the Temple University Japan Colloquium 2009, the whole process involves entering data by hand, typing each essay one by one. Luckily, junior high L2 writing is short, so to cut down on typing I look for phrases that recur across the essays and collect them in a separate Word document. Then I press Ctrl+C twice (holding Ctrl and hitting C twice) to open the Office Clipboard and store the common phrases there. How well this works depends on how similar the texts are. When I enter a new essay, I paste in the stored phrases along with their annotation information, and then add or delete phrases to match what the student actually wrote. This relieves some of the labor of inputting data, and my arms can take a break.
If anyone knows of a better way to do this in Word, or maybe Excel, than the method above, I would be happy to hear from you.
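One possibility outside Word or Excel, offered only as a rough sketch: once a few essays have been typed, a short Python script could find the recurring phrases instead of spotting them by eye. The function name `common_phrases`, the three-word phrase length, and the sample essays are all made up for illustration. They are not part of the project described above.

```python
from collections import Counter


def common_phrases(essays, n=3, min_essays=2):
    """Return (phrase, essay_count) pairs for word n-grams that
    appear in at least `min_essays` different essays."""
    counts = Counter()
    for essay in essays:
        words = essay.lower().split()
        # A set makes each phrase count once per essay, however often it repeats.
        ngrams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(ngrams)
    return [(p, c) for p, c in counts.most_common() if c >= min_essays]


# Hypothetical sample essays, for illustration only.
essays = [
    "I like to play soccer with my friends",
    "I like to play tennis after school",
    "My hobby is reading and I like to read comics",
]

for phrase, count in common_phrases(essays):
    print(count, phrase)
```

The phrases it lists could then be saved as the ready-made snippets to paste in. Splitting on spaces is a simplification: punctuation stays attached to words, so real essays would probably need a little cleanup first.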