At least twice in the last month or so, I have found myself transcribing an object that contains writing in a language other than English. Both times I was told that the best way to find out how to handle the foreign language text would be to find an earlier object with such text on it and look at the BAD file for that object. Laocoön has become my go-to source whenever I need a precedent for transcribing foreign language text.


As can be seen in the image above, Laocoön contains just about everything that has ever existed, including writing in several different languages.


So according to this precedent, the process goes something like this:

1) Transcribe the foreign text as it is, presumably copying and pasting from a site that allows you to type non-Latin letters if the language uses a non-Latin alphabet.
2) Within the <l></l> in which the text appears, add a <note>.
3) Within the <note></note>, add <foreign lang="[name of language]"></foreign>.

Within the <foreign></foreign> tags, provide the following information:

(a) What the foreign word or words translate to in English
(b) The direction the language reads, if not apparent (e.g., right to left, left to right, etc.)
(c) Any notable information about the case, size, shape, and orientation of any of the individual letters
(d) Technical instructions for displaying the letters in a browser (if the language uses a non-Latin alphabet)
(e) Any other information a viewer of the object in question may need or want to know
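
To make the steps concrete, here is a rough sketch of what a marked-up line might look like. The Greek word and the wording of the note are invented for illustration and are not copied from the Laocoön BAD file. The exact placement of the <note> inside the <l> and the form of the lang value are my assumptions, so check them against the BAD file before relying on them.

```xml
<!-- Hypothetical example line; not taken from any actual BAD file. -->
<l>λόγος
  <note>
    <!-- lang holds the name of the language, following the precedent described above -->
    <foreign lang="Greek">Translates as "word" or "reason." Reads left to right. All letters are lowercase and upright. Displaying the text requires a font that includes Greek characters.</foreign>
  </note>
</l>
```

The text inside <foreign> simply runs through items (a) to (d) in order. Anything else a viewer might want to know, item (e), would be added at the end.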

Given how often foreign language text appears in some objects, I’m wondering why we continue to handle it simply by precedent. I’m inclined to agree with Laura Whitebell that we ought to add <foreign> to the tagset!

A topic that was broached at the BAND meeting this week was the eventual updating of our tagset to accommodate the changes we’ve made to the way we transcribe since the version currently up on the WIP site was composed. It will be a long-term project for one or two people, but if we’re going to start by compiling a list of tags that should be added when this happens, I think <foreign> should be among them.

Until the tagset is actually updated, I think this blog entry could be a good place for people to go for direction on how to handle foreign language text, considering it’s a lot quicker to access the blog than it is to dig up the Laocoön BAD file.