Help for EOPAS: Frequently asked questions

How to cite a phrase, word or morpheme?

On the display page for a transcript, the URL of the current context is shown at the top right. You can copy and paste this URL to link directly to the current context.
Two types of context are tracked.

Phrase citation

The first is the context of the currently active phrase. Such a URL ends with a fragment like #t=202.028,305.008.
You will notice that this URL carries a fragment starting with "#t=". It points to a time range in the video or audio file presented on the page: here the segment starts at 202.028 seconds and ends at 305.008 seconds.
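To make the "#t=" fragment concrete, here is a minimal Python sketch that reads the start and end times out of a phrase citation URL. It assumes the fragment always has the form "t=start,end" with both times in plain seconds, as in the example above. That shape resembles the temporal dimension of the W3C Media Fragments URI specification, but the sketch does not handle the other forms that specification permits. The function name and the example host are hypothetical.

```python
from urllib.parse import urlsplit


def parse_time_fragment(url: str) -> tuple[float, float]:
    """Return (start, end) in seconds from a URL ending in '#t=start,end'."""
    fragment = urlsplit(url).fragment  # the text after '#', e.g. 't=202.028,305.008'
    if not fragment.startswith("t="):
        raise ValueError(f"no '#t=' time fragment in {url!r}")
    start_text, sep, end_text = fragment[2:].partition(",")
    if not sep:
        raise ValueError(f"expected 't=start,end', got {fragment!r}")
    start, end = float(start_text), float(end_text)
    if end < start:
        raise ValueError(f"time range ends before it starts: {fragment!r}")
    return start, end


# The host and path are made up; only the fragment follows the FAQ's example.
print(parse_time_fragment("https://eopas.example.org/transcripts/1#t=202.028,305.008"))
# → (202.028, 305.008)
```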
As the video or audio element plays back through the annotation phrases, the URL at the top right is updated. You can pause playback and copy the URL for citation purposes. Alternatively, you can right-click the small black "play" button of a particular phrase and copy the link address.
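The URL tracking described above amounts to finding the phrase that contains the current playback time and rewriting the fragment. A minimal Python sketch, assuming phrases are sorted by start time, do not overlap, and should be formatted with three decimals as in the example URL (the Phrase record and function name are hypothetical, not EOPAS code):

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phrase:
    start: float  # seconds
    end: float    # seconds


def active_phrase_fragment(phrases: list[Phrase], now: float) -> Optional[str]:
    """Return '#t=start,end' for the phrase playing at `now`, or None in a gap.

    Assumes `phrases` is sorted by start time and phrases do not overlap.
    """
    starts = [p.start for p in phrases]
    i = bisect_right(starts, now) - 1  # last phrase that starts at or before `now`
    if i < 0 or now >= phrases[i].end:
        return None  # before the first phrase, or between two phrases
    p = phrases[i]
    return f"#t={p.start:.3f},{p.end:.3f}"


phrases = [Phrase(0.0, 4.5), Phrase(4.5, 9.0), Phrase(202.028, 305.008)]
print(active_phrase_fragment(phrases, 250.0))  # → #t=202.028,305.008
```

A binary search keeps each lookup logarithmic in the number of phrases, which matters little for one transcript but costs nothing extra compared with a linear scan.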

Word/Morpheme citation

The second is the context of a single word or morpheme. This context is closely tied to concordance searches: clicking on a word or morpheme searches for that same entity in all documents of the collection.
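One way to picture the concordance search is as an index from each word or morpheme form to every position where it occurs. The Python sketch below assumes a hypothetical in-memory model (document id, then phrases, then words as (form, morphemes) pairs) and 1-based positions; it is not the EOPAS data model or search implementation. Words and morphemes are indexed under separate keys so that clicking a word only finds words and clicking a morpheme only finds morphemes.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical model, not the EOPAS schema: each word is (form, [morpheme forms]).
Word = tuple[str, list[str]]
Corpus = dict[str, list[list[Word]]]          # document id -> phrases -> words
Key = tuple[str, str]                         # ("word" | "morpheme", form)
Hit = tuple[str, int, int, Optional[int]]     # (document, phrase, word, morpheme)


def build_concordance(corpus: Corpus) -> dict[Key, list[Hit]]:
    """Map every word form and morpheme form to all positions where it occurs."""
    index = defaultdict(list)
    for doc_id, phrases in corpus.items():
        for p, phrase in enumerate(phrases, start=1):
            for w, (form, morphemes) in enumerate(phrase, start=1):
                index[("word", form)].append((doc_id, p, w, None))
                for m, morpheme in enumerate(morphemes, start=1):
                    index[("morpheme", morpheme)].append((doc_id, p, w, m))
    return dict(index)


def concordance(index: dict[Key, list[Hit]], kind: str, form: str) -> list[Hit]:
    """Positions of a clicked word or morpheme across all documents."""
    return index.get((kind, form), [])


# Toy forms only.
corpus = {"doc1": [[("ab", ["a", "b"]), ("a", ["a"])]], "doc2": [[("b", ["b"])]]}
print(concordance(build_concordance(corpus), "morpheme", "b"))
```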
When you click on a word or morpheme, a URL ending with a fragment like #!/p2/w5/m1 is displayed in the top right corner.
In this pattern, "#!" marks a concordance link, "/p" identifies the phrase by number, "/w" identifies the word within that phrase by position, and "/m" identifies (where necessary) the morpheme within that word by position.
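The "#!/p…/w…/m…" pattern can be read and written with a small regular expression. This Python sketch assumes that "/p" and "/w" are always present and "/m" is optional, as the description above suggests; it does not assume whether positions start at 0 or 1. The Citation type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# '#!' then /p<phrase>/w<word>, optionally followed by /m<morpheme>, at the end of the URL.
_CONCORDANCE = re.compile(r"#!/p(\d+)/w(\d+)(?:/m(\d+))?$")


class Citation(NamedTuple):
    phrase: int
    word: int
    morpheme: Optional[int]  # None when the link points at a whole word


def parse_concordance_link(url: str) -> Citation:
    match = _CONCORDANCE.search(url)
    if match is None:
        raise ValueError(f"no '#!/p.../w...' fragment in {url!r}")
    p, w, m = match.groups()
    return Citation(int(p), int(w), int(m) if m is not None else None)


def format_concordance_link(c: Citation) -> str:
    suffix = f"/m{c.morpheme}" if c.morpheme is not None else ""
    return f"#!/p{c.phrase}/w{c.word}{suffix}"


# Hypothetical host and path; the fragment matches the FAQ's example.
print(parse_concordance_link("https://eopas.example.org/transcripts/1#!/p2/w5/m1"))
```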

Where are the XML schemas of the formats?

Where are the XSL Transforms to convert between formats?

What format does a Toolbox input file have to be for import?

Some Toolbox files use camel-case element and attribute names; others prefix every element with a "tb:" namespace. These differences are removed by a clean-up script called fixToolbox.xsl.
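The clean-up idea can be illustrated outside XSLT. The Python sketch below is not fixToolbox.xsl; it assumes the normalized form is plain lower-case names with no namespace, which may differ from what the real script produces, so check fixToolbox.xsl for the actual rules. The element names in the example are made up.

```python
import xml.etree.ElementTree as ET


def _normalize_name(name: str) -> str:
    """Drop any '{namespace-uri}' part and lower-case the rest."""
    # ElementTree reports a declared 'tb:' prefix as '{namespace-uri}localname'.
    if name.startswith("{"):
        name = name.split("}", 1)[1]
    return name.lower()


def normalize_toolbox(xml_text: str) -> ET.Element:
    """Parse Toolbox XML and return it with namespace-free, lower-case names."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = _normalize_name(elem.tag)
        attributes = list(elem.attrib.items())
        elem.attrib.clear()
        for key, value in attributes:
            elem.set(_normalize_name(key), value)
    return root


tree = normalize_toolbox(
    '<tb:database xmlns:tb="urn:example:tb"><tb:itmGroup tb:refId="1"/></tb:database>'
)
print(ET.tostring(tree, encoding="unicode"))
```

Both variants mentioned above (camel-case names and "tb:"-prefixed names) end up with the same element and attribute names, so a single mapping step can follow.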
The following mapping of Toolbox elements to EOPAS is applied:

What format does a Transcriber input file have to be for import?

EOPAS has no notion of speaker turns, so turns are removed and the speaker information is moved onto each phrase instead. Topic information and sections are also removed.
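The flattening described above can be sketched in Python. This is not the EOPAS XSLT; it assumes the usual Transcriber .trs layout (Speaker entries, Section and Turn elements, and empty Sync elements with the phrase text following each Sync). Each phrase runs from its Sync to the next Sync, or to the end of the turn, and carries the turn's speaker name; Section and Topic wrappers are ignored. The Phrase record is a hypothetical stand-in for an EOPAS phrase, and text that follows an Event or Comment element inside a turn is not collected by this sketch.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Phrase:  # hypothetical stand-in for an EOPAS phrase
    speaker: str
    start: float
    end: float
    text: str


def flatten_transcriber(trs_text: str) -> list[Phrase]:
    root = ET.fromstring(trs_text)
    # Speaker id -> display name (fall back to the id if no name is given).
    names = {s.get("id"): s.get("name", s.get("id")) for s in root.iter("Speaker")}
    phrases = []
    for turn in root.iter("Turn"):  # Sections and Topics are skipped, not copied
        speaker = names.get(turn.get("speaker", ""), "")
        turn_end = float(turn.get("endTime"))
        syncs = turn.findall("Sync")
        for i, sync in enumerate(syncs):
            start = float(sync.get("time"))
            end = float(syncs[i + 1].get("time")) if i + 1 < len(syncs) else turn_end
            text = (sync.tail or "").strip()  # phrase text follows the empty Sync
            if text:
                phrases.append(Phrase(speaker, start, end, text))
    return phrases
```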
The following mapping of Transcriber elements to EOPAS is applied:

What format does an Elan input file have to be for import?

Elan allows a vast range of tier-type combinations, and tier names are chosen by the author. Because the EOPAS file format supports only a limited structure, only three types of Elan files are supported.
Option 1: The following mapping of Elan elements with default-lt tiers is applied:
Option 2: The following mapping of Elan elements with utterance tiers is applied:
Option 3: The following mapping of Elan elements with ref tiers is applied:
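As a rough illustration of the kind of input behind option 1, the Python sketch below reads an Elan .eaf file and lists the time-aligned annotations on tiers whose LINGUISTIC_TYPE_REF is "default-lt", converting TIME_SLOT values from milliseconds to seconds. It does not reproduce the EOPAS mapping itself; the function name and the (tier, start, end, text) tuple layout are assumptions for this example, and unaligned time slots are simply skipped.

```python
import xml.etree.ElementTree as ET


def default_lt_phrases(eaf_text: str) -> list[tuple[str, float, float, str]]:
    """Return (tier id, start s, end s, text) for alignable annotations on default-lt tiers."""
    root = ET.fromstring(eaf_text)
    # TIME_VALUE is in milliseconds; unaligned slots carry no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE")) / 1000.0
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    phrases = []
    for tier in root.iter("TIER"):
        if tier.get("LINGUISTIC_TYPE_REF") != "default-lt":
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations whose slots are not time-aligned
            text = (ann.findtext("ANNOTATION_VALUE", default="") or "").strip()
            phrases.append((tier.get("TIER_ID"), start, end, text))
    phrases.sort(key=lambda p: p[1])  # order by start time
    return phrases
```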

What is involved in writing support for a new format?

EOPAS is based on the assumption that every import format is provided in XML and thus has an XML schema. To support a new input format, you have to provide an XML Schema for that format and place it in the "public/SCHEMAS" directory on the server. Next, create an XSL Transform that converts that XML format to the EOPAS XML format, and place the XSLT in the "public/XSLT" directory. Finally, add the format to the list of available input formats in "app/models/transcript.rb" so that it becomes available in the upload process.
For developing the XSLT, there are helper functions in the "bin" directory of the application for validation, transcoding and running XSL transforms in general. You should also add some example files to the "features/test_data" directory and add a test to "features/transcript.feature". Do not forget to finish by documenting the new format in "doc/TRANSCODING".
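The steps above can be turned into a quick checklist. The Python sketch below walks an application checkout and reports which pieces still look missing for a format. The file names "<format>.xsd" and "<format>.xsl", and the idea that example files in "features/test_data" contain the format name, are hypothetical conventions for this example; the checks on "app/models/transcript.rb", "features/transcript.feature" and "doc/TRANSCODING" only test whether the format name appears in the file, which is a crude heuristic.

```python
from pathlib import Path


def _mentions(path: Path, text: str) -> bool:
    """True if `path` is an existing file whose contents include `text`."""
    return path.is_file() and text in path.read_text(encoding="utf-8", errors="replace")


def missing_pieces(app_root: Path, fmt: str) -> list[str]:
    """Return the FAQ steps that look unfinished for input format `fmt`."""
    todo = []
    # Hypothetical naming convention: <fmt>.xsd and <fmt>.xsl.
    if not (app_root / "public/SCHEMAS" / f"{fmt}.xsd").is_file():
        todo.append(f"public/SCHEMAS/{fmt}.xsd")
    if not (app_root / "public/XSLT" / f"{fmt}.xsl").is_file():
        todo.append(f"public/XSLT/{fmt}.xsl")
    if not _mentions(app_root / "app/models/transcript.rb", fmt):
        todo.append("app/models/transcript.rb: add format to input list")
    test_data = app_root / "features/test_data"
    if not (test_data.is_dir() and any(fmt in p.name for p in test_data.iterdir())):
        todo.append(f"features/test_data: example {fmt} files")
    if not _mentions(app_root / "features/transcript.feature", fmt):
        todo.append("features/transcript.feature: add a test")
    if not _mentions(app_root / "doc/TRANSCODING", fmt):
        todo.append("doc/TRANSCODING: document the format")
    return todo
```

Running it from the application root, for example `missing_pieces(Path("."), "myformat")`, lists whatever is left to do; an empty list means every check passed, not that the XSLT is correct.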