TPRC 2024 in beautiful Las Vegas, Nevada! June 25-27th.
Wednesday June 26, 2024 10:30am - 11:20am PDT
This paper presents the tree editor TrEd and related tools that can be used to create, modify, browse, and search treebanks, i.e. large language corpora annotated with syntactic and/or semantic structure information. The annotation may cover not only phrase structure or dependencies, but also co-reference, discourse analysis, and even inter-sentence relations.

The project started in 2000 and has been in continuous use since then at various institutions all over the world. Most of the tools are written in Perl, which makes them available on all major operating systems.

As a by-product of treebank creation, various aspects of the annotation process have been studied, e.g. inter-annotator agreement and the influence of automatic pre-annotation on the speed and accuracy of annotation.

For searching the treebanks, a query language was developed that describes sets of tree nodes and the relations between them. It also supports aggregation to produce quantitative outputs. There are two different implementations: one translates the queries into SQL statements, and the other searches the data directly in the editor.
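To make the idea of querying "sets of tree nodes and the relations between them" concrete, here is a minimal Python sketch. It is not the query language described in the talk, nor either of its implementations; the `Node` class, the `select` function, and the dependency labels are hypothetical names chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based equality avoids recursing through parent/children
class Node:
    form: str
    deprel: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it, so trees can be built inline."""
        child.parent = self
        self.children.append(child)
        return child


def descendants(node):
    """Yield all nodes below `node` in depth-first order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def select(root, node_test, relation="child", related_test=lambda n: True):
    """Yield (node, related) pairs where `related` stands in `relation` to `node`."""
    step = {"child": lambda n: iter(n.children), "descendant": descendants}[relation]
    for node in [root, *descendants(root)]:
        if node_test(node):
            for other in step(node):
                if related_test(other):
                    yield node, other


# A toy dependency tree for "The dog chased a cat" (labels are illustrative).
root = Node("chased", "root")
dog = root.add(Node("dog", "nsubj"))
cat = root.add(Node("cat", "obj"))
dog.add(Node("The", "det"))
cat.add(Node("a", "det"))

# Query: every determiner anywhere below the root.
pairs = list(select(root, lambda n: n.deprel == "root", "descendant",
                    lambda n: n.deprel == "det"))

# Aggregation: how often each relation label occurs on a parent-child edge.
counts = Counter(child.deprel for _, child in select(root, lambda n: True))
```

Here `pairs` holds the matched node pairs and `counts` is the kind of quantitative summary that aggregation is meant to provide. A real query engine would additionally handle multiple trees, richer relations, and attribute comparisons.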

Originally, TrEd supported only the PML data format used for the Prague Dependency Treebank. To process data in a different format, one first had to convert the data into PML and then, if needed, convert the modified data back to the initial format. Later, a versatile extension system was added to TrEd, which made it possible to support other data formats directly.

We will show how this works using the example of Universal Dependencies (UD), a framework for grammar annotation across different human languages. It has over 500 contributors, who have so far produced more than 200 treebanks in over 100 languages. The extension allows TrEd (and some other tools) to open files in the original UD format natively, building the internal representation on the fly, and to serialize them back after editing.
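The read, edit, and write-back cycle can be illustrated with a short Python sketch. It is not the TrEd extension itself (the abstract says most of the tools are written in Perl) and it does not produce PML; it only assumes the standard UD file format, CoNLL-U, with ten tab-separated columns per token, comment lines starting with `#`, and a blank line after each sentence. The function names `parse_conllu`, `children_by_head`, and `serialize_conllu` are hypothetical.

```python
COLUMNS = ["id", "form", "lemma", "upos", "xpos",
           "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Split CoNLL-U text into sentences, each with comment lines and token rows."""
    sentences, comments, tokens = [], [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if comments or tokens:
                sentences.append({"comments": comments, "tokens": tokens})
                comments, tokens = [], []
        elif line.startswith("#"):
            comments.append(line)
        else:
            tokens.append(dict(zip(COLUMNS, line.split("\t"))))
    if comments or tokens:  # input without a trailing blank line
        sentences.append({"comments": comments, "tokens": tokens})
    return sentences


def children_by_head(sentence):
    """Build a simple in-memory tree: head id -> list of dependent ids."""
    tree = {}
    for tok in sentence["tokens"]:
        if tok["id"].isdigit():  # skip multiword ranges (1-2) and empty nodes (1.1)
            tree.setdefault(tok["head"], []).append(tok["id"])
    return tree


def serialize_conllu(sentences):
    """Write sentences back out in the same column order."""
    blocks = []
    for sent in sentences:
        rows = ["\t".join(tok[c] for c in COLUMNS) for tok in sent["tokens"]]
        blocks.append("\n".join(sent["comments"] + rows) + "\n\n")
    return "".join(blocks)


SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
]) + "\n\n"

sentences = parse_conllu(SAMPLE)
tree = children_by_head(sentences[0])
round_trip = serialize_conllu(sentences)
```

In this sketch an unedited file round-trips unchanged, which is the property one would want from native format support: only the fields an annotator actually touches should differ after saving. The sketch assumes well-formed rows with exactly ten columns.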
Speakers
Jan Štěpánek

Researcher, Charles University
3: Apollo 1-2