Syntacticus: A treebank of early Indo-European languages

Syntacticus provides easy access to around a million morphosyntactically annotated tokens from a range of early Indo-European languages.
Syntacticus is an umbrella project for the PROIEL Treebank, the TOROT Treebank and the ISWOC Treebank, which all use the same annotation system and share similar linguistic priorities. In total, Syntacticus contains 80,138 sentences (936,874 tokens) in 10 languages.
We are constantly adding new material to Syntacticus. The ultimate goal is to have a representative sample of different text types from each branch of early Indo-European. We maintain lists of the texts we are currently working on, which you can find on the PROIEL Treebank and the TOROT Treebank pages. This is extremely time-consuming work, so please be patient!
The focus for Syntacticus at the moment is to consolidate and edit our documentation so that it is easier to approach. We are very aware that the current documentation is inadequate! New features and better integration with our development toolchain are also planned for the near future.
| Language | Size |
| --- | --- |
| Ancient Greek | 250,449 tokens |
| Latin | 202,140 tokens |
| Classical Armenian | 23,513 tokens |
| Gothic | 57,211 tokens |
| Portuguese | 36,595 tokens |
| Spanish | 54,661 tokens |
| Old English | 29,406 tokens |
| Old French | 2,340 tokens |
| Old Russian | 209,334 tokens |
| Old Church Slavonic | 71,225 tokens |
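
As a quick sanity check, the per-language sizes in the table can be summed and compared with the overall token count quoted earlier on this page. The sketch below is only an illustration: the dictionary and the `summarize` helper are hypothetical names introduced here, the figures are copied by hand from the table, and nothing in it uses an actual Syntacticus API or data file.

```python
# Token counts copied by hand from the table above (not fetched from Syntacticus).
TOKENS_BY_LANGUAGE = {
    "Ancient Greek": 250_449,
    "Latin": 202_140,
    "Classical Armenian": 23_513,
    "Gothic": 57_211,
    "Portuguese": 36_595,
    "Spanish": 54_661,
    "Old English": 29_406,
    "Old French": 2_340,
    "Old Russian": 209_334,
    "Old Church Slavonic": 71_225,
}


def summarize(counts):
    """Return the total token count and each language's share of that total."""
    total = sum(counts.values())
    shares = {language: n / total for language, n in counts.items()}
    return total, shares


total, shares = summarize(TOKENS_BY_LANGUAGE)
print(f"{len(TOKENS_BY_LANGUAGE)} languages, {total:,} tokens")

# List the languages from largest to smallest share of the treebank.
for language, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language:<20} {share:6.1%}")
```

The first line printed is `10 languages, 936,874 tokens`, which agrees with the total given in the text. The figures will drift as new material is added, so the hard-coded numbers should be treated as a snapshot.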