The Prague Dependency Treebank 3.5 is a 2018 edition of the core Prague Dependency Treebank (PDT). It contains all annotation of the original PDT texts created at the Institute of Formal and Applied Linguistics under various projects between 1996 and 2018. Other members of the Prague Dependency Treebank "family" are available separately and described elsewhere; to find them, search for "Prague Dependency Treebank" in the LINDAT/CLARIN repository.
The Prague Dependency Treebank 3.5 contains the same texts as all previous versions since 2.0: 49,431 sentences (over 800 thousand nodes) are annotated on all layers, from the tectogrammatical layer down to the word layer, and further sentences are annotated on the analytical (surface dependency syntax) and morphological layers (approx. 1.8 million words in total).
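PDT data are distributed as XML files in the Prague Markup Language (PML), with separate files for each annotation layer. As a minimal sketch of how the morphological layer might be read with Python's standard library, the code below parses a hypothetical, heavily simplified m-layer snippet built from the first two words of the Czech example sentence (Sarančata jsou). The namespace URI, the element names (`s`, `m`, `form`, `lemma`, `tag`), the IDs, and the sample lemmas and tags are assumptions for illustration only; check them against the PML schemas and documentation distributed with the data before relying on this.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: PML namespace URI; verify against the schemas shipped with the data.
PML_NS = {"pml": "http://ufal.mff.cuni.cz/pdt/pml/"}

# HYPOTHETICAL, heavily simplified m-layer snippet (real files carry more elements).
SAMPLE_M_FILE = """<mdata xmlns="http://ufal.mff.cuni.cz/pdt/pml/">
  <s id="m-sample-p1s1">
    <m id="m-sample-p1s1w1">
      <form>Sarančata</form>
      <lemma>saranče</lemma>
      <tag>NNNP1-----A----</tag>
    </m>
    <m id="m-sample-p1s1w2">
      <form>jsou</form>
      <lemma>být</lemma>
      <tag>VB-P---3P-AA---</tag>
    </m>
  </s>
</mdata>
"""


def read_m_layer(xml_text):
    """Return a list of sentences; each sentence is a list of (form, lemma, tag) tuples."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.findall("pml:s", PML_NS):
        tokens = []
        for m in s.findall("pml:m", PML_NS):
            tokens.append((
                m.findtext("pml:form", default="", namespaces=PML_NS),
                m.findtext("pml:lemma", default="", namespaces=PML_NS),
                m.findtext("pml:tag", default="", namespaces=PML_NS),
            ))
        sentences.append(tokens)
    return sentences


if __name__ == "__main__":
    for sentence in read_m_layer(SAMPLE_M_FILE):
        # The first character of a PDT positional tag encodes the part of speech.
        print([(form, tag[0]) for form, _lemma, tag in sentence])
```

The sketch reads `tag[0]` because the first position of a PDT positional tag is the part of speech (N for noun, V for verb); the remaining positions encode further morphological categories.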
For more information about this version of the treebank, see the changelog and download link below, and the menu tabs above for a description of the data, the available documentation, credits, and acknowledgements of support.
Sarančata jsou doposud ve stadiu larev a pohybují se pouze lezením. V tomto období je účinné bojovat proti nim chemickými postřiky, ale dožívající družstva ani soukromí rolníci nemají na jejich nákup potřebné prostředky.
Example sentences from PDT 3.5, with tectogrammatical annotation including coreference links (blue and brown arrows), multiword expressions (MWEs, red stripes) and discourse annotation (orange arrows and attributes/labels). English translation: The grasshoppers are still in the larval stage and move only by crawling. At this time, it is effective to fight them with chemical sprays, but neither the ailing cooperatives nor private farmers have the means to buy them.
The first version of PDT was published by the Linguistic Data Consortium (LDC) in 2001. Since then, various branches of PDT have been developed, adding more annotation. Most importantly, PDT 2.0 added the tectogrammatical layer, which distinguishes the PDT family of treebanks from most other available dependency treebanks. As of January 2018, PDT 3.5 is the current version, encompassing all previous versions, corrections, and additional annotation. The history of the PDT editions is briefly listed below.
To download the data, please visit the PDT 3.5 item in the LINDAT/CLARIN repository.
To search the treebank, please use the PML-TQ (PML Tree Query) service at LINDAT/CLARIN. Please note that this service searches PDT 3.0; except for the discourse annotation added later in PDiT 2.0, the data are identical. (PDT 3.5 in PML-TQ is coming soon.)
To properly acknowledge this resource, please cite the following data item in the LINDAT/CLARIN repository:
For LREC papers (which list language resources in a separate reference section):
@languageresource{lrPDT35,
title = {Prague Dependency Treebank 3.5},
author = {Haji\v{c}, Jan and Bej\v{c}ek, Eduard and B\'{e}mov\'{a}, Alevtina
and Bur\'{a}\v{n}ov\'{a}, Eva and Haji\v{c}ov\'{a}, Eva and Havelka, Ji\v{r}\'{\i}
and Homola, Petr and K\'{a}rn\'{\i}k, Ji\v{r}\'{\i} and Kettnerov\'{a}, V\'{a}clava
and Klyueva, Natalia and Kol\'{a}\v{r}ov\'{a}, Veronika and Ku\v{c}ov\'{a}, Lucie
and Lopatkov\'{a}, Mark\'{e}ta and Mikulov\'{a}, Marie and M\'{\i}rovsk\'{y}, Ji\v{r}\'{\i}
and Nedoluzhko, Anna and Pajas, Petr and Panevov\'{a}, Jarmila
and Pol\'{a}kov\'{a}, Lucie and Rysov\'{a}, Magdal\'{e}na and Sgall, Petr
and Spoustov\'{a}, Johanka and Stra\v{n}\'{a}k, Pavel and Synkov\'{a}, Pavl\'{\i}na
and {\v{S}}ev\v{c}\'{\i}kov\'{a}, Magda and {\v{S}}t\v{e}p\'{a}nek, Jan and Ure\v{s}ov\'{a}, Zde\v{n}ka
and Vidov\'{a} Hladk\'{a}, Barbora and Zeman, Daniel and Zik\'{a}nov\'{a}, {\v{S}}\'{a}rka
and {\v{Z}}abokrtsk\'{y}, Zden\v{e}k},
url = {http://hdl.handle.net/11234/1-2621},
publisher={Institute of Formal and Applied Linguistics, LINDAT/CLARIN, Charles University},
address={Prague, Czech Republic},
lindat={http://hdl.handle.net/11234/1-2621},
year = {2018} }
For general papers and citations:
@misc{11234/1-2621,
title = {Prague Dependency Treebank 3.5},
author = {Haji\v{c}, Jan and Bej\v{c}ek, Eduard and B\'{e}mov\'{a}, Alevtina
and Bur\'{a}\v{n}ov\'{a}, Eva and Haji\v{c}ov\'{a}, Eva and Havelka, Ji\v{r}\'{\i}
and Homola, Petr and K\'{a}rn\'{\i}k, Ji\v{r}\'{\i} and Kettnerov\'{a}, V\'{a}clava
and Klyueva, Natalia and Kol\'{a}\v{r}ov\'{a}, Veronika and Ku\v{c}ov\'{a}, Lucie
and Lopatkov\'{a}, Mark\'{e}ta and Mikulov\'{a}, Marie and M\'{\i}rovsk\'{y}, Ji\v{r}\'{\i}
and Nedoluzhko, Anna and Pajas, Petr and Panevov\'{a}, Jarmila
and Pol\'{a}kov\'{a}, Lucie and Rysov\'{a}, Magdal\'{e}na and Sgall, Petr
and Spoustov\'{a}, Johanka and Stra\v{n}\'{a}k, Pavel and Synkov\'{a}, Pavl\'{\i}na
and {\v{S}}ev\v{c}\'{\i}kov\'{a}, Magda and {\v{S}}t\v{e}p\'{a}nek, Jan and Ure\v{s}ov\'{a}, Zde\v{n}ka
and Vidov\'{a} Hladk\'{a}, Barbora and Zeman, Daniel and Zik\'{a}nov\'{a}, {\v{S}}\'{a}rka
and {\v{Z}}abokrtsk\'{y}, Zden\v{e}k},
url = {http://hdl.handle.net/11234/1-2621},
note = {{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({\'{U}FAL}), Faculty of Mathematics and Physics, Charles University},
copyright = {Creative Commons - Attribution-{NonCommercial}-{ShareAlike} 4.0 International ({CC} {BY}-{NC}-{SA} 4.0)},
year = {2018} }
For plain-text references:
(Hajič et al., 2018)
Hajič, J., Bejček, E., Bémová, A., Buráňová, E., Hajičová, E., Havelka, J., Homola, P., Kárník, J., Kettnerová, V., Klyueva, N., Kolářová, V., Kučová, L., Lopatková, M., Mikulová, M., Mírovský, J., Nedoluzhko, A., Pajas, P., Panevová, J., Poláková, L., Rysová, M., Sgall, P., Spoustová, J., Straňák, P., Synková, P., Ševčíková, M., Štěpánek, J., Urešová, Z., Vidová Hladká, B., Zeman, D., Zikánová, Š. and Žabokrtský, Z. (2018). Prague Dependency Treebank 3.5. Institute of Formal and Applied Linguistics, LINDAT/CLARIN, Charles University, LINDAT/CLARIN PID: http://hdl.handle.net/11234/1-2621.
For footnote references, the following is sufficient in LaTeX papers:
\url{http://hdl.handle.net/11234/1-2621}