pycantonese.pos_tagging.hkcancor_to_ud
- pycantonese.pos_tagging.hkcancor_to_ud(tag: Optional[str] = None)
Map a part-of-speech tag from HKCanCor to Universal Dependencies.
HKCanCor uses a part-of-speech tagset of over 100 tags (46 of which are described at http://compling.hss.ntu.edu.sg/hkcancor/). For applications that would benefit from a less granular tagset (e.g., cross-linguistic natural language processing tasks), this function maps the HKCanCor tagset to the 17-tag Universal Dependencies v2 tagset (https://universaldependencies.org/u/pos/index.html).
Any unrecognized tag is mapped to "X".
New in version 3.1.0.
- Parameters
- tag : str, optional
A tag from the original HKCanCor annotated data. If not provided or None, this function returns the entire dictionary of the tagset mapping from HKCanCor to UD.
- Returns
- str or dict[str, str]
A tag from the Universal Dependencies v2 tagset, or a dictionary from HKCanCor to UD tags if no input is given.
Examples
>>> from pycantonese.pos_tagging import hkcancor_to_ud
>>> hkcancor_to_ud("V")
'VERB'
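The three behaviours described above (a known tag, an unrecognized tag, and no argument) can be illustrated with a minimal stand-in that does not require pycantonese. This is a sketch under stated assumptions, not the library's implementation: the name sketch_hkcancor_to_ud and the three-entry mapping are hypothetical, and only the "V" to "VERB" pair is confirmed by the example above.

```python
from typing import Dict, Optional, Union

# Hypothetical three-entry subset, for illustration only. The real mapping
# covers the full HKCanCor tagset; only "V" -> "VERB" is confirmed above.
_SKETCH_MAP: Dict[str, str] = {"V": "VERB", "N": "NOUN", "A": "ADJ"}


def sketch_hkcancor_to_ud(tag: Optional[str] = None) -> Union[str, Dict[str, str]]:
    """Mimic the documented behaviour using the toy mapping above."""
    if tag is None:
        # No input: return the whole mapping (a copy, in this sketch).
        return dict(_SKETCH_MAP)
    # Unrecognized tags fall back to "X", as documented.
    return _SKETCH_MAP.get(tag, "X")


# Convert a sequence of tags, including one that is not in the mapping.
tags = ["V", "N", "NOT_A_TAG"]
ud_tags = [sketch_hkcancor_to_ud(t) for t in tags]
print(ud_tags)  # ['VERB', 'NOUN', 'X']
```

With pycantonese installed, the same list comprehension should work with hkcancor_to_ud in place of the stand-in, although the concrete output depends on the library's full mapping.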