pycantonese.pos_tagging.hkcancor_to_ud

pycantonese.pos_tagging.hkcancor_to_ud(tag: str = None)

Map a part-of-speech tag from HKCanCor to Universal Dependencies.

HKCanCor uses a part-of-speech tagset of over 100 tags (46 of which are described at http://compling.hss.ntu.edu.sg/hkcancor/). For applications that would benefit from a less granular tagset (e.g., cross-linguistic natural language processing tasks), this function maps the HKCanCor tagset to the Universal Dependencies v2 tagset of 17 tags (https://universaldependencies.org/u/pos/index.html).

Any unrecognized tag is mapped to "X".

New in version 3.1.0.

Parameters
tag : str, optional

A tag from the original HKCanCor annotated data. If not provided or None, this function returns the entire dictionary of the tagset mapping from HKCanCor to UD.

Returns
str or dict[str, str]

A tag from the Universal Dependencies v2 tagset, or a dictionary mapping HKCanCor tags to UD tags if no tag is given.

Examples

>>> hkcancor_to_ud("V")
'VERB'
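
The three documented behaviours (known tag, unrecognized tag, no argument) can be sketched without pycantonese installed. This is a minimal stand-in, not the library's implementation: the name `sketch_hkcancor_to_ud` and the toy mapping are hypothetical, and only the "V" to "VERB" entry comes from this page.

```python
from typing import Dict, Optional, Union

# Illustrative subset only. "V" -> "VERB" is taken from the example above;
# the "N" entry is an assumption and is not copied from pycantonese.
_TOY_MAP: Dict[str, str] = {"V": "VERB", "N": "NOUN"}


def sketch_hkcancor_to_ud(
    tag: Optional[str] = None,
) -> Union[str, Dict[str, str]]:
    """Hypothetical stand-in mirroring the behaviour documented above."""
    if tag is None:
        # No tag given: return the whole mapping (a copy, by choice of this sketch).
        return dict(_TOY_MAP)
    # Unrecognized tags fall back to "X".
    return _TOY_MAP.get(tag, "X")


print(sketch_hkcancor_to_ud("V"))         # known tag
print(sketch_hkcancor_to_ud("not-a-tag")) # unrecognized tag
print(sketch_hkcancor_to_ud())            # no argument: the whole toy mapping
```

With the real function, the same three calls would be `hkcancor_to_ud("V")`, `hkcancor_to_ud("not-a-tag")`, and `hkcancor_to_ud()`. The dictionary it returns covers the full HKCanCor tagset, not this toy subset.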