pycantonese.pos_tagging.hkcancor_to_ud(tag: str = None)

Map a part-of-speech tag from HKCanCor to Universal Dependencies.

HKCanCor uses a part-of-speech tagset of over 100 tags, 46 of which are documented. For applications that would benefit from a less granular part-of-speech tagset (e.g., cross-linguistic natural language processing tasks), this function maps the HKCanCor tagset to the Universal Dependencies v2 tagset, which has 17 tags.

Any unrecognized tag is mapped to "X".

New in version 3.1.0.

tag : str, optional

A tag from the original HKCanCor annotated data. If not provided or None, this function returns the entire dictionary of the tagset mapping from HKCanCor to UD.

str or dict[str, str]

A tag from the Universal Dependencies v2 tagset, or a dictionary mapping HKCanCor tags to UD tags if no input is given.


>>> hkcancor_to_ud("V")