PyCantonese: Cantonese Linguistics and NLP in Python

PyCantonese is a Python library for Cantonese linguistics and natural language processing (NLP). Its goal is to provide general-purpose tools for working with Cantonese data. These include corpus search functions as well as various analytic and annotation tools; further functionality is added gradually as the library grows and evolves.

How to cite

PyCantonese is maintained by Jackson Lee.

A talk introducing PyCantonese:

Jackson L. Lee. 2015. PyCantonese: Cantonese linguistic research in the age of big data. Talk at the Childhood Bilingualism Research Centre, Chinese University of Hong Kong. September 15, 2015. [Notes+slides]

See Research outputs for a running list of our work.

Technical support, library development, etc.

Questions, bug reports, and feature suggestions are more than welcome. Please open an issue on the GitHub page. Alternatively, you may contact Jackson Lee.

Changelog on GitHub