PyCantonese: Cantonese Linguistics and NLP in Python
PyCantonese is a Python library for Cantonese linguistics and natural language processing (NLP). Its goal is to provide general-purpose tools for working with Cantonese data. These include corpus search functions as well as various analytic and annotation tools; further functionality is added gradually as the library grows and evolves.
Table of Contents
- Download and Install
- Corpus Data
- Stop Words
- Corpus Reader Methods
- Jyutping Romanization: Parsing and Conversion
- Search Queries
- Research Outputs
How to Cite
PyCantonese is maintained by Jackson Lee.
A talk introducing PyCantonese:
Jackson L. Lee. 2015. PyCantonese: Cantonese linguistic research in the age of big data. Talk at the Childhood Bilingualism Research Centre, Chinese University of Hong Kong. September 15, 2015. [Notes+slides]
See Research Outputs for a running list of our work.
Technical Support, Library Development, etc.
For updates, tips, and more: