grammaCy
github | grammacy.com
Python, spaCy, Flask, Docker, AWS EC2, SvelteKit, Firebase
Our proposal for a new method of grammar checking: a hybrid approach that prioritizes speed, explainability, and resource efficiency while maintaining high accuracy. It aims to improve the accuracy of rule-based grammar checking by using dependency parsing. NMT-based systems are highly accurate, but they are slow, require significant development time, and often rely on resource-intensive transformers. Traditional rule-based systems are fast and lightweight, but often inaccurate. We keep the benefits of rule-based systems while improving accuracy to a level that is competitive with NMT-based grammar checkers.
Developed by me (library, model, rules, API, website, deployment), Isaac Nguyen (library, API, website, deployment), and Pranshu Sarin (library, rules).
- A customizable, multi-language grammar checking library built with spaCy.
- Builds fast, lightweight, CPU-optimized spaCy pipelines for dependency parser-based grammar checkers.
- Preprocessing tools to work with either constituency or dependency parse data.
- Multithreaded and multiprocessed CoNLL-U augmentor that injects grammar errors based on linguistic knowledge, a significant improvement over random error injection. Parallel and concurrent processing decreased processing time on OntoNotes 5.0 from >220 s to 20-30 s.
- Prepackaged English model trained on the GUM corpus with 34 augmentations. 98% tagger accuracy, 93% parser LAS, and 98% morphologizer accuracy. Total model size is only 10.7 MB.
- Created over 20 dependency parse-based rules for English grammar checking, covering subject-verb agreement, subjective and objective pronoun usage, gerund usage, and copulas, among others.
- Integrated with symspellpy for fast spelling correction.
- Built a Flask API for English grammar and spell checking, served by Gunicorn. Average prediction time is 25 ms.
- Used Certbot and a cron job to obtain and automatically renew SSL certificates.
- Used Nginx for SSL termination, rate limiting, and load balancing.
- Deployed the Gunicorn, Nginx, and Certbot services on AWS EC2 with Docker Compose.
- Deployed a Svelte website for library documentation, API usage, and a dev blog on Firebase Hosting.
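
To illustrate what a dependency parse-based rule like the subject-verb agreement check listed above might look like, here is a minimal sketch. It is not grammaCy's actual API: the `Token` class and `check_subject_verb_agreement` function are hypothetical names, and the parse is hand-written in place of real spaCy output so the example runs with the standard library alone. It assumes Universal Dependencies-style labels (`nsubj`) and a `Number` morphological feature on both subject and verb.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Token:
    """Hypothetical stand-in for a parsed token (not the grammaCy or spaCy API)."""
    text: str
    dep: str   # dependency relation to the head, e.g. "nsubj"
    head: int  # index of the head token within the sentence
    morph: Dict[str, str] = field(default_factory=dict)  # e.g. {"Number": "Sing"}


def check_subject_verb_agreement(sent: List[Token]) -> List[str]:
    """Return one message per nsubj/head pair whose Number features disagree."""
    errors = []
    for tok in sent:
        if tok.dep != "nsubj":
            continue
        verb = sent[tok.head]
        subj_num = tok.morph.get("Number")
        verb_num = verb.morph.get("Number")
        # Compare only when both sides carry a Number feature.
        if subj_num and verb_num and subj_num != verb_num:
            errors.append(
                f"'{tok.text}' ({subj_num}) does not agree with "
                f"'{verb.text}' ({verb_num})"
            )
    return errors


# Hand-written parse of "The dogs barks loudly" (assumed, not model output).
sentence = [
    Token("The", "det", 1),
    Token("dogs", "nsubj", 2, {"Number": "Plur"}),
    Token("barks", "root", 2, {"Number": "Sing"}),
    Token("loudly", "advmod", 2),
]

for message in check_subject_verb_agreement(sentence):
    print(message)
```

Because the rule follows the `nsubj` arc rather than word order, the same check would still apply when modifiers sit between the subject and the verb, which is the kind of case where purely pattern-based rules tend to lose accuracy.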