grammaCy

github | grammacy.com

Python, spaCy, Flask, Docker, AWS EC2, SvelteKit, Firebase

Our proposal for a new method of grammar checking: a hybrid approach that prioritizes speed, explainability, and resource efficiency while maintaining high accuracy. It uses dependency parsing to improve the accuracy of rule-based grammar checking. Systems based on neural machine translation (NMT) are highly accurate but slow, require significant development time, and often rely on resource-intensive transformers. Traditional rule-based systems are fast and lightweight but often inaccurate. grammaCy keeps the speed, small footprint, and explainability of rule-based systems while raising accuracy to a level competitive with NMT-based grammar checkers.
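
To illustrate why a dependency parse helps a rule-based checker, here is a minimal sketch of a subject-verb agreement rule. It is not grammaCy's actual rule code: the `Token` class, its fields, and `check_subject_verb_agreement` are hypothetical names, and the parse is written out by hand rather than produced by a spaCy pipeline so that the example runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical, simplified token; a real pipeline would supply these fields."""
    text: str
    dep: str          # dependency label, e.g. "nsubj" or "root"
    head: int         # index of this token's head within the sentence
    number: str = ""  # "Sing", "Plur", or "" when number is not marked


def check_subject_verb_agreement(tokens):
    """Return (subject, verb) pairs whose grammatical number disagrees."""
    errors = []
    for tok in tokens:
        if tok.dep != "nsubj":
            continue
        verb = tokens[tok.head]  # follow the dependency arc, not word order
        if tok.number and verb.number and tok.number != verb.number:
            errors.append((tok.text, verb.text))
    return errors


# Hand-written parse of "The dogs near the house barks."
sentence = [
    Token("The", "det", 1),
    Token("dogs", "nsubj", 5, "Plur"),
    Token("near", "case", 4),
    Token("the", "det", 4),
    Token("house", "nmod", 1, "Sing"),
    Token("barks", "root", 5, "Sing"),
    Token(".", "punct", 5),
]

print(check_subject_verb_agreement(sentence))  # [('dogs', 'barks')]
```

A rule that only compares adjacent words would see "house barks" and accept the sentence; following the `nsubj` arc pairs "barks" with its true subject "dogs" and flags the mismatch.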

Developed by me (library, model, rules, API, website, deployment), Isaac Nguyen (library, API, website, deployment), and Pranshu Sarin (library, rules).

  • A customizable, multi-language grammar checking library built with spaCy.
  • Builds fast, lightweight, CPU-optimized spaCy pipelines for dependency parser-based grammar checkers.
  • Preprocessing tools to work with either constituency or dependency parse data.
  • Multithreaded and multiprocessed CoNLL-U augmentor that injects grammar errors based on linguistic knowledge, a significant improvement over random error injection. Parallel and concurrent processing cut processing time on OntoNotes 5.0 from over 220s to 20-30s.
  • Prepackaged English model trained on the GUM corpus with 34 augmentations. 98% tagger accuracy, 93% parser LAS, and 98% morphologizer accuracy. Total model size is only 10.7MB.
  • Created over 20 dependency parse-based rules for English grammar checking, covering subject-verb agreement, subjective and objective pronoun usage, gerunds, copulas, and more.
  • Integrated with symspellpy for fast spelling correction.
  • Built a Flask API for English grammar and spell checking running on Gunicorn. Average prediction time is 25ms.
  • Used Certbot and a cron job to get and automatically renew SSL certificates.
  • Used Nginx for SSL termination, rate limiting, and load balancing.
  • Deployed the Gunicorn, Nginx, and Certbot services on AWS EC2 with Docker Compose.
  • Deployed a Svelte website for library documentation, API usage, and a dev blog on Firebase Hosting.
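
The linguistically informed error injection described in the list above can be sketched roughly as follows. This is an illustrative assumption, not the library's augmentor: `inject_agreement_error` and `augment` are hypothetical names, the single augmentation shown (replacing a third-person singular present verb with its lemma) is only one plausible example, and the sketch uses a thread pool alone, whereas the project combines threads and processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Indices into the 10 tab-separated columns of a CoNLL-U token line.
FORM, LEMMA, UPOS, FEATS = 1, 2, 3, 5

TARGET_FEATS = {"Number=Sing", "Person=3", "Tense=Pres"}


def inject_agreement_error(sentence):
    """Replace 3rd-person singular present verbs with their lemma
    ("barks" -> "bark"), leaving all annotations untouched.
    Comment lines such as "# text = ..." are passed through unchanged."""
    out = []
    for line in sentence.split("\n"):
        cols = line.split("\t")
        if len(cols) == 10 and cols[UPOS] == "VERB":
            if TARGET_FEATS <= set(cols[FEATS].split("|")):
                cols[FORM] = cols[LEMMA]
        out.append("\t".join(cols))
    return "\n".join(out)


def augment(sentences, workers=4):
    """Apply the augmentation to many sentences concurrently.
    pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(inject_agreement_error, sentences))


SENT = "\n".join([
    "# text = The dog barks .",
    "1\tThe\tthe\tDET\tDT\tDefinite=Def\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\tVBZ\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
])

augmented = augment([SENT, SENT])
print(augmented[0].split("\n")[3].split("\t")[FORM])  # bark
```

Because the error targets only tokens whose morphology makes the mistake plausible, the augmented sentence resembles a real learner error ("The dog bark") rather than random noise. For CPU-bound workloads, swapping in `ProcessPoolExecutor` from the same module is the usual next step.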