Statistical Grammar Correction or Not…

Posted in NLP Research by rsmudge on September 25, 2009

One thing that makes AtD so powerful is using the right tool for the right job. AtD uses a rule-based approach to find some errors and a statistical approach to find others. The misused word detection feature is the heaviest user of the statistical approach.

Last week I made great progress on the misused word detection, making it look at the two words before the word in question (trigrams) instead of just one (bigrams). There were several memory challenges to get past to make this work in production, but it happened and I was happy with the results.

Before this, I just looked at the one word before and the one word after the current word. All well and good, but for certain kinds of words (too/to, their/there, your/you’re) this proved useless as it almost always guessed wrong. For these words I moved to a rule-based approach. Rules are great because they work and they’re easy to make. The disadvantage of rules is that a lot of them are required to get broad coverage of the errors people really make. The statistical approach is great because it can pick up more cases.
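To make the difference between the two context windows concrete, here is a minimal Python sketch. It is not AtD’s code: the function name `context_ngrams` and the `<s>`/`</s>` padding tokens are assumptions made for illustration. It only shows which neighbouring words each approach gets to see.

```python
def context_ngrams(tokens, i):
    """Return the n-grams around tokens[i] that each approach would consult.

    Hypothetical helper for illustration; padding tokens are an assumption.
    """
    # Pad so that words at the sentence edges still have neighbours.
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    j = i + 2  # index of tokens[i] inside the padded list
    word = padded[j]
    return {
        # Bigram context: one word before and one word after.
        "prev_bigram": (padded[j - 1], word),
        "next_bigram": (word, padded[j + 1]),
        # Trigram context: the two words before the word in question.
        "prev_trigram": (padded[j - 2], padded[j - 1], word),
    }


tokens = "they went to the store".split()
print(context_ngrams(tokens, 2))
```

For the word "to" in this sentence, the bigram view sees only "went" and "the", while the trigram view also sees "they".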

The Experiment

So with the advent of the trigrams, I decided to revisit the statistical technology for detecting the errors I currently use rules for. I focused on three types of errors just to see what kind of difference there was. These types were:

Wrong verb tense (irregular verbs only) – Writers tend to have trouble with irregular verbs as each case has to be memorized. Native English speakers aren’t too bad but those just learning have a lot of trouble with these. An example set would be throw vs. threw vs. thrown vs. throws vs. throwing.
Agreement errors (irregular nouns only) – Irregular nouns are nouns that have plural and singular cases one has to memorize. You can’t just add an s to convert them. An example is die vs. dice.
Confused words – And finally there are several confused words that the statistical approach just didn’t do much good for. These include it’s/its, their/there, and where/were.

You’ll notice that each of these types of errors relies on fixed words with a fixed set of alternatives. Perfect for use with the misused word detection technology.
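A "fixed word with a fixed set of alternatives" can be pictured as a lookup table of confusion sets. The sketch below is an assumption about how such a table might look in Python, not a description of AtD’s internals. The sets themselves are the examples named above; `CONFUSION_SETS`, `ALTERNATIVES`, and `alternatives` are hypothetical names.

```python
# Example sets taken from the three error types listed above.
CONFUSION_SETS = [
    {"throw", "threw", "thrown", "throws", "throwing"},  # irregular verb
    {"die", "dice"},                                      # irregular noun
    {"it's", "its"},                                      # confused words
    {"their", "there"},
    {"where", "were"},
]

# Map every word to the other members of its set.
ALTERNATIVES = {
    word: sorted(members - {word})
    for members in CONFUSION_SETS
    for word in members
}


def alternatives(word):
    """Return the candidate replacements for a word, or [] if it has none."""
    # Normalise case and the typographic apostrophe before the lookup.
    key = word.lower().replace("\u2019", "'")
    return ALTERNATIVES.get(key, [])


print(alternatives("threw"))
print(alternatives("Their"))
```

Any detector that can pick the best fit from such a set given its context can, in principle, be pointed at all three error types.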

My next step was to generate datasets for training and evaluating a neural network for each of these errors. With this data I then trained the neural network and compared the test results of this neural network (Problem AI) to how the misused word detector (Misused AI) did as-is against these types of errors. The results surprised me.

The Results

	Problem AI Precision	Problem AI Recall	Misused AI Precision	Misused AI Recall
Irregular Verbs	86.28%	86.43%	84.74%	84.68%
Irregular Nouns	93.91%	94.36%	93.77%	94.65%
Confused Words	85.38%	83.64%	83.85%	81.90%

Table 1. Statistical Grammar Checking with Trigrams
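The post does not spell out how precision and recall were counted, so the following is only the standard definition, sketched in Python. The function name and the counts in the usage line are made up for illustration and are not taken from the experiment.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard precision and recall from raw counts.

    precision = correct flags / all flags raised
    recall    = correct flags / all errors that were actually present
    """
    flagged = true_positives + false_positives
    present = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / present if present else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(true_positives=80, false_positives=20, false_negatives=10)
print(f"precision={p:.2%} recall={r:.2%}")
```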

First, you’ll notice the results aren’t that great. Imagine what they were like before I had trigrams. 🙂 Actually, I ran those numbers too. Here are the results:

	Problem AI Precision	Problem AI Recall	Misused AI Precision	Misused AI Recall
Irregular Verbs	69.02%	68.76%	64.84%	64.43%
Irregular Nouns	83.66%	83.66%	82.93%	82.93%
Confused Words	80.15%	77.82%	76.15%	73.27%

Table 2. Statistical Grammar Checking with Bigrams

So, my instinct was at least validated. There was a big improvement going from bigrams to trigrams… just not big enough. What surprised me in these results is how closely the AI trained for detecting misused words performed compared to the AI trained for the problem at hand.

So one thing this experiment showed is that there is no need to train a separate model for dealing with these other classes of errors. There is a slight accuracy difference (under two percentage points in every trigram result) but it’s not significant enough to justify a separate model.
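The two comparisons made here (bigrams vs. trigrams, and Problem AI vs. Misused AI) can be read straight off the tables. This short Python sketch just does the subtraction; the figures are copied from Table 1 and Table 2, and the variable and function names are illustrative.

```python
# Columns: Problem AI precision, Problem AI recall,
#          Misused AI precision, Misused AI recall (all in percent).
TRIGRAMS = {
    "Irregular Verbs": (86.28, 86.43, 84.74, 84.68),
    "Irregular Nouns": (93.91, 94.36, 93.77, 94.65),
    "Confused Words": (85.38, 83.64, 83.85, 81.90),
}
BIGRAMS = {
    "Irregular Verbs": (69.02, 68.76, 64.84, 64.43),
    "Irregular Nouns": (83.66, 83.66, 82.93, 82.93),
    "Confused Words": (80.15, 77.82, 76.15, 73.27),
}


def point_differences(a, b):
    """Element-wise a - b in percentage points, rounded to two decimals."""
    return tuple(round(x - y, 2) for x, y in zip(a, b))


# Gain from moving from bigrams to trigrams, per error type.
GAINS = {kind: point_differences(TRIGRAMS[kind], BIGRAMS[kind]) for kind in TRIGRAMS}

# Gap between the problem-specific model and the misused word detector
# with trigrams: (precision gap, recall gap).
GAPS = {
    kind: (round(p - mp, 2), round(r - mr, 2))
    for kind, (p, r, mp, mr) in TRIGRAMS.items()
}

for kind in TRIGRAMS:
    print(kind, "gain:", GAINS[kind], "gap:", GAPS[kind])
```

With these numbers the smallest gain from trigrams is 5.23 points (Problem AI precision on confused words), while the largest gap between the two models is 1.75 points (recall on irregular verbs).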

Real World Results

The experiment shows that all I need (in theory) is a higher bias to protect against false positives. With higher biasing in place (my target is always <0.05% false positives), I decided to run this feature against real writing to see what kind of errors it finds. This is an important step because the tests come from fabricated data. If the mechanism fails to find meaningful errors in a real corpus of writing then it’s not worth adding to AtD.
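One way to picture "a higher bias" is as a multiplier that an alternative has to beat before it is suggested. The sketch below is an assumption, not AtD’s actual mechanism: the post does not say how the bias is applied or what the false positive rate is measured against. The names `suggestions` and `false_positive_rate`, and the scores in the example, are all hypothetical.

```python
def suggestions(current, scores, bias=1.0):
    """Return alternatives whose score beats the current word's score times bias.

    `scores` maps each word in the confusion set (including `current`) to a
    model score for the context. A larger bias favours the word as written,
    which trades recall for fewer false positives.
    """
    baseline = scores[current] * bias
    better = [w for w, s in scores.items() if w != current and s > baseline]
    return sorted(better, key=lambda w: scores[w], reverse=True)


def false_positive_rate(false_positives, words_checked):
    """Fraction of checked words that were flagged although they were correct."""
    return false_positives / words_checked if words_checked else 0.0


# Hypothetical scores for one context.
scores = {"their": 0.2, "there": 0.5}
print(suggestions("their", scores, bias=2.0))  # "there" clears the bar
print(suggestions("their", scores, bias=3.0))  # nothing is suggested

# Checking a measured rate against the 0.05% target (counts are made up).
print(false_positive_rate(4, 10_000) < 0.0005)
```

Raising the bias until the measured rate drops under the target is the tuning step described above; the cost is that some real errors stop being flagged.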

The irregular verb rules found 500 “errors” in my writing corpus. Some of the finds were good and wouldn’t be picked up by rules, for example:

Romanian justice had no saying whatsoever
[ACCEPT] no, saying -> @('say')
However it will mean that once 6th May 2010 comes around the thumping that Labour get will be historic
[ACCEPT] Labour, get -> @('getting', 'gets', 'got', 'gotten')

Unfortunately, most of the errors are noise. The confused words and nouns have similar stories.

Conclusions

The misused word detection is my favorite feature in AtD. I like it because it catches a lot of errors with little effort on my part. I was hoping to get the same win catching verb tenses and other errors. This experiment showed misused word detection is fine for picking a better fit from a set and I don’t need to train separate models for different types of errors. This gives me another tool to use when creating rules and also gives me a flexible tool to use when trying to mine errors from raw text.

5 Responses

Joey Mavity said, on October 26, 2009 at 5:51 pm

“Rules are great because they work and there easy to make.”
- rsmudge said, on October 26, 2009 at 5:57 pm
  
  Thanks. I don’t claim to catch 100% of the errors 100% of the time.
  - Joey Mavity said, on October 28, 2009 at 3:06 pm
    
    Sure. It was just triply incongruous as: 1) I assume your posts are checked by “the best in Natural Language Processing”; 2) It’s in a line about how easy it is to make rules; 3) It’s in a post about common word errors (their/there is listed, the triad is incomplete without ‘they’re’).
    
    But fundamentally I made the comment so that such an error might be caught by your software in the future. I’d like to see it be better than Word’s check, someday.
  - rsmudge said, on October 28, 2009 at 3:09 pm
    
    @Joey I actually took down some examples from my own writing where I inadvertently used their when I meant they’re. I did it again today in an internal blog post. I’m going to work on creating some rules for these today.
Measuring the Real Word Error Corrector « After the Deadline said, on April 9, 2010 at 11:47 pm

[…] are real word errors. You may ask, can real word error detection be applied to grammar checking? Yes, others are working on it. It makes sense to test how well After the Deadline as a complete system […]
