Attempting to Detect and Correct Out of Place Plurals and Singulars
Welcome to After the Deadline Sing-A-Long. I’m constantly trying different experiments to catch my (and occasionally your) writing errors. When AtD is open sourced, you can play along at home. Read on to learn more.
Today I’m starting a new series on this blog where I’ll show off experiments I’m conducting with AtD and share the code and ideas behind them. I call it After the Deadline Sing-A-Long. My hope is that when I finally tar up the source code and make AtD available, you can replicate some of these experiments, try your own ideas, make discoveries, and send them to me.
Detecting and Correcting Plural or Singular Words
One of the mistakes I make most often is accidentally adding an ‘s’ to a word, making it plural when I really wanted it singular. My idea? Take every plural verb and noun, convert it to its singular form, and see whether the singular form is a statistically better fit.
AtD can do this. The grammar and style checkers are rule-based, but they also use a statistical language model to filter out false positives. To set up this experiment, I created a rule file that converts plural nouns and verbs to their singular forms.
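The rule file itself didn’t survive here, but judging from the `path`, `word`, and `pivots` values in the trace output later in this post, the rules likely looked something like this — a reconstruction on my part, not the original file (I’m using `\0` as a stand-in for a back-reference to the matched word):

```
.*/NNS::word=\0:singular::pivots=\0,\0:singular
.*/VBZ::word=\0:singular::pivots=\0,\0:singular
```

The first rule matches any word tagged as a plural noun (NNS) and the second any present-tense verb with an ‘s’ (VBZ), suggesting the singular form of each.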
Each AtD rule lives on its own line and consists of declarations separated by two colons. The first declaration is a pattern representing the phrase the rule should match; it consists of one or more [word pattern]/[tag pattern] sequences, where the tag pattern is a part-of-speech tag (think noun, verb, etc.). After the pattern come the named declarations. The word= declaration is what I’m suggesting in place of the matched phrase; here I’m converting any plural noun or verb to its singular form. The pivots declaration specifies the parts of the phrase that have changed, to inform the statistical filtering I mentioned earlier.
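To make the format concrete, here is a minimal sketch of how a rule line in this shape could be split into its parts. This is my own illustration of the structure described above, not AtD’s actual parser, and the sample rule text is hypothetical:

```python
def parse_rule(line):
    """Split an AtD-style rule line into its pattern and named declarations.

    Assumes the format described above: declarations separated by '::',
    with the first declaration being the phrase pattern.
    """
    pattern, *declarations = line.split("::")
    rule = {"pattern": pattern.split()}  # one [word pattern]/[tag pattern] per token
    for decl in declarations:
        key, _, value = decl.partition("=")
        rule[key] = value
    return rule

# A hypothetical rule converting any plural noun to its singular form:
rule = parse_rule(".*/NNS::word=\\0:singular::pivots=\\0,\\0:singular")
print(rule["pattern"])  # ['.*/NNS']
print(rule["word"])     # \0:singular
```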
The next step is to create a file with some example sentences to test against. I generally do this to see if the rule does what I want. Here are two of the sentences I tried:
```
There are several people I never had a chance to thanks publicly.
After the Deadline is a tool to finds errors and correct them.
```
So, with these rules in place, here is what happened when I tested them:
```
atd@build:~/atd$ ./bin/testr.sh plural_to_singular.rules examples.txt
Warning: Dictionary loaded: 124264 words at dictionary.sl:50
Warning: Looking at: several|people|I = 0.003616585140061973 at testr.sl:24
Warning: Looking at: several|person|I = 1.1955063236931511E-4 at testr.sl:24
Warning: Looking at: to|thanks|publicly = 1.25339251574261E-6 at testr.sl:24
Warning: Looking at: to|thank|publicly = 1.7004358463574743E-4 at testr.sl:24
There are several people I never had a chance to thanks publicly.
There/EX are/VBP several/JJ people/NNS I/PRP never/RB had/VBD a/DT chance/NN to/TO thanks/NNS publicly/RB
0) [REJECT] several, people -> I
   id => 3095c361e8beeb60abebed29fe5657be
   pivots => ,:singular
   path => @('.*', 'NNS')
   word => :singular
1) [ACCEPT] to, thanks -> @('thank')
   id => 3095c361e8beeb60abebed29fe5657be
   pivots => ,:singular
   path => @('.*', 'NNS')
   word => :singular
Warning: Looking at: Deadline|is|a = 0.05030783620533642 at testr.sl:24
Warning: Looking at: Deadline|be|a = 0.00804979134152438 at testr.sl:24
Warning: Looking at: to|finds|errors = 0.0 at testr.sl:24
Warning: Looking at: to|find|errors = 0.0024240611254462076 at testr.sl:24
Warning: Looking at: finds|errors|and = 2.5553084536790455E-5 at testr.sl:24
Warning: Looking at: finds|error|and = 3.14416280096839E-4 at testr.sl:24
After the Deadline is a tool to finds errors and correct them.
After/IN the/DT Deadline/NN is/VBZ a/DT tool/NN to/TO finds/NNS errors/NNS and/CC correct/NN them/PRP
0) [REJECT] Deadline, is -> a
   id => 1ab5cd35b6146cbecbc31c8b2a6d8e96
   pivots => ,:singular
   path => @('.*', 'VBZ')
   word => :singular
1) [ACCEPT] to, finds -> @('find')
   id => 3095c361e8beeb60abebed29fe5657be
   pivots => ,:singular
   path => @('.*', 'NNS')
   word => :singular
2) [ACCEPT] finds, errors -> @('error')
   id => 3095c361e8beeb60abebed29fe5657be
   pivots => ,:singular
   path => @('.*', 'NNS')
   word => :singular
```
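The Looking at: lines show why each suggestion was accepted or rejected: for every match, the language model scores the trigram containing the original word against the same trigram with the proposed replacement. A minimal sketch of that comparison, using the probabilities from the trace above — the plain greater-than test is my simplification; AtD’s real filter may use a tuned threshold:

```python
def better_fit(p_original, p_replacement):
    """Accept a suggestion only when the replacement trigram is a
    statistically better fit than the original. A simplification of
    AtD's real filter, which may apply a tuned threshold."""
    return p_replacement > p_original

# Trigram probabilities taken from the trace output above.
cases = [
    ("several|people|I -> person", 0.003616585140061973, 1.1955063236931511e-4),  # REJECT
    ("to|thanks|publicly -> thank", 1.25339251574261e-6, 1.7004358463574743e-4),  # ACCEPT
    ("Deadline|is|a -> be",         0.05030783620533642,  0.00804979134152438),   # REJECT
    ("to|finds|errors -> find",     0.0,                  0.0024240611254462076), # ACCEPT
]
for label, p_orig, p_repl in cases:
    print(label, "ACCEPT" if better_fit(p_orig, p_repl) else "REJECT")
```

Even this crude comparison reproduces every decision in the trace: “people” stays because “several person I” is far less likely, while “thanks” and “finds” get corrected.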
And that was that. When I ran the experiment against a more substantial amount of text, too many phrases were flagged incorrectly, and I didn’t find many legitimate errors of this type. If I don’t see many obvious mistakes when applying a rule against several books, blogs, and online discussions, I ignore the rule.
In this case, the experiment showed this type of rule fails. There are options to make it better:
- Set a directive to raise the statistical threshold
- Try looking at more context (two words out, instead of one)
- Look at the output and try to create rules that look for more specific situations where this error occurs