After the Deadline

AtD Updates – Split Words and Possessive Errors

Posted in News by rsmudge on October 6, 2009

I’m back from my vacation.  Ok, I’ve been back for over a week now.  I’m working on some stuff that will make it easier to add AtD to more applications.  In the meantime, I’ve added some new rules to AtD based on your suggestions.  Besides the usual tweaking, here is what you get:

The spell checker now splits the misspelled word and rates these split phrases alongside the other suggestions.  This means that when you type alot, you’ll now see a lot as the top suggestion.  It also means you’ll see the right thing when you accidentally run two words together, such as atleast -> at least.  I was notified about the latter error on Twitter.
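To make the split idea concrete, here is a minimal Python sketch. It is not AtD's actual code: the function names, the toy dictionary, and the toy scores are all made up for illustration, and the real checker rates suggestions with its own model.

```python
def split_candidates(word, dictionary):
    # Try every split point and keep the splits where both halves are real words.
    candidates = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            candidates.append(f"{left} {right}")
    return candidates


def rank_suggestions(suggestions, score):
    # Rate split phrases together with the ordinary suggestions, best first.
    return sorted(suggestions, key=score, reverse=True)


# Hypothetical toy data, standing in for a real dictionary and a real scorer.
DICTIONARY = {"a", "lot", "at", "least", "allot", "alto"}
TOY_SCORES = {"a lot": 0.9, "allot": 0.4, "alto": 0.1}

ordinary = ["allot", "alto"]  # pretend these came from the usual spell checker
combined = ordinary + split_candidates("alot", DICTIONARY)
print(rank_suggestions(combined, lambda s: TOY_SCORES.get(s, 0.0)))
print(split_candidates("atleast", DICTIONARY))
```

The point of ranking everything in one list is that a split phrase only wins when it scores better than the single-word alternatives.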

Our local Media Engineer, Raanan, noted that AtD didn’t find an error in the phrase “if this is your companies way of doing support”.  If you’re curious, here is what the process looks like:

<raanan> raffi: tried “I wonder if this is your companies way of providing support”
<raanan> AtD didn’t flag it
<raffi> I can add some grammar rules for your followed by a plural noun and try to correct it to a possessive noun
<raffi> your .*/NNS::word=your \1:base’s::pivots=\1,\1:base’s
<raffi> whether that catches some real errors or not is up for debate, can try it though
<raanan> could be interesting
<raffi> I’m trying it now
<raffi> rule development is taking an error like you gave me, going through some steps to make a rule for it, and testing it on a bunch of text I have laying around
<raffi> if it finds errors, it gets included–if it doesn’t… it goes nowhere
<raffi> or gets tweaked
<raffi> if it flags too many things as correct, I keep trying to refine it (can usually be done) until it just catches errors (although it might miss some errors to keep from flagging correctly written stuff)
<raffi> when I first made the grammar checker, I spent 3 weeks straight doing that, it got very tedious
<raffi> $ ./bin/testr.sh test.r raanan.txt
<raffi> Warning: Dictionary loaded: 124314 words at dictionary.sl:50
<raffi> Warning: Looking at: your|companies|way = 3.8133735008675426E-5 at testr.sl:24
<raffi> Warning: Looking at: your|company’s|way = 2.955364463172345E-4 at testr.sl:24
<raffi> I wonder if this is your companies way of providing support
<raffi> I/PRP wonder/VBP if/IN this/DT is/VBZ your/PRP$ companies/NNS way/NN of/IN providing/VBG support/NN
<raffi> 0) [ACCEPT] is, your companies -> @(‘your company’s’)
<raffi> id         => c17ed0984ed4d01ac172f0afd95ee00c
<raffi> pivots     => \1,\1:possessive
<raffi> path => @(‘your’, ‘.*’) @(‘.*’, ‘NNS’)
<raffi> word       => your \1:possessive
<raffi> ding
<raffi> let’s see if that works against a bunch of written text

And it did.  I tested a broader version of the rule against a big corpus of written text and found several errors with only one false positive.  Here is the final rule:

That|The|the|that|Your|your|My|my|Their|their|Her|her|His|his .*/NNS .*/NN::word= \1:possessive \2::pivots=\1,\1:possessive
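For the curious, here is a rough Python sketch of what a rule like this does. It is an illustration, not AtD's rule engine: `to_possessive` is a naive stand-in for the possessive transform, `find_possessive_errors` is a made-up name, and treating the pivots as "only suggest the change when the possessive phrase scores higher" is my simplified reading of the test output above. The two scores are the ones printed in that output.

```python
DETERMINERS = {"that", "the", "your", "my", "their", "her", "his"}


def to_possessive(plural):
    # Naive plural -> singular possessive (hypothetical helper).
    if plural.endswith("ies"):
        return plural[:-3] + "y's"
    if plural.endswith("s"):
        return plural[:-1] + "'s"
    return plural + "'s"


def find_possessive_errors(tagged, trigram_score):
    # Look for: determiner, plural noun (NNS), singular noun (NN).
    suggestions = []
    for i in range(len(tagged) - 2):
        (w0, _), (w1, t1), (w2, t2) = tagged[i:i + 3]
        if w0.lower() in DETERMINERS and t1 == "NNS" and t2 == "NN":
            fixed = to_possessive(w1)
            # Only suggest the change if the possessive phrase is more likely.
            if trigram_score(w0, fixed, w2) > trigram_score(w0, w1, w2):
                suggestions.append((f"{w0} {w1}", f"{w0} {fixed}"))
    return suggestions


# Scores copied from the test run above; everything else scores 0.0.
SCORES = {
    ("your", "companies", "way"): 3.8133735008675426e-5,
    ("your", "company's", "way"): 2.955364463172345e-4,
}

# The tagged sentence from the test run above.
TAGGED = [("I", "PRP"), ("wonder", "VBP"), ("if", "IN"), ("this", "DT"),
          ("is", "VBZ"), ("your", "PRP$"), ("companies", "NNS"), ("way", "NN"),
          ("of", "IN"), ("providing", "VBG"), ("support", "NN")]

print(find_possessive_errors(TAGGED, lambda a, b, c: SCORES.get((a, b, c), 0.0)))
```

Comparing the two phrases before suggesting anything is what keeps a rule like this from flagging correctly written text, at the cost of missing some real errors.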

If you have ideas to enhance After the Deadline or have found a clear-cut error that it doesn’t catch, let me know.  I’m happy to look at it.
