<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for After the Deadline</title>
	<atom:link href="http://blog.afterthedeadline.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.afterthedeadline.com</link>
	<description>Natural language processing blog.</description>
	<lastBuildDate>Tue, 10 Aug 2010 21:54:09 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>Comment on Generating a Plain Text Corpus from Wikipedia by Aly</title>
		<link>http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/#comment-1405</link>
		<dc:creator><![CDATA[Aly]]></dc:creator>
		<pubDate>Tue, 10 Aug 2010 21:54:09 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=324#comment-1405</guid>
		<description><![CDATA[Hmm, it worked fine on a different version of Wikipedia (the latest one: 20100728). Not sure what happened. Thanks for your quick reply anyway.]]></description>
		<content:encoded><![CDATA[<p>Hmm, it worked fine on a different version of Wikipedia (the latest one: 20100728). Not sure what happened. Thanks for your quick reply anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Generating a Plain Text Corpus from Wikipedia by rsmudge</title>
		<link>http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/#comment-1396</link>
		<dc:creator><![CDATA[rsmudge]]></dc:creator>
		<pubDate>Tue, 10 Aug 2010 02:38:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=324#comment-1396</guid>
		<description><![CDATA[File not found, eh... Make sure you have enough disk space and enough free inodes on that disk to create all the files you will need. This script creates a lot of them. It&#039;s been a long time since I&#039;ve run this process, but I was able to pull it off for 10 Wikipedias (including en). I have faith you&#039;ll get it too.]]></description>
		<content:encoded><![CDATA[<p>File not found, eh&#8230; Make sure you have enough disk space and enough free inodes on that disk to create all the files you will need. This script creates a lot of them. It&#8217;s been a long time since I&#8217;ve run this process, but I was able to pull it off for 10 Wikipedias (including en). I have faith you&#8217;ll get it too.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Generating a Plain Text Corpus from Wikipedia by Aly</title>
		<link>http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/#comment-1391</link>
		<dc:creator><![CDATA[Aly]]></dc:creator>
		<pubDate>Mon, 09 Aug 2010 02:41:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=324#comment-1391</guid>
		<description><![CDATA[Your script failed for me on step 3.
Any ideas what&#039;s wrong? Thanks!

Traceback (most recent call last):
  File &quot;./xmldump2files.py&quot;, line 93, in &lt;module&gt;
    xml.sax.parse(sys.argv[1], WikiPageSplitter(sys.argv[2]))
  File &quot;/tmp/python.6884/usr/lib/python2.5/xml/sax/__init__.py&quot;, line 33, in parse
  File &quot;/tmp/python.6884/usr/lib/python2.5/xml/sax/expatreader.py&quot;, line 107, in parse
  File &quot;/tmp/python.6884/usr/lib/python2.5/xml/sax/xmlreader.py&quot;, line 123, in parse
  File &quot;/tmp/python.6884/usr/lib/python2.5/xml/sax/expatreader.py&quot;, line 207, in feed
  File &quot;/tmp/python.6884/usr/lib/python2.5/xml/sax/expatreader.py&quot;, line 304, in end_element
  File &quot;./xmldump2files.py&quot;, line 79, in endElement
    writeArticle(self.root, self.title, self.text)
  File &quot;./xmldump2files.py&quot;, line 41, in writeArticle
    out = open(filename, &quot;w&quot;)
IOError: [Errno 2] No such file or directory: &#039;../enwiki-20100130-out/90/f9/Con.txt&#039;]]></description>
		<content:encoded><![CDATA[<p>Your script failed for me on step 3.<br />
Any ideas what&#8217;s wrong? Thanks!</p>
<p>Traceback (most recent call last):<br />
  File &#8220;./xmldump2files.py&#8221;, line 93, in &lt;module&gt;<br />
    xml.sax.parse(sys.argv[1], WikiPageSplitter(sys.argv[2]))<br />
  File &#8220;/tmp/python.6884/usr/lib/python2.5/xml/sax/__init__.py&#8221;, line 33, in parse<br />
  File &#8220;/tmp/python.6884/usr/lib/python2.5/xml/sax/expatreader.py&#8221;, line 107, in parse<br />
  File &#8220;/tmp/python.6884/usr/lib/python2.5/xml/sax/xmlreader.py&#8221;, line 123, in parse<br />
  File &#8220;/tmp/python.6884/usr/lib/python2.5/xml/sax/expatreader.py&#8221;, line 207, in feed<br />
  File &#8220;/tmp/python.6884/usr/lib/python2.5/xml/sax/expatreader.py&#8221;, line 304, in end_element<br />
  File &#8220;./xmldump2files.py&#8221;, line 79, in endElement<br />
    writeArticle(self.root, self.title, self.text)<br />
  File &#8220;./xmldump2files.py&#8221;, line 41, in writeArticle<br />
    out = open(filename, &#8220;w&#8221;)<br />
IOError: [Errno 2] No such file or directory: &#8216;../enwiki-20100130-out/90/f9/Con.txt&#8217;</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Generating a Plain Text Corpus from Wikipedia by rsmudge</title>
		<link>http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/#comment-1379</link>
		<dc:creator><![CDATA[rsmudge]]></dc:creator>
		<pubDate>Wed, 28 Jul 2010 14:29:07 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=324#comment-1379</guid>
		<description><![CDATA[And so I did. Thanks for pointing this out. I&#039;ve corrected the post.]]></description>
		<content:encoded><![CDATA[<p>And so I did. Thanks for pointing this out. I&#8217;ve corrected the post.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Generating a Plain Text Corpus from Wikipedia by yhj</title>
		<link>http://blog.afterthedeadline.com/2009/12/04/generating-a-plain-text-corpus-from-wikipedia/#comment-1377</link>
		<dc:creator><![CDATA[yhj]]></dc:creator>
		<pubDate>Wed, 28 Jul 2010 07:34:53 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=324#comment-1377</guid>
		<description><![CDATA[Thanks for your post; it is very useful to me. There is just a small issue in the last step: I think you have missed a sleep.jar after &quot;-jar&quot; : )]]></description>
		<content:encoded><![CDATA[<p>Thanks for your post; it is very useful to me. There is just a small issue in the last step: I think you have missed a sleep.jar after &#8220;-jar&#8221; : )</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on After the Deadline Bigram Corpus &#8211; Our Gift to You&#8230; by tszming</title>
		<link>http://blog.afterthedeadline.com/2010/07/20/after-the-deadline-bigram-corpus-our-gift-to-you/#comment-1373</link>
		<dc:creator><![CDATA[tszming]]></dc:creator>
		<pubDate>Tue, 27 Jul 2010 01:41:24 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=797#comment-1373</guid>
		<description><![CDATA[I agree that more isn&#039;t better, so I am currently doing research on mining those sets from Wikipedia&#039;s edit history.]]></description>
		<content:encoded><![CDATA[<p>I agree that more isn&#8217;t better, so I am currently doing research on mining those sets from Wikipedia&#8217;s edit history.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on After the Deadline Bigram Corpus &#8211; Our Gift to You&#8230; by rsmudge</title>
		<link>http://blog.afterthedeadline.com/2010/07/20/after-the-deadline-bigram-corpus-our-gift-to-you/#comment-1372</link>
		<dc:creator><![CDATA[rsmudge]]></dc:creator>
		<pubDate>Tue, 27 Jul 2010 00:08:36 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=797#comment-1372</guid>
		<description><![CDATA[I maintain the set by hand. There are other sets (for example, http://www.dcs.bbk.ac.uk/~jenny/resources.html) that contain more entries. More isn&#039;t always better, though. More entries (especially arbitrary ones that haven&#039;t been tested) may lead to more false positives.]]></description>
		<content:encoded><![CDATA[<p>I maintain the set by hand. There are other sets (for example, <a href="http://www.dcs.bbk.ac.uk/~jenny/resources.html" rel="nofollow">http://www.dcs.bbk.ac.uk/~jenny/resources.html</a>) that contain more entries. More isn&#8217;t always better, though. More entries (especially arbitrary ones that haven&#8217;t been tested) may lead to more false positives.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on After the Deadline Bigram Corpus &#8211; Our Gift to You&#8230; by tszming</title>
		<link>http://blog.afterthedeadline.com/2010/07/20/after-the-deadline-bigram-corpus-our-gift-to-you/#comment-1370</link>
		<dc:creator><![CDATA[tszming]]></dc:creator>
		<pubDate>Mon, 26 Jul 2010 09:39:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=797#comment-1370</guid>
		<description><![CDATA[How do you generate the &quot;confusion set&quot; and keep it updated?

To me, this doesn&#039;t seem to be a big set; e.g., bar and bra do not appear in the file.]]></description>
		<content:encoded><![CDATA[<p>How do you generate the &#8220;confusion set&#8221; and keep it updated?</p>
<p>To me, this doesn&#8217;t seem to be a big set; e.g., bar and bra do not appear in the file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on After the Deadline Bigram Corpus &#8211; Our Gift to You&#8230; by rsmudge</title>
		<link>http://blog.afterthedeadline.com/2010/07/20/after-the-deadline-bigram-corpus-our-gift-to-you/#comment-1361</link>
		<dc:creator><![CDATA[rsmudge]]></dc:creator>
		<pubDate>Wed, 21 Jul 2010 03:33:29 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=797#comment-1361</guid>
		<description><![CDATA[Thanks for the comment. It&#039;s always good to read something constructive from a member of the community. Yes, I should not make mistakes. I&#039;m trying to make software that will help writers break their bad habits. One of those bad habits, unfortunately, is forgetting to run the tool.]]></description>
		<content:encoded><![CDATA[<p>Thanks for the comment. It&#8217;s always good to read something constructive from a member of the community. Yes, I should not make mistakes. I&#8217;m trying to make software that will help writers break their bad habits. One of those bad habits, unfortunately, is forgetting to run the tool.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on After the Deadline Bigram Corpus &#8211; Our Gift to You&#8230; by Dick Jenkin</title>
		<link>http://blog.afterthedeadline.com/2010/07/20/after-the-deadline-bigram-corpus-our-gift-to-you/#comment-1360</link>
		<dc:creator><![CDATA[Dick Jenkin]]></dc:creator>
		<pubDate>Wed, 21 Jul 2010 00:00:40 +0000</pubDate>
		<guid isPermaLink="false">http://blog.afterthedeadline.com/?p=797#comment-1360</guid>
		<description><![CDATA[There&#039;s only one trouble with doing your sort of work - you cannot afford to make any spelling or grammar mistakes yourselves!  Have a look at the &quot;it&#039;s&quot; in the last line of this paragraph!

bigrams.txt.gz
This file is a dump of each two-word sequence that occurs in our corpus. It has 5,612,483 word pairs associated with a count. You can use this information to calculate the probability of a word given it&#039;s next or previous words. 

HOWEVER - I see that it has been fixed already (apostrophe removed)!!  Full credit for that!
  Thanks - Dick Jenkin.]]></description>
		<content:encoded><![CDATA[<p>There&#8217;s only one trouble with doing your sort of work &#8211; you cannot afford to make any spelling or grammar mistakes yourselves!  Have a look at the &#8220;it&#8217;s&#8221; in the last line of this paragraph!</p>
<p>bigrams.txt.gz<br />
This file is a dump of each two-word sequence that occurs in our corpus. It has 5,612,483 word pairs associated with a count. You can use this information to calculate the probability of a word given it&#8217;s next or previous words. </p>
<p>HOWEVER &#8211; I see that it has been fixed already (apostrophe removed)!!  Full credit for that!<br />
  Thanks &#8211; Dick Jenkin.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

