Aspell

From: Mikee 4 Aug 2011 22:17
To: ALL 1 of 3
I could do with a bit of help with Aspell...

Background (skip unless bored):

A big client of ours has been #1 on a 'website checking' system for a very long time. Recently the site has dropped a few places, because it has grown (30k+ URLs) and we're now doing go-lives almost daily.

The system checks for things like:

HTML validation, HTML headings not in the correct order, CSS referencing images that don't exist (even if the CSS classes aren't being used!), links with the same text pointing to different URLs, PDF accessibility validation, response times, etc.
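As an illustration of one of those checks, here is a minimal Node.js sketch of the "same link text, different URLs" test. It assumes the spider has already extracted each link as a hypothetical `{ text, href }` object with hrefs resolved to full URLs, and the whitespace/case normalisation is a guess, not the external checker's actual rule.

```javascript
// Sketch: report link texts that point at more than one distinct URL.
// Input shape ({ text, href }) and normalisation rules are assumptions.
function findAmbiguousLinkText(links) {
  const byText = new Map(); // normalised link text -> Set of hrefs

  for (const { text, href } of links) {
    // Collapse runs of whitespace and ignore case, so "Read more" and
    // " read  more" are treated as the same link text.
    const key = text.replace(/\s+/g, " ").trim().toLowerCase();
    if (key === "") continue; // skip empty (e.g. image-only) links
    if (!byText.has(key)) byText.set(key, new Set());
    byText.get(key).add(href);
  }

  // Keep only the texts that map to two or more different URLs.
  const conflicts = {};
  for (const [key, hrefs] of byText) {
    if (hrefs.size > 1) conflicts[key] = [...hrefs].sort();
  }
  return conflicts;
}
```

Run across every page of a crawl, the returned object lists each offending link text with the URLs it points to, which is enough to locate the pages to fix.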

This particular website not only has daily code/database deployments, it also has a large timing system built in, which automatically changes many parts of the site in the evenings and over the weekend. So it can be pretty hard to manage, considering it's constantly under development and it's a pretty large website.

You might think all is fine, since the site checking system is there to tell us about any problems, but both we and the client want to keep that #1 ranking because it's very useful.

So, to solve these problems, I built an internal replica of this 'site checking' system over a couple of weekends. It doesn't FULLY replicate it, but it checks for the most common problems that trip us up, so we can always run it quickly before a release to make sure everything is fine.

It's built on Node.js. It uses Phing to grab an up-to-date copy of the database and codebase, spiders the entire site checking for a range of problems, stores the results in a database and also streams them to the browser over WebSockets.

Lovely. Works real nice. But..

The question:

I now want to implement spell checking. The 'website checker' spellchecks the site and lets us add terms to our dictionary, for things like product names.

Ideally, I'd like to maintain the word list locally in my own scanning system, check pages against that dictionary, then upload the same dictionary to the external checker.
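One way to keep that local word list in a form aspell can use directly is aspell's plain-text personal word list: a header line `personal_ws-1.1 <lang> <count>` followed by one word per line. A minimal sketch, assuming the terms live in a JavaScript array and the language is English; `toAspellPersonalList` is a hypothetical helper name, and whether the external checker accepts this file as-is for upload is unknown.

```javascript
// Sketch: build the text of an aspell personal word list from local terms.
// The "en" default is an assumption; change it to match the dictionary used.
function toAspellPersonalList(words, lang = "en") {
  // aspell checks single words, so a name like "Widget Pro" is stored as
  // "Widget" and "Pro". Empty strings are dropped, duplicates removed, and
  // the list sorted so the file is stable between runs (easy to diff).
  const unique = [
    ...new Set(words.flatMap((w) => w.split(/\s+/)).filter(Boolean)),
  ].sort();
  const header = `personal_ws-1.1 ${lang} ${unique.length}`;
  return [header, ...unique].join("\n") + "\n";
}
```

The result can be written with `fs.writeFileSync()` and passed to aspell via `--personal=<file>`; using an absolute path is the safest choice, since aspell may resolve a bare file name against its own home directory.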

Looking at command line apps, it seems that "aspell" may be the thing to use. Something like this, even:

aspell --mode=html list < page.html

But when I call this from Node, I'll have the URL and the raw HTML output in memory, not a file on disk.

I'm thinking that I'll probably have to save the contents to a temporary HTML file before running the command, but can anyone think of a better way to do it? My linux skills are teh suck.
From: Matt 4 Aug 2011 23:29
To: Mikee 2 of 3
If you were using PHP and assuming aspell accepts input from stdin and can write output to stdout, you could use proc_open() to pipe the content to and from aspell without having to write to a temporary file.
From: Mikee 4 Aug 2011 23:50
To: Matt 3 of 3
Hrm, looks like I should be able to do something similar with Node.js. Thanks, Matt.
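For reference, a minimal Node.js sketch of the same idea as proc_open(): `child_process.spawn()` starts aspell, the page HTML is written to its stdin, and the misspelled words that `aspell list` prints are read back from stdout, so no temporary file is needed. It assumes a current Node.js (Promises, arrow functions) and an installed aspell with an English dictionary; `pipeThrough` and `misspelledWords` are hypothetical helper names.

```javascript
const { spawn } = require("child_process");

// Pipe a string into a command's stdin and collect its stdout.
// Resolves with stdout; rejects if the command can't start or exits non-zero.
function pipeThrough(command, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    // Read output while input is being written, so large pages can't deadlock.
    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.stderr.on("data", (chunk) => (stderr += chunk));
    child.on("error", reject); // e.g. the command is not installed
    child.stdin.on("error", reject); // e.g. EPIPE if the child exits early
    // "close" fires after the stdio streams are done, so stdout is complete.
    child.on("close", (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`${command} exited with ${code}: ${stderr}`));
    });
    // Write the whole page, then close stdin so the command sees EOF.
    child.stdin.end(input);
  });
}

// aspell wrapper: resolves with the unique misspelled words in an HTML page.
// The --lang default and the optional personal word list are assumptions.
function misspelledWords(html, { lang = "en", personal } = {}) {
  const args = ["--mode=html", `--lang=${lang}`, "list"];
  if (personal) args.unshift(`--personal=${personal}`);
  return pipeThrough("aspell", args, html).then((out) => [
    ...new Set(out.split("\n").filter(Boolean)),
  ]);
}
```

Usage would be along the lines of `misspelledWords(pageHtml).then((words) => ...)`, storing `words` against the page URL alongside the other check results.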