Wednesday, March 25, 2009

On XML Parsers.

Ruby has recently matured in the world of XML processing, with libxml-ruby finally hitting 1.x. Of course, we've already got quite a few parsers, including Nokogiri, which is also built on libxml2. Then there are the old standbys, Hpricot and REXML, the latter of which ships with Ruby.

That's awesome news, right? Of course it is. The benchmarks for libxml-ruby and Nokogiri on XML documents are astounding, and Hpricot will always hold a place in my heart for HTML documents, as it's especially good at coping with sloppy HTML/CSS markup.

However, there's one concern that's bugged me: gems that hard-code a dependency on just one of these parsers.

First of all, each one changes all the time, not all of them are compatible with Ruby 1.9.x, and any one of them may end up unsupported at some point in the future. Plus, some Ruby hosts (particularly the inexpensive ones) don't let you install C-based gems willy-nilly: you have to go through the sysadmin, who checks that the code is safe - it's a big mess.

So why not support all of them? That's what I suggest.

In a soon-to-be-released little toy project of mine, a wrapper for the fmylife.com API, I needed to parse XML. So I wrote a tiny module called CanParse:

  module CanParse
    # Each helper dispatches on FMyLife.parser
    # (:nokogiri, :hpricot, :rexml, or :libxml).

    # Parse a raw XML string into a document object.
    def xml_doc(body)
      case FMyLife.parser
      when :nokogiri
        Nokogiri::XML(body)
      when :hpricot
        Hpricot(body)
      when :rexml
        REXML::Document.new(body)
      when :libxml
        LibXML::XML::Parser.string(body).parse
      end
    end

    # Evaluate an XPath expression against a document or element.
    def xpath(element, path)
      case FMyLife.parser
      when :nokogiri
        element.xpath(path)
      when :hpricot
        element / path
      when :rexml
        REXML::XPath.match(element, path)
      when :libxml
        element.find(path)
      end
    end

    # Get the text content of a node.
    def xml_content(element)
      case FMyLife.parser
      when :nokogiri
        element.content
      when :hpricot
        element.inner_text
      when :rexml
        element.text
      when :libxml
        element.content
      end
    end

    # Get the value of a node's attribute.
    def xml_attribute(element, attribute)
      case FMyLife.parser
      when :nokogiri
        element[attribute]
      when :hpricot
        element.get_attribute(attribute)
      when :rexml
        element.attributes[attribute]
      when :libxml
        element.attributes[attribute]
      end
    end
  end
I include that module in my classes that require parsing, and bam - I can use any of the big four XML parsers. If you need different functionality, just add a new method: pull up the RDocs of the respective parsers, and it takes about ten minutes per parser. And you don't need to change any existing code, anywhere.
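
As a quick sketch of how it gets used (the Story class and the '//text' path here are made up for illustration; only CanParse itself comes from the project):

  class Story
    include CanParse

    attr_reader :text

    # Hypothetical consumer: grab the first <text> node out of an API response.
    def initialize(xml_body)
      doc   = xml_doc(xml_body)
      node  = xpath(doc, '//text').first
      @text = node ? xml_content(node) : nil
    end
  end

Because every parser-specific call goes through CanParse, switching parsers is just a matter of setting FMyLife.parser.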

Note: the funny gsub I use for Hpricot's XPath is there because Hpricot assumes the path refers to a regular HTML tag, and that can cause a little fruitiness with FMyLife's XML documents. Feel free to tweak it as necessary.

blogger version 0.5.1 has been released!

The Blogger module provides services related to Blogger, and only Blogger. The GData gem is great, but it provides a much lower-level interface to Google's Blogger API. With the Blogger gem, you have full access to the Blogger API through easy-to-use classes, and it integrates with 6 different markup/markdown gems! What's more, you won't have to muck around with XML.

Sure, XML is easy. But why waste time messing around with it? With just 3 or 4 lines of Blogger gem code, you'll be able to take a Markdown-formatted string and post it as a blog post, complete with categories and comments.

You can also search through all of your comments and old posts - pretty much anything you can do at the blogger.com website, you can do with this gem.
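
To give a feel for those 3 or 4 lines, here's a rough sketch only - Blogger::Account and #post appear in the changelog below, but the constructor arguments and the Blogger::Post options shown here are my assumptions, so check the gem's docs for the real signatures:

  require 'blogger'

  # Assumed argument names; see the README for the actual constructor.
  account = Blogger::Account.new('you@gmail.com', 'password', 'blog-id')

  # Assumed Blogger::Post interface: a Markdown-formatted body plus categories.
  post = Blogger::Post.new(:title      => 'Hello from Ruby',
                           :content    => "It's **that** easy.",
                           :formatter  => :markdown,
                           :categories => ['ruby'])

  account.post(post)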

Changes:

0.5.1 / 2009-03-25

  • Fixed a bug with Blogger::Account#post when userid is not provided
  • Fixed typo in the 3-parameter constructor of Blogger::Account

profanalyzer version 0.2.1 has been released!

Profanalyzer has one purpose: analyze a block of text for profanity. It is able to filter profane words as well.

What sets it slightly apart from other filters is that it classifies each blocked word as "profane", "racist", or "sexual" - although right now, each word is considered "profane". It also rates each word on a scale from 0-5, which is based on my subjective opinion, as well as whether the word is commonly used in non-profane situations, such as "ass" in "assess".

The Profanalyzer will default to a tolerance of 2, which will kick back the arguably non-profane words. It will also test against all words, including racist or sexual words.

Lastly, it allows for custom substitutions! For example, the filter at the website http://www.fark.com/ turns the word "fuck" into "fark", and "shit" into "shiat". You can specify these if you want.
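
A rough sketch of the calls described above (filter and profane? are named in the 0.2.0 notes further down; the tolerance and substitution setters are my assumptions about the configuration API, so verify against the RDocs):

  require 'profanalyzer'

  Profanalyzer.profane? "that's a bunch of crap"  # => true or false, depending on tolerance
  Profanalyzer.filter   "that's a bunch of crap"  # => a copy with the profanity masked

  # Assumed setters: loosen the global tolerance (0 = strictest, 5 = most lenient)
  # and add a Fark-style substitution.
  Profanalyzer.tolerance = 4
  Profanalyzer.substitute("shit", "shiat")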

Changes:

0.2.1 / 2009-03-25

  • Fixed some wordlist errors.

ipgeolocation version 0.1.0 has been released!



Remote, IP-Based Geolocation for everyone!

Seriously, this gem gives you 3 different IP-location services as one-liners. Example:

IPGeolocation.locate("12.34.56.78", :blogama)

That's all there is to it! And you get a bunch of information, especially from blogama's API.
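
Since three services are supported (blogama.org, netgeo, and iphost, per the changelog below), you could compare them with something like this - the :netgeo and :iphost symbols and the shape of the return value are my assumptions; only the :blogama call above comes from the announcement:

  require 'ipgeolocation'

  # Ask each service about the same address and dump whatever comes back.
  [:blogama, :netgeo, :iphost].each do |service|
    result = IPGeolocation.locate("12.34.56.78", service)
    puts "#{service}: #{result.inspect}"
  end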

Changes:

### 0.1.0 / 2009-03-24

* First release

* Supports 3 remote IP Geolocation services: blogama.org, netgeo, and iphost

blogger version 0.5.0 has been released!



The Blogger module provides services related to Blogger, and only Blogger. The
GData gem is great, but it provides a much lower-level interface to Google's
Blogger API. With the Blogger gem, you have full access to the Blogger API
through easy-to-use classes, and it integrates with 6 different markup/markdown
gems! What's more, you won't have to muck around with XML.

Sure, XML is easy. But why waste time messing around with it? With just 3 or 4
lines of Blogger gem code, you'll be able to take a Markdown-formatted string
and post it as a blog post, complete with categories and comments.

You can also search through all of your comments and old posts - pretty much
anything you can do at the blogger.com website, you can do with this gem.

Changes:

### 0.5.0 / 2009-03-24

* First release

* Full access to the Blogger API
* 6 markup/markdown gems supported
* punch + pie

Monday, March 23, 2009

validates_not_profane 0.1 released!

This was an internal project, and now it's public. Hooray! From the readme:

validates_not_profane provides a hook into the Profanalyzer gem as a validation for your ActiveRecord models. Its use is simple:

validates_not_profane :column_name, :column_2

This will cause the model to produce errors if either column's value contains profanity. Of course, since the Profanalyzer gem is customizable, so is the validation:

validates_not_profane :post_title, :racist => true, :sexual => false

will block racial slurs, but nothing else.

validates_not_profane :post_title, :tolerance => 5

will block only the most vile of profanity. The scale for tolerance is from 0-5, which is mostly subjective, in all honesty.

You can also use the :label option to customize the error message - see the examples in the README.
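
Dropped into an ActiveRecord model, it looks something like this (the Comment class and its body column are made up for the example):

  class Comment < ActiveRecord::Base
    # Reject bodies containing anything but the mildest profanity,
    # using the :tolerance option shown above.
    validates_not_profane :body, :tolerance => 2
  end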

profanalyzer version 0.2.0 has been released!



Profanalyzer has one purpose: analyze a block of text for profanity. It is able to filter profane words as well.

What sets it slightly apart from other filters is that it classifies each blocked word as "profane", "racist", or "sexual" - although right now, each word is considered "profane". It also rates each word on a scale from 0-5, which is based on my subjective opinion, as well as whether the word is commonly used in non-profane situations, such as "ass" in "assess".

The Profanalyzer will default to a tolerance of 2, which will kick back the arguably non-profane words. It will also test against all words, including racist or sexual words.

Lastly, it allows for custom substitutions! For example, the filter at the website http://www.fark.com/ turns the word "fuck" into "fark", and "shit" into "shiat". You can specify these if you want.

Changes:

### 0.2.0 / 2009-03-23

* Added an options hash to Profanalyzer#filter and Profanalyzer#profane?, letting you change settings but only within the scope of that call - using this hash won't change the global settings.
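
In practice the per-call hash would look something like this (the exact option keys are my assumptions, modeled on the settings described above):

  # Override settings for a single call; the global configuration is untouched.
  Profanalyzer.profane?("some user input", :tolerance => 4)
  Profanalyzer.filter("some user input", :racist => false, :sexual => true)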