Archive for the ‘popular’ Category

Rails + Tidy + REXML

Thursday, March 1st, 2007

It wasn’t totally straightforward to get Tidy, REXML and Rails to play together, so I thought I would write down what I did and how, to save time for others.

The reason for doing this is that I get text in (X)HTML format through RSS feeds and I want to make excerpts of it: given a long text as input, I want a short extract as output.

After a bit of thinking and googling I figured out that slicing an HTML document after a given number of characters is not trivial. Because of the tags, you need to actually parse the HTML document and keep track of which tags you need to close when you reach the given number of characters. Luckily for us, Mike Burns has already written a function for Truncating HTML in Ruby. Perfect!

However, after adding that piece of code (and unit tests for it, of course) you will find that REXML barfs if the input is not well-formed. Naturally, with no control over the content of the RSS feeds, there is no way you can guarantee that it is.
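To see the failure for yourself, a minimal sketch along these lines shows REXML rejecting markup that a browser would happily render. The sample strings and the helper name `well_formed?` are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper: returns true if REXML can parse the fragment.
def well_formed?(fragment)
  # Wrap the fragment in a root element so that several top-level
  # nodes still form a single XML document.
  REXML::Document.new("<root>#{fragment}</root>")
  true
rescue REXML::ParseException
  false
end

puts well_formed?('<p>Hello <b>world</b></p>')  # every tag is closed
puts well_formed?('<p>Hello <b>world</p>')      # <b> is never closed
```

The second fragment is the kind of thing that turns up in feeds all the time, which is why the input needs cleaning before REXML sees it.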

Luckily Tidy comes to the rescue. Tidy is a library that corrects invalid HTML. Install the tidy library and then the tidy ruby gem.

gem install tidy

Unfortunately you have to set the path to the library manually before you can use it:

Tidy.path = '/usr/lib/tidylib.so'

If you, like me, use an Apple laptop for development and Linux on the server, that path is going to differ between the environments. So what I did was to introduce a constant in the Rails environment files. In the config/environments/production.rb file I put:

TIDY_LIB_PATH = '/usr/lib/libtidy.so'

And naturally I set it to the correct path for my PowerBook in the config/environments/development.rb file. Then I just do

Tidy.path = TIDY_LIB_PATH

before using Tidy and everything is good.
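If you would rather not maintain one constant per environment, an alternative (not what I did above) is to probe a list of likely locations and take the first one that exists. This is only a sketch: the helper name `find_tidy_lib` and the candidate list are assumptions, and the Mac path in particular should be checked against your own machine.

```ruby
# Candidate locations are assumptions; adjust them to your systems.
TIDY_LIB_CANDIDATES = [
  '/usr/lib/libtidy.so',    # the Linux path used in production.rb above
  '/usr/lib/libtidy.dylib'  # assumed location on Mac OS X
].freeze

# Hypothetical helper: returns the first candidate that exists on disk,
# or nil if none of them does.
def find_tidy_lib(candidates = TIDY_LIB_CANDIDATES)
  candidates.find { |path| File.exist?(path) }
end
```

You would then write `Tidy.path = find_tidy_lib` instead of referring to the constant, and handle the `nil` case however suits your application.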

To make Tidy behave decently you need to set the following options:

  • tidy.options.show_body_only = true – don’t output body and html tags
  • tidy.options.output_xhtml = true – output xhtml
  • tidy.options.wrap = 0 – don’t write newlines all over the place
  • tidy.options.char_encoding = 'utf8' – use utf8 to play nice with Rails

In the end, this is what I ended up with:

require 'rexml/parsers/pullparser'
require 'tidy'

# `content` is presumably the attribute holding the feed item's HTML.
def make_excerpt
  slice(tidy_up_html(content), 2000)
end

# Runs the HTML through Tidy so that REXML gets well-formed XHTML.
def tidy_up_html(html)
  Tidy.path = TIDY_LIB_PATH

  Tidy.open do |tidy|
    tidy.options.show_body_only = true
    tidy.options.output_xhtml = true
    tidy.options.wrap = 0
    tidy.options.char_encoding = 'utf8'
    tidy.clean(html)
  end
end

# Cuts the XHTML after roughly `length` characters of text and closes
# any tags that are still open. attrs_to_s is a helper not shown here.
def slice(string, length, ellipsis = '...')
  p = REXML::Parsers::PullParser.new(string)
  tags = []
  new_len = length
  results = ''
  while p.has_next? && new_len > 0
    p_e = p.pull
    case p_e.event_type
    when :start_element
      tags.push p_e[0]
      results << "<#{tags.last} #{attrs_to_s(p_e[1])}>"
    when :end_element
      results << "</#{tags.pop}>"
    when :text
      # String#first comes from ActiveSupport (Rails)
      results << p_e[0].first(new_len)
      current_len = new_len
      new_len -= p_e[0].length
      if new_len < 0
        # find next dot
        i = p_e[0].index('.', current_len)
        results << p_e[0].slice(current_len, i - current_len) if i
        results << p_e[0].slice(current_len, p_e[0].length) unless i
        results << ellipsis
      end
    else
      results << "<!-- #{p_e.inspect} -->"
    end
  end
  # close whatever is still open, innermost tag first
  tags.reverse.each do |tag|
    results << "</#{tag}>"
  end
  results
end
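The slice method calls an attrs_to_s helper that does not appear in the listing. A minimal sketch, assuming the pull parser hands over the attributes as a hash of names to values, could look like this. It is my guess at a suitable helper, not necessarily the one in Mike Burns’ original.

```ruby
# Hypothetical helper: turns a hash of attribute names and values into
# the text that goes inside a start tag. Values are inserted as-is,
# without any extra escaping. Returns '' when there are no attributes.
def attrs_to_s(attrs)
  attrs.map { |name, value| %(#{name}="#{value}") }.join(' ')
end

puts attrs_to_s('href' => '/posts/1', 'class' => 'more')
```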

I modified Mike Burns’ method so that after the given number of characters has been reached it will still include text until the next ‘.’ character. I figured it’s much nicer with an excerpt that ends with a complete sentence.
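Stripped of the HTML handling, the idea looks roughly like the sketch below. The method name `truncate_at_sentence` is made up for this illustration. It works on plain text only and, like the code above, stops just before the ‘.’ and then appends the ellipsis.

```ruby
# Hypothetical plain-text version of the "run on to the next dot" rule.
# Text no longer than `length` is returned unchanged. Otherwise the cut
# is moved forward to the next '.' at or after `length`, or to the end
# of the text if there is no such dot.
def truncate_at_sentence(text, length, ellipsis = '...')
  return text if text.length <= length

  dot = text.index('.', length)
  cut = dot || text.length
  text[0, cut] + ellipsis
end

puts truncate_at_sentence('First sentence. Second sentence. Third.', 20)
```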

Feel free to use this code if you want.

Make your Sony Ericsson K800 look like an iPhone

Monday, February 5th, 2007

Last week I mentioned the iPhone theme for the Motorola RAZR. Now there is an iPhone theme for K800i phones as well.

I’m still waiting for a theme for my Nokia N70…

Update: The above link seems to be broken. Here is another working link.

via macfeber.

Could not find rails (> 0) in the repository

Tuesday, October 24th, 2006

If you get this when trying to install Rails with RubyGems, you apparently need to remove your source cache.

Not totally obvious.

Update: As you can see from the comments, re-running the command should solve it for most people.

Chris Sharma’s arch project

Tuesday, September 26th, 2006

Via joost.climbing.nl we get the awesome videos below of Chris Sharma working on his deep water solo project in Mallorca. Watch them!

Speed camera warning device

Monday, July 3rd, 2006

After reading an article in Aftonbladet about a small device that warns you of speed cameras, I ordered an ActiveGPS from Ad-Teknik AB a month or so ago.

Last Friday it finally arrived in the mail, and after using it yesterday when we went climbing I have to admit that it works exactly as well as I hoped it would. You see exactly how fast you are driving (the speedometer in our Audi 80 read about 5 km/h too high at 100 km/h), you can see how high above sea level you are (always very useful!) and, most important of all, it beeps when you approach a camera if you happen to be driving too fast.

Now all that remains is to mount it a bit more neatly. I’m thinking of trying to build it into the ashtray compartment, but that may be easier said than done.

A MacBook review that is not entirely positive

Friday, June 9th, 2006

MacPro.se has tested the new MacBook and is not entirely satisfied. It is interesting to read something by someone who is not completely dazzled by Apple’s shine and dares to be critical. But I still want a MacBook.

My trusty old iBook G3 was nearly perfect, except that 1024×768 is a bit on the small side for working efficiently, so I think the MacBook’s 13.3″ display is a step in the right direction.

REXML Tutorial

Thursday, February 16th, 2006

REXML is a superb way to parse XML in Ruby; however, it can be a bit hard to grok at first. This tutorial might make it easier to understand how it works.

Kim Hartman quotes

Friday, April 29th, 2005

If you watch snooker on Swedish Eurosport, you simply have to visit this page, which has collected a large number of absolutely wonderful comments from our beloved commentator Kim Hartman.

I laughed so incredibly much the first time I read through them.