To a newbie who follows examples, has not pored over the docs, and is not a Ruby expert (me), Hpricot can hold some surprises.

I spent a lot of time figuring this out.

check = trrow.search("//td[@width='30']//img[@alt='Winner']")

I needed to check whether the HTML row contains this image.

On some rows check is blank; on others it contains the matching HTML, as expected.

However, the test if check != "" then always evaluates to true.

I looked everywhere else before I tracked this down. I could find no way to tell a blank check apart from one that contained the td.

For the blank check, print " #{check}" printed nothing at all.

In the end I had to do this, which I don’t like: if "#{check}" != "" then. It reminds me of Unix shell scripting.
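A likely explanation, as far as I can tell: search returns an Hpricot::Elements collection, which subclasses Array rather than String. An Array never equals a String, so check != "" is always true, while string interpolation calls to_s and yields the matched HTML (or an empty string). Since Hpricot may not be installed, the sketch below uses a hypothetical FakeElements class as a stand-in for Hpricot::Elements; the assumption is that the real class behaves the same way here, so check.empty? should be the cleaner test.

```ruby
# FakeElements is a hypothetical stand-in for Hpricot::Elements, which
# (like this class) subclasses Array and renders its matches via to_s.
class FakeElements < Array
  def to_s
    map(&:to_s).join
  end
end

# Simulate a row without the Winner image and a row with it.
no_match = FakeElements.new
match    = FakeElements.new(['<img alt="Winner" />'])

# An Array is never equal to a String, so both of these print true.
puts no_match != ""
puts match != ""

# Interpolation calls to_s, which is why "#{check}" != "" happened to work.
puts "#{no_match}" != ""   # prints false
puts "#{match}" != ""      # prints true

# The clearer test: ask the collection whether it found anything.
puts no_match.empty?       # prints true
puts match.empty?          # prints false
```

With Hpricot itself, the same idea would read if check.empty? (or unless check.empty?) instead of comparing against "".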

I also had trouble cleanly separating the text inside nested HTML such as this (view the page source and search for Rafael):

<td width="268" align="left" valign="middle">&nbsp; &nbsp; <a href="/en_US/bios/overview/atpn409.html" class="alt2"><b>Rafael Nadal</b></a> &nbsp; ESP&nbsp; (1)</td>

Calling inner_text on the whole td gives me both Rafael Nadal and ESP, with “?” characters mixed in.

Calling inner_text on the a element gives me just the name, but no clean way to extract ESP on its own.

Lots of “??” sequences show up in the text, so in some cases I simply parsed the inner_text and split on the “?” characters.
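The “?” characters are most likely the non-breaking spaces from the &nbsp; entities, printed as “?” by a terminal that cannot display them; that is an assumption, not something I verified. Instead of splitting on “?”, a sketch like the one below replaces U+00A0 with a plain space, removes the name taken from the a element, and splits what is left. It assumes the text is a UTF-8 string; clean_text and country_and_seed are hypothetical helpers, and the input is hand-typed to mimic what inner_text might return for the td above.

```ruby
# Assumption: each &nbsp; ends up as U+00A0 (non-breaking space) in a
# UTF-8 string; a terminal that cannot show it may print "?" instead.
NBSP = "\u00A0"

# Hypothetical helper: turn non-breaking spaces into plain spaces,
# collapse runs of spaces, and trim both ends.
def clean_text(text)
  text.tr(NBSP, " ").squeeze(" ").strip
end

# Hypothetical helper: given the td's inner_text and the player's name
# (e.g. from the a element's inner_text), return [country, seed].
# The seed is nil for unseeded players.
def country_and_seed(cell_text, name)
  rest = clean_text(cell_text.sub(name, ""))
  country, seed = rest.split(" ", 2)
  [country, seed]
end

# Hand-typed stand-in for the td's inner_text.
cell = "#{NBSP} #{NBSP} Rafael Nadal #{NBSP} ESP#{NBSP} (1)"
p country_and_seed(cell, "Rafael Nadal")   # prints ["ESP", "(1)"]
```

Note that String#strip alone would not help here, because it only removes ASCII whitespace, not U+00A0; that is why the tr comes first.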

Eventually I got the program running. It depends heavily on the page's HTML; the slightest change will break it. Still, I was able to turn a format that is hard to scan visually into an easy one.

My output comes out like this:

Rafael Nadal ESP (1) def. Ryler DeHeart USA 6-1 6-2 6-4
James Blake USA (9) def. Steve Darcis BEL 4-6 6-3 1-0 (Retired)
Mardy Fish USA def. Paul-Henri Mathieu FRA (24) 6-2 3-6 6-3 6-4
Gael Monfils FRA (32) def. Evgeny Korolev RUS 6-2 6-3 3-6 6-4
Stanislas Wawrinka SUI (10) def. Wayne Odesnik USA 6-4 7-6 (8-6) 6-2

The original page is here; see how different it is. I always put the winner on the left. The program, tennsc.rb, is here. Sample usage:

./tennsc.rb http://www.usopen.org/en_US/scores/cmatch/10ms.html
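Putting the winner on the left comes down to swapping the two players whenever the Winner image sits on the second row, then printing one line per match. The sketch below assumes each player has already been parsed into name, country and seed, and that the score string is already written from the winner's point of view; Player, format_player and format_match are hypothetical names, not taken from tennsc.rb.

```ruby
# Hypothetical record for one parsed player cell.
Player = Struct.new(:name, :country, :seed)

# Render "Name CTY (seed)", leaving out the seed for unseeded players.
def format_player(player)
  parts = [player.name, player.country]
  parts << "(#{player.seed})" if player.seed
  parts.join(" ")
end

# first_won would come from the Winner-image check on the first row.
# Assumption: score is already written from the winner's point of view.
def format_match(first, second, first_won, score)
  winner, loser = first_won ? [first, second] : [second, first]
  "#{format_player(winner)} def. #{format_player(loser)} #{score}"
end

fish    = Player.new("Mardy Fish", "USA", nil)
mathieu = Player.new("Paul-Henri Mathieu", "FRA", 24)

# The Winner image was on the second row here, so the players get swapped.
puts format_match(mathieu, fish, false, "6-2 3-6 6-3 6-4")
# prints: Mardy Fish USA def. Paul-Henri Mathieu FRA (24) 6-2 3-6 6-3 6-4
```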

—

Tennis scores in an easy-to-read format:

http://sports.yahoo.com/ten/matches