Added #html_body support and fixed how #body works
peterc committed Jun 20, 2010
1 parent 1b444f0 commit 8694bde
Showing 2 changed files with 9 additions and 4 deletions.
4 changes: 3 additions & 1 deletion README.markdown
@@ -26,7 +26,9 @@ There's also a shorter "convenience" method which might be handy in IRB - it doe

Pismo['http://www.rubyflow.com/items/4082'].title # => "Install Ruby as a non-root User"

-The current metadata methods are #title, #titles, #author, #authors, #lede, #keywords, #sentences(qty), #body, #feed, #feeds, #favicon, #description and #datetime. These are not fully documented here yet, you'll just need to try them out. The plural methods like #titles, #authors, and #feeds will return multiple matches in an array, if present. This is so you can use your own techniques to choose a "best" result in ambiguous cases.
+The current metadata methods are #title, #titles, #author, #authors, #lede, #keywords, #sentences(qty), #body, #html_body, #feed, #feeds, #favicon, #description and #datetime. These are not fully documented here yet, you'll just need to try them out. The plural methods like #titles, #authors, and #feeds will return multiple matches in an array, if present. This is so you can use your own techniques to choose a "best" result in ambiguous cases.
+
+#html_body and #body will be of particular interest. They return the "body" of the page as determined by Pismo's "Reader" (like Arc90's Readability or Safari Reader) algorithm. #body returns it as plain-text, #html_body maintains some basic HTML styling.

## CAUTIONS / WARNINGS:

9 changes: 6 additions & 3 deletions lib/pismo/internal_attributes.rb
@@ -189,7 +189,6 @@ def lede(all = false)
'.post-text p',
'#blogpost p',
'.story-teaser',
-'.subhead',
'//div[@class="entrytext"]//p[string-length()>10]', # Ruby Inside / Kubrick style
'section p',
'.entry .text p',
@@ -206,7 +205,6 @@ def lede(all = false)
'#article p',
'.post-body',
'.entry-content',
-'.body p',
'.document_description_short p', # Scribd
'.single-post p'
], all)
Expand Down Expand Up @@ -268,7 +266,12 @@ def reader_doc

# Returns body text as determined by Reader algorithm
def body
-@body ||= reader_doc.content.strip
+@body ||= reader_doc.content(true).strip
end
+
+# Returns body text as determined by Reader algorithm WITH basic HTML formatting intact
+def html_body
+@html_body ||= reader_doc.content.strip
+end

# Returns URL to the site's favicon
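
The split between `#body` and `#html_body` in this hunk can be tried out without Pismo installed. The sketch below is only an illustration: `StubReaderDoc` and `StubDocument` are hypothetical stand-ins, and the meaning of `content(true)` (plain text) versus `content` (HTML kept) is an assumption inferred from this diff, not taken from the Reader class itself. The tag stripping is a crude regex, not Pismo's algorithm.

```ruby
# Hypothetical stand-in for the Reader document; the real class is not
# shown in this diff, so its behaviour here is an assumption.
class StubReaderDoc
  attr_reader :calls

  def initialize(html)
    @html = html
    @calls = 0
  end

  # Assumed semantics: content(true) returns plain text,
  # content with no argument returns the HTML unchanged.
  def content(clean = false)
    @calls += 1
    clean ? @html.gsub(/<[^>]+>/, ' ').gsub(/\s+/, ' ') : @html
  end
end

# Hypothetical wrapper that mirrors the two accessors from this commit.
class StubDocument
  attr_reader :reader_doc

  def initialize(reader_doc)
    @reader_doc = reader_doc
  end

  # Plain-text body, memoized after the first call
  def body
    @body ||= reader_doc.content(true).strip
  end

  # Body with basic HTML kept, memoized separately
  def html_body
    @html_body ||= reader_doc.content.strip
  end
end

reader = StubReaderDoc.new(" <p>Hello <b>world</b></p> ")
doc = StubDocument.new(reader)
puts doc.body
puts doc.html_body
```

Because each accessor caches into its own instance variable, repeated calls do not hit the reader document again, and the plain-text and HTML results never overwrite each other.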