Tags · coryhacking/goose

1.4.1

Resolving goofy Maven issue; it required a new version to fully update.

Jun 14, 2011
1f222a2
zip
tar.gz

1.4.0

Major: DefaultOutputFormatter#getFormattedText now unescapes HTML including all HTML Entities

Minor: I have begun to convert the usage of DefaultOutputFormatter so that you only use a single method: getFormattedText(Element topNode)
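
To illustrate what "unescapes HTML including all HTML Entities" means for the returned text, here is a minimal standalone sketch. It is not goose's implementation: the class name EntityUnescapeSketch, the unescape method, and the tiny entity table are all hypothetical, and a real formatter would cover the full entity list rather than the five names shown.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityUnescapeSketch {
    // Hypothetical, deliberately tiny table; the real HTML entity list is far longer.
    private static final Map<String, String> NAMED = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "quot", "\"", "nbsp", "\u00A0");

    // Matches decimal (&#169;), hexadecimal (&#xA9;) and named (&amp;) references.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[0-9]+|#[xX][0-9a-fA-F]+|[a-zA-Z][a-zA-Z0-9]*);");

    static String unescape(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            try {
                if (body.startsWith("#x") || body.startsWith("#X")) {
                    int codePoint = Integer.parseInt(body.substring(2), 16);
                    replacement = new String(Character.toChars(codePoint));
                } else if (body.startsWith("#")) {
                    int codePoint = Integer.parseInt(body.substring(1));
                    replacement = new String(Character.toChars(codePoint));
                } else {
                    // Unknown names are left exactly as they appeared.
                    replacement = NAMED.getOrDefault(body, m.group());
                }
            } catch (IllegalArgumentException e) {
                // Out-of-range or invalid code point: keep the original text.
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("Tom &amp; Jerry &lt;3 &#169; 2011"));
    }
}
```

The sketch makes a single pass over the input, so already-escaped text such as "&amp;lt;" comes out as "&lt;" rather than being unescaped twice.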

Bug fixes:
  * Cleaning by class name was too aggressive and removed actual content elements. The list of names now only removes classes
    that end in "meta" instead of any class that just contains the word "meta"

  * Modified DefaultDocumentCleaner#cleanBadTags to only select from within the body element to avoid removing it.

  * Added a helper method for removing nodes to handle cases where the node's parentNode is null (already removed). This was previously
    throwing an IllegalArgumentException from within jSoup and thus failing the extraction.
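
The difference between the two class-name rules in the first bug fix can be sketched as follows. This is an illustration only, not the cleaner's actual code: the class MetaClassMatchSketch, both method names, and the sample class names are hypothetical, and the sketch assumes a single class token compared case-sensitively.

```java
class MetaClassMatchSketch {
    // Looser rule described above: any class name containing "meta" is removed.
    static boolean removedByContains(String className) {
        return className.contains("meta");
    }

    // Narrower rule: only class names that end in "meta" are removed.
    static boolean removedByEndsWith(String className) {
        return className.endsWith("meta");
    }

    public static void main(String[] args) {
        // Sample class names are made up for the illustration.
        for (String name : new String[] {"post-meta", "metadata-body", "article"}) {
            System.out.println(name
                    + " contains=" + removedByContains(name)
                    + " endsWith=" + removedByEndsWith(name));
        }
    }
}
```

Under the looser rule a content wrapper with a class such as "metadata-body" would be dropped along with genuine metadata blocks; the narrower rule keeps it.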

Jun 13, 2011
765927a
zip
tar.gz

1.3.14

Version 1.3.14

Jun 9, 2011
0085a9c
zip
tar.gz

1.3.13

Upping to version 1.3.13, which contains a minor fix to tag extraction

May 20, 2011
b2df435
zip
tar.gz

1.3.12

Adding tag 1.3.12

May 20, 2011
804d434
zip
tar.gz

1.3.11

Including the ability to define custom extractors as well as regex cleanups

May 20, 2011
9ac4e9d
zip
tar.gz
