Skip to content

Strips HTML from the input, outputs plain text, streamed in realtime without preloading the whole document

License

Notifications You must be signed in to change notification settings

millerlogic/htmlstrip

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

htmlstrip

Strips HTML from the input, outputs plain text. It is streamed in realtime without preloading the whole document.

  • Easy to use Writer interface:
    io.Copy(&htmlstrip.Writer{W: os.Stdout}, os.Stdin)

  • All it does is strip HTML into plain text.

  • Should never use excessive memory as it does not buffer the whole document.

  • Script, style and head tags are removed entirely, as they are not part of the page's text.

  • The provided command strips HTML from standard input or specified files, writes plain text to standard output.
    go install github.com/millerlogic/htmlstrip/cmd/htmlstrip

  • Could be used as an extremely basic, non-interactive text browser:
    curl -s -S https://en.wikipedia.org/wiki/Chinchilla | htmlstrip | less

About

Strips HTML from the input, outputs plain text, streamed in realtime without preloading the whole document

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages