GSonofNun/ReadSharp: Extract meaningful website contents using a port of NReadability

ReadSharp was previously PocketSharp.Reader and is now hosted without the PocketSharp dependency.

Install ReadSharp using NuGet

Install-Package ReadSharp

What's it all about?

The library extracts the main content of a website and returns the article as HTML with its associated title, description, favicon and all included images.

The content can be wrapped in a <body> tag and displayed as a readable web page with custom CSS (it's up to you!).
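As a rough illustration of the wrapping step described above, the following sketch builds a minimal HTML page around extracted content. The `ArticlePage` class and its `Wrap` method are hypothetical helpers, not part of ReadSharp, and the sketch assumes the content holds only the body part rather than a complete document.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of ReadSharp.
public static class ArticlePage
{
    // Wraps extracted article HTML in a minimal document with an inline stylesheet.
    // Assumes contentHtml is a body fragment, not a complete HTML document.
    public static string Wrap(string title, string contentHtml, string css)
    {
        return "<!DOCTYPE html>\n" +
               "<html>\n<head>\n<meta charset=\"utf-8\">\n" +
               // The title is plain text, so it is HTML-encoded before insertion.
               "<title>" + WebUtility.HtmlEncode(title) + "</title>\n" +
               "<style>" + css + "</style>\n" +
               "</head>\n<body>\n" + contentHtml + "\n</body>\n</html>";
    }
}
```

A call such as `ArticlePage.Wrap(article.Title, article.Content, "body { max-width: 40em; margin: auto; }")` would then produce a page that can be loaded into a web view.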

ReadSharp is based on a custom PCL port of NReadability and SgmlReader, which are included in the solution.

Association with Pocket

This library is a replacement for Pocket's Article View API, which is restricted in both usage and privacy.

With ReadSharp you won't hit any usage limits, as you are extracting the content directly. And it's open source.

Example

using ReadSharp;

// Note: await requires an enclosing async method.
Reader reader = new Reader();
Article article = null;

try
{
  article = await reader.Read(new Uri("http://frontendplay.com/story/4/http-caching-demystified-part-2-implementation"));
}
catch (ReadException exc)
{
  // handle exception (article stays null if reading fails)
}

Options

HttpOptions

You can pass HttpOptions to the Reader constructor; they apply to all requests:

HttpMessageHandler CustomHttpHandler
Use your own HTTP handler
int? RequestTimeout
Define a custom timeout in seconds, after which requests are cancelled
string UserAgent
Override the user agent, which is passed to the destination server
string UserAgentMobile
Override the mobile user agent, which is passed to the destination server
bool UseMobileUserAgent
There are default desktop and mobile user agents. Enabling this property selects the mobile user agent. If you pass a custom user agent, this property is ignored!
int MultipageLimit
Gets or sets the download limit for articles with multiple pages (default: 10)
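The options listed above might be combined as in the following sketch. It assumes that they are plain settable properties that can be assigned in an object initializer; the `HttpOptionsExample` class and the chosen values are purely illustrative.

```csharp
using System;
using System.Threading.Tasks;
using ReadSharp;

// Illustrative wrapper class, not part of ReadSharp.
public static class HttpOptionsExample
{
    // Assumption: HttpOptions exposes the listed options as settable properties.
    public static HttpOptions CreateOptions()
    {
        return new HttpOptions
        {
            RequestTimeout = 30,       // cancel requests after 30 seconds
            UseMobileUserAgent = true, // use the default mobile user agent
            MultipageLimit = 5         // download at most 5 pages per article
        };
    }

    public static async Task<Article> ReadAsync(Uri uri)
    {
        // The options are passed once to the constructor and apply to all requests.
        Reader reader = new Reader(CreateOptions());
        return await reader.Read(uri);
    }
}
```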

ReadOptions

ReadOptions are also available; they are passed with each individual request:

bool HasHeaderTags
Determines whether a complete HTML document is returned or just the body part
bool HasNoHeadline
Removes <h1> title from the article
bool UseDeepLinks
If enabled, deep links (containing hashes, e.g. href="#article") are not transformed into absolute URIs
bool PrettyPrint
Determines whether the HTML output will be formatted
bool PreferHTMLEncoding
Determines whether to prefer the encoding found in the HTML or the one found in the HTTP Header (default: true)
bool MultipageDownload
Download all pages for articles with multiple pages (default: false)
bool ReplaceImagesWithPlaceholders
If true, replaces all <img> tags with placeholders
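A per-request configuration could look like the sketch below. Two things are assumptions here: that ReadOptions exposes the listed options as settable properties, and that `Read` accepts the options as its second argument. The `ReadOptionsExample` class is illustrative only.

```csharp
using System;
using System.Threading.Tasks;
using ReadSharp;

// Illustrative wrapper class, not part of ReadSharp.
public static class ReadOptionsExample
{
    // Assumption: ReadOptions exposes the listed options as settable properties.
    public static ReadOptions CreateOptions()
    {
        return new ReadOptions
        {
            HasNoHeadline = true,    // strip the <h1> title from the article
            PrettyPrint = true,      // format the HTML output
            MultipageDownload = true // fetch all pages of a multi-page article
        };
    }

    public static async Task<Article> ReadAsync(Reader reader, Uri uri)
    {
        // Assumption: Read takes the ReadOptions as a second argument.
        return await reader.Read(uri, CreateOptions());
    }
}
```

Because the options travel with each call, the same Reader instance can serve requests with different settings.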

Article Model

The Article contains the following fields:

string Title (the title of the page)
string Description (description of the page, extracted from meta information)
string Content (contains the article)
Uri FrontImage (main page image extracted from meta tags like apple-touch-icon and others)
Uri Favicon (the favicon of the page)
List<ArticleImage> Images (contains all images found in the text)
string NextPage (contains the next page URI, if available)

Article Image

Uri Uri
string Title (extracted from the title attribute)
string AlternativeText (extracted from the alt attribute)
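To show how the two models fit together, here is a small sketch that turns an Article into a plain-text summary. The `ArticleSummary` helper is hypothetical; it relies only on the fields listed above and guards against missing values, since the exact nullability of each field is an assumption.

```csharp
using System;
using System.Text;
using ReadSharp;

// Hypothetical helper, not part of ReadSharp.
public static class ArticleSummary
{
    public static string Describe(Article article)
    {
        var sb = new StringBuilder();
        sb.AppendLine("Title: " + article.Title);
        sb.AppendLine("Description: " + article.Description);

        // Images may be empty or missing for text-only articles (assumption).
        if (article.Images != null)
        {
            foreach (ArticleImage image in article.Images)
            {
                sb.AppendLine("Image: " + image.Uri + " (alt: " + image.AlternativeText + ")");
            }
        }

        // NextPage is only set when the article continues on another page.
        if (article.NextPage != null)
        {
            sb.AppendLine("Next page: " + article.NextPage);
        }

        return sb.ToString();
    }
}
```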

Supported platforms

ReadSharp is a Portable Class Library, so it is compatible with multiple platforms and Universal Apps:

.NET >= 4.5 (including WPF)
Windows Phone (Silverlight + WinPRT) >= 8
Windows Store >= 8
Xamarin iOS + Android
WP7 and Silverlight support was dropped in 6.0; use ReadSharp < 6.0 if you need to support them

Forked Dependencies

The forks are included in the primary source code.

Contributors


ceee

License

MIT License
