Using AdBlock rules to remove elements

AdBlock Plus element hiding rules use CSS selectors to identify the elements to hide. Applying them with lxml is easy to implement, if somewhat slow.

I'm using this in my own code to automatically remove social media share links from pages. You may want to consider including something similar in python-readability.

EasyList is [dual licensed](https://easylist.adblockplus.org/en/about) under Creative Commons Attribution-ShareAlike 3.0 Unported and GNU General Public License version 3. CC-BY-SA [looks compatible](http://www.apache.org/legal/resolved.html#cc-sa) with Apache-licensed projects.

## Example

First download the rules:

```
$ wget https://easylist-downloads.adblockplus.org/fanboy-annoyance.txt
```

Then you can simply extract the CSS selectors to match against a document tree.

``` python
from lxml import html
from lxml.cssselect import CSSSelector

RULES_PATH = 'fanboy-annoyance.txt'
with open(RULES_PATH, 'r') as f:
    lines = f.read().splitlines()

# get elemhide rules (prefixed by ##) and create a CSSSelector for each of them
rules = [CSSSelector(line[2:]) for line in lines if line[:2] == '##']

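def remove_ads(tree):
    for rule in rules:
        for matched in rule(tree):
            # drop_tree() (available on lxml.html elements) keeps the tail text
            # that follows the element; getparent().remove() would discard it
            matched.drop_tree()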

doc = html.document_fromstring("<html>...</html>")
remove_ads(doc)
```

