FullText RSS issue with php links containing arguments #1185

@sonicnkt

Hi,
I'm not sure whether this is an issue with selfoss itself or with the graby library, but I have a problem scraping webpages whose article links are PHP URLs with query arguments, like the following:
https://webpage.com/news.php?item=news&article=test

For some reason the page that gets scraped is the base PHP page https://webpage.com/news.php, and the query arguments do not seem to be taken into account.

One example webpage would be phoronix.com:
https://www.phoronix.com/scan.php?page=news_item&px=Steam-Survey-April-2020

Looking at the debug logs shows that graby tries to scrape the following link, in which the & symbol has been replaced with the XML escape &amp;amp;:
https://www.phoronix.com/scan.php?page=news_item&amp;amp;px=Steam-Survey-April-2020

I'm not sure whether this "escaped" version is only displayed in the logs or whether it is the URL that actually gets processed. If it is the latter, that could be the issue I'm facing, because opening the escaped link in a browser gives the same result I'm seeing in selfoss.
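To illustrate why the escaped URL might end up on the base page, here is a minimal Python sketch (not selfoss or graby code). It splits the query string on `&` to show which parameter names a server would see, and assumes the URL really is passed on with the XML escape in place. The helper name `query_params` is made up for this example.

```python
from html import unescape
from urllib.parse import urlsplit

# URL as it appears in the debug log, with the XML-escaped ampersand.
escaped = "https://www.phoronix.com/scan.php?page=news_item&amp;px=Steam-Survey-April-2020"


def query_params(url):
    """Split the query string on '&' and return a name -> value dict."""
    params = {}
    for pair in urlsplit(url).query.split("&"):
        name, _, value = pair.partition("=")
        params[name] = value
    return params


# With the escape left in, the second parameter is named "amp;px",
# so a script looking for "px" would not find it.
broken = query_params(escaped)

# Decoding the entity once restores the intended parameter name.
fixed = query_params(unescape(escaped))

print(broken)
print(fixed)
```

If this is what happens, the fix would be to decode the entity exactly once before the request is made. Decoding a URL that is already unescaped can be risky, since a parameter name that happens to look like an HTML entity could get mangled.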

Like I said, I'm not sure whether this is an issue with selfoss or with graby, but I have to start investigating somewhere :)

PS: I'm currently not running the latest 2.19 snapshots, as I could not get them running on my system (Linux, NGINX); my current build is a few months old. I am really sorry if this issue was already fixed.
