Hi,
I'm not sure whether this is an issue with selfoss itself or with the graby library, but I have a problem scraping web pages whose article links are PHP URLs with query parameters, like the following:
https://webpage.com/news.php?item=news&article=test
For some reason, the page that gets scraped is the base PHP page, https://webpage.com/news.php, and the query arguments do not seem to be used for processing.
One example webpage would be phoronix.com:
https://www.phoronix.com/scan.php?page=news_item&px=Steam-Survey-April-2020
Looking at the debug logs shows that graby tries to scrape the following link, in which the & symbol has been replaced with the XML escape `&amp;`:
https://www.phoronix.com/scan.php?page=news_item&amp;px=Steam-Survey-April-2020
I'm not sure whether this "escaped" version is only displayed that way in the logs or whether it is the URL that actually gets processed. If it is, that could be the issue I'm facing, because opening the escaped link in a browser gives the same result I'm seeing.
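To illustrate why the escaped form could matter, here is a minimal Python sketch (not selfoss or graby code, just the standard library) that parses the query string of the escaped URL and of the same URL after unescaping. The helper name `query_params` is made up for this example, and I'm only assuming the server splits parameters on `&` the way `parse_qs` does.

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# The URL as it shows up in the debug log, with "&" written as "&amp;".
escaped_url = (
    "https://www.phoronix.com/scan.php"
    "?page=news_item&amp;px=Steam-Survey-April-2020"
)


def query_params(url):
    """Hypothetical helper: return the query parameters of a URL as a dict."""
    return parse_qs(urlsplit(url).query)


# Parsed as-is, the second parameter is no longer called "px": on current
# Python 3 releases only "&" separates parameters, so the key keeps the
# leftover "amp;" prefix.
escaped_params = query_params(escaped_url)

# Decoding the entity first restores the URL a browser would normally request.
fixed_url = unescape(escaped_url)
fixed_params = query_params(fixed_url)

print("escaped:", sorted(escaped_params))
print("fixed:  ", sorted(fixed_params))
```

If the server sees no `px` parameter, it has no article to look up, which would match the behaviour I get when opening the escaped link by hand.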
Like I said, I'm not sure whether this is an issue with selfoss or graby, but I have to start investigating somewhere :)
PS: I'm currently not running the latest 2.19 snapshots, as I could not get them running on my system (Linux, NGINX); my current build is a few months older. I am really sorry if this issue has already been fixed.