This repository has been archived by the owner on Mar 5, 2022. It is now read-only.

Fix metadata extraction of YouTube results #302

Merged (1 commit) on Nov 16, 2019


zmwangx (Collaborator) commented Nov 16, 2019

Recently noticed that metadata in YouTube results isn't parsed correctly. To add insult to injury, the semantically important `<br>` tag is dropped along with the other, semantically unimportant tags, so the metadata and the abstract end up joined without a space in between.
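To illustrate the distinction, here is a minimal sketch using only the standard library's `html.parser`. It is not googler's actual parser: the class `MetadataTextExtractor`, the `keep_br` switch and the HTML fragment are all hypothetical. With `keep_br=False` it reproduces the symptom (every tag dropped, text glued together); with `keep_br=True` each `<br>` becomes a newline.

```python
from html.parser import HTMLParser


class MetadataTextExtractor(HTMLParser):
    """Hypothetical text extractor (not googler's real parser).

    All tags are discarded while their text content is kept. If keep_br is
    True, each <br> is turned into a newline instead of being discarded too.
    """

    def __init__(self, keep_br=True):
        super().__init__()
        self.keep_br = keep_br
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased. A self-closing <br/> also lands here,
        # via HTMLParser's default handle_startendtag, so one check suffices.
        if tag == "br" and self.keep_br:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def extract_text(html_fragment, keep_br=True):
    """Return the text of html_fragment, optionally keeping <br> breaks."""
    parser = MetadataTextExtractor(keep_br=keep_br)
    parser.feed(html_fragment)
    parser.close()  # flush any trailing text still buffered by the parser
    return "".join(parser.parts)


# Made-up fragment shaped like a metadata line followed by the abstract.
SAMPLE = "<span>Uploaded by DanceGangnamStyle</span><br>50+ videos Play all"

if __name__ == "__main__":
    print(repr(extract_text(SAMPLE, keep_br=False)))
    print(repr(extract_text(SAMPLE, keep_br=True)))
```

Run as a script, the first line shows the glued-together text from the bug report and the second shows the line break preserved.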

Before:

    $ googler --np gangnam style

     1.  PSY- Gangnam Style (Official Music Video) - YouTube
         https://www.youtube.com/watch?v=CH1XGdu-hzQ
         23 Oct 2012 - 5 min - Uploaded by DanceGangnamStyle50+ videos Play all Mix - PSY- Gangnam Style (Official Music Video)YouTube ·  Darci Lynne ...

After:

    $ googler --np gangnam style

     1.  PSY- Gangnam Style (Official Music Video) - YouTube
         https://www.youtube.com/watch?v=CH1XGdu-hzQ
         23 Oct 2012, 5 min, Uploaded by DanceGangnamStyle
         50+ videos Play all Mix - PSY- Gangnam Style (Official Music Video)YouTube ·  Darci Lynne ...

This is a slightly risky change, since the metadata criterion changed from `.slp` to `.f` (in my experience, metadata in traditional results always carries both classes, i.e. `.f.slp`). I'm not sure whether there's regional variance.
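For readers less familiar with CSS compound selectors: `.slp` matches any element whose class list contains `slp`, while `.f.slp` requires both `f` and `slp`. The hypothetical helper `matches_classes` below (not part of googler) spells that out. The class strings are assumptions inferred from the description above, namely that traditional metadata carries `f slp` and that the YouTube metadata element carries `f` without `slp`.

```python
def matches_classes(class_attr, *required):
    """Hypothetical helper: True if the space-separated class attribute value
    contains every class in `required`.

    matches_classes(c, "slp") mirrors the selector `.slp`;
    matches_classes(c, "f", "slp") mirrors the compound selector `.f.slp`.
    """
    present = set((class_attr or "").split())  # a missing attribute has no classes
    return set(required) <= present


# Assumed class values, for illustration only.
TRADITIONAL_METADATA = "f slp"
YOUTUBE_METADATA = "f"

if __name__ == "__main__":
    for name, classes in [("traditional", TRADITIONAL_METADATA),
                          ("youtube", YOUTUBE_METADATA)]:
        print(name,
              "old .slp:", matches_classes(classes, "slp"),
              "new .f:", matches_classes(classes, "f"))
```

Under these assumptions, switching to `.f` keeps matching the traditional layout and also picks up the YouTube layout. The trade-off is that `.f` is the weaker criterion: any other element that happens to carry class `f` would now match as well, which is one plausible source of the risk mentioned above.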

jarun (Owner) commented Nov 16, 2019

Awesome!

jarun merged commit a523c12 into jarun:master on Nov 16, 2019

zmwangx (Collaborator, Author) commented Nov 16, 2019

Hope it doesn't cause any regressions.

zmwangx deleted the youtube-metadata-extraction branch on Nov 16, 2019, 04:12
The lock bot locked and limited the conversation to collaborators on Jan 15, 2020.