This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

GeniusReader loader (#819)
ThibaudARoy authored Feb 7, 2024
1 parent b066238 commit 5263112
Showing 5 changed files with 361 additions and 481 deletions.
90 changes: 90 additions & 0 deletions llama_hub/genius/README.md
@@ -0,0 +1,90 @@
# Genius Loader

This loader connects to the Genius API and loads song lyrics into `Document` objects.

As a prerequisite, register with the [Genius API](https://genius.com/api-clients) and create an app to obtain a `client_id` and a `client_secret`. You must also set a `redirect_uri` for the app, although it does not need to be functional. Finally, generate an access token; it is the only argument needed to instantiate `GeniusReader`.
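Rather than hardcoding the token in source, one option is to read it from an environment variable. The sketch below assumes a variable named `GENIUS_ACCESS_TOKEN`; that name is a hypothetical convention chosen for illustration, not something the Genius API or this loader requires.

```python
import os


def get_genius_token(var_name: str = "GENIUS_ACCESS_TOKEN") -> str:
    """Read the Genius access token from an environment variable.

    GENIUS_ACCESS_TOKEN is a hypothetical name chosen for this sketch;
    the loader itself only needs the token string.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to the access token generated for your Genius API client."
        )
    return token
```

The result can then be passed straight to the reader, e.g. `GeniusReader(get_genius_token())`.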

## Usage

The example after the method reference below uses `search_songs_by_lyrics` to retrieve songs that match a lyric snippet. It accepts one argument, `lyrics` (str), the snippet to search for, and returns a `List[Document]` containing the matching songs.

## GeniusReader Class Methods

### `load_artist_songs`

- **Description**: Fetches all or a specified number of songs by an artist.
- **Arguments**:
  - `artist_name` (str): The name of the artist.
  - `max_songs` (Optional[int]): Maximum number of songs to retrieve.
- **Returns**: List of `Document` objects with song lyrics.

### `load_all_artist_songs`

- **Description**: Fetches all songs of an artist and saves their lyrics to a local file.
- **Arguments**:
  - `artist_name` (str): The name of the artist.
- **Returns**: List of `Document` objects with the artist's song lyrics.

### `load_artist_songs_with_filters`

- **Description**: Loads the lyrics of an artist's most or least popular song, ranked by Genius popularity.
- **Arguments**:
  - `artist_name` (str): The artist's name.
  - `most_popular` (bool): `True` for the most popular song, `False` for the least popular.
  - `max_songs` (Optional[int]): Maximum number of songs to consider when ranking by popularity.
  - `max_pages` (int): Maximum number of pages to fetch.
- **Returns**: `Document` with lyrics of the selected song, or `None` if the artist or song cannot be found.
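The popularity selection can be sketched offline. This mirrors the loop in `base.py`: request pages sorted by popularity until there are no more pages, `max_pages` is reached, or `max_songs` songs have been seen, then take the first (most popular) or last (least popular) title. `StubGenius` and `pick_by_popularity` are hypothetical names; the stub only returns dictionaries with the `songs`/`next_page` keys the loader reads and is not part of lyricsgenius. Note that "least popular" means least popular among the songs fetched, so the answer depends on `max_songs` and `max_pages`.

```python
from typing import Dict, List, Optional


class StubGenius:
    """Hypothetical offline stand-in for an artist_songs lookup.

    Serves `titles` (already ordered most popular first) in pages shaped
    like {"songs": [...], "next_page": int or None}.
    """

    def __init__(self, titles: List[str]):
        self.titles = titles

    def artist_songs(
        self, artist_id: int, sort: str = "popularity", per_page: int = 50, page: int = 1
    ) -> Dict:
        start = (page - 1) * per_page
        chunk = self.titles[start : start + per_page]
        has_more = start + per_page < len(self.titles)
        return {
            "songs": [{"title": t} for t in chunk],
            "next_page": page + 1 if has_more else None,
        }


def pick_by_popularity(
    client: StubGenius,
    artist_id: int,
    most_popular: bool = True,
    max_songs: Optional[int] = None,
    max_pages: int = 50,
    per_page: int = 50,
) -> Optional[str]:
    """Return the title of the most or least popular song among those fetched."""
    songs: List[Dict] = []
    page: Optional[int] = 1
    while (
        page
        and page <= max_pages
        and (max_songs is None or len(songs) < max_songs)
    ):
        response = client.artist_songs(
            artist_id, sort="popularity", per_page=per_page, page=page
        )
        songs.extend(response["songs"])
        page = response["next_page"]
    if max_songs is not None:
        songs = songs[:max_songs]  # a page may overshoot max_songs, so trim
    if not songs:
        return None
    return songs[0]["title"] if most_popular else songs[-1]["title"]


client = StubGenius([f"Song {i}" for i in range(1, 121)])  # 120 songs, 3 pages of 50
print(pick_by_popularity(client, artist_id=1, most_popular=False, max_songs=10))  # Song 10
```

Trimming after the loop keeps `max_songs` meaningful even though the API returns whole pages of up to 50 songs.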

### `load_song_by_url_or_id`

- **Description**: Loads a song by its Genius URL or ID. If both are given, the URL is used.
- **Arguments**:
  - `song_url` (Optional[str]): URL of the song on Genius.
  - `song_id` (Optional[int]): ID of the song on Genius.
- **Returns**: List containing one `Document` with the song's lyrics, or an empty list if neither argument is given or no song is found.
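The URL-or-ID dispatch can be sketched as follows: a URL takes precedence, then an ID, and with neither the result is an empty list. `lyrics_by_url_or_id` and `fake_fetch` are hypothetical; `fetch_lyrics` stands in for a lookup such as lyricsgenius's `Genius.lyrics(song_id=..., song_url=...)`, and injecting it keeps the sketch runnable without network access.

```python
from typing import Callable, List, Optional


def lyrics_by_url_or_id(
    fetch_lyrics: Callable[..., Optional[str]],
    song_url: Optional[str] = None,
    song_id: Optional[int] = None,
) -> List[str]:
    """Return a one-element list with the song's lyrics, or [] if nothing was found."""
    if song_url:
        lyrics = fetch_lyrics(song_url=song_url)  # URL wins when both are given
    elif song_id:
        lyrics = fetch_lyrics(song_id=song_id)
    else:
        return []  # neither identifier supplied
    return [lyrics] if lyrics else []


def fake_fetch(song_id=None, song_url=None):
    """Hypothetical lookup returning canned lyrics for one URL and one ID."""
    canned = {"https://genius.com/example-song-lyrics": "url lyrics", 42: "id lyrics"}
    return canned.get(song_url) or canned.get(song_id)


print(lyrics_by_url_or_id(fake_fetch, song_id=42))  # ['id lyrics']
```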

### `search_songs_by_lyrics`

- **Description**: Searches for songs by a snippet of lyrics.
- **Arguments**:
  - `lyrics` (str): Lyric snippet to search for.
- **Returns**: List of `Document` objects with songs matching the lyrics.
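`search_songs_by_lyrics` reads each song's page URL from the search hits before fetching its lyrics. The hypothetical helper below isolates that step. The sample response is hand-written and trimmed to the keys the loader reads (`hits`, then `result`, then `url`); real Genius search responses carry many more fields. Skipping hits without a URL is a precaution added in this sketch.

```python
from typing import Any, Dict, List, Optional


def song_urls_from_search(response: Optional[Dict[str, Any]]) -> List[str]:
    """Extract song page URLs from a search response, in ranking order."""
    if not response:
        return []
    urls = []
    for hit in response.get("hits", []):
        url = hit.get("result", {}).get("url")
        if url:  # ignore malformed hits instead of raising KeyError
            urls.append(url)
    return urls


# Hand-written sample shaped like the fields the loader reads.
sample = {
    "hits": [
        {"result": {"url": "https://genius.com/example-artist-example-song-lyrics"}},
        {"result": {"title": "hit without a url"}},
    ]
}
print(song_urls_from_search(sample))
```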

### `load_songs_by_tag`

- **Description**: Loads songs by a specific tag or genre.
- **Arguments**:
  - `tag` (str): Tag or genre to search for.
  - `max_songs` (Optional[int]): Max number of songs to fetch.
  - `max_pages` (int): Max number of pages to fetch.
- **Returns**: List of `Document` objects with song lyrics.
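Unlike the popularity loader, `load_songs_by_tag` checks `max_songs` before each hit, so it can stop partway through a page. A sketch of that loop, where `collect_tag_urls` and the `fetch_tag_page(page)` callable are hypothetical stand-ins for the loader's `self.genius.tag(tag, page=page)` call, which it reads through the `hits` and `next_page` keys:

```python
from typing import Callable, Dict, List, Optional


def collect_tag_urls(
    fetch_tag_page: Callable[[int], Dict],
    max_songs: Optional[int] = None,
    max_pages: int = 50,
) -> List[str]:
    """Collect song URLs page by page, stopping at max_songs or max_pages."""
    urls: List[str] = []
    page: Optional[int] = 1
    while page and page <= max_pages and (max_songs is None or len(urls) < max_songs):
        response = fetch_tag_page(page)
        for hit in response["hits"]:
            if max_songs is not None and len(urls) >= max_songs:
                break  # cap reached partway through a page
            urls.append(hit["url"])
        page = response["next_page"]
    return urls


# Hypothetical two-page tag listing shaped like the keys the loader reads.
pages = {
    1: {"hits": [{"url": "u1"}, {"url": "u2"}, {"url": "u3"}], "next_page": 2},
    2: {"hits": [{"url": "u4"}, {"url": "u5"}], "next_page": None},
}
print(collect_tag_urls(lambda page: pages[page], max_songs=4))  # ['u1', 'u2', 'u3', 'u4']
```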

```python
from llama_index import download_loader

GeniusReader = download_loader('GeniusReader')

access_token = "your_generated_access_token"

loader = GeniusReader(access_token)
documents = loader.search_songs_by_lyrics("Imagine")
```

## Example

This loader is designed to load data into [LlamaIndex](https://github.com/run-llama/llama_index/tree/main/llama_index) and/or to be used subsequently as a Tool in a [LangChain](https://github.com/hwchase17/langchain) Agent.

### LlamaIndex

```python
from llama_index import VectorStoreIndex, download_loader

GeniusReader = download_loader('GeniusReader')

access_token = "your_generated_access_token"

loader = GeniusReader(access_token)
documents = loader.search_songs_by_lyrics("Imagine")
index = VectorStoreIndex.from_documents(documents)
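# Query through a query engine; VectorStoreIndex has no query() method of its own
query_engine = index.as_query_engine()
response = query_engine.query('What artists have written songs that have the lyrics imagine in them?')
print(response)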
```
6 changes: 6 additions & 0 deletions llama_hub/genius/__init__.py
@@ -0,0 +1,6 @@
"""Init file."""
from llama_hub.genius.base import (
    GeniusReader,
)

__all__ = ["GeniusReader"]
153 changes: 153 additions & 0 deletions llama_hub/genius/base.py
@@ -0,0 +1,153 @@
"""Genius Reader."""
from typing import List, Optional
from llama_index.readers.base import BaseReader
from llama_index.readers.schema.base import Document


class GeniusReader(BaseReader):
    """GeniusReader for various operations with lyricsgenius."""

    def __init__(self, access_token: str):
        """Initialize the GeniusReader with an access token."""
        try:
            import lyricsgenius
        except ImportError:
            raise ImportError(
                "Please install lyricsgenius via 'pip install lyricsgenius'"
            )
        self.genius = lyricsgenius.Genius(access_token)

    def load_artist_songs(
        self, artist_name: str, max_songs: Optional[int] = None
    ) -> List[Document]:
        """Load all or a specified number of songs by an artist."""
        artist = self.genius.search_artist(artist_name, max_songs=max_songs)
        return [Document(text=song.lyrics) for song in artist.songs] if artist else []

    def load_all_artist_songs(self, artist_name: str) -> List[Document]:
        """Load all songs by an artist and save their lyrics to a local file."""
        artist = self.genius.search_artist(artist_name)
        if not artist:
            return []
        artist.save_lyrics()
        return [Document(text=song.lyrics) for song in artist.songs]

    def load_artist_songs_with_filters(
        self,
        artist_name: str,
        most_popular: bool = True,
        max_songs: Optional[int] = None,
        max_pages: int = 50,
    ) -> Optional[Document]:
        """Load the most or least popular song of an artist.

        Args:
            artist_name (str): The artist's name.
            most_popular (bool): True for most popular, False for least popular song.
            max_songs (Optional[int]): Maximum number of songs to consider for popularity.
            max_pages (int): Maximum number of pages to fetch.
        Returns:
            Optional[Document]: A document containing lyrics of the most/least popular
                song, or None if the artist or song cannot be found.
        """
        artist = self.genius.search_artist(artist_name, max_songs=1)
        if not artist:
            return None

        songs_fetched = 0
        page = 1
        songs = []
        while (
            page
            and page <= max_pages
            and (max_songs is None or songs_fetched < max_songs)
        ):
            request = self.genius.artist_songs(
                artist.id, sort="popularity", per_page=50, page=page
            )
            songs.extend(request["songs"])
            songs_fetched += len(request["songs"])
            page = (
                request["next_page"]
                if (max_songs is None or songs_fetched < max_songs)
                else None
            )

        # Each page holds up to 50 songs, so trim to max_songs before picking.
        if max_songs is not None:
            songs = songs[:max_songs]
        if not songs:
            return None

        target_song = songs[0] if most_popular else songs[-1]
        song_details = self.genius.search_song(target_song["title"], artist.name)
        return Document(text=song_details.lyrics) if song_details else None

    def load_song_by_url_or_id(
        self, song_url: Optional[str] = None, song_id: Optional[int] = None
    ) -> List[Document]:
        """Load a song by URL or ID; the URL takes precedence if both are given."""
        # Genius.lyrics() accepts either a song URL or a song ID and
        # returns the lyrics text (or None if none could be found).
        if song_url:
            song_lyrics = self.genius.lyrics(song_url=song_url)
        elif song_id:
            song_lyrics = self.genius.lyrics(song_id=song_id)
        else:
            return []

        return [Document(text=song_lyrics)] if song_lyrics else []

    def search_songs_by_lyrics(self, lyrics: str) -> List[Document]:
        """Search for songs by a snippet of lyrics.

        Args:
            lyrics (str): The lyric snippet you're looking for.
        Returns:
            List[Document]: A list of documents containing songs with those lyrics.
        """
        search_results = self.genius.search_songs(lyrics)
        songs = search_results["hits"] if search_results else []

        results = []
        for hit in songs:
            song_url = hit["result"]["url"]
            song_lyrics = self.genius.lyrics(song_url=song_url)
            if song_lyrics:  # skip pages with no scrapeable lyrics
                results.append(Document(text=song_lyrics))

        return results

    def load_songs_by_tag(
        self, tag: str, max_songs: Optional[int] = None, max_pages: int = 50
    ) -> List[Document]:
        """Load songs by a specific tag.

        Args:
            tag (str): The tag or genre to load songs for.
            max_songs (Optional[int]): Maximum number of songs to fetch. If None, no specific limit.
            max_pages (int): Maximum number of pages to fetch.
        Returns:
            List[Document]: A list of documents containing song lyrics.
        """
        lyrics = []
        total_songs_fetched = 0
        page = 1

        while (
            page
            and page <= max_pages
            and (max_songs is None or total_songs_fetched < max_songs)
        ):
            res = self.genius.tag(tag, page=page)
            for hit in res["hits"]:
                if max_songs is None or total_songs_fetched < max_songs:
                    song_lyrics = self.genius.lyrics(song_url=hit["url"])
                    if song_lyrics:  # skip pages with no scrapeable lyrics
                        lyrics.append(Document(text=song_lyrics))
                        total_songs_fetched += 1
                else:
                    break
            page = (
                res["next_page"]
                if max_songs is None or total_songs_fetched < max_songs
                else None
            )

        return lyrics


if __name__ == "__main__":
    access_token = ""  # replace with your Genius access token
    reader = GeniusReader(access_token)
    # Example usage
    print(reader.load_artist_songs("Chance the Rapper", max_songs=1))
1 change: 1 addition & 0 deletions llama_hub/genius/requirements.txt
@@ -0,0 +1 @@
lyricsgenius