Add export for goodreads csv #63

donoftime · 2022-03-17T13:13:36Z

Thank you for implementing the API and CLI clients. They were a joy to use, which is impressive given the lack of public-facing documentation!

The main use case I have is keeping my Audible library in sync with Goodreads. I threw together a minimal script to get the job done last night (https://github.com/donoftime/audible-goodreads), and I think it could be made generally useful. Given that Goodreads has a dedicated support page about the missing integration, I imagine others would like to see this too: https://help.goodreads.com/s/article/Can-I-link-my-Goodreads-and-Audible-accounts

If I get a chance, I'll circle back and turn it into a plugin like the other examples in the README. But I thought I would bring it up here in case someone else gets to it before me!

mkb79 · 2022-03-17T15:46:29Z

Hi,

Extensions for audible-cli are welcome. Your script looks good.

Because of your additional dependencies (pandas and isbntools), I would prefer that you write a plugin package instead of a plugin script. That way, all dependencies are installed properly, and you do not need to run another command to fetch the library.json data first!

Below is a POC plugin for you. The problem, in my case, is that the output file only contains the header row and no content! The pandas DataFrame does contain the library, so something seems to go wrong after that point!

FYI: Most library items have an ISBN. Maybe you can reuse it?! I have added it under the isbn_api key.

import asyncio
import logging
import pathlib

import audible
import click
from audible_cli.config import pass_session
from audible_cli.models import Library
from pandas import DataFrame, to_datetime
from isbntools.app import isbn_from_words


logger = logging.getLogger("audible_cli.cmds.cmd_goodreads-transform")


@click.command("goodreads-transform")
@click.option(
    "--output", "-o",
    type=click.Path(path_type=pathlib.Path),
    default=pathlib.Path.cwd() / "library.csv",
    show_default=True,
    help="output file"
)
@click.option(
    "--timeout", "-t",
    type=click.INT,
    default=10,
    show_default=True,
    help=(
        "Increase the timeout if you get any TimeoutErrors. "
        "Set to 0 to disable timeout."
    )
)
@click.option(
    "--bunch-size",
    type=click.IntRange(10, 1000),
    default=1000,
    show_default=True,
    help="How many library items should be requested per request. A lower "
         "size results in more requests to get the full library. A higher "
         "size can result in a TimeoutError on slow internet connections."
)
@pass_session
def cli(session, **params):
    """Export the Audible library as a CSV file for import into Goodreads."""
    # asyncio.run creates a fresh event loop, shuts down async generators
    # and closes the loop when the coroutine finishes
    asyncio.run(_goodreads_transform(session.auth, **params))


async def _goodreads_transform(auth, **params):
    output = params.get("output")

    logger.debug("fetching library")
    library = await _get_library(auth, **params)

    logger.debug("prepare library")
    # prepared library items now have an isbn_api key
    library = _prepare_library_for_export(library)

    logger.debug("Creating DataFrame")
    library = DataFrame.from_dict(library)

    original_columns = library.columns

    library["isbn"] = library.apply(
        lambda x: isbn_from_words(x.title + " " + x.authors) or None, axis=1
    )
    library["Date Added"] = library.apply(
        lambda x: to_datetime(
            x["date_added"], format="%Y-%m-%d", exact=False
        ).strftime("%Y-%m-%d"),
        axis=1,
    )
    library["Date Read"] = library.apply(
        lambda x: to_datetime(
            x["date_added"], format="%Y-%m-%d", exact=False
        ).strftime("%Y-%m-%d") if x["is_finished"] == True else None,
        axis=1,
    )
    library["Title"] = library["title"]

    library.drop(columns=original_columns, inplace=True)
    library.dropna(subset=['isbn', 'Date Read'], inplace=True)

    library.to_csv(output, index=False)
    logger.info(f"File saved to {output}")


async def _get_library(auth, **params):
    timeout = params.get("timeout")
    if timeout == 0:
        timeout = None

    bunch_size = params.get("bunch_size")

    async with audible.AsyncClient(auth, timeout=timeout) as client:
        # added the product_details response group to obtain the isbn
        library = await Library.from_api_full_sync(
            client,
            response_groups=(
                "contributors, media, price, product_attrs, product_desc, "
                "product_extended_attrs, product_plan_details, product_plans, "
                "rating, sample, sku, series, reviews, ws4v, origin, "
                "relationships, review_attrs, categories, badge_types, "
                "category_ladders, claim_code_url, is_downloaded, "
                "is_finished, is_returnable, origin_asin, pdf_url, "
                "percent_complete, provided_review, product_details"
            ),
            bunch_size=bunch_size
        )
    return library


def _prepare_library_for_export(library):
    keys_with_raw_values = (
        "asin", "title", "subtitle", "runtime_length_min", "is_finished",
        "percent_complete", "release_date"
    )

    prepared_library = []

    for item in library:
        data_row = {}
        for key in item:
            v = getattr(item, key)
            if v is None:
                pass
            elif key in keys_with_raw_values:
                data_row[key] = v
            elif key in ("authors", "narrators"):
                data_row[key] = ", ".join([i["name"] for i in v])
            elif key == "series":
                data_row["series_title"] = v[0]["title"]
                data_row["series_sequence"] = v[0]["sequence"]
            elif key == "rating":
                overall_distributing = v.get("overall_distribution") or {}
                data_row["rating"] = overall_distributing.get(
                    "display_average_rating", "-")
                data_row["num_ratings"] = overall_distributing.get(
                    "num_ratings", "-")
            elif key == "library_status":
                data_row["date_added"] = v["date_added"]
            elif key == "product_images":
                data_row["cover_url"] = v.get("500", "-")
            elif key == "category_ladders":
                genres = []
                for genre in v:
                    for ladder in genre["ladder"]:
                        genres.append(ladder["name"])
                data_row["genres"] = ", ".join(genres)
            # added isbn to exported values
            elif key == "isbn":
                data_row["isbn_api"] = v

        prepared_library.append(data_row)

    prepared_library.sort(key=lambda x: x["asin"])

    return prepared_library
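
A header-only CSV is what you would get if the final `dropna(subset=['isbn', 'Date Read'])` removes every row: a row survives only when it has both an ISBN from `isbn_from_words` and a read date, and the read date is set only for finished titles. To check which condition is responsible, a small diagnostic could tally the reasons. The sketch below is plain Python and assumes the rows are available as dicts, for instance via `library.to_dict("records")` right before the `dropna` call; `count_drop_reasons` is a hypothetical helper name and the sample rows are placeholders.

```python
import math
from collections import Counter


def _missing(value):
    # dropna treats both None and NaN as missing values
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_drop_reasons(rows):
    """Tally which rows dropna(subset=["isbn", "Date Read"]) would remove.

    `rows` is a hypothetical list of plain dicts mirroring the DataFrame rows.
    """
    reasons = Counter()
    for row in rows:
        no_isbn = _missing(row.get("isbn"))
        not_read = _missing(row.get("Date Read"))
        if no_isbn and not_read:
            reasons["no isbn, not read"] += 1
        elif no_isbn:
            reasons["no isbn"] += 1
        elif not_read:
            reasons["not read"] += 1
        else:
            reasons["kept"] += 1
    return reasons


# Placeholder rows, not real library data.
sample = [
    {"isbn": "X1", "Date Read": "2022-03-01"},
    {"isbn": None, "Date Read": "2022-03-02"},
    {"isbn": "X2", "Date Read": None},
    {"isbn": None, "Date Read": None},
]
print(count_drop_reasons(sample))
```

If "kept" stays at zero for the real library, the breakdown shows whether the ISBN lookup or the finished flag is emptying the file.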

mkb79 · 2022-03-17T16:22:49Z

To make a package, put the content above in goodreads_transform.py and add the following to pyproject.toml:

[tool.poetry.plugins."audible.cli_plugins"]
"goodreads-transform" = "goodreads_transform:cli"

This should integrate your package in audible-cli and add the new command audible goodreads-transform.

mkb79 · 2022-03-18T05:30:57Z

I had a few minutes and worked a bit on the script. I reduced the response_groups used when fetching the library to the minimum, removed pandas (I can’t use pandas on Pythonista for iOS) and reused the ISBNs provided by the API.

import asyncio
import csv
import logging
import pathlib
from datetime import datetime, timezone

import audible
import click
from audible_cli.config import pass_session
from audible_cli.models import Library
from isbntools.app import isbn_from_words


logger = logging.getLogger("audible_cli.cmds.cmd_goodreads-transform")


@click.command("goodreads-transform")
@click.option(
    "--output", "-o",
    type=click.Path(path_type=pathlib.Path),
    default=pathlib.Path.cwd() / "library.csv",
    show_default=True,
    help="output file"
)
@click.option(
    "--timeout", "-t",
    type=click.INT,
    default=10,
    show_default=True,
    help=(
        "Increase the timeout if you get any TimeoutErrors. "
        "Set to 0 to disable timeout."
    )
)
@click.option(
    "--bunch-size",
    type=click.IntRange(10, 1000),
    default=1000,
    show_default=True,
    help="How many library items should be requested per request. A lower "
         "size results in more requests to get the full library. A higher "
         "size can result in a TimeoutError on slow internet connections."
)
@pass_session
def cli(session, **params):
    """Export the Audible library as a CSV file for import into Goodreads."""
    # asyncio.run creates a fresh event loop, shuts down async generators
    # and closes the loop when the coroutine finishes
    asyncio.run(_goodreads_transform(session.auth, **params))


async def _goodreads_transform(auth, **params):
    output = params.get("output")

    logger.debug("fetching library")
    library = await _get_library(auth, **params)

    logger.debug("prepare library")
    library = _prepare_library_for_export(library)

    logger.debug("write data rows to file")
    with output.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["isbn", "Date Added", "Date Read", "Title"])

        for row in library:
            writer.writerow(row)

    logger.info(f"File saved to {output}")


async def _get_library(auth, **params):
    timeout = params.get("timeout")
    if timeout == 0:
        timeout = None

    bunch_size = params.get("bunch_size")

    async with audible.AsyncClient(auth, timeout=timeout) as client:
        # the product_details response group is needed to obtain the isbn
        library = await Library.from_api_full_sync(
            client,
            response_groups=(
                "product_details, contributors, is_finished, product_desc"
            ),
            bunch_size=bunch_size
        )
    return library


def _prepare_library_for_export(library):
    prepared_library = []

    isbn_counter = 0
    isbn_api_counter = 0
    isbn_no_result_counter = 0
    skipped_items = 0

    for i in library:
        title = i.title
        authors = i.authors
        if authors is not None:
            authors = ", ".join([a["name"] for a in authors])
        is_finished = i.is_finished
        
        isbn = i.isbn
        if isbn is None:
            isbn_counter += 1
            isbn = isbn_from_words(f"{title} {authors}") or None
            if isbn is None:
                isbn_no_result_counter += 1
        else:
            isbn_api_counter += 1

        date_added = i.library_status
        if date_added is not None:
            date_added = date_added["date_added"]
            date_added = datetime.strptime(
                date_added, '%Y-%m-%dT%H:%M:%S.%fZ'
            ).replace(tzinfo=timezone.utc)
            # convert from UTC to local time before taking the calendar date
            date_added = date_added.astimezone().date().isoformat()

        date_read = None
        if is_finished:
            date_read = date_added

        if isbn and date_read:
            data_row = [isbn, date_added, date_read, title]
            prepared_library.append(data_row)
        else:
            skipped_items += 1

    logger.debug(f"{isbn_api_counter} isbns from API")
    logger.debug(f"{isbn_counter} isbns requested with isbntools")
    logger.debug(f"{isbn_no_result_counter} isbns without a result")
    logger.debug(f"{skipped_items} titles skipped (no ISBN found or title not finished)")

    return prepared_library
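
The per-item logic of `_prepare_library_for_export` can be pulled out into pure functions, which makes it testable without network access or isbntools. A minimal sketch, assuming items are plain dicts with hypothetical keys `title`, `authors`, `isbn`, `is_finished` and `date_added` (the raw `library_status["date_added"]` string), and with the ISBN search passed in as a callable instead of calling `isbn_from_words` directly. Like the script, it reuses the date added as the read date; `to_local_date` and `build_row` are hypothetical names.

```python
from datetime import datetime, timezone


def to_local_date(timestamp, tz=None):
    """Turn an API timestamp such as '2022-03-17T13:13:36.123Z' into 'YYYY-MM-DD'.

    The timestamp is parsed as UTC; tz=None converts it to the machine's
    local time zone (as the script does), any tzinfo converts to that zone.
    """
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc).astimezone(tz).date().isoformat()


def build_row(item, lookup_isbn, tz=None):
    """Return [isbn, date_added, date_read, title], or None if the item is skipped.

    `item` is a hypothetical plain-dict stand-in for a library item and
    `lookup_isbn` is any callable that takes a search string (for example
    isbntools' isbn_from_words) and returns an ISBN or a falsy value.
    """
    title = item.get("title")
    authors = ", ".join(a["name"] for a in item.get("authors") or [])

    # Prefer the ISBN delivered by the API; only search when it is missing.
    isbn = item.get("isbn")
    if isbn is None:
        isbn = lookup_isbn(f"{title} {authors}") or None

    date_added = item.get("date_added")
    if date_added is not None:
        date_added = to_local_date(date_added, tz)

    # Same simplification as the script: the date added doubles as the read date.
    date_read = date_added if item.get("is_finished") else None

    if isbn and date_read:
        return [isbn, date_added, date_read, title]
    return None
```

For an unfinished title, `build_row` returns `None` regardless of the ISBN, which matches the script counting it as skipped; passing `tz=timezone.utc` keeps results independent of the machine's time zone when testing.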

mkb79 · 2022-03-22T20:21:33Z

@donoftime Hi. Is it okay with you if I add the latest script above to my plugin script examples?

donoftime · 2022-03-27T02:30:56Z

Hi @mkb79, I apologize for the delay - I haven't had a chance to circle back on fun projects till tonight.

By all means, please feel free to add the above script to the examples!

Once the example script is up, I will add a disclaimer at the top of my repo to redirect anyone else that stumbles over it to your plugin script instead.

Thanks again for the great tools!

mkb79 added the enhancement New feature or request label Mar 22, 2022

csandman mentioned this issue Feb 9, 2023

Feature Request: Original Publish Date laxamentumtech/audnexus#248

Closed
