Feature Request: Original Publish Date #248

Closed
csandman opened this issue Feb 25, 2022 · 54 comments
Labels
question Further information is requested

Comments

@csandman

One thing which I have not been able to find a consistent source for is an original publish date for audiobooks. I'm not talking about the date an audiobook was released, I'm talking specifically about the date the first edition of a book was released. In terms of organizing an Audiobook collection (in my case, in Plex), it is generally far more convenient to allow things to be sorted by when the books were published. This is especially true for books in a series which may have multiple releases for their audiobook versions.

I know GoodReads has this info, but I also know they closed their API access. However, it's surely still possible to scrape the info from their website with something like cheerio. I know Readarr has GoodReads scraping integration but I haven't had a chance to look through their code for how they do it yet.

There is also the Google Books API, specifically the volume search, which is a convenient way to search the Google Books library with a JSON response. However, it doesn't include the original publish date in that response, which is weird because they do offer that info on their book pages. It could be realistic to use their API to find a book match and then use the book's ID to scrape the Google Books page for that book.
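The "find a match first" half of that idea can be sketched without any scraping. This is a minimal sketch, assuming the public volumes endpoint and its `intitle:` / `inauthor:` query keywords; the helper names and the sample response are hypothetical and only illustrate the shape of the data.

```python
from typing import Optional
from urllib.parse import urlencode

VOLUMES_URL = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(title: str, author: str, max_results: int = 5) -> str:
    """Build a volume search URL using the intitle:/inauthor: query keywords."""
    params = {"q": f"intitle:{title} inauthor:{author}", "maxResults": max_results}
    return VOLUMES_URL + "?" + urlencode(params)


def first_match(response: dict) -> Optional[dict]:
    """Return the ID, title, and edition date of the first hit, or None."""
    items = response.get("items") or []
    if not items:
        return None
    info = items[0].get("volumeInfo", {})
    return {
        "id": items[0].get("id"),
        "title": info.get("title"),
        # publishedDate is the date of this edition, not of the first edition.
        "published_date": info.get("publishedDate"),
    }


# Hypothetical, trimmed response used only to exercise the parser.
sample = {
    "items": [
        {"id": "VOLUME_ID", "volumeInfo": {"title": "Example", "publishedDate": "2005-08-02"}}
    ]
}
print(build_search_url("Example", "Some Author"))
print(first_match(sample))
```

The returned `id` is what a later lookup of the book page would need; the original publish date itself is not part of this response.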

Anyway, I'm not sure if attempting to scrape more sources like this is in the scope for this project, I've just been thinking that the original publish date for a book is one of the only things I haven't been able to get from Audible that is actually useful to have. I'm curious if you have any thoughts on the topic!

@djdembeck
Collaborator

djdembeck commented Feb 28, 2022

This isn’t of real interest, as, in my opinion, the date the physical book was published is not relevant data to audiobooks. They are very different mediums, and should not be globbed together.

I will concede date of first publish of an audiobook could be interesting, but that’s something which requires a release group format of metadata. This is something which AudiobookDB will be able to handle.

Scraping implementations are bound to fail, it’s always just a matter of time. This is why Audnexus is an API first, scraping second approach, with massive safeguards when scraping.

@djdembeck djdembeck added the question Further information is requested label Feb 28, 2022
@csandman
Author

csandman commented May 31, 2022

Just wanted to post an update on my opinion on this. The main reason I feel as though the original publish date of a book is useful is that, in any real world use of this tool, sorting by the original publish date is the only way to guarantee books are sorted in the correct order, especially in a series. Audible frequently removes original versions of audiobooks in favor of "movie editions", which often aren't any different from an original version except for the cover. If you try to use the release date of the movie version to sort a series of books in the order you should read them, you'll often end up with the movie edition of the first book in a series last, or towards the end.
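To make the sorting problem concrete, here is a small sketch with made-up titles and dates. The `Audiobook` class and `reading_order_key` helper are hypothetical names, not part of Audnexus or the Plex agent.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Audiobook:
    title: str
    audio_release: date                      # date this audiobook edition was released
    original_publish: Optional[date] = None  # date the book itself was first published


def reading_order_key(book: Audiobook) -> date:
    """Prefer the original publish date; fall back to the audio release date."""
    return book.original_publish or book.audio_release


series = [
    # Book One was re-released as a "movie edition", so its audio date is the newest.
    Audiobook("Book One", date(2021, 9, 1), date(1990, 5, 1)),
    Audiobook("Book Two", date(2008, 3, 1), date(1993, 6, 1)),
    Audiobook("Book Three", date(2010, 7, 1), date(1996, 2, 1)),
]

by_release = [b.title for b in sorted(series, key=lambda b: b.audio_release)]
by_original = [b.title for b in sorted(series, key=reading_order_key)]
print(by_release)   # Book One ends up last
print(by_original)  # series order is preserved
```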

For a practical example, I'd use the Plex metadata agent that is based on this tool. I'd argue that sorting audiobooks on Plex simply by the date that an audiobook version of a book is released, is almost entirely useless compared to when the book was originally published.

And as far as how to get that info on a book, I discovered that Readarr (which I linked in my original post) actually includes their API key for the Goodreads API in the source code, albeit in an obfuscated way. I tried implementing that in my own project for tagging audiobook files with proper metadata and it works like a charm. I'm not necessarily encouraging using theirs, but if you could get your hands on a Goodreads API key, they definitely provide that information with an API first approach like you said.

@djdembeck
Collaborator

Hmm, you make a good point there. I'm not opposed to including original publication date where available. I would probably just not make it the default sort.

My concern with even integrating Goodreads is they've made it clear they don't intend to support the public API moving forward, so the rug could be pulled from under us at any moment. I also wouldn't want to put Readarr at risk of having their key revoked, as I'm not sure what if any TOS they agreed to in getting the key.

I'll still take a look at their implementation to see if I can glean any knowledge.

@csandman
Author

I understand not making it the default; my main point was just that it's an important field to have in terms of overall audiobook metadata.

And definitely understandable not wanting to yoink their API key, I've only used it in a project that isn't public and I'm only doing so because they're confident enough in using the same API key for each self hosted instance of their app that is out there.

And if you're at all curious what the results from their API looks like, here is an example request: https://gist.github.com/csandman/f86dabe760a90477504c1a15fcada874

Unfortunately the response is in XML, but I was able to use xml2js to parse it into a usable format in TS.
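For anyone not on Node, the same extraction can be sketched with Python's standard library. The sample document below is hypothetical and heavily trimmed; the `work/original_publication_*` element names are an assumption about the response layout and should be checked against the gist above.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical, trimmed response; element names are assumed, not confirmed here.
SAMPLE = """<GoodreadsResponse>
  <book>
    <title>Example Title</title>
    <work>
      <original_publication_year type="integer">1965</original_publication_year>
      <original_publication_month type="integer">8</original_publication_month>
      <original_publication_day type="integer" nil="true"/>
    </work>
  </book>
</GoodreadsResponse>"""


def _int_or_none(text: Optional[str]) -> Optional[int]:
    """Convert element text to int, treating missing or empty elements as None."""
    return int(text) if text and text.strip() else None


def original_publication(xml_text: str) -> Tuple[Optional[int], Optional[int], Optional[int]]:
    """Return (year, month, day) of the original publication, each possibly None."""
    root = ET.fromstring(xml_text)
    base = "book/work/original_publication_"
    return tuple(_int_or_none(root.findtext(base + part)) for part in ("year", "month", "day"))


print(original_publication(SAMPLE))
```

Month and day are often missing for older books, so each part is returned separately instead of being forced into a full date.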


As far as them closing their API, I definitely understand not wanting to rely on an API that is planned to be removed. I've been keeping my eyes out for any alternatives that offer the same feature but haven't yet found one. If I do though, I'll definitely post any updates.

@djdembeck
Collaborator

djdembeck commented May 31, 2022

XML shouldn't be too big of a problem. I forgot they're distributing the API key, so I wouldn't be in breach of TOS since it's on my system already (I think).

I'll see if I can get some preliminary testing going this week.

As for other services, AudiobookDB is almost polished enough for public usage (frontend still needs to be written), so keep an eye out for that when it's opened to public 😏 It has Audnexus integration as well for import logic, so it makes sense to add Goodreads here first and then over there.

@csandman
Author

Sounds good! So what exactly is AudiobookDB? Is there a public repo for it yet? Is it just supposed to be a collection of data acquired from audnexus essentially?

@csandman
Author

csandman commented May 31, 2022

Also, here's an example of how I'm parsing the Goodreads api in TypeScript in case it could offer you any ideas: https://gist.github.com/csandman/dba05dc48f29592d0db535282c00a2af

I made it a little hastily but it's mostly effective.

@mkb79

mkb79 commented May 31, 2022

@csandman

And as far as how to get that info on a book, I discovered that Readarr (which I linked in my original post) actually includes their API key for the Goodreads API in the source code, albeit in an obfuscated way. I tried implementing that in my own project for tagging audiobook files with proper metadata and it works like a charm. I'm not necessarily encouraging using theirs, but if you could get your hands on a Goodreads API key, they definitely provide that information with an API first approach like you said.

With some knowledge, you can obtain an access token by making a request to https://www.goodreads.com/oauth/grant_access_token.xml. Goodreads uses the https://api.amazon.com/auth/register endpoint to register a new Goodreads device. This way you get a private Amazon access token, which you can use to authenticate your requests to the Goodreads API via the x-amz-access-token header instead of using the key in the URL query.

@csandman
Author

@mkb79 any chance you could give more details on what you're describing? do you still need a goodreads API key in the first place to make that work?

@mkb79

mkb79 commented May 31, 2022

@csandman

@mkb79 any chance you could give more details on what you're describing? do you still need a goodreads API key in the first place to make that work?

You only need your Goodreads username/password to register a new Goodreads device and obtain your access and refresh token!

@mkb79

mkb79 commented May 31, 2022

@csandman
I can give you more details, but I don’t want to spam this issue.

@csandman
Author

csandman commented Jun 1, 2022

I am definitely curious about the exact process, because I can't seem to find any up to date resources on it. Overall it could be helpful for this issue as well so you could post it here, but otherwise you could post more details on one of the gists above if you'd like.

@mkb79

mkb79 commented Jun 1, 2022

@csandman

Here is a proof of concept showing how registration and deregistration work:

import base64
import gzip
import hashlib
import hmac
import json
import secrets
import uuid
from datetime import datetime
from io import BytesIO
from functools import partialmethod
from typing import Tuple, Union

import httpx
from pyaes import AESModeOfOperationCBC, Encrypter, Decrypter


USER_AGENT = "AmazonWebView/GoodreadsForIOS App/4.0.1/iOS/15.4.1/iPhone"
FRC_SIG_SALT: bytes = b"HmacSHA256"
FRC_AES_SALT: bytes = b"AES/CBC/PKCS7Padding"


class FrcCookieHelper:
    def __init__(self, password: str) -> None:
        self.password = password.encode()

    def _get_key(self, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha1", self.password, salt, 1000, 16)

    get_signature_key = partialmethod(_get_key, FRC_SIG_SALT)

    get_aes_key = partialmethod(_get_key, FRC_AES_SALT)

    @staticmethod
    def unpack(frc: str) -> Tuple[bytes, bytes, bytes]:
        pad = "=" * (-len(frc) % 4)  # no padding when the length is already a multiple of 4
        frc = BytesIO(base64.b64decode(frc+pad))
        frc.seek(1)  # the first byte is always 0, skip it
        return frc.read(8), frc.read(16), frc.read()  # sig, iv, data

    @staticmethod
    def pack(sig: bytes, iv: bytes, data: bytes) -> str:
        frc = b"\x00" + sig[:8] + iv[:16] + data
        frc = base64.b64encode(frc).strip(b"=")
        return frc.decode()

    def verify_signature(self, frc: str) -> bool:
        key = self.get_signature_key()
        sig, iv, data = self.unpack(frc)
        new_signature = hmac.new(key, iv + data, hashlib.sha256).digest()
        # constant-time comparison of the truncated signature
        return hmac.compare_digest(sig, new_signature[:len(sig)])

    def decrypt(self, frc: str, verify_signature: bool = True) -> bytes:
        if verify_signature and not self.verify_signature(frc):
            raise ValueError("frc signature mismatch")

        key = self.get_aes_key()
        sig, iv, data = self.unpack(frc)
        decrypter = Decrypter(AESModeOfOperationCBC(key, iv))
        decrypted = decrypter.feed(data) + decrypter.feed()
        decompressed = gzip.decompress(decrypted)
        return decompressed

    def encrypt(self, data: Union[str, dict]) -> str:
        if isinstance(data, dict):
            data = json.dumps(data, indent=2, separators=(",", " : ")).replace("/", "\\/").encode()

        compressed = BytesIO()
        with gzip.GzipFile(fileobj=compressed, mode="wb", mtime=0) as f:
            f.write(data)
        compressed.seek(8)
        compressed.write(b"\x00\x13")
        compressed = compressed.getvalue()
        
        key = self.get_aes_key()
        iv = secrets.token_bytes(16)
        encrypter = Encrypter(AESModeOfOperationCBC(key, iv))
        encrypted = encrypter.feed(compressed) + encrypter.feed()

        key = self.get_signature_key()
        signature = hmac.new(key, iv + encrypted, hashlib.sha256).digest()

        packed = self.pack(signature, iv, encrypted)
        return packed + "=" * (-len(packed) % 4)  # restore standard base64 padding


def register(username, password):
    url = "https://api.amazon.com/auth/register"

    device_serial = secrets.token_hex(16).upper()
    frc = {
        "ApplicationVersion": "4.1",
        "DeviceOSVersion": "iOS/15.5",
        "ScreenWidthPixels": "428",
        "TimeZone": "+02:00",
        "ScreenHeightPixels": "926",
        "ApplicationName": "Goodreads",
        "DeviceJailbroken": False,
        "DeviceLanguage": "en-DE",
        # now() gives the true epoch; a naive utcnow() is shifted by the local UTC offset
        "DeviceFingerprintTimestamp": round(datetime.now().timestamp()) * 1000,
        "ThirdPartyDeviceId": str(uuid.uuid4()).upper(),
        "DeviceName": "iPhone",
        "Carrier": "Vodafone.de"
    }
    frc = FrcCookieHelper(device_serial).encrypt(frc)

    headers = {
        "x-amzn-identity-auth-domain": "goodreads.com",
        "User-Agent": USER_AGENT,
        "Accept-Encoding": "gzip",
        "Accept": "application/json",
        "Accept-Language": "en-DE",
        "Accept-Charset": "utf-8"
    }

    json_body = {
        "requested_extensions": [
            "device_info",
            "customer_info"
        ],
        "cookies": {
            "website_cookies": [],
            "domain": ".goodreads.com"
        },
        "registration_data": {
            "domain": "Device",
            "app_version": "4.1",
            "device_type": "A3NWHXTQ4EBCZS",
            "os_version": "15.5",
            "device_serial": device_serial,
            "device_model": "iPhone",
            "app_name": "GoodreadsForIOS App",
            "software_version": "1"
        },
        "auth_data": {
            "user_id_password": {
                "user_id": username,
                "password": password
            }
        },
        "user_context_map": {
            "frc": frc
        },
        "requested_token_type": [
            "bearer",
            "mac_dms",
            "website_cookies"
        ]
    }
    
    r = httpx.post(url, headers=headers, json=json_body)
    return r


def deregister(access_token):
    json_body = {"deregister_all_existing_accounts": True}
    headers = {"Authorization": f"Bearer {access_token}"}

    r = httpx.post(
        "https://api.amazon.com/auth/deregister",
        json=json_body,
        headers=headers
    )
    return r


def refresh_access_token(refresh_token):
    pass


def exchange_cookies(refresh_token):
    pass

You need httpx and pyaes installed from PyPI. Refreshing the access token and obtaining additional cookies still have to be implemented.

Then you can request the Goodreads API using x-amz-access-token ACCESS_TOKEN and User-Agent Goodreads/4.0.1 (iPhone; iOS 15.4.1; Scale/3.00) in your request headers.
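As a small illustration of that last step, the two headers can be built like this. The helper name is hypothetical, and the token value is a placeholder for whatever the register call returned.

```python
GOODREADS_APP_USER_AGENT = "Goodreads/4.0.1 (iPhone; iOS 15.4.1; Scale/3.00)"


def goodreads_headers(access_token: str) -> dict:
    """Headers that authenticate an API request with the bearer access token."""
    if not access_token:
        raise ValueError("access_token must not be empty")
    return {
        "x-amz-access-token": access_token,
        "User-Agent": GOODREADS_APP_USER_AGENT,
    }


# Placeholder token; pass the result as headers=... to httpx.get or any other client.
print(goodreads_headers("ACCESS_TOKEN"))
```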

@csandman
Author

csandman commented Jun 1, 2022

Thanks for the detailed example! Now time to see if I can translate this to node...

@mkb79

mkb79 commented Jun 1, 2022

Thanks for the detailed example! Now time to see if I can translate this to node...

@csandman Have you tried this code out? If yes, could you successfully register/unregister a device? I've only tested this on my machine.

@csandman
Author

csandman commented Jun 1, 2022

haven't tried unregister yet but it does appear to be working! The token you're talking about I assume is the response.success.tokens.bearer.access_token right?
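A minimal sketch of pulling that value out of the parsed JSON, assuming the response nests as `response.success.tokens.bearer` and that a failed login comes back with a `response.challenge` object instead. The function name and the sample payload are made up for illustration.

```python
from typing import Tuple


def extract_bearer(payload: dict) -> Tuple[str, str, int]:
    """Return (access_token, refresh_token, expires_in_seconds) from a register response."""
    response = payload.get("response", {})
    if "challenge" in response:
        # Registration was not completed, e.g. an extra verification step is required.
        reason = response["challenge"].get("challenge_reason", "unknown challenge")
        raise RuntimeError(f"registration challenged: {reason}")
    bearer = response["success"]["tokens"]["bearer"]
    return bearer["access_token"], bearer["refresh_token"], int(bearer["expires_in"])


# Hypothetical payload with placeholder values, shaped like r.json() from register().
sample = {
    "response": {
        "success": {
            "tokens": {
                "bearer": {
                    "access_token": "ACCESS",
                    "refresh_token": "REFRESH",
                    "expires_in": "3600",
                }
            }
        }
    }
}
print(extract_bearer(sample))
```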

I'm also close to finishing a node version, but idk if I did it right. Translating all this buffer manipulation stuff is always a pain haha.

@mkb79

mkb79 commented Jun 1, 2022

Yes, this is the access token. The token is valid for 60 minutes after registration. Before it expires, you have to either deregister or refresh it with the refresh token. Refreshing the token is easy with the correct request headers, params and body.
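One way to handle the 60-minute lifetime is to track the token's age and refresh shortly before it runs out. This is only a sketch: the `BearerToken` class, the five-minute safety margin, and the injected `refresh` callable are all assumptions, not part of any Goodreads or Amazon client.

```python
import time
from typing import Callable


class BearerToken:
    """Track an access token's age so it can be refreshed before it expires."""

    def __init__(self, access_token: str, lifetime_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lifetime_s = lifetime_s
        self._store(access_token)

    def _store(self, access_token: str) -> None:
        self.access_token = access_token
        self._expires_at = self._clock() + self._lifetime_s

    def needs_refresh(self, margin_s: float = 300.0) -> bool:
        """True once the token is within margin_s seconds of expiring."""
        return self._clock() >= self._expires_at - margin_s

    def get(self, refresh: Callable[[], str]) -> str:
        """Return a usable token, calling refresh() for a new one when needed."""
        if self.needs_refresh():
            self._store(refresh())
        return self.access_token


token = BearerToken("ACCESS_TOKEN")
# refresh would wrap the real refresh request; here it is a stand-in.
print(token.get(refresh=lambda: "NEW_ACCESS_TOKEN"))
```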

@djdembeck
Collaborator

I'm having quite the time writing this in TS, since I've never worked with python's byte type and TS doesn't have the same type. @csandman let me know if you get a working POC on this. In the meantime I'll integrate using the API key directly.
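For anyone porting the byte handling, the packed frc layout from the proof of concept (one zero byte, 8 signature bytes, 16 IV bytes, then the encrypted data) can be checked with the standard library alone. This sketch leaves out AES and gzip entirely; the fixed password, IV, and data are arbitrary test values chosen only to exercise the slicing.

```python
import base64
import hashlib
import hmac
from typing import Tuple


def pack(sig: bytes, iv: bytes, data: bytes) -> str:
    """Concatenate 0x00 + sig[:8] + iv[:16] + data and base64-encode the result."""
    return base64.b64encode(b"\x00" + sig[:8] + iv[:16] + data).decode()


def unpack(frc: str) -> Tuple[bytes, bytes, bytes]:
    """Split a packed frc back into (sig, iv, data) by fixed byte offsets."""
    raw = base64.b64decode(frc + "=" * (-len(frc) % 4))
    return raw[1:9], raw[9:25], raw[25:]


# Arbitrary fixed inputs so the result is reproducible in another language.
key = hashlib.pbkdf2_hmac("sha1", b"0123ABCD", b"HmacSHA256", 1000, 16)
iv = bytes(range(16))
data = b"\x01" * 32

sig = hmac.new(key, iv + data, hashlib.sha256).digest()
frc = pack(sig, iv, data)
got_sig, got_iv, got_data = unpack(frc)

expected = hmac.new(key, got_iv + got_data, hashlib.sha256).digest()[:8]
signature_ok = hmac.compare_digest(got_sig, expected)
print(signature_ok)
```

In Node the same offsets map to `buf.slice(1, 9)`, `buf.slice(9, 25)` and `buf.slice(25)` on a `Buffer`.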

@csandman
Author

csandman commented Jun 1, 2022

Will do, I feel like I'm close but for some reason it's still not working. I've done similar conversions of Python code to TS before, and I've found using the Node.js Buffer instead of Python's bytes type is generally the way to go. If I can't figure out what's wrong just from tinkering then I'll probably end up breaking it down into pieces to figure out which part went wrong. It's just tough to compare results when multiple values are randomly generated each time.

Here's what I have so far if you want to check it out:


EDIT: I ended up figuring it out! Man that was rough, had to learn way more about how the crypto module from Node works than I thought I ever would haha. But in the end, I was able to get it all working with only built in node modules (besides node-fetch but you can replace that part with whatever you want). I decided to replace this code to keep this post from getting ridiculously long.

Let me know if you have any trouble getting it working!

// types/goodreads.ts
export interface GoodreadsFrc {
  ApplicationVersion: string;
  DeviceOSVersion: string;
  ScreenWidthPixels: string;
  TimeZone: string;
  ScreenHeightPixels: string;
  ApplicationName: string;
  DeviceJailbroken: boolean;
  DeviceLanguage: string;
  DeviceFingerprintTimestamp: number;
  ThirdPartyDeviceId: string;
  DeviceName: string;
  Carrier: string;
}

export interface GoodreadsRegisterRequest {
  requested_extensions: string[];
  cookies: {
    website_cookies: string[];
    domain: string;
  };
  registration_data: {
    domain: string;
    app_version: string;
    device_type: string;
    os_version: string;
    device_serial: string;
    device_model: string;
    app_name: string;
    software_version: string;
  };
  auth_data: {
    user_id_password: {
      user_id: string;
      password: string;
    };
  };
  user_context_map: {
    frc: string;
  };
  requested_token_type: string[];
}

export interface GoodreadsRegisterFailureResponse {
  response: {
    challenge: {
      challenge_reason: string;
      uri: string;
      required_authentication_method: string;
    };
  };
  request_id: string;
}

export interface GoodreadsRegisterSuccessResponse {
  response: {
    success: {
      extensions: {
        device_info: {
          device_name: string;
          device_serial_number: string;
          device_type: string;
        };
        customer_info: {
          account_pool: string;
          preferred_marketplace: string;
          country_of_residence: string;
          user_id: string;
          home_region: string;
          name: string;
          given_name: string;
          source_of_country_of_residence: string;
        };
      };
      tokens: {
        mac_dms: {
          device_private_key: string;
          adp_token: string;
        };
        bearer: {
          access_token: string;
          refresh_token: string;
          expires_in: string;
        };
      };
      customer_id: string;
    };
  };
  request_id: string;
}

export interface GoodreadsDeregisterFailureResponse {
  response: {
    error: {
      code: string;
      message: string;
    };
  };
  request_id: string;
}

export interface GoodreadsDeregisterSuccessResponse {
  response: {
    success: Record<string, never>;
  };
  request_id: string;
}

export interface GoodreadsRefreshFailureResponse {
  error_index: string;
  error_description: string;
  error: string;
}

export interface GoodreadsRefreshSuccessResponse {
  access_token: string;
  token_type: string;
  expires_in: number;
}

export interface GoodreadsCookieFailureResponse {
  response: {
    error: {
      code: string;
      detail: string;
      message: string;
    };
  };
  request_id: string;
}

export interface GoodreadsCookie {
  Path: string;
  Secure: boolean;
  Value: string;
  Expires: string;
  HttpOnly: boolean;
  Name: string;
}

export interface GoodreadsCookieSuccessResponse {
  response: {
    tokens: {
      cookies: {
        ".goodreads.com": GoodreadsCookie[];
      };
    };
  };
  request_id: string;
}

// Separate module: registration, deregistration, token refresh, and cookie exchange
import {
  createCipheriv,
  createDecipheriv,
  createHmac,
  pbkdf2Sync,
  randomBytes,
  randomFill,
  randomUUID,
} from "crypto";
import fetch from "node-fetch";
import type {
  GoodreadsCookieFailureResponse,
  GoodreadsCookieSuccessResponse,
  GoodreadsDeregisterFailureResponse,
  GoodreadsDeregisterSuccessResponse,
  GoodreadsFrc,
  GoodreadsRefreshFailureResponse,
  GoodreadsRefreshSuccessResponse,
  GoodreadsRegisterFailureResponse,
  GoodreadsRegisterRequest,
  GoodreadsRegisterSuccessResponse,
} from "types/goodreads";
import { promisify } from "util";
import { gunzipSync, gzipSync } from "zlib";

const REGISTER_URL = "https://api.amazon.com/auth/register";
const DEREGISTER_URL = "https://api.amazon.com/auth/deregister";
const REFRESH_URL = "https://api.amazon.com/auth/token";
const COOKIES_URL = "https://api.amazon.com/ap/exchangetoken/cookies";
const USER_AGENT = "AmazonWebView/GoodreadsForIOS App/4.0.1/iOS/15.4.1/iPhone";

class FrcCookieHelper {
  static FRC_SIG_SALT = Buffer.from("HmacSHA256");

  static FRC_AES_SALT = Buffer.from("AES/CBC/PKCS7Padding");

  static CIPHER_ALGORITHM = "aes-128-cbc";

  password: string;

  constructor(password: string) {
    this.password = password;
  }

  getKey(salt: Buffer) {
    return pbkdf2Sync(this.password, salt, 1000, 16, "sha1");
  }

  getSignatureKey() {
    return this.getKey(FrcCookieHelper.FRC_SIG_SALT);
  }

  getAesKey() {
    return this.getKey(FrcCookieHelper.FRC_AES_SALT);
  }

  static getRandomIv() {
    return new Promise<Uint8Array>((resolve, reject) => {
      randomFill(new Uint8Array(16), (err, arr) => {
        if (err) {
          reject(err);
          return; // do not fall through to resolve() after a failure
        }

        resolve(arr);
      });
    });
  }

  static unpack(frc: string): [Buffer, Buffer, Buffer] {
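    // Pad to a multiple of 4 characters; add nothing when already aligned.
    const pad = "=".repeat((4 - (frc.length % 4)) % 4);
    const newFrc = Buffer.from(frc + pad, "base64");
    // Layout: byte 0 is always 0, then 8 signature bytes, 16 IV bytes, then the data.
    const sig = newFrc.slice(1, 9);
    const iv = newFrc.slice(9, 25);
    const data = newFrc.slice(25);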
    return [sig, iv, data];
  }

  static pack(sig: Buffer, iv: Uint8Array, data: Buffer): string {
    // Layout: one zero byte, the first 8 signature bytes, the 16-byte IV, then the data.
    const frc = Buffer.concat([Buffer.from([0x00]), sig.slice(0, 8), iv, data]);
    // Leave the raw bytes untouched: a trailing 0x3d byte is data, not base64 padding.
    // toString("base64") already emits correctly padded output.
    return frc.toString("base64");
  }

  verifySignature(frc: string): boolean {
    const key = this.getSignatureKey();
    const [sig, iv, data] = FrcCookieHelper.unpack(frc);
    const hmac = createHmac("sha256", key);
    hmac.update(Buffer.concat([iv, data]));
    const newSignature = hmac.digest();
    // Buffers are objects, so === compares identity; compare the bytes instead.
    return sig.equals(newSignature.slice(0, sig.length));
  }

  decrypt(frc: string, verifySignature = true): Buffer {
    if (verifySignature && !this.verifySignature(frc)) {
      throw new Error("frc signature mismatch");
    }

    const key = this.getAesKey();
    const [, iv, data] = FrcCookieHelper.unpack(frc);

    const decipher = createDecipheriv(
      FrcCookieHelper.CIPHER_ALGORITHM,
      key,
      iv
    );
    const decrypted = decipher.update(data);
    const decryptedFinal = decipher.final();
    const decryptedData = Buffer.concat([decrypted, decryptedFinal]);

    return gunzipSync(decryptedData);
  }

  async encrypt(data: string | GoodreadsFrc): Promise<string> {
    let dataStr: string;

    if (typeof data === "object") {
      dataStr = JSON.stringify(data, null, 2);
    } else {
      dataStr = data;
    }

    const zip = gzipSync(dataStr);
    const gzippedData = Buffer.concat([
      zip.slice(0, 8),
      Buffer.from([0x00, 0x13]),
      zip.slice(10),
    ]);

    const aesKey = this.getAesKey();
    const iv = await FrcCookieHelper.getRandomIv();
    const cipher = createCipheriv(FrcCookieHelper.CIPHER_ALGORITHM, aesKey, iv);
    const encrypted = cipher.update(gzippedData);
    const encryptedFinal = cipher.final();
    const encryptedData = Buffer.concat([encrypted, encryptedFinal]);

    const sigKey = this.getSignatureKey();
    const hmac = createHmac("sha256", sigKey);
    hmac.update(Buffer.concat([iv, encryptedData]));
    const signature = hmac.digest();

    const packed = FrcCookieHelper.pack(signature, iv, encryptedData);

    // Add only the padding needed to reach a multiple of 4 characters.
    return packed + "=".repeat((4 - (packed.length % 4)) % 4);
  }
}

const randomBytesAsync = promisify(randomBytes);

export const register = async (username: string, password: string) => {
  const deviceSerial = await randomBytesAsync(16).then((buf) =>
    buf.toString("hex").toUpperCase()
  );

  const frcBase: GoodreadsFrc = {
    ApplicationVersion: "4.1",
    DeviceOSVersion: "iOS/15.5",
    ScreenWidthPixels: "428",
    TimeZone: "+02:00",
    ScreenHeightPixels: "926",
    ApplicationName: "Goodreads",
    DeviceJailbroken: false,
    DeviceLanguage: "en-DE",
    DeviceFingerprintTimestamp: new Date().getTime(),
    ThirdPartyDeviceId: randomUUID().toUpperCase(),
    DeviceName: "iPhone",
    Carrier: "Vodafone.de",
  };

  const frcHelper = new FrcCookieHelper(deviceSerial);
  const frc = await frcHelper.encrypt(frcBase);

  const headers = {
    "x-amzn-identity-auth-domain": "goodreads.com",
    "User-Agent": USER_AGENT,
    "Accept-Encoding": "gzip",
    Accept: "application/json",
    "Accept-Language": "en-DE",
    "Accept-Charset": "utf-8",
  };

  const body: GoodreadsRegisterRequest = {
    requested_extensions: ["device_info", "customer_info"],
    cookies: {
      website_cookies: [],
      domain: ".goodreads.com",
    },
    registration_data: {
      domain: "Device",
      app_version: "4.1",
      device_type: "A3NWHXTQ4EBCZS",
      os_version: "15.5",
      device_serial: deviceSerial,
      device_model: "iPhone",
      app_name: "GoodreadsForIOS App",
      software_version: "1",
    },
    auth_data: {
      user_id_password: {
        user_id: username,
        password,
      },
    },
    user_context_map: {
      frc,
    },
    requested_token_type: ["bearer", "mac_dms", "website_cookies"],
  };

  const res = await fetch(REGISTER_URL, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });

  const resData = (await res.json()) as
    | GoodreadsRegisterSuccessResponse
    | GoodreadsRegisterFailureResponse;

  console.log("GOODREADS AUTH RESPONSE");
  console.dir(resData, { depth: null });

  if ("challenge" in resData.response) {
    throw new Error(resData.response.challenge.challenge_reason);
  }

  return resData as GoodreadsRegisterSuccessResponse;
};

export const deregister = async (
  accessToken: string
): Promise<GoodreadsDeregisterSuccessResponse> => {
  const res = await fetch(DEREGISTER_URL, {
    method: "POST",
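    headers: {
      Authorization: `Bearer ${accessToken}`,
      // The body is JSON, matching what httpx sends for json= in the Python version.
      "Content-Type": "application/json",
    },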
    body: JSON.stringify({
      deregister_all_existing_accounts: true,
    }),
  });

  const resData = (await res.json()) as
    | GoodreadsDeregisterSuccessResponse
    | GoodreadsDeregisterFailureResponse;

  console.log("GOODREADS DEREGISTER RESPONSE");
  console.dir(resData, { depth: null });

  if ("error" in resData.response) {
    throw new Error(resData.response.error.message);
  }

  return resData as GoodreadsDeregisterSuccessResponse;
};

export const refreshAccessToken = async (
  refreshToken: string
): Promise<GoodreadsRefreshSuccessResponse> => {
  const res = await fetch(REFRESH_URL, {
    method: "POST",
    headers: {
      "x-amzn-identity-auth-domain": "goodreads.com",
      "User-Agent": USER_AGENT,
      "Accept-Encoding": "gzip",
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({
      app_name: "GoodreadsForIOS App",
      app_version: "4.0.1",
      "di.sdk.version": "6.12.1",
      source_token: refreshToken,
      package_name: "com.goodreads.Goodreads",
      "di.hw.version": "iPhone",
      platform: "iOS",
      requested_token_type: "access_token",
      source_token_type: "refresh_token",
      "di.os.name": "iOS",
      "di.os.version": "15.4.1",
      current_version: "6.12.1",
    }),
  });

  const resData = (await res.json()) as
    | GoodreadsRefreshSuccessResponse
    | GoodreadsRefreshFailureResponse;

  console.log("GOODREADS REFRESH TOKEN RESPONSE");
  console.dir(resData, { depth: null });

  if ("error_description" in resData) {
    throw new Error(resData.error_description);
  }

  return resData as GoodreadsRefreshSuccessResponse;
};

export const exchangeCookies = async (
  refreshToken: string
): Promise<GoodreadsCookieSuccessResponse> => {
  const res = await fetch(COOKIES_URL, {
    method: "POST",
    headers: {
      "x-amzn-identity-auth-domain": "goodreads.com",
      "User-Agent": USER_AGENT,
      "Accept-Encoding": "gzip",
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({
      "openid.assoc_handle": "amzn_goodreads_web_na",
      app_name: "GoodreadsForIOS App",
      app_version: "4.0.1",
      "di.sdk.version": "6.12.1",
      domain: ".goodreads.com",
      source_token: refreshToken,
      "di.hw.version": "iPhone",
      cookies: "eyJjb29raWVzIjp7Ii5nb29kcmVhZHMuY29tIjpbXX19",
      requested_token_type: "auth_cookies",
      source_token_type: "refresh_token",
      "di.os.name": "iOS",
      "di.os.version": "15.4.1",
    }),
  });

  const resData = (await res.json()) as
    | GoodreadsCookieSuccessResponse
    | GoodreadsCookieFailureResponse;

  console.log("GOODREADS COOKIES RESPONSE");
  console.dir(resData, { depth: null });

  if ("error" in resData.response) {
    throw new Error(resData.response.error.message);
  }

  return resData as GoodreadsCookieSuccessResponse;
};

@csandman
Author

csandman commented Jun 2, 2022

I ended up figuring it out, updated my previous comment with a working example.
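For anyone comparing a port against the Python version, the random inputs can be pinned so that each stage is checked on its own. A minimal sketch for the key derivation stage follows; the fixed serial is an arbitrary stand-in for the random device serial.

```python
import hashlib

# Arbitrary fixed value standing in for the randomly generated device serial.
SERIAL = "00112233445566778899AABBCCDDEEFF"


def derive_key(salt: bytes) -> bytes:
    """Same derivation as the proof of concept: PBKDF2-HMAC-SHA1, 1000 rounds, 16 bytes."""
    return hashlib.pbkdf2_hmac("sha1", SERIAL.encode(), salt, 1000, 16)


sig_key = derive_key(b"HmacSHA256")
aes_key = derive_key(b"AES/CBC/PKCS7Padding")

# Print hex so the values can be compared byte for byte with another implementation.
print("signature key:", sig_key.hex())
print("aes key:      ", aes_key.hex())
```

In Node, `pbkdf2Sync(serial, salt, 1000, 16, "sha1").toString("hex")` with the same serial and salts should print identical values. Once the keys match, the same trick works for the IV by passing a fixed 16-byte value instead of a random one.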

@csandman
Author

csandman commented Jun 2, 2022

Oh, I also finally tested actually using the bearer token to pull from the API, and it works! Probably should have made sure of that before I went through the whole process of converting the file, but glad it works anyway haha.

@mkb79

mkb79 commented Jun 2, 2022

Here is the function to refresh the access token:

def refresh_access_token(refresh_token):
    url = "https://api.amazon.com/auth/token"

    headers = {
        "x-amzn-identity-auth-domain": "goodreads.com",
        "User-Agent": USER_AGENT,
        "Accept-Encoding": "gzip"
    }

    body = {
        "app_name": "GoodreadsForIOS App",
        "app_version": "4.0.1",
        "di.sdk.version": "6.12.1",
        "source_token": refresh_token,
        "package_name": "com.goodreads.Goodreads",
        "di.hw.version": "iPhone",
        "platform": "iOS",
        "requested_token_type": "access_token",
        "source_token_type": "refresh_token",
        "di.os.name": "iOS",
        "di.os.version": "15.4.1",
        "current_version": "6.12.1"
    }

    r = httpx.post(url, data=body, headers=headers)
    return r

@mkb79

mkb79 commented Jun 2, 2022

Finally, here comes the part that exchanges the token for cookies:

def exchange_cookies(refresh_token):
    url = "https://api.amazon.com/ap/exchangetoken/cookies"

    headers = {
        "x-amzn-identity-auth-domain": "goodreads.com",
        "User-Agent": USER_AGENT,
        "Accept-Encoding": "gzip"
    }

    body = {
        "openid.assoc_handle": "amzn_goodreads_web_na",
        "app_name": "GoodreadsForIOS App",
        "app_version": "4.0.1",
        "di.sdk.version": "6.12.1",
        "domain": ".goodreads.com",
        "source_token": refresh_token,
        "di.hw.version": "iPhone",
        "cookies": "eyJjb29raWVzIjp7Ii5nb29kcmVhZHMuY29tIjpbXX19",
        "requested_token_type": "auth_cookies",
        "source_token_type": "refresh_token",
        "di.os.name": "iOS",
        "di.os.version": "15.4.1"
    }

    r = httpx.post(url, data=body, headers=headers)
    return r

@mkb79

mkb79 commented Jun 2, 2022

So, it's over to you for further progress ;)! If you need any help, feel free to contact me.

@csandman
Author

csandman commented Jun 2, 2022

@mkb79 thanks for all the extra info! I was just thinking about asking if you had a hint on the other requests. Also thanks a ton for this whole thing, I can definitely think of a few applications for it!

So I was able to get the refresh function working, but I did have to add "Content-Type": "application/json" to the request headers. However, for some reason the exchange cookies function isn't working. I'm getting the following error when I try:

{
  response: {
    error: {
      code: 'MissingValue',
      detail: 'Missing parameter: app_name',
      message: 'One or more required values are missing'
    }
  },
  request_id: 'F8Q090DH5W85QP7REAGK'
}

Which is odd because I copied your body exactly, and the app_name is in there. Any ideas?

I was also curious, what's the exchange_cookies function even for? I'm having trouble seeing a case where one might need cookies in this whole process, considering the whole point of this was to get a bearer token for the header.

@mkb79

mkb79 commented Jun 2, 2022

So I was able to get the refresh function working, but I did have to add "Content-Type": "application/json" to the request headers.

The right Content-Type for https://api.amazon.com/auth/token and https://api.amazon.com/ap/exchangetoken/cookies is application/x-www-form-urlencoded. httpx sets this automatically when the request body is posted as data instead of json (like the register and deregister functions), which is why the header does not appear explicitly in my code. Sorry for the confusion.
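To illustrate the difference between the two encodings, here is a minimal TypeScript sketch. It sends no request; the fields are a shortened subset of the body above, and `encodeForm` / `encodeJson` are hypothetical helper names used only for this example.

```typescript
// Hypothetical helpers, for illustration only: the same fields serialized
// in the two ways discussed above.
type Fields = Record<string, string>;

// application/x-www-form-urlencoded: key=value pairs joined with "&"
const encodeForm = (input: Fields): string =>
  new URLSearchParams(input).toString();

// application/json: a single JSON document
const encodeJson = (input: Fields): string => JSON.stringify(input);

const sampleFields: Fields = {
  app_name: "GoodreadsForIOS App",
  requested_token_type: "access_token",
};

const formEncoded = encodeForm(sampleFields);
const jsonEncoded = encodeJson(sampleFields);

console.log(formEncoded);
console.log(jsonEncoded);
```

A server that expects the first form will generally not find `app_name` in a body sent in a different encoding, even if the field is present.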

Which is odd because I copied your body exactly, and the app_name is in there. Any ideas?

Maybe the solution is sending the data in urlencoded format? Or can you post your code implementation?

I was also curious, what's the exchange_cookies function even for? I'm having trouble seeing a case where one might need cookies in this whole process, considering the whole point of this was to get a bearer token for the header.

Some requests use these cookies in addition to the access token, but I had no issues sending the requests without the cookies, so I posted the code here for completeness. Maybe these cookies can be used to make authenticated requests to Goodreads.com?!

Edit:
I mean authenticated requests to the Goodreads.com webpages and not using the API.

@csandman
Author

csandman commented Jun 2, 2022

The right Content-Type for https://api.amazon.com/auth/token and https://api.amazon.com/ap/exchangetoken/cookies is application/x-www-form-urlencoded.

Interesting that you say that because I saw the same thing in some different AWS docs, but it has been working so far for all of the other requests to use the application/json type. I ended up trying out using form data instead and am still running into the same issue. This is my code so far if you're curious:

import FormData from "form-data";
import fetch from "node-fetch";

const COOKIES_URL = "https://api.amazon.com/ap/exchangetoken/cookies";
const USER_AGENT = "AmazonWebView/GoodreadsForIOS App/4.0.1/iOS/15.4.1/iPhone";

export const exchangeCookies = async (refreshToken: string) => {
  const headers = {
    "x-amzn-identity-auth-domain": "goodreads.com",
    "User-Agent": USER_AGENT,
    "Accept-Encoding": "gzip",
    "Content-Type": "application/x-www-form-urlencoded",
  };

  const body = {
    "openid.assoc_handle": "amzn_goodreads_web_na",
    app_name: "GoodreadsForIOS App",
    app_version: "4.0.1",
    "di.sdk.version": "6.12.1",
    domain: ".goodreads.com",
    source_token: refreshToken,
    "di.hw.version": "iPhone",
    cookies: "eyJjb29raWVzIjp7Ii5nb29kcmVhZHMuY29tIjpbXX19",
    requested_token_type: "auth_cookies",
    source_token_type: "refresh_token",
    "di.os.name": "iOS",
    "di.os.version": "15.4.1",
  };

  const formBody = new FormData();
  Object.entries(body).forEach(([key, value]) => {
    formBody.append(key, value);
  });

  const res = await fetch(COOKIES_URL, {
    method: "POST",
    headers,
    body: formBody,
  });

  const resData = await res.json();
  console.log("COOKIES RESPONSE");
  console.dir(resData, {
    depth: null,
  });

  return resData;
};

I'm not overly concerned about making this function work though, I don't think I'd really need it for my use case. But like you said, completeness is nice.

@mkb79

mkb79 commented Jun 3, 2022

@csandman

This works for me

const fetch = require('node-fetch')

const res = fetch('https://api.amazon.com/ap/exchangetoken/cookies', {
    method: 'POST',
    headers:{
      'Content-Type': 'application/x-www-form-urlencoded',
      "x-amzn-identity-auth-domain": "goodreads.com",
      "User-Agent": 'AmazonWebView/GoodreadsForIOS App/4.0.1/iOS/15.4.1/iPhone',
      "Accept-Encoding": "gzip"
    },    
    body: new URLSearchParams({
        "openid.assoc_handle": "amzn_goodreads_web_na",
        app_name: "GoodreadsForIOS App",
        app_version: "4.0.1",
        "di.sdk.version": "6.12.1",
        domain: ".goodreads.com",
        source_token: "YOUR_REFRESH_TOKEN",
        "di.hw.version": "iPhone",
        cookies: "eyJjb29raWVzIjp7Ii5nb29kcmVhZHMuY29tIjpbXX19",
        requested_token_type: "auth_cookies",
        source_token_type: "refresh_token",
        "di.os.name": "iOS",
        "di.os.version": "15.4.1",
    })
})
  .then(res => res.json())
  .then(json => {
    console.dir(json, {
      depth: null,
    });
  })

Edit:
I'm using Node 12 and node-fetch@2 because I'm coding on my iOS device.

@mkb79

mkb79 commented Jun 3, 2022

Interesting that you say that because I saw the same thing in some different AWS docs, but it has been working so far for all of the other requests to use the application/json type.

Some POST requests made by the Goodreads/Audible/Kindle iOS apps are JSON encoded and some are URL encoded. The refresh token and cookie exchange requests are URL encoded. Maybe JSON will work too, but I haven't tried it yet.

@csandman
Author

csandman commented Jun 3, 2022

Great that worked for me, thanks for all the help @mkb79! This will definitely be useful for my own project, and I'm sure @djdembeck can get some use out of it!

Just a heads up, I edited my code again with a more complete example including all of the functions originally mentioned, as well as types for everything (success and error responses) and some basic error handling. @djdembeck I'm sure you'll have other ideas about how to handle the types and errors, as well as the responses (I'm still pretty new to TypeScript) but hopefully this is a good starting point!

@djdembeck
Collaborator

djdembeck commented Jun 3, 2022

Awesome! Love seeing the collaboration. mkb79 is an absolute genius. First he gave us Audible's API docs and methods and now GR 😅 Tremendous help mate and thanks for stopping in to help us on this!

I'm tracking the GR work in a project: https://github.com/laxamentumtech/audnexus/projects/2

@mkb79

mkb79 commented Jun 4, 2022

@djdembeck

Awesome! Love seeing the collaboration. mkb79 is an absolute genius. First he gave us Audible's API docs and methods and now GR 😅 Tremendous help mate and thanks for stopping in to help us on this!

Thanks go to everyone who helped out. Without @csandman there would be no port to Node. I'm only a spare-time hobby coder with little time (currently)! But if I can help out more, feel free to contact me.

@csandman
Author

csandman commented Jun 7, 2022

@mkb79 I definitely appreciate the help, I've already made good use of this in my own project!

One last question though, I notice the unpack, decrypt, and verify_signature functions are not used anywhere in your examples (specifically decrypt as the other two are just called by that). Do you have any use for them, or are they only there for completion from the original code you translated?

@mkb79

mkb79 commented Jun 7, 2022

@csandman

I've already made good use of this in my own project!

That’s good to hear. Is your project something you want to make public?

One last question though, I notice the unpack, decrypt, and verify_signature functions are not used anywhere in your examples (specifically decrypt as the other two are just called by that). Do you have any use for them, or are they only there for completion from the original code you translated?

This is for completion. I used this to decrypt my own frc cookies, which are set by Amazon. I‘m interested in reverse engineering software. But for your use case, you don’t need these functions.

@csandman
Author

csandman commented Jun 7, 2022

@mkb79

That’s good to hear. Is your project something you want to make public?

So my project is a self hosted web app with the main goal of downloading books from OverDrive public libraries (with chapters), scraping metadata from Audible, scraping covers from Audible/iTunes, and then merging them into .m4b files using m4b-tool. It can also be used to download books from your Audible library and remove the DRM on them (I took some inspiration from your audible-cli for a part of this too!).

The other big part of it is an editor I made for customizing the metadata and adjusting the positions of chapters/adding and removing chapters before merging. This part ended up working so well that I ended up adding the functionality to import local book files to merge/fix the metadata on them as well. You can check out some example images of the project if you're curious.

The main reason I'm hesitant about completely open sourcing it is for fear that potential employers might not look so kindly on the "piratey" nature of the app and it could hurt my chances of getting a job in the future, however unlikely that is. Which is a shame because it's the most work I've ever put into a side project haha. However I have no problem sharing it with people who are interested, and just the other day ended up sharing it with a few people on Reddit who wanted to try it out. I'll invite you with read access to the repo in case you want to try it out or check out the source code. @djdembeck I'll add you too in case you're curious, it seems up your alley. It might even give you some ideas for bragibooks!

This is for completion. I used this to decrypt my own frc cookies, which are set by Amazon. I‘m interested in reverse engineering software. But for your use case, you don’t need these functions.

Great, just wanted to make sure! And I can see the appeal, I personally get a certain satisfaction from translating one language into another.

@mkb79

mkb79 commented Jun 7, 2022

@csandman
Thank you very much for the invite. I'll take a deeper look into your project as soon as possible. But what I've seen at first glance is very good!

@djdembeck
Collaborator

@csandman this is quite impressive. I wasn't expecting the level of polish you've put into your project.

I wouldn't make it public, personally. Less for employers (who usually just care about code readability/competence), and more for the hot water of removing DRM. That's just my opinion, though. You've clearly put a lot of work into this, and it would be a shame for it to never see the light of day.

@csandman
Author

csandman commented Jun 8, 2022

@mkb79

Thank you very much for the invite. I'll take a deeper look into your project as soon as possible. But what I've seen at first glance is very good!

I appreciate it!

@djdembeck

I wouldn't make it public, personally. Less for employers (who usually just care about code readability/competence), and more for the hot water of removing DRM.

Yeah that's definitely my other concern. For the most part, providing source code for tools that accomplish the removal of DRM isn't usually attacked that often (from what I've seen), but it's still a concern nonetheless.

I appreciate the feedback though! It has been a long time in the making. It started out as a command line app which just downloaded an entire OverDrive library, scraped metadata for each book, and merged the files all in one go. But I really wanted to be able to run it on my Unraid server, and I wanted to be able to tweak the metadata before merging, so I figured a webapp would be the best way to go. Conveniently, next.js has API routes built in, so moving from a CLI app to a web app wasn't that hard. The main thing that gave me problems was getting the Docker image working. I have very little experience with docker besides this project and it has been a steep learning curve for me haha.

And maybe I will find a way to share it some day. I do think a lot of people would get some good use out of it. You're welcome to use it if you want, or any of the source code if any of it would be helpful in any way. I didn't bother to add a license because I didn't plan on OS'ing it, but I'll probably add an MIT license now that some people have access to it.

@csandman
Author

csandman commented Jul 22, 2022

By the way @djdembeck, one more thing I realized about this is that in many cases, you will get the correct book from Goodreads back if you use the ASIN as the search term. From what I've seen, you will either get the correct Audible book, or no results if you use the ASIN, which would always be more reliably correct than matching by the title/author.

I'll need to do some tests to see how often the ASIN matches to see whether or not this is a reliable method, but it's at least a good first step because you'd know the result is correct.


In my app, I've also started searching both Audible and Goodreads using the ISBN I get back from OverDrive. This seems to give me matches even more reliably, but unfortunately I don't think there's a way to get the ISBN back from Audible (as far as I know). If you end up figuring out how to get that though, it would probably work even better.

@djdembeck
Collaborator

Let me know your findings about the reliability of searching by ASIN!

I've been gearing up Audnexus to make it far easier to add provider 'plugins'. It's likely I'll have the Goodreads plug-in require an explicit ID, and make clients (Plex, AudiobookShelf, etc) do the actual searching. This is how Audible is currently set up, so I think it's logical to stick to that design.

@csandman
Author

csandman commented Aug 2, 2022

@djdembeck After some more testing (not a huge sample size, maybe 20 books) I've found that I had around a 50% success rate or a little lower using the ASIN. So it's definitely not something I'd say you would be able to consistently rely on, but it is still something worth considering as a first approach IMO, as the result returned should be 100% accurate vs a manual comparison.

@csandman
Author

csandman commented Jan 6, 2023

By the way, I figured I'd share my complete implementation of the Goodreads API I'm using in my project: https://github.com/book-tools/audiobook-scraper-web/blob/main/src/backend/api/goodreads/goodreads-api.ts

It's the project I mentioned previously, it's still private as well, but you should have access. Not sure if you've done any work on this yourself, but you could always use some stuff from mine if you haven't. You're welcome to fully copy/modify it if you'd like, take some parts of it, or completely ignore it! Up to you.

I only implemented three Goodreads endpoints, but I'm not really sure if you'd need more than that.


One thing to note, is in my last commit, I switched to using zod for all request handling. I'm not sure if you've ever looked at zod, but it's an amazing tool for assigning types to objects with runtime type safety. For example, when pulling data from the Audible API, you could parse the resulting data with zod instead of manually checking for the required keys, and you'll end up with a fully type-safe response. If the response doesn't match the schema, it will just throw an error, making it much easier to find all mistakes during development. You could also use a tool like this to easily convert your existing Audible response interfaces to a zod schema: https://transform.tools/typescript-to-zod. Here are some examples of how I'm using it for my Goodreads API:

Sorry for getting off-topic, but I think this would be a great tool for Audnexus, especially considering how many guards it seems you have in place to make sure your data is correct.

@djdembeck
Collaborator

Your project has grown quite a bit. Congrats on the work! I might need to steal you some for AudiobookDB 😛

zod sounds fantastic. I hate manually checking! I'll have to look into the docs and try integrating them for key checks in my projects.

@csandman
Author

Your project has grown quite a bit. Congrats on the work! I might need to steal you some for AudiobookDB 😛

Thanks! You're more than welcome to use whatever you like, I didn't make that project with a monetary goal in mind, so I'm happy to share if it helps make other apps better.

zod sounds fantastic. I hate manually checking! I'll have to look into the docs and try integrating them for key checks in my projects.

I discovered it a few months ago and to me, it feels like the future of runtime type checking in situations where types can't be inferred. The same guy who made it also made the initial version of tRPC, which is basically an abstraction of zod used to make applications type-safe in an end-to-end way. It's made specifically for JS apps using a Node.js backend web server, and while I haven't used it personally, I've heard nothing but rave reviews.

The great part about zod though, is you don't always have complete control over the front to back end interaction. For pretty much any external REST API you hit using standard web requests, you have to rely on them to not change their response structure and that the structure you find in documentation is correct. zod really takes out all the guesswork and gives you reliably typed data with minimal boilerplate code.

And one of the coolest parts is that you can get the typing of the output object without having to redefine it as a standalone type.

import { z } from "zod";

const userSchema = z.object({
  username: z.string(),
});

userSchema.parse({ username: "Ludwig" });

// extract the inferred type
type User = z.infer<typeof userSchema>;
// { username: string }

@djdembeck
Collaborator

I want to revisit this. @csandman are you using GR lookup (via search?) or explicit ID lookup? I'm thinking the latter would be safe to add to Audnexus, but I don't love the lack of automation.

@csandman
Author

csandman commented Feb 9, 2023

I want to revisit this. @csandman are you using GR lookup (via search?) or explicit ID lookup?

By ID lookup, do you mean ASIN lookup like I mentioned before? Because in my app I am doing that, but only as a second approach because it's frequently not attached to books on Goodreads. My first approach is to match based on the ISBN of the book, because searching that on Goodreads tends to give results at a much higher rate. And like I mentioned before, the ISBN is available on all books on OverDrive, where I'm sourcing my books from originally. And finally, I just do some fuzzy matching on the title and authors of the book after searching for the book's title.

Alternatively, if you mean lookup by Goodreads ID, I'm not sure what you mean. I wouldn't have access to that before running this function, personally.

Here is my code for matching if you're curious:

// Decompose accented characters (NFD) and drop the combining marks
export const stripDiacritics = (str: string): string =>
  str.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

// Replace every run of non-word characters with a single space
export const removeNonWordChars = (str: string): string =>
  str.replace(/\W+/g, " ");

export const removeSpaces = (str: string): string => str.replace(/\s/g, "");

export const removeExtraSpaces = (str: string): string =>
  str.replace(/\s{2,}/g, " ").trim();

// Lowercase, strip accents and punctuation, then remove all whitespace
export const simplify = (str: string): string =>
  removeSpaces(removeNonWordChars(stripDiacritics(str).toLowerCase()));

export const fuzzyMatch = (
  str1: string,
  str2: string,
  checkIncludes = false
) => {
  const simpleStr1 = simplify(str1);
  const simpleStr2 = simplify(str2);

  return (
    simpleStr1 === simpleStr2 ||
    (checkIncludes &&
      (simpleStr1.includes(simpleStr2) || simpleStr2.includes(simpleStr1)))
  );
};

export const checkAuthorOverlap = (authors1: Author[], authors2: Author[]) => {
  for (let i = 0; i < authors1.length; i += 1) {
    for (let j = 0; j < authors2.length; j += 1) {
      if (fuzzyMatch(authors1[i].name, authors2[j].name)) {
        return true;
      }
    }
  }

  return false;
};

/**
 * Find a goodreads book based on an input book
 *
 * @param book - A book to find a match for
 * @returns A Goodreads book or null if no match was found
 */
async function getGoodreadsMatch(book: Book): Promise<GoodreadsBook | null> {
  const settings = await loadSettings();

  const goodreadsUser = settings.goodreadsUser || process.env.GOODREADS_USER;
  const goodreadsPass = settings.goodreadsPass || process.env.GOODREADS_PASS;

  const goodreads = new GoodreadsApi(goodreadsUser, goodreadsPass);
  await goodreads.init();

  // First try to find a match based on the ISBN if available
  // this tends to have the best results
  if (book.isbn) {
    const goodreadsItemsFromIsbn = await goodreads.searchBooks(book.isbn);
    if (goodreadsItemsFromIsbn.length) {
      return goodreadsItemsFromIsbn[0];
    }
  }

  // If no match was found, try to find a match based on the ASIN
  // this is less likely to return any results but can still be useful
  if (book.asin) {
    const goodreadsItemsFromAsin = await goodreads.searchBooks(book.asin);
    if (goodreadsItemsFromAsin.length) {
      return goodreadsItemsFromAsin[0];
    }
  }

  // If no match was found, try to find a match based on the title
  // the author is still being compared so it shouldn't provide incorrect results
  const goodreadsItems = await goodreads.searchBooks(book.title, {
    searchField: "title",
  });

  for (let i = 0; i < goodreadsItems.length; i += 1) {
    const item = goodreadsItems[i];

    if (
      checkAuthorOverlap(item.authors, book.authors) &&
      fuzzyMatch(book.title, item.title, true)
    ) {
      return item;
    }
  }

  return null;
}

I know manual matching on the title and author(s) probably isn't the most appealing prospect, but it is highly successful. And if you were to implement a similar scoring function, like you have in the Audnexus Plex agent, you could get even better results.
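As a rough idea of what a scoring function could look like, here is a minimal TypeScript sketch. To be clear, this is not how the Audnexus Plex agent scores results; the Levenshtein-based similarity, the 0.6/0.4 weights, and the names `normalizeText`, `similarity`, and `scoreCandidate` are all assumptions chosen for illustration.

```typescript
// Hypothetical shape for the two books being compared
interface Candidate {
  title: string;
  authors: string[];
}

// Strip accents, lowercase, and drop non-word characters
const normalizeText = (s: string): string =>
  s
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/\W+/g, "");

// Classic two-row dynamic-programming Levenshtein edit distance
const levenshtein = (a: string, b: string): number => {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j += 1) prev.push(j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
};

// Similarity in [0, 1]; 1 means identical after normalization
const similarity = (a: string, b: string): number => {
  const x = normalizeText(a);
  const y = normalizeText(b);
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
};

// Weighted score: title similarity plus the best pairwise author similarity.
// The weights are arbitrary and would need tuning against real data.
const scoreCandidate = (wanted: Candidate, found: Candidate): number => {
  const titleScore = similarity(wanted.title, found.title);
  let authorScore = 0;
  for (const w of wanted.authors) {
    for (const f of found.authors) {
      authorScore = Math.max(authorScore, similarity(w, f));
    }
  }
  return 0.6 * titleScore + 0.4 * authorScore;
};
```

You would then score every search result, pick the highest one, and reject it if it falls below some threshold, instead of returning the first exact-ish match.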


One alternative approach I thought of was trying to get a matching ISBN for your Audible books, and using that to search Goodreads, as I believe all Goodreads books have an ISBN attached to them. I got some inspiration from this issue in the audible-cli repo: mkb79/audible-cli#63

Unfortunately, the way the plugin mentioned in that issue works is by searching for the ISBN of a book based on its title/author. Again, not a bulletproof solution. Additionally, I don't believe that package finds the ISBN of the audiobook version of a book, probably just the original physical book's.

However, my idea is that if you could somehow find the Audiobook version of the ISBN (the same edition that's on Audible), you could run a lookup on that ISBN on Audible to confirm you have the right book.

e.g. https://www.audible.com/search?keywords=9780525633723

Because Audible allows you to search by ISBN, you would only ever end up with one result when running that search. So this is a little convoluted (and would involve a few steps) but here is the process as I envisioned it:

  1. Find the audiobook version of an ISBN by searching some form of ISBN database using the book's title and author.
  2. Once found, search that ISBN on Audible. If there's a match, only one result will come back.
  3. Check that resulting book's ASIN against the ASIN of the original book you were searching for.
  4. If the ASIN matches, you have your ISBN, and you can use this with 100% certainty to search Goodreads.
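The four steps above could be sketched roughly like this in TypeScript. The two lookup functions are injected and purely hypothetical (`lookupIsbns` stands in for some ISBN database search, `searchAudibleByIsbn` for an Audible search that returns the ASIN of the single hit, or null); neither corresponds to a real client I know of.

```typescript
// Hypothetical lookup signatures, not real API clients
type IsbnLookup = (title: string, author: string) => Promise<string[]>;
type AudibleIsbnSearch = (isbn: string) => Promise<string | null>;

interface AudibleBook {
  asin: string;
  title: string;
  author: string;
}

const findVerifiedIsbn = async (
  book: AudibleBook,
  lookupIsbns: IsbnLookup,
  searchAudibleByIsbn: AudibleIsbnSearch
): Promise<string | null> => {
  // Step 1: collect candidate audiobook ISBNs by title and author
  const candidates = await lookupIsbns(book.title, book.author);

  for (const isbn of candidates) {
    // Step 2: search Audible by ISBN (at most one result is expected)
    const asin = await searchAudibleByIsbn(isbn);

    // Step 3: compare the returned ASIN with the book we started from
    if (asin === book.asin) {
      // Step 4: this ISBN is confirmed and can be used to search Goodreads
      return isbn;
    }
  }

  // No candidate ISBN resolved back to the original ASIN
  return null;
};
```

Passing the lookups in as parameters keeps the verification logic testable without hitting any of the actual sites.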

This is super convoluted, I know, I'm mostly just brainstorming at this point to work towards a fail-proof matching solution. It's pretty frustrating that you can't get the book's ISBN directly from the Audible API, considering they obviously have that information available if you're allowed to search by it. And realistically, having the ISBN would probably be one of the most useful things you could offer from this package.


The main approach I've thought of for actually finding the audiobook ISBN for a book is using OverDrive. There are two different ways to search OverDrive.

  1. Search an individual library's OverDrive site. This method has the advantage of giving you the ISBN for every result in the page, without requiring any manual DOM parsing. This can be done using a trick I figured out. When pretty much any OverDrive page loads, they add an OverDrive object to the window which contains information about the books listed on that page. And this is done using a script embedded in the HTML of the returned page, so you can easily grab that information with simple regex and JSON parsing. Here's an example search and the resulting json from that search. You can see an example of how I use this to get book info without Puppeteer or Cheerio here. This approach is good because of the ease of getting this information from the page, but the drawback is that you're limited to books available in that library's collection. If you pick a library like the New York Public Library, you'll have a massive collection to work with, but this can still be a limiting factor.
  2. The alternative approach is to search the root https://www.overdrive.com/ site. This should give you results across their entire platform, which includes many more books overall. However, unlike the individual library sites, they don't include this convenient metadata object on the search results page. They do include it on the individual books' pages, but you'd have to fetch those as well in order to get that information. Another drawback of this approach is that, without any search limitation, I've found the results to be much less accurate than the individual library sites. In case you're curious, here is an example search response and the accompanying book page.
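As a rough illustration of the regex-plus-JSON approach from option 1, here is a minimal TypeScript sketch. The sample HTML and the exact `window.OverDrive.mediaItems = ...;` assignment are assumptions made for this example, so the real page markup may differ, and `extractMediaItems` is a hypothetical helper name.

```typescript
// Assumed markup for illustration; the real page may name or lay out
// this assignment differently.
const sampleHtml = `
<script>
  window.OverDrive = window.OverDrive || {};
  window.OverDrive.mediaItems = {"12345":{"title":"Example Book","formats":[]}};
</script>`;

// Pull the object literal out of the embedded script and parse it as JSON.
// Limitations: expects the assignment on a single line, and would break if
// a string value inside the JSON contained "};".
const extractMediaItems = (html: string): Record<string, unknown> | null => {
  const match = /window\.OverDrive\.mediaItems\s*=\s*(\{.*?\});/.exec(html);
  if (!match) {
    return null;
  }
  return JSON.parse(match[1]) as Record<string, unknown>;
};

const mediaItems = extractMediaItems(sampleHtml);
console.log(mediaItems ? Object.keys(mediaItems) : "no media items found");
```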

In either case, you can get the ISBN by running the script I mentioned above, and then once you have the info access it like this:

mediaItems[bookId].formats[0].identifiers

and then you look for the identifier with type === "ISBN" and grab its value.
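In TypeScript, that lookup might look something like the sketch below. The `Identifier` and `Format` interfaces only model the fields mentioned here and are assumptions about the real shape; checking every format instead of just `formats[0]` is also my own addition for robustness.

```typescript
// Minimal assumed shapes; the real objects carry many more fields
interface Identifier {
  type: string;
  value: string;
}

interface Format {
  identifiers: Identifier[];
}

// Return the first ISBN found in any format, or null if there is none
const findIsbn = (formats: Format[]): string | null => {
  for (const format of formats) {
    const isbn = format.identifiers.find((id) => id.type === "ISBN");
    if (isbn) {
      return isbn.value;
    }
  }
  return null;
};

// Made-up sample data for illustration
const sampleFormats: Format[] = [
  {
    identifiers: [
      { type: "ASIN", value: "B000000000" },
      { type: "ISBN", value: "9780525633723" },
    ],
  },
];

console.log(findIsbn(sampleFormats));
```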

Accessing the media item on the root OverDrive site is a bit different as well. There the info is added to that page at:

dataLayer[0].content.formats[0].identifiers

And the media item could be pulled from the page with a regex that's something like this:

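const bookPage = await fetch(bookUrl);
const bookPageText = await bookPage.text();

// Capture the array literal assigned to dataLayer in the embedded script
const dataLayerMatch = /dataLayer ?=(.*?]);/.exec(bookPageText);
if (!dataLayerMatch) {
  throw new Error("dataLayer not found in page");
}

const dataLayer = JSON.parse(dataLayerMatch[1].trim());

const mediaItem = dataLayer[0].content;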

Sorry for the information dump, I just figured I'd provide you with everything I think might help you in this process that I've learned over time building my own app. A lot of this parsing is already coded by me in my own app, so again, if you want to use anything from it feel free!

@djdembeck
Collaborator

Closing this because AudiobookDB is opening soon-ish and has this capability.

@djdembeck closed this as not planned on Jul 24, 2023
@csandman
Author

csandman commented Jul 25, 2023

Closing this because AudiobookDB is opening soon-ish and has this capability.

Is AudiobookDB open source? Or will it be? You've mentioned it a couple times but I'm not exactly sure what it is yet haha. It sounds very up my alley though, and I'm very curious to check it out! And could also be interested in helping with it!

@djdembeck
Collaborator

Closing this because AudiobookDB is opening soon-ish and has this capability.

Is AudiobookDB open source? Or will it be? You've mentioned it a couple times but I'm not exactly sure what it is yet haha. It sounds very up my alley though, and I'm very curious to check it out! And could also be interested in helping with it!

It's going to start closed source. The scale I would like to bring it up to isn't achievable without significant backing.

I meant to release it last year, but had a personal loss that killed my ambition to work on it. Thankfully doing better now and the ideas are flowing again! Hoping to have something in testers hands in a month or so.

@csandman
Author

The scale I would like to bring it up to isn't achievable without significant backing

By backing do you mean like hosting resources?

I meant to release it last year, but had a personal loss that killed my ambition to work on it.

Ah yeah, you've mentioned that before, I'm sorry to hear it. Glad to hear you're doing better now though!

Well either way, excited to check it out! I can guess the general purpose for it, and it's something I've thought about doing myself, so I'm very curious how it will turn out.

@benonymity

It's going to start closed source. The scale I would like to bring it up to isn't achievable without significant backing.

I meant to release it last year, but had a personal loss that killed my ambition to work on it. Thankfully doing better now and the ideas are flowing again! Hoping to have something in testers hands in a month or so.

Hey! I've been watching this thread for a while for all the helpful Goodreads reverse engineering, and am also interested in this AudiobookDB idea! I've contributed a bit to the audiobookshelf project, and its developer @advplyr had thought about starting a similar audiobook database. He owns bookdb.org and started the bookdb repo. I don't know what your vision for AudiobookDB is, but the audiobookshelf community is very interested in a crowdsourced database for everything audiobooks, so even if you wouldn't want to collaborate on its development I'm guessing integration with audiobookshelf would be mutually beneficial. Food for thought!

@djdembeck
Collaborator

djdembeck commented Jul 26, 2023

It's going to start closed source. The scale I would like to bring it up to isn't achievable without significant backing.

I meant to release it last year, but had a personal loss that killed my ambition to work on it. Thankfully doing better now and the ideas are flowing again! Hoping to have something in testers hands in a month or so.

Hey! I've been watching this thread for a while for all the helpful Goodreads reverse engineering, and am also interested in this AudiobookDB idea! I've contributed a bit to the audiobookshelf project, and its developer @advplyr had thought about starting a similar audiobook database. He owns bookdb.org and started the bookdb repo. I don't know what your vision for AudiobookDB is, but the audiobookshelf community is very interested in a crowdsourced database for everything audiobooks, so even if you wouldn't want to collaborate on its development I'm guessing integration with audiobookshelf would be mutually beneficial. Food for thought!

Hey, thanks for reaching out! I'll definitely be making announcements to various communities when the beta opens. As it stands, I know audiobookshelf is using our other project, Audnexus.

To divulge a bit about my progress on AbDB: I'm currently working on the CRUD frontend pages for main data types. These are mostly done, but I'm unsatisfied with some of the designs so I've been refining them. Once those are in place, I'll reach out to some interested people to provide feedback on the MVP, so we can make things like schema and major design changes before launching beta to everyone.

Additionally, we are having our lawyer work on what we are allowed to host from major companies, write our ToS, etc. This doesn't have much time impact until wide release.

The MVP won't have edit history, which has been a technical hurdle that I don't want to block getting feedback.

The API is and has been ready to go. At the MVP stage, I'm going to fork out Audnexus agent into an AbDB agent for Plex to get some real world usage (and enjoy the fruit of my labor).

I will make a public issue tracker and maybe a Discord for communication. API docs are already written and will be released when the MVP is set up on a dev server.

Thanks for all the interest y'all!

@djdembeck
Collaborator

Coming here to follow up. I made an issue with details on how to test the alpha of AudiobookDB and help shape it a bit before wider usage: #689
