Create a sample search replace file #89

Open
haydonryan opened this issue Oct 7, 2024 · 7 comments

@haydonryan
Contributor

Loving this app. Thank you all for the great work.

It would be good to crowdsource some of the word replacements.

There are a bunch of clear ones based on the books I read:
- "$1 million" reads as "dollar one million"
- "2010" reads as "two thousand ten"

I'm currently making these changes on the command line. Happy to contribute mine; I just need to confirm the format.
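For illustration, the two cases above could be written as Python regular expressions roughly like this. This is a minimal sketch, not the project's rule format: the patterns, the `fix_numbers` name and the choice of rewriting 2010 as "twenty 10" are all assumptions.

```python
import re

# Hypothetical rule: "$1 million" -> "1 million dollars", so the dollar sign
# is no longer read before the amount. Accepts an optional decimal part.
CURRENCY = re.compile(r"\$(\d+(?:\.\d+)?) (thousand|million|billion|trillion)\b")

# Hypothetical rule: "2010" -> "twenty 10" (intended to be read "twenty ten").
# Heuristic only: not every four-digit number in a book is a year.
YEAR_201X = re.compile(r"\b20(1\d)\b")

def fix_numbers(text: str) -> str:
    text = CURRENCY.sub(r"\1 \2 dollars", text)
    text = YEAR_201X.sub(r"twenty \1", text)
    return text

print(fix_numbers("In 2010 it cost $1 million."))
# In twenty 10 it cost 1 million dollars.
```

The year rule is deliberately narrow (2010-2019 only), since how a number should be spoken depends on context the regex cannot see.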

@haydonryan
Contributor Author

I also wonder whether we should consider having two files: an included one that has been thoroughly vetted, and one for custom replacements.
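A minimal sketch of how the two-file idea could be layered, assuming both files have already been parsed into (pattern, replacement) pairs. The vetted rules run first and the custom rules run on their output, so a user can refine or extend the shared set. `VETTED_RULES`, `CUSTOM_RULES` and `apply_rule_sets` are hypothetical names.

```python
import re

# Hypothetical rule sets. In practice each would be parsed from its own file:
# one vetted file shipped with the project, one custom file per user.
VETTED_RULES = [
    (r"\b2000\b", "two thousand"),
]
CUSTOM_RULES = [
    (r"Jr\.’s", "juniors"),
]

def apply_rule_sets(text, *rule_sets):
    """Apply each rule set in order; later sets see the output of earlier ones."""
    for rules in rule_sets:
        for pattern, replacement in rules:
            text = re.sub(pattern, replacement, text)
    return text

print(apply_rule_sets("In 2000, Carl Jr.’s team grew.", VETTED_RULES, CUSTOM_RULES))
# In two thousand, Carl juniors team grew.
```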

@p0n1
Owner

p0n1 commented Oct 10, 2024

Hey @haydonryan. Thanks for reaching out. Not sure if the word replacement you mentioned would be similar to PR #80, which we have merged.

> There are a bunch of clear ones based on the books I read.
> $1 million reads as dollar one million
> 2010 reads as two thousand ten.

Also, just curious: which TTS engine are you using?

@haydonryan
Contributor Author

Oh yes, good point: it's definitely going to be specific to the TTS engine. I'm currently using Piper, but I've been thinking about trying https://github.com/coqui-ai/TTS. Since that isn't currently a supported option, I'd export the text files before passing them on.

Still looking for the best free TTS system. I like Piper, but the lack of GPU acceleration is frustrating.

@haydonryan
Contributor Author

haydonryan commented Oct 19, 2024

So the README is helpful, but what regular expression syntax does it use? For example, in my script that runs epub_to_audiobook I have:

```sh
# Numbers will end up in the form "19 20" or "19o4". Order matters:
# 2000 and 2001-2009 must be handled before the generic four-digit rules.
sed -i \
  -e 's/2000/two thousand/g' \
  -e 's/200\([1-9]\)/two thousand and \1/g' \
  -e 's/\([0-9]\{2\}\)0\([0-9]\)/\1o\2/g' \
  -e 's/\([0-9]\{2\}\)\([0-9]\{2\}\)/\1 \2/g' \
  *.txt
```
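Assuming the tool hands each pattern to Python's re.sub, the sed rules above translate fairly directly. Python's syntax writes groups and repetition counts without backslashes, so `\(` `\)` and `\{2\}` become `(` `)` and `{2}`, while `\1` backreferences in the replacement stay the same. `speak_years` is a hypothetical name for this sketch.

```python
import re

# The sed rules, rewritten for Python's re module (same order as the script).
RULES = [
    (r"2000", "two thousand"),
    (r"200([1-9])", r"two thousand and \1"),
    (r"([0-9]{2})0([0-9])", r"\1o\2"),      # 1904 -> 19o4
    (r"([0-9]{2})([0-9]{2})", r"\1 \2"),    # 1920 -> 19 20
]

def speak_years(text: str) -> str:
    # Each rule runs over the whole text before the next one, as with the
    # sequential sed expressions, so "2000" is handled before being split.
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(speak_years("In 1904, 1920 and 2005."))
# In 19o4, 19 20 and two thousand and 5.
```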

And some involve punctuation, e.g.:

```sh
sed -i 's/Jr\.’s/juniors/g' *.txt
```
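A literal dot in a pattern needs escaping. In both sed and Python regexes an unescaped `.` matches any character, so a `Jr.’s` rule also rewrites strings like `Jrx’s`. A small Python check of the difference, using `re.escape` to turn a literal string into a safe pattern (the sample text is made up for the example):

```python
import re

text = "Carl Jr.’s menu, not Jrx’s"

# Unescaped: "." matches any character, so "Jrx’s" is replaced too.
loose = re.sub(r"Jr.’s", "juniors", text)
# Escaped: re.escape backslash-escapes the dot, so only "Jr.’s" matches.
strict = re.sub(re.escape("Jr.’s"), "juniors", text)

print(loose)   # Carl juniors menu, not juniors
print(strict)  # Carl juniors menu, not Jrx’s
```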

@haydonryan
Contributor Author

haydonryan commented Oct 22, 2024

I dug into the code: it seems to call re.sub, so the patterns use Python regex syntax.
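A minimal sketch of what that implies for the file format, assuming one `pattern==replacement` rule per line with `#` comment lines (as in the sample file in this comment), each applied with re.sub in file order. `load_rules` and `apply_rules` are hypothetical names, not the project's functions; splitting on the first `==` also means a pattern cannot itself contain `==`.

```python
import re

def load_rules(lines):
    """Parse hypothetical 'pattern==replacement' lines into compiled rules."""
    rules = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        pattern, sep, replacement = line.partition("==")  # split on first "=="
        if not sep or not pattern:
            continue  # not a rule line
        rules.append((re.compile(pattern), replacement))
    return rules

def apply_rules(text, rules):
    # re.sub semantics: every non-overlapping match is replaced, and
    # \1, \2 ... in the replacement refer to the pattern's capture groups.
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

SAMPLE_FILE = r"""# Search and replace from books I'm listening to:
2000==two thousand
200([1-9])==two thousand and \1
"""

rules = load_rules(SAMPLE_FILE.splitlines(keepends=True))
print(apply_rules("From 2000 to 2005.", rules))
# From two thousand to two thousand and 5.
```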

```
# Search and replace from books I'm listening to:
\$([0-9]+.[0-9])\sbillion==\1 billion dollars
```

Used as a search-and-replace file, this didn't work.

However, this did:

```python
import re
test = "$70 billion"
re.sub(r"\$([0-9]+) billion", r"\1 billion dollars", test)
re.sub(r"\$([0-9]+.*[0-9]*)\sbillion", r"\1 billion dollars", test)
'70.3  billion dollars'
```
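One likely reason the file rule did not fire, if the book text had whole-number amounts like the "$70 billion" test string: `[0-9]+.[0-9]` requires one or more digits, then any single character, then another digit, so "$70 billion" never matches (although "$70.3 billion" would). A quick check, with a hypothetical fixed pattern that escapes the dot and makes the decimal part optional:

```python
import re

REPLACEMENT = r"\1 billion dollars"

# The rule from the search-and-replace file: digits, then ANY one character,
# then one more digit. "$70 billion" has no such part, so nothing changes.
file_rule = r"\$([0-9]+.[0-9])\sbillion"
unchanged = re.sub(file_rule, REPLACEMENT, "$70 billion")

# Hypothetical fix: escaped dot, optional decimal part.
fixed_rule = r"\$([0-9]+(?:\.[0-9]+)?)\sbillion"
fixed_whole = re.sub(fixed_rule, REPLACEMENT, "$70 billion")
fixed_decimal = re.sub(fixed_rule, REPLACEMENT, "$70.3 billion")

print(unchanged)      # $70 billion
print(fixed_whole)    # 70 billion dollars
print(fixed_decimal)  # 70.3 billion dollars
```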

Also, I don't think it should be one regex per line; it's highly likely that you'll get more than one match in a line, e.g.:

"Carls Jr spent $3.1 Billion on advertising." has two items that would not be spoken correctly.

Better to run the search and replace over the whole file.
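For reference, re.sub already replaces every non-overlapping match in the text it is given (its `count` argument defaults to 0, meaning no limit), so what matters is mainly which text each rule runs over. A sketch with hypothetical rules for the two problem spots in the example sentence, each applied to the whole text in turn; `RULES` and `apply_all` are illustrative names:

```python
import re

# Hypothetical rules for the two problem spots in the example sentence.
RULES = [
    (re.compile(r"\bJr\b"), "Junior"),
    # re.IGNORECASE so that a capitalised "Billion" is also caught.
    (re.compile(r"\$(\d+(?:\.\d+)?)\s(billion)\b", re.IGNORECASE), r"\1 \2 dollars"),
]

def apply_all(text: str) -> str:
    # Each rule runs over the whole text, so several rules can fire on one
    # line, and each rule replaces all of its matches, not just the first.
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

sentence = apply_all("Carls Jr spent $3.1 Billion on advertising.")
both = apply_all("$1 billion or $2 Billion")
print(sentence)  # Carls Junior spent 3.1 Billion dollars on advertising.
print(both)      # 1 billion dollars or 2 Billion dollars
```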

@p0n1
Owner

p0n1 commented Oct 25, 2024

> '70.3  billion dollars'

Why would this lead to '70.3 billion dollars'?

@haydonryan
Contributor Author

Sorry, I fudged the example; the code above would give '70 billion dollars'.
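A quick check of what the two patterns actually return. Both give '70 billion dollars' for "$70 billion"; only the second also accepts a decimal amount, but its greedy `.*` can also swallow unrelated text between a dollar sign and "billion" (the "$5 to $7 billion" input is a made-up example):

```python
import re

test = "$70 billion"
GREEDY = r"\$([0-9]+.*[0-9]*)\sbillion"

first = re.sub(r"\$([0-9]+) billion", r"\1 billion dollars", test)
second = re.sub(GREEDY, r"\1 billion dollars", test)

# The second pattern also accepts a decimal amount...
decimal = re.sub(GREEDY, r"\1 billion dollars", "$70.3 billion")
# ...but ".*" can capture unrelated text before "billion".
runaway = re.sub(GREEDY, r"\1 billion dollars", "$5 to $7 billion")

print(first)    # 70 billion dollars
print(second)   # 70 billion dollars
print(decimal)  # 70.3 billion dollars
print(runaway)  # 5 to $7 billion dollars
```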
