Backporting changes from faab42c
hadware committed May 17, 2022
1 parent 814b934 commit eba83f0
Showing 3 changed files with 53 additions and 4 deletions.
31 changes: 31 additions & 0 deletions docs/source/changelog.rst
@@ -4,6 +4,37 @@ Changelog

Version numbers follow `semantic versioning <https://semver.org>`__

phonemizer-3.2.0
----------------

* **bug fixes**

  * Fixed a bug when trying to restore punctuation on very long text.
    See `issue #108 <https://github.com/bootphon/phonemizer/issues/108>`__

* **improvements**

  * Improved consistency in the handling of word separators when
    preserving punctuation, and when using a word separator that is
    not a literal space character. See
    `issue #106 <https://github.com/bootphon/phonemizer/issues/106>`__

* **new features**

  * Added the option to define punctuation with a regular expression.
    Previously only strings were accepted. See
    `PR #120 <https://github.com/bootphon/phonemizer/pull/120>`__

  * In the Python API, the ``punctuation_marks`` parameter can now be
    passed to ``phonemize`` (or a backend constructor) as a ``re.Pattern``
    that defines which characters will be matched as punctuation.
    Passing ``punctuation_marks`` as a ``str`` continues to work as
    before, treating each character in the string as a punctuation mark.

  * Added the optional ``--punctuation-marks-is-regex`` flag to the
    CLI. When used, the CLI attempts to compile a ``re.Pattern``
    from the value passed to ``--punctuation-marks``.
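
The difference between the two accepted forms can be illustrated with the standard ``re`` module alone. The sketch below is not phonemizer's implementation: ``to_pattern`` is a hypothetical helper, introduced here only to show one way a non-empty string of marks and a precompiled pattern could be treated uniformly.

```python
import re
from typing import Union


def to_pattern(marks: Union[str, re.Pattern]) -> re.Pattern:
    """Hypothetical helper: normalize punctuation marks to a compiled pattern.

    Assumes a string argument is non-empty.
    """
    if isinstance(marks, re.Pattern):
        # already a regular expression: use it unchanged
        return marks
    # a plain string: every character is one punctuation mark, so build a
    # character class and escape characters that are special inside it
    return re.compile("[" + re.escape(marks) + "]")


# a string of marks matches each listed character individually
print(to_pattern("!?").findall("hello, world!?"))

# a compiled pattern can express context, e.g. "not followed by a digit"
print(to_pattern(re.compile(r"[,.](?!\d)")).findall("1,000, or so."))
```

A string can only enumerate characters, whereas a pattern can also constrain the context in which a character counts as punctuation.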

phonemizer-3.1.1
----------------

20 changes: 19 additions & 1 deletion docs/source/cli.rst
@@ -20,7 +20,7 @@ See the installed backends with the ``--version`` option:
phonemizer-3.0
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3
-Input/output exemples
+Input/output examples
---------------------

- from stdin to stdout:
@@ -182,6 +182,24 @@ by the **espeak-mbrola** backend):
$ echo "hello, world!" | phonemize --preserve-punctuation --strip
həloʊ, wɜːld!

The default punctuation marks are each of the following characters: ``;:,.!?¡¿—…"«»“”``.
These can be overridden by the ``--punctuation-marks`` option.

.. code-block:: shell

   $ echo "hello, world!" | phonemize --preserve-punctuation --strip --punctuation-marks '!?'
   həloʊ wɜːld!

The punctuation marks can be specified as a regular expression by additionally using the
``--punctuation-marks-is-regex`` option. For example, to preserve the default punctuation marks
except for commas and periods in the middle of numbers, the following will work:

.. code-block:: shell

   $ echo "1,000, or so." | phonemize --preserve-punctuation --strip --punctuation-marks '[;:!?¡¿—…"«»“”]|[,.](?!\d)' --punctuation-marks-is-regex
   wʌn θaʊzənd, ɔːɹ soʊ.

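
To see why the comma inside ``1,000`` is not treated as punctuation, the pattern from the example can be tried on its own with Python's ``re`` module. This is a minimal sketch that exercises only the regular expression, not phonemizer itself; ``find_marks`` is a hypothetical name used for illustration.

```python
import re
from typing import List

# Pattern copied from the CLI example: the default marks, except that a
# comma or period counts only when it is NOT followed by a digit.
PUNCTUATION = re.compile(r'[;:!?¡¿—…"«»“”]|[,.](?!\d)')


def find_marks(text: str) -> List[str]:
    """Return the punctuation marks matched by the pattern, in order."""
    return PUNCTUATION.findall(text)


# The comma in "1,000" is followed by a digit, so the negative lookahead
# rejects it; the second comma and the final period are kept.
print(find_marks("1,000, or so."))
```

The negative lookahead ``(?!\d)`` consumes no characters, so the digit after a rejected comma remains part of the text that is sent to the backend.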
Espeak specific options
-----------------------
6 changes: 3 additions & 3 deletions docs/source/python_examples.rst
@@ -8,7 +8,7 @@ In Python import the ``phonemize`` function with ``from phonemizer import phonem
See :py:meth:`phonemizer.phonemize`.


-Exemple 1: phonemize a text with festival
+Example 1: phonemize a text with festival
-----------------------------------------

The following exemple downloads a text and phonemizes it using the
@@ -42,7 +42,7 @@ syllables by ``|``.
njobs=4)
-Exemple 2: build a lexicon with espeak
+Example 2: build a lexicon with espeak
--------------------------------------

The following exemple extracts a list of words present in a text,
@@ -56,7 +56,7 @@ We consider here the same text as in the previous exemple.
from phonemizer.punctuation import Punctuation
from phonemizer.separator import Separator
-# remove all the punctuation from the text, condidering only the specified
+# remove all the punctuation from the text, considering only the specified
# punctuation marks
text = Punctuation(';:,.!"?()-').remove(text)