Added a github action, improved index.rst, added common_issues.rst to the menu.
podcastle-studio · May 16, 2022 · 0661054
1 parent f735c42
commit 0661054
Showing 3 changed files with 112 additions and 11 deletions.
diff --git a/.github/workflows/doc.yaml b/.github/workflows/doc.yaml
@@ -0,0 +1,45 @@
+# Builds the Sphinx documentation and pushes it to the doc branch, which is then served by GitHub Pages
+
+name: Doc
+
+on: [ push, pull_request ]
+
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+    strategy:
+      max-parallel: 4
+      matrix:
+        python-version: [ 3.7 ]
+    steps:
+      - uses: actions/checkout@v1
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v1
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install
+        run: |
+          python -m pip install --upgrade pip
+          pip install .[doc]
+      - name: Build documentation
+        run: |
+          make --directory=docs html
+          touch ./docs/build/html/.nojekyll
+      - name: Commit documentation changes
+        run: |
+          git clone https://github.com/bootphon/phonemizer.git --branch doc --single-branch doc
+          cp -r docs/build/html/* doc
+          cd doc
+          touch .nojekyll
+          git config --local user.email "action@github.com"
+          git config --local user.name "GitHub Action"
+          git add .
+          git commit -m "Update documentation" -a || true
+          # The above command will fail if no changes were present, so we ignore
+          # the return code.
+      - name: Push changes
+        uses: ad-m/github-push-action@master
+        with:
+          branch: doc
+          directory: doc
+          github_token: ${{ secrets.GITHUB_TOKEN }}
diff --git a/docs/source/common_issues.rst b/docs/source/common_issues.rst
@@ -1,5 +1,33 @@
 ==============
-Command Issues
+Common Issues
 ==============
 
 
+Phonemization is slow
+---------------------
+
+You may have noticed that a large number of calls to the ``phonemize`` function
+makes for very slow execution. The initialization of the phonemization backend
+can be expensive, especially for espeak, so it is much more efficient to
+minimize the number of calls by doing one of the following:
+
+- group all the calls into one using a list of strings
+- "manually" instantiate your backend of choice, then call its own ``phonemize`` method
+
+.. code-block:: python
+
+    from phonemizer import phonemize
+
+    text = [line1, line2, ...]
+
+    # Do this:
+    phonemized = phonemize(text, ...)
+
+    # Not this:
+    phonemized = [phonemize(line, ...) for line in text]
+
+    # An alternative is to directly instantiate the backend and to call its
+    # own phonemize method:
+    from phonemizer.backend import EspeakBackend
+    backend = EspeakBackend('en-us', ...)
+    # The backend's phonemize method expects a list of strings, not a bare string.
+    phonemized = [backend.phonemize([line], ...)[0] for line in text]
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -6,6 +6,44 @@
 Welcome to Phonemizer's documentation!
 ======================================
 
+
+* ``phonemizer`` allows simple phonemization of words and texts in many languages.
+
+* Provides both the ``phonemize`` command-line tool and the Python function
+  ``phonemizer.phonemize``. See :ref:`phonemize`.
+
+* It is based on four backends: **espeak**, **espeak-mbrola**, **festival** and
+  **segments**. The backends have different properties and capabilities, summarized
+  in the table below. The choice of backend is left to the user.
+
+  * `espeak-ng <https://github.com/espeak-ng/espeak-ng>`_ is Text-to-Speech
+    software supporting many languages and IPA (International Phonetic
+    Alphabet) output.
+
+  * `espeak-ng-mbrola <https://github.com/espeak-ng/espeak-ng/blob/master/docs/mbrola.md>`_
+    uses the SAMPA phonetic alphabet instead of IPA but does not preserve word
+    boundaries.
+
+  * `festival <http://www.cstr.ed.ac.uk/projects/festival>`_ is another
+    Text-to-Speech engine. Its phonemizer backend currently supports only
+    American English. It uses a custom phoneset, but it
+    allows tokenization at the syllable level.
+
+  * `segments <https://github.com/cldf/segments>`_ is a Unicode tokenizer that
+    builds a phonemization from a grapheme-to-phoneme mapping provided as a file
+    by the user.
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   install
+   cli
+   python_examples
+   common_issues
+   api_reference
+
 To reference ``phonemizer`` in your own work, please cite the following
 `JOSS paper <https://joss.theoj.org/papers/10.21105/joss.03958>`_.
 
@@ -24,15 +62,5 @@ To reference ``phonemizer`` in your own work, please cite the following
      journal = {Journal of Open Source Software}
    }
 
-.. toctree::
-   :maxdepth: 2
-   :caption: Contents:
-
-   install
-   cli
-   common_issues
-   api_reference
-
-
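
The advice added to ``common_issues.rst`` in this commit rests on one idea: backend initialization is a fixed cost, so it should be paid once rather than once per line. The sketch below illustrates that with a hypothetical stand-in class, ``FakeBackend``, and a hypothetical module-level ``phonemize`` wrapper. Neither is the real phonemizer API, and the upper-casing "phonemization" is only a placeholder. The stand-in simply counts how many times a backend gets constructed under each calling pattern.

```python
class FakeBackend:
    """Hypothetical stand-in for a phonemization backend (not the real API)."""

    init_count = 0  # how many times a backend has been constructed

    def __init__(self, language):
        # In a real backend this is where the expensive setup would happen.
        FakeBackend.init_count += 1
        self.language = language

    def phonemize(self, lines):
        # Placeholder transformation standing in for actual phonemization.
        return [line.upper() for line in lines]


def phonemize(lines, language="en-us"):
    """Hypothetical convenience wrapper that builds a new backend on every call."""
    return FakeBackend(language).phonemize(lines)


text = ["hello world", "good morning", "see you soon"]

# Pattern to avoid: one wrapper call per line, so one initialization per line.
FakeBackend.init_count = 0
per_line = [phonemize([line])[0] for line in text]
inits_per_line = FakeBackend.init_count

# Option 1: a single wrapper call on the whole list.
FakeBackend.init_count = 0
batched = phonemize(text)
inits_batched = FakeBackend.init_count

# Option 2: build the backend once, then call its method as often as needed.
FakeBackend.init_count = 0
backend = FakeBackend("en-us")
reused = [backend.phonemize([line])[0] for line in text]
inits_reused = FakeBackend.init_count

print(inits_per_line, inits_batched, inits_reused)  # prints: 3 1 1
```

All three patterns return the same result; only the number of initializations differs, which is where the slowdown described in the new section would come from.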