Unicode characters in post titles cause exceptions when running nbdev_migrate --path #1510

Open
@progressEdd

description

I am migrating my fastpages blog to nbdev and Quarto, following the migration guide:
https://nbdev.fast.ai/tutorials/blogging.html
When I run nbdev_migrate --path posts, the code raises an exception on files whose front matter contains Unicode characters. I temporarily worked around the issue by replacing my em dashes with -. I am documenting this in case anyone else encounters it, and to help identify the function that causes the exception.
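The temporary workaround can be scripted. This is only a rough sketch: the helper names and the choice of "-" as the substitute are illustrative, and it only handles the em dash (U+2014), so other non-ASCII characters may still trigger the same exception.

```python
from pathlib import Path

EM_DASH = "\u2014"  # the character that triggered the exception in this report


def replace_em_dashes(text: str, substitute: str = "-") -> str:
    """Return text with every em dash replaced by a plain substitute."""
    return text.replace(EM_DASH, substitute)


def clean_posts(folder: str) -> list:
    """Rewrite the .md files in folder in place; return the paths that changed."""
    changed = []
    for path in sorted(Path(folder).glob("*.md")):
        original = path.read_text(encoding="utf-8")
        cleaned = replace_em_dashes(original)
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed.append(path)
    return changed


print(replace_em_dashes("Investigating RPiPlay \u2014 Apple Airplay on Fedora Linux"))
```

Running clean_posts("posts") before nbdev_migrate would apply the substitution to every markdown file in that folder. It edits files in place, so it is safest on a committed or backed-up copy.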

how to reproduce

To reproduce this, here is the source markdown file:
https://github.com/progressEdd/blog/blob/2e040a0a3fa86268555f00b525f33f2b86331491/_posts/2020-12-23-Investigating-RPiPlay-Apple-Airplay-on-Fedora-Linux.md
Below is a snippet of its front matter. The migration fails when the Unicode character for an em dash (U+2014) is in the text:

---
keywords: fastai
description: As a graduate student during COVID-19 some of my classes were online. I found it difficult to share my iPad screen over web conferences. At that time, I did not want to install Zoom because of the numerous <a href='https://thehackernews.com/2020/08/zoom-software-vulnerabilities.html'>security vulnerabilities</a>. Furthermore, the feature of casting my iPad screen was exclusive to Zoom. If I wanted to share my screen in other applications such as Microsoft Teams or Google Meet — I needed an alternative. Over the course of a couple months, I researched and tested multiple methods to cast my iPad screen. This blog post is the fruits of my labor.
title: Investigating RPiPlay — Apple Airplay on Fedora Linux
comments: true
nb_path: _notebooks/2020-12-23-Investigating-RPiPlay-Apple-Airplay-on-Fedora-Linux.ipynb
layout: notebook
---
...

When I run the migration, I get the following traceback:

uv run nbdev_migrate --path posts
Traceback (most recent call last):
  File "/mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/lib64/python3.13/site-packages/nbdev/migrate.py", line 180, in nbdev_migrate
    if f.name.endswith('.md'): migrate_md(f)
                               ~~~~~~~~~~^^^
  File "/mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/lib64/python3.13/site-packages/nbdev/migrate.py", line 164, in migrate_md
    txt = fp_md_fm(path)
  File "/mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/lib64/python3.13/site-packages/nbdev/migrate.py", line 100, in fp_md_fm
    return _re_fm_md.sub(_dict2fm(fm), md)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/re/__init__.py", line 377, in _compile_template
    return _sre.template(pattern, _parser.parse_template(repl, pattern))
                                  ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/re/_parser.py", line 1076, in parse_template
    raise s.error('bad escape %s' % this, len(this)) from None
re.PatternError: bad escape \u at position 618 (line 10, column 26)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/bin/nbdev_migrate", line 12, in <module>
    sys.exit(nbdev_migrate())
             ~~~~~~~~~~~~~^^
  File "/mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/lib64/python3.13/site-packages/fastcore/script.py", line 125, in _f
    return tfunc(**merge(args, args_from_prog(func, xtra)))
  File "/mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/lib64/python3.13/site-packages/nbdev/migrate.py", line 181, in nbdev_migrate
    except Exception as e: raise Exception(f'Error in migrating file: {f}') from e
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The exact exception is:

    return _re_fm_md.sub(_dict2fm(fm), md)
           ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/re/__init__.py", line 377, in _compile_template
    return _sre.template(pattern, _parser.parse_template(repl, pattern))
                                  ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/re/_parser.py", line 1076, in parse_template
    raise s.error('bad escape %s' % this, len(this)) from None
re.PatternError: bad escape \u at position 618 (line 10, column 26)

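The error appears to come from the function below. The traceback shows re failing inside parse_template(repl, pattern), that is, while parsing the replacement argument of sub(). So the string that needs processing is most likely the replacement, _dict2fm(fm), rather than md: a literal escape sequence such as \u in it would need to become \\u before it is passed to sub().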

# %% ../nbs/api/16_migrate.ipynb
def fp_md_fm(path):
    "Make fastpages front matter in markdown files quarto compliant."
    p = Path(path)
    md = p.read_text()
    fm = _fm2dict(md, nb=False)
    if fm:
        fm = _fp_convert(fm, path)
        return _re_fm_md.sub(_dict2fm(fm), md)
    else: return md 
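The failure can be reproduced without nbdev. This is a minimal sketch under stated assumptions: FRONT_MATTER is a hypothetical stand-in for _re_fm_md, and new_fm assumes the serialized front matter contains a literal \u2014 sequence, which YAML dumpers commonly emit for non-ASCII characters unless configured to allow Unicode.

```python
import re

# Hypothetical stand-in for nbdev's front matter regex; the real _re_fm_md may differ.
FRONT_MATTER = re.compile(r'^---\s*\n.*?\n---\s*\n', flags=re.DOTALL)

# Markdown whose title contains a real em dash character (U+2014).
md = "---\ntitle: Investigating RPiPlay \u2014 Apple Airplay\n---\nbody text\n"

# Assumed shape of the re-serialized front matter: the em dash has been
# written out as the six literal characters backslash, u, 2, 0, 1, 4.
new_fm = '---\ntitle: "Investigating RPiPlay \\u2014 Apple Airplay"\n---\n'


def try_sub(repl: str) -> str:
    """Run the substitution and report the regex error instead of raising."""
    try:
        return FRONT_MATTER.sub(repl, md)
    except re.error as e:
        return f"error: {e}"


result = try_sub(new_fm)
print(result)
```

The error is raised while the replacement string is parsed as a template, before any matching against md takes place. That is consistent with the parse_template frame in the traceback above.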

ChatGPT says there are other escape sequences to consider as well:

  • Unicode Escapes:

    • \uXXXX: Represents a Unicode character with 4 hex digits.
    • \UXXXXXXXX: Represents a Unicode character with 8 hex digits.
    • \N{name}: Represents a Unicode character by name.
  • Hexadecimal Escapes:

    • \xXX: Represents a character with 2 hex digits.
  • Common Whitespace and Control Escapes:

    • \n: Newline.
    • \t: Tab.
    • \r: Carriage return.
    • \b: Backspace (or word boundary in regex contexts).
    • \f: Form feed.
    • \v: Vertical tab.
  • Other Special Sequences in Regex:

    • \1, \2, etc.: References to captured groups.
    • \g<name> or \g<number>: Named or numbered backreferences.
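Not every sequence in the list above fails in the same way. In a replacement string passed to re.sub, Python converts the whitespace and control escapes, treats \1 and \g<...> as group references, and rejects other backslash-plus-ASCII-letter sequences such as \u, \U, \N and \x. A small check (the helper name is illustrative):

```python
import re


def template_behavior(repl: str) -> str:
    """Use repl as a replacement template for the single match 'x'."""
    try:
        return re.sub(r'(x)', repl, 'x')
    except re.error as e:
        return f"error: {e.msg}"


candidates = [r'\u2014', r'\U0001F600', r'\N{EM DASH}', r'\x41',
              r'\n', r'\t', r'\b', r'\1', r'\g<1>']
behavior = {repl: template_behavior(repl) for repl in candidates}

for repl, outcome in behavior.items():
    print(f"{repl!r:16} -> {outcome!r}")
```

The sequences in the last two groups do not raise, but they are silently rewritten, so a replacement string containing them would still be altered unless its backslashes are escaped.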

I asked it to suggest a substitution that handles the sequences above:

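# Double the backslash in front of u, U, N or x so that re no longer reads it as an escape.
# The replacement needs four backslashes: r'\\\1' would re-emit a single backslash and change nothing.
# Note: re parses escapes in the replacement argument of sub(), so this would need to be applied to that string.
md = re.sub(r'(?<!\\)\\([uUNx])', r'\\\\\1', md)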
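An alternative that avoids listing escape sequences one by one is to stop re from interpreting the replacement at all. Two options are sketched below: passing a callable as the replacement, or doubling every backslash first. FRONT_MATTER and replace_front_matter are hypothetical names standing in for _re_fm_md and the body of fp_md_fm; this is not the nbdev implementation.

```python
import re

# Hypothetical stand-in for nbdev's front matter regex; the real _re_fm_md may differ.
FRONT_MATTER = re.compile(r'^---\s*\n.*?\n---\s*\n', flags=re.DOTALL)


def replace_front_matter(md: str, new_fm: str) -> str:
    """Swap the leading front matter block for new_fm, inserted verbatim."""
    # The return value of a callable replacement is used as-is,
    # with no backslash escape processing.
    return FRONT_MATTER.sub(lambda m: new_fm, md, count=1)


def replace_front_matter_escaped(md: str, new_fm: str) -> str:
    """Same result, obtained by escaping every backslash in the template."""
    return FRONT_MATTER.sub(new_fm.replace('\\', '\\\\'), md, count=1)


md = "---\ntitle: Investigating RPiPlay \u2014 Apple Airplay\n---\nbody text\n"
# Assumed serialized front matter containing a literal backslash-u sequence.
new_fm = '---\ntitle: "Investigating RPiPlay \\u2014 Apple Airplay"\n---\n'

out = replace_front_matter(md, new_fm)
print(out)
```

The callable form is arguably the more robust of the two, because it also keeps \n, \1 and similar sequences in the front matter from being rewritten.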

my environment

  • IDE: VSCodium
  • Python: 3.13
    • uv
    • nbdev>=2.3.35
uv run quarto check
Quarto 1.6.42
[✓] Checking environment information...
      Quarto cache location: /home/progressedd/.cache/quarto
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.4.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.46.3: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.6.42
      Path: /home/progressedd/opt/quarto-1.6.42/bin

[✓] Checking tools....................OK
      TinyTeX: (external install)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/progressedd/.TinyTeX/bin/x86_64-linux
      Version: 2021

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.13.2
      Path: /mnt/sda1/Documents/development_projects/progressEdd_projects/Python-Notebooks/personal-project/blog-migration/.venv/bin/python3
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

[✓] Checking R installation...........OK
      Version: 4.4.3
      Path: /usr/lib64/R
      LibPaths:
        - /usr/lib64/R/library
        - /usr/share/R/library
      knitr: 1.33
      rmarkdown: (None)

      The rmarkdown package is not available in this R installation.
      Install with install.packages("rmarkdown")

progressEdd@codium /m/s/D/d/p/P/p/blog-migration > quarto check
Quarto 1.6.42
[✓] Checking environment information...
      Quarto cache location: /home/progressedd/.cache/quarto
[✓] Checking versions of quarto binary dependencies...
      Pandoc version 3.4.0: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.46.3: OK
      Typst version 0.11.0: OK
[✓] Checking versions of quarto dependencies......OK
[✓] Checking Quarto installation......OK
      Version: 1.6.42
      Path: /home/progressedd/opt/quarto-1.6.42/bin

[✓] Checking tools....................OK
      TinyTeX: (external install)
      Chromium: (not installed)

[✓] Checking LaTeX....................OK
      Using: TinyTex
      Path: /home/progressedd/.TinyTeX/bin/x86_64-linux
      Version: 2021

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.13.2
      Path: /usr/bin/python3
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

      There is an unactivated Python environment in .venv. Did you forget to activate it?

[✓] Checking R installation...........OK
      Version: 4.4.3
      Path: /usr/lib64/R
      LibPaths:
        - /usr/lib64/R/library
        - /usr/share/R/library
      knitr: 1.33
      rmarkdown: (None)

      The rmarkdown package is not available in this R installation.
      Install with install.packages("rmarkdown")

Labels: bug