Commit 2e0962e

Support unicode ids in toc (Python-Markdown#970)
A second function, `slugify_unicode`, was added rather than changing the existing function, so as to maintain backward compatibility. While an `encoding` parameter was added to the `slugify` function, we can't expect existing third-party functions to accept a third parameter. Therefore, the two-parameter API was preserved with this change.
1 parent b701c34 commit 2e0962e
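
The compatibility argument in the commit message can be sketched as follows. This is a minimal illustration, not the extension's actual call site: `make_id` and `third_party_slugify` are hypothetical names introduced here, and the two slugify functions are copied from the diff in this commit. The assumption is that the caller always passes exactly two arguments, so a user-supplied two-parameter callable keeps working.

```python
import re
import unicodedata


def slugify(value, separator, encoding='ascii'):
    """Default slugifier, as it reads after this commit."""
    value = unicodedata.normalize('NFKD', value).encode(encoding, 'ignore')
    value = re.sub(r'[^\w\s-]', '', value.decode(encoding)).strip().lower()
    return re.sub(r'[{}\s]+'.format(separator), separator, value)


def slugify_unicode(value, separator):
    """Two-parameter wrapper that selects UTF-8, as added by this commit."""
    return slugify(value, separator, 'utf-8')


def third_party_slugify(value, separator):
    """Hypothetical user callable written against the old two-parameter API."""
    return separator.join(value.lower().split())


def make_id(heading, slugify_func, separator='-'):
    """Hypothetical stand-in for the caller: always passes two arguments."""
    return slugify_func(heading, separator)


heading = 'Unicode ヘッダー'
print(make_id(heading, slugify))              # unicode
print(make_id(heading, slugify_unicode))      # unicode-ヘッター
print(make_id(heading, third_party_slugify))  # unicode-ヘッダー
```

Had the caller started passing an encoding as a third argument, `third_party_slugify` would raise a `TypeError`. Wrapping the new behavior in a separate two-parameter function avoids that.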

4 files changed, with 41 additions and 4 deletions

docs/change_log/release-3.3.md

Lines changed: 7 additions & 0 deletions
@@ -81,6 +81,13 @@ The following new features have been included in the 3.3 release:
   maintain the current behavior in the rebuilt Markdown in HTML extension. A few random
   edge-case bugs (see the included tests) were resolved in the process (#803).

+* An alternate function `markdown.extensions.toc.slugify_unicode` has been included
+  with the [Table of Contents](../extensions/toc.md) extension which supports Unicode
+  characters in table of contents slugs. The old `markdown.extensions.toc.slugify`
+  function, which removes non-ASCII characters, remains the default. Import and pass
+  `markdown.extensions.toc.slugify_unicode` to the `slugify` configuration option
+  to use the new behavior.
+
 ## Bug fixes

 The following bug fixes are included in the 3.3 release:

docs/extensions/toc.md

Lines changed: 3 additions & 0 deletions
@@ -202,6 +202,9 @@ The following options are provided to configure the output:

     The callable must return a string appropriate for use in HTML `id` attributes.

+    An alternate version of the default callable supporting Unicode strings is also
+    provided as `markdown.extensions.toc.slugify_unicode`.
+
 * **`separator`**:
     Word separator. Character which replaces white space in id. Defaults to "`-`".

markdown/extensions/toc.py

Lines changed: 9 additions & 4 deletions
@@ -23,11 +23,16 @@
 import xml.etree.ElementTree as etree


-def slugify(value, separator):
+def slugify(value, separator, encoding='ascii'):
     """ Slugify a string, to make it URL friendly. """
-    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
-    value = re.sub(r'[^\w\s-]', '', value.decode('ascii')).strip().lower()
-    return re.sub(r'[%s\s]+' % separator, separator, value)
+    value = unicodedata.normalize('NFKD', value).encode(encoding, 'ignore')
+    value = re.sub(r'[^\w\s-]', '', value.decode(encoding)).strip().lower()
+    return re.sub(r'[{}\s]+'.format(separator), separator, value)
+
+
+def slugify_unicode(value, separator):
+    """ Slugify a string, to make it URL friendly while preserving Unicode characters. """
+    return slugify(value, separator, 'utf-8')


 IDCOUNT_RE = re.compile(r'^(.*)_([0-9]+)$')

tests/test_syntax/extensions/test_toc.py

Lines changed: 22 additions & 0 deletions
@@ -141,3 +141,25 @@ def testPermalinkWithEmptyTitle(self):
             '</h1>',  # noqa
             extensions=[TocExtension(permalink=True, permalink_title="")]
         )
+
+    def testPermalinkWithUnicodeInID(self):
+        from markdown.extensions.toc import slugify_unicode
+        self.assertMarkdownRenders(
+            '# Unicode ヘッダー',
+            '<h1 id="unicode-ヘッター">'  # noqa
+                'Unicode ヘッダー'  # noqa
+                '<a class="headerlink" href="#unicode-ヘッター" title="Permanent link">&para;</a>'  # noqa
+            '</h1>',  # noqa
+            extensions=[TocExtension(permalink=True, slugify=slugify_unicode)]
+        )
+
+    def testPermalinkWithUnicodeTitle(self):
+        from markdown.extensions.toc import slugify_unicode
+        self.assertMarkdownRenders(
+            '# Unicode ヘッダー',
+            '<h1 id="unicode-ヘッター">'  # noqa
+                'Unicode ヘッダー'  # noqa
+                '<a class="headerlink" href="#unicode-ヘッター" title="パーマリンク">&para;</a>'  # noqa
+            '</h1>',  # noqa
+            extensions=[TocExtension(permalink=True, permalink_title="パーマリンク", slugify=slugify_unicode)]
+        )
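
The expected `id` in these tests contains `ヘッター` while the heading text contains `ヘッダー`. This appears to follow from the NFKD step in `slugify`, not from a typo. The sketch below walks through that step using only the standard library. The variable names are illustrative and not part of the commit.

```python
import re
import unicodedata

heading = 'Unicode \u30d8\u30c3\u30c0\u30fc'  # 'Unicode ヘッダー'

# NFKD splits ダ (U+30C0) into タ (U+30BF) plus the combining
# voiced sound mark U+3099, so the string grows by one code point.
decomposed = unicodedata.normalize('NFKD', heading)
marks = [c for c in decomposed if unicodedata.combining(c)]

# The combining mark is not a word character, so the same pattern
# used by slugify() removes it and leaves the unvoiced タ behind.
stripped = re.sub(r'[^\w\s-]', '', decomposed)

print(len(heading), len(decomposed))      # 12 13
print([hex(ord(c)) for c in marks])       # ['0x3099']
print(stripped)                           # Unicode ヘッター
```

One consequence, under this reading, is that Unicode slugs can still be lossy: characters that decompose into a base letter plus a combining mark keep only the base letter.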
