A PHP function that truncates (shortens) a given HTML5 string to a max number of characters.
Example: truncate after 6 characters including the ellipsis:
<p><b>A</b> red ball.</p> => <p><b>A</b> red…</p>
Compatible with PHP 5.6 and 7+
Uses the mbstring PHP extension for UTF-8.
More than 240 unit tests (see or run: unittest.php)
The function is in truncateHTML.php; you can simply copy/paste it into your project.
- Quickly truncate most common HTML5 sources without using a full HTML parser (which is ~100x slower).
- Configurable ellipsis: …, ..., <a href="">More</a>, etc.
- Can include the length of the ellipsis in the truncated result.
- Supports self-closing tags like: <img>, <img/>, <newtag />
- Collapsing spaces: sequences of multiple spaces are counted only once (including <br>, and a few others)
- Don't count characters in invisible elements like: <head>, <script>, <noscript>, <style>, <!-- comments -->
- Supports HTML entities (&nbsp;, &hellip;, &quot;, etc.)
- Whole word: can truncate at the end of the last word instead of cutting in the middle of a word.
- Cut long words: can truncate in the middle of a word if it is very long (useful to truncate a URL)
- Truncates before the error in case of malformed HTML (like a mismatched closing tag)
- UTF-8 support (multibyte characters)
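The counting rules listed above (invisible elements skipped, entities counted as one character, whitespace runs counted once) can be approximated with a few standard PHP calls. The sketch below is only an illustration of those rules, not the code in truncateHTML.php: `countVisibleChars` is a hypothetical helper name, and its result is not guaranteed to match the library's own count in every case.

```php
<?php
// Hypothetical helper, NOT part of truncateHTML.php: estimates how many
// "countable" characters an HTML fragment contains.
function countVisibleChars($html)
{
    // Drop elements whose content is never displayed, plus comments.
    $text = preg_replace(
        '~<(head|script|noscript|style)\b[^>]*>.*?</\1\s*>|<!--.*?-->~is',
        '',
        $html
    );
    // Treat <br> as whitespace, then remove all remaining tags.
    $text = preg_replace('~<br\s*/?>~i', ' ', $text);
    $text = strip_tags($text);
    // Decode entities so that "&hellip;" counts as a single character.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs (including non-breaking spaces) to one space.
    $text = preg_replace('~[\s\x{00A0}]+~u', ' ', $text);
    // Count characters, not bytes.
    return mb_strlen($text, 'UTF-8');
}

echo countVisibleChars('<p><b>A</b> red ball.</p>'), "\n"; // 11
```

A regex-based pass like this is cheap, which is the same trade-off the feature list describes: it covers common HTML quickly but cannot replace a full parser.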
// Example from the introduction:
truncateHTML(6, "<p><b>A</b> red ball.</p>");
// => "<p><b>A</b> red…</p>"
// Whole word:
truncateHTML(5, "<blockquote>A lumberjack</blockquote>");
// => "<blockquote>A…</blockquote>"
// Without whole word, without includeEllipsisLength:
truncateHTML(5, "<blockquote>A lumberjack</blockquote>", ['wholeWord' => false, 'includeEllipsisLength' => false]);
// => "<blockquote>A lum…</blockquote>"
// Whole word: example of cutting only long words:
truncateHTML( 5, "<a href='https://php.net/docs.php'>https://php.net/docs.php</a>");
// => "…"
// Notice how wholeWord truncates before opening a tag that would be left empty.
truncateHTML(20, "<a href='https://php.net/docs.php'>https://php.net/docs.php</a>");
// => "<a href='https://php.net/docs.php'>https://php.net/doc…</a>"
// Comments, scripts and styles are not counted:
truncateHTML(3, "<script>$();</script><!-- Start div --><div>Hi</div><!-- End div --> More text.");
// => "<script>$();</script><!-- Start div --><div>Hi…</div>"
// Collapsing multiple spaces:
truncateHTML(6, "A <br> \n\t long space!");
// => "A <br> \n\t long…"
// Tag mismatch: truncates before the error:
truncateHTML(99, "Click</a>here</a>");
// => "Click…"

string truncateHTML(int $maxLength, string $html, array $options = [])
- $maxLength: the returned HTML will contain at most $maxLength countable characters. If negative, removes $maxLength countable characters (in absolute value) from the end of $html.
- $html: the input HTML string that will be truncated.
- $options: (optional) an array of options, shown here with their default values:
  - 'ellipsis' => '…' (or: 'ellipsis' => '...'): the ellipsis that will be included. Can be an empty string, can contain HTML tags. ('…' is the horizontal ellipsis character, i.e. '...' as a single Unicode character.) If not using UTF-8 mode, the default value will be '...' instead of '…'.
  - 'includeEllipsisLength' => true: whether to include the length of the ellipsis in the length of the truncated result.
  - 'wholeWord' => true: when truncating, don't cut in the middle of a word. Instead, cut at the end of the last word.
  - 'cutWord' => 18: when wholeWord is enabled, allows cutting long words after cutWord characters (set to 0 or false to disable).
  - 'utf8' => true: use UTF-8 mode. You should always use UTF-8 though. If utf8 is false, only ASCII-compatible single-byte encodings (such as Latin-1) are supported. For other encodings, use mb_convert_encoding to convert to UTF-8 and back. (If UTF-8 is disabled, the default ellipsis will be '...' instead of '…'.)
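For input in another encoding, the advice above (convert to UTF-8, truncate, convert back) can be wrapped in a small function. This is a minimal sketch under stated assumptions: `truncateHTMLInEncoding` is a hypothetical name that does not exist in truncateHTML.php, and the `$truncate` parameter is only there so the wrapper can be run without the library loaded. By default it calls `truncateHTML`.

```php
<?php
// Hypothetical wrapper, NOT part of truncateHTML.php.
// $truncate defaults to the library function, but any callable with the
// same ($maxLength, $html, $options) signature can be passed instead.
function truncateHTMLInEncoding($maxLength, $html, $encoding, array $options = [], $truncate = 'truncateHTML')
{
    // Convert the input to UTF-8 so that UTF-8 mode can be used.
    $utf8Html = mb_convert_encoding($html, 'UTF-8', $encoding);
    $options['utf8'] = true;

    $truncated = call_user_func($truncate, $maxLength, $utf8Html, $options);

    // Convert the result back to the caller's encoding.
    return mb_convert_encoding($truncated, $encoding, 'UTF-8');
}

// Usage (assumes truncateHTML.php has been included and $sjisHtml holds
// Shift_JIS bytes):
// $short = truncateHTMLInEncoding(20, $sjisHtml, 'SJIS', ['ellipsis' => '...']);
```

One design point worth noting: the default '…' ellipsis may not exist in the target encoding, in which case mb_convert_encoding substitutes a placeholder character when converting back. Passing 'ellipsis' => '...' avoids that.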
XHTML: probably works in most cases, but is untested.
Not supported:
- Malformed HTML, badly nested tags, missing closing tags: it doesn't try to guess the correct fix (for this you would need a full HTML parser).
  Note: when meeting an unexpected closing tag, it always truncates before the closing tag (see the examples).
- Uncommon HTML code like:
  - HTML tags inside an HTML tag attribute: <img title="Hello<br>World">
  - The string </script> inside <script>code…</script>. For this you would need a full HTML parser, or a JavaScript parser. (Other tags are ok, but don't have a closing tag </script> in a JavaScript string or comment.)
  - The string </style> inside <style>code…</style>. For this you would need a full HTML parser, or a CSS parser. (Other tags are ok, but don't have a closing tag </style> in a CSS comment.)
- XML
- CDATA (deprecated in HTML5)
If you find more, please open an issue.
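To make the "unexpected closing tag" case concrete, here is one way such a mismatch can be located with a simple stack of open tags. This is a hedged sketch, not how truncateHTML.php is implemented: `firstTagMismatchOffset` is a hypothetical helper, it ignores the special handling that comments, <script> and <style> would need, and it does not report tags that are left unclosed at the end of the input.

```php
<?php
// Hypothetical helper, NOT part of truncateHTML.php: returns the byte offset
// of the first closing tag that does not match the innermost open tag,
// or null when every closing tag matches.
function firstTagMismatchOffset($html)
{
    // Void elements never have a closing tag, so they are not pushed.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];
    $stack = [];

    // Group 1: "/" for a closing tag. Group 2: tag name.
    // Group 3: "/" for a self-closing tag such as <newtag />.
    preg_match_all(
        '~<(/?)([a-z][a-z0-9]*)\b[^>]*?(/?)>~i',
        $html,
        $tags,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );

    foreach ($tags as $tag) {
        $name = strtolower($tag[2][0]);
        if ($tag[1][0] === '/') {
            // A closing tag must match the most recently opened element.
            if (array_pop($stack) !== $name) {
                return $tag[0][1];
            }
        } elseif ($tag[3][0] !== '/' && !in_array($name, $void, true)) {
            $stack[] = $name;
        }
    }
    return null;
}

$html = 'Click</a>here</a>';
$offset = firstTagMismatchOffset($html); // 5
echo substr($html, 0, $offset), "\n";    // Click
```

Checking the input this way before truncating can help tell apart a result that was shortened because of the length limit from one that was cut early because of malformed markup.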
- v1.0.1 (9 Feb. 2018):
  - Fix multibyte characters in regex
  - Add parameter type verification
- v1.0 (5 Feb. 2018):
  - Initial version
- Inspired by: