Improvement: Mxp entity & special character support in normal mud output (Mudlet#6903)

<!-- Keep the title short & concise so anyone non-technical can
understand it,
     the title appears in PTB changelogs -->
#### Brief overview of PR changes/additions
Mudlet could only display MXP entities that resolve to a single Latin-1
character in normal MUD output.
This fix removes that restriction and also enables the special
characters represented by the MXP default entities.
#### Motivation for adding to Mudlet
Improve MXP compliance
#### Other info (issues closed, discussion etc)
The patch changes small pieces in several places. TBuffer was not able
to add additional characters to an input line. In addition, the inserted
entity content must be reparsed, both to detect any MXP element it
contains and to interpret any non-Latin1 encoding, without risking an
infinite recursion from trying to resolve a recursive entity definition.
This also required a way for the entity resolver to report whether a
given entity is custom, default, or unknown.
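
The recursion guard described above can be sketched in isolation. This is a minimal illustration, not the code from this patch: `expandEntities` and its `custom` map are hypothetical names, it uses `std::string` instead of Mudlet's buffers, and it leaves out default entities and character set decoding. It only shows the idea of replacing the consumed input with the entity value, rescanning from the start, and refusing to resolve custom entities inside the inserted text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not part of Mudlet): expands custom entities, but
// never expands a custom entity found inside an already inserted value.
std::string expandEntities(std::string buffer,
                           const std::unordered_map<std::string, std::string>& custom)
{
    std::string out;
    std::size_t pos = 0;
    // The most recently inserted value occupies buffer[0 .. guardEnd - 1].
    std::size_t guardEnd = 0;

    while (pos < buffer.size()) {
        if (buffer[pos] == '&') {
            const std::size_t semi = buffer.find(';', pos);
            if (semi != std::string::npos) {
                const std::string name = buffer.substr(pos, semi - pos + 1);
                const auto it = custom.find(name);
                // Custom entities are only resolved outside the guarded region.
                if (it != custom.end() && pos >= guardEnd) {
                    // Drop everything consumed so far, insert the value, rescan it.
                    buffer.replace(0, semi + 1, it->second);
                    guardEnd = it->second.size();
                    pos = 0;
                    continue;
                }
                // Unknown entity, or a custom one inside the guard: print literally.
                out += name;
                pos = semi + 1;
                continue;
            }
        }
        out += buffer[pos++];
    }
    return out;
}
```

With `&E;` defined as `foo&E;bar`, the input `a&E;b` yields `afoo&E;barb`: the outer `&E;` is expanded once and the inner one is printed literally, so the loop terminates.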

There are some screenshots and remarks at the end of
https://forums.mudlet.org/viewtopic.php?f=7&t=23206&start=10 .

Also, you can see how this works by connecting to aldebaran-mud.de,
logging in as a guest character (hit y to confirm), and then entering
"set mxp test 5" or "set mxp test 6" or, for insights,
"set mxp test 5 source" or "set mxp test 6 source".
eowmob authored Aug 31, 2024
1 parent b49c1bc commit 2958e5c
Showing 11 changed files with 254 additions and 54 deletions.
92 changes: 86 additions & 6 deletions src/TBuffer.cpp
@@ -437,12 +437,27 @@ void TBuffer::translateToPlainText(std::string& incoming, const bool isFromServe
// repeated switch(...) and branch to one of a series of decoding methods
// each with another up to 128 value switch()

size_t const localBufferLength = localBuffer.length();
size_t localBufferLength = localBuffer.length();
size_t localBufferPosition = 0;
if (!localBufferLength) {
return;
}

// If we are resolving/interpolating an MXP entity, the interpolated text
// ends at localBuffer[endOfMXPEntity - 1]. This variable is used to avoid an
// (infinite) recursion like <!EN E "foobar&E;">&E;
// Recursively interpolating a predefined entity like <!EN E "foobar&frac12;">&E;
// will work though.
size_t endOfMXPEntity = 0;

// A similar index which points behind the name of a literal entity like
// &unknown; which does not exist and will be printed literally, w/o
// any MXP interpretation. Again, this avoids endless recursion trying to
// resolve an unresolvable entity. We need the hassle in both cases, as
// the resolved values may be in a character encoding that must be decoded by
// Mudlet.
size_t endOfLiteralEntity = 0;

while (true) {
if (localBufferPosition >= localBufferLength) {
return;
@@ -689,17 +704,82 @@ void TBuffer::translateToPlainText(std::string& incoming, const bool isFromServe

// We are outside of a CSI or OSC sequence if we get to here:

if (mpHost->mMxpProcessor.isEnabled()) {
if (localBufferPosition >= endOfLiteralEntity && mpHost->mMxpProcessor.isEnabled()) {
if (mpHost->mServerMXPenabled) {
if (mpHost->mMxpProcessor.mode() != MXP_MODE_LOCKED) {
TMxpProcessingResult const result = mpHost->mMxpProcessor.processMxpInput(ch);
if (result == HANDLER_NEXT_CHAR) {
// The comparison signals to the processor, if custom entities may be resolved
// (countermeasure against infinite recursion)
TMxpProcessingResult const result =
mpHost->mMxpProcessor.processMxpInput(ch, localBufferPosition >= endOfMXPEntity);

switch (result) {
case HANDLER_NEXT_CHAR:
localBufferPosition++;
continue;
} else if (result == HANDLER_COMMIT_LINE) { // BR tag
case HANDLER_COMMIT_LINE: // BR tag or &newline;
ch = '\n';
goto COMMIT_LINE;
} else { //HANDLER_FALL_THROUGH -> do nothing
case HANDLER_INSERT_ENTITY_CUST:
// custom entity value set with <!EN>, recurse except for other custom entities
[[fallthrough]];
case HANDLER_INSERT_ENTITY_LIT: {
// Unknown entity name like &unknown; push back into buffer for codeset interpretation,
// but no MXP parsing.

// We replace the already processed text with the entity value into the buffer and restart
// processing it for charset encoding but with limited MXP handling
size_t valueLength = mpHost->mMxpProcessor.getEntityValue().length();
localBuffer.replace(0, localBufferPosition + 1, mpHost->mMxpProcessor.getEntityValue().toLatin1());

if (result == HANDLER_INSERT_ENTITY_LIT) {
if (localBufferPosition < endOfMXPEntity) {
// This is a special case, our unknown entity might actually be a custom one
// inside a custom one which we refused to resolve to avoid an endless recursion.
// So we carefully adjust the end marker s.t. custom entities are not reenabled
// too early
endOfMXPEntity -= localBufferPosition + 1 - valueLength;
endOfLiteralEntity = valueLength;
} else {
endOfMXPEntity = valueLength;
}
endOfLiteralEntity = valueLength;
} else {
// HANDLER_INSERT_ENTITY_CUST
endOfMXPEntity = valueLength;
endOfLiteralEntity = 0;
}

// Now restart the loop to parse the newly inserted text
localBufferLength = localBuffer.length();
localBufferPosition = 0;
continue;
}
case HANDLER_INSERT_ENTITY_SYS: {
// System entities are literal QString / UTF values which we just 'print'
// There is no further MXP or Codeset evaluation

const TChar::AttributeFlags attributeFlags =
((mIsDefaultColor ? mBold || mpHost->mMxpClient.bold() : false) ? TChar::Bold : TChar::None)
| (mItalics || mpHost->mMxpClient.italic() ? TChar::Italic : TChar::None)
| (mOverline ? TChar::Overline : TChar::None)
| (mReverse ? TChar::Reverse : TChar::None)
| (mStrikeOut || mpHost->mMxpClient.strikeOut() ? TChar::StrikeOut : TChar::None)
| (mUnderline || mpHost->mMxpClient.underline() ? TChar::Underline : TChar::None);

TChar c((!mIsDefaultColor && mBold) ? mForeGroundColorLight : mForeGroundColor, mBackGroundColor, attributeFlags);

size_t valueLength = mpHost->mMxpProcessor.getEntityValue().length();
mMudLine.append(mpHost->mMxpProcessor.getEntityValue());
// We also need to set the color attributes for the special character
while (valueLength--) {
mMudBuffer.push_back(c);
}
// We already handled the input, go to the next character
localBufferPosition++;
continue;
}
default:
//HANDLER_FALL_THROUGH -> do nothing
assert(localBuffer[localBufferPosition] == ch);
}
} else {
13 changes: 5 additions & 8 deletions src/TEntityHandler.cpp
@@ -23,25 +23,22 @@
#include "TEntityHandler.h"

// returns true if the char is handled by the EntityHandler (i.e. it is part of an entity)
bool TEntityHandler::handle(char character)
bool TEntityHandler::handle(char character, bool resolveCustomEntities)
{
if (character == ';' && !mCurrentEntity.isEmpty()) { // END OF ENTITY
mCurrentEntity.append(character);

QString resolved = mpEntityResolver.getResolution(mCurrentEntity);
// we only get the last character, current implementation of TBuffer loop is based on one char at a time
// TODO: it could be interesting to have a way to send longer sequences to the buffer
mResult = resolved.back().toLatin1();

mResult = mpEntityResolver.getResolution(mCurrentEntity, resolveCustomEntities, &entityType);
mIsResolved = true;
mCurrentEntity.clear();
return true;
} else if (character == '&' || !mCurrentEntity.isEmpty()) { // START OR MIDDLE OF ENTITY
mIsResolved = false;
entityType = ENTITY_TYPE_UNKNOWN;
mCurrentEntity.append(character);
return true;
} else if (mCurrentEntity.length() > 7) { // LONG ENTITY? MAYBE INVALID... IGNORE IT
reset();
entityType = ENTITY_TYPE_UNKNOWN;
return false;
} else {
return false;
@@ -57,7 +54,7 @@ void TEntityHandler::reset()
mCurrentEntity.clear();
mIsResolved = false;
}
char TEntityHandler::getResultAndReset()
QString TEntityHandler::getResultAndReset()
{
reset();
return mResult;
9 changes: 5 additions & 4 deletions src/TEntityHandler.h
@@ -34,19 +34,20 @@ class TEntityHandler
: mpEntityResolver(pResolver)
{}

bool handle(char character);
bool handle(char character, bool resolveCustomEntities);
void reset();

bool isEntityResolved() const;
char getResultAndReset();
QString getResultAndReset();
inline TEntityType getEntityType(void) {return entityType;}

private:
const TEntityResolver& mpEntityResolver;

QString mCurrentEntity;
bool mIsResolved = false;
char mResult = '\0';

QString mResult;
TEntityType entityType = ENTITY_TYPE_UNKNOWN;
};

#endif //MUDLET_TENTITYHANDLER_H
111 changes: 103 additions & 8 deletions src/TEntityResolver.cpp
@@ -20,24 +20,55 @@
#include "TEntityResolver.h"
#include "utils.h"

QString TEntityResolver::getResolution(const QString& entity) const
QString TEntityResolver::getResolution(const QString& entity, bool resolveCustomEntities, TEntityType *entityType) const
{
if (entity.front() != '&' || entity.back() != ';') {
if (entityType) {
*entityType = ENTITY_TYPE_UNKNOWN;
}
return entity;
}

auto ptr = mEntititesMap.find(entity.toLower());
if (ptr != mEntititesMap.end()) {
return *ptr;
if (resolveCustomEntities) {
auto ptr = mEntititesMap.find(entity.toLower());
if (ptr != mEntititesMap.end()) {
if (entityType) {
*entityType = ENTITY_TYPE_CUSTOM;
}
return *ptr;
}
}

auto stdPtr = scmStandardEntites.find(entity.toLower());
// Although Mudlet ignores case, MXP defines entity names as case-sensitive.
// Also, the predefined entities &Auml; and &auml; are for example different.
// So we first check for an exact match:
auto stdPtr = scmStandardEntites.find(entity);
if (stdPtr != scmStandardEntites.end()) {
if (entityType) {
*entityType = ENTITY_TYPE_SYSTEM;
}
return *stdPtr;
}

// then see if there is at least a case-insensitive match for backwards compatibility:
stdPtr = scmStandardEntites.find(entity.toLower());
if (stdPtr != scmStandardEntites.end()) {
if (entityType) {
*entityType = ENTITY_TYPE_SYSTEM;
}
return *stdPtr;
}

return entity[1] == '#' ? resolveCode(entity.mid(2, entity.size() - 3)) : entity;
if (entity[1] == '#') {
if (entityType) {
*entityType = ENTITY_TYPE_SYSTEM;
}
return resolveCode(entity.mid(2, entity.size() - 3));
}
if (entityType) {
*entityType = ENTITY_TYPE_UNKNOWN;
}
return entity;
}

bool TEntityResolver::registerEntity(const QString& entity, const QString& str)
@@ -147,14 +178,17 @@ const QHash<QString, QString> TEntityResolver::scmStandardEntites = {
{qsl("&ordf;"), qsl("ª")},
{qsl("&laquo;"), qsl("«")},
{qsl("&not;"), qsl("¬")},
{qsl("&shy;"), qsl("­")},
{qsl("&shy;"), QChar(0x00AD)},
{qsl("&reg;"), qsl("®")},
{qsl("&macr;"), qsl("¯")},
{qsl("&deg;"), qsl("°")},
{qsl("&plusmn;"), qsl("±")},
{qsl("&divide;"), qsl("÷")},
{qsl("&times;"), qsl("×")},
{qsl("&sup2;"), qsl("²")},
{qsl("&sup3;"), qsl("³")},
{qsl("&acute;"), qsl("´")},
{qsl("&uml;"), qsl("¨")},
{qsl("&micro;"), qsl("µ")},
{qsl("&para;"), qsl("")},
{qsl("&middot;"), qsl("·")},
@@ -165,6 +199,67 @@ const QHash<QString, QString> TEntityResolver::scmStandardEntites = {
{qsl("&frac14;"), qsl("¼")},
{qsl("&frac12;"), qsl("½")},
{qsl("&frac34;"), qsl("¾")},
{qsl("&iquest;"), qsl("¿")}
{qsl("&iquest;"), qsl("¿")},
{qsl("&Aacute;"), qsl("Á")},
{qsl("&aacute;"), qsl("á")},
{qsl("&Acirc;"), qsl("Â")},
{qsl("&acirc;"), qsl("â")},
{qsl("&AElig;"), qsl("Æ")},
{qsl("&aelig;"), qsl("æ")},
{qsl("&Agrave;"), qsl("À")},
{qsl("&agrave;"), qsl("à")},
{qsl("&Aring;"), qsl("Å")},
{qsl("&aring;"), qsl("å")},
{qsl("&Atilde;"), qsl("Ã")},
{qsl("&atilde;"), qsl("ã")},
{qsl("&Auml;"), qsl("Ä")},
{qsl("&auml;"), qsl("ä")},
{qsl("&Ccedil;"), qsl("Ç")},
{qsl("&ccedil;"), qsl("ç")},
{qsl("&Eacute;"), qsl("É")},
{qsl("&eacute;"), qsl("é")},
{qsl("&Ecirc;"), qsl("Ê")},
{qsl("&ecirc;"), qsl("ê")},
{qsl("&Egrave;"), qsl("È")},
{qsl("&egrave;"), qsl("è")},
{qsl("&Euml;"), qsl("Ë")},
{qsl("&euml;"), qsl("ë")},
{qsl("&Iacute;"), qsl("Í")},
{qsl("&iacute;"), qsl("í")},
{qsl("&Icirc;"), qsl("Î")},
{qsl("&icirc;"), qsl("î")},
{qsl("&Igrave;"), qsl("Ì")},
{qsl("&igrave;"), qsl("ì")},
{qsl("&Iuml;"), qsl("Ï")},
{qsl("&iuml;"), qsl("ï")},
{qsl("&ETH;"), qsl("Ð")},
{qsl("&eth;"), qsl("ð")},
{qsl("&Ntilde;"), qsl("Ñ")},
{qsl("&ntilde;"), qsl("ñ")},
{qsl("&Oacute;"), qsl("Ó")},
{qsl("&oacute;"), qsl("ó")},
{qsl("&Ocirc;"), qsl("Ô")},
{qsl("&ocirc;"), qsl("ô")},
{qsl("&Ograve;"), qsl("Ò")},
{qsl("&ograve;"), qsl("ò")},
{qsl("&Oslash;"), qsl("Ø")},
{qsl("&oslash;"), qsl("ø")},
{qsl("&Otilde;"), qsl("Õ")},
{qsl("&otilde;"), qsl("õ")},
{qsl("&Ouml;"), qsl("Ö")},
{qsl("&ouml;"), qsl("ö")},
{qsl("&Uacute;"), qsl("Ú")},
{qsl("&uacute;"), qsl("ú")},
{qsl("&Ucirc;"), qsl("Û")},
{qsl("&ucirc;"), qsl("û")},
{qsl("&Ugrave;"), qsl("Ù")},
{qsl("&ugrave;"), qsl("ù")},
{qsl("&Uuml;"), qsl("Ü")},
{qsl("&uuml;"), qsl("ü")},
{qsl("&Yacute;"), qsl("Ý")},
{qsl("&yacute;"), qsl("ý")},
{qsl("&THORN;"), qsl("Þ")},
{qsl("&thorn;"), qsl("þ")},
{qsl("&szlig;"), qsl("ß")}
};
// clang-format on
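
The lookup order that `getResolution` follows after this change can be summarised in a standalone sketch. It is an approximation under stated assumptions, not Mudlet's API: `resolve`, `Resolution`, `EntityKind` and `toLowerAscii` are hypothetical names, the two maps are passed in as parameters, custom keys are assumed to be stored in lower case, and numeric `&#...;` codes are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for TEntityType and the resolver result.
enum class EntityKind { System, Custom, Unknown };

struct Resolution {
    std::string value;
    EntityKind kind;
};

// ASCII-only lower-casing, sufficient for entity names.
static std::string toLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Sketch of the lookup order: custom (optional), exact standard match,
// case-insensitive standard match, otherwise unknown.
Resolution resolve(const std::string& entity,
                   const std::unordered_map<std::string, std::string>& custom,
                   const std::unordered_map<std::string, std::string>& standard,
                   bool resolveCustom)
{
    if (entity.size() < 3 || entity.front() != '&' || entity.back() != ';') {
        return {entity, EntityKind::Unknown};
    }
    const std::string lowered = toLowerAscii(entity);

    // 1. Custom entities, unless the caller disabled them (recursion guard).
    if (resolveCustom) {
        if (auto it = custom.find(lowered); it != custom.end()) {
            return {it->second, EntityKind::Custom};
        }
    }
    // 2. Exact match, so that e.g. &Auml; and &auml; stay distinct.
    if (auto it = standard.find(entity); it != standard.end()) {
        return {it->second, EntityKind::System};
    }
    // 3. Case-insensitive fallback for backwards compatibility.
    if (auto it = standard.find(lowered); it != standard.end()) {
        return {it->second, EntityKind::System};
    }
    // 4. Unknown: hand the text back unchanged.
    return {entity, EntityKind::Unknown};
}
```

Returning the kind together with the value is only one possible design. The patch itself keeps the `QString` return type and reports the type through an optional pointer argument, because callers hold the resolver as a const object.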
7 changes: 6 additions & 1 deletion src/TEntityResolver.h
@@ -26,6 +26,8 @@
#include "post_guard.h"
#include <functional>

enum TEntityType { ENTITY_TYPE_SYSTEM, ENTITY_TYPE_CUSTOM, ENTITY_TYPE_UNKNOWN };

class TEntityResolver
{
public:
@@ -40,7 +42,10 @@ class TEntityResolver
bool registerEntity(const QString& entity, const QString& str);
bool unregisterEntity(const QString& entity);

QString getResolution(const QString& entityValue) const;
// Having this optional pointer argument may not be optimal, but some callers must know the type
// and we cannot have a private variable recording it as many classes calling us are using this
// as a const class.
QString getResolution(const QString& entityValue, bool resolveCustomEntities = true, TEntityType *entityType = nullptr) const;

static QString resolveCode(ushort val);
static QString resolveCode(const QString& entityValue);
15 changes: 12 additions & 3 deletions src/TMxpProcessor.cpp
@@ -145,7 +145,7 @@ void TMxpProcessor::enable()
mMXP = true;
}

TMxpProcessingResult TMxpProcessor::processMxpInput(char& ch)
TMxpProcessingResult TMxpProcessor::processMxpInput(char& ch, bool resolveCustomEntities)
{
if (!mMxpTagBuilder.accept(ch) && mMxpTagBuilder.isInsideTag() && !mMxpTagBuilder.hasTag()) {
return HANDLER_NEXT_CHAR;
@@ -163,9 +163,18 @@ TMxpProcessingResult TMxpProcessor::processMxpInput(char& ch)
return result == MXP_TAG_COMMIT_LINE ? HANDLER_COMMIT_LINE : HANDLER_NEXT_CHAR;
}

if (mEntityHandler.handle(ch)) { // ch is part of an entity
if (mEntityHandler.handle(ch, resolveCustomEntities)) { // ch is part of an entity
if (mEntityHandler.isEntityResolved()) { // entity has been mapped (i.e. ch == ';')
ch = mEntityHandler.getResultAndReset();
lastEntityValue = mEntityHandler.getResultAndReset();
switch (mEntityHandler.getEntityType()) {
case ENTITY_TYPE_CUSTOM:
return HANDLER_INSERT_ENTITY_CUST;
case ENTITY_TYPE_SYSTEM:
// Note special handling for '\n' as a result of &newline;
return lastEntityValue == qsl("\n") ? HANDLER_COMMIT_LINE : HANDLER_INSERT_ENTITY_SYS;
default:
return HANDLER_INSERT_ENTITY_LIT;
}
} else { // ask for the next char
return HANDLER_NEXT_CHAR;
}