BugFix: fix encoding problems with QTextStreams (Mudlet#5264)
As Leris/Kebap found in issue Mudlet#5262, Windows in particular does not
default to using UTF-8 when reading and writing text files - instead it can
use an 8-bit locale encoding which (probably) depends on the end-user's OS
country settings. However, this is not UTF-8, and that will cause issues if
such files are used on another system or OS that does not use the same setting.
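
The mismatch described above can be sketched without Qt. This is only an illustration, assuming ISO-8859-1 (Latin-1) as a stand-in for whatever local 8-bit codec a given Windows installation actually uses; `latin1ToUtf8` is a hypothetical helper name and is not part of Mudlet:

```cpp
#include <string>

// Hypothetical helper (not Mudlet code): treat every input byte as one
// Latin-1 (ISO-8859-1) character and re-encode the result as UTF-8.
// Latin-1 is an assumed stand-in for the "local 8-bit" codec.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is encoded identically in Latin-1 and UTF-8
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF need two bytes in UTF-8
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Feeding it the two UTF-8 bytes of "é" (`"\xC3\xA9"`) yields four bytes, `"\xC3\x83\xC2\xA9"`, which display as "Ã©". That is the kind of corruption to expect when a file written as UTF-8 is read with an 8-bit locale codec.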

This commit forces all uses of `QTextStream` (that are not associated with
a `QString` as the underlying "device" for which it is not applicable) to
use the UTF-8 encoding.

However, retrofitting this change *may* (checks will be useful) interfere
with existing files, specifically:
* The config.lua file that contains details of packages/modules in the
  `.mpackage` ZIP archive file format. This is the bug that this is
  intended to cure!
* Reading of the external Lua files that Mudlet uses itself, the
  `LuaGlobal.lua` file and the others that it itself loads - however since
  those are pure ASCII there should not be an issue with them.
* (Windows ONLY) Reading the external `utf8_filenames.lua` file that is
  used to patch the Lua IO handlers to work with the UTF-16 file names that
  Windows uses in non-en locales. There shouldn't be an issue with this.
* Custom user dictionaries - it is possible that the existing code was
  making use of UTF-8 anyhow - it needs someone like Leris/Kebap to check
  that any custom dictionaries still work, given that they had a system
  on which this encoding problem was happening...!
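
The "pure ASCII" reasoning in the list above rests on the fact that bytes below 0x80 mean the same thing in UTF-8 and in the common ASCII-compatible 8-bit codecs (Latin-1, the Windows-125x family). A minimal sketch of such a check, where `isPureAscii` is a hypothetical helper and not something this commit adds:

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not Mudlet code): true when every byte is in the
// 7-bit ASCII range, in which case the content decodes to the same text
// under UTF-8 and under an ASCII-compatible 8-bit codec.
bool isPureAscii(const std::string& bytes)
{
    return std::all_of(bytes.begin(), bytes.end(),
                       [](unsigned char c) { return c < 0x80; });
}
```

A file for which this returns `true` should be unaffected by the switch to UTF-8; one containing, say, `"caf\xC3\xA9"` is not pure ASCII and is where the codec choice starts to matter.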

Provided it doesn't break other things this should close Mudlet#5262.

Revised to ensure QTextStream::setCodec is used AFTER setDevice

I think this may have been the cause of the odd behaviour where the code
asked for the UTF-8 codec but ended up using a different one - the
setDevice(...) method happens to reset the codec so a setCodec(...) call
must be used after it and not before!
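
The ordering pitfall can be modelled with a toy class. This is deliberately NOT Qt code: `MockTextStream` and the `"locale-default"` / `"log.txt"` strings are hypothetical, and the only behaviour borrowed from the description above is that setting the device resets the codec:

```cpp
#include <string>

// Toy stand-in for a text stream (hypothetical, not Qt's implementation).
class MockTextStream
{
public:
    // Attaching a device resets the stream state, including the codec.
    void setDevice(const std::string& deviceName)
    {
        mDevice = deviceName;
        mCodec = "locale-default";
    }

    void setCodec(const std::string& codecName) { mCodec = codecName; }

    const std::string& codec() const { return mCodec; }

private:
    std::string mDevice;
    std::string mCodec = "locale-default";
};

// Codec chosen BEFORE the device is attached: the choice is lost.
std::string codecAfterWrongOrder()
{
    MockTextStream stream;
    stream.setCodec("UTF-8");
    stream.setDevice("log.txt");
    return stream.codec();
}

// Codec chosen AFTER the device is attached: the choice sticks.
std::string codecAfterRightOrder()
{
    MockTextStream stream;
    stream.setDevice("log.txt");
    stream.setCodec("UTF-8");
    return stream.codec();
}
```

In this model `codecAfterWrongOrder()` ends up with the locale default while `codecAfterRightOrder()` keeps UTF-8, which matches the symptom of asking for UTF-8 and getting a different codec.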

Signed-off-by: Stephen Lyons <slysven@virginmedia.com>
Co-authored-by: Vadim Peretokin <vperetokin@gmail.com>
SlySven and vadi2 authored Aug 14, 2021
1 parent e10eeee commit bf119ae
Showing 6 changed files with 36 additions and 5 deletions.
17 changes: 15 additions & 2 deletions src/Host.cpp
@@ -419,9 +419,14 @@ Host::Host(int port, const QString& hostname, const QString& login, const QStrin
}
mErrorLogFile.setFileName(logFileName);
mErrorLogFile.open(QIODevice::Append);
// This is NOW used (for map
// file auditing and other issues)
/*
* Mudlet will log messages in ASCII, but force a universal (UTF-8) encoding
* since user-content can contain anything and someone else reviewing
* such logs need not have the same default encoding which would be used
* otherwise - note that this must be done AFTER setDevice(...):
*/
mErrorLogStream.setDevice(&mErrorLogFile);
mErrorLogStream.setCodec(QTextCodec::codecForName("UTF-8"));

QTimer::singleShot(0, this, [this]() {
qDebug() << "Host::Host() - restore map case 4 {QTimer::singleShot(0)} lambda.";
@@ -1973,9 +1978,17 @@ QString Host::getPackageConfig(const QString& luaConfig, bool isModule)
QStringList strings;
if (configFile.open(QIODevice::ReadOnly | QIODevice::Text)) {
QTextStream in(&configFile);
/*
* We also have to explicitly set the codec to use whilst reading the file
* as otherwise QTextCodec::codecForLocale() is used which might be a
* local8Bit codec that thus will not handle all the characters
* contained in Unicode:
*/
in.setCodec(QTextCodec::codecForName("UTF-8"));
while (!in.atEnd()) {
strings += in.readLine();
}
configFile.close();
}

lua_State* L = luaL_newstate();
3 changes: 1 addition & 2 deletions src/TConsole.h
@@ -4,7 +4,7 @@
/***************************************************************************
* Copyright (C) 2008-2012 by Heiko Koehn - KoehnHeiko@googlemail.com *
* Copyright (C) 2014 by Ahmed Charles - acharles@outlook.com *
* Copyright (C) 2014-2016, 2018-2020 by Stephen Lyons *
* Copyright (C) 2014-2016, 2018-2021 by Stephen Lyons *
* - slysven@virginmedia.com *
* Copyright (C) 2016 by Ian Adkins - ieadkins@gmail.com *
* Copyright (C) 2020 by Matthias Urlichs matthias@urlichs.de *
@@ -38,7 +38,6 @@
#include <QFile>
#include <QLabel>
#include <QPointer>
#include <QTextStream>
#include <QWidget>
#include "post_guard.h"

11 changes: 11 additions & 0 deletions src/TLuaInterpreter.cpp
@@ -14395,6 +14395,17 @@ QString TLuaInterpreter::readScriptFile(const QString& path) const
}

QTextStream in(&file);
in.setCodec(QTextCodec::codecForName("UTF-8"));

/*
* FIXME: Qt Documentation for this method reports:
* "Reads the entire content of the stream, and returns it as a QString.
* Avoid this function when working on large files, as it will consume a
* significant amount of memory.
*
* Calling readLine() is better if you do not know how much data is
* available."
*/
QString text = in.readAll();
file.close();

1 change: 1 addition & 0 deletions src/TMainConsole.cpp
@@ -45,6 +45,7 @@
#include <QShortcut>
#include <QTextBoundaryFinder>
#include <QTextCodec>
#include <QTextStream>
#include <QPainter>
#include "post_guard.h"

5 changes: 4 additions & 1 deletion src/dlgPackageExporter.cpp
@@ -1,7 +1,7 @@
/***************************************************************************
* Copyright (C) 2012-2013 by Heiko Koehn - KoehnHeiko@googlemail.com *
* Copyright (C) 2014 by Ahmed Charles - acharles@outlook.com *
* Copyright (C) 2015, 2017-2020 by Stephen Lyons *
* Copyright (C) 2015, 2017-2021 by Stephen Lyons *
* - slysven@virginmedia.com *
* *
* This program is free software; you can redistribute it and/or modify *
@@ -775,6 +775,7 @@ void dlgPackageExporter::exportXml(bool& isOk,
// seen the error message...
}
}

void dlgPackageExporter::writeConfigFile(const QString& stagingDirName, const QFileInfo& iconFile, const QString& packageDescription)
{
QStringList dependencies;
@@ -801,11 +802,13 @@ void dlgPackageExporter::writeConfigFile(const QString& stagingDirName, const QF
QFile configFile(luaConfig);
if (configFile.open(QIODevice::WriteOnly | QIODevice::Text)) {
QTextStream out(&configFile);
out.setCodec(QTextCodec::codecForName("UTF-8"));
out << mPackageConfig;
out.flush();
configFile.close();
}
}

QFileInfo dlgPackageExporter::copyIconToTmp(const QString& tempPath) const
{
QFileInfo iconFile(mPackageIconPath);
4 changes: 4 additions & 0 deletions src/mudlet.cpp
Original file line number Diff line number Diff line change
@@ -3838,6 +3838,7 @@ bool mudlet::scanDictionaryFile(QFile& dict, int& oldWC, QHash<QString, unsigned
}

QTextStream ds(&dict);
ds.setCodec(QTextCodec::codecForName("UTF-8"));
QString dictionaryLine;
ds.readLineInto(&dictionaryLine);

@@ -3905,6 +3906,7 @@ bool mudlet::overwriteDictionaryFile(QFile& dict, const QStringList& wl)
}

QTextStream ds(&dict);
ds.setCodec(QTextCodec::codecForName("UTF-8"));
ds << qMax(0, wl.count());
if (!wl.isEmpty()) {
ds << QChar(QChar::LineFeed);
@@ -3928,6 +3930,7 @@ int mudlet::getDictionaryWordCount(QFile &dict)
}

QTextStream ds(&dict);
ds.setCodec(QTextCodec::codecForName("UTF-8"));
QString dictionaryLine;
// Read the header line containing the word count:
ds.readLineInto(&dictionaryLine);
@@ -3974,6 +3977,7 @@ bool mudlet::overwriteAffixFile(QFile& aff, QHash<QString, unsigned int>& gc)
}

QTextStream as(&aff);
as.setCodec(QTextCodec::codecForName("UTF-8"));
as << affixLines.join(QChar::LineFeed).toUtf8();
as << QChar(QChar::LineFeed);
as.flush();