rewrite strip_nasty_html, strip_html in Qt (#1341)
* rewrite strip_html with QString.

* rewrite strip_nasty_html in Qt.

and actually produce valid html:
1. the replacement for "<body>", "<!   >", is invalid.
2. leaving an html tag in causes the html format output to be invalid.

* cleanup comment xstrdup

* use regex for strip_html

* strip_html deletes other tags

* fix strip_html img tag handling

* Revert "fix strip_html img tag handling"

This reverts commit b0440f7.

* Revert "strip_html deletes other tags"

This reverts commit 40fe2ef.

* Revert "use regex for strip_html"

This reverts commit 677da95.

* implement strip_html using QRegularExpressionMatchIterator.

* a little cleanup

* remove obsolete include

* take care to distinguish tags with common roots

like p, param, pre.

* suppress InvalidReads in qhash.

These are known to occur per comment in qhash.cpp.

* suppress qhash false positive with libqt6core6/jammy-updates,now 6.2.4+dfsg-2ubuntu1.1 amd64

* suppress vg warnings on noble (intermittent).

* install qt core dbgsyms for valgrind suppression.

* kill space preceding newline when stripping html.

* use modernize-raw-string-literal

* valgrind suppressions for f40

* add symbols for fedora valgrind suppression.
tsteven4 authored Sep 23, 2024
1 parent b3d9a51 commit d7c2ad3
Showing 17 changed files with 284 additions and 169 deletions.
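
The commit message mentions implementing strip_html with QRegularExpressionMatchIterator, taking care to distinguish tags with common roots (p, param, pre), and killing the space preceding a newline. The following is a minimal sketch of that idea, not the code from this commit. It substitutes std::regex and std::sregex_iterator for the Qt classes so that it builds without Qt. The function name strip_html_sketch is hypothetical, and the tag mapping (p and br become newlines, img becomes [IMG], every other tag is dropped) is an assumption drawn from the reference outputs, not from the actual source.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>

// Hypothetical helper, NOT GPSBabel's strip_html. It walks over every tag
// with a regex iterator and decides per tag what text should replace it.
std::string strip_html_sketch(const std::string& in)
{
  // Group 1: optional '/', group 2: the complete tag name. The \b after the
  // name keeps "p" from being confused with "param" or "pre".
  static const std::regex tag_re(R"(<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)\b[^>]*>)");

  std::string out;
  auto last = in.cbegin();
  for (std::sregex_iterator it(in.cbegin(), in.cend(), tag_re), end; it != end; ++it) {
    const std::smatch& m = *it;
    out.append(last, m[0].first);  // copy the text preceding this tag

    std::string name = m[2].str();
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const bool closing = m[1].length() > 0;

    // Assumed mapping: compare the whole tag name, never a prefix of it.
    if (!closing && (name == "p" || name == "br")) {
      out += '\n';
    } else if (!closing && name == "img") {
      out += "[IMG]";
    }
    // Any other tag (param, pre, b, ...) is dropped.

    last = m[0].second;
  }
  out.append(last, in.cend());  // copy the text after the final tag

  // Kill spaces or tabs that immediately precede a newline.
  static const std::regex trailing_ws(R"([ \t]+\n)");
  return std::regex_replace(out, trailing_ws, "\n");
}
```

With this sketch, `strip_html_sketch("a <br>b")` yields `a`, a newline, then `b`: the br tag becomes a newline and the space in front of it is removed. A `<param name="x">` tag is dropped instead of being treated as a paragraph break, because the comparison is against the full tag name.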
2 changes: 1 addition & 1 deletion exif.cc
@@ -63,7 +63,7 @@
#include <cstring> // for memcmp, strlen
#include <utility> // for as_const

-#include "defs.h" // for Waypoint, fatal, warning, global_options, global_opts, unknown_alt, xfree, route_disp_all, track_disp_all, waypt_disp_all, wp_flags, KNOTS_TO_MPS, KPH_TO_MPS, MPH_TO_MPS, MPS_TO_KPH, WAYPT_HAS, case_ignore_strcmp, waypt_add, xstrdup, fix_2d
+#include "defs.h" // for Waypoint, fatal, warning, global_options, global_opts, unknown_alt, xfree, route_disp_all, track_disp_all, waypt_disp_all, wp_flags, KNOTS_TO_MPS, KPH_TO_MPS, MPH_TO_MPS, MPS_TO_KPH, WAYPT_HAS, case_ignore_strcmp, waypt_add, fix_2d
#include "garmin_tables.h" // for gt_lookup_datum_index
#include "gbfile.h" // for gbfputuint32, gbfputuint16, gbfgetuint16, gbfgetuint32, gbfseek, gbftell, gbfile, gbfclose, gbfcopyfrom, gbfwrite, gbfopen_be, gbfread, gbfrewind, gbfgetflt, gbfgetint16, gbfopen, gbfputc, gbfputflt, gbsize_t, gbfeof, gbfgetdbl, gbfputdbl, gbfile::(anonymous)
#include "jeeps/gpsmath.h" // for GPS_Math_WGS84_To_Known_Datum_M
36 changes: 36 additions & 0 deletions gpsbabel.supp
@@ -1,3 +1,39 @@
{
<Fedora 40 vtesto text, qt6-qtbase.x86_64 6.7.2-6.fc40 intermittant>
Memcheck:Addr16
fun:UnknownInlinedFun
fun:aeshash128_lt16
fun:_ZL10aeshash128PKhmmm
fun:calculateHash<QStringView>
}
{
<Ubnutu jammy vtesto text, libqt6core6/jammy-updates,now 6.2.4+dfsg-2ubuntu1.1 amd64 >
Memcheck:Addr16
fun:UnknownInlinedFun
fun:_ZL7aeshashPKhmm
fun:calculateHash<QStringView>
}
{
<Ubuntu noble vtesto text, libqt6core6t64/noble,now 6.4.2+dfsg-21.1build5 amd64>
Memcheck:Addr16
fun:UnknownInlinedFun
fun:_ZL15aeshash128_lt16Dv2_xPKhm
}
{
<Ubuntu Jammy vtesto text, qtio qhash.cpp 6.2.4>
Memcheck:Addr16
fun:_mm_loadu_si128
fun:_ZL7aeshashPKhmm
fun:calculateHash<QStringView>
}
{
<Ubuntu Jammy vtesto text, qtio qhash.cpp 6.7.2 intermittant>
Memcheck:Addr16
fun:_mm_loadu_si128
fun:aeshash128_lt16
fun:_ZL10aeshash128PKhmmm
fun:calculateHash<QStringView>
}
{
<Fedora18: Qt's use of libuuc leaks.>
Memcheck:Leak
12 changes: 6 additions & 6 deletions reference/gc/GC7FA4.text
@@ -4,18 +4,18 @@ Points géodésiques du Québec by Sverdrup2 - Locationless (Reverse) Cache / Vi

LES COORDONÉES PUBLIÉES NE REPRÉSENTENT PAS LA LOCALISATION D'UNE CACHE PUBLISHED COORDINATES DO NOT REPRESENT THE LOCALIZATION OF A CACHE

Le but de cette cache virtuelle est de trouver les points géodésiques du territoire québécois. Les points géodésiques sont faciles à identifier (capuchons de laiton au niveau du sol). Généralement, il y a un panneau de couleur orange sur un poteau à proximité du point. Sur ce panneau, le numéro du point est identifié. Aussi, la distance relative du panneau au point est indiquée.
Le but de cette cache virtuelle est de trouver les points géodésiques du territoire québécois. Les points géodésiques sont faciles à identifier (capuchons de laiton au niveau du sol). Généralement, il y a un panneau de couleur orange sur un poteau à proximité du point. Sur ce panneau, le numéro du point est identifié. Aussi, la distance relative du panneau au point est indiquée.
Pour inscrire votre découverte, vous devez prendre en note le NUMÉRO DU POINT(inscrit sur le point même ou au centre du panneau)LA COORDONNÉE(en format HDDD MM.MM WGS84 datum ET UTM NAD83 indiquer la zone SVP)et L'ALTITUDE RELATIVE. Si le points n'est pas visible (il se peut qu'il soit sous quelques centimètres de terre) vous pouvez prendre la coordonnée à l'emplacement du panneau SI LA PRÉCISION DE VOTRE GPS EST SUPÉRIEUR À LA DISTANCE INSCRITE SUR LE PANNEAU (ex : Précison du GPS de 5m et distance au point inscrite sur le panneau de 3m).
Une photo du point ou du panneau et une description générale des lieux serait aussi des informations importantes.
Enfin, il faudrait aussi prendre en note l'organisme propriétaire du point géodésique. Au Québec il en existe plusieurs:
Le Service de la géodésie du Québec, Ministère des Ressources naturelles, Québec
La Division des levés géodésiques, Géomatique Canada, Secteur des sciences de la terre Ressources naturelles Canada
Le Service hydrographique du Canada, Direction des sciences, Pêches et Océans Canada et la Garde côtière canadienne, Pêches et Océans Canada
La Division des levés géodésiques, Géomatique Canada, Secteur des sciences de la terre Ressources naturelles Canada
Le Service hydrographique du Canada, Direction des sciences, Pêches et Océans Canada et la Garde côtière canadienne, Pêches et Océans Canada
Et tout les anciens noms de ministères et/ou organisme
Des photos de points de même que des panneaux suivront bientôt. VOUS NE POUVEZ INSCRIRE QU'UN SEUL POINT GÉODÉSIQUE (UN POINT PAR GÉOCACHEUR) Bonne chance!
The goal of this virtual cache is to find the geodetic points of Québec’s territory. The geodetic points are easy to identify (Brass cap at ground level) Generally, there is an orange panel of on a post near the point. On this panel, the number of the point is identified. Also, the distance relating from the panel to the point is also indicated. In order to log your find, you must take in note THE NUMBER OF THE POINT(registered on the point or in the center of the panel) and THE COORDINATES(in format HDDD MM.MM WGS84 datum AND UTM NAD83 indicate the zone please)and THE ALTITUDE. If the point is not visible (it may be buried under few centimetres) you can take the coordinate at the panel IF THE ACCURACY OF YOUR GPS IS HIGHER Than the DISTANCE REGISTERED ON the PANEL. (Ex: accuracy of the GPS is 5m and the distance to the point registered on the panel is 3m).
A picture of the point or panel and a general description of the places would be also significant information. Finally, it would also be important to take in note the organization owner of the geodetic point. In Quebec there are several:
The "Service de la géodésie du Québec, Ministère des Ressources naturelles Québec" The Geodetic Survey Division, Geomatics Canada, Earth Sciences Sector, Natural Resources Canada The Canadian Hydrographic Service, Sciences Directorate, Fisheries and Oceans Canada and the Canadian Coast Guard, Fisheries and Oceans Canada And all old names of ministries and/or organization
The goal of this virtual cache is to find the geodetic points of Québec’s territory. The geodetic points are easy to identify (Brass cap at ground level) Generally, there is an orange panel of on a post near the point. On this panel, the number of the point is identified. Also, the distance relating from the panel to the point is also indicated. In order to log your find, you must take in note THE NUMBER OF THE POINT(registered on the point or in the center of the panel) and THE COORDINATES(in format HDDD MM.MM WGS84 datum AND UTM NAD83 indicate the zone please)and THE ALTITUDE. If the point is not visible (it may be buried under few centimetres) you can take the coordinate at the panel IF THE ACCURACY OF YOUR GPS IS HIGHER Than the DISTANCE REGISTERED ON the PANEL. (Ex: accuracy of the GPS is 5m and the distance to the point registered on the panel is 3m).
A picture of the point or panel and a general description of the places would be also significant information. Finally, it would also be important to take in note the organization owner of the geodetic point. In Quebec there are several:
The "Service de la géodésie du Québec, Ministère des Ressources naturelles Québec" The Geodetic Survey Division, Geomatics Canada, Earth Sciences Sector, Natural Resources Canada The Canadian Hydrographic Service, Sciences Directorate, Fisheries and Oceans Canada and the Canadian Coast Guard, Fisheries and Oceans Canada And all old names of ministries and/or organization
PICTURES of points and of the panels will follow soon. YOU CAN ONLY LOG ONE POINT (ONE POINT PER GEOCACHER) Good luck!

Found it by Christopher R & Pooh B on 2005-07-12
2 changes: 1 addition & 1 deletion reference/gc/GCGCA8-encoded.txt
@@ -4,7 +4,7 @@ Oozy rat in a sanitary zoo by robertlipe - Unknown Cache / Unknown - (3 / 2)

The cache is not at the coordinates above. These coords will get you to the correct park and within 1/2 mile of the cache. The cache is within 35 feet of the trail. It is not handicapped accessible. It is a nice walk in the woods that is practical for all ages. There is no space in the container for trading items. You should bring a writing stick and bug spray is recommended.

So if the cache isn't at the above coordinates, where is it? Too bad I hid a boot Too hot to hoot Never odd or even Do geese see God? "Do nine men interpret?" "Nine men," I nod Rats live on no evil star Go hang a salami, I'm a lasagna hog Now that it's intuitively obvious to even the most casual observer where the cache is, turn on your geo-mojo and go find it.
So if the cache isn't at the above coordinates, where is it? Too bad I hid a boot Too hot to hoot Never odd or even Do geese see God? "Do nine men interpret?" "Nine men," I nod Rats live on no evil star Go hang a salami, I'm a lasagna hog Now that it's intuitively obvious to even the most casual observer where the cache is, turn on your geo-mojo and go find it.
[IMG]


2 changes: 1 addition & 1 deletion reference/gc/GCGCA8.txt
@@ -4,7 +4,7 @@ Oozy rat in a sanitary zoo by robertlipe - Unknown Cache / Unknown - (3 / 2)

The cache is not at the coordinates above. These coords will get you to the correct park and within 1/2 mile of the cache. The cache is within 35 feet of the trail. It is not handicapped accessible. It is a nice walk in the woods that is practical for all ages. There is no space in the container for trading items. You should bring a writing stick and bug spray is recommended.

So if the cache isn't at the above coordinates, where is it? Too bad I hid a boot Too hot to hoot Never odd or even Do geese see God? "Do nine men interpret?" "Nine men," I nod Rats live on no evil star Go hang a salami, I'm a lasagna hog Now that it's intuitively obvious to even the most casual observer where the cache is, turn on your geo-mojo and go find it.
So if the cache isn't at the above coordinates, where is it? Too bad I hid a boot Too hot to hoot Never odd or even Do geese see God? "Do nine men interpret?" "Nine men," I nod Rats live on no evil star Go hang a salami, I'm a lasagna hog Now that it's intuitively obvious to even the most casual observer where the cache is, turn on your geo-mojo and go find it.
[IMG]


80 changes: 80 additions & 0 deletions reference/gc/GCGCA8_nasty.gpx
@@ -0,0 +1,80 @@
<?xml version="1.0" encoding="utf-8"?>
<gpx xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.0" creator="Groundspeak, Inc. All Rights Reserved. http://www.groundspeak.com" xsi:schemaLocation="http://www.topografix.com/GPX/1/0 http://www.topografix.com/GPX/1/0/gpx.xsd http://www.groundspeak.com/cache/1/0/1 http://www.groundspeak.com/cache/1/0/1/cache.xsd" xmlns="http://www.topografix.com/GPX/1/0">
<name>Cache Listing Generated from Geocaching.com</name>
<desc>This is an individual cache generated from Geocaching.com</desc>
<author>Account "robertlipe" From Geocaching.com</author>
<email>contact@geocaching.com</email>
<url>https://www.geocaching.com</url>
<urlname>Geocaching - High Tech Treasure Hunting</urlname>
<time>2023-10-25T00:44:53.7176739Z</time>
<keywords>cache, geocache</keywords>
<bounds minlat="35.921667" minlon="-86.861667" maxlat="35.921667" maxlon="-86.861667" />
<wpt lat="35.921667" lon="-86.861667">
<time>2003-06-29T00:00:00</time>
<name>GCGCA8</name>
<desc>Oozy rat in a sanitary zoo by robertlipe, Unknown Cache (3/2)</desc>
<url>https://www.geocaching.com/geocache/GCGCA8</url>
<urlname>Oozy rat in a sanitary zoo</urlname>
<sym>Geocache</sym>
<type>Geocache|Unknown Cache</type>
<groundspeak:cache id="77386" available="False" archived="True" xmlns:groundspeak="http://www.groundspeak.com/cache/1/0/1">
<groundspeak:name>Oozy rat in a sanitary zoo</groundspeak:name>
<groundspeak:placed_by>robertlipe</groundspeak:placed_by>
<groundspeak:owner id="32733">robertlipe</groundspeak:owner>
<groundspeak:type>Unknown Cache</groundspeak:type>
<groundspeak:container>Not chosen</groundspeak:container>
<groundspeak:attributes>
<groundspeak:attribute id="24" inc="0">Wheelchair accessible</groundspeak:attribute>
<groundspeak:attribute id="19" inc="1">Ticks</groundspeak:attribute>
<groundspeak:attribute id="18" inc="1">Dangerous animals</groundspeak:attribute>
<groundspeak:attribute id="17" inc="1">Poisonous plants</groundspeak:attribute>
<groundspeak:attribute id="39" inc="1">Thorns</groundspeak:attribute>
<groundspeak:attribute id="30" inc="1">Picnic tables nearby</groundspeak:attribute>
<groundspeak:attribute id="28" inc="1">Public restrooms nearby</groundspeak:attribute>
<groundspeak:attribute id="1" inc="1">Dogs</groundspeak:attribute>
</groundspeak:attributes>
<groundspeak:difficulty>3</groundspeak:difficulty>
<groundspeak:terrain>2</groundspeak:terrain>
<groundspeak:country>United States</groundspeak:country>
<groundspeak:state>Tennessee</groundspeak:state>
<groundspeak:short_description html="True">&lt;body&gt;The cache is &lt;style&gt;
not&lt;/style&gt; at the coordinates above. These coords will get
you to the correct park and within 1/2 mile of the cache. The cache
is within 35 feet of the trail. It is not handicapped accessible.
It is a nice walk in the woods that is practical for all ages.
There is no space in the container for trading items. You should
bring a writing stick and bug spray is recommended.&lt;/body&gt;
</groundspeak:short_description>
<groundspeak:long_description html="True">&lt;html&gt;&lt;body text="color"&gt;So if the cache isn't at the above coordinates, where is it?
&lt;ul&gt;
&lt;li&gt;Too bad I hid a boot&lt;/li&gt;
&lt;li&gt;Too hot to hoot&lt;/li&gt;
&lt;li&gt;Never odd or even&lt;/li&gt;
&lt;li&gt;Do geese see God?&lt;/li&gt;
&lt;li&gt;"Do nine men interpret?" "Nine men," I nod&lt;/li&gt;
&lt;li&gt;Rats live on no evil star&lt;/li&gt;
&lt;li&gt;Go hang a salami, I'm a lasagna hog&lt;/li&gt;&lt;/ul&gt;
Now that it's intuitively obvious to even the most casual observer
where the cache is, turn on your geo-mojo and go find it. &lt;br&gt;
&lt;image src="http://www.mtgc.org/mtgc_member-banner.gif" width="500"
height="40" alt=
"Member of Middle Tennessee GeoCachers Club [www.mtgc.org]"
border="0"&gt;&lt;br&gt;
&lt;br&gt;&lt;/body&gt;&lt;/html&gt;
</groundspeak:long_description>
<groundspeak:encoded_hints>
</groundspeak:encoded_hints>
<groundspeak:logs>
<groundspeak:log id="732879189">
<groundspeak:date>2017-11-11T01:44:14Z</groundspeak:date>
<groundspeak:type>Archive</groundspeak:type>
<groundspeak:finder id="32733">robertlipe</groundspeak:finder>
<groundspeak:text encoded="False">Removed the container from the final location. Enough construction has occurred since this was placed to make it much less of an adventure than is used to be, so I'm archiving.

Thanx to all that hunted it.</groundspeak:text>
</groundspeak:log>
</groundspeak:logs>
<groundspeak:travelbugs />
</groundspeak:cache>
</wpt>
</gpx>
55 changes: 55 additions & 0 deletions reference/gc/GCGCA8_nasty.html
@@ -0,0 +1,55 @@
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>GPSBabel HTML Output</title>
<style>
p.gpsbabelwaypoint { font-size: 120%; font-weight: bold }
</style>
</head>
<body>
<p class="index">
<a href="#WPT001">GCGCA8 - Oozy rat in a sanitary zoo</a><br>
</p>
<div id="WPT001"><hr>
<table style="width:100%">
<tr>
<td>
<p class="gpsbabelwaypoint">GCGCA8 - N35&deg;55.300 W86&deg;51.700 (16S 512480 3975269)<br>
<a href="https://www.geocaching.com/geocache/GCGCA8">Oozy rat in a sanitary zoo</a> by robertlipe</p>
</td>
<td style="text-align:right">
<p class="gpsbabelcacheinfo">3 / 2<br>
Unknown Cache / Unknown</p>
</td>
</tr>
<tr>
<td colspan="2">
<div><p class="gpsbabeldescshort">The cache is at the coordinates above. These coords will get
you to the correct park and within 1/2 mile of the cache. The cache
is within 35 feet of the trail. It is not handicapped accessible.
It is a nice walk in the woods that is practical for all ages.
There is no space in the container for trading items. You should
bring a writing stick and bug spray is recommended.</div>
<div><p class="gpsbabeldesclong">So if the cache isn't at the above coordinates, where is it?
<ul>
<li>Too bad I hid a boot</li>
<li>Too hot to hoot</li>
<li>Never odd or even</li>
<li>Do geese see God?</li>
<li>"Do nine men interpret?" "Nine men," I nod</li>
<li>Rats live on no evil star</li>
<li>Go hang a salami, I'm a lasagna hog</li></ul>
Now that it's intuitively obvious to even the most casual observer
where the cache is, turn on your geo-mojo and go find it. <br>
<img src="http://www.mtgc.org/mtgc_member-banner.gif" width="500"
height="40" alt=
"Member of Middle Tennessee GeoCachers Club [www.mtgc.org]"
border="0"><br>
<br></div>
</td>
</tr>
</table>
</div>
</body>
</html>
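
Comparing GCGCA8_nasty.gpx with GCGCA8_nasty.html above, the descriptions lose their html and body tags (with or without attributes), the style element disappears together with its content, and `<image ...>` comes out as `<img ...>`. The sketch below reproduces only that observed behavior. It is an assumption about what strip_nasty_html needs to do, not the commit's implementation: the name strip_nasty_html_sketch is hypothetical, and std::regex stands in for the Qt string handling the commit uses.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, NOT GPSBabel's strip_nasty_html. It removes markup
// that would make an enclosing HTML document invalid when a description
// is embedded in it.
std::string strip_nasty_html_sketch(const std::string& in)
{
  const auto flags = std::regex::ECMAScript | std::regex::icase;

  // A style element and everything inside it (non-greedy, spans newlines).
  static const std::regex style_re(R"(<style\b[^>]*>[\s\S]*?</style\s*>)", flags);
  // Opening or closing html/body tags, with or without attributes.
  // The \b keeps unrelated names that merely start with "body" intact.
  static const std::regex wrapper_re(R"(</?(?:html|body)\b[^>]*>)", flags);
  // The nonstandard <image> tag, opening or closing.
  static const std::regex image_re(R"(<(/?)image\b)", flags);

  std::string out = std::regex_replace(in, style_re, "");
  out = std::regex_replace(out, wrapper_re, "");
  out = std::regex_replace(out, image_re, "<$1img");
  return out;
}
```

Removing the tags outright, instead of overwriting them in place with comment-like placeholders, is what lets the result stay valid HTML, which matches the commit message's point that the old replacement for `<body>` was invalid.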
2 changes: 1 addition & 1 deletion reference/gc/GCGCA8~vcard.vcf
@@ -3,5 +3,5 @@ VERSION:3.0
N:Oozy rat in a sanitary zoo;GCGCA8;;;
ADR:N35 55.300 W86 51.700
URL:https://www.geocaching.com/geocache/GCGCA8
NOTE:The cache is not at the coordinates above. These coords will get you to the correct park and within 1/2 mile of the cache. The cache is within 35 feet of the trail. It is not handicapped accessible. It is a nice walk in the woods that is practical for all ages. There is no space in the container for trading items. You should bring a writing stick and bug spray is recommended.\nSo if the cache isn't at the above coordinates\, where is it? Too bad I hid a boot Too hot to hoot Never odd or even Do geese see God? "Do nine men interpret?" "Nine men\," I nod Rats live on no evil star Go hang a salami\, I'm a lasagna hog Now that it's intuitively obvious to even the most casual observer where the cache is\, turn on your geo-mojo and go find it. \n [IMG]\n \n\n\nHINT:\n
NOTE:The cache is not at the coordinates above. These coords will get you to the correct park and within 1/2 mile of the cache. The cache is within 35 feet of the trail. It is not handicapped accessible. It is a nice walk in the woods that is practical for all ages. There is no space in the container for trading items. You should bring a writing stick and bug spray is recommended.\nSo if the cache isn't at the above coordinates\, where is it? Too bad I hid a boot Too hot to hoot Never odd or even Do geese see God? "Do nine men interpret?" "Nine men\," I nod Rats live on no evil star Go hang a salami\, I'm a lasagna hog Now that it's intuitively obvious to even the most casual observer where the cache is\, turn on your geo-mojo and go find it. \n [IMG]\n \n\n\nHINT:\n
END:VCARD
1 change: 1 addition & 0 deletions testo
@@ -189,4 +189,5 @@ if [ -z "${VALGRIND}" ]; then
fi
fi

echo "Total Errors: $errorcount"
exit $errorcount
4 changes: 4 additions & 0 deletions testo.d/text.test
@@ -16,3 +16,7 @@ gpsbabel -i gpx -f ${REFERENCE}/gc/GC7FA4.gpx \
-o text,logs -F ${TMPDIR}/GC7FA4.text
compare ${REFERENCE}/gc/GC7FA4.html ${TMPDIR}/GC7FA4.html
compare ${REFERENCE}/gc/GC7FA4.text ${TMPDIR}/GC7FA4.text

# GCGC8_nasty.gpx is hand modifed to test strip_nasty_html
gpsbabel -i gpx -f ${REFERENCE}/gc/GCGCA8_nasty.gpx -o html -F ${TMPDIR}/GCGCA8_nasty.html
compare ${REFERENCE}/gc/GCGCA8_nasty.html ${TMPDIR}/GCGCA8_nasty.html
4 changes: 4 additions & 0 deletions tools/Dockerfile_f37
@@ -20,3 +20,7 @@ RUN dnf install --assumeyes qt6-qtbase-devel qt6-qtserialport-devel qt6-qtwebeng
# tools to build the docs
RUN dnf install --assumeyes expat desktop-file-utils libxslt docbook-style-xsl fop docbook5-style-xsl docbook5-schemas && \
dnf clean all
# debuginfo for valgrind suppressions (or use DEBUGINFOD server)
RUN dnf install --assumeyes 'dnf-command(debuginfo-install)' && \
dnf debuginfo-install --assumeyes qt6-qtbase && \
dnf clean all