Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strings that contain null characters #4120

Closed
ThomasBreuer opened this issue Oct 2, 2020 · 7 comments · Fixed by #5546
Closed

strings that contain null characters #4120

ThomasBreuer opened this issue Oct 2, 2020 · 7 comments · Fixed by #5546
Labels
kind: quirk Issues that are not bugs, but a discrepancy between user expectation and system behavior topic: kernel

Comments

@ThomasBreuer
Copy link
Contributor

The following happens in released versions of GAP and in the current master branch.

gap> s:= "abc\000def";
"abc\000def"
gap> Length( s );
7
gap> Print( s, "\n" );
abc

Is this a bug or a feature?
(In the latter case, it should become documented.)

@fingolfin
Copy link
Member

I tend to classify it as a bug resp. a limitation of how printing was done for a long time in GAP -- but actually some changes we made in the last year or so should allow us to fix it! In a nutshell, the C function FormatOutput is used to print strings. It can print both C strings (which are terminated by a NULL byte), and GAP strings (which are allowed to contain NULL bytes). The problem is that the code to deal with them tries hard to share as much code as possible (which in general is a good idea), but one of the things it shares is the way it iterates over the bytes in a string: it scans until it finds a null byte... What it should do, at least for GAP strings, is to first query the string length and then use that for iteration.

The simplest fix would be to change it to query the string length in both cases -- while this leads to a performance regression when printing C strings, reality is that those are almost always very short, meaning that the extra call to strlen will be negligible. Of course it would still be a good idea to perform some benchmarking, but overall I am not concerned overly by this.

@fingolfin fingolfin added kind: quirk Issues that are not bugs, but a discrepancy between user expectation and system behavior topic: kernel labels Oct 2, 2020
@frankluebeck
Copy link
Member

I'm not sure if this should be considered as a bug. Is it really sensible to print strings with 0-characters?
What is the expected behaviour? Print usually does some line break magic. How should a 0-character be counted (as character with zero width)?

Of course, you can always write arbitrary byte sequences to your terminal (or other file) with WriteAll (or FileString to a named file):

gap> s:= "abc\000def";
"abc\000def"
gap> WriteAll(OutputTextUser(), s);;Print("\n");
abcdef

@ThomasBreuer
Copy link
Contributor Author

Just for the sake of a cross-reference: The origin of the issue is oscar-system/GAP.jl/issues/527.

@nskeip
Copy link
Contributor

nskeip commented Dec 16, 2023

So, GET_LEN_STRING need to be fixed, am I right?

@nskeip
Copy link
Contributor

nskeip commented Dec 16, 2023

Meanwhile stringobj.c says:

A string is a list that has no holes, and whose elements are all
characters. For the full definition of strings see chapter "Strings" in
the {\GAP} manual. Read also "More about Strings" about the string flag
and the compact representation of strings.

@fingolfin
Copy link
Member

@nskeip no not at all, it works just fine here. Just the printing code gets "confused".

@nskeip
Copy link
Contributor

nskeip commented Dec 18, 2023

@fingolfin hey, I tried to fix it in #5546 - can you check out and give a feedback on how it should behave? (skip NULLs or print someting?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: quirk Issues that are not bugs, but a discrepancy between user expectation and system behavior topic: kernel
Projects
None yet
4 participants