Efficient alternate representations for lists of characters (strings).

Consider to add an internal string data type as an alternate representation for lists of characters:

Complete strings
-----------

The tagged value is a byte-aligned pointer to a null-terminated sequence of UTF-8 characters on the heap (copy stack). This means that the full address-space of 32/64 bits cannot be used but some (1 or two) bits less than that. In many systems this is not a problem since the operating systems already makes similar assumptions anyway.

A list of *n* characters `[C1, ..., CN]` would thus occupy between *n+1* and *n+8* bytes, depending on alignment, whereas the naive representation as  a list of chars would take *n · (3 · 8)* bytes.

In this manner, the "tail" of the strings can be rapidly (constantly) computed which is key for fast parsing with DCGs. Note that SWI7's built-in string requires for `sub_string(String, 1, _, 0, Tail)` time and space proportional to the length of `String`.

Then, all operations from unification upwards need to be able to handle both lists of chars and that string type. That is indeed the more challenging part.

Partial strings
---------

This would take the idea one step further. Instead of representing a list of known characters, a partial list of known characters would be represented compactly. E.g. `Xs0 = [a,b,c,d|Xs]` would be represented as a null-terminated string "abcd" plus padding to the next word which represents `Xs`. This would make `library(pio)` very fast!


Just as a remark: Strings that contain the null-character can still be represented as a regular lists of characters.









Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Efficient alternate representations for lists of characters (strings). #24

Complete strings

Partial strings

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Efficient alternate representations for lists of characters (strings). #24

Description

Complete strings

Partial strings

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions