[libclang] annotateTokens() produces different cursor than visitChildren() #76472

Open
@jimmy-zx

Description

While testing annotateTokens() (used by Token.cursor in the Python binding), I found that for some cursors, the only token that belongs to the cursor does not map back to the cursor itself.

For example, on the following code,

struct a {
    int b;
};

int func(struct a *ptr) {
    int r = ptr->b;
    return r;
}

I wrote a script that selects the DeclRefExpr referring to ptr in the statement int r = ptr->b, and checks whether the cursor of the only token in that expression, ptr, maps back to the same cursor.

from clang.cindex import TranslationUnit, Cursor, CursorKind


def main():
    tu = TranslationUnit.from_source("./demo.c")
    root: Cursor = tu.cursor

    node = None
    for node in root.walk_preorder():
        if node.kind == CursorKind.DECL_REF_EXPR and node.spelling == "ptr":
            break

    token = None
    for token in node.get_tokens():
        break

    print(token.cursor == node)

    print(token.cursor._kind_id, node._kind_id)
    print(token.cursor.xdata, node.xdata)
    print(*token.cursor.data)
    print(*node.data)


if __name__ == '__main__':
    main()

The result of the above script is

False
101 101
0 0
140162768666120 140162768666224 140162768050240
None 140162768666224 140162768050240

The cursors node and token.cursor should be the same, and they do share the same spelling and extent. However, libclang considers them different cursors.

Cursor equality is implemented by clang_equalCursors(), and the only difference between these two cursors is data[0].

unsigned clang_equalCursors(CXCursor X, CXCursor Y) {
  // Clear out the "FirstInDeclGroup" part in a declaration cursor, since we
  // can't set consistently. For example, when visiting a DeclStmt we will set
  // it but we don't set it on the result of clang_getCursorDefinition for
  // a reference of the same declaration.
  // FIXME: Setting "FirstInDeclGroup" in CXCursors is a hack that only works
  // when visiting a DeclStmt currently, the AST should be enhanced to be able
  // to provide that kind of info.
  if (clang_isDeclaration(X.kind))
    X.data[1] = nullptr;
  if (clang_isDeclaration(Y.kind))
    Y.data[1] = nullptr;
  return X == Y;
}
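
To make the comparison above concrete, here is a small pure-Python model of it. This is only a sketch and does not call libclang: CursorModel and equal_cursors are hypothetical names, the is_declaration flag is an assumed stand-in for clang_isDeclaration(), and the data values are copied from the script output.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class CursorModel:
    """Hypothetical stand-in for CXCursor: kind, xdata, and three data slots."""
    kind: int
    xdata: int
    data: Tuple[Optional[int], Optional[int], Optional[int]]
    is_declaration: bool = False  # assumed stand-in for clang_isDeclaration(kind)


def equal_cursors(x: CursorModel, y: CursorModel) -> bool:
    """Model of clang_equalCursors(): data[1] is cleared only for declarations."""
    def normalize(c: CursorModel) -> CursorModel:
        if c.is_declaration:
            return replace(c, data=(c.data[0], None, c.data[2]))
        return c

    x, y = normalize(x), normalize(y)
    # Every remaining field, including data[0], must match.
    return (x.kind, x.xdata, x.data) == (y.kind, y.xdata, y.data)


# Values taken from the script output (kind 101, xdata 0).
from_token = CursorModel(101, 0, (140162768666120, 140162768666224, 140162768050240))
from_walk = CursorModel(101, 0, (None, 140162768666224, 140162768050240))

print(equal_cursors(from_token, from_walk))
```

In this model the comparison prints False, matching the first line of the script output: kind 101 is an expression, so nothing is cleared and the mismatch in data[0] alone makes the cursors unequal.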

I suspect that DeclRefExpr cursors are created in MakeCXCursor(), and that data[0] probably holds the parent.

  case Stmt::DeclRefExprClass:
    if (const ImplicitParamDecl *IPD = dyn_cast_or_null<ImplicitParamDecl>(
            cast<DeclRefExpr>(S)->getDecl())) {
      if (const ObjCMethodDecl *MD =
              dyn_cast<ObjCMethodDecl>(IPD->getDeclContext())) {
        if (MD->getSelfDecl() == IPD) {
          K = CXCursor_ObjCSelfExpr;
          break;
        }
      }
    }
    K = CXCursor_DeclRefExpr;
    break;

  CXCursor C = {K, 0, {Parent, S, TU}};
  return C;
}

There might be an issue where data[0] (the parent) is not set consistently. Alternatively, should clang_equalCursors() ignore data[0] when comparing statements?
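
Until this is resolved, a possible user-side workaround is to compare the properties the two cursors do share instead of relying on ==. The sketch below is an assumption on my part, not a libclang API: same_node is a hypothetical helper that relies only on the kind, spelling, and extent attributes of Cursor, and the demo uses plain stand-in objects so that it runs without libclang.

```python
from types import SimpleNamespace


def same_node(a, b) -> bool:
    """Compare two cursor-like objects by kind, spelling, and extent.

    Hypothetical helper: it deliberately ignores the internal data fields,
    so it may also report equality for distinct cursors that happen to
    share all three properties.
    """
    return a.kind == b.kind and a.spelling == b.spelling and a.extent == b.extent


# Stand-in objects for illustration only; extents are (start, end) line/column pairs.
node = SimpleNamespace(kind="DECL_REF_EXPR", spelling="ptr", extent=((6, 13), (6, 16)))
token_cursor = SimpleNamespace(kind="DECL_REF_EXPR", spelling="ptr", extent=((6, 13), (6, 16)))
other = SimpleNamespace(kind="DECL_REF_EXPR", spelling="r", extent=((7, 12), (7, 13)))

print(same_node(token_cursor, node))
print(same_node(token_cursor, other))
```

In the script above, this would be called as same_node(token.cursor, node). It is weaker than real cursor identity, so it is a stopgap rather than a fix.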
