[libclang] annotateTokens() produces different cursor than visitChildren() #76472

Open
jimmy-zx opened this issue Dec 27, 2023 · 0 comments
Labels
clang:as-a-library libclang and C++ API

Comments

@jimmy-zx
Contributor

While testing the annotateTokens() function (used by Token.cursor in the Python bindings), I found that for some cursors, the (only) token that belongs to the cursor does not map back to the cursor itself.

For example, on the following code,

struct a {
    int b;
};

int func(struct a *ptr) {
    int r = ptr->b;
    return r;
}

I wrote a script that selects the DeclRefExpr referring to ptr in the statement int r = ptr->b, and checks whether the cursor of the token ptr, the only token that belongs to the expression, maps back to that DeclRefExpr cursor.

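from clang.cindex import TranslationUnit, Cursor, CursorKind


def main():
    tu = TranslationUnit.from_source("./demo.c")
    root: Cursor = tu.cursor

    # Find the first DeclRefExpr that refers to `ptr` (in `int r = ptr->b`).
    node = None
    for node in root.walk_preorder():
        if node.kind == CursorKind.DECL_REF_EXPR and node.spelling == "ptr":
            break

    # Take the first (and only) token covered by that expression.
    token = None
    for token in node.get_tokens():
        break

    # The token's cursor is expected to compare equal to the expression cursor.
    print(token.cursor == node)

    # Dump the raw CXCursor fields (kind, xdata, data[0..2]) of both cursors.
    print(token.cursor._kind_id, node._kind_id)
    print(token.cursor.xdata, node.xdata)
    print(*token.cursor.data)
    print(*node.data)


if __name__ == '__main__':
    main()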

The result of the above script is

False
101 101
0 0
140162768666120 140162768666224 140162768050240
None 140162768666224 140162768050240

The cursors node and token.cursor should be the same, and they do share the same spelling and extent. However, libclang considers them different cursors.

Cursor equality is provided by clang_equalCursors(), and the only difference between these two cursors is data[0].

unsigned clang_equalCursors(CXCursor X, CXCursor Y) {
  // Clear out the "FirstInDeclGroup" part in a declaration cursor, since we
  // can't set consistently. For example, when visiting a DeclStmt we will set
  // it but we don't set it on the result of clang_getCursorDefinition for
  // a reference of the same declaration.
  // FIXME: Setting "FirstInDeclGroup" in CXCursors is a hack that only works
  // when visiting a DeclStmt currently, the AST should be enhanced to be able
  // to provide that kind of info.
  if (clang_isDeclaration(X.kind))
    X.data[1] = nullptr;
  if (clang_isDeclaration(Y.kind))
    Y.data[1] = nullptr;
  return X == Y;
}
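
To make the mismatch concrete, here is a minimal plain-Python sketch that replays the values printed by the script above and reports which fields differ. It does not call libclang. CursorFields and differing_fields are hypothetical names introduced only for illustration, and the sketch assumes the final X == Y is a field-by-field comparison.

```python
from typing import List, NamedTuple, Optional, Tuple


class CursorFields(NamedTuple):
    """Hypothetical stand-in for the raw CXCursor fields printed by the script."""
    kind: int
    xdata: int
    data: Tuple[Optional[int], Optional[int], Optional[int]]


def differing_fields(x: CursorFields, y: CursorFields) -> List[str]:
    """Return the names of the fields whose values differ between x and y."""
    names = []
    if x.kind != y.kind:
        names.append("kind")
    if x.xdata != y.xdata:
        names.append("xdata")
    for i, (a, b) in enumerate(zip(x.data, y.data)):
        if a != b:
            names.append(f"data[{i}]")
    return names


# Values copied from the script output above.
token_cursor = CursorFields(
    101, 0, (140162768666120, 140162768666224, 140162768050240)
)
node = CursorFields(101, 0, (None, 140162768666224, 140162768050240))

print(differing_fields(token_cursor, node))  # ['data[0]']
```

Only data[0] is reported, which matches the observation that the two cursors agree on kind, xdata, data[1] and data[2].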

I suspect that DeclRefExpr cursors are created in MakeCXCursor(), and that data[0] probably holds the parent cursor.

  case Stmt::DeclRefExprClass:
    if (const ImplicitParamDecl *IPD = dyn_cast_or_null<ImplicitParamDecl>(
            cast<DeclRefExpr>(S)->getDecl())) {
      if (const ObjCMethodDecl *MD =
              dyn_cast<ObjCMethodDecl>(IPD->getDeclContext())) {
        if (MD->getSelfDecl() == IPD) {
          K = CXCursor_ObjCSelfExpr;
          break;
        }
      }
    }
    K = CXCursor_DeclRefExpr;
    break;

  CXCursor C = {K, 0, {Parent, S, TU}};
  return C;
}

The issue might be that the data[0] (parent) field is not being set properly. Alternatively, should clang_equalCursors() ignore data[0] when comparing statements?
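
In the meantime, a caller could sidestep the comparison by checking only the public properties that do match here (kind, spelling and extent). The sketch below is a possible workaround, not a fix and not part of clang.cindex: same_source_node is a hypothetical helper, and the demo uses SimpleNamespace stand-ins with made-up extent tuples so it runs without libclang. Two distinct AST nodes can share the same kind, spelling and extent, so this check is looser than true cursor identity.

```python
from types import SimpleNamespace


def same_source_node(a, b) -> bool:
    """Hypothetical helper: compare two cursor-like objects by kind, spelling
    and extent instead of relying on libclang's cursor equality."""
    return a.kind == b.kind and a.spelling == b.spelling and a.extent == b.extent


# Stand-ins for real Cursor objects; the extent tuples are illustrative only.
node = SimpleNamespace(kind="DECL_REF_EXPR", spelling="ptr", extent=((6, 13), (6, 16)))
token_cursor = SimpleNamespace(kind="DECL_REF_EXPR", spelling="ptr", extent=((6, 13), (6, 16)))
other = SimpleNamespace(kind="DECL_REF_EXPR", spelling="r", extent=((7, 12), (7, 13)))

print(same_source_node(node, token_cursor))  # True
print(same_source_node(node, other))         # False
```

With real cursors, the same call would be same_source_node(token.cursor, node), since Cursor exposes kind, spelling and extent as properties.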

@EugeneZelenko added the clang:as-a-library (libclang and C++ API) label and removed the new issue label on Dec 27, 2023