User search: users whose names contain "-", "." or some other special symbols are not found when searching  #16675

Closed
@matrixbot

This issue has been migrated from #16675.


Description

User AA-71 exists. The homeserver is matrix.org.
[image]
The user cannot be found by searching for aaa-71.
[image]
My request limit is 50.
[image]
Only six results are returned.
[image]

Is aaa-71 registered on matrix.org? The limit of 50 is not being reached, yet none of the best local matches are found and only six results are returned.
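For reference, the search described above could be issued roughly as follows. This is a minimal sketch that only builds the request for the Matrix client-server user directory search endpoint and does not send it. The helper name `build_search_request` and the `<access_token>` placeholder are assumptions for illustration and are not taken from the report.

```python
import json
import urllib.request


def build_search_request(
    homeserver_url: str, access_token: str, search_term: str, limit: int = 50
) -> urllib.request.Request:
    """Build (but do not send) a user directory search request."""
    # The endpoint takes a JSON body with the search term and a result limit.
    body = json.dumps({"search_term": search_term, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{homeserver_url}/_matrix/client/v3/user_directory/search",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "<access_token>" is a placeholder; substitute a real token before sending.
req = build_search_request("https://matrix.org", "<access_token>", "aaa-71")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it. The JSON response is expected to carry a `results` list and a `limited` flag, which makes it easy to compare the number of returned users against the requested limit.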

Steps to reproduce

  • list the steps
  • that reproduce the bug
  • using hyphens as bullet points

Homeserver

matrix.org

Synapse Version

matrixdotorg/synapse:latest

Installation Method

Docker (matrixdotorg/synapse)

Database

PostgreSQL

Workers

Single process

Platform

ubuntu

Configuration

No response

Relevant log output

2023-11-22 11:03:19,246 - synapse.storage.SQL - 449 - DEBUG - POST-222893 - [SQL] {search_user_dir-118fa1} WITH matching_users AS ( SELECT user_id, vector FROM user_directory_search WHERE vector @@ to_tsquery('simple', ?) LIMIT 10000 ) SELECT d.user_id AS user_id, display_name, avatar_url FROM matching_users as t INNER JOIN user_directory AS d USING (user_id) LEFT JOIN users AS u ON t.user_id = u.name WHERE user_id != ? ORDER BY (CASE WHEN d.user_id IS NOT NULL THEN 4.0 ELSE 1.0 END) * (CASE WHEN display_name IS NOT NULL THEN 1.2 ELSE 1.0 END) * (CASE WHEN avatar_url IS NOT NULL THEN 1.2 ELSE 1.0 END) * ( 3 * ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) + ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) ) * (CASE WHEN user_id LIKE ? THEN 2.0 ELSE 1.0 END) DESC, display_name IS NULL, avatar_url IS NULL LIMIT ?
2023-11-22 11:03:19,246 - synapse.storage.SQL - 454 - DEBUG - POST-222893 - [SQL values] {search_user_dir-118fa1} ("('eafe':* | 'eafe') & ('dsds':* | 'dsds')", '@0xe52000e012ce8851fcb9532adcc066db55fa53c8:matrix-dev.defed.network', "'eafe' & 'dsds'", "'eafe':* & 'dsds':*", '%:matrix-dev.defed.network', 21)
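To make the logged values easier to read, here is a sketch of how a list of parsed words could be turned into the three `to_tsquery` strings that appear in the `[SQL values]` line. The function name `build_tsqueries` is hypothetical and the code is reconstructed only from the log output above; it is not Synapse's implementation and does no quoting or escaping of the words.

```python
from typing import List, Tuple


def build_tsqueries(words: List[str]) -> Tuple[str, str, str]:
    """Return (full, exact, prefix) tsquery strings for the given words.

    Assumption: words contain no single quotes; no escaping is done here.
    """
    # Each word may match either as a prefix or exactly.
    full = " & ".join(f"('{w}':* | '{w}')" for w in words)
    # Every word must match exactly.
    exact = " & ".join(f"'{w}'" for w in words)
    # Every word must match as a prefix.
    prefix = " & ".join(f"'{w}':*" for w in words)
    return full, exact, prefix


full, exact, prefix = build_tsqueries(["eafe", "dsds"])
print(full)    # ('eafe':* | 'eafe') & ('dsds':* | 'dsds')
print(exact)   # 'eafe' & 'dsds'
print(prefix)  # 'eafe':* & 'dsds':*
```

Every word returned by the parser ends up as its own term joined with `&`, so a word that consists only of a symbol such as `-` would also become a required term in all three queries.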

Anything else that would be useful to know?

import icu
import re
from typing import List


def parse_words_with_icu(search_term: str) -> List[str]:
    """Split search_term into words using ICU's word break iterator."""
    results = []
    breaker = icu.BreakIterator.createWordInstance(icu.Locale.getEnglish())

    breaker.setText(search_term)
    i = 0
    while True:
        j = breaker.nextBoundary()
        if j == icu.BreakIterator.DONE:
            break

        result = search_term[i:j]
        # Debug output: show every segment ICU produces.
        print(result)

        # libicu considers spaces and punctuation between words as words, but we don't
        # want to include those in results as they would result in syntax errors in SQL
        # queries (e.g. "foo bar" would result in the search query including "foo &  &
        # bar").
        if len(re.findall(r"([\w\-]+)", result, re.UNICODE)):
            results.append(result)

        i = j

    return results


# Call the function and print the result
if __name__ == "__main__":
    print("Input: hongtao:aaa")
    search_string = "hongtao*aaa"
    parsed_words = parse_words_with_icu(search_string)
    print(parsed_words)
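The filter in the snippet above can also be tried without PyICU installed. The sketch below replaces ICU's word breaking with a crude regex split, which is an assumption and not ICU's actual boundary rules, and then applies the same `[\w\-]+` filter. The helper names `rough_segments` and `filter_words` are hypothetical.

```python
import re
from typing import List


def rough_segments(term: str) -> List[str]:
    """Crude stand-in for ICU word boundaries (an assumption, not ICU's rules).

    Splits into runs of word characters, runs of other symbols, and runs of
    whitespace.
    """
    return re.findall(r"\w+|[^\w\s]+|\s+", term)


def filter_words(segments: List[str]) -> List[str]:
    """Keep only segments that pass the same filter as the snippet above."""
    return [s for s in segments if re.findall(r"([\w\-]+)", s, re.UNICODE)]


for term in ("hongtao*aaa", "aaa-71"):
    print(term, "->", filter_words(rough_segments(term)))
# hongtao*aaa -> ['hongtao', 'aaa']
# aaa-71 -> ['aaa', '-', '71']
```

Under this approximation `*` is dropped, but a lone `-` passes the filter because the character class includes `\-`, so it is kept as a separate word. Whether ICU segments `aaa-71` the same way would need to be confirmed by running the original function.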

[image]
