When search_model is "advanced" and the search query is Chinese, the content field in the results is sometimes not UTF-8 #93

@Prayforhanluo

Description

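When search_model is "advanced" and the search query is in Chinese, the content field of some results is not decoded as UTF-8. In the output below, the second result has a correctly decoded title, but its content comes back as mojibake.

Example: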
{'url': 'https://www.tencent.com/',
'title': 'Tencent 腾讯',
'content': 'Tencent 腾讯\nTencent腾讯 =========\n\nAbout\nAbout Us\nVision & Mission\nMilestones\nCompany Structure\nManagement Team\nBoard Members\nOur Culture\nOur Offices\n\n\nBusiness\nConsumers\nEnterprises\nInnovation\n\n\nEmployees\nTalent Development\nTencent Academy\nWork Environment\nEmployee Activities\n\n\nESG\nEnvironment\nSocial\nGovernance\nESG Rating\nReports [...] Connecting the now and the future --------------------------------- Pioneering technological innovations to bring the future to the present [...] ### Communications and Social Offering a comprehensive suite of communications and social services that connect people to make communication easy and intuitive.\n\n### Digital Content Delivering high-quality content through industry-leading technologies, to shape our next-generation social and content offering.\n\n### FinTech Services Connecting users to merchants via fast and secure payment service, and to financial institutions via innovative consumer finance products.',
'score': 0.57837737,
'raw_content': None},
{'url': 'https://apps.apple.com/cn/app/%E8%85%BE%E8%AE%AF%E6%96%87%E6%A1%A3/id1370780836',
'title': 'App Store 上的“腾讯文档”',
'content': 'è\x85¾è®¯æ\x96\x87â\x80ªæ¡£â\x80¬\n 4+\n\nå\x8f¯å¤\x9a人å®\x9eæ\x97¶å\x8d\x8fä½\x9cç\x9a\x84å\x9c¨çº¿æ\x96\x87â\x80ªæ¡£â\x80¬\n\nTencent Technology (Shenzhen) Company Limited\n\næ\x88ªå±\x8f\n\nç®\x80ä»\x8b [...] æ\x9c\x80è¿\x91ä½\x93éª\x8cæ\x84\x9fé£\x9eèµ·ï¼\x8cæ¡\x8cé\x9d¢ç«¯ç\x94¨èµ·æ\x9d¥æ\x9b´ä¸\x9dæ»\x91äº\x86ã\x80\x82\n\né\x9d\x9e常å®\x9eç\x94¨å\x90§\n\né\x9d\x9e常å®\x9eç\x94¨ï¼\x8c å¸\x8cæ\x9c\x9bæ\x97©æ\x97¥è¶\x85è¶\x8aofficeå\x92\x8cé\x87\x91å±±æ\x96\x87æ¡£\n\nApp é\x9a\x90ç§\x81\n\nå¼\x80å\x8f\x91è\x80\x85â\x80\x9cTencent Technology (Shenzhen) Company Limitedâ\x80\x9d已表æ\x98\x8e该 App ç\x9a\x84é\x9a\x90ç§\x81è§\x84è\x8c\x83å\x8f¯è\x83½å\x8c\x85æ\x8b¬äº\x86ä¸\x8bè¿°ç\x9a\x84æ\x95°æ\x8d®å¤\x84ç\x90\x86æ\x96¹å¼\x8fã\x80\x82æ\x9c\x89å\x85³æ\x9b´å¤\x9aä¿¡æ\x81¯ï¼\x8c请å\x8f\x82é\x98\x85å¼\x80å\x8f\x91è\x80\x85é\x9a\x90ç§\x81æ\x94¿ç\xad\x96ã\x80\x82\n\nç\x94¨äº\x8e追踪ä½\xa0ç\x9a\x84æ\x95°æ\x8d®\n\n以ä¸\x8bæ\x95°æ\x8d®å\x8f¯è\x83½ä¼\x9aç\x94¨äº\x8eå\x9c¨å\x85¶ä»\x96å\x85¬å\x8f¸ç\x9a\x84 App å\x92\x8cç½\x91ç«\x99ä¸\xad追踪ä½\xa0ï¼\x9a\n\nä¸\x8eä½\xa0å\x85³è\x81\x94ç\x9a\x84æ\x95°æ\x8d® [...] 
.comæ\x84\x8fè§\x81å\x8f\x8dé¦\x88ï¼\x9aç\x99»å½\x95è\x85¾è®¯æ\x96\x87æ¡£ï¼\x8cè¿\x9bå\x85¥â\x80\x9c设置-æ\x84\x8fè§\x81å\x8f\x8dé¦\x88â\x80\x9dè¿\x9bè¡\x8cå\x8f\x8dé¦\x88ã\x80\x82å¦\x82æ\x9e\x9cä½\xa0è§\x89å¾\x97è\x85¾è®¯æ\x96\x87æ¡£è¿\x98ä¸\x8dé\x94\x99ï¼\x8c请ç»\x99æ\x88\x91们äº\x94æ\x98\x9f好è¯\x84ï½\x9eå¦\x82æ\x9e\x9cå\x9c¨ä½¿ç\x94¨è¿\x87ç¨\x8bä¸\xadæ\x9c\x89ä»»ä½\x95é\x97®é¢\x98ï¼\x8c欢è¿\x8eå\x9c¨è¯\x84论å\x8cºç\x95\x99ä¸\x8bæ\x84\x8fè§\x81ã\x80\x82ä½\xa0æ\x89\x80é\x9c\x80è¦\x81ç\x9a\x84ï¼\x8cå°±æ\x98¯æ\x88\x91们å\x8aªå\x8a\x9bç\x9a\x84æ\x96¹å\x90\x91ã\x80\x82-----è\x85¾è®¯æ\x96\x87æ¡£ä¼\x9aå\x91\x98/è\x85¾è®¯æ\x96\x87æ¡£è¶\x85级ä¼\x9aå\x91\x98è\x87ªå\x8a¨è®¢é\x98\x85æ\x9c\x8då\x8a¡è¯´æ\x98\x8e-----1ã\x80\x81è\x85¾è®¯æ\x96\x87æ¡£ä¼\x9aå\x91\x98è\x87ªå\x8a¨è®¢é\x98\x85æ\x9c\x8då\x8a¡æ\x9c\x89以ä¸\x8bä¸\x80ç§\x8d订è´\xadç±»å\x9e\x8bï¼\x9a9å\x85\x83/1个æ\x9c\x88ã\x80\x82è\x85¾è®¯æ\x96\x87æ¡£è¶\x85级ä¼\x9aå\x91\x98è\x87ªå\x8a¨è®¢é\x98\x85æ\x9c\x8då\x8a¡æ\x9c\x89以ä¸\x8b订è´\xadç±»å\x9e\x8b',
'score': 0.48294178,
'raw_content': None},
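
A possible client-side workaround, in case it helps with triage. The garbled content looks like UTF-8 bytes that were decoded as Latin-1 somewhere upstream: for example, `è\x85¾` is the byte sequence `E8 85 BE`, which is the UTF-8 encoding of 腾. That cause is an assumption based only on the output above, not something confirmed on the server side. Assuming it holds, a minimal sketch for repairing a string after the fact follows. `repair_mojibake` is a hypothetical helper, not part of any library, and it works run by run because some characters in the content (such as 人) are already decoded correctly.

```python
import re

# Runs of characters in U+0080..U+00FF. UTF-8 lead and continuation bytes
# misread as Latin-1 always land in this range; ASCII and real CJK do not.
_SUSPECT_RUN = re.compile(r"[\u0080-\u00ff]+")


def repair_mojibake(text: str) -> str:
    """Re-decode runs that look like UTF-8 bytes misread as Latin-1.

    Hypothetical helper: assumes the damage is a UTF-8 -> Latin-1 mix-up.
    """

    def _fix(match):
        run = match.group(0)
        try:
            # Recover the original bytes, then decode them as UTF-8.
            return run.encode("latin-1").decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8, so this run was probably genuine Latin-1 text.
            return run

    return _SUSPECT_RUN.sub(_fix, text)


if __name__ == "__main__":
    # Simulate the suspected bug on a known string, then repair it.
    garbled = "腾讯文档".encode("utf-8").decode("latin-1")
    print(repair_mojibake(garbled))
```

Runs that do not decode as valid UTF-8 are left untouched, so ordinary accented Latin-1 text such as "café" passes through unchanged. This only masks the symptom; the response should still be decoded with the right charset at the source.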
