Skip to content

Conversation

@Grenz1inie
Copy link

@Grenz1inie Grenz1inie commented Oct 10, 2025

加入中文readme,采用pypi_simple的api解决无法搜索的问题,同时优化搜索结果顺序--匹配度优先。

Add Chinese READMEs, use the pypi_simple API to solve the search failure issue, and optimize the order of search results with priority given to matching degree.

Summary by Sourcery

Use PyPI Simple v1 JSON API for more reliable package search with pagination, cancellation, and ordered results; add Chinese documentation.

New Features:

  • Switch PyPI search implementation to use the Simple API JSON index and per-package metadata endpoints
  • Add pagination and cancellation support to PyPI search

Enhancements:

  • Prioritize search results by exact match, prefix match, then substring match
  • Refine TypeScript annotations and replace includes with indexOf for package checks

Documentation:

  • Add Chinese README
  • Add Chinese quickstart guide for VS Code extension

@sourcery-ai
Copy link

sourcery-ai bot commented Oct 10, 2025

Reviewer's Guide

This PR adds Chinese documentation and overhauls the PyPI search implementation in PackageManager: switching from HTML scraping to the PyPI Simple v1 JSON API, adding compatibility handling, match-priority sorting, pagination, and parallel metadata fetching with cancellation support.

Sequence diagram for new PyPI search process with cancellation and prioritization

sequenceDiagram
    actor User
    participant "PackageManager"
    participant "PyPI Simple API"
    participant "PyPI Metadata API"
    User->>PackageManager: searchFromPyPi(keyword, page, cancelToken)
    PackageManager->>"PyPI Simple API": GET /simple/ (Accept: vnd.pypi.simple.v1+json)
    "PyPI Simple API"-->>PackageManager: JSON index of projects
    PackageManager->>PackageManager: Filter and sort projects by match priority
    PackageManager->>"PyPI Metadata API": parallel GET /pypi/<name>/json for each result
    "PyPI Metadata API"-->>PackageManager: Metadata for each package
    PackageManager->>User: Return paginated, prioritized results
    User-->>PackageManager: (optional) Cancel search
    PackageManager->>"PyPI Simple API": Cancel outstanding requests
Loading

File-Level Changes

Change Details Files
Refactor searchFromPyPi to use PyPI Simple JSON API and optimize result ordering
  • Removed HTML parsing via regex/xml2js in favor of JSON index retrieval
  • Added compatibility logic for various JSON structures (projects, entries)
  • Implemented case-insensitive filtering and priority sorting (exact, prefix, contains)
  • Introduced fixed pageSize pagination and current page calculation
  • Parallelized package metadata fetches with axios and cancellation support
  • Added fallback handling for individual metadata request failures
src/modules/PackageManager.ts
Minor API and logic adjustments
  • Annotated process close handler callback with number type
  • Replaced necessaryPackage.includes with indexOf check
src/modules/PackageManager.ts
Add Chinese documentation
  • Created README.zh-cn.md with project overview and usage
  • Created vsc-extension-quickstart.zh-cn.md for extension setup guide
README.zh-cn.md
vsc-extension-quickstart.zh-cn.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Pulling the entire /simple index on each search may be too heavy—consider using PyPI’s official JSON search API or caching the simple index to avoid performance bottlenecks.
  • You’re firing off parallel axios requests for every package on the page; introduce a concurrency limit (or make pageSize a configurable constant) to avoid overwhelming the network.
  • The change from necessaryPackage.includes(name) to indexOf !== -1 reduces readability—consider reverting to includes() for clearer intent.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Pulling the entire /simple index on each search may be too heavy—consider using PyPI’s official JSON search API or caching the simple index to avoid performance bottlenecks.
- You’re firing off parallel axios requests for every package on the page; introduce a concurrency limit (or make pageSize a configurable constant) to avoid overwhelming the network.
- The change from necessaryPackage.includes(name) to indexOf !== -1 reduces readability—consider reverting to includes() for clearer intent.

## Individual Comments

### Comment 1
<location> `src/modules/PackageManager.ts:320-329` </location>
<code_context>
+        // indexResp.data 可能是 { projects: [...] } 或数组 - 两种格式都做兼容处理
</code_context>

<issue_to_address>
**suggestion:** The compatibility logic for indexResp.data is broad and may mask unexpected API changes.

Consider adding a warning or log when indexResp.data does not match the expected format to help detect upstream API changes early.
</issue_to_address>

### Comment 2
<location> `src/modules/PackageManager.ts:377-386` </location>
<code_context>
+        await Promise.all(pageItems.map(async (name) => {
</code_context>

<issue_to_address>
**suggestion (performance):** Unthrottled parallel requests for package metadata may hit rate limits or degrade performance.

Implement a concurrency limit or throttling to reduce the risk of rate limiting and performance issues, particularly if pageSize increases.

Suggested implementation:

```typescript
import pLimit from 'p-limit';

        const pageSize = 20; // 每页大小,保持合理以防并发请求过多
        const totalPages = Math.max(1, Math.ceil(matched.length / pageSize));
        const currentPage = Math.max(1, Math.min(page, totalPages));
        const start = (currentPage - 1) * pageSize;
        const pageItems = matched.slice(start, start + pageSize);

        // 为当前页的包并发获取元数据(支持取消),限制最大并发数以防止被限流
        const list: PackagePickItem[] = [];
        const limit = pLimit(5); // 最大并发数可根据实际情况调整
        await Promise.all(pageItems.map((name) =>
            limit(async () => {
                // 如果外部触发了取消(CancellationToken),axiosCancelToken.token.reason 会被设置
                if (axiosCancelToken.token.reason) { return; }
                try {
                    const metaResp = await axios.get(`https://pypi.org/pypi/${encodeURIComponent(name)}/json`, {
                        cancelToken: axiosCancelToken.token,
                    });
                    const info = metaResp.data && metaResp.data.info ? metaResp.data.info : undefined;
                    const version = info?.version || '';
                    const summary = info?.summary || '';
                    list.push({
                } catch (err) {
                    // 错误处理逻辑(可选)
                }
            })
        ));

```

1. You may need to install `p-limit` if it's not already present: `npm install p-limit`.
2. If your project uses ES modules, ensure the import syntax matches your configuration.
3. If you want to customize the concurrency, adjust the value passed to `pLimit`.
</issue_to_address>

### Comment 3
<location> `src/modules/PackageManager.ts:379` </location>
<code_context>
+        const list: PackagePickItem[] = [];
+        await Promise.all(pageItems.map(async (name) => {
+            // 如果外部触发了取消(CancellationToken),axiosCancelToken.token.reason 会被设置
+            if (axiosCancelToken.token.reason) { return; }
+            try {
+                const metaResp = await axios.get(`https://pypi.org/pypi/${encodeURIComponent(name)}/json`, {
</code_context>

<issue_to_address>
**issue (bug_risk):** Checking axiosCancelToken.token.reason for cancellation may not be robust across axios versions.

This approach may break in future axios releases. Use axios's documented cancellation methods, such as catching axios.Cancel, for consistent behavior.
</issue_to_address>

### Comment 4
<location> `src/modules/PackageManager.ts:336-150` </location>
<code_context>
+                const info = metaResp.data && metaResp.data.info ? metaResp.data.info : undefined;
+                const version = info?.version || '';
+                const summary = info?.summary || '';
+                list.push({
+                    name,
+                    version,
+                    alwaysShow: true,
+                    label: name,
+                    description: `${version}`,
+                    detail: summary,
+                });
+            } catch (err) {
+                // 如果单个包的元数据请求出错,则退回到最小信息,避免整个搜索失败
</code_context>

<issue_to_address>
**suggestion:** Fallback to minimal info on metadata fetch failure is good, but error handling could be more granular.

Please handle cancellation errors separately from other errors to avoid adding incomplete items to the list when a request is cancelled.
</issue_to_address>

### Comment 5
<location> `src/modules/PackageManager.ts:328-333` </location>
<code_context>
            } else if (indexResp.data && typeof indexResp.data === 'object') {
                // 回退处理:尝试从 entries 字段中提取 name
                if (Array.isArray(indexResp.data.entries)) {
                    projects = indexResp.data.entries.map((p: any) => (p && p.name) || '').filter(Boolean);
                }
            }

</code_context>

<issue_to_address>
**suggestion (code-quality):** Merge nested if conditions ([`merge-nested-ifs`](https://docs.sourcery.ai/Reference/Rules-and-In-Line-Suggestions/TypeScript/Default-Rules/merge-nested-ifs))

```suggestion
            } else if (indexResp.data && typeof indexResp.data === 'object' && Array.isArray(indexResp.data.entries)) {
                         projects = indexResp.data.entries.map((p: any) => (p && p.name) || '').filter(Boolean);
                   }

```

<br/><details><summary>Explanation</summary>Reading deeply nested conditional code is confusing, since you have to keep track of which
conditions relate to which levels. We therefore strive to reduce nesting where
possible, and the situation where two `if` conditions can be combined using
`and` is an easy win.
</details>
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines 320 to 329
// indexResp.data 可能是 { projects: [...] } 或数组 - 两种格式都做兼容处理
let projects: string[] = [];
try {
if (Array.isArray(indexResp.data)) {
// 有些服务器可能直接返回包名字符串数组或对象数组
projects = indexResp.data.map((p: any) => (typeof p === 'string' ? p : p.name)).filter(Boolean);
} else if (Array.isArray(indexResp.data.projects)) {
projects = indexResp.data.projects.map((p: any) => (typeof p === 'string' ? p : p.name)).filter(Boolean);
} else if (indexResp.data && typeof indexResp.data === 'object') {
// 回退处理:尝试从 entries 字段中提取 name
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The compatibility logic for indexResp.data is broad and may mask unexpected API changes.

Consider adding a warning or log when indexResp.data does not match the expected format to help detect upstream API changes early.

Comment on lines 377 to 386
await Promise.all(pageItems.map(async (name) => {
// 如果外部触发了取消(CancellationToken),axiosCancelToken.token.reason 会被设置
if (axiosCancelToken.token.reason) { return; }
try {
const metaResp = await axios.get(`https://pypi.org/pypi/${encodeURIComponent(name)}/json`, {
cancelToken: axiosCancelToken.token,
});
const info = metaResp.data && metaResp.data.info ? metaResp.data.info : undefined;
const version = info?.version || '';
const summary = info?.summary || '';
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (performance): Unthrottled parallel requests for package metadata may hit rate limits or degrade performance.

Implement a concurrency limit or throttling to reduce the risk of rate limiting and performance issues, particularly if pageSize increases.

Suggested implementation:

import pLimit from 'p-limit';

        const pageSize = 20; // 每页大小,保持合理以防并发请求过多
        const totalPages = Math.max(1, Math.ceil(matched.length / pageSize));
        const currentPage = Math.max(1, Math.min(page, totalPages));
        const start = (currentPage - 1) * pageSize;
        const pageItems = matched.slice(start, start + pageSize);

        // 为当前页的包并发获取元数据(支持取消),限制最大并发数以防止被限流
        const list: PackagePickItem[] = [];
        const limit = pLimit(5); // 最大并发数可根据实际情况调整
        await Promise.all(pageItems.map((name) =>
            limit(async () => {
                // 如果外部触发了取消(CancellationToken),axiosCancelToken.token.reason 会被设置
                if (axiosCancelToken.token.reason) { return; }
                try {
                    const metaResp = await axios.get(`https://pypi.org/pypi/${encodeURIComponent(name)}/json`, {
                        cancelToken: axiosCancelToken.token,
                    });
                    const info = metaResp.data && metaResp.data.info ? metaResp.data.info : undefined;
                    const version = info?.version || '';
                    const summary = info?.summary || '';
                    list.push({
                } catch (err) {
                    // 错误处理逻辑(可选)
                }
            })
        ));
  1. You may need to install p-limit if it's not already present: npm install p-limit.
  2. If your project uses ES modules, ensure the import syntax matches your configuration.
  3. If you want to customize the concurrency, adjust the value passed to pLimit.

const list: PackagePickItem[] = [];
await Promise.all(pageItems.map(async (name) => {
// 如果外部触发了取消(CancellationToken),axiosCancelToken.token.reason 会被设置
if (axiosCancelToken.token.reason) { return; }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Checking axiosCancelToken.token.reason for cancellation may not be robust across axios versions.

This approach may break in future axios releases. Use axios's documented cancellation methods, such as catching axios.Cancel, for consistent behavior.

this.output.appendLine(data);
errMsg += data;
}
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Fallback to minimal info on metadata fetch failure is good, but error handling could be more granular.

Please handle cancellation errors separately from other errors to avoid adding incomplete items to the list when a request is cancelled.

@Grenz1inie
Copy link
Author

效果
Screenshot 2025-10-10 231257

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant