fetchHTML API parameter can be Object type
coder-hxl committed Jan 31, 2023
1 parent 53bf5d5 commit 6e2947d
Showing 8 changed files with 36 additions and 17 deletions.
24 changes: 18 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

English | <a href="#cn" style="text-decoration: none">简体中文</a>

XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch fetch HTML, JSON, images, etc.
XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.

## Install

@@ -47,7 +47,7 @@ class XCrawl {
constructor(baseConfig?: IXCrawlBaseConifg)
fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
fetchHTML(url: string): Promise<JSDOM>
fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
}
```
@@ -130,7 +130,7 @@ fetchHTML is the method of the above <a href="#myXCrawl" style="text-decoration
- Type
```ts
function fetchHTML(url: string): Promise<JSDOM>
function fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
```

- Example
@@ -237,6 +237,12 @@ interface IFetchFileConfig extends IFetchBaseConifg {
}
```
- IFetchHTMLConfig
```ts
interface IFetchHTMLConfig extends IRequestConfig {}
```
## More
If you have any **questions** or **needs** , please submit **Issues in** https://github.com/coder-hxl/x-crawl/issues .
@@ -249,7 +255,7 @@ If you have any **questions** or **needs** , please submit **Issues in** https:/
<a href="#en" style="text-decoration: none">English</a> | 简体中文
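XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch fetch HTML, JSON, images, etc.
XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.
## Install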
@@ -294,7 +300,7 @@ class XCrawl {
constructor(baseConfig?: IXCrawlBaseConifg)
fetchData<T = any>(config: IFetchDataConfig): Promise<IFetchCommon<T>>
fetchFile(config: IFetchFileConfig): Promise<IFetchCommon<IFileInfo>>
fetchHTML(url: string): Promise<JSDOM>
fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
}
```

@@ -377,7 +383,7 @@ fetchHTML is the method of the above <a href="#cn-myXCrawl" style="text-decoration: none">myXCra
- Type

```ts
function fetchHTML(url: string): Promise<JSDOM>
function fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM>
```

- Example
@@ -484,6 +490,12 @@ interface IFetchFileConfig extends IFetchBaseConifg {
}
```

- IFetchHTMLConfig

```ts
interface IFetchHTMLConfig extends IRequestConfig {}
```

## More

If you have any **questions** or **needs**, please submit **Issues** at https://github.com/coder-hxl/x-crawl/issues .
4 changes: 2 additions & 2 deletions package.json
@@ -1,9 +1,9 @@
{
"private": true,
"name": "x-crawl",
"version": "0.1.0",
"version": "0.1.1",
"author": "CoderHxl",
"description": "XCrawl is a Nodejs multifunctional crawler library.",
"description": "XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.",
"license": "MIT",
"main": "src/index.ts",
"scripts": {
4 changes: 2 additions & 2 deletions publish/README.md
@@ -2,7 +2,7 @@

English | <a href="#cn" style="text-decoration: none">简体中文</a>

XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch fetch HTML, JSON, images, etc.
XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.

## Install

@@ -249,7 +249,7 @@ If you have any **questions** or **needs** , please submit **Issues in** https:/
<a href="#en" style="text-decoration: none">English</a> | 简体中文
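XCrawl is a Nodejs multifunctional crawler library. Provide configuration to batch fetch HTML, JSON, images, etc.
XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.
## Install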
4 changes: 2 additions & 2 deletions publish/package.json
@@ -1,8 +1,8 @@
{
"name": "x-crawl",
"version": "0.1.0",
"version": "0.1.1",
"author": "CoderHxl",
"description": "XCrawl is a Nodejs multifunctional crawler library.",
"description": "XCrawl is a Nodejs multifunctional crawler library. Crawl HTML, JSON, file resources, etc. through simple configuration.",
"license": "MIT",
"keywords": [
"nodejs",
11 changes: 8 additions & 3 deletions src/index.ts
@@ -3,12 +3,13 @@ import path from 'node:path'
import { JSDOM } from 'jsdom'

import { batchRequest, request } from './request'
import { isArray, isUndefined } from './utils'
import { isArray, isString, isUndefined } from './utils'

import {
IXCrawlBaseConifg,
IFetchDataConfig,
IFetchFileConfig,
IFetchHTMLConfig,
IFetchBaseConifg,
IFileInfo,
IFetchCommon,
@@ -145,9 +146,13 @@ export default class XCrawl {
})
}

async fetchHTML(url: string): Promise<JSDOM> {
async fetchHTML(config: string | IFetchHTMLConfig): Promise<JSDOM> {
const rawRequestConifg: IFetchHTMLConfig = isString(config)
? { url: config }
: config

const { requestConifg } = mergeConfig(this.baseConfig, {
requestConifg: { url }
requestConifg: rawRequestConifg
})

const requestResItem = await request(requestConifg)
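
The core of this change is normalizing a `string | IFetchHTMLConfig` argument into a single config object before merging it with the base config. A minimal standalone sketch of that pattern follows. The `IRequestConfig` fields other than `url`, the local `isString` guard, and the `normalizeFetchHTMLConfig` name are assumptions made for illustration; they are not taken from the library source.

```typescript
// Hypothetical stand-in for the library's IRequestConfig.
// Only `url` is known from the diff; the other fields are assumed.
interface IRequestConfig {
  url: string
  method?: string
  headers?: Record<string, string>
  timeout?: number
}

interface IFetchHTMLConfig extends IRequestConfig {}

// Local type guard, written here so the sketch is self-contained.
// The library imports its own `isString` from './utils'.
function isString(value: unknown): value is string {
  return typeof value === 'string'
}

// Hypothetical helper name: turns either call form into one config object.
function normalizeFetchHTMLConfig(
  config: string | IFetchHTMLConfig
): IFetchHTMLConfig {
  // A bare string is treated as the URL; an object is passed through unchanged.
  return isString(config) ? { url: config } : config
}

const fromString = normalizeFetchHTMLConfig('https://example.com/')
const fromObject = normalizeFetchHTMLConfig({
  url: 'https://example.com/',
  timeout: 5000
})

// Both call forms end up with the same `url` property.
console.log(fromString.url === fromObject.url)
```

Because the type guard narrows `config` to `string` in the first branch, the second branch is typed as `IFetchHTMLConfig` without a cast, which keeps existing string-based callers working while allowing per-request options.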
2 changes: 2 additions & 0 deletions src/types.ts
@@ -74,6 +74,8 @@ export interface IFetchFileConfig extends IFetchBaseConifg {
}
}

export interface IFetchHTMLConfig extends IRequestConfig {}

export interface IFileInfo {
fileName: string
mimeType: string
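
The new `IFetchHTMLConfig` adds no members of its own, so under TypeScript's structural typing it is interchangeable with `IRequestConfig`. The sketch below illustrates this, assuming a simplified `IRequestConfig` with hypothetical `url` and `timeout` fields; the real interface in `src/types.ts` may differ.

```typescript
// Hypothetical, simplified IRequestConfig; the real fields are not shown in this diff.
interface IRequestConfig {
  url: string
  timeout?: number
}

// Same shape as the declaration added in this commit.
interface IFetchHTMLConfig extends IRequestConfig {}

const base: IRequestConfig = { url: 'https://example.com/', timeout: 3000 }

// Assignable in both directions because the two types have identical members.
const htmlConfig: IFetchHTMLConfig = base
const roundTrip: IRequestConfig = htmlConfig

console.log(roundTrip === base)
```

One plausible reason for a dedicated name is that HTML-specific options could be added to `IFetchHTMLConfig` later without changing the `fetchHTML` signature.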
2 changes: 1 addition & 1 deletion test/start/index.js

2 changes: 1 addition & 1 deletion test/start/index.ts
@@ -22,7 +22,7 @@ const testXCrawl = new XCrawl({
// console.log(res)
// })

testXCrawl.fetchHTML('https://www.bilibili.com/').then((jsdom) => {
testXCrawl.fetchHTML({ url: 'https://www.bilibili.com/' }).then((jsdom) => {
const document = jsdom.window.document
const imgBoxEl = document.querySelectorAll('.bili-video-card__cover')
