feat: add oEmbed support (misskey-dev#6)
* feat: add oEmbed support

* more safelisted features

* fix the syntax

* Update README.md

* permissions

* names

* use player

* fix type error

* support width (for size ratio)

* test for type: video

* nullable width

* restore max height test

* ignored permissions

* restore autoplay

* Use WHATWG URL

---------

Co-authored-by: tamaina <tamaina@hotmail.co.jp>
saschanaz and tamaina authored Mar 13, 2023
1 parent 51f3870 commit eab3766
Showing 46 changed files with 3,936 additions and 112 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,7 @@
Unreleased
------------------
* Limited support for oEmbed type=rich

3.0.4 / 2023-02-12
------------------
* Remove unnecessary dependencies
44 changes: 28 additions & 16 deletions README.md
@@ -21,7 +21,7 @@ import { summaly } from 'summaly';
summaly(url[, opts])
```

As Fastify plugin:
As Fastify plugin:
(will listen `GET` of `/`)

```javascript
@@ -60,27 +60,39 @@ interface IPlugin {

A Promise of an Object that contains properties below:

※ Almost all values are nullable. player shoud not be null.
※ Almost all values are nullable. player should not be null.

#### Root

| Property | Type | Description |
| :-------------- | :------- | :--------------------------------------- |
| **description** | *string* | The description of the web page |
| **icon** | *string* | The url of the icon of the web page |
| **sitename** | *string* | The name of the web site |
| **thumbnail** | *string* | The url of the thumbnail of the web page |
| **player** | *Player* | The player of the web page |
| **title** | *string* | The title of the web page |
| **url** | *string* | The url of the web page |
| Property | Type | Description |
| :-------------- | :------- | :------------------------------------------ |
| **description** | *string* | The description of the web page |
| **icon** | *string* | The url of the icon of the web page |
| **sitename** | *string* | The name of the web site |
| **thumbnail** | *string* | The url of the thumbnail of the web page |
| **oEmbed** | *OEmbedRichIframe* | The oEmbed rich iframe info of the web page |
| **player** | *Player* | The player of the web page |
| **title** | *string* | The title of the web page |
| **url** | *string* | The url of the web page |

#### Player

| Property | Type | Description |
| :-------------- | :------- | :--------------------------------------- |
| **url** | *string* | The url of the player |
| **width** | *number* | The width of the player |
| **height** | *number* | The height of the player |
| Property | Type | Description |
| :-------------- | :--------- | :---------------------------------------------- |
| **url** | *string* | The url of the player |
| **width** | *number* | The width of the player |
| **height** | *number* | The height of the player |
| **allow** | *string[]* | The names of the allowed permissions for iframe |

Currently the possible items in `allow` are:

* `autoplay`
* `clipboard-write`
* `fullscreen`
* `encrypted-media`
* `picture-in-picture`

See [Permissions Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/Permissions_Policy) in MDN for details of them.
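
As a rough illustration of how a consumer might use the `allow` list, the sketch below turns a `Player` object into attributes for an `<iframe>`. The helper name `toIframeAttributes` and the `100%` fallback for a null width are assumptions made for this example; they are not part of summaly.

```javascript
// Hypothetical helper (not part of summaly): map a Player object to
// attribute values suitable for an <iframe> element.
function toIframeAttributes(player) {
  if (!player || !player.url) {
    return null; // nothing to embed
  }
  return {
    src: player.url,
    // The iframe `allow` attribute separates directives with semicolons.
    allow: player.allow.join('; '),
    // Assumption: when width is null, let the iframe fill its container.
    width: player.width === null ? '100%' : String(player.width),
    height: player.height === null ? null : String(player.height),
  };
}

const attrs = toIframeAttributes({
  url: 'https://example.com/embed/1',
  width: null,
  height: 152,
  allow: ['autoplay', 'fullscreen'],
});
console.log(attrs);
```

Joining the array on the consumer side keeps the summary format simple, since each entry is a bare permission name.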

### Example

6 changes: 3 additions & 3 deletions built/general.d.ts
@@ -1,4 +1,4 @@
import * as URL from 'node:url';
import Summary from './summary.js';
declare const _default: (url: URL.Url, lang?: string | null) => Promise<Summary | null>;
import { URL } from 'node:url';
import type { default as Summary } from './summary.js';
declare const _default: (_url: URL | string, lang?: string | null) => Promise<Summary | null>;
export default _default;
142 changes: 116 additions & 26 deletions built/general.js
@@ -1,11 +1,109 @@
import * as URL from 'node:url';
import { URL } from 'node:url';
import clip from './utils/clip.js';
import cleanupTitle from './utils/cleanup-title.js';
import { decode as decodeHtml } from 'html-entities';
import { head, scpaping } from './utils/got.js';
export default async (url, lang = null) => {
import { get, head, scpaping } from './utils/got.js';
import * as cheerio from 'cheerio';
/**
* Contains only the html snippet for a sanitized iframe as the thumbnail is
* mostly covered in OpenGraph instead.
*
* Width should always be 100%.
*/
async function getOEmbedPlayer($, pageUrl) {
const href = $('link[type="application/json+oembed"]').attr('href');
if (!href) {
return null;
}
const oEmbed = await get((new URL(href, pageUrl)).href);
const body = (() => {
try {
return JSON.parse(oEmbed);
}
catch { }
})();
if (!body || body.version !== '1.0' || !['rich', 'video'].includes(body.type)) {
// Not a well formed rich oEmbed
return null;
}
if (!body.html.startsWith('<iframe ') || !body.html.endsWith('</iframe>')) {
// It includes something else than an iframe
return null;
}
const oEmbedHtml = cheerio.load(body.html);
const iframe = oEmbedHtml("iframe");
if (iframe.length !== 1) {
// Somehow we either have multiple iframes or none
return null;
}
if (iframe.parents().length !== 2) {
// Should only have the body and html elements as the parents
return null;
}
const url = iframe.attr('src');
if (!url) {
// No src?
return null;
}
try {
if ((new URL(url)).protocol !== 'https:') {
// Allow only HTTPS for best security
return null;
}
}
catch (e) {
return null;
}
// Height is the most important, width is okay to be null. The implementer
// should choose fixed height instead of fixed aspect ratio if width is null.
//
// For example, Spotify's embed page does not strictly follow aspect ratio
// and thus keeping the height is better than keeping the aspect ratio.
//
// Spotify gives `width: 100%, height: 152px` for iframe while `width: 456,
// height: 152` for oEmbed data, and we treat any percentages as null here.
let width = Number(iframe.attr('width') ?? body.width);
if (Number.isNaN(width)) {
width = null;
}
const height = Math.min(Number(iframe.attr('height') ?? body.height), 1024);
if (Number.isNaN(height)) {
// No proper height info
return null;
}
// TODO: This implementation only allows basic syntax of `allow`.
// Might need to implement better later.
const safeList = [
'autoplay',
'clipboard-write',
'fullscreen',
'encrypted-media',
'picture-in-picture',
'web-share',
];
// YouTube has these but they are almost never used.
const ignoredList = [
'gyroscope',
'accelerometer',
];
const allowedPermissions = (iframe.attr('allow') ?? '').split(/\s*;\s*/g)
.filter(s => s)
.filter(s => !ignoredList.includes(s));
if (allowedPermissions.some(allow => !safeList.includes(allow))) {
// This iframe is probably too powerful to be embedded
return null;
}
return {
url,
width,
height,
allow: allowedPermissions
};
}
export default async (_url, lang = null) => {
if (lang && !lang.match(/^[\w-]+(\s*,\s*[\w-]+)*$/))
lang = null;
const url = typeof _url === 'string' ? new URL(_url) : _url;
const res = await scpaping(url.href, { lang: lang || undefined });
const $ = res.$;
const twitterCard = $('meta[property="twitter:card"]').attr('content');
@@ -21,7 +119,7 @@ export default async (url, lang = null) => {
$('link[rel="image_src"]').attr('href') ||
$('link[rel="apple-touch-icon"]').attr('href') ||
$('link[rel="apple-touch-icon image_src"]').attr('href');
image = image ? URL.resolve(url.href, image) : null;
image = image ? (new URL(image, url.href)).href : null;
const playerUrl = (twitterCard !== 'summary_large_image' && $('meta[property="twitter:player"]').attr('content')) ||
(twitterCard !== 'summary_large_image' && $('meta[name="twitter:player"]').attr('content')) ||
$('meta[property="og:video"]').attr('content') ||
@@ -44,53 +142,45 @@ export default async (url, lang = null) => {
if (title === description) {
description = null;
}
let siteName = $('meta[property="og:site_name"]').attr('content') ||
let siteName = decodeHtml($('meta[property="og:site_name"]').attr('content') ||
$('meta[name="application-name"]').attr('content') ||
url.hostname;
siteName = siteName ? decodeHtml(siteName) : null;
url.hostname);
const favicon = $('link[rel="shortcut icon"]').attr('href') ||
$('link[rel="icon"]').attr('href') ||
'/favicon.ico';
const sensitive = $('.tweet').attr('data-possibly-sensitive') === 'true';
const find = async (path) => {
const target = URL.resolve(url.href, path);
const target = new URL(path, url.href);
try {
await head(target);
await head(target.href);
return target;
}
catch (e) {
return null;
}
};
// Convert a relative URL (e.g. test) to an absolute one (e.g. /test)
const toAbsolute = (relativeURLString) => {
const relativeURL = URL.parse(relativeURLString);
const isAbsolute = relativeURL.slashes || relativeURL.path !== null && relativeURL.path[0] === '/';
// If it is already absolute, return the value immediately
if (isAbsolute) {
return relativeURLString;
}
// Prepend a slash and return
return '/' + relativeURLString;
const getIcon = async () => {
return (await find(favicon)) || null;
};
const icon = await find(favicon) ||
// Convert the relative path to an absolute one and retry
await find(toAbsolute(favicon)) ||
null;
const [icon, oEmbed] = await Promise.all([
getIcon(),
getOEmbedPlayer($, url.href),
]);
// Clean up the title
title = cleanupTitle(title, siteName);
if (title === '') {
title = siteName;
}
return {
title: title || null,
icon: icon || null,
icon: icon?.href || null,
description: description || null,
thumbnail: image || null,
player: {
player: oEmbed ?? {
url: playerUrl || null,
width: Number.isNaN(playerWidth) ? null : playerWidth,
height: Number.isNaN(playerHeight) ? null : playerHeight
height: Number.isNaN(playerHeight) ? null : playerHeight,
allow: ['autoplay', 'encrypted-media', 'fullscreen'],
},
sitename: siteName || null,
sensitive,
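
The width and height handling in `getOEmbedPlayer` above can be looked at in isolation. The sketch below copies that logic into a standalone function; the name `normalizeSize` and its parameter layout are assumptions made for illustration, while the 1024 cap and the NaN checks come from the diff.

```javascript
const MAX_HEIGHT = 1024; // cap used in the diff above

// Hypothetical standalone version of the size logic in getOEmbedPlayer.
// attrWidth / attrHeight are the iframe attribute strings (or undefined),
// body is the parsed oEmbed JSON.
function normalizeSize(attrWidth, attrHeight, body) {
  // Percentages such as '100%' become NaN and are treated as "no width".
  let width = Number(attrWidth ?? body.width);
  if (Number.isNaN(width)) {
    width = null;
  }
  // Math.min(NaN, 1024) is NaN, so a missing height is still detected.
  const height = Math.min(Number(attrHeight ?? body.height), MAX_HEIGHT);
  if (Number.isNaN(height)) {
    return null; // no usable height, so no player
  }
  return { width, height };
}

// Spotify-like case from the comment in the diff: percentage width on the
// iframe, numeric values in the oEmbed body.
console.log(normalizeSize('100%', '152', { width: 456, height: 152 }));
// Oversized height is clamped to the cap.
console.log(normalizeSize('560', '2000', {}));
// No height anywhere means the embed is rejected.
console.log(normalizeSize(undefined, undefined, {}));
```

Note that the iframe attribute takes precedence over the oEmbed body, so a percentage width yields `null` even when the body carries a numeric width.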
4 changes: 2 additions & 2 deletions built/index.js
@@ -2,7 +2,7 @@
* summaly
* https://github.com/syuilo/summaly
*/
import * as URL from 'node:url';
import { URL } from 'node:url';
import tracer from 'trace-redirect';
import general from './general.js';
import { setAgent } from './utils/got.js';
@@ -30,7 +30,7 @@ export const summaly = async (url, options) => {
actualUrl = url;
}
}
const _url = URL.parse(actualUrl, true);
const _url = new URL(actualUrl);
// Find matching plugin
const match = plugins.filter(plugin => plugin.test(_url))[0];
// Get summary
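
The commit replaces the legacy `URL.resolve` and `URL.parse` calls with the WHATWG `URL` class. The sketch below shows, under example URLs chosen only for illustration, how `new URL(input, base)` resolves relative references, and how the constructor throws on input it cannot parse. The wrapper `parseOrNull` is a hypothetical helper, not something in summaly.

```javascript
// `URL` is available as a global in current Node.js, so no import is
// needed for this sketch.
const pageUrl = new URL('https://example.com/articles/post.html');

// Root-relative, path-relative and absolute inputs all resolve against
// the base URL given as the second argument.
const icon = new URL('/favicon.ico', pageUrl.href).href;
const thumb = new URL('img/thumb.png', pageUrl.href).href;
const cdn = new URL('https://cdn.example.net/a.png', pageUrl.href).href;

// Hypothetical wrapper: the WHATWG constructor throws a TypeError on
// unparsable input, so callers may want to catch it.
function parseOrNull(input) {
  try {
    return new URL(input);
  } catch {
    return null;
  }
}

console.log(icon, thumb, cdn, parseOrNull('not a url'));
```

Because the constructor throws, `new URL(actualUrl)` in `built/index.js` will reject malformed input rather than return a partially filled object.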
6 changes: 3 additions & 3 deletions built/iplugin.d.ts
@@ -1,7 +1,7 @@
/// <reference types="node" />
import * as URL from 'node:url';
import type { URL } from 'node:url';
import Summary from './summary.js';
export interface IPlugin {
test: (url: URL.Url) => boolean;
summarize: (url: URL.Url, lang?: string) => Promise<Summary>;
test: (url: URL) => boolean;
summarize: (url: URL, lang?: string) => Promise<Summary>;
}
6 changes: 3 additions & 3 deletions built/plugins/amazon.d.ts
@@ -1,5 +1,5 @@
/// <reference types="node" />
import * as URL from 'node:url';
import { URL } from 'node:url';
import summary from '../summary.js';
export declare function test(url: URL.Url): boolean;
export declare function summarize(url: URL.Url): Promise<summary>;
export declare function test(url: URL): boolean;
export declare function summarize(url: URL): Promise<summary>;
5 changes: 3 additions & 2 deletions built/plugins/amazon.js
@@ -36,8 +36,9 @@ export async function summarize(url) {
player: {
url: playerUrl || null,
width: playerWidth ? parseInt(playerWidth) : null,
height: playerHeight ? parseInt(playerHeight) : null
height: playerHeight ? parseInt(playerHeight) : null,
allow: playerUrl ? ['fullscreen', 'encrypted-media'] : [],
},
sitename: 'Amazon'
sitename: 'Amazon',
};
}
6 changes: 3 additions & 3 deletions built/plugins/wikipedia.d.ts
@@ -1,5 +1,5 @@
/// <reference types="node" />
import * as URL from 'node:url';
import { URL } from 'node:url';
import summary from '../summary.js';
export declare function test(url: URL.Url): boolean;
export declare function summarize(url: URL.Url): Promise<summary>;
export declare function test(url: URL): boolean;
export declare function summarize(url: URL): Promise<summary>;
5 changes: 3 additions & 2 deletions built/plugins/wikipedia.js
@@ -29,8 +29,9 @@ export async function summarize(url) {
player: {
url: null,
width: null,
height: null
height: null,
allow: [],
},
sitename: 'Wikipedia'
sitename: 'Wikipedia',
};
}
4 changes: 4 additions & 0 deletions built/summary.d.ts
@@ -42,4 +42,8 @@ export declare type Player = {
* The height of the player
*/
height: number | null;
/**
* The allowed permissions of the iframe
*/
allow: string[];
};
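
The `allow` field added to `Player` is filled in `built/general.js` by filtering the iframe's `allow` attribute against a safelist. The sketch below isolates that step; the function name `filterAllow` is an assumption for illustration, while the two lists and the split pattern are copied from the diff. The safelist in the code also contains `web-share`, which the README list does not mention.

```javascript
// Lists copied from built/general.js in this commit.
const safeList = [
  'autoplay',
  'clipboard-write',
  'fullscreen',
  'encrypted-media',
  'picture-in-picture',
  'web-share',
];
const ignoredList = ['gyroscope', 'accelerometer'];

// Hypothetical standalone version of the permission check. Returns the
// kept permission names, or null when the iframe asks for anything
// outside the safelist.
function filterAllow(allowAttr) {
  const permissions = (allowAttr ?? '').split(/\s*;\s*/g)
    .filter(s => s) // drop empty segments
    .filter(s => !ignoredList.includes(s)); // silently drop ignored ones
  if (permissions.some(p => !safeList.includes(p))) {
    return null; // too powerful to embed
  }
  return permissions;
}

console.log(filterAllow('accelerometer; autoplay; encrypted-media; gyroscope'));
console.log(filterAllow('autoplay; camera'));
console.log(filterAllow(undefined));
```

As the TODO in the diff says, only the basic syntax is handled: a directive with an origin list, such as `autoplay *`, is not an exact safelist match and makes the whole embed be rejected.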
5 changes: 3 additions & 2 deletions built/utils/got.js
@@ -84,14 +84,15 @@ async function getResponse(args) {
limit: 0,
},
});
return await receiveResponce({ req, typeFilter: args.typeFilter });
return await receiveResponse({ req, typeFilter: args.typeFilter });
}
async function receiveResponce(args) {
async function receiveResponse(args) {
const req = args.req;
const maxSize = MAX_RESPONSE_SIZE;
req.on('response', (res) => {
// Check html
if (args.typeFilter && !res.headers['content-type']?.match(args.typeFilter)) {
// console.warn(res.headers['content-type']);
req.cancel(`Rejected by type filter ${res.headers['content-type']}`);
return;
}