Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(middleware/etag): generate etag hash value from all chunks #3604

Merged
merged 4 commits into from
Nov 3, 2024

Conversation

usualoma
Copy link
Member

Fix #3536

There are several ways to generate digests from ReadableStream, but here we have used a method that includes the previous hash value for each read, so that unique digests are generated for the same continuous data.
Even if the overall data is the same, if the number of chunks read is different, a different digest will be generated, but since it is unlikely that the same architecture (server) will have different data divisions, I think it will work effectively in practice.
It uses only the web standard crypto.subtle, and is characterized by its low memory load even for large-sized files.

Other approaches

Self-implemented incremental SHA1 generator

We can write our own incremental sha1 generator in TypeScript, and using this it is possible to generate the correct sha1 for the entire data. However, I did not adopt this because an exact sha1 is not necessary for etags, and it is also slow.

incremental_sha1.ts
export class IncrementalSha1 {
  private state: number[]
  private buffer: number[]
  private byteCount: number

  constructor() {
    this.state = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0]
    this.buffer = []
    this.byteCount = 0
  }

  private static rotateLeft(n: number, s: number): number {
    return (n << s) | (n >>> (32 - s))
  }

  update(data: string | Uint8Array): void {
    const input = typeof data === 'string' ? new TextEncoder().encode(data) : data
    for (let i = 0; i < input.length; i++) {
      this.buffer.push(input[i])
      this.byteCount++
      if (this.buffer.length === 64) {
        this.processChunk()
        this.buffer = []
      }
    }
  }

  private processChunk(): void {
    const w = new Array(80)
    for (let i = 0; i < 16; i++) {
      w[i] =
        (this.buffer[i * 4] << 24) |
        (this.buffer[i * 4 + 1] << 16) |
        (this.buffer[i * 4 + 2] << 8) |
        this.buffer[i * 4 + 3]
    }

    for (let i = 16; i < 80; i++) {
      w[i] = IncrementalSha1.rotateLeft(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1)
    }

    let [a, b, c, d, e] = this.state

    for (let i = 0; i < 80; i++) {
      let f: number, k: number
      if (i < 20) {
        f = (b & c) | (~b & d)
        k = 0x5a827999
      } else if (i < 40) {
        f = b ^ c ^ d
        k = 0x6ed9eba1
      } else if (i < 60) {
        f = (b & c) | (b & d) | (c & d)
        k = 0x8f1bbcdc
      } else {
        f = b ^ c ^ d
        k = 0xca62c1d6
      }

      const temp = (IncrementalSha1.rotateLeft(a, 5) + f + e + k + w[i]) | 0
      e = d
      d = c
      c = IncrementalSha1.rotateLeft(b, 30)
      b = a
      a = temp
    }

    this.state[0] = (this.state[0] + a) | 0
    this.state[1] = (this.state[1] + b) | 0
    this.state[2] = (this.state[2] + c) | 0
    this.state[3] = (this.state[3] + d) | 0
    this.state[4] = (this.state[4] + e) | 0
  }

  digest(): string {
    const bitCount = this.byteCount * 8
    this.update(new Uint8Array([0x80]))

    while (this.buffer.length !== 56) {
      this.update(new Uint8Array([0x00]))
    }

    this.update(
      new Uint8Array([
        (bitCount / 0x100000000) >>> 24,
        (bitCount / 0x100000000) >>> 16,
        (bitCount / 0x100000000) >>> 8,
        (bitCount / 0x100000000) >>> 0,
        (bitCount & 0xffffffff) >>> 24,
        (bitCount & 0xffffffff) >>> 16,
        (bitCount & 0xffffffff) >>> 8,
        (bitCount & 0xffffffff) >>> 0,
      ])
    )

    const hash = this.state.map((n) => n.toString(16).padStart(8, '0')).join('')
    return hash
  }
}
benchmark
import { run, group, bench } from 'mitata'
import { IncrementalSha1 } from './etag/incremental_sha1'

bench('noop', () => {})

function createReadableStream(): ReadableStream<Uint8Array> {
  const chunkSize = 1024 * 1024 // 1MB
  const totalSize = 2 * 1024 * 1024 // 2MB
  let bytesGenerated = 0

  return new ReadableStream({
    pull(controller) {
      if (bytesGenerated >= totalSize) {
        controller.close()
        return
      }

      const remainingBytes = totalSize - bytesGenerated
      const currentChunkSize = Math.min(chunkSize, remainingBytes)
      const chunk = new Uint8Array(currentChunkSize)

      for (let i = 0; i < currentChunkSize; i++) {
        chunk[i] = Math.floor(Math.random() * 256)
      }

      controller.enqueue(chunk)
      bytesGenerated += currentChunkSize
    },
  })
}

function mergeBuffers(buffer1: ArrayBuffer | undefined, buffer2: Uint8Array): Uint8Array {
  if (!buffer1) {
    return buffer2
  }
  const merged = new Uint8Array(buffer1.byteLength + buffer2.byteLength)
  merged.set(new Uint8Array(buffer1), 0)
  merged.set(buffer2, buffer1.byteLength)
  return merged
}

group('etag', () => {
  bench('IncrementalSha1', async () => {
    const stream = createReadableStream()
    const reader = stream.getReader()
    const incrementalSha1 = new IncrementalSha1()

    for (;;) {
      const { done, value } = await reader.read()
      if (done) {
        break
      }
      incrementalSha1.update(value)
    }
    incrementalSha1.digest()
  })

  bench('crypto.subtle', async () => {
    const stream = createReadableStream()
    const reader = stream.getReader()

    let result: ArrayBuffer | undefined = undefined
    for (;;) {
      const { done, value } = await reader.read()
      if (done) {
        break
      }
      result = await crypto.subtle.digest(
        {
          name: 'SHA-1',
        },
        mergeBuffers(result, value)
      )
    }

    return result
      ? Array.prototype.map
          .call(new Uint8Array(result), (x) => x.toString(16).padStart(2, '0'))
          .join('')
      : null
  })
})

run()
cpu: Apple M2 Pro
runtime: node v22.2.0 (arm64-darwin)

benchmark            time (avg)             (min … max)       p75       p99      p999
------------------------------------------------------- -----------------------------
noop                 74 ps/iter        (21 ps … 145 ns)     82 ps    102 ps    265 ps !

• etag
------------------------------------------------------- -----------------------------
IncrementalSha1  46'900 µs/iter (46'731 µs … 47'504 µs) 47'026 µs 47'504 µs 47'504 µs
crypto.subtle    13'215 µs/iter (12'947 µs … 14'070 µs) 13'366 µs 14'070 µs 14'070 µs

summary for etag
  crypto.subtle
   3.55x faster than IncrementalSha1
cpu: Apple M2 Pro
runtime: bun 1.1.30 (arm64-darwin)

benchmark            time (avg)             (min … max)       p75       p99      p999
------------------------------------------------------- -----------------------------
noop                 46 ps/iter       (0 ps … 25.04 ns)     41 ps     82 ps    122 ps !

• etag
------------------------------------------------------- -----------------------------
IncrementalSha1  40'356 µs/iter (39'951 µs … 40'810 µs) 40'458 µs 40'810 µs 40'810 µs
crypto.subtle     3'776 µs/iter   (3'585 µs … 4'418 µs)  3'808 µs  4'377 µs  4'418 µs

summary for etag
  crypto.subtle
   10.69x faster than IncrementalSha1
cpu: Apple M2 Pro
runtime: deno 2.0.0 (aarch64-apple-darwin)

benchmark            time (avg)             (min … max)       p75       p99      p999
------------------------------------------------------- -----------------------------
noop                507 ps/iter        (427 ps … 40 ns)    509 ps    671 ps    773 ps

• etag
------------------------------------------------------- -----------------------------
IncrementalSha1  46'244 µs/iter (45'222 µs … 48'608 µs) 47'449 µs 48'608 µs 48'608 µs
crypto.subtle    14'164 µs/iter (14'024 µs … 14'505 µs) 14'188 µs 14'505 µs 14'505 µs

summary for etag
  crypto.subtle
   3.26x faster than IncrementalSha1

Using "node:crypto"

“node:crypto” can be used in runtimes other than node, so we can use it to incrementally add data and generate sha1.
https://nodejs.org/api/crypto.html

However, I didn't adopt it because I thought that ‘node:*’ should not be used where crypto.subtle is sufficient.

The author should do the following, if applicable

  • Add tests
  • Run tests
  • bun run format:fix && bun run lint:fix to format the code

Copy link

codecov bot commented Oct 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.80%. Comparing base (4dd8b2b) to head (d7083e0).
Report is 15 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3604      +/-   ##
==========================================
- Coverage   94.71%   89.80%   -4.92%     
==========================================
  Files         158      159       +1     
  Lines        9555    10120     +565     
  Branches     2813     2932     +119     
==========================================
+ Hits         9050     9088      +38     
- Misses        505     1032     +527     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -8,7 +8,7 @@ type Algorithm = {
alias: string
}

type Data = string | boolean | number | object | ArrayBufferView | ArrayBuffer | ReadableStream
type Data = string | boolean | number | object | ArrayBufferView | ArrayBuffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type Data = string | boolean | number | object | ArrayBufferView | ArrayBuffer
type Data = string | boolean | number | object

I think we can remove ArrayBuffer, ArrayBufferView, etc. since they are subtypes of object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out! I see.
It's true that it allows any object (including those that cannot be JSON.stringify ), not just ArrayBufferView or ArrayBuffer.

Semantically, I think the following changes are appropriate.

diff --git a/src/utils/crypto.ts b/src/utils/crypto.ts
index 9ffb8310..68fcf8ca 100644
--- a/src/utils/crypto.ts
+++ b/src/utils/crypto.ts
@@ -3,12 +3,14 @@
  * Crypto utility.
  */
 
+import type { JSONValue } from './types'
+
 type Algorithm = {
   name: string
   alias: string
 }
 
-type Data = string | boolean | number | object | ArrayBufferView | ArrayBuffer
+type Data = string | boolean | number | JSONValue | ArrayBufferView | ArrayBuffer
 
 export const sha256 = async (data: Data): Promise<string | null> => {
   const algorithm: Algorithm = { name: 'SHA-256', alias: 'sha256' }

However, there is a slight possibility (I think it's very slight) that it could affect compatibility, so I think it would be better to consider it in a separate pull request.

Copy link
Contributor

@EdamAme-x EdamAme-x Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll try it.

Co-authored-by: Yusuke Wada <yusuke@kamawada.com>
@yusukebe
Copy link
Member

yusukebe commented Nov 3, 2024

@usualoma

Looks good to me! I'll merge this. Thanks!

@yusukebe yusukebe merged commit 22db73f into honojs:main Nov 3, 2024
16 checks passed
TinsFox pushed a commit to TinsFox/hono that referenced this pull request Nov 11, 2024
…s#3604)

* fix(middleware/etag): generate etag hash value from all chunks

* refactor(middleware/etag): returns immediately if digest cannot be generated.

* refactor(utils/crypto): remove ReadableStream support

* test(middleware/etag): add tests for empty stream and null body

Co-authored-by: Yusuke Wada <yusuke@kamawada.com>

---------

Co-authored-by: Yusuke Wada <yusuke@kamawada.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Etag middleware generating the same weak etag for different body content.
3 participants