crop text in middle instead of decimate it #809

wassname · 2024-10-29T05:27:40Z

A few problems:

for cropping we were discarding every N sentence, this creates garbage text
we were calculating the max tokens wrong (or at least the logic didn't make sense to me)

So I've

change the cropText for the whole extension to remove the excess text from the middle and replace it with a warning "this text has had X% removed"
use \n instead of , for more natural text
change the way we count the token limit to help us use all of it

PeterDaveHello · 2025-01-31T16:03:55Z

src/utils/crop-text.mjs

  } else {
-    maxLength -= 100 + clamp(userConfig.maxResponseTokenLength, 1, maxLength - 1000)
+    // if we don't have the models exact content limit use the default


So maybe the entire else block should be removed?

Copilot

Pull Request Overview

This PR updates the text cropping functionality to remove excess text from the middle rather than discarding every N sentence, while also adjusting token limit calculations and separator characters for a more natural output. Key changes include:

Modifying cropText to split text by an expanded set of delimiters and crop from the middle with an appended disclaimer.
Updating the token limit logic and default configurations for extended model contexts.
Adjusting the YouTube site adapter to use newline as a separator and refining the summary prompt.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
src/utils/crop-text.mjs	Revised crop text function: new cropping algorithm, delimiter update, and token limit adjustments.
src/content-script/site-adapters/youtube/index.mjs	Updated subtitle extraction and summary prompt adjustments.
src/config/index.mjs	Increased default maxResponseTokenLength to accommodate longer texts.

Copilot · 2025-05-31T20:19:26Z

src/utils/crop-text.mjs

+  croppedText = splits.slice(0, middleIndex).join('\n')
+  if (croppedTokens > 0) {
+    croppedText += `\n\n**Important disclaimer**, this text is incomplete! ${croppedTokens} or ${
+      (croppedTokens / totalTokens).toFixed(2) * 100


[nitpick] Consider explicitly converting the result of toFixed to a number (e.g., using Number(...)) to avoid any unexpected type coercion issues.

Suggested change

(croppedTokens / totalTokens).toFixed(2) * 100

Number((croppedTokens / totalTokens).toFixed(2)) * 100

Copilot · 2025-05-31T20:19:27Z

src/utils/crop-text.mjs

+  let firstHalfTokens = 0
+  let secondHalfTokens = 0
+  const halfTargetTokens = Math.floor(cropTargetLength / 2)
+  let middleIndex = -1


If the input text is short and the first pass never exceeds the halfTargetTokens, middleIndex will remain -1, which may lead to unexpected slicing. Consider adding a condition to bypass cropping when the text length is within limits.

PeterDaveHello · 2025-06-02T18:09:40Z

/review

qodo-merge-pro · 2025-06-02T18:10:00Z

Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶 800 - Partially compliant Compliant requirements: • Fix the "decimation" approach to text cropping which creates garbage text Non-compliant requirements: • Fix the issue where the extension uses subtitles instead of transcript on YouTube • Ensure the extension can handle uncommon words that are missed in subtitles Requires further human verification: • Improve the quality of text used for summarization - The PR changes the cropping method but still uses subtitles, so verification is needed to see if the summary quality has improved despite still using subtitles
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Percentage Calculation The percentage calculation for the removed tokens might be incorrect. The expression `(croppedTokens / totalTokens).toFixed(2) * 100` first converts to a string with 2 decimal places, then multiplies by 100, which could lead to unexpected results. croppedText += `\n\nImportant disclaimer, this text is incomplete! ${croppedTokens} or ${ (croppedTokens / totalTokens).toFixed(2) * 100 }% of tokens have been removed from this location in the text due to lack limited model context of ${maxLength}\n\n` Incomplete Implementation The PR addresses the cropping method but doesn't solve the core issue from the ticket - it still uses subtitles instead of transcript data. The PR improves how text is cropped but doesn't change the data source. const splits = text.split(/[,，。?？!！;；\n]/).map((s) => s.trim()) const splitsLength = splits.map((s) => (tiktoken ? encode(s).length : s.length)) const cropTargetLength = maxLength - startLength - endLength let firstHalfTokens = 0 let secondHalfTokens = 0 const halfTargetTokens = Math.floor(cropTargetLength / 2) let middleIndex = -1 let endStartIndex = splits.length let totalTokens = splitsLength.reduce((sum, length) => sum + length + 1, 0) let croppedTokens = 0 let croppedText = '' let currentLength = 0 // First pass: find the middle for (let i = 0; i < splits.length; i++) { if (firstHalfTokens < halfTargetTokens) { firstHalfTokens += splitsLength[i] + 1 } else { middleIndex = i break } } // Second pass: find the start of the end section for (let i = splits.length - 1; i >= middleIndex; i--) { secondHalfTokens += splitsLength[i] + 1 if (secondHalfTokens >= halfTargetTokens) { endStartIndex = i break } } // Calculate cropped tokens croppedTokens = totalTokens - firstHalfTokens - secondHalfTokens // Construct the cropped text croppedText = splits.slice(0, middleIndex).join('\n') if (croppedTokens > 0) { croppedText += `\n\nImportant disclaimer, this text is incomplete! ${croppedTokens} or ${ (croppedTokens / totalTokens).toFixed(2) * 100 }% of tokens have been removed from this location in the text due to lack limited model context of ${maxLength}\n\n` } croppedText += splits.slice(endStartIndex).join('\n')

PeterDaveHello · 2025-06-02T18:10:25Z

/improve

qodo-merge-pro · 2025-06-02T18:10:55Z

Qodo Merge was enabled for this repository. To continue using it, please link your Git account with your Qodo account here.

PR Code Suggestions ✨

Category	Suggestion	Impact
Possible issue	Fix percentage calculation and grammar The disclaimer message has a grammatical error with "due to lack limited model context". Also, the percentage calculation is incorrect as `.toFixed(2)` returns a string, so multiplying by 100 doesn't work as expected. src/utils/crop-text.mjs [101-103] croppedText += `\n\nImportant disclaimer, this text is incomplete! ${croppedTokens} or ${ - (croppedTokens / totalTokens).toFixed(2) * 100 -}% of tokens have been removed from this location in the text due to lack limited model context of ${maxLength}\n\n` + ((croppedTokens / totalTokens) * 100).toFixed(2) +}% of tokens have been removed from this location in the text due to limited model context of ${maxLength}\n\n` Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies two real issues: a grammatical error ("lack limited" should be "limited") and a mathematical bug where `.toFixed(2)` returns a string that shouldn't be multiplied by 100. Both fixes improve code correctness and user experience.	Medium
More

wassname added 2 commits October 29, 2024 12:55

crop transcript in middle instead of decimate it, use \n instead of ,

9554492

better max Length

5a69226

wassname changed the title ~~crop transcript in middle instead of decimate it~~ crop text in middle instead of decimate it Oct 29, 2024

PeterDaveHello reviewed Jan 31, 2025

View reviewed changes

wip

b5d7247

PeterDaveHello requested a review from Copilot May 31, 2025 20:18

Copilot AI reviewed May 31, 2025

View reviewed changes

qodo-merge-pro bot added the Review effort 2/5 label Jun 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

crop text in middle instead of decimate it #809

crop text in middle instead of decimate it #809

Uh oh!

wassname commented Oct 29, 2024 •

edited

Loading

Uh oh!

PeterDaveHello Jan 31, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI May 31, 2025

Uh oh!

Copilot AI May 31, 2025

Uh oh!

PeterDaveHello commented Jun 2, 2025

Uh oh!

qodo-merge-pro bot commented Jun 2, 2025

Uh oh!

PeterDaveHello commented Jun 2, 2025

Uh oh!

qodo-merge-pro bot commented Jun 2, 2025

Uh oh!

Uh oh!

	(croppedTokens / totalTokens).toFixed(2) * 100
	Number((croppedTokens / totalTokens).toFixed(2)) * 100

crop text in middle instead of decimate it #809

Are you sure you want to change the base?

crop text in middle instead of decimate it #809

Uh oh!

Conversation

wassname commented Oct 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

PeterDaveHello Jan 31, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI May 31, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI May 31, 2025

Choose a reason for hiding this comment

Uh oh!

PeterDaveHello commented Jun 2, 2025

Uh oh!

qodo-merge-pro bot commented Jun 2, 2025

PR Reviewer Guide 🔍

Uh oh!

PeterDaveHello commented Jun 2, 2025

Uh oh!

qodo-merge-pro bot commented Jun 2, 2025

PR Code Suggestions ✨

Uh oh!

Uh oh!

wassname commented Oct 29, 2024 •

edited

Loading