Commit 51425dd
⚡️ Speed up function sentence_count by 59% (#4080)
Saurabh's comments: The changes look good, especially because they have
been rigorously tested with a variety of cases, which makes me feel
confident.
### 📄 59% (0.59x) speedup for ***`sentence_count` in `unstructured/partition/text_type.py`***
⏱️ Runtime : **`190 milliseconds`** **→** **`119 milliseconds`** (best
of `39` runs)
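The percentages in this report appear to be computed as time saved divided by the optimized runtime. That formula is inferred from the figures quoted here, not taken from Codeflash documentation, so treat the sketch below as an assumption. The helper name `speedup` is hypothetical.

```python
def speedup(original: float, optimized: float) -> float:
    """Relative speedup: time saved divided by the optimized runtime.

    A negative result means the optimized version was slower.
    """
    return (original - optimized) / optimized


# Headline figures from this report, both in milliseconds.
print(f"{speedup(190, 119):.1%}")  # 59.7%

# A "slower" case from the regression tests (18.7μs -> 41.1μs).
print(f"{speedup(18.7, 41.1):.1%}")  # -54.5%
```

Under this reading, "59% speedup" means the original took roughly 1.6 times as long as the optimized version.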
### 📝 Explanation and details
Major speedups:
- Replace list comprehensions with generator expressions where only a
count is needed, to avoid building intermediate lists.
- Count words with `str.split()` after punctuation removal instead of an
expensive `word_tokenize` call, since only the token count is used and
punctuation is already stripped.
- When a `min_length` filter is set, avoid calling `remove_punctuation`
and `word_tokenize` on sentences that are already very short: if the
text length is zero, the sentence is filtered out immediately.
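The first two points can be sketched with the standard library alone. This is a minimal illustration, not the code from this commit: `naive_sent_split` and `strip_punctuation` are hypothetical stand-ins for `sent_tokenize` and `remove_punctuation`, and they handle far fewer cases than the real helpers.

```python
import re
import string
from typing import List, Optional

# ASCII punctuation only; an assumption made to keep the sketch small.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def naive_sent_split(text: str) -> List[str]:
    """Rough sentence splitter (hypothetical stand-in for sent_tokenize)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def strip_punctuation(text: str) -> str:
    """Hypothetical stand-in for remove_punctuation."""
    return text.translate(_PUNCT_TABLE)


def sentence_count_sketch(text: str, min_length: Optional[int] = None) -> int:
    sentences = naive_sent_split(text)
    if not min_length:
        return len(sentences)
    # Generator expression: counts qualifying sentences without building a
    # list, and uses a whitespace split instead of a full tokenizer.
    return sum(
        1
        for sentence in sentences
        if len(strip_punctuation(sentence).split()) >= min_length
    )


print(sentence_count_sketch("This is one. This is two."))  # 2
print(sentence_count_sketch("One. Two three four.", min_length=3))  # 1
```

The whitespace split is only valid here because punctuation has already been removed, so each remaining whitespace-separated chunk is one word.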
If you wish to maximize compatibility with sentences containing
non-whitespace-separable tokens (e.g. CJK languages), consider further
optimization on the token counting line as needed for your domain.
Otherwise, `str.split()` after punctuation removal suffices and is far
faster than a full NLP tokenizer.
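The caveat about non-whitespace-separable text is easy to see directly. The sample strings below are illustrative only:

```python
latin = "How are you today"
japanese = "今日は元気ですか"  # written without spaces between words

# str.split() with no arguments splits on runs of whitespace.
print(len(latin.split()))     # 4
print(len(japanese.split()))  # 1
```

With a whitespace split, the Japanese sentence counts as a single token, so any `min_length` greater than 1 would exclude it.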
✅ **Correctness verification report:**
| Test | Status |
| --------------------------- | ----------------- |
| ⚙️ Existing Unit Tests | ✅ **21 Passed** |
| 🌀 Generated Regression Tests | ✅ **92 Passed** |
| ⏪ Replay Tests | ✅ **695 Passed** |
| 🔎 Concolic Coverage Tests | 🔘 **None Found** |
|📊 Tests Coverage | 100.0% |
<details>
<summary>⚙️ Existing Unit Tests and Runtime</summary>
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|:-----------------------------------------------------------------------------------------|:--------------|:---------------|:----------|
| `partition/test_text_type.py::test_item_titles` | 92.5μs | 47.8μs | ✅93.6% |
| `partition/test_text_type.py::test_sentence_count` | 47.8μs | 4.67μs | ✅924% |
| `test_tracer_py__replay_test_0.py::test_unstructured_partition_text_type_sentence_count` | 4.37ms | 2.11ms | ✅107% |
| `test_tracer_py__replay_test_2.py::test_unstructured_partition_text_type_sentence_count` | 13.0ms | 4.69ms | ✅177% |
| `test_tracer_py__replay_test_3.py::test_unstructured_partition_text_type_sentence_count` | 5.29ms | 3.02ms | ✅75.3% |
</details>
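
Runtimes like those above are reported as the best of several runs. A minimal way to take a comparable measurement with the standard library is sketched below. The helper `best_of` and the workload `count_words` are hypothetical, and this is not how Codeflash itself measures.

```python
import timeit


def best_of(func, repeat: int = 5, number: int = 1000) -> float:
    """Return the lowest per-call time in seconds over `repeat` rounds."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings) / number


def count_words(text: str = "One two three. Four five six.") -> int:
    # Hypothetical workload, used only to have something to time.
    return len(text.split())


print(f"best per-call time: {best_of(count_words) * 1e6:.2f} μs")
```

Taking the minimum rather than the mean reduces the influence of unrelated system noise on the result.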
<details>
<summary>🌀 Generated Regression Tests and Runtime</summary>
```python
from __future__ import annotations
import random
import string
import sys
import unicodedata
from functools import lru_cache
from typing import Final, List, Optional
# imports
import pytest # used for our unit tests
from nltk import sent_tokenize as _sent_tokenize
from nltk import word_tokenize as _word_tokenize
from unstructured.cleaners.core import remove_punctuation
from unstructured.logger import trace_logger
from unstructured.nlp.tokenize import sent_tokenize, word_tokenize
from unstructured.partition.text_type import sentence_count
# unit tests
# --------------------
# BASIC TEST CASES
# --------------------
def test_empty_string():
    # Empty string should return 0 sentences
    codeflash_output = sentence_count("")  # 7.66μs -> 6.98μs (9.74% faster)

def test_single_sentence():
    # Single sentence with period
    codeflash_output = sentence_count("This is a sentence.")  # 36.7μs -> 9.46μs (288% faster)

def test_single_sentence_no_punctuation():
    # Single sentence, no punctuation (should still count as 1 by NLTK)
    codeflash_output = sentence_count("This is a sentence")  # 29.5μs -> 7.63μs (286% faster)

def test_two_sentences():
    # Two sentences separated by period
    codeflash_output = sentence_count("This is one. This is two.")  # 72.4μs -> 37.0μs (95.7% faster)

def test_multiple_sentences_with_various_punctuation():
    # Sentences ending with ! and ?
    codeflash_output = sentence_count("Is this working? Yes! It is.")  # 93.2μs -> 44.5μs (109% faster)

def test_sentence_with_abbreviation():
    # Abbreviations shouldn't split sentences
    codeflash_output = sentence_count("Dr. Smith went home. He was tired.")  # 80.0μs -> 42.3μs (89.0% faster)

def test_sentence_with_ellipsis():
    # Ellipsis should not split sentences
    codeflash_output = sentence_count("Wait... what happened? I don't know.")  # 76.9μs -> 42.5μs (81.0% faster)

def test_sentence_with_newlines():
    # Sentences separated by newlines
    codeflash_output = sentence_count("First sentence.\nSecond sentence.\nThird sentence.")  # 91.7μs -> 43.4μs (111% faster)

def test_sentence_with_min_length_met():
    # min_length is met for all sentences
    codeflash_output = sentence_count("One two three. Four five six.", min_length=2)  # 63.1μs -> 30.2μs (109% faster)

def test_sentence_with_min_length_not_met():
    # Only one sentence meets min_length
    codeflash_output = sentence_count("One. Two three four.", min_length=3)  # 62.8μs -> 31.9μs (97.1% faster)

def test_sentence_with_min_length_none_met():
    # No sentence meets min_length
    codeflash_output = sentence_count("A. B.", min_length=2)  # 60.9μs -> 32.3μs (88.5% faster)

def test_sentence_with_min_length_equals_length():
    # Sentence with exactly min_length words
    codeflash_output = sentence_count("One two three.", min_length=3)  # 27.3μs -> 9.54μs (187% faster)

def test_sentence_with_trailing_space():
    # Sentence with trailing spaces
    codeflash_output = sentence_count("Hello world. ")  # 27.8μs -> 8.60μs (223% faster)
# --------------------
# EDGE TEST CASES
# --------------------
def test_only_punctuation():
    # Only punctuation, no words
    codeflash_output = sentence_count("...!!!")  # 39.1μs -> 33.5μs (16.8% faster)

def test_only_whitespace():
    # Only whitespace
    codeflash_output = sentence_count(" \n\t ")  # 5.21μs -> 5.49μs (5.05% slower)

def test_sentence_with_numbers_and_symbols():
    # Sentence with numbers and symbols
    codeflash_output = sentence_count("12345! $%^&*()")  # 66.6μs -> 32.7μs (104% faster)

def test_sentence_with_unicode_characters():
    # Sentences with unicode and emoji
    codeflash_output = sentence_count("Hello 😊. How are you?")  # 75.9μs -> 37.7μs (102% faster)

def test_sentence_with_mixed_scripts():
    # Sentences with mixed scripts (e.g., English and Japanese)
    codeflash_output = sentence_count("Hello. こんにちは。How are you?")  # 71.2μs -> 34.9μs (104% faster)

def test_sentence_with_multiple_spaces():
    # Sentences with irregular spacing
    codeflash_output = sentence_count("This is spaced. And so is this.")  # 69.5μs -> 30.3μs (129% faster)

def test_sentence_with_no_word_characters():
    # Only punctuation and numbers
    codeflash_output = sentence_count("... 123 ...")  # 42.1μs -> 25.5μs (65.2% faster)

def test_sentence_with_long_word():
    # Sentence with a single long word
    long_word = "a" * 100
    codeflash_output = sentence_count(f"{long_word}.")  # 42.0μs -> 7.55μs (457% faster)

def test_sentence_with_long_word_and_min_length():
    # Sentence with long word, min_length > 1
    long_word = "a" * 100
    codeflash_output = sentence_count(f"{long_word}.", min_length=2)  # 43.1μs -> 10.7μs (303% faster)

def test_sentence_with_only_abbreviation():
    # Sentence is only an abbreviation
    codeflash_output = sentence_count("U.S.A.")  # 23.0μs -> 7.46μs (208% faster)

def test_sentence_with_nonbreaking_space():
    # Sentence with non-breaking space
    text = "Hello\u00A0world. How are you?"
    codeflash_output = sentence_count(text)  # 74.4μs -> 37.3μs (99.6% faster)

def test_sentence_with_tab_characters():
    # Sentences separated by tabs
    text = "Hello world.\tHow are you?\tFine."
    codeflash_output = sentence_count(text)  # 100μs -> 44.5μs (125% faster)

def test_sentence_with_multiple_punctuation_marks():
    # Sentences ending with multiple punctuation marks
    text = "Wait!! What?? Really..."
    codeflash_output = sentence_count(text)  # 88.9μs -> 48.3μs (83.9% faster)

def test_sentence_with_leading_and_trailing_punctuation():
    # Sentence surrounded by punctuation
    text = "...Hello world!..."
    codeflash_output = sentence_count(text)  # 25.9μs -> 8.64μs (200% faster)

def test_sentence_with_quotes():
    # Sentences with quotes
    text = '"Hello," she said. "How are you?"'
    codeflash_output = sentence_count(text)  # 85.3μs -> 46.6μs (83.1% faster)

def test_sentence_with_parentheses():
    # Sentence with parentheses
    text = "This is a sentence (with parentheses). This is another."
    codeflash_output = sentence_count(text)  # 74.7μs -> 33.5μs (123% faster)

def test_sentence_with_semicolons():
    # Semicolons should not split sentences
    text = "This is a sentence; this is not a new sentence."
    codeflash_output = sentence_count(text)  # 37.1μs -> 8.74μs (324% faster)

def test_sentence_with_colons():
    # Colons should not split sentences
    text = "This is a sentence: it continues here."
    codeflash_output = sentence_count(text)  # 33.2μs -> 8.44μs (294% faster)

def test_sentence_with_dash():
    # Dashes should not split sentences
    text = "This is a sentence - it continues here."
    codeflash_output = sentence_count(text)  # 34.8μs -> 8.52μs (309% faster)

def test_sentence_with_multiple_dots():
    # Multiple dots but not ellipsis
    text = "This is a sentence.... This is another."
    codeflash_output = sentence_count(text)  # 79.9μs -> 38.2μs (109% faster)

def test_sentence_with_min_length_and_punctuation():
    # min_length with sentences containing only punctuation
    text = "!!! ... ???"
    codeflash_output = sentence_count(text, min_length=1)  # 80.2μs -> 77.0μs (4.21% faster)

def test_sentence_with_min_length_and_numbers():
    # min_length with numbers as words
    text = "1 2 3 4. 5 6."
    codeflash_output = sentence_count(text, min_length=4)  # 69.6μs -> 34.6μs (101% faster)

def test_sentence_with_min_length_and_unicode():
    # min_length with unicode
    text = "😊 😊 😊 😊. Hello!"
    codeflash_output = sentence_count(text, min_length=4)  # 76.1μs -> 41.7μs (82.5% faster)

def test_sentence_with_non_ascii_punctuation():
    # Sentence with non-ASCII punctuation (e.g., Chinese full stop)
    text = "Hello world。How are you?"
    codeflash_output = sentence_count(text)  # 31.9μs -> 9.34μs (242% faster)

def test_sentence_with_repeated_newlines():
    # Sentences separated by multiple newlines
    text = "First sentence.\n\n\nSecond sentence."
    codeflash_output = sentence_count(text)  # 71.9μs -> 33.4μs (115% faster)
# --------------------
# LARGE SCALE TEST CASES
# --------------------
def test_large_number_of_sentences():
    # 1000 sentences, each "Sentence X."
    n = 1000
    text = " ".join([f"Sentence {i}." for i in range(n)])
    codeflash_output = sentence_count(text)  # 21.2ms -> 8.43ms (151% faster)

def test_large_number_of_sentences_with_min_length():
    # 1000 sentences, every even-indexed has 3 words, odd-indexed has 1 word
    n = 1000
    sentences = []
    for i in range(n):
        if i % 2 == 0:
            sentences.append(f"Word1 Word2 Word3.")
        else:
            sentences.append(f"Word.")
    text = " ".join(sentences)
    # Only even-indexed sentences should count for min_length=3
    codeflash_output = sentence_count(text, min_length=3)

def test_large_sentence():
    # One very long sentence (999 words)
    sentence = " ".join(["word"] * 999) + "."
    codeflash_output = sentence_count(sentence)  # 1.15ms -> 29.0μs (3854% faster)
    codeflash_output = sentence_count(sentence, min_length=999)  # 18.7μs -> 41.1μs (54.5% slower)
    codeflash_output = sentence_count(sentence, min_length=1000)  # 18.4μs -> 38.6μs (52.3% slower)

def test_large_text_with_varied_sentence_lengths():
    # 500 short sentences, 500 long sentences (5 and 20 words)
    n_short = 500
    n_long = 500
    short_sentence = "a b c d e."
    long_sentence = " ".join(["word"] * 20) + "."
    text = " ".join([short_sentence]*n_short + [long_sentence]*n_long)
    # min_length=10 should only count long sentences
    codeflash_output = sentence_count(text, min_length=10)  # 8.33ms -> 7.46ms (11.7% faster)
    # min_length=1 should count all
    codeflash_output = sentence_count(text, min_length=1)  # 412μs -> 722μs (42.9% slower)

def test_large_text_with_unicode_and_punctuation():
    # 1000 sentences, each with emoji and punctuation
    n = 1000
    text = " ".join([f"Hello 😊! How are you?"] * n)
    # Each repetition has 2 sentences
    codeflash_output = sentence_count(text)  # 15.8ms -> 15.3ms (3.27% faster)

def test_large_text_with_random_punctuation():
    # 1000 sentences with random punctuation at the end
    n = 1000
    punctuations = [".", "!", "?"]
    text = " ".join([f"Sentence {i}{random.choice(punctuations)}" for i in range(n)])
    codeflash_output = sentence_count(text)  # 20.6ms -> 8.02ms (157% faster)

def test_large_text_with_abbreviations():
    # 1000 sentences, some with abbreviations
    n = 1000
    text = " ".join([f"Dr. Smith went home. He was tired."] * (n // 2))
    # Each repetition has 2 sentences
    codeflash_output = sentence_count(text)  # 11.6ms -> 11.2ms (3.99% faster)

def test_large_text_with_newlines_and_tabs():
    # 500 sentences separated by newlines, 500 by tabs
    n = 500
    text1 = "\n".join([f"Sentence {i}." for i in range(n)])
    text2 = "\t".join([f"Sentence {i}." for i in range(n, 2*n)])
    text = text1 + "\n" + text2
    codeflash_output = sentence_count(text)  # 21.3ms -> 8.57ms (149% faster)

def test_large_text_with_min_length_and_unicode():
    # 1000 sentences, half with 5 emojis, half with 1 emoji
    n = 1000
    text = " ".join(["😊 " * 5 + "." if i % 2 == 0 else "😊." for i in range(n)])
    # min_length=5 should count only even-indexed
    codeflash_output = sentence_count(text, min_length=5)  # 7.88ms -> 7.83ms (0.679% faster)
    # min_length=1 should count all
    codeflash_output = sentence_count(text, min_length=1)  # 573μs -> 742μs (22.7% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations
import string
import sys
import unicodedata
from functools import lru_cache
from typing import Final, List, Optional
# imports
import pytest # used for our unit tests
from nltk import sent_tokenize as _sent_tokenize
from nltk import word_tokenize as _word_tokenize
from unstructured.partition.text_type import sentence_count
# Dummy trace_logger for test purposes (since real logger is not available)
class DummyLogger:
    def detail(self, msg):
        pass

trace_logger = DummyLogger()
# unit tests
# ---------------- BASIC TEST CASES ----------------
def test_single_sentence():
    # A simple sentence
    codeflash_output = sentence_count("This is a sentence.")  # 34.7μs -> 10.1μs (244% faster)

def test_multiple_sentences():
    # Two distinct sentences
    codeflash_output = sentence_count("This is the first sentence. This is the second.")  # 77.6μs -> 33.9μs (129% faster)

def test_sentence_with_min_length_met():
    # Sentence with enough words for min_length
    codeflash_output = sentence_count("This is a long enough sentence.", min_length=5)  # 33.7μs -> 10.9μs (209% faster)

def test_sentence_with_min_length_not_met():
    # Sentence with too few words for min_length
    codeflash_output = sentence_count("Too short.", min_length=3)  # 28.4μs -> 11.1μs (156% faster)

def test_multiple_sentences_with_min_length():
    # Only one of two sentences meets min_length
    text = "Short. This one is long enough."
    codeflash_output = sentence_count(text, min_length=4)  # 73.7μs -> 37.0μs (99.1% faster)

def test_sentence_with_punctuation():
    # Sentence with internal punctuation
    text = "Hello, world! How are you?"
    codeflash_output = sentence_count(text)  # 66.2μs -> 29.9μs (121% faster)

def test_sentence_with_abbreviations():
    # Sentence with abbreviation that should not split sentences
    text = "Dr. Smith went to Washington. He arrived at 3 p.m. It was sunny."
    codeflash_output = sentence_count(text)  # 117μs -> 61.3μs (92.3% faster)
# ---------------- EDGE TEST CASES ----------------
def test_empty_string():
    # Empty string should yield 0 sentences
    codeflash_output = sentence_count("")  # 5.53μs -> 5.59μs (1.02% slower)

def test_whitespace_only():
    # String with only whitespace
    codeflash_output = sentence_count(" ")  # 5.11μs -> 5.48μs (6.73% slower)

def test_no_sentence_ending_punctuation():
    # No periods, exclamation or question marks
    codeflash_output = sentence_count("This is not split into sentences")  # 35.0μs -> 7.85μs (346% faster)

def test_sentence_with_only_punctuation():
    # String with only punctuation marks
    codeflash_output = sentence_count("!!!...???")  # 47.1μs -> 41.7μs (12.9% faster)

def test_sentence_with_newlines():
    # Sentences split by newlines
    text = "First sentence.\nSecond sentence.\n\nThird sentence."
    codeflash_output = sentence_count(text)  # 98.5μs -> 47.0μs (110% faster)

def test_sentence_with_multiple_spaces():
    # Sentences separated by multiple spaces
    text = "Sentence one. Sentence two. Sentence three."
    codeflash_output = sentence_count(text)  # 95.3μs -> 42.1μs (126% faster)

def test_sentence_with_unicode_punctuation():
    # Sentences with unicode punctuation (em dash, ellipsis, etc.)
    text = "Hello… How are you—good?"
    codeflash_output = sentence_count(text)  # 32.4μs -> 11.0μs (196% faster)

def test_sentence_with_non_ascii_characters():
    # Sentences with non-ASCII (e.g., accented) characters
    text = "C'est la vie. Voilà!"
    codeflash_output = sentence_count(text)  # 73.6μs -> 34.4μs (114% faster)

def test_sentence_with_numbers_and_periods():
    # Numbers with periods should not split sentences
    text = "Version 3.2 is out. Please update."
    codeflash_output = sentence_count(text)  # 66.2μs -> 29.1μs (128% faster)

def test_sentence_with_emoji():
    # Sentences with emoji
    text = "I am happy 😊. Are you?"
    codeflash_output = sentence_count(text)  # 73.7μs -> 33.8μs (118% faster)

def test_sentence_with_tabs_and_spaces():
    # Sentences separated by tabs and spaces
    text = "First sentence.\tSecond sentence. Third sentence."
    codeflash_output = sentence_count(text)  # 45.6μs -> 44.1μs (3.20% faster)

def test_sentence_with_min_length_zero():
    # min_length=0 should count all sentences
    text = "One. Two. Three."
    codeflash_output = sentence_count(text, min_length=0)  # 88.5μs -> 40.5μs (119% faster)

def test_sentence_with_min_length_equals_num_words():
    # min_length equal to the number of words in a sentence
    text = "This is five words."
    codeflash_output = sentence_count(text, min_length=5)  # 31.6μs -> 12.3μs (157% faster)

def test_sentence_with_min_length_greater_than_any_sentence():
    # min_length greater than any sentence's word count
    text = "Short. Tiny. Small."
    codeflash_output = sentence_count(text, min_length=10)  # 79.2μs -> 47.8μs (65.7% faster)

def test_sentence_with_trailing_and_leading_spaces():
    # Sentences with leading/trailing spaces
    text = " First sentence. Second sentence. "
    codeflash_output = sentence_count(text)  # 50.5μs -> 29.3μs (72.2% faster)

def test_sentence_with_only_newlines():
    # Only newlines
    text = "\n\n\n"
    codeflash_output = sentence_count(text)  # 5.11μs -> 5.40μs (5.31% slower)

def test_sentence_with_multiple_punctuation_marks():
    # Sentences ending with multiple punctuation marks
    text = "What?! Really?! Yes."
    codeflash_output = sentence_count(text)  # 99.1μs -> 49.6μs (99.9% faster)

def test_sentence_with_quoted_text():
    # Sentences with quoted text
    text = '"Hello there." She said. "How are you?"'
    codeflash_output = sentence_count(text)  # 97.5μs -> 58.4μs (66.9% faster)

def test_sentence_with_parentheses():
    # Sentences with parentheses
    text = "This is a sentence (with extra info). Another sentence."
    codeflash_output = sentence_count(text)  # 77.1μs -> 32.4μs (138% faster)

def test_sentence_with_semicolons_and_colons():
    # Semicolons and colons should not split sentences
    text = "First part; still same sentence: more info. Next sentence."
    codeflash_output = sentence_count(text)  # 72.1μs -> 28.5μs (153% faster)

def test_sentence_with_single_word():
    # Single word, with and without punctuation
    codeflash_output = sentence_count("Hello.")  # 23.6μs -> 7.43μs (218% faster)
    codeflash_output = sentence_count("Hello")  # 3.91μs -> 5.14μs (24.0% slower)

def test_sentence_with_multiple_periods():
    # Ellipsis should not split into multiple sentences
    text = "Wait... What happened?"
    codeflash_output = sentence_count(text)  # 52.3μs -> 29.7μs (76.4% faster)

def test_sentence_with_uppercase_acronyms():
    # Acronyms with periods should not split sentences
    text = "I work at U.S.A. headquarters. It's nice."
    codeflash_output = sentence_count(text)  # 90.6μs -> 50.5μs (79.4% faster)

def test_sentence_with_decimal_numbers():
    # Decimal numbers should not split sentences
    text = "The value is 3.14. That's pi."
    codeflash_output = sentence_count(text)  # 75.2μs -> 35.4μs (112% faster)

def test_sentence_with_bullet_points():
    # Bullet points without ending punctuation
    text = "• First item\n• Second item\n• Third item"
    codeflash_output = sentence_count(text)  # 35.3μs -> 10.4μs (240% faster)

def test_sentence_with_dash_and_hyphen():
    # Dashes and hyphens should not split sentences
    text = "Well-known fact—it's true. Next sentence."
    codeflash_output = sentence_count(text)  # 55.1μs -> 35.1μs (56.9% faster)
# ---------------- LARGE SCALE TEST CASES ----------------
def test_large_text_many_sentences():
    # Test with a large number of sentences
    text = " ".join([f"Sentence number {i}." for i in range(1, 501)])
    codeflash_output = sentence_count(text)  # 11.5ms -> 4.39ms (162% faster)

def test_large_text_with_min_length():
    # Large text, only some sentences meet min_length
    text = "Short. " * 500 + "This is a sufficiently long sentence for counting. " * 200
    # Only the long sentences (7 words) should be counted
    codeflash_output = sentence_count(text, min_length=7)  # 6.24ms -> 6.19ms (0.716% faster)

def test_large_text_no_sentences():
    # Large text with no sentence-ending punctuation
    text = " ".join(["word"] * 1000)
    codeflash_output = sentence_count(text)  # 1.12ms -> 27.9μs (3935% faster)

def test_large_text_all_sentences_filtered_by_min_length():
    # All sentences too short for min_length
    text = "A. B. C. D. " * 250
    codeflash_output = sentence_count(text, min_length=5)  # 7.01ms -> 6.93ms (1.12% faster)

def test_large_text_with_varied_sentence_lengths():
    # Mix of short and long sentences
    short = "Hi. " * 300
    long = "This is a longer sentence for testing. " * 100
    text = short + long
    codeflash_output = sentence_count(text, min_length=6)  # 3.39ms -> 3.33ms (1.73% faster)

def test_large_text_with_unicode_and_emoji():
    # Large text with unicode and emoji in sentences
    text = "😊 Hello world! " * 400 + "C'est la vie. Voilà! " * 100
    codeflash_output = sentence_count(text)  # 5.41ms -> 5.16ms (4.84% faster)

def test_large_text_with_newlines_and_tabs():
    # Large text with newlines and tabs between sentences
    text = "\n".join([f"Sentence {i}.\t" for i in range(1, 501)])
    codeflash_output = sentence_count(text)  # 11.0ms -> 4.52ms (144% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```
</details>
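
The generated tests capture each result in `codeflash_output` so that the original and optimized implementations can be compared. A rough sketch of such a comparison follows. Every name in it is hypothetical, the two word counters are toy functions rather than the code from this commit, and this is not Codeflash's actual harness.

```python
import re
import string
from typing import Callable, Iterable, List, Tuple

_PUNCT = str.maketrans("", "", string.punctuation)


def count_words_original(text: str) -> int:
    # Hypothetical "before" version: regex scan plus an intermediate list.
    return len([tok for tok in re.findall(r"\S+", text.translate(_PUNCT))])


def count_words_optimized(text: str) -> int:
    # Hypothetical "after" version: plain whitespace split.
    return len(text.translate(_PUNCT).split())


def find_mismatches(
    original: Callable[[str], int],
    optimized: Callable[[str], int],
    inputs: Iterable[str],
) -> List[Tuple[str, int, int]]:
    """Return (input, original_result, optimized_result) for each disagreement."""
    mismatches = []
    for text in inputs:
        expected, actual = original(text), optimized(text)
        if expected != actual:
            mismatches.append((text, expected, actual))
    return mismatches


cases = ["", "This is a sentence.", "Hello\tworld.\nHow are you?", "...!!!"]
print(find_mismatches(count_words_original, count_words_optimized, cases))  # []
```

An empty list means the two versions agreed on every input that was tried. It does not prove they agree on all inputs.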
To edit these changes, run `git checkout codeflash/optimize-sentence_count-mcglwwcn` and push.
---------
Signed-off-by: Saurabh Misra <misra.saurabh1@gmail.com>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
1 parent 57cadf8, commit 51425dd
2 files changed: 12 additions and 9 deletions.
`unstructured/partition/text_type.py`: 11 additions and 9 deletions (original lines 222-230 replaced by new lines 222-232).