bug: Why isn’t space preprocessing consistent between Longest Matching and Multi-Cut?

### Description

Hi,

I noticed that Multi-Cut preprocesses spaces (e.g., grouping consecutive spaces into one token), while Longest Matching does not. Why not preprocess spaces the same way for both tokenizers to ensure consistency?

Thanks for your clarification!

### Expected results

A clear explanation of why the two tokenizers handle spaces differently.

### Current results

Multi-Cut preprocesses spaces (e.g., grouping consecutive spaces into one token), while Longest Matching does not.
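
To make the difference concrete, here is a minimal, library-independent sketch of the two behaviors I mean. It does not use or reproduce PyThaiNLP's code: the function names (`split_spaces_grouped`, `split_spaces_separate`, `merge_space_runs`) are hypothetical, and the toy splitters only separate spaces from non-spaces rather than doing real word segmentation.

```python
import re
from typing import List


def _is_space_run(token: str) -> bool:
    """True if the token is non-empty and consists only of space characters."""
    return token != "" and set(token) == {" "}


def split_spaces_grouped(text: str) -> List[str]:
    """Toy splitter: a run of consecutive spaces becomes ONE token."""
    return re.findall(r" +|[^ ]+", text)


def split_spaces_separate(text: str) -> List[str]:
    """Toy splitter: every single space becomes its OWN token."""
    return re.findall(r" |[^ ]+", text)


def merge_space_runs(tokens: List[str]) -> List[str]:
    """Hypothetical post-processing step: merge adjacent space-only tokens."""
    merged: List[str] = []
    for tok in tokens:
        if _is_space_run(tok) and merged and _is_space_run(merged[-1]):
            merged[-1] += tok  # extend the previous run of spaces
        else:
            merged.append(tok)
    return merged


text = "ab  cd"  # two consecutive spaces between the words
grouped = split_spaces_grouped(text)
separate = split_spaces_separate(text)
normalized = merge_space_runs(separate)
```

With this input, the grouped splitter yields three tokens and the separate splitter yields four. Applying something like `merge_space_runs` to the second output makes both agree, which is the kind of consistency I am asking about.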

### Steps to reproduce

-

### PyThaiNLP version

5.0.5

### Python version

3.9.6

### Operating system and version

Google Colab Latest

### More info

_No response_

### Possible solution

_No response_

### Files

_No response_

bug: Why isn’t space preprocessing consistent between Longest Matching and Multi-Cut? #1061

Description

Expected results

Current results

Steps to reproduce

PyThaiNLP version

Python version

Operating system and version

More info

Possible solution

Files

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

bug: Why isn’t space preprocessing consistent between Longest Matching and Multi-Cut? #1061

Description

Description

Expected results

Current results

Steps to reproduce

PyThaiNLP version

Python version

Operating system and version

More info

Possible solution

Files

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions