Spaces (that do not exist in the original PDF) appear in the output of extract_text()

I am trying to parse [this PDF](https://www.joinville.sc.gov.br/wp-content/uploads/2023/11/Pesquisa-de-Precos-Combustiveis-novembro-2023.pdf). However, the output of `extract_text()` contains a number of spaces that are not in the original PDF.

See the screenshot for what I mean: the original PDF is on the left and the output of `extract_text()` is on the right (e.g. "Av. Beir a Rio" should be "Av. Beira Rio", and "Cen tro" should be "Centro"):

![image](https://github.com/py-pdf/pypdf/assets/553325/d818a8d5-bdb5-49ed-a780-74faafa62ed7)
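To make the symptom checkable without comparing screenshots, here is a minimal sketch of a helper that reports which expected phrases do not appear intact in the extracted text. The function name `find_split_words` is hypothetical (it is not part of pypdf); it collapses runs of whitespace first, so a phrase broken only by a line break still counts as intact, while an extra space inside a word does not.

```python
import re


def find_split_words(text: str, expected: list[str]) -> list[str]:
    """Return the phrases from `expected` that are not found intact in `text`.

    Runs of whitespace (including newlines) are collapsed to a single space
    first, so only missing or extra characters inside a phrase are reported.
    """
    normalized = re.sub(r"\s+", " ", text)
    return [phrase for phrase in expected if phrase not in normalized]
```

Called as `find_split_words(reader.pages[0].extract_text(), ["Av. Beira Rio", "Centro"])`, it should return an empty list once the extra spaces are no longer produced.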

If I copy and paste the text from Okular or another PDF reader into a text document, it comes out correctly, so I know the PDF file itself is not broken.

## Environment

I am using Python 3.12 in Fedora 39.

```bash
$ python -m platform
Linux-6.6.4-200.fc39.x86_64-x86_64-with-glibc2.38

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.1, crypt_provider=('pycryptodome', '3.19.0'), PIL=10.1.0
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader

# The PDF downloaded from the link above
reader = PdfReader('Pesquisa-de-Precos-Combustiveis-novembro-2023.pdf')
text = reader.pages[0].extract_text()
print(text)  # contains "Av. Beir a Rio", "Cen tro", etc.
```
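To see how the page text is split into fragments before pypdf joins them, the `visitor_text` callback of `extract_text()` can be used. The sketch below assumes the callback signature `(text, cm, tm, font_dict, font_size)` described in the pypdf 3.x text-extraction docs; `collect_fragments` is a hypothetical helper name. It only records each non-blank fragment with the x/y offset of its text matrix, which may help show whether a word such as "Beira" arrives in more than one piece. It does not fix anything by itself.

```python
from typing import Any


def collect_fragments(page: Any) -> list[tuple[str, float, float]]:
    """Return (text, x, y) for every non-blank fragment passed to visitor_text."""
    fragments: list[tuple[str, float, float]] = []

    def visitor(text, cm, tm, font_dict, font_size):
        # tm is the text matrix [a, b, c, d, e, f]; e and f are the x/y offsets
        if text.strip():
            fragments.append((text, tm[4], tm[5]))

    # Assumes a pypdf PageObject (or anything with a compatible extract_text)
    page.extract_text(visitor_text=visitor)
    return fragments
```

For example, `for fragment in collect_fragments(reader.pages[0]): print(fragment)` lists the fragments in the order they are emitted.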
