
Teach Kaun's text reader about encodings #108

@tmattio

Why it matters

Training logs often contain accented characters or non-English text. Right now Kaun.Dataset.from_text_file ignores its encoding argument and just returns raw bytes. As soon as you point it at UTF-8 or Latin-1 files, you risk broken strings or exceptions, which makes the monitoring dashboard unusable on real datasets.

How to see the gap

Skim kaun/lib/kaun/dataset/dataset.ml, around from_text_file. The function binds its encoding argument to _ and never decodes the file. If you create a small UTF-8 file with an emoji and iterate over the dataset, the text comes back as mangled characters.

Your task

  • Honor the encoding parameter in from_text_file (and the helpers that call it) by decoding each chunk before splitting on newlines (see the sketch after this list).
  • Add tests in kaun/test/test_dataset.ml that cover UTF-8 and Latin-1 snippets so we know the decoding works.
  • Make sure the default behaviour stays the same when callers do not pass ~encoding.
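
For orientation, here is a minimal sketch of the per-chunk decoding, not the actual dataset code. It assumes the reader hands over raw Bytes.t chunks and that decoded text is re-encoded as UTF-8 before the existing newline split runs; the helper names (make_decoder, decode_chunk, decode_eof) are hypothetical.

```ocaml
(* One decoder per file, fed manually as chunks are read. *)
let make_decoder (encoding : Uutf.decoder_encoding) : Uutf.decoder =
  Uutf.decoder ~encoding `Manual

(* Pull decoded characters out of [d] and re-encode them as UTF-8 in [buf].
   Stops on `Await (decoder needs more input) or `End (input finished). *)
let drain d buf =
  let rec go () =
    match Uutf.decode d with
    | `Uchar u -> Uutf.Buffer.add_utf_8 buf u; go ()
    | `Malformed _ -> Uutf.Buffer.add_utf_8 buf Uutf.u_rep; go ()
    | `Await | `End -> Buffer.contents buf
  in
  go ()

(* Feed one raw chunk and return the text decoded so far. *)
let decode_chunk d (chunk : Bytes.t) : string =
  let buf = Buffer.create 1024 in
  Uutf.Manual.src d chunk 0 (Bytes.length chunk);
  drain d buf

(* After the last chunk, a zero-length range signals end of input;
   drain whatever the decoder still holds. *)
let decode_eof d : string =
  Uutf.Manual.src d Bytes.empty 0 0;
  drain d (Buffer.create 16)
```

Keeping a single decoder per file (rather than one per chunk) matters because a multi-byte character can straddle a chunk boundary; the `Await case is exactly what lets the decoder resume on the next chunk.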

Tips

  • The Uutf library is already available through Raven; it can decode incrementally from a Bigarray-backed string.
  • Keep the chunked reading logic intact—just convert the bytes to OCaml strings with the right encoding as they arrive.
  • Use Filename.temp_file (already in the test helpers) to build short fixtures that contain characters outside plain ASCII (for example, describe an emoji with its code point); a fixture sketch follows this list.
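
As an illustration, fixture helpers along these lines would cover both encodings. The names write_fixture, utf8_fixture, and latin1_fixture are made up here, and the byte escapes keep the test source plain ASCII.

```ocaml
(* Illustrative fixture builders for kaun/test/test_dataset.ml; the
   existing test helpers may already offer something equivalent. *)
let write_fixture contents =
  let path = Filename.temp_file "kaun_dataset" ".txt" in
  let oc = open_out_bin path in
  output_string oc contents;
  close_out oc;
  path

(* UTF-8: "café" plus U+1F600 (grinning face). *)
let utf8_fixture () = write_fixture "caf\xc3\xa9 \xf0\x9f\x98\x80\nsecond line\n"

(* Latin-1: the same "café", with é encoded as the single byte 0xE9. *)
let latin1_fixture () = write_fixture "caf\xe9\nsecond line\n"
```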

Done when

  • Passing a non-default ~encoding produces correctly decoded strings.
  • The dataset tests cover at least one UTF-8 example and one Latin-1 example.
  • dune runtest kaun passes after your changes.

Labels

good first issue (Good for newcomers), outreachy (Issues targeted at Outreachy applicants)