Status of Windows build

This issue tracks the status of native Windows builds for the toolkit.

Windows is not currently a supported platform. Windows users are encouraged to run Linux builds of the tools using either [Windows Subsystem for Linux (WSL)](https://docs.microsoft.com/en-us/windows/wsl/) or [Docker for Windows](https://docs.docker.com/docker-for-windows/).

Acknowledging the above caveat, it is useful to track known issues on Windows and resolve them over time. At present, the toolkit is built on Windows as part of CI tests, and large parts of the test suite run successfully. Cases where the test suite does not run successfully are primarily due to limitations in the test suite, not the tools. At this time there are no known bugs in the tools, but there are gaps in the runnable test suite and limited real-world testing.

The two main obstacles to Windows support are the lack of a complete CI test suite that runs on Windows and inconsistencies in Windows newline handling. Most other known issues are minor, though a couple complicate sharing a test suite across Windows and Unix platforms. With PRs #314 and #320, the newline consistency issues are largely addressed.

**Windows CI Test suite status**

A Windows CI test suite was set up using GitHub Actions. See PRs #313 and #315. Current status:

- [x] Build (compile/link) w/ `dub`, DMD 64-bit - Done (passes)
- [x] Build (compile/link) w/ `dub`, LDC 64-bit - Done (passes)
- [x] Build (compile/link) w/ `make`, DMD 64-bit - Done (passes)
- [x] Build (compile/link) w/ `make`, LDC 64-bit - Done (passes)
- [x] Built-in unit tests w/ `make unittest`, DMD 64-bit - Passes, with minor caveats. See "Disabled unit tests" below.
- [ ] Built-in unit tests w/ `make unittest`, LDC 64-bit - **Fails** in `tsv-sample` due to differences in floating point format output. As an example, a floating point value formatted as `0.75346697377181959` on Linux/macOS is formatted as `0.75346697377181970` on Windows. This looks like a round-off difference in a mathematical calculation, nothing more.
- [ ] Command line unit tests w/ `make test`, both DMD and LDC. These fail from the beginning because of newline differences between the golden set files from the repo and the results produced in the tests. In particular, the "error" and "help" tests that generate messages to end users are getting written using Windows newlines, but the "gold" files with expected results were generated on Unix. The main complication is that many expected results should be newline equivalent, so different strategies are needed for each.
- [ ] Input files with Windows newlines - The test suite needs to include files with Windows newlines, for every tool.
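
The `tsv-sample` failure listed above is a difference in the last printed digits only. As an illustration (written in Python rather than the toolkit's D code, and not something the test suite currently does), a numeric comparison with a small relative tolerance would treat the two values from that example as equal. The helper name `floats_match` and the tolerance `1e-15` are assumptions made for this sketch.

```python
import math


def floats_match(expected: str, actual: str, rel_tol: float = 1e-15) -> bool:
    """Compare two decimal strings as floats, allowing a small relative error.

    Hypothetical helper for illustration; rel_tol is an assumed value.
    """
    return math.isclose(float(expected), float(actual), rel_tol=rel_tol)


linux_value = "0.75346697377181959"    # value reported on Linux/macOS
windows_value = "0.75346697377181970"  # value reported on Windows

# The text comparison fails, while the tolerant numeric comparison passes.
print(linux_value == windows_value)
print(floats_match(linux_value, windows_value))
```

Whether a tolerance is appropriate depends on the test. Where exact output text is part of what is being checked, a tolerant comparison would hide real differences.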

Notes:
* Disabled unit tests - A pair of rounding tests for the `common.tsv-utils.numerics.formatNumber` function fail on Windows and were disabled. The cases are `formatNumber(0.6, 0)` and `formatNumber(-0.6, 0)`. These should return `"1"` and `"-1"` respectively, but on Windows return `"0"` and `"-0"`. This is due to incorrect results from the calls being made to `std.format.format`, which is likely calling `snprintf` from the platform's C library. The disabled unit tests are in `common/src/tsv_utils/common/numerics.d` starting at lines [220](https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/numerics.d#L220) and [334](https://github.com/eBay/tsv-utils/blob/master/common/src/tsv_utils/common/numerics.d#L334). <br>**Update (03/14/21)**: This has been addressed in DMD 2.096, [PR #7757](https://github.com/dlang/phobos/pull/7757) by moving the work into Phobos. Unit tests have been conditioned on version.
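
For reference, the expected results in the two disabled cases can be reproduced with any formatter that rounds correctly. The sketch below uses Python's fixed-point formatting. `format_number` is a hypothetical stand-in written for this illustration, not a port of the D `formatNumber` function, and it covers only the fixed-precision case.

```python
def format_number(value: float, precision: int) -> str:
    """Format value with a fixed number of digits after the decimal point.

    Hypothetical stand-in for illustration only.
    """
    return f"{value:.{precision}f}"


print(format_number(0.6, 0))   # prints "1"
print(format_number(-0.6, 0))  # prints "-1"
```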

**Windows Newline handling**

On Unix and macOS, `tsv-utils` requires and generates Unix newlines. However, a newline handling policy has never been defined for running on Windows. As a result, the tools are inconsistent in how they handle Windows newlines when running on Windows. Some possible newline handling policies:

1. Read and write Unix newlines only, on all platforms.
2. Read and write using platform-preferred newlines.
3. Read either Windows or Unix newlines; write Unix newlines.
4. Full customization via command line arguments.

Option 1 is the simplest policy to support and is what is being done initially. It is the easiest to enforce in the current code and the easiest to support in the current test suite. It is also a reasonable choice in many environments, especially where a mix of Unix and Windows platforms is in use. If data files are being shared, Unix newlines will normally be preferred. Option 1 is also consistent with other choices made in the toolkit, in particular supporting only one file format (UTF-8 TSV) and delegating conversion to that format to other tools (e.g. `dos2unix`, `csv2tsv`).

Option 1 is largely in place with PRs #314 and #320, but the test suite still needs work to test it fully. Tasks:
- [x] Detection of Windows newlines when on a Windows platform, the same as is done on Unix. (PR #320)
- [ ] Have files with Windows newlines (or platform newlines) in the test suite so these cases are tested.
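
As a rough illustration of what Option 1 implies for a reader, the Python sketch below reads lines in binary mode and raises an error when a line ends in CRLF. It is not the toolkit's D implementation. The names `WindowsNewlineError` and `read_unix_lines` are hypothetical, and checking every line (rather than, say, only the first) is an assumption of this sketch.

```python
import io


class WindowsNewlineError(ValueError):
    """Raised when input contains a CRLF line ending (hypothetical name)."""


def read_unix_lines(stream, name="<input>"):
    """Yield lines without the trailing LF; reject lines that end in CRLF.

    `stream` must be a binary stream so that no newline translation
    happens before the check.
    """
    for line_number, raw in enumerate(stream, start=1):
        line = raw[:-1] if raw.endswith(b"\n") else raw
        if line.endswith(b"\r"):
            raise WindowsNewlineError(
                f"{name}: Windows newline (CRLF) detected on line {line_number}. "
                "Convert the file to Unix newlines, for example with dos2unix."
            )
        yield line


# Unix newlines are accepted.
print(list(read_unix_lines(io.BytesIO(b"a\tb\n1\t2\n"))))

# Windows newlines are rejected with an error message.
try:
    list(read_unix_lines(io.BytesIO(b"a\tb\r\n1\t2\r\n")))
except WindowsNewlineError as err:
    print(err)
```

Test files for the open task could be built the same way, by writing byte strings with explicit `\r\n` endings so that the content does not depend on the platform running the tests.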

Option 2 might be the preferred option in many traditional applications, but it is not clear that it is a good choice for data mining tools. In particular, it is very common to share data files between people, platforms, and tools. In such environments Unix newlines will be preferred. Switching to Windows newlines on Windows machines may be more of an annoyance than a benefit.

Option 3, reading both newline forms but writing Unix newlines, has some nice properties. It might also be easier to do than expected, as most tools use `bufferedByLine`, and `bufferedByLine.front` handles newlines. However, a number of tools have their own reader functionality, so it would still be necessary to have a test suite for each tool. It is also not really necessary given the availability of tools like `dos2unix`. Still, this option may be worth consideration.
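
A minimal sketch of the Option 3 behavior, in Python for illustration only (the toolkit itself would do this in its D readers such as `bufferedByLine`). The function name `copy_with_unix_newlines` is hypothetical. The sketch accepts either CRLF or LF on input and always writes LF. Adding a newline to a final unterminated line is an assumption of this sketch.

```python
import io


def copy_with_unix_newlines(source, sink):
    """Copy lines from a binary source to a binary sink, writing LF endings."""
    for raw in source:
        if raw.endswith(b"\r\n"):
            line = raw[:-2]   # Windows newline
        elif raw.endswith(b"\n"):
            line = raw[:-1]   # Unix newline
        else:
            line = raw        # final line with no newline
        sink.write(line + b"\n")


mixed_input = io.BytesIO(b"a\tb\r\n1\t2\n3\t4")
output = io.BytesIO()
copy_with_unix_newlines(mixed_input, output)
print(output.getvalue())
```

Both streams are binary so that the platform's own newline translation does not interfere with the result.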

Option 4, full customization of newline handling, would provide the most complete solution. However, it has material downsides. It creates additional user complexity in the form of additional command line arguments. It also creates complexity in the tools and test suite. At present these downsides seem to outweigh the benefits.

**Other issues**
* Floating point number formatting - This is described under "Disabled unit tests" above. It primarily affects `tsv-pretty` when printing floats in formatted forms. It appears to affect a relatively small number of forms, though it is undesirable when it occurs.<br>**Update (03/14/21)**: This has been addressed in DMD 2.096, [PR #7757](https://github.com/dlang/phobos/pull/7757) by moving the work into Phobos. This should eliminate the problem, especially once the LDC release with this version is available.
