Description
This issue tracks the status of native Windows builds for the toolkit.
Windows is not currently a supported platform. Windows users are encouraged to run Linux builds of the tools using either Windows Subsystem for Linux (WSL) or Docker for Windows.
Acknowledging the above caveat, it is useful to track known issues on Windows and resolve them over time. At present, the toolkit is built on Windows as part of CI tests, and large parts of the test suite run successfully. Cases where the test suite does not run successfully are primarily due to limitations in the test suite, not the tools. At this time there are no known bugs in the tools, but there are gaps in the runnable test suite and limited real-world testing.
The two main issues for having Windows support are having a complete CI test suite that runs on Windows, and resolving inconsistencies in Windows newline handling. Most other known issues are minor, though a couple complicate having a test suite shared across Windows and Unix platforms. With PRs #314 and #320 newline consistency issues are largely addressed.
Windows CI Test suite status
A Windows CI test suite was set up using GitHub Actions. See PRs #313 and #315. Current status:
- Build (compile/link) w/ `dub`, DMD 64-bit - Done (passes)
- Build (compile/link) w/ `dub`, LDC 64-bit - Done (passes)
- Build (compile/link) w/ `make`, DMD 64-bit - Done (passes)
- Build (compile/link) w/ `make`, LDC 64-bit - Done (passes)
- Built-in unit tests w/ `make unittest`, DMD 64-bit - Passes, with minor caveats. See "Disabled unit tests" below.
- Built-in unit tests w/ `make unittest`, LDC 64-bit - Fails in `tsv-sample` due to differences in floating point format output. As an example, a floating point value formatted as `0.75346697377181959` on Linux/macOS is formatted as `0.75346697377181970` on Windows. This looks like a round-off difference in a mathematical calculation, nothing more.
- Command line unit tests w/ `make test`, both DMD and LDC. These fail from the beginning because of newline differences between the golden set files from the repo and the results produced in the tests. In particular, the "error" and "help" tests that generate messages to end users are getting written using Windows newlines, but the "gold" files with expected results were generated on Unix. The main complication is that many expected results should be newline equivalent, so different strategies are needed for each.
- Input files with Windows newlines - The test suite needs to include files with Windows newlines, for every tool.
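For the cases where expected results only need to be newline equivalent, one possible strategy is to normalize newlines before comparing test output against the gold file. The following is a minimal sketch of that idea in Python. It is not how the toolkit's test suite works today; the function names and the sample text are hypothetical.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows (CRLF) newlines to Unix (LF) newlines."""
    return text.replace("\r\n", "\n")


def matches_gold(actual: str, gold: str) -> bool:
    """Compare test output to a gold file, ignoring CRLF vs LF differences."""
    return normalize_newlines(actual) == normalize_newlines(gold)


# Hypothetical help-message output as written on Windows, and the Unix gold text.
windows_output = "Synopsis: tool [options] [file...]\r\nOptions:\r\n"
unix_gold = "Synopsis: tool [options] [file...]\nOptions:\n"

print(windows_output == unix_gold)              # exact comparison: False
print(matches_gold(windows_output, unix_gold))  # newline-equivalent comparison: True
```

A comparison like this would only be appropriate for tests such as the "error" and "help" cases. Tests that check the newlines written by the tools themselves would still need an exact comparison.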
Notes:
- Disabled unit tests - A pair of rounding tests for the `common.tsv-utils.numerics.formatNumber` function fail on Windows and were disabled. The cases are `formatNumber(0.6, 0)` and `formatNumber(-0.6, 0)`. These should return `"1"` and `"-1"` respectively, but on Windows return `"0"` and `"-0"`. This is due to incorrect results from the calls being made to `std.format.format`. `std.format.format` is likely calling `snprintf` from the platform's C library. The disabled unit tests are in common.tsv-utils.numerics.d starting at lines 220 and 334.
Update (03/14/21): This has been addressed in DMD 2.096, PR #7757 by moving the work into Phobos. Unit tests have been conditioned on version.
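As a cross-check of the expected values, the same two roundings can be reproduced with fixed-point formatting at zero decimal places in another language. The snippet below uses Python purely as a reference point; it does not call `formatNumber` or `std.format.format`, and the helper name is hypothetical.

```python
def format_fixed(value: float, precision: int) -> str:
    """Format a float in fixed-point notation with the given number of decimals."""
    return format(value, f".{precision}f")


print(format_fixed(0.6, 0))   # 1
print(format_fixed(-0.6, 0))  # -1
```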
Windows Newline handling
On Unix and macOS, `tsv-utils` requires and generates Unix newlines. However, a newline handling policy has never been identified for running on a Windows platform. As a result, tools are inconsistent in the manner they handle Windows newlines when running on Windows. Some possible newline handling policies:
1. Read and write Unix newlines only, on all platforms.
2. Read and write using platform preferred newlines.
3. Read either Windows or Unix newlines; write Unix newlines.
4. Full customization via command line arguments.
Option 1 is the simplest policy to support and is what is being done initially. It is the easiest to enforce in the current code and the easiest to support in the current test suite. It is also a reasonable choice in many environments, especially where a mix of Unix and Windows platforms is in use. If data files are being shared, Unix newlines will normally be preferred. Option 1 is also consistent with other choices made in the toolkit. In particular, the toolkit supports only one file format (UTF-8 TSV) and delegates conversion to that format to other tools (e.g. `dos2unix`, `csv2tsv`).
Option 1 is largely in place with PRs #314 and #320, but the test suite still needs work to test it fully. Tasks:
- Detect Windows newlines when on a Windows platform, the same as is done on Unix. (PR #320, "Change Windows newline detection to occur on all platforms")
- Have files with Windows newlines (or platform newlines) in the test suite so these cases are tested.
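To illustrate what detection under Option 1 could look like, here is a minimal Python sketch. It assumes that "detection" means rejecting any input line that still ends in a carriage return after the LF is removed. The function names and the error message wording are hypothetical and are not taken from the toolkit's D code.

```python
import io


def check_no_windows_newline(line: bytes, filename: str, line_num: int) -> None:
    """Raise ValueError if a line (with its LF already removed) ends in CR."""
    if line.endswith(b"\r"):
        raise ValueError(
            f"Windows newline (CRLF) detected. File: {filename}, line: {line_num}. "
            "Convert the file to Unix newlines first, e.g. with dos2unix."
        )


def read_lines_unix_only(stream, filename: str):
    """Yield lines split on LF, rejecting input that uses Windows newlines."""
    for line_num, raw in enumerate(stream, start=1):
        line = raw[:-1] if raw.endswith(b"\n") else raw
        check_no_windows_newline(line, filename, line_num)
        yield line


# Unix input is read normally.
print(list(read_lines_unix_only(io.BytesIO(b"a\tb\n1\t2\n"), "unix.tsv")))

# Windows input is rejected on the first line.
try:
    list(read_lines_unix_only(io.BytesIO(b"a\tb\r\n1\t2\r\n"), "windows.tsv"))
except ValueError as err:
    print(err)
```

The streams are read in binary mode on purpose. In text mode, the runtime on Windows may translate CRLF to LF before the check runs, which would hide the very condition being tested.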
Option 2 might be the preferred option in many traditional applications, but it is not clear if this is a good choice for data mining tools. In particular, it is very common to share data files between people, platforms, and tools. In such environments Unix newlines will be preferred. Switching to Windows newlines on Windows machines may be more an annoyance than a benefit.
Option 3, reading both newline forms but writing Unix newlines, has some nice properties. It might also be easier to do than expected, as most tools use `bufferedByLine`. In particular, `bufferedByLine.front` handles newlines. However, a number of tools have their own reader functionality, so it would still be necessary to have a test suite for each tool. It is also not really necessary given the availability of tools like `dos2unix`. Still, this option may be worth consideration.
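The behavior described for Option 3 can be sketched in a few lines. The Python example below is only an illustration of the policy (accept LF or CRLF on input, always write LF). It is not the `bufferedByLine` implementation, and the function name is hypothetical.

```python
import io


def copy_with_unix_newlines(src, dst) -> None:
    """Copy lines from src to dst, accepting LF or CRLF and always writing LF."""
    for raw in src:
        if raw.endswith(b"\r\n"):
            line = raw[:-2]
        elif raw.endswith(b"\n"):
            line = raw[:-1]
        else:
            line = raw  # final line without a trailing newline
        dst.write(line + b"\n")


# Input mixing Windows and Unix newlines.
mixed = io.BytesIO(b"a\tb\r\n1\t2\n3\t4\r\n")
out = io.BytesIO()
copy_with_unix_newlines(mixed, out)
print(out.getvalue())  # b'a\tb\n1\t2\n3\t4\n'
```

Note that this sketch also adds a trailing newline to a final line that lacks one, which is a design choice a real tool would need to make deliberately.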
Option 4, full customization of newline handling, would provide the most complete solution. However, it has material downsides. It creates additional user complexity in the form of additional command line arguments. It also creates complexity in the tools and test suite. At present these downsides seem to outweigh the benefits.
Other issues
- Floating point number formatting - This is described under "Disabled unit tests" above. This primarily affects `tsv-pretty` when printing floats in formatted forms. It appears to affect a relatively small number of forms, though it is undesirable when it occurs.
Update (03/14/21): This has been addressed in DMD 2.096, PR #7757 by moving the work into Phobos. This should eliminate this problem, especially when the LDC release with this version is available.