Realistic input coordinate precision #2949
Replies: 6 comments 11 replies
From my point of view, since I'm dealing with real-world data (OSM), I'm totally fine with some "rounding".
Maybe there is a place for a turf-lint? Though whether it's an advisory debugging tool only or a runtime input-data fixer, I'm not sure. The spec is probably vague enough in a few areas that it might be tough to nail down the latter. There is @placemarkio/check-geojson, though it specifically doesn't look at precision, or at ring winding, which is a pretty common issue.
The GeoJSON RFC (7946) suggests 6 digits of precision, which conveniently also fits in a float32. That doesn't help us in JavaScript runtimes, but it would be meaningful for people interoperating with another language.
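For reference, rounding to that suggested precision is nearly a one-liner. A minimal sketch (the `roundCoord` helper below is hypothetical; `@turf/truncate` offers similar behavior via its `precision` option):

```javascript
// Hypothetical helper: round a GeoJSON position to a fixed number of
// decimal places (RFC 7946 suggests 6, roughly 0.1 m of longitude at
// the equator).
function roundCoord(position, digits = 6) {
  const factor = 10 ** digits;
  return position.map((n) => Math.round(n * factor) / factor);
}

console.log(roundCoord([13.4050123456, 52.5200987654]));
// [ 13.405012, 52.520099 ]
```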
Please correct me if I'm wrong, but I think it is fair to assume that the main use cases for a library like turf are visualizing real-world data and running lightweight analytics. I could very easily be wrong about the level of analytics people run, mind you. I'm also working on the assumption that real-world data is messy, and while it's 100% reasonable to assume spec compliance, assuming anything beyond that (like no repeated points or a certain spacing between points) is not reasonable. So with those assumptions, my 2c is:
As a testing aside, there is a mixture of exact-precision tests (which definitely shouldn't exist) and approximate-precision tests. To @mfedderly's point about precision, I wonder if it would be a good idea to create a small set of helper assertion wrappers that test for approximate equality to an agreed number of significant figures, then roll them in over time.
Apologies for the delay - got busy but didn't want to drop the conversation. And apologies also for the long comment. I've been ruminating on this topic and thought that posing some opinionated takes to push the envelope and prompt discussion might help. I'm not a GIS super-expert, so this is more the perspective of an enthusiastic amateur. Some of the thoughts below are not necessarily the best for my personal turf use; they come more from where turf might fit into the ecosystem. So my opinionated stance is that turf is:
With this background, it points to a solution of:
The obvious trade-off is precision. But the big opinion here is that if you want high precision, use GEOS and/or research the use case; don't expect a JS library, albeit an awesome one, to give you that. So what might some implications of this philosophy be?

**Promise Minimum Precision**

**Apply Cleaning**

**Avoid Throws - Assume, Null, Warn**

**Conclusion**

There are definitely holes in this - something that springs to mind is truncation → coincidence → failed topology that might not have failed without the truncation. I'm sure there are other gaps in my thinking too. Also in some cases (clipping...) it might be a decent philosophical change and therefore not be viable. So there are some opinions that hopefully let folk agree or disagree.
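On the "Apply Cleaning" point, a minimal sketch of what an explicit, opt-in cleaning pass could look like (the helpers below are hypothetical; the real `@turf/truncate` and `@turf/clean-coords` packages cover similar ground):

```javascript
// Truncate each position to the agreed precision, then drop consecutive
// duplicates that truncation may have created.
function truncatePosition(position, precision = 6) {
  const f = 10 ** precision;
  return position.map((n) => Math.round(n * f) / f);
}

function cleanLine(coords, precision = 6) {
  const out = [];
  for (const position of coords) {
    const p = truncatePosition(position, precision);
    const last = out[out.length - 1];
    if (!last || last[0] !== p[0] || last[1] !== p[1]) out.push(p);
  }
  return out;
}

// Two points ~3 cm apart both truncate to [100, 0], so one is dropped:
console.log(cleanLine([[100.0000004, 0], [100.0000001, 0]]));
// [ [ 100, 0 ] ]
```

This also illustrates the hole noted above: truncation can make nearby points coincident, which is exactly the kind of degeneracy a downstream topology operation may then trip over.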
I really appreciate the detailed writeup here! For what it's worth, I don't consider myself a deep GIS expert; I originally started contributing to get the repo publishing reliably again. I largely agree with your writeup, but want to push back on a few points.

**Performance:** Although we won't aim for performance as fast as optimized native code, we still care about performance. We can still try to use the fastest algorithms and write performant code along the way. @turf/union changed its underlying implementation for correctness fixes, which introduced a large performance regression; I'm working on a fork of polyclip-ts that swaps bignumber.js back to JavaScript float64s in order to fix it. You also have access to many cores - even in a browser you can use worker threads to run more expensive computations without blocking the user.

**Precision:** We are in JavaScript, and the language limits us by only having native float64s. Luckily, the precision of this type is much higher than any physical scale we intend to deal with: even 8 digits after the decimal gets us to ~1mm precision, and several fewer digits is still more precise than you'd be able to see on a Leaflet visualization of the data at zoom 18. I do think we need to add official documentation about the precision limitations of float64s, the realistic scale of the errors, and a policy for rejecting issues that require precision beyond what we support. I'm on board with publicly stating that we officially support 6 decimal places, and perhaps we pair that with truncating test fixture output at 8 decimal places so we have some breathing room. We may also want to specifically call out packages that are sensitive to precision issues in their own READMEs (and therefore the public docs pages).

**Data cleaning:** I strongly prefer that individual methods do not clean their input data. It can be very expensive to do a clone-and-truncate operation on large inputs, and the cleaning won't even necessarily be required, depending on the input data itself. If someone does have data that requires truncation before operating on it, I think it is a reasonable tradeoff to make them do so manually in order to preserve performance for everyone else. To that end, we may want to update the TypeScript definitions to take in deeply readonly arguments and return mutable results (assuming we're generating entirely new objects).

**Throwing vs warning:** If something is going wrong, I'd much rather have that loudly declared than quietly patched over. We already try to throw errors at runtime when input arguments are not valid, which is preferable to getting a less obvious error when the operation itself fails. I think someone is very unlikely to find a

Happy to hear thoughts from others on this as well before we start taking action on the precision thing. I do think we're overall limited by how many contributors we have and how much time those contributors can spend on the project. The polyclip-ts regression is nearly a year old, and I'm just now getting around to working on it. We're currently in a relatively active point of development, but there are really only 2 people with maintainer status at the moment. I'm reluctant to force-merge my own PRs without an approval from another person, so we kind of need both of us to have time to make any changes.
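To put rough numbers on the precision scales above (my own back-of-envelope arithmetic, assuming ~40,075 km of equatorial circumference):

```javascript
// Roughly how much ground distance one unit in the Nth decimal place
// of a degree represents at the equator.
const METERS_PER_DEGREE = 40_075_000 / 360; // ≈ 111,319 m

for (const digits of [6, 7, 8]) {
  const mm = METERS_PER_DEGREE * 10 ** -digits * 1000;
  console.log(`${digits} decimal places ≈ ${mm.toFixed(2)} mm`);
}
// 6 decimal places ≈ 111.32 mm
// 7 decimal places ≈ 11.13 mm
// 8 decimal places ≈ 1.11 mm
```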
There have been a few bugs recently that seem to relate to extremely small distances being represented as GeoJSON. One recent case involved a line segment 0.6 nanometers long, which seems to leave the underlying JavaScript math functions no room to move, and degenerate elements creep in.
I'm wondering if we should start pushing back a little on error cases that go beyond six or seven decimal places of precision (approx 10 cm), at least in the first instance - start taking the "geo" in GeoJSON at face value. An expectation has possibly developed that Turf should be fast and robust and handle an arbitrary number of decimal places, and that's probably not something we can deliver on with JS.
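To put numbers on the "no room to move" point (my own arithmetic, not from the bug reports): near a longitude of 100°, adjacent float64 values are 2^-46 ≈ 1.4e-14 degrees apart, which is about 1.6 nm of ground distance at the equator, so a 0.6 nm segment can fall below the resolution of the number type itself:

```javascript
// A 0.6 nm offset expressed in degrees (~0.6e-9 m over ~111,319 m per
// degree) is smaller than half the float64 spacing near 100, so adding
// it changes nothing.
const lon = 100;
const offset = 5.4e-15; // ≈ 0.6 nm in degrees at the equator

console.log(lon + offset === lon); // true: both "endpoints" are identical
console.log(lon + 2 ** -46 > lon); // true: the smallest representable step
```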
Tagging a few people recently involved with issues along those lines. Please weigh in with your thoughts 🙏
@SimplyPancake @HarelM @bratter @mfedderly @JamesLMilner