Default field size and decimal length when writing shapefiles #114

Open
@karimbahgat

Due to changes made since 1.2.10, several users have raised concerns about field and value types. Most recently, @klasko2 pointed out in #99 that saving a float value to an 'F' field will save it as an integer, because the default number of decimals is 0 when defining a new field. This raises a more general question for the next version of PyShp:
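
To make the problem concrete, here is a minimal sketch of how a number ends up as fixed-width text in a dbf field. The helper `format_numeric` is hypothetical and only models the general idea; it is not PyShp's actual code, and PyShp may round or truncate differently.

```python
def format_numeric(value, size=50, decimal=0):
    """Render a number as right-justified, fixed-width text (hypothetical helper)."""
    # 'decimal' controls how many digits are kept after the decimal point.
    text = f"{value:.{decimal}f}"
    if len(text) > size:
        raise ValueError(f"{text!r} does not fit in {size} characters")
    # Pad on the left so the value always occupies exactly 'size' characters.
    return text.rjust(size)


# With decimal=0 the fractional part is gone by the time the value is written.
print(repr(format_numeric(3.14159, size=10)))             # '         3'
print(repr(format_numeric(3.14159, size=10, decimal=6)))  # '  3.141590'
```

In this model, the only way to keep the fraction is to pass a non-zero `decimal`, which is why the default matters.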

What should be the default field 'size' and 'decimal' for different field types?

I hope this thread can be used as a place for people to voice their concerns and share their experiences and expectations regarding shapefiles and dbf field types.

The Issue

Until now, field size (i.e. how many bytes each value occupies) has always been set to 50, and decimal always to 0.
Instead, I think the case can be made that any numeric field should default to a decimal number. This leaves us with some open questions:

  1. ...what's a good default size? Is 50 big enough to store most numbers that an average user would need, and at the same time small enough not to waste file size? For a negative decimal number, this could store a value as low as -100000000000000000000000000000000000000000000000.0, or as detailed as -0.000000000000000000000000000000000000000000000001 (provided the decimal arg is set accordingly). That seems excessive for most users, so perhaps it should be lowered to produce smaller shapefiles? What's the default in other software?
  2. ...what's a good default number of decimals? Would 6 decimal places retain enough information for the average user not to feel they are losing information? This would mean floats being rounded to e.g. 0.123456. Perhaps this is too few, and it should instead be 12 or 16? What's the default in other software?
  3. ...should size and decimal be the same for 'F' and 'N' fields? Float fields are decimals by definition, but Numeric fields can hold either ints or floats. One might argue that both should default to decimal numbers, since defaulting to ints would result in lost information for unsuspecting users. Users who are certain they only want to save ints can still set decimal=0 manually.
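
To help weigh the trade-off between size and decimal, the sketch below works out how many integer digits are left once the sign, the decimal point, and the decimals are accounted for, and how many bytes a single field adds across a file. The function name and the record count are made up for illustration, and it assumes one byte per character in a fixed-width field.

```python
def integer_digits(size, decimal, signed=True):
    """Digits left of the decimal point that fit in a fixed-width numeric field."""
    # One character for a minus sign, plus the decimal point and the decimals.
    reserved = (1 if signed else 0) + (decimal + 1 if decimal > 0 else 0)
    return max(size - reserved, 0)


records = 1_000_000  # hypothetical number of records in the shapefile
for size, decimal in [(50, 0), (50, 6), (50, 16), (20, 6), (10, 0)]:
    digits = integer_digits(size, decimal)
    # Every record stores the full field width, whether or not it is needed.
    print(f"size={size:2d} decimal={decimal:2d} "
          f"integer digits={digits:2d} bytes for this field={size * records}")
```

Even with 16 decimals, a size of 50 still leaves room for 32 integer digits in this model, which suggests the size could be lowered considerably without hurting typical values.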

For the remaining field types I think the following would be non-controversial:

  • Type 'C': size=80, decimal irrelevant. Text fields are typically longer than numeric fields, and I believe that's the default QGIS text field size. This would save text values as long as abcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcde.
  • Type 'L': size=1, decimal irrelevant.
  • Type 'D': size=8, decimal irrelevant.
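
As a rough sketch of how per-type defaults could be looked up, assuming the values proposed above for 'C', 'L' and 'D' and falling back to the current 50/0 for the numeric types that are still undecided. `PROPOSED_DEFAULTS` and `resolve_field` are hypothetical names, not part of PyShp.

```python
# Hypothetical defaults taken from this thread; not what any PyShp release ships.
PROPOSED_DEFAULTS = {
    "C": (80, 0),  # text
    "L": (1, 0),   # logical, a single character
    "D": (8, 0),   # date, 8 characters as in YYYYMMDD
}
CURRENT_DEFAULT = (50, 0)  # what has been used for every field type until now


def resolve_field(field_type, size=None, decimal=None):
    """Return (size, decimal), letting explicit arguments override the defaults."""
    default_size, default_decimal = PROPOSED_DEFAULTS.get(field_type, CURRENT_DEFAULT)
    return (
        default_size if size is None else size,
        default_decimal if decimal is None else decimal,
    )


print(resolve_field("C"))             # (80, 0)
print(resolve_field("N"))             # (50, 0)
print(resolve_field("F", decimal=6))  # (50, 6)
```

Whatever values are chosen for 'N' and 'F' would simply become two more entries in the table.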

Any and all thoughts are appreciated!
