Releases: tidyverse/haven
haven 2.5.4
haven 2.5.3
- Fix for upcoming R-devel change.
haven 2.5.2
- Updated to ReadStat 1.1.9.
- The experimental `write_sas()` function has been deprecated (#224). The sas7bdat file format is complex and undocumented, and as such writing SAS files is not officially supported by ReadStat. `write_xpt()` should be used instead: it produces files in the SAS transport format, which has limitations but will be reliably read by SAS.
- `write_*()` functions gain a new `adjust_tz` argument to allow more control over time zone conversion for date-time variables (#702). Thanks to @jmobrien for the detailed issue and feedback.

  Stata, SPSS and SAS do not have a concept of time zone. Since haven 2.4.0, date-time values in non-UTC time zones are implicitly converted when writing to ensure the time displayed in Stata/SPSS/SAS will match the time displayed to the user in R (see #555). This is the behaviour when `adjust_tz = TRUE` (the default). Although this is in line with general user expectations, it can cause issues when the time zone is important, e.g. when looking at differences between time points, since the underlying numeric data is changed to preserve the displayed time. Use `adjust_tz = FALSE` to write the time as the corresponding UTC value, which will appear different to the user but preserves the underlying numeric data.
- `write_*()` functions previously returned the data frame with minor alterations made to date-time variables. These functions now invisibly return the original input data frame unchanged (@jmobrien, #702).
- Fix bug in string variable width calculation that treated `NA` values as width 2. `NA` values are now treated as blanks for width calculations (#699).
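The `adjust_tz` trade-off can be illustrated with base R alone. This is a minimal sketch, not haven's implementation: it assumes that `adjust_tz = TRUE` amounts to re-reading each value's displayed wall-clock time as if it were UTC, and it uses an arbitrary New York example. The variable names are hypothetical.

```r
# Base-R sketch of the two encodings behind `adjust_tz` (illustration only)

# 12:00 on 1 Jan 2010 in New York (EST, UTC-5)
x <- as.POSIXct("2010-01-01 12:00:00", tz = "America/New_York")

# Like adjust_tz = FALSE: keep the instant. The stored number is the true
# UTC value (17:00 UTC), so Stata/SPSS/SAS would display 17:00.
keep_instant <- as.numeric(x)

# Like adjust_tz = TRUE: keep the displayed time by re-reading the
# wall-clock text "2010-01-01 12:00:00" as UTC, which shifts the number.
wall_clock <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "UTC")
keep_display <- as.numeric(wall_clock)

# The encodings differ by the UTC offset: 5 hours
keep_instant - keep_display
#> [1] 18000

# Differences between time points can change. 01:30 EST and 03:30 EDT on
# 14 Mar 2010 (the start of daylight saving time) are one real hour apart...
y <- as.POSIXct(c("2010-03-14 01:30:00", "2010-03-14 03:30:00"),
                tz = "America/New_York")
diff(as.numeric(y))
#> [1] 3600

# ...but two hours apart once each value is re-read as UTC wall-clock time
y_wall <- as.POSIXct(format(y, "%Y-%m-%d %H:%M:%S"), tz = "UTC")
diff(as.numeric(y_wall))
#> [1] 7200
```

Passing `adjust_tz = FALSE` to a `write_*()` function corresponds to the first encoding, which keeps intervals such as the one above intact.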
haven 2.5.1
- All `labelled()` vectors now have left-aligned column headers when printing in tibbles for better alignment with labels (#676).
- `write_*()` functions now accept functions as well as strings in the `.name_repair` argument, in line with the documentation. Previously they only supported string values (#684).
- `write_sav()` variable name validation no longer treats all non-ASCII characters as invalid (#689).
haven 2.5.0
New author
- @gorcha is now a haven author in recognition of his significant and sustained
contributions.
File writing improvements
- All `write_*()` functions can now write custom variable widths by setting the `width` attribute (#650).
- When writing files, the minimum width for character variables is now 1. This fixes issues with statistical software reading blank character variables with width 0 (#650).
- `write_dta()` now uses strL when strings are too long to be stored in an `str#` variable (#437). By default, strL is used for strings longer than 2045 characters, which matches Stata's behaviour, but this threshold can be reduced with the `strl_threshold` argument.
- `write_xpt()` can now write dataset labels with the `label` argument, which defaults to the `label` attribute of the input data frame, if present (#562).
- `write_sav()` now checks for case-insensitive duplicate variable names (@juansebastianl, #641) and verifies that variable names are valid SPSS variable names.
- The `compress` argument for `write_sav()` now supports all 3 SPSS compression modes, specified as a character string: "byte", "none" and "zsav" (#614). `TRUE` and `FALSE` can be used for backwards compatibility, and correspond to the "zsav" and "none" options respectively.
- `write_sav()` successfully writes user missing values and ranges for `labelled()` integer vectors (#596).
- POSIXct and POSIXlt values with no time component (e.g. "2010-01-01") were being converted to `NA` when attempting to convert the output timezone to UTC. These now output successfully (#634).
- Fix bug in output timezone conversion that was causing variable labels and other variable attributes to disappear (#624).
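A short sketch of the named `compress` modes for `write_sav()` and the `strl_threshold` argument of `write_dta()`, assuming haven >= 2.5.0 is installed. The data frame, the 150-character string and the threshold of 100 are arbitrary choices for illustration.

```r
library(haven)

df <- data.frame(
  id = 1:3,
  note = c("short", strrep("x", 150), "medium"),
  stringsAsFactors = FALSE
)

# write_sav(): pick a compression mode by name ("byte", "none" or "zsav").
# TRUE and FALSE are still accepted and map to "zsav" and "none".
sav_path <- tempfile(fileext = ".sav")
zsav_path <- tempfile(fileext = ".zsav")
write_sav(df, sav_path, compress = "byte")
write_sav(df, zsav_path, compress = "zsav")

# write_dta(): character columns longer than `strl_threshold` bytes
# (2045 by default) are stored as strL. Lowering the threshold to 100
# sends the `note` column, with its 150-character value, to strL.
dta_path <- tempfile(fileext = ".dta")
write_dta(df, dta_path, strl_threshold = 100)

# Round trip: read the strings back, dropping haven's format attributes
back <- read_dta(dta_path)
identical(as.vector(back$note), df$note)
```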
Other improvements and fixes
- Updated to ReadStat 1.1.8 RC.
- `labelled()` vectors now throw a warning when combining two vectors with conflicting labels (#667).
- `zap_labels()` gains a `user_na` argument to control whether user-defined missing values are converted to `NA` or left as is (#638).
- vctrs casting and coercion generics now do less work when working with two identical `labelled()` vectors. This significantly improves performance when working with `labelled()` vectors in grouped data frames (#658).
- Errors and warnings now use `cli_abort()` and `cli_warn()` (#661).
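The effect of the new `user_na` argument to `zap_labels()` can be seen on an SPSS-style vector. This sketch assumes haven >= 2.5.0; the code 99 and its label are made up for the example.

```r
library(haven)

# SPSS-style labelled vector in which 99 is a user-defined missing value
x <- labelled_spss(c(1, 2, 99), labels = c(Refused = 99), na_values = 99)

# Default (user_na = FALSE): user-defined missing values become NA
zap_labels(x)
#> [1]  1  2 NA

# user_na = TRUE: keep the original code and only drop the labels
zap_labels(x, user_na = TRUE)
#> [1]  1  2 99
```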
Dependency changes
- R 3.4 is now the minimum supported version, in line with tidyverse policy.
- cli >= 3.0.0 has been added to Imports to support new error messaging.
- lifecycle has been added to Imports, and is now used to manage deprecations.
haven 2.4.3
- Fix build failure on Solaris.
haven 2.4.2
- Updated to ReadStat 1.1.7 RC (#620).
- `read_dta()` no longer crashes if it sees strL variables with missing values (@gorcha, #594, #600, #608).
- `write_dta()` now correctly handles "labelled"-class numeric (double) variables that don't have value labels (@jmobrien, #606, #609).
- `write_dta()` now allows variable names up to 32 characters (@sbae, #605).
- Can now correctly combine `labelled_spss()` vectors with identical labels (@gorcha, #599).
haven 2.4.1
- Fix buglet when combining `labelled()` vectors with identical labels.
haven 2.4.0
New features
- `labelled_spss()` gains full vctrs support thanks to the hard work of @gorcha (#527, #534, #538, #557). This means that `labelled_spss()` vectors should now work seamlessly in dplyr 1.0.0, tidyr 1.0.0 and other packages that use vctrs.
- `labelled()` vectors are more permissive when concatenating: output labels will be a combination of the left-hand and the right-hand side, preferring values assigned to the left-hand side (#543).
- Date-times are no longer forced to UTC, but instead converted to the equivalent UTC time (#555). This should ensure that you see the same date-time in R and in Stata/SPSS/SAS.
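A small sketch of the more permissive concatenation of `labelled()` vectors, assuming haven >= 2.4.0 (and its vctrs dependency) is installed. The labels are arbitrary and do not conflict, so the result simply carries the labels from both inputs.

```r
library(haven)

a <- labelled(c(1, 2), labels = c(Low = 1))
b <- labelled(c(2, 3), labels = c(Medium = 2, High = 3))

# Concatenate with vctrs; the output labels combine both sides
ab <- vctrs::vec_c(a, b)
attr(ab, "labels")
```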
Minor improvements and bug fixes
- Updated to ReadStat 1.1.5. Most importantly, this includes support for SAS binary compression.
- `as_factor(levels = "values")` preserves values of unlabelled elements (#570).
- `labelled_spss()` is a little stricter: it prevents `na_range` and `na_values` from containing missing values, and ensures that `na_range` is in the correct order (#574).
- `read_spss()` now reads NA values and ranges of character variables (#409).
- `write_dta()` now correctly writes tagged NAs (including tagged NAs in labels) (#583) and once again validates the length of variable names (#485).
- `write_*()` functions now validate file and variable metadata with ReadStat. This should prevent many invalid files from being written (#408). Additionally, validation failures now provide more details about the source of the problem (e.g. the name of the offending column) (#463).
- `write_sav(compress = FALSE)` now uses SPSS bytecode compression instead of the rarely-used uncompressed mode. `compress = TRUE` continues to use the newer (and not universally supported, but more compact) zlib format (@oliverbock, #544).