Tibble pkg #46

Merged
merged 90 commits into from
Jul 18, 2024

alexkowa
Member

No description provided.

{pillar} and {vctrs} are the backbone for
customizing tibbles. They are dependencies
of the {tibble} package and therefore
"free" once {tibble} is used as a dependency
package of {STATcubeR}
try this class only with sc_table_saved_list()
for now
make sure the objects of class
<sc_table_uri> are compatible with
sc_table_saved()
- don't import {tibble} since currently,
  only {vctrs} and {pillar} are used
- export as.character() for sc_schema_uri
- re-roxygenize
this is now handled as in sc_table(), od_table()
and so on
if this package is roxygenized inside the
STAT firewall, the documentation links generated
by sc_browse*() will point to the internal server

re-roxygenize from the outside

TODO: find a way to avoid this in the future. Maybe
write a wrapper-function around devtools::document()
which temporarily sets the env-var STATCUBER_IN_STAT
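A minimal base R sketch of the wrapper suggested in the TODO. The name document_outside_stat() is hypothetical, and so is the default value "FALSE": which value STATcubeR actually checks for STATCUBER_IN_STAT is an assumption here. The helper remembers the previous state of the variable and restores it on exit, even if documenting fails.

```r
# Hypothetical helper (not part of STATcubeR): run `fun` while the
# env-var STATCUBER_IN_STAT is temporarily set to `value`.
# The default value "FALSE" is an assumption.
document_outside_stat <- function(fun = function() devtools::document(),
                                  value = "FALSE") {
  old <- Sys.getenv("STATCUBER_IN_STAT", unset = NA)
  Sys.setenv(STATCUBER_IN_STAT = value)
  # restore the previous state (including "unset") when the function exits
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("STATCUBER_IN_STAT")
    } else {
      Sys.setenv(STATCUBER_IN_STAT = old)
    }
  }, add = TRUE)
  fun()
}
```

Calling document_outside_stat() from the package root would then re-roxygenize with the variable overridden. The {withr} package provides withr::with_envvar() for the same pattern.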
another tweak for cli::style_hyperlink(). Hopefully,
this will get easier once these features mature
add some notes that, instead of VALUE and VALUESET,
it is also possible to use uris for COUNT
resources in the "measures" parameter of
sc_table_custom()
this error was overlooked when the
error handling vignette was first written

fortunately, the API does a good job
of explaining the error in the json
body of the response, so the error
handlers do not need an upgrade

[ci skip]
the sc_table article now showcases the
print methods for all the example
datasets in German

[skip ci]
add those entries to the metadata.
NOTE: columns 5 and 7 are not used
in data.csv according to the OGD standard,
but some internal datasets provide these
columns and therefore they are
imported as the description of the
measure/classification
add a patch release since the
additional metadata are needed for a
deployment

NEWS for 0.5.0.1 and 0.5.1 will be
merged when 0.5.1 is released
* since json-downloads require a login,
  link to the login page
* link to the documentation page instead
  of the manual
- remove @keywords internal
- add documentation for missing params

[skip ci]
first attempt to resolve #33. Recodes
can now be defined with an additional
parameter. However, type-checking is
very minimal.

TODO:
- better error handling when the
  request is constructed. This way
  users get quick and useful error
  messages - at least for semantic
  errors such as invalid usage of
  parameters
- with this implementation, users will
  have to make sure that the parameters
  "recodes" and "dimensions" are
  consistent. Maybe simplify the usage
- The name sc_recode almost conflicts
  with the class sc_recoder. Possibly
  rename this function
- extend the custom tables article to
  showcase some use cases for recodes
  and add a short discussion about
  usage limits
- maybe add sc_filter which only allows
  filter-type recodes and performs
  stricter type-checks?
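One way to catch inconsistencies between "recodes" and "dimensions" early is a check while the request is constructed. The sketch below is hypothetical: it assumes recodes are passed as a list named by field uris and that "dimensions" is a character vector of field uris, and it only warns, matching the "warnings first" approach described in this PR.

```r
# Hypothetical consistency check (not the STATcubeR implementation).
# Assumes `recodes` is a list named by field uris and `dimensions`
# is a character vector of field uris.
check_recodes <- function(dimensions, recodes) {
  unknown <- setdiff(names(recodes), dimensions)
  if (length(unknown) > 0) {
    warning("recodes refer to fields that are not in `dimensions`: ",
            paste(unknown, collapse = ", "), call. = FALSE)
  }
  invisible(unknown)
}
```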
showcase the usage of sc_recode in the
web documentation.
there are now several checks in place
that throw warnings if inputs in
sc_table_custom() or sc_recode() are
of the wrong schema-type or if other
inconsistencies are suspected. See
the section called "error handling"
in ?sc_table_custom for more details

some of those warnings might be
replaced with errors in the future

part of #33
add a minimum requirement to pillar
for the version from 2021-02-22
to make sure the S3 generic
tbl_format_footer() is available
don't use the .onLoad hook with
base::registerS3method but use the
import via NAMESPACE (roxygen)
instead

[skip ci]
reimplements #36 with a slightly
different approach with regard to
naming
links to cache files are now clickable and
last_modified and cached will be
abbreviated if there is not enough horizontal
space
the resource uris are now displayed similar
to sc_schema()
re-sync the roxygen-generated files
add a new parameter `dry_run` to
sc_table_custom() which makes it possible to see
which request is generated without actually
sending it to the API

with this option, all type-checks are still
applied
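The idea behind `dry_run` can be sketched as follows: the request is always built and type-checked, and the flag only decides whether it is returned for inspection or passed on. Everything here is hypothetical, including the function name custom_request(), the `send` callback and the body layout; it does not reproduce the signature of sc_table_custom().

```r
# Hypothetical sketch (not the STATcubeR implementation): the request body
# is always built and type-checked; it is only sent when dry_run = FALSE.
custom_request <- function(database, measures, dimensions, dry_run = FALSE,
                           send = function(body) stop("no API client in this sketch")) {
  # type-checks run in both modes
  stopifnot(
    is.character(database), length(database) == 1,
    is.character(measures), is.character(dimensions)
  )
  # illustrative body layout; the real request format may differ
  body <- list(database = database, measures = as.list(measures),
               dimensions = lapply(dimensions, list))
  if (dry_run) {
    return(body)  # inspect the request instead of sending it
  }
  send(body)
}
```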
GregorDeCillia and others added 29 commits February 28, 2023 17:42
check the argument against the list of
available schema types. The argument is
now also coerced via toupper() because
the spelling in schema uris uses lowercase
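A sketch of this normalization: the type is uppercased before it is compared with the list of known types, so a lowercase type taken from a uri is accepted. The vector schema_types and the uri layout "str:<type>:..." are assumptions for illustration; the actual list in STATcubeR may differ.

```r
# Hypothetical list of schema types; the real list in STATcubeR may differ.
schema_types <- c("DATABASE", "TABLE", "FIELD", "VALUESET", "VALUE",
                  "MEASURE", "COUNT")

# Normalize with toupper() so lowercase spellings are accepted,
# then compare against the list of known types.
check_schema_type <- function(type) {
  stopifnot(is.character(type), length(type) == 1)
  type <- toupper(type)
  if (!type %in% schema_types) {
    stop("invalid schema type: ", type, call. = FALSE)
  }
  type
}

# Illustrative uri layout "str:<type>:...": extract the lowercase type part.
uri_type <- function(uri) toupper(sub("^str:([a-z]+):.*$", "\\1", uri))
```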
the nace classification in this database was
updated. Reflect this in the example request

[ci skip]
cli_text() uses the message channel to
generate the visible console outputs

this is not what to expect from a
print method, which should always feed
into stdout

cli_text() is also used in other places
of STATcubeR but always wrapped into
cli_fmt() which means that output
channels do not matter in those
circumstances because the outputs are
captured to be formatted elsewhere
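The difference between the two channels can be seen with capture.output(): output written with cat() goes to stdout, while message() writes to stderr. A minimal sketch with a hypothetical class demo_uri, using only base R; inside STATcubeR the text would come from cli instead, for example captured via cli_fmt() and then written with cat().

```r
# A print method that writes to stdout via cat(), so capture.output()
# sees it and nothing is sent to the message channel (stderr).
# The class name "demo_uri" is hypothetical.
print.demo_uri <- function(x, ...) {
  cat(paste0("<demo_uri> ", unclass(x)), sep = "\n")
  invisible(x)
}
```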
include another link to GitHub in the
DESCRIPTION metadata. This is common
practice in most packages on CRAN

[ci skip]
if there are no saved tables, the
previous version generated an error of
the form "expected character but
got list"

now, a data.frame with zero rows is
returned instead

TODO: it is probably a good idea to
replace sapply() by vapply() everywhere
in STATcubeR. Most static code
analyzers recommend this.
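The reason vapply() avoids the "expected character but got list" class of errors: for an empty input, sapply() returns an empty list, whereas vapply() always returns a vector of the declared type. A short base R illustration; the column names id and label are hypothetical.

```r
saved <- list()  # no saved tables
sapply(saved, function(x) x$label)                # returns list()
vapply(saved, function(x) x$label, character(1))  # returns character(0)

# Building the columns with vapply() yields a data.frame with zero rows
# for an empty input. Column names are hypothetical.
saved_tables_df <- function(saved) {
  data.frame(
    id    = vapply(saved, function(x) x$id,    character(1)),
    label = vapply(saved, function(x) x$label, character(1)),
    stringsAsFactors = FALSE
  )
}
```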

[ci skip]
there is a new namespace of datasets
coming up which will use the STAT_
prefix instead of OGD_ for the primary
id of the dataset. Relax the input
checks to allow STAT_ datasets to be
fetched. For external users, this will
only become relevant in a few months.
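A relaxed input check could accept both prefixes with a single regular expression. This is a hypothetical sketch; the function name is_valid_dataset_id() and the exact rule are assumptions, not the check used in STATcubeR.

```r
# Hypothetical input check: accept dataset ids with either the
# OGD_ prefix or the new STAT_ prefix.
is_valid_dataset_id <- function(id) {
  is.character(id) && length(id) == 1 && grepl("^(OGD|STAT)_", id)
}
```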
some internal datasets now use
$PublDateTime$ as a placeholder for
the deployment timestamp. Make sure
that those datasets can be used with
STATcubeR

The way this is implemented now,
reading and resaving a dataset is not
a no-op because the interpolated
value will be written in place of the
placeholder. There might come a point
where it makes sense to implement this
differently in order to preserve the
placeholder
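A sketch of the interpolation and of why resaving is not a no-op. It assumes a plain string substitution; the function name and the timestamp format are hypothetical. `fixed = TRUE` matters because "$" would otherwise be read as a regex anchor.

```r
# Hypothetical helper: replace the $PublDateTime$ placeholder with a
# deployment timestamp. fixed = TRUE treats "$" literally.
interpolate_publ_date <- function(x, timestamp) {
  gsub("$PublDateTime$", timestamp, x, fixed = TRUE)
}
```

Once the value is interpolated, the placeholder is gone, so writing the result back produces different text than the original file.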

[ci skip]
[ci skip]
this is the first step to resolving
#27 by adding a function that creates
sc_table() like objects based on sdmx
archives

The sdmx format contains all metadata
that is necessary for STATcubeR to reuse
the existing $tabulate() workflow and this
first version already provides support for
various features via the base class (sc_data)

- $tabulate() to aggregate data
- $total_codes() to set/unset total codes
- $recoder to recode datasets (change labels,
  change codes, toggle visibility of
  elements, reorder elements, etc.)
- importing German and English labels
  simultaneously (both languages are included
  in a zip download) and allowing users to switch
  between them using $language<-().

New features
- sdmx archives provide a $parent column
  in the $fields() table which is used
  to represent hierarchical classifications.
  Previously, this was only possible with
  od_table()
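To illustrate how a parent column encodes a hierarchy, the depth of each element can be derived by walking up the chain of parents. This is a hypothetical helper, not part of STATcubeR; it assumes top-level elements have `NA` as parent and that parents refer to codes in the same table.

```r
# Hypothetical helper: derive the depth of each element from a parent
# column (NA for top-level elements).
hierarchy_level <- function(code, parent) {
  stopifnot(length(code) == length(parent))
  level <- integer(length(code))
  for (i in seq_along(code)) {
    p <- parent[i]
    depth <- 1L
    # walk up the parent chain until a top-level element (NA) is reached
    while (!is.na(p)) {
      depth <- depth + 1L
      if (depth > length(code)) stop("cycle in parent codes")
      p <- parent[match(p, code)]
    }
    level[i] <- depth
  }
  level
}
```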

There is still room for improvement. See
issue #27 for more details

- properly parse time variables -
  currently they are treated as generic
  categories.
- parse element annotations (detailed
  descriptions for classification
  elements) and add them to
  $field()$de_desc just like with
  OGD datasets
- parse value annotations (see #39)
- provide a print/format method
- add a reasonable logic for total
  codes that takes the parent codes into
  account
- fill meta$measures$fun and
  $meta$measures$precision based on
  the sdmx metadata
- modify very long codes which use
  the @-symbol (probably for escapes)
- extend documentation
- possibly check SuperCROSS compatibility
import annotations from the sdmx metadata
and make them available as an additional
column in field()
ubuntu 18.04 is no longer supported
on gh-actions since 2023-04-01

bump up all the version numbers by two
years to check 22.04 and 20.04 instead
of 20.04 and 18.04

actions/runner-images#6002
in cases where several measures and
several fields are involved, the
previous logic produced incorrect
tabulations of the data
add a print method for descriptions
of sdmx files, which are accessible
like so

    x <- sdmx_table(...)
    x$description
for some reason, sdmx archives use
escapes in the database ids such that
some characters are substituted like
this

    \x5f -> 5f@

undo this in the parser for the
underscore character, so the link
in the print method correctly
references a STATcube table

also, shorten the codes used in
$field()$code to omit everything
before the underscore

TODO: check if shortening field
codes like this might lead to
duplicate codes
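A sketch of both steps, including a fallback for the duplicate-codes TODO: the short codes are only used if they stay unique. The function names are hypothetical, and so is the choice of the last underscore as the cut point; the commit does not say which underscore is used.

```r
# Undo the escape for "_" (hex 5f) used in sdmx ids: "5f@" -> "_".
unescape_underscore <- function(x) gsub("5f@", "_", x, fixed = TRUE)

# Shorten codes by dropping everything up to the last underscore
# (assumption: the last one). Fall back to the full codes if the
# short codes would not be unique.
shorten_codes <- function(x) {
  short <- sub("^.*_", "", x)
  if (anyDuplicated(short) > 0) x else short
}
```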
avoid inconsistencies between
x$code and x$field(). Before this fix,
simplification was only applied in
x$field() because of the anyDuplicated()
check in sdmx_codes()

related: 215b05a

[skip ci]
resolve escapes as in @f5@ -> \uf5
for all codes in numeric columns

currently, there are only certain
symbols whitelisted which will be
resolved like this. possible
improvement: escape all character
sequences of this form by using a
regex
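The regex-based improvement mentioned above could look like this: every "@xx@" sequence with two hex digits is replaced by the corresponding Unicode character, instead of relying on a whitelist. This is a hypothetical sketch of that suggestion, not the whitelist implementation from this commit.

```r
# Decode every escape of the form "@xx@" (two hex digits) into the
# corresponding Unicode character, e.g. "@f6@" -> "\u00f6".
decode_escapes <- function(x) {
  m <- gregexpr("@[0-9a-fA-F]{2}@", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    hex <- substr(esc, 2, 3)
    vapply(strtoi(hex, 16L), intToUtf8, character(1))
  })
  x
}
```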

[skip ci]
suppress warnings if there is no newline
character at the end of a json request
file because that is the way the server
formats those files in the download
options
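In base R, this warning is the "incomplete final line" warning from readLines(), and its `warn` argument switches it off. A minimal sketch, assuming the request file is read with readLines(); the function name read_json_request() is hypothetical.

```r
# Hypothetical reader for json request files. warn = FALSE suppresses the
# "incomplete final line" warning for files without a trailing newline.
read_json_request <- function(path) {
  paste(readLines(path, warn = FALSE), collapse = "\n")
}
```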

STATcubeR started doing this with
6b63a60

[ci skip]
alexkowa merged commit a24efcc into master Jul 18, 2024
1 of 2 checks passed
Labels
None yet
Projects
None yet

2 participants