Skip to content
This repository has been archived by the owner on Feb 4, 2022. It is now read-only.

Provide access to formulas and raw (not formatted) numbers #152

Closed
jennybc opened this issue Jul 28, 2015 · 9 comments · Fixed by #223
Closed

Provide access to formulas and raw (not formatted) numbers #152

jennybc opened this issue Jul 28, 2015 · 9 comments · Fixed by #223

Comments

@jennybc
Copy link
Owner

jennybc commented Jul 28, 2015

http://stackoverflow.com/questions/31520605/read-cell-values-without-formatting-into-r-with-googlesheets

@jennybc
Copy link
Owner Author

jennybc commented Jul 28, 2015

in the course of answering that question, I made this Sheet:

https://docs.google.com/spreadsheets/d/19lRTCJDf9BYz9JepHx7y6u8vcxGbFpVSfIuxXnWpsL0/

register it like so:

ss <- gs_title("formatted-numbers-and-formulas")

@jennybc
Copy link
Owner Author

jennybc commented Jul 28, 2015

The definitive way to address this is probably to consult inputValue (where formulas show up) and numericValue (where unformatted numbers show up) in the cell node of the entry nodes. Which means it's connected to formulas and #18.

@jennybc jennybc mentioned this issue Jul 28, 2015
@jennybc
Copy link
Owner Author

jennybc commented Aug 31, 2015

@EricGoldsmith I know you care about this because of your stackoverflow question. Here is rather raw cell feed data from the Sheet mentioned above, processed with an experimental version of gs_read_cellfeed():

ss <- gs_title("formatted-numbers-and-formulas")
foo <- ss %>%
  gs_read_cellfeed()
#> Accessing worksheet titled "Sheet1"

foo %>% 
  arrange(col, row) %>% 
  select(-cell_alt, -row, -col) %>% 
  mutate(inputValue = substr(inputValue, 1, 25))
#> Source: local data frame [20 x 4]
#> 
#>     cell       cell_text                inputValue numericValue
#>    (chr)           (chr)                     (chr)        (chr)
#> 1     A1          Number                    Number           NA
#> 2     A2          123456                    123456     123456.0
#> 3     A3          123457                    123457     123457.0
#> 4     A4          123458                    123458     123458.0
#> 5     B1  Number_wFormat            Number_wFormat           NA
#> 6     B2         123,456                =R[0]C[-1]     123456.0
#> 7     B3         123,457                =R[0]C[-1]     123457.0
#> 8     B4         123,458                =R[0]C[-1]     123458.0
#> 9     C1         Percent                   Percent           NA
#> 10    C2          0.1234                    0.1234       0.1234
#> 11    C3          0.2345                    0.2345       0.2345
#> 12    C4          0.3456                    0.3456       0.3456
#> 13    D1 Percent_wFormat           Percent_wFormat           NA
#> 14    D2          12.34%                =R[0]C[-1]       0.1234
#> 15    D3          23.45%                =R[0]C[-1]       0.2345
#> 16    D4          34.56%                =R[0]C[-1]       0.3456
#> 17    E1        Formulas                  Formulas           NA
#> 18    E2          Google =HYPERLINK("http://www.go           NA
#> 19    E3          370371 =sum(R[-1]C[-4]:R[1]C[-4]     370371.0
#> 20    E4                 =IMAGE("https://www.googl           NA

right now gs_read_cellfeed() returns only the cell_text (the "literal value" described below). But the XML element attributes also contain inputValue and, optionally, numericValue. I'm going to ponder what to return and/or what choices to expose in gs_read_cellfeed(). But feel free to express on opinion here too. That goes for anyone listening or stumbling across this.

From the official API docs, here's a description of where info is in the cell feed:

The inputValue attribute of a cell entry always contains the value that a user would otherwise type into the Google Sheets user interface to manipulate the cell (i.e. either a literal value or a formula). To set a formula or a literal value on a cell, provide the text of the formula or value as the inputValue attribute. Remember that formulas must start with = to be considered a formula. If a cell contains a formula, the formula is always provided as the inputValue when cell entries are returned by the API.

The numericValue attribute of a cell entry, when present, indicates that the cell was determined to have a numeric value, and its numeric value is indicated with this attributed.

The literal value of the cell element is the calculated value of the cell, without formatting applied. If the cell contains a formula, the calculated value is given here. The Sheets API has no concept of formatting, and thus cannot manipulate formatting of cells.

@jennybc
Copy link
Owner Author

jennybc commented Aug 31, 2015

@jrosen48 Do you care to weigh in here? Your issue (#18) and this one are clearly converging.

@jennybc jennybc changed the title Deal with percentages and numbers with embedded commas Provide access to formulas and raw (not formatted) numbers Aug 31, 2015
@EricGoldsmith
Copy link

Patterning after the existing parameters for gs_read_cellfeed(), perhaps:

return_formats - logical; for numeric values, indicates whether to return values with cell formatting applied (TRUE), or only the numeric value (FALSE)
return_formulas - logical; indicates whether to return the text of any cell formulas

@jennybc
Copy link
Owner Author

jennybc commented Mar 2, 2016

@EricGoldsmith You can now install from the formulas branch and try out the new ability to get data without numeric formatting applied.

You can read a draft vignette and/or just try it out.

## devtools::install_github("jennybc/googlesheets@formulas")
library(googlesheets)
suppressMessages(library(dplyr))
## a sheet I've created w/ formatting and formula challenges
## subsitute your own sheet here?
ffs <- gs_key("19lRTCJDf9BYz9JepHx7y6u8vcxGbFpVSfIuxXnWpsL0", lookup = FALSE)
#> Worksheets feed constructed with public visibility
## go look at it?
## gs_browse(ffs)

## prove you can consume it the normal way
(ffs_read_csv <- gs_read_csv(ffs, verbose = FALSE))
#> No encoding supplied: defaulting to UTF-8.
#> Source: local data frame [5 x 5]
#> 
#>   Number Number_wFormat Character     Formulas Formula_wFormat
#>    (int)          (chr)     (chr)        (chr)           (chr)
#> 1 123456        654,321       one       Google        3.18E+05
#> 2 345678         12.34%        NA 1,271,591.00          52.63%
#> 3 234567       1.23E+09     three           NA            0.22
#> 4     NA          3 1/7      four         $A$1      123,456.00
#> 5 567890          $0.36      five           NA         317,898

## consume all of it but strip numeric formatting before type conversion
ffs %>% gs_read(literal = FALSE, verbose = FALSE)
#> Source: local data frame [5 x 5]
#> 
#>   Number Number_wFormat Character  Formulas Formula_wFormat
#>    (int)          (dbl)     (chr)     (chr)           (dbl)
#> 1 123456   6.543210e+05       one    Google    3.178978e+05
#> 2 345678   1.234000e-01        NA 1271591.0    5.263144e-01
#> 3 234567   1.234568e+09     three        NA    2.173942e-01
#> 4     NA   3.141593e+00      four      $A$1    1.234560e+05
#> 5 567890   3.600000e-01      five        NA    3.178978e+05

## consume some of it and get a data frame
ffs %>%  gs_read(range = cell_cols("D:E"), literal = FALSE, verbose = FALSE)
#> Source: local data frame [5 x 2]
#> 
#>    Formulas Formula_wFormat
#>       (chr)           (dbl)
#> 1    Google    3.178978e+05
#> 2 1271591.0    5.263144e-01
#> 3        NA    2.173942e-01
#> 4      $A$1    1.234560e+05
#> 5        NA    3.178978e+05

## consume some of it and make an atomic vector
ffs %>%
  gs_read_cellfeed(range = cell_cols(2), verbose = FALSE) %>%
  gs_simplify_cellfeed(literal = FALSE)
#>           B2           B3           B4           B5           B6 
#> 6.543210e+05 1.234000e-01 1.234568e+09 3.141593e+00 3.600000e-01

@EricGoldsmith
Copy link

This looks good to me. Thanks for implementing.

@jennybc
Copy link
Owner Author

jennybc commented Mar 7, 2016

@EricGoldsmith Have you actually tried it? Or are you looking at what I showed above. No hard feelings either way, but I'm just curious. Still some work to do before I will merge it in, so input now is most welcome!

@EricGoldsmith
Copy link

Yes, I actually tried it :-) Numbers with various formatting applied (e.g. comma separators, percentage, etc.) are being read correctly with the literal = FALSE argument.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants