streaming without hyperlinks #319

Closed

pacoguzman

Hi guys,

While parsing xlsm files with roo using the `each_row_streaming` iteration, we found that much of the memory consumption comes from the hyperlink-related code.

I'm not sure what the hyperlinks are used for (maybe someone can explain), but we ended up adding an option that disables that part of the code, since we don't need it to get the data we ingest from the Excel files. Sorry, but I can't share the Excel file here; it came from one of our customers.
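To make the idea concrete, here is a toy sketch of the pattern, not roo's actual code. `ToySheet`, its constructor arguments, and the `full_scans` counter are hypothetical names made up for illustration; only the option name `no_hyperlinks` comes from the changelog quoted later in this thread. It assumes the hyperlink lookup table has to be built up front before any row can be yielded, and shows how a flag can skip that step.

```ruby
# Toy model only (hypothetical, not roo's implementation): rows can be
# yielded one at a time, but the hyperlink index is built in one pass
# before the first row is yielded.
class ToySheet
  attr_reader :full_scans

  # rows:       array of arrays of cell values
  # hyperlinks: array of [row_index, col_index, url] triples
  def initialize(rows, hyperlinks, no_hyperlinks: false)
    @rows = rows
    @raw_hyperlinks = hyperlinks
    @no_hyperlinks = no_hyperlinks
    @full_scans = 0
  end

  def each_row_streaming
    return enum_for(:each_row_streaming) unless block_given?

    links = hyperlink_index
    @rows.each_with_index do |row, r|
      cells = row.each_with_index.map do |value, c|
        { value: value, link: links[[r, c]] }
      end
      yield cells
    end
  end

  private

  # Builds the whole lookup table up front unless the caller opted out.
  def hyperlink_index
    return {} if @no_hyperlinks

    @full_scans += 1
    @raw_hyperlinks.each_with_object({}) do |(r, c, url), index|
      index[[r, c]] = url
    end
  end
end
```

With the flag set, every cell's `:link` is `nil` and the up-front pass never runs, which is the trade-off described above: the cell values are unchanged, and only the hyperlink information is dropped.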

Do you think this could be useful for other people? If so, could it be merged into the main repository and released as a new version?

If you need anything else, such as metrics from processing the files, README documentation for the option, or specs covering its usage, just let me know. I'll be glad to address all of that to get this merged. It would be easier for us to use the gem than to keep updating our fork.

Thanks in advance

@coveralls

coveralls commented May 31, 2016

Coverage increased (+0.009%) to 94.357% when pulling 1179278 on bebanjo:v2.4.0-streaming-without-hyperlinks into 211f89b on roo-rb:master.

@jaimerson

:shipit:

@stevendaniels
Contributor

Sorry for not getting on this sooner. It's a really interesting PR. Could I get some of the metrics for processing the file?

stevendaniels added a commit that referenced this pull request Aug 21, 2016
@stevendaniels
Contributor

Thanks. I've rebased in another PR.

jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Oct 15, 2016
## [2.5.1] 2016-08-26
### Fixed
- Fixed NameError. [337](roo-rb/roo#337)

## [2.5.0] 2016-08-21
### Fixed
- Remove tempdirs via finalizers on garbage collection. This cleans them up in all known cases, rather than just when the #close method is called. The #close method can be used to cleanup early. [329](roo-rb/roo#329)
- Fixed README.md typo [318](roo-rb/roo#318)
- Parse sheets in ODS files once to improve performance [320](roo-rb/roo#320)
- Fix some Cell conversion issues [324](roo-rb/roo#324) and [331](roo-rb/roo#331)
- Improved memory performance [332](roo-rb/roo#332)
- Added `no_hyperlinks` option to improve streaming performance [319](roo-rb/roo#319) and [333](roo-rb/roo#333)

### Deprecations
- Roo::Base::TEMP_PREFIX should be accessed via Roo::TEMP_PREFIX
- The private Roo::Base#make_tempdir is now available at the class level in
  classes that use tempdirs, added via Roo::Tempdir

### Added
- Discard hyperlink lookups to allow streaming parsing without loading whole files

## [2.4.0] 2016-05-14
### Fixed
- Fixed opening spreadsheets with charts [315](roo-rb/roo#315)
- Fixed memory issues for Roo::Utils.number_to_letter [308](roo-rb/roo#308)
- Fixed Roo::Excelx::Cell::Number to recognize floating point numbers [306](roo-rb/roo#306)
- Fixed version number in Readme.md [304](roo-rb/roo#304)

### Added
- Added initial support for HTML formatting [278](roo-rb/roo#278)