Skip to content

Commit

Permalink
Fix LightGBM models locale sensitivity and improve R/W performance.
Browse files Browse the repository at this point in the history
When Java is used, the default C++ locale is broken. This is true for
Java providers that use the C API or even Python models that require JEP.

This patch solves that issue making the model reads/writes insensitive
to such settings.
To achieve it, within the model read/write codebase:
 - C++ streams are imbued with the classic locale
 - Calls to functions that are dependent on the locale are replaced
 - The default locale is not changed!

This approach means:
 - The user's locale is never tampered with, avoiding issues such as
    microsoft#2979 with the previous
    approach microsoft#2891
 - Datasets can still be read according the user's locale
 - The model file has a single format independent of locale

Changes:
 - Add CommonC namespace which provides faster locale-independent versions of Common's methods
 - Model code makes conversions through CommonC
 - Cleanup unused Common methods
 - Performance improvements. Use fast libraries for locale-agnostic conversion:
   - value->string: https://github.com/fmtlib/fmt
   - string->double: https://github.com/lemire/fast_double_parser (10x
      faster double parsing according to their benchmark)

Bugfixes:
 - microsoft#2500
 - microsoft#2890
 - ninia/jep#205 (as it is related to LGBM as well)
  • Loading branch information
AlbertoEAF committed Nov 7, 2020
1 parent da6c6ea commit 0ccd15b
Show file tree
Hide file tree
Showing 7 changed files with 298 additions and 192 deletions.
6 changes: 6 additions & 0 deletions .gitmodules
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
[submodule "include/boost/compute"]
path = compute
url = https://github.com/boostorg/compute
[submodule "external_libs/fmt"]
path = external_libs/fmt
url = https://github.com/fmtlib/fmt.git
[submodule "external_libs/fast_double_parser"]
path = external_libs/fast_double_parser
url = https://github.com/lemire/fast_double_parser.git
5 changes: 5 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -444,6 +444,11 @@ if(__BUILD_FOR_R)
endif(MSVC)
endif(__BUILD_FOR_R)

# fmtlib/fmt
add_subdirectory(external_libs/fmt)
TARGET_LINK_LIBRARIES(lightgbm PUBLIC fmt::fmt)
TARGET_LINK_LIBRARIES(_lightgbm PUBLIC fmt::fmt)

install(TARGETS lightgbm _lightgbm
RUNTIME DESTINATION ${CMAKE_INSTALL_PREFIX}/bin
LIBRARY DESTINATION ${CMAKE_INSTALL_PREFIX}/lib
Expand Down
1 change: 1 addition & 0 deletions external_libs/fast_double_parser
Submodule fast_double_parser added at 98c751
1 change: 1 addition & 0 deletions external_libs/fmt
Submodule fmt added at 2e620d
Loading

0 comments on commit 0ccd15b

Please sign in to comment.