Fix LightGBM models locale sensitivity and improve R/W performance.

When Java is used, the default C++ locale is broken. This is true for Java providers that use the C API or even Python models that require JEP. This patch solves that issue making the model reads/writes insensitive to such settings. To achieve it, within the model read/write codebase: - C++ streams are imbued with the classic locale - Calls to functions that are dependent on the locale are replaced - The default locale is not changed! This approach means: - The user's locale is never tampered with, avoiding issues such as microsoft#2979 with the previous approach microsoft#2891 - Datasets can still be read according the user's locale - The model file has a single format independent of locale Changes: - Add CommonC namespace which provides faster locale-independent versions of Common's methods - Model code makes conversions through CommonC - Cleanup unused Common methods - Performance improvements. Use fast libraries for locale-agnostic conversion: - value->string: https://github.com/fmtlib/fmt - string->double: https://github.com/lemire/fast_double_parser (10x faster double parsing according to their benchmark) Bugfixes: - microsoft#2500 - microsoft#2890 - ninia/jep#205 (as it is related to LGBM as well)
feedzai · Nov 7, 2020 · 0ccd15b · 0ccd15b
1 parent da6c6ea
commit 0ccd15b
Show file tree

Hide file tree

Showing 7 changed files with 298 additions and 192 deletions.
diff --git a/.gitmodules b/.gitmodules
@@ -1,3 +1,9 @@
 [submodule "include/boost/compute"]
 	path = compute
 	url = https://github.com/boostorg/compute
+[submodule "external_libs/fmt"]
+	path = external_libs/fmt
+	url = https://github.com/fmtlib/fmt.git
+[submodule "external_libs/fast_double_parser"]
+	path = external_libs/fast_double_parser
+	url = https://github.com/lemire/fast_double_parser.git
diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -444,6 +444,11 @@ if(__BUILD_FOR_R)
   endif(MSVC)
 endif(__BUILD_FOR_R)
 
+# fmtlib/fmt
+add_subdirectory(external_libs/fmt)
+TARGET_LINK_LIBRARIES(lightgbm PUBLIC fmt::fmt)
+TARGET_LINK_LIBRARIES(_lightgbm PUBLIC fmt::fmt)
+
 install(TARGETS lightgbm _lightgbm
         RUNTIME DESTINATION ${CMAKE_INSTALL_PREFIX}/bin
         LIBRARY DESTINATION ${CMAKE_INSTALL_PREFIX}/lib

diff --git a/external_libs/fast_double_parser b/external_libs/fast_double_parser
diff --git a/external_libs/fmt b/external_libs/fmt