Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expressions cause segfault on Windows (MSVC 2019 v16) #1336

Closed
laxnpander opened this issue Nov 28, 2022 · 9 comments
Closed

Expressions cause segfault on Windows (MSVC 2019 v16) #1336

laxnpander opened this issue Nov 28, 2022 · 9 comments
Assignees
Milestone

Comments

@laxnpander
Copy link

Description

Expressions seem to cause a segfault on Windows 2019 v16 (potentially others as well). None of the examples with expressions work. According to the debugger, the following line causes the crash (ExpressionNode.h, L279):

    Record(const Values& values, const ExpressionNode<A1>& expression1, ExecutionTraceStorage* ptr)
        : value1(expression1.traceExecution(values, trace1, ptr + upAligned(sizeof(Record)))) {}

Stacktrace:

 	SFMExampleExpressions.exe!gtsam::internal::CallRecord<2>::CallRecord<2>()	C++
 	SFMExampleExpressions.exe!gtsam::internal::CallRecordImplementor<gtsam::internal::UnaryExpression<Eigen::Matrix<double,2,1,0,2,1>,Eigen::Matrix<double,3,1,0,3,1>>::Record,2>::CallRecordImplementor<gtsam::internal::UnaryExpression<Eigen::Matrix<double,2,1,0,2,1>,Eigen::Matrix<double,3,1,0,3,1>>::Record,2>()	C++
>	SFMExampleExpressions.exe!gtsam::internal::UnaryExpression<Eigen::Matrix<double,2,1,0,2,1>,Eigen::Matrix<double,3,1,0,3,1>>::Record::Record(const gtsam::Values & values, const gtsam::internal::ExpressionNode<Eigen::Matrix<double,3,1,0,3,1>> & expression1, boost::detail::aligned_storage::aligned_storage_imp<1,32> * ptr) Line 280	C++
 	SFMExampleExpressions.exe!gtsam::internal::UnaryExpression<Eigen::Matrix<double,2,1,0,2,1>,Eigen::Matrix<double,3,1,0,3,1>>::traceExecution(const gtsam::Values & values, gtsam::internal::ExecutionTrace<Eigen::Matrix<double,2,1,0,2,1>> & trace, boost::detail::aligned_storage::aligned_storage_imp<1,32> * ptr) Line 322	C++
 	SFMExampleExpressions.exe!gtsam::internal::BinaryExpression<Eigen::Matrix<double,2,1,0,2,1>,gtsam::Cal3_S2,Eigen::Matrix<double,2,1,0,2,1>>::Record::Record(const gtsam::Values & values, const gtsam::internal::ExpressionNode<gtsam::Cal3_S2> & expression1, const gtsam::internal::ExpressionNode<Eigen::Matrix<double,2,1,0,2,1>> & expression2, boost::detail::aligned_storage::aligned_storage_imp<1,32> * ptr) Line 404	C++
 	SFMExampleExpressions.exe!gtsam::internal::BinaryExpression<Eigen::Matrix<double,2,1,0,2,1>,gtsam::Cal3_S2,Eigen::Matrix<double,2,1,0,2,1>>::traceExecution(const gtsam::Values & values, gtsam::internal::ExecutionTrace<Eigen::Matrix<double,2,1,0,2,1>> & trace, boost::detail::aligned_storage::aligned_storage_imp<1,32> * ptr) Line 432	C++
 	SFMExampleExpressions.exe!gtsam::Expression<Eigen::Matrix<double,2,1,0,2,1>>::traceExecution(const gtsam::Values & values, gtsam::internal::ExecutionTrace<Eigen::Matrix<double,2,1,0,2,1>> & trace, void * traceStorage) Line 193	C++
 	SFMExampleExpressions.exe!gtsam::Expression<Eigen::Matrix<double,2,1,0,2,1>>::valueAndJacobianMap(const gtsam::Values & values, gtsam::internal::JacobianMap & jacobians) Line 218	C++
 	SFMExampleExpressions.exe!gtsam::ExpressionFactor<Eigen::Matrix<double,2,1,0,2,1>>::linearize(const gtsam::Values & x) Line 139	C++
 	gtsamDebug.dll!gtsam::NonlinearFactorGraph::linearize(const gtsam::Values & linearizationPoint) Line 270	C++
 	gtsamDebug.dll!gtsam::DoglegOptimizer::iterate() Line 87	C++
 	gtsamDebug.dll!gtsam::NonlinearOptimizer::defaultOptimize() Line 95	C++
 	gtsamDebug.dll!gtsam::NonlinearOptimizer::optimize() Line 98	C++
 	SFMExampleExpressions.exe!main(int argc, char * * argv) Line 95	C++

Steps to reproduce

CMake configuration is with GTSAM_BUILD_UNSTABLE=OFF and GTSAM_BUILD_PYTHON=OFF. Compiler used is MSVC2019 v16 with Boost 1.72 and GTSAM version of Eigen3.

D:\Workspace\spear\spear\3rd\gtsam\cmake-build>cmake --build . --config debug
CMake is re-running because D:/Workspace/spear/spear/3rd/gtsam/cmake-build/examples/CMakeFiles/generate.stamp is out-of-date.
  the file 'D:/Workspace/spear/spear/3rd/gtsam/examples/CMakeLists.txt'
  is newer than 'D:/Workspace/spear/spear/3rd/gtsam/cmake-build/examples/CMakeFiles/generate.stamp.depend'
  result='-1'
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.19044.
-- GTSAM Version: 4.2a7
-- GTSAM_POSE3_EXPMAP=ON, enabling GTSAM_ROT3_EXPMAP as well
-- Found Eigen version: 3.3.7
-- Could NOT find MKL (missing: MKL_INCLUDE_DIR MKL_LIBRARIES)
-- Could NOT find TBB (missing: TBB_INCLUDE_DIRS TBB_LIBRARIES tbb tbbmalloc) (Required is at least version "4.4")
-- Building 3rdparty
-- Could NOT find GeographicLib (missing: GeographicLib_LIBRARY_DIRS GeographicLib_LIBRARIES GeographicLib_INCLUDE_DIRS)
-- Building base
-- Building basis
-- Building geometry
-- Building inference
-- Building symbolic
-- Building discrete
-- Building hybrid
-- Building linear
-- Building nonlinear
-- Building sam
-- Building sfm
-- Building slam
-- Building navigation
-- Adding precompiled header for MSVC
-- GTSAM Version: 4.2.0
-- Install prefix: D:/Workspace/spear/spear/3rd/gtsam/install
-- Building GTSAM - shared: ON
-- Wrote D:/Workspace/spear/spear/3rd/gtsam/cmake-build/GTSAMConfig.cmake
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- ===============================================================
-- ================  Configuration Options  ======================
--  CMAKE_CXX_COMPILER_ID type                       : MSVC
--  CMAKE_CXX_COMPILER_VERSION                       : 19.29.30146.0
--  CMake version                                    : 3.23.1
--  CMake generator                                  : Visual Studio 16 2019
--  CMake build tool                                 : C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/MSBuild/Current/Bin/MSBuild.exe
-- Build flags
--  Build Tests                                      : Enabled
--  Build examples with 'make all'                   : Enabled
--  Build timing scripts with 'make all'             : Disabled
--  Build shared GTSAM libraries                     : Enabled
--  Put build type in library name                   : Enabled
--  Build libgtsam_unstable                          : Disabled
--  Build GTSAM unstable Python                      : Enabled
--  Build MATLAB Toolbox for unstable                : Disabled
--  GTSAM_COMPILE_FEATURES_PUBLIC                    : cxx_std_11
--  GTSAM_COMPILE_OPTIONS_PUBLIC                     : /wd4267
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC                 : EIGEN_NO_STATIC_ASSERT;EIGEN_NO_STATIC_ASSERT
--  GTSAM_COMPILE_OPTIONS_PUBLIC_DEBUG               :
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_DEBUG           :
--  GTSAM_COMPILE_OPTIONS_PUBLIC_RELEASE             :
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_RELEASE         :
--  GTSAM_COMPILE_OPTIONS_PUBLIC_TIMING              :
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_TIMING          :
--  GTSAM_COMPILE_OPTIONS_PUBLIC_PROFILING           :
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_PROFILING       :
--  GTSAM_COMPILE_OPTIONS_PUBLIC_RELWITHDEBINFO      :
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_RELWITHDEBINFO  :
--  GTSAM_COMPILE_OPTIONS_PUBLIC_MINSIZEREL          :
--  GTSAM_COMPILE_DEFINITIONS_PUBLIC_MINSIZEREL      :
--  Use System Eigen                                 : OFF (Using version: 3.3.7)
--  Use System Metis                                 : OFF
--  Using Boost version                              : 107200
--  Use Intel TBB                                    : TBB not found
--  Eigen will use MKL                               : MKL not found
--  Eigen will use MKL and OpenMP                    : OpenMP found but GTSAM_WITH_EIGEN_MKL is disabled
--  Default allocator                                : STL
--  Cheirality exceptions enabled                    : YES
-- Packaging flags
--  CPack Source Generator                           : TGZ
--  CPack Generator                                  : TGZ
-- GTSAM flags
--  Quaternions as default Rot3                      : Disabled
--  Runtime consistency checking                     : Disabled
--  Rot3 retract is full ExpMap                      : Enabled
--  Pose3 retract is full ExpMap                     : Enabled
--  Allow features deprecated in GTSAM 4.1           : Enabled
--  Metis-based Nested Dissection                    : Enabled
--  Use tangent-space preintegration                 : Enabled
-- MATLAB toolbox flags
--  Install MATLAB toolbox                           : Disabled
-- Python toolbox flags
--  Build Python module with pybind                  : Disabled
-- ===============================================================
CMake Warning at cmake/HandleFinalChecks.cmake:3 (message):
  TBB 4.4 or newer was not found - this is ok, but note that GTSAM
  parallelization will be disabled.  Set GTSAM_WITH_TBB to 'Off' to avoid
  this warning.

Expected behavior

It should not segfault.

Environment

MSVC2019 v16, CMake 3.23, Boost 1.72, no TBB, Eigen3 shipped with GTSAM

Additional information

@laxnpander
Copy link
Author

Update: I removed (ptr) in all lines like these:

Record* record = new (ptr) Record(values, *expression1_, ptr);

so it's like

Record* record = new Record(...);

instead of

Record* record = new (ptr) Record(...);

Now the examples seem to work (no crash, low final error). Can someone smarter than me say if this modification is safe?

@laxnpander laxnpander changed the title Expressions cause segfault on Windows (MSVC 16) Expressions cause segfault on Windows (MSVC 2019 v16) Nov 28, 2022
@dellaert
Copy link
Member

Interesting. This is the placement new operator and I remember it being crucial for performance and correctness. I have no idea why it would give problems on windows, unless there is some change in how they handle this (fairly rarely used) C++ feature.

@ProfFan
Copy link
Collaborator

ProfFan commented Nov 30, 2022

@laxnpander Could you try modifying

static const unsigned TraceAlignment = 32;
to be 64 instead of 32?

@laxnpander
Copy link
Author

laxnpander commented Nov 30, 2022

@ProfFan Still segfaulting. I just realised that Pose2SLAMExampleExpressions.exe works. However, Pose3SLAMExampleExpressions_BearingRangeWithTransform.exe as well as SFMExampleExpressions.exe are crashing. Not sure if this helps. Will recompile your change with debug flag and see if stacktrace changed or not.

Edit: Maybe I was mistaken, Pose2SLAMExampleExpressions seems to be crashing as well.

Stack trace for this:

	Pose2SLAMExampleExpressions.exe!gtsam::internal::ExecutionTrace<gtsam::Pose2>::~ExecutionTrace<gtsam::Pose2>() Zeile 174	C++
 	Pose2SLAMExampleExpressions.exe!gtsam::Expression<gtsam::Pose2>::valueAndJacobianMap(const gtsam::Values & values, gtsam::internal::JacobianMap & jacobians) Zeile 224	C++
 	Pose2SLAMExampleExpressions.exe!gtsam::ExpressionFactor<gtsam::Pose2>::linearize(const gtsam::Values & x) Zeile 139	C++
 	gtsamDebug.dll!gtsam::NonlinearFactorGraph::linearize(const gtsam::Values & linearizationPoint) Zeile 270	C++
 	gtsamDebug.dll!gtsam::GaussNewtonOptimizer::iterate() Zeile 49	C++
 	gtsamDebug.dll!gtsam::NonlinearOptimizer::defaultOptimize() Zeile 95	C++
 	gtsamDebug.dll!gtsam::NonlinearOptimizer::optimize() Zeile 98	C++
 	Pose2SLAMExampleExpressions.exe!main(int argc, char * * argv) Zeile 72	C++

@laxnpander
Copy link
Author

@ProfFan: Out of curiosity: Did you try running these examples on a windows system? I tried on two different laptops and both were throwing the same error. Same MSVC version though. The CI also just checks for compilation. I just wonder if this is a common issue or related to my configuration...

@ProfFan ProfFan self-assigned this Jan 9, 2023
@ProfFan ProfFan added this to the GTSAM 5 milestone Jan 9, 2023
@oicchris
Copy link
Contributor

It appears the issue is caused by destruction order. In MSVC, the pointer is being deallocated prior to internal::ExecutionTrace being destroyed.
gtsam/nonlinear/Expression-inl.h (209-224)

#ifdef _MSC_VER
  auto traceStorage = static_cast<internal::ExecutionTraceStorage*>(_aligned_malloc(size, internal::TraceAlignment));
#else
  internal::ExecutionTraceStorage traceStorage[size];
#endif

  internal::ExecutionTrace<T> trace;
  T value(this->traceExecution(values, trace, traceStorage));
  trace.startReverseAD1(jacobians);

#ifdef _MSC_VER
  _aligned_free(traceStorage); // <-- Deallocation and trace.content.ptr is now dangling.
#endif

  return value;
} // <-- internal::ExecutionTrace<T>::~ExecutionTrace() dtor calls trace.content.ptr if kind==Function

In the destructor, it is invoking a call on the dangling pointer.
gtsam/nonlinear/internal/ExecutionTrace.h (173-177)

  ~ExecutionTrace() {
    if (kind == Function) {
      content.ptr->~CallRecord<Dim>(); // <-- content.ptr is invalid
    }
  }

In my workaround, I used std::unique_ptr<> with a custom deleter. After this change, my test succeeded. Note that I seem to encounter the error when running in Debug, Release seems to be happy accessing the dangling pointer.

#ifdef _MSC_VER
  std::unique_ptr<void, void(*)(void*)> traceStorageDeleter(
    _aligned_malloc(size, internal::TraceAlignment),
    [](void *ptr) { _aligned_free(ptr); } );
  auto traceStorage = static_cast<internal::ExecutionTraceStorage*>(traceStorageDeleter.get());
#else
  internal::ExecutionTraceStorage traceStorage[size];
#endif

  internal::ExecutionTrace<T> trace;
  T value(this->traceExecution(values, trace, traceStorage));
  trace.startReverseAD1(jacobians);

  return value;
} // <-- traceStorage is deallocated after trace's dtor.

@dellaert
Copy link
Member

Hi @oicchris, great sleuthing! Would you be willing to do a quick PR with your fix for the _MSC_VER path?

@oicchris
Copy link
Contributor

oicchris commented Jan 25, 2023

Pfff please ignore the force pushes.. was attempting to reword a comment to make it palatable. Didn't realize it would retain the extra pushes. I'm new to Github, is there a way to clean up the above mess?

dellaert added a commit that referenced this issue Jan 27, 2023
Fix (issue #1336) dangling pointer access violation.
oicchris added a commit to oicchris/gtsam that referenced this issue Jan 28, 2023
@ProfFan
Copy link
Collaborator

ProfFan commented Jan 28, 2023

Closing, thank you @oicchris !

@ProfFan ProfFan closed this as completed Jan 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants