atksh
diff --git a/FLOAT32_PRECISION_ANALYSIS.md b/FLOAT32_PRECISION_ANALYSIS.md
Lines changed: 110 additions & 6 deletions
diff --git a/test_different_sources.py b/test_different_sources.py
Lines changed: 252 additions & 0 deletions
@@ -135,24 +135,128 @@ float32 precision limits:
 
 ## Verification scripts
 
-The following three test scripts were created:
+The following test scripts were created:
 
 1. `test_float32_overlap_issue.py`: basic false-negative test
 2. `test_float32_refined.py`: stricter precision-boundary test
 3. `test_float32_extreme.py`: extreme edge-case test
+4. `test_rounding_direction.py`: rounding-direction mismatch test
+5. `test_different_sources.py`: rounding test for values from different sources
+6. `test_false_negative_found.py`: systematic search for false negatives
 
 How to run:
 ```bash
 python test_float32_overlap_issue.py
 python test_float32_refined.py
 python test_float32_extreme.py
+python test_rounding_direction.py
+python test_different_sources.py
+python test_false_negative_found.py
 ```
 
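+## Update: investigating rounding direction
+
+Following the comment about "cases where the rounding direction differs", an additional investigation was carried out.
+
+### Key finding: a false positive
+
+`test_accumulated_computation()` in `test_different_sources.py` detected a **false positive**: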
+
+```python
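+# Rounding error from accumulated computation.
+# An explicit loop is used, as in the test: built-in sum() on Python 3.12+
+# uses compensated summation and may return exactly 100.0.
+accumulated_f64 = 0.0
+for _ in range(1000):
+    accumulated_f64 += 0.1  # ends at ≈ 99.9999999999986, just below 100.0
+direct_f64 = 100.0
+
+# Float64: accumulated < direct (no overlap)
+# Float32: both round to 100.0 (overlap!)
+
+# Result:
+# - Float64 tree: 0 pairs (correct)
+# - Float32 tree: 1 pair (false positive!)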
+```
+
+This is the reverse of the reported problem (false negatives): float32 rounding causes boxes that do not actually overlap to be judged as overlapping.
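The effect can be checked without building a tree. The sketch below is not part of this PR's test files: it uses only the standard library, and `to_f32` is a hypothetical helper that emulates the float64 to float32 conversion through `struct`. It accumulates with an explicit loop, as `test_accumulated_computation()` does.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Accumulate 0.1 a thousand times; every += rounds, so the total drifts below 100.
accumulated = 0.0
for _ in range(1000):
    accumulated += 0.1
direct = 100.0

# Box A ends at `accumulated`, box B starts at `direct`.
# Closed-interval overlap test on the x axis only.
overlap_f64 = accumulated >= direct
overlap_f32 = to_f32(accumulated) >= to_f32(direct)

print(f"accumulated (f64): {accumulated!r}")
print(f"overlap in f64:    {overlap_f64}")
print(f"overlap in f32:    {overlap_f32}")
```

The float64 gap is far smaller than the float32 spacing near 100, so both endpoints collapse onto the same float32 value and the boxes appear to touch.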
+
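+### Why false negatives could not be reproduced
+
+1. **Closed-interval semantics**: comparisons use `<=`, so boxes that touch at a boundary are always judged as intersecting
+2. **Consistent rounding**: the same float64 value always rounds to the same float32 value
+3. **Internal consistency**: all computation is done in float32, so comparisons are consistent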
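Points 1 and 2 combine into a simple argument: round-to-nearest conversion is monotonic, so if `a_max >= b_min` holds in float64 it still holds after both values are converted to float32. The following is a minimal randomized check of that property, assuming the conversion is a plain round-to-nearest cast (emulated here with `struct`). The names `to_f32` and `count_lost_overlaps` are illustrative and not part of python_prtree, and the check does not cover cases where the two values reach float32 by different routes.

```python
import random
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def count_lost_overlaps(trials: int = 100_000, seed: int = 0) -> int:
    """Count pairs that overlap in float64 but no longer overlap in float32."""
    rng = random.Random(seed)
    lost = 0
    for _ in range(trials):
        b_min = rng.uniform(-1e6, 1e6)
        # Put a_max at or barely above b_min, where rounding matters most.
        a_max = b_min + rng.choice([0.0, 1e-12, 1e-9, 1e-6])
        if a_max >= b_min and not (to_f32(a_max) >= to_f32(b_min)):
            lost += 1
    return lost


print(count_lost_overlaps())
```

A non-zero count here would point to a non-monotonic conversion, which is why the remaining suspects are differences in how each value is produced rather than the cast itself.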
+
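+### Theoretical false-negative scenarios
+
+Situations in which the reported problem could occur:
+
+1. **Different computation paths**:
+   ```
+   Box A: external computation -> float64 -> float32 (at tree construction)
+   Box B: separate computation -> float64 -> float32 (at tree construction)
+   ```
+   Because their computation histories differ, values that should overlap may end up with different float32 representations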
+
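+2. **Intermediate precision introduced by compiler optimization**:
+   - The C++ compiler may use float64 intermediate precision
+   - Effect of the `-ffloat-store` and `-fexcess-precision=standard` flags
+   - Behavior differences between optimization levels (-O2, -O3)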
+
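+3. **FPU settings and register precision**:
+   - Effect of the x87 FPU's 80-bit extended-precision registers
+   - Whether SSE/AVX instruction sets are used
+   - Rounding-mode settings (RN: to nearest, RZ: toward zero, RP: toward +∞, RM: toward −∞)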
+
+4. **Data pipeline inconsistencies**:
+   ```
+   Box A: file read          -> string -> float64 -> float32
+   Box B: direct computation -> float64 -> float32
+   ```
+   These may end up as slightly different values
+
+5. **Platform-dependent behavior**:
+   - Differences in floating-point arithmetic between Windows, Linux, and macOS
+   - Differences between hardware architectures (x86, ARM)
+
 ## Conclusion
 
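-1. **Code-level vulnerability confirmed**: confirmed that there is no correction mechanism for float32 input
-2. **Reproducing an actual false negative**: could not be reproduced with the test cases
-3. **Theoretical risk**: false negatives may occur under specific conditions (large coordinate values, tiny overlaps)
-4. **Recommendation**: use float64 input when high precision is required
+1. **Code-level vulnerability confirmed**:
+   - Confirmed that there is no correction mechanism for float32 input
+   - `include/prtree/core/prtree.h:157` explicitly states that no correction is applied
+
+2. **Reproducing false negatives**:
+   - Could not be reproduced with synthetic test cases
+   - All boundary-touching cases are detected correctly
+
+3. **False positive found**:
+   - Confirmed a false positive caused by rounding error in accumulated computation
+   - The boxes do not overlap in float64 but do overlap in float32
+
+4. **Theoretical risks**:
+   - False negatives: different computation paths, compiler optimization, differing FPU settings
+   - False positives: rounding error from accumulated computation
+
+5. **Recommendations**:
+   - **Important**: use float64 input when high precision is required
+   - Avoid accumulated computation and compute values directly
+   - Keep the data pipeline consistent
+   - For critical use cases, rely on float64 plus the correction mechanism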
+
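+## Next steps
+
+To fully verify and resolve this issue:
+
+1. **Gather information from the reporter**:
+   - A concrete failing dataset (coordinate values)
+   - Details of the environment where it occurs (OS, compiler, optimization flags)
+   - How the data is generated and the processing pipeline
+   - CMake options used at build time
+
+2. **Reproduction tests**:
+   - Verification with the actual data
+   - Tests on different platforms
+   - Builds with different compiler options
+
+3. **Potential fixes**:
+   - Add an option to keep `idx2exact` even for float32 input
+   - Implement a precision warning system
+   - State the precision limits explicitly in the documentation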
 
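-To fully resolve this issue, the reporter needs to provide a concrete dataset or reproduction steps.
+With this information, we can reproduce the problem and make an appropriate fix.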
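For the environment details requested in step 1, a reporter could attach the output of a short script like the one below. This is only a sketch using the standard library: `environment_report` is a hypothetical helper, and `platform.python_compiler()` reports the compiler used to build the Python interpreter, not the one used to build the extension module, so compiler flags and CMake options still have to be supplied by hand.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect basic platform facts relevant to floating-point behaviour."""
    return {
        "os": platform.platform(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        # Compiler that built the interpreter, not the extension module.
        "python_compiler": platform.python_compiler(),
        "float64_epsilon": sys.float_info.epsilon,
        "float64_mant_dig": sys.float_info.mant_dig,
        "byteorder": sys.byteorder,
    }


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```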
@@ -0,0 +1,252 @@
+#!/usr/bin/env python3
+"""
+Test for rounding issues when values come from different sources.
+
+The hypothesis: When Box A's max and Box B's min are computed or stored
+independently, float32 rounding can create gaps that don't exist in float64.
+"""
+import numpy as np
+from python_prtree import PRTree2D
+
+
+def test_computed_vs_literal():
+    """
+    Test when coordinates come from computations vs literals.
+
+    A computed value might round differently than a literal value
+    due to intermediate precision.
+    """
+    print("\n=== Test: Computed vs Literal Values ===\n")
+
+    # Computed value (with intermediate float64 precision)
+    computed_f64 = np.float64(1.0) / np.float64(3.0) * np.float64(300.0)  # = 100.0
+
+    # Literal value
+    literal_f64 = np.float64(100.0)
+
+    print(f"Computed (f64): {computed_f64:.20f}")
+    print(f"Literal (f64):  {literal_f64:.20f}")
+    print(f"Equal in f64: {computed_f64 == literal_f64}")
+
+    computed_f32 = np.float32(computed_f64)
+    literal_f32 = np.float32(literal_f64)
+
+    print(f"\nComputed (f32): {computed_f32:.20f}")
+    print(f"Literal (f32):  {literal_f32:.20f}")
+    print(f"Equal in f32: {computed_f32 == literal_f32}")
+
+    # Create boxes
+    boxes = np.array([
+        [0.0, 0.0, computed_f64, 100.0],  # Ends at computed value
+        [literal_f64, 0.0, 200.0, 100.0],  # Starts at literal value
+    ], dtype=np.float64)
+
+    boxes_f32 = boxes.astype(np.float32)
+
+    print(f"\nOverlap (f64): {boxes[0,2] >= boxes[1,0]}")
+    print(f"Overlap (f32): {boxes_f32[0,2] >= boxes_f32[1,0]}")
+
+    idx = np.array([0, 1], dtype=np.int64)
+
+    tree_f64 = PRTree2D(idx, boxes)
+    pairs_f64 = tree_f64.query_intersections()
+
+    tree_f32 = PRTree2D(idx, boxes_f32)
+    pairs_f32 = tree_f32.query_intersections()
+
+    print(f"\nFloat64 tree: {len(pairs_f64)} pairs")
+    print(f"Float32 tree: {len(pairs_f32)} pairs")
+
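+    if len(pairs_f64) != len(pairs_f32):
+        # Fewer float32 pairs is a false negative; more is a false positive.
+        is_false_negative = len(pairs_f32) < len(pairs_f64)
+        print("\n❌ FALSE NEGATIVE!" if is_false_negative else "\n⚠️  FALSE POSITIVE!")
+        return is_false_negative
+    return False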
+
+
+def test_accumulated_computation():
+    """
+    Test when values are accumulated through multiple operations.
+    """
+    print("\n=== Test: Accumulated Computation ===\n")
+
+    # Create a value through accumulation
+    accumulated_f64 = np.float64(0.0)
+    step = np.float64(0.1)
+    for i in range(1000):
+        accumulated_f64 += step
+
+    # Direct value
+    direct_f64 = np.float64(100.0)
+
+    print(f"Accumulated (f64): {accumulated_f64:.20f}")
+    print(f"Direct (f64):      {direct_f64:.20f}")
+    print(f"Difference:        {abs(accumulated_f64 - direct_f64):.20e}")
+
+    accumulated_f32 = np.float32(accumulated_f64)
+    direct_f32 = np.float32(direct_f64)
+
+    print(f"\nAccumulated (f32): {accumulated_f32:.20f}")
+    print(f"Direct (f32):      {direct_f32:.20f}")
+    print(f"Equal in f32: {accumulated_f32 == direct_f32}")
+
+    # Create boxes with these values
+    boxes_f64 = np.array([
+        [0.0, 0.0, accumulated_f64, 100.0],
+        [direct_f64, 0.0, 200.0, 100.0],
+    ], dtype=np.float64)
+
+    boxes_f32 = boxes_f64.astype(np.float32)
+
+    print(f"\nOverlap (f64): {boxes_f64[0,2] >= boxes_f64[1,0]}")
+    print(f"Overlap (f32): {boxes_f32[0,2] >= boxes_f32[1,0]}")
+
+    idx = np.array([0, 1], dtype=np.int64)
+
+    tree_f64 = PRTree2D(idx, boxes_f64)
+    pairs_f64 = tree_f64.query_intersections()
+
+    tree_f32 = PRTree2D(idx, boxes_f32)
+    pairs_f32 = tree_f32.query_intersections()
+
+    print(f"\nFloat64 tree: {len(pairs_f64)} pairs")
+    print(f"Float32 tree: {len(pairs_f32)} pairs")
+
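+    if len(pairs_f64) != len(pairs_f32):
+        # Fewer float32 pairs is a false negative; more is a false positive.
+        is_false_negative = len(pairs_f32) < len(pairs_f64)
+        print("\n❌ FALSE NEGATIVE!" if is_false_negative else "\n⚠️  FALSE POSITIVE!")
+        return is_false_negative
+    return False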
+
+
+def test_separate_float32_arrays():
+    """
+    Test when float32 values are created in separate arrays.
+
+    Hypothesis under test: if two float32 values are created independently,
+    they might end up with different representations even though they should be equal.
+    """
+    print("\n=== Test: Separate Float32 Arrays ===\n")
+
+    # Create a problematic float64 value
+    problematic = np.float64(100.0) + np.float64(1e-7)
+
+    print(f"Problematic value (f64): {problematic:.20f}")
+
+    # Create first array with this value as max
+    array1_f32 = np.array([0.0, 0.0, problematic, 100.0], dtype=np.float32)
+
+    # Create second array with this value as min
+    array2_f32 = np.array([problematic, 0.0, 200.0, 100.0], dtype=np.float32)
+
+    print(f"\nArray1[2] (max): {array1_f32[2]:.20f}")
+    print(f"Array2[0] (min): {array2_f32[0]:.20f}")
+    print(f"Equal: {array1_f32[2] == array2_f32[0]}")
+    print(f"Overlap: {array1_f32[2] >= array2_f32[0]}")
+
+    # Combine into boxes
+    boxes_f32 = np.vstack([array1_f32.reshape(1, -1), array2_f32.reshape(1, -1)])
+
+    print(f"\nCombined boxes (f32):\n{boxes_f32}")
+
+    idx = np.array([0, 1], dtype=np.int64)
+
+    tree = PRTree2D(idx, boxes_f32)
+    pairs = tree.query_intersections()
+
+    print(f"\nIntersections found: {len(pairs)}")
+    print(f"Pairs: {pairs}")
+
+    # Expected: should find intersection since they touch
+    if len(pairs) == 0:
+        print("\n❌ FALSE NEGATIVE: Touching boxes not detected!")
+        return True
+
+    return False
+
+
+def test_binary_representation():
+    """
+    Test values that have identical decimal representation but different binary.
+    """
+    print("\n=== Test: Binary Representation ===\n")
+
+    # Create a value that cannot be exactly represented in binary
+    decimal_val = 0.1
+
+    # In float64
+    val_f64 = np.float64(decimal_val)
+    print(f"0.1 in float64: {val_f64:.60f}")
+    print(f"Hex: {val_f64.hex()}")
+
+    # In float32
+    val_f32 = np.float32(decimal_val)
+    print(f"\n0.1 in float32: {val_f32:.60f}")
+    print(f"Hex: {float(val_f32).hex()}")  # np.float32 is not a Python float subclass, so convert before calling hex()
+
+    # Scale up
+    scale = 1000
+    scaled_f64 = val_f64 * scale
+    scaled_f32 = val_f32 * scale
+
+    print(f"\nScaled (f64): {scaled_f64:.60f}")
+    print(f"Scaled (f32): {scaled_f32:.60f}")
+
+    # Now use these as coordinates
+    boxes_f64 = np.array([
+        [0.0, 0.0, scaled_f64, 100.0],
+        [scaled_f64, 0.0, 200.0, 100.0],
+    ], dtype=np.float64)
+
+    # Create float32 version two ways:
+    # 1. Convert from float64
+    boxes_f32_converted = boxes_f64.astype(np.float32)
+
+    # 2. Create directly with float32
+    boxes_f32_direct = np.array([
+        [0.0, 0.0, val_f32 * scale, 100.0],
+        [val_f32 * scale, 0.0, 200.0, 100.0],
+    ], dtype=np.float32)
+
+    print(f"\nBoxes f32 (converted):\n{boxes_f32_converted}")
+    print(f"\nBoxes f32 (direct):\n{boxes_f32_direct}")
+    print(f"\nAre they equal? {np.array_equal(boxes_f32_converted, boxes_f32_direct)}")
+
+    idx = np.array([0, 1], dtype=np.int64)
+
+    tree_f64 = PRTree2D(idx, boxes_f64)
+    pairs_f64 = tree_f64.query_intersections()
+
+    tree_f32_conv = PRTree2D(idx, boxes_f32_converted)
+    pairs_f32_conv = tree_f32_conv.query_intersections()
+
+    tree_f32_dir = PRTree2D(idx, boxes_f32_direct)
+    pairs_f32_dir = tree_f32_dir.query_intersections()
+
+    print(f"\nFloat64 tree: {len(pairs_f64)} pairs")
+    print(f"Float32 (converted): {len(pairs_f32_conv)} pairs")
+    print(f"Float32 (direct): {len(pairs_f32_dir)} pairs")
+
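+    f32_counts = (len(pairs_f32_conv), len(pairs_f32_dir))
+    # Fewer float32 pairs is a false negative; more is a false positive.
+    if min(f32_counts) < len(pairs_f64):
+        print("\n❌ FALSE NEGATIVE!")
+        return True
+    if max(f32_counts) > len(pairs_f64):
+        print("\n⚠️  FALSE POSITIVE!")
+
+    return False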
+
+
+if __name__ == "__main__":
+    print("=" * 70)
+    print("Testing Different Sources of Rounding")
+    print("=" * 70)
+
+    issue_found = False
+
+    issue_found |= test_computed_vs_literal()
+    issue_found |= test_accumulated_computation()
+    issue_found |= test_separate_float32_arrays()
+    issue_found |= test_binary_representation()
+
+    print("\n" + "=" * 70)
+    if issue_found:
+        print("❌ FALSE NEGATIVE CONFIRMED!")
+    else:
+        print("⚠️  No false negatives in these tests")
+    print("=" * 70)
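When reading the output of these tests, it helps to know how coarse float32 is near the coordinates being used. The following is a rough sketch using only the standard library. `to_f32` and `f32_spacing` are hypothetical helper names, and `f32_spacing` assumes a positive, finite, normal input.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_spacing(x: float) -> float:
    """Gap between to_f32(x) and the next larger float32 (positive finite x only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # For positive floats, incrementing the bit pattern gives the next value up.
    upper = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return upper - to_f32(x)


spacing_at_100 = f32_spacing(100.0)
offset_survives = to_f32(100.0 + 1e-7) != to_f32(100.0)

print(f"float32 spacing near 100: {spacing_at_100!r}")
print(f"1e-7 offset survives conversion: {offset_survives}")
```

Near 100 the spacing is `2**-17` (about 7.6e-6), so the `1e-7` offset used in `test_separate_float32_arrays()` disappears on conversion and both boxes share the same boundary value.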