Add files via upload

opendatalab · Aug 18, 2024 · 80d3e01
1 parent 6a2837f
commit 80d3e01

Showing 11 changed files with 76 additions and 33 deletions.
diff --git a/index.html b/index.html
@@ -113,7 +113,7 @@ <h4><sup><math xmlns="http://www.w3.org/1998/Math/MathML"><mo>†</mo></math></s
               </span>
             </div>
           </div>
-          <img src="./static/images/fig1_crossview_3.jpg"
+          <img src="./static/images/Fig1_flag.jpg"
               class="interpolation-image"
               alt="Interpolate start reference image."
               width="200%"/>
@@ -133,25 +133,26 @@ <h4><sup><math xmlns="http://www.w3.org/1998/Math/MathML"><mo>†</mo></math></s
         <h2 class="title is-3">Abstract</h2>
         <div class="content has-text-justified">
           <p>
-            Street-to-satellite image synthesis focuses on generating realistic satellite images
-            from corresponding ground street-view images while maintaining a consistent
-            content layout, similar to looking down from the sky. The significant differences
-            in perspectives create a substantial domain gap between the views, making this
-            cross-view generation task particularly challenging. In this paper, we introduce
-            SkyDiffusion, a novel cross-view generation method for synthesizing satellite
-            images from street-view images, leveraging diffusion models and Bird’s Eye View
-            (BEV) paradigm. First, we design a Curved-BEV method to transform streetview images to the satellite view, reformulating the challenging cross-domain
-            image synthesis task into a conditional generation problem. Curved-BEV also
-            includes a "Multi-to-One" mapping strategy for combining multiple street-view
-            images within the same satellite coverage area, effectively solving the occlusion
-            issues in dense urban scenes. Next, we design a BEV-controlled diffusion model
-            to generate satellite images consistent with the street-view content, which also
-            incorporates a light manipulation module to optimize the lighting condition of the
-            synthesized image using a reference satellite. Experimental results demonstrate
-            that SkyDiffusion outperforms state-of-the-art methods on both suburban (CVUSA
-            & CVACT) and urban (VIGOR-Chicago) cross-view datasets, with an average
-            SSIM increase of 14.5% and a FID reduction of 29.6%, achieving realistic and
-            content-consistent satellite image generation. 
+            Street-to-satellite image synthesis focuses on generating realistic satellite images 
+            from corresponding ground street-view images while maintaining a consistent content 
+            layout, similar to looking down from the sky. The significant differences in perspectives 
+            create a substantial domain gap between the views, making this cross-view generation task 
+            particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view 
+            generation method for synthesizing satellite images from street-view images, leveraging 
+            diffusion models and the Bird's Eye View (BEV) paradigm. First, we design a Curved-BEV method 
+            to transform street-view images to the satellite view, reformulating the challenging 
+            cross-domain image synthesis task into a conditional generation problem. Curved-BEV 
+            also includes a "Multi-to-One" mapping strategy for leveraging multiple street-view 
+            images within the same satellite coverage area, effectively solving the occlusion 
+            issues in dense urban scenes. Next, we design a BEV-controlled diffusion model to 
+            generate satellite images consistent with the street-view content, which also incorporates 
+            a light manipulation module to make the lighting conditions of the synthesized 
+            satellite images more flexible. Experimental results demonstrate that SkyDiffusion 
+            outperforms state-of-the-art methods on both suburban (CVUSA & CVACT) and urban 
+            (VIGOR-Chicago) cross-view datasets, with an average SSIM increase of 13.96% and a 
+            FID reduction of 20.54%, achieving realistic and content-consistent satellite image 
+            generation. The code and models of this work will be released at https://opendatalab.github.io/skydiffusion/
+
           </p>
         </div>
       </div>
@@ -171,7 +172,7 @@ <h2 class="title is-3">Method</h2>
         <div class="content has-text-justified">
 
         <div class="column is-centered has-text-centered">
-          <img src="./static/images/fig2_overview_2.jpg"
+          <img src="./static/images/fig2_pipeline.jpg"
                 class="interpolation-image"
                 alt="Interpolate start reference image."
                 width="150%"/>
@@ -272,6 +273,14 @@ <h3>Ablation Study</h3>
                 id="image2"/>
                 <figcaption style="font-size: 20px;text-align: center;"><b>Light manipulation</b></figcaption>
         </div>
+        <div class="item item-image3">
+          <img src="./static/images/CrossRegion.png"
+                class="interpolation-image"
+                alt="Synthesis results for cross-dataset generalization."
+                width="150%"
+                id="image3"/>
+                <figcaption style="font-size: 20px;text-align: center;"><b>Synthesis results for cross-dataset generalization</b></figcaption>
+        </div>
       </div>
     </div>
     </div>
@@ -290,12 +299,11 @@ <h2 class="title is-3">Evaluation</h2>
           <!-- index evaluation-->
           <h3>Quantitative Evaluation</h3>
           <p style="font-size: 20px;">
-            We present a quantitative comparison of different methods on the CVUSA, CVACT and OmniCity datasets, 
-            evaluating them in terms of various metrics. Compared to the state-of-the-art method for cross-view synthesis (Sat2Density), 
-            our method achieved significant improvements in SSIM and FID scores by <b>9.44%</b> and <b>42.70%</b>
-            on CVUSA, respectively. Similarly, enhancements of <b>6.46%</b> and <b>10.94%</b> in SSIM and
-            FIDwere observed on CVACT. Our method achieved significant improvements in
-            SSIM and FID by <b>11.71%</b> and <b>52.21%</b> on OmniCity, respectively.
+            On the suburban CVUSA and CVACT datasets, our SkyDiffusion method achieved outstanding 
+            results. Compared to state-of-the-art methods, it reduced FID by <b>25.83%</b> and increased SSIM 
+            by <b>13.89%</b>, demonstrating its superiority in synthesizing realistic and consistent satellite 
+            images. On the urban VIGOR-Chicago dataset, SkyDiffusion reduced FID by <b>9.96%</b> and improved 
+            SSIM by <b>14.11%</b> compared to the state-of-the-art method.
           </p>
 
          <div class="column is-centered has-text-centered">
@@ -308,19 +316,54 @@ <h3>Quantitative Evaluation</h3>
 
           <!-- Ablation study evaluation-->
           <p style="font-size: 20px;">
-            "Street" represents directly using street-view
-            image inputs, "Curved-BEV" denotes using Curved-BEV transformation, and "Light" stands for the
-            Light Manipulation module.            
+            "Baseline" denotes directly using the street-view image as input, "C-BEV" 
+            denotes using the Curved-BEV transformation, and "Multi" stands for 
+            the Multi-to-One strategy.
           </p>
 
          <div class="column is-centered has-text-centered">
-           <img src="./static/images/Ablation_CVACT.png"
+           <img src="./static/images/CBEV_ablation.png"
                  class="interpolation-image"
-                 alt="Interpolate start reference image."/>
-                 <figcaption style="font-size: 14px;text-align: center;">Ablation study of different modules on CVACT</figcaption>
+                 alt="Ablation study of the Curved-BEV module."
+                 width="60%"/>
+                 <figcaption style="font-size: 14px;text-align: center;">Ablation study of the Curved-BEV module</figcaption>
          </div>
 
           <!-- End Ablation study evaluation-->
+
+          <!-- Light Ablation study evaluation-->
+          <p style="font-size: 20px;">
+            The ablation experiments in the table below indicate that the Light Manipulation module aligns the 
+            lighting conditions of the synthesized images with those of the target domain images, improving SSIM and PSNR metrics.   
+          </p>
+
+         <div class="column is-centered has-text-centered">
+           <img src="./static/images/Light_Ablation.png"
+                 class="interpolation-image"
+                 alt="Ablation study of the light manipulation module."
+                 width="60%"/>
+                 <figcaption style="font-size: 14px;text-align: center;">Ablation study of the light manipulation module</figcaption>
+         </div>
+
+          <!-- End Light Ablation study evaluation-->
+          <!-- Cross-dataset evaluation-->
+          <p style="font-size: 20px;">
+            We trained the model on CVACT and tested it on VIGOR-Chicago, and vice versa, to evaluate cross-dataset 
+            generalization capability. Compared to InstructPix2Pix, our method (w/o light) demonstrates superior performance 
+            across metrics. Our method effectively preserves scene content such as road directions and intersections.
+          </p>
+
+          <div class="column is-centered has-text-centered">
+          <img src="./static/images/CrossRegion_Ablation.png"
+                class="interpolation-image"
+                alt="Cross-dataset generalization assessment."
+                width="60%"/>
+                <figcaption style="font-size: 14px;text-align: center;">Cross-dataset generalization assessment</figcaption>
+          </div>
+
+          <!-- End Cross-dataset evaluation-->
+
+
         </div>
       </div>
     </div>

diff --git a/static/images/CBEV_ablation.png b/static/images/CBEV_ablation.png
diff --git a/static/images/CVACT_results.png b/static/images/CVACT_results.png
diff --git a/static/images/CVUSA_results.png b/static/images/CVUSA_results.png
diff --git a/static/images/CrossRegion.png b/static/images/CrossRegion.png
diff --git a/static/images/CrossRegion_Ablation.png b/static/images/CrossRegion_Ablation.png
diff --git a/static/images/Fig1_flag.jpg b/static/images/Fig1_flag.jpg
diff --git a/static/images/Light_Ablation.png b/static/images/Light_Ablation.png
diff --git a/static/images/QuantitativeCom_3datasets.png b/static/images/QuantitativeCom_3datasets.png
diff --git a/static/images/VIGOR_results.png b/static/images/VIGOR_results.png
diff --git a/static/images/fig2_pipeline.jpg b/static/images/fig2_pipeline.jpg
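
Note on the reported percentages: the updated page quotes an SSIM "increase" and an FID "reduction" in percent, and the light-manipulation paragraph mentions PSNR. The sketch below shows one common way such figures are computed, assuming they are relative changes against a baseline score; the page itself does not state the formula. The function names and all numeric inputs are hypothetical placeholders, not values from the paper.

```python
import math


def relative_improvement(baseline: float, ours: float, higher_is_better: bool = True) -> float:
    """Improvement over a baseline, in percent (positive means better).

    Assumption: the reported percentages are relative changes. For SSIM
    (higher is better) this is (ours - baseline) / baseline; for FID
    (lower is better) it is (baseline - ours) / baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    delta = (ours - baseline) if higher_is_better else (baseline - ours)
    return 100.0 * delta / abs(baseline)


def psnr(mse: float, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB from a mean squared error."""
    if mse <= 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Placeholder scores for illustration only (not taken from the paper).
ssim_gain = relative_improvement(0.50, 0.57, higher_is_better=True)
fid_drop = relative_improvement(40.0, 30.0, higher_is_better=False)
print(f"SSIM gain: {ssim_gain:.2f}%  FID reduction: {fid_drop:.2f}%")
```

Averaging such per-dataset percentages is one way to arrive at a single headline number, though the exact averaging used for the abstract's figures is not specified on the page.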