Add files via upload
yejy53 authored Aug 18, 2024
1 parent 6a2837f commit 80d3e01
Showing 11 changed files with 76 additions and 33 deletions.
109 changes: 76 additions & 33 deletions index.html
@@ -113,7 +113,7 @@ <h4><sup><math xmlns="http://www.w3.org/1998/Math/MathML"><mo>†</mo></math></s
</span>
</div>
</div>
<img src="./static/images/fig1_crossview_3.jpg"
<img src="./static/images/Fig1_flag.jpg"
class="interpolation-image"
alt="Interpolate start reference image."
width="200%"/>
@@ -133,25 +133,26 @@ <h4><sup><math xmlns="http://www.w3.org/1998/Math/MathML"><mo>†</mo></math></s
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
Street-to-satellite image synthesis focuses on generating realistic satellite images
from corresponding ground street-view images while maintaining a consistent
content layout, similar to looking down from the sky. The significant differences
in perspectives create a substantial domain gap between the views, making this
cross-view generation task particularly challenging. In this paper, we introduce
SkyDiffusion, a novel cross-view generation method for synthesizing satellite
images from street-view images, leveraging diffusion models and Bird’s Eye View
(BEV) paradigm. First, we design a Curved-BEV method to transform streetview images to the satellite view, reformulating the challenging cross-domain
image synthesis task into a conditional generation problem. Curved-BEV also
includes a "Multi-to-One" mapping strategy for combining multiple street-view
images within the same satellite coverage area, effectively solving the occlusion
issues in dense urban scenes. Next, we design a BEV-controlled diffusion model
to generate satellite images consistent with the street-view content, which also
incorporates a light manipulation module to optimize the lighting condition of the
synthesized image using a reference satellite. Experimental results demonstrate
that SkyDiffusion outperforms state-of-the-art methods on both suburban (CVUSA
& CVACT) and urban (VIGOR-Chicago) cross-view datasets, with an average
SSIM increase of 14.5% and a FID reduction of 29.6%, achieving realistic and
content-consistent satellite image generation.
Street-to-satellite image synthesis focuses on generating realistic satellite images
from corresponding ground street-view images while maintaining a consistent content
layout, similar to looking down from the sky. The significant differences in perspectives
create a substantial domain gap between the views, making this cross-view generation task
particularly challenging. In this paper, we introduce SkyDiffusion, a novel cross-view
generation method for synthesizing satellite images from street-view images, leveraging
diffusion models and the Bird's Eye View (BEV) paradigm. First, we design a Curved-BEV method
to transform street-view images to the satellite view, reformulating the challenging
cross-domain image synthesis task into a conditional generation problem. Curved-BEV
also includes a "Multi-to-One" mapping strategy for leveraging multiple street-view
images within the same satellite coverage area, effectively solving the occlusion
issues in dense urban scenes. Next, we design a BEV-controlled diffusion model to
generate satellite images consistent with the street-view content, which also incorporates
a light manipulation module to make the lighting conditions of the synthesized
satellite images more flexible. Experimental results demonstrate that SkyDiffusion
outperforms state-of-the-art methods on both suburban (CVUSA & CVACT) and urban
(VIGOR-Chicago) cross-view datasets, with an average SSIM increase of 13.96% and a
FID reduction of 20.54%, achieving realistic and content-consistent satellite image
generation. The code and models of this work will be released at https://opendatalab.github.io/skydiffusion/

</p>
</div>
</div>
@@ -171,7 +172,7 @@ <h2 class="title is-3">Method</h2>
<div class="content has-text-justified">

<div class="column is-centered has-text-centered">
<img src="./static/images/fig2_overview_2.jpg"
<img src="./static/images/fig2_pipeline.jpg"
class="interpolation-image"
alt="Interpolate start reference image."
width="150%"/>
@@ -272,6 +273,14 @@ <h3>Ablation Study</h3>
id="image2"/>
<figcaption style="font-size: 20px;text-align: center;"><b>Light manipulation</b></figcaption>
</div>
<div class="item item-image3">
<img src="./static/images/CrossRegion.png"
class="interpolation-image"
alt="Interpolate start reference image."
width="150%"
id="image3"/>
<figcaption style="font-size: 20px;text-align: center;"><b>Synthesis results for cross-dataset generalization</b></figcaption>
</div>
</div>
</div>
</div>
@@ -290,12 +299,11 @@ <h2 class="title is-3">Evaluation</h2>
<!-- index evaluation-->
<h3>Quantitative Evaluation</h3>
<p style="font-size: 20px;">
We present a quantitative comparison of different methods on the CVUSA, CVACT and OmniCity datasets,
evaluating them in terms of various metrics. Compared to the state-of-the-art method for cross-view synthesis (Sat2Density),
our method achieved significant improvements in SSIM and FID scores by <b>9.44%</b> and <b>42.70%</b>
on CVUSA, respectively. Similarly, enhancements of <b>6.46%</b> and <b>10.94%</b> in SSIM and
FID were observed on CVACT. Our method achieved significant improvements in
SSIM and FID by <b>11.71%</b> and <b>52.21%</b> on OmniCity, respectively.
On the suburban CVUSA and CVACT datasets, our SkyDiffusion method achieved outstanding
results. Compared to state-of-the-art methods, it reduced FID by <b>25.83%</b> and increased SSIM
by <b>13.89%</b>, demonstrating its superiority in synthesizing realistic and consistent satellite
images. On the urban VIGOR-Chicago dataset, SkyDiffusion reduced FID by <b>9.96%</b> and improved
SSIM by <b>14.11%</b> compared to the state-of-the-art method.
</p>

<div class="column is-centered has-text-centered">
@@ -308,19 +316,54 @@ <h3>Quantitative Evaluation</h3>

<!-- Ablation study evaluation-->
<p style="font-size: 20px;">
"Street" represents directly using street-view
image inputs, "Curved-BEV" denotes using Curved-BEV transformation, and "Light" stands for the
Light Manipulation module.
"Baseline" represents directly using street-view images, "C-BEV"
denotes using the Curved-BEV transformation, and "Multi" stands for
the Multi-to-One strategy.
</p>

<div class="column is-centered has-text-centered">
<img src="./static/images/Ablation_CVACT.png"
<img src="./static/images/CBEV_ablation.png"
class="interpolation-image"
alt="Interpolate start reference image."/>
<figcaption style="font-size: 14px;text-align: center;">Ablation study of different modules on CVACT</figcaption>
alt="Interpolate start reference image."
width="60%"/>
<figcaption style="font-size: 14px;text-align: center;">Ablation study of the Curved-BEV module</figcaption>
</div>

<!-- End Ablation study evaluation-->

<!-- Light Ablation study evaluation-->
<p style="font-size: 20px;">
The ablation experiments in the table below indicate that the Light Manipulation module aligns the
lighting conditions of the synthesized images with those of the target domain images, improving SSIM and PSNR metrics.
</p>

<div class="column is-centered has-text-centered">
<img src="./static/images/Light_Ablation.png"
class="interpolation-image"
alt="Interpolate start reference image."
width="60%"/>
<figcaption style="font-size: 14px;text-align: center;">Ablation study of the light manipulation module</figcaption>
</div>

<!-- End Light Ablation study evaluation-->
<!-- Cross-dataset generalization evaluation-->
<p style="font-size: 20px;">
We trained the model on CVACT and tested it on VIGOR-Chicago, and vice versa, to evaluate cross-dataset
generation capability. Compared to InstructPix2Pix, our method (w/o light) demonstrates superior performance
across metrics. Our method effectively preserves scene content such as road directions and intersections.
</p>

<div class="column is-centered has-text-centered">
<img src="./static/images/CrossRegion_Ablation.png"
class="interpolation-image"
alt="Interpolate start reference image."
width="60%"/>
<figcaption style="font-size: 14px;text-align: center;">Cross-dataset generalization assessment</figcaption>
</div>

<!-- End Cross-dataset generalization evaluation-->


</div>
</div>
</div>
Binary file added static/images/CBEV_ablation.png
Binary file modified static/images/CVACT_results.png
Binary file modified static/images/CVUSA_results.png
Binary file added static/images/CrossRegion.png
Binary file added static/images/CrossRegion_Ablation.png
Binary file added static/images/Fig1_flag.jpg
Binary file added static/images/Light_Ablation.png
Binary file modified static/images/QuantitativeCom_3datasets.png
Binary file modified static/images/VIGOR_results.png
Binary file added static/images/fig2_pipeline.jpg
