StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces

We propose \texttt{StochSync}, a method for generating images in arbitrary spaces, such as 360° panoramas or textures on 3D surfaces, using a pretrained image diffusion model. The main challenge is bridging the gap between the 2D images understood by the diffusion model (instance space $\mathcal{X}$) and the target space for image generation (canonical space $\mathcal{Z}$). Unlike previous methods that struggle without strong conditioning or lack fine details, \texttt{StochSync} combines the strengths of Diffusion Synchronization and Score Distillation Sampling to perform effectively even with weak conditioning. Our experiments show that \texttt{StochSync} outperforms prior finetuning-based methods, especially in 360° panorama generation.

💡Overall Idea

Diffusion Synchronization (DS): Excels in producing detailed images but struggles with coherence across views when instance and canonical spaces are not pixel-aligned, often requiring strong guidance like depth maps.

Score Distillation Sampling (SDS): Achieves coherent results, but lacks fine details and high-quality textures.

We observed that the coherence of SDS comes from using maximum stochasticity in the denoising process: specifically, setting the noise level $\sigma_t$ to its highest value at each reverse diffusion step in DDIM.
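To make the role of $\sigma_t$ concrete, here is a minimal NumPy sketch of a single generalized DDIM reverse step. It assumes the standard DDIM parameterization, in which `alpha_t` and `alpha_prev` are cumulative noise-schedule products and the largest admissible noise level is $\sigma_t = \sqrt{1 - \alpha_{t-1}}$. The function names `max_sigma` and `ddim_step` and the random stand-in for the network's noise prediction are illustrative assumptions, not taken from the StochSync code.

```python
import numpy as np

def max_sigma(alpha_prev):
    """Largest sigma_t for which the DDIM direction term stays real-valued."""
    return np.sqrt(1.0 - alpha_prev)

def ddim_step(x_t, eps_pred, alpha_t, alpha_prev, sigma_t, rng):
    """One generalized DDIM reverse step; returns (x_{t-1}, predicted clean image)."""
    # Estimate of the clean image implied by the noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    # Coefficient of the deterministic "direction" term; clip guards round-off.
    dir_coeff = np.sqrt(np.clip(1.0 - alpha_prev - sigma_t**2, 0.0, None))
    # Fresh Gaussian noise injected at this step.
    noise = rng.standard_normal(x_t.shape)
    x_prev = np.sqrt(alpha_prev) * x0_pred + dir_coeff * eps_pred + sigma_t * noise
    return x_prev, x0_pred

# Usage with random stand-ins for the noisy sample and the network output.
data_rng = np.random.default_rng(0)
x_t = data_rng.standard_normal((4, 4))
eps_pred = data_rng.standard_normal((4, 4))  # hypothetical model prediction
alpha_t, alpha_prev = 0.5, 0.75

x_prev, x0_pred = ddim_step(
    x_t, eps_pred, alpha_t, alpha_prev, max_sigma(alpha_prev), np.random.default_rng(1)
)

# At maximum sigma_t the direction term vanishes, so the step reduces to
# re-noising the clean estimate with fresh noise.
fresh = np.random.default_rng(1).standard_normal((4, 4))
assert np.allclose(x_prev, np.sqrt(alpha_prev) * x0_pred + np.sqrt(1.0 - alpha_prev) * fresh)
```

With `sigma_t = 0` the same function gives the deterministic DDIM update. At the maximum value, the previous noise prediction no longer carries over to the next step, and each step perturbs the current clean estimate with independent noise.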

By incorporating maximum stochasticity into the reverse diffusion process of DS, along with other methods to further enhance quality, we achieve the coherence benefits of SDS while retaining the detailed quality of DS.
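The combination described above can be illustrated with a toy NumPy sketch. It makes several loud assumptions that are not stated in the text: the canonical space is a 1D wrap-around strip (a stand-in for a panorama row), each instance view is a contiguous window of that strip, synchronization is a plain per-pixel average of the per-view clean estimates, and every helper name (`view_indices`, `synchronize`, `renoise_views`) is hypothetical. The additional quality-enhancing methods mentioned above are not modeled.

```python
import numpy as np

def view_indices(start, width, canon_len):
    """Canonical indices covered by one view (windows wrap around the strip)."""
    return (start + np.arange(width)) % canon_len

def synchronize(x0_views, starts, canon_len):
    """Average per-view clean estimates into one canonical-space signal."""
    total = np.zeros(canon_len)
    count = np.zeros(canon_len)
    for x0, start in zip(x0_views, starts):
        idx = view_indices(start, x0.shape[0], canon_len)
        total[idx] += x0
        count[idx] += 1
    # Pixels seen by no view keep the value 0 instead of dividing by zero.
    return total / np.maximum(count, 1)

def renoise_views(z0, starts, width, alpha_prev, rng):
    """Project the canonical estimate into each view and add fresh noise,
    i.e. a DDIM step at maximum stochasticity (sigma_t = sqrt(1 - alpha_prev))."""
    views = []
    for start in starts:
        x0 = z0[view_indices(start, width, z0.shape[0])]
        noise = rng.standard_normal(width)
        views.append(np.sqrt(alpha_prev) * x0 + np.sqrt(1.0 - alpha_prev) * noise)
    return views

# Usage: four overlapping views of an 8-pixel canonical strip.
canon_len, width, starts = 8, 4, [0, 2, 4, 6]
z_true = np.arange(canon_len, dtype=float)
x0_views = [z_true[view_indices(s, width, canon_len)] for s in starts]

z0 = synchronize(x0_views, starts, canon_len)
assert np.allclose(z0, z_true)  # views that already agree are left unchanged

next_views = renoise_views(z0, starts, width, alpha_prev=0.75, rng=np.random.default_rng(0))
```

In a real pipeline the per-view clean estimates would come from the pretrained diffusion model rather than from a known signal, and the projection between $\mathcal{Z}$ and $\mathcal{X}$ would be a rendering or warping operation rather than array slicing.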

🌍360° Panorama Generation

🎨3D Mesh Texturing