
Scientists Propose Star-Shaped Diffusion Model

Their modified neural network architecture may outperform conventional models at generating three-dimensional representations of molecules, materials, and mathematical entities


Scientists at the AI Research Centre and the Faculty of Computer Science at HSE University, the Artificial Intelligence Research Institute (AIRI), and Sber AI have come up with a novel architecture for diffusion neural networks that makes it possible to configure eight distinct types of noise distribution. Instead of the classical Markov chain model with a Gaussian distribution, the scientists propose a star-shaped model in which the distribution type can be selected and preset. This can aid in solving problems across various geometric modalities. The results were presented at the NeurIPS 2023 conference.

The study was supported by a grant for research centres in the field of AI provided by the Analytical Centre for the Government of the Russian Federation.

Over the past two decades, generative artificial intelligence systems have shown significant improvement in performance. Previously, these systems generated texts and images of relatively low quality in a single step. However, with the introduction of diffusion models—a special type of neural network—the process now involves multiple steps, resulting in higher-quality outcomes.

Diffusion neural networks rely on denoising diffusion probabilistic models (DDPM), which operate as follows: at each stage, random alterations are made to the data—for instance, colours or brightness levels may vary with each step. The noise is then gradually reduced, transforming the data to resemble the desired outcome, until the final image emerges from the chaos.

The model is based on a Markov chain that incrementally introduces noise; the diffusion process is then reversed to recover the original data, such as an image featuring a cat. The neural network learns these transformations from training data, which includes examples of the original image alongside its noisy variations.
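The forward (noising) half of a DDPM described above can be sketched in a few lines of NumPy. The linear beta schedule and the step count below are common illustrative defaults, not values taken from the paper:

```python
import numpy as np

def forward_noising(x0, T=1000, beta_start=1e-4, beta_end=0.02, seed=0):
    """Sketch of the DDPM forward process.

    Uses the closed form q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I),
    where abar_t is the cumulative product of (1 - beta_s). The beta schedule
    here is an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(beta_start, beta_end, T)
    alphas_bar = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)
    t = T - 1                             # look at the final, fully noised step
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, alphas_bar

x0 = np.ones(4)
xT, abar = forward_noising(x0)
# By the last step the signal coefficient sqrt(abar_T) is tiny,
# so x_T is essentially pure Gaussian noise.
```

The reverse process is what the network is trained to approximate: given a noisy state, predict a slightly less noisy one, step by step back to the data.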

While such models excel at generating images and sounds, they struggle with more complex tasks, such as generating three-dimensional structures. This is because the iterative addition of noise in a diffusion model follows a normal (Gaussian) distribution: if the source objects are subject to structural constraints, Gaussian noising cannot be configured to preserve those constraints consistently throughout the steps.
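To make the constraint problem concrete, here is a minimal sketch (the noise scale is an arbitrary assumption) of Gaussian noise breaking a simplex constraint, i.e. a probability vector that must stay non-negative and sum to one:

```python
import numpy as np

rng = np.random.default_rng(0)

# A point on the probability simplex: non-negative entries summing to 1.
p = np.array([0.7, 0.2, 0.1])

# One step of additive Gaussian noising (scale 0.5 is an arbitrary choice).
noised = p + 0.5 * rng.standard_normal(3)

# The noised vector generally no longer sums to 1, and entries can go
# negative, so the object has left the space of valid probability vectors.
```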

The team of researchers proposes a new model that makes the type of noise distribution configurable. To accomplish this, the researchers have restructured the model into a star-shaped configuration, in which every noisy state diverges directly from the original object, rather than being chained one after another in a Markov chain.
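The star-shaped forward process can be sketched as sampling each noisy state directly from the original object, so the joint distribution factorises as a product of per-step terms conditioned on x_0. Gaussian noise and the linear schedule below are placeholders; the point of the design is that this per-step factor can be swapped for another distribution family:

```python
import numpy as np

def star_forward(x0, T=10, sigmas=None, seed=0):
    """Sketch of a star-shaped forward process.

    Each state x_t is drawn independently from x_0,
    q(x_1..x_T | x_0) = prod_t q(x_t | x_0), forming the "rays" of a star,
    instead of the Markov chain x_0 -> x_1 -> ... -> x_T.
    The Gaussian noise and schedule here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    if sigmas is None:
        sigmas = np.linspace(0.1, 1.0, T)  # growing noise level per step
    # Every state depends only on x_0, not on the previous state.
    return [x0 + s * rng.standard_normal(x0.shape) for s in sigmas]

states = star_forward(np.zeros(3))
```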

Let us say the neural network's task is to generate a molecule. The molecule consists of three types of atoms, each specified using discrete data. If the data is noised according to a normal distribution, the atom types may assume values that do not exist in the real world. In a star-shaped model, we can select the desired distribution type, ensuring that the data remains undistorted.
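The atom-type example can be illustrated with a discrete noising step that, unlike Gaussian noise, never produces a value outside the set of real atom types. The uniform-resampling scheme and its probability are assumptions chosen for simplicity:

```python
import numpy as np

def categorical_noising(atom_types, num_types=3, resample_prob=0.5, seed=0):
    """Sketch of discreteness-preserving noising for atom types.

    With probability resample_prob each atom's type is replaced by a type
    drawn uniformly from the valid set, so every noised value is still a
    real atom type. The schedule is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    resample = rng.random(atom_types.shape) < resample_prob
    random_types = rng.integers(0, num_types, size=atom_types.shape)
    return np.where(resample, random_types, atom_types)

atoms = np.array([0, 1, 2, 1, 0])   # three valid atom types: 0, 1, 2
noised = categorical_noising(atoms)
# Every noised entry is still one of the valid types {0, 1, 2}.
```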

Andrey Okhotin
Co-author of the paper, Research Assistant, Centre of Deep Learning and Bayesian Methods, AI and Digital Science Institute, Faculty of Computer Science, HSE University

The new model comprises two components: one for noising the object through iterative removal of information, and the other for learning to reverse the process by taking a step back in the chain. The model can be configured for eight types of distributions that support data constraints.

We have adopted a new structure for the reverse process. Previously, each subsequent state could be derived solely from one previous state, whereas now, each state of the object depends on all preceding ones. In this structure, information is aggregated into a single object, which we refer to as tail statistics, and then fed into the neural network to guide its subsequent steps. This enables us to train the model more effectively.
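The aggregation step described in the quote can be sketched as follows. A weighted sum stands in as the aggregator here; the actual tail statistic in the paper depends on the chosen noise distribution family, so this is only a schematic illustration:

```python
import numpy as np

def tail_statistic(states, weights=None):
    """Sketch of the 'tail statistics' idea.

    Instead of conditioning the denoiser on a single noisy state, all the
    relevant states are aggregated into one object that is fed to the
    neural network. A weighted average is used here for illustration; the
    true statistic depends on the noise distribution chosen.
    """
    states = np.stack(states)                     # (num_states, dim)
    if weights is None:
        weights = np.full(len(states), 1.0 / len(states))
    return np.tensordot(weights, states, axes=1)  # single aggregated object

tail = tail_statistic([np.ones(2), 3.0 * np.ones(2)])
```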

Dmitry Vetrov
Academic Supervisor, AI and Digital Science Institute, HSE University; Scientific Consultant, AIRI

The scientists compared the performance of their star-shaped model with that of conventional diffusion models. On text generation tasks in the standard mode, the new model matched the quality of conventional models. However, in the accelerated mode, with fewer generation steps, it outperformed them on image generation, producing samples closer to the original dataset.

In handling complex tasks which involved generating points in various geometric spaces—such as a sphere, a simplex, and a space of matrices describing ellipses—the star-shaped model demonstrated significantly better outcomes than the classical diffusion model.

In the problem of generating points on a sphere, the model was trained to mark locations on the Earth's surface where, according to the 2020 geodetic dataset, fires most frequently occurred. Following that, a comparison was made between the actual points and those generated by the model. The model was found to generate points in close proximity to the original dataset, and its output was comparable to that achieved by existing methods for solving this problem.
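For sphere-valued data of this kind, a natural noise distribution is the von Mises-Fisher distribution, which plays the role that the Gaussian plays in Euclidean space. Below is a sketch of sampling from it on the unit sphere using the known closed-form inversion for the three-dimensional case; the mean direction (the north pole) and the concentration parameter kappa are illustrative choices:

```python
import numpy as np

def sample_vmf_north_pole(kappa, n, seed=0):
    """Sample n points from a von Mises-Fisher distribution on the 2-sphere
    with mean direction (0, 0, 1).

    Uses the closed-form inverse CDF for the cosine of the polar angle,
    available in the 3D case. kappa controls concentration: larger kappa
    means samples cluster more tightly around the mean direction.
    """
    rng = np.random.default_rng(seed)
    u = rng.random(n)
    # Inverse CDF of w = cos(polar angle) for the 3D vMF distribution.
    w = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = 2.0 * np.pi * rng.random(n)             # uniform azimuthal angle
    r = np.sqrt(np.clip(1.0 - w**2, 0.0, None))   # radius in the x-y plane
    return np.stack([r * np.cos(phi), r * np.sin(phi), w], axis=1)

pts = sample_vmf_north_pole(kappa=50.0, n=1000)
# All samples lie exactly on the unit sphere, concentrated near the pole.
```

Every generated point satisfies the spherical constraint by construction, which is exactly what additive Gaussian noise in 3D coordinates fails to guarantee.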



Figure: the von Mises-Fisher distribution
In this paper, we propose a more versatile diffusion model capable of generating objects with complex structures. This will facilitate the application of such methods to a broader range of problems in the natural sciences, such as biology, physics, and chemistry, where there are structural constraints to generating objects like molecules, states of elementary particles, and chemical compounds.

Aibek Alanov
Co-author of the paper, Junior Research Fellow, Centre of Deep Learning and Bayesian Methods, AI and Digital Science Institute, HSE University; Research Fellow, AIRI


May 15