
Recent advances in Artificial Intelligence have brought a wave of innovations, from generating text with models like ChatGPT to generating images from plain-language descriptions. Several current text-to-image models can not only produce a new image from a textual description but also edit an existing image. Editing is generally harder than generating from scratch, since many fine details of the original image must be preserved. For precise text-based image editing, researchers have developed a new algorithm, EDICT: Exact Diffusion Inversion via Coupled Transformations. EDICT performs text-guided image editing with the help of diffusion models.
Text-to-image generation is a task in which a machine learning model is trained to produce an image from a given text description. The model learns to associate text descriptions with images and generates new images that match the specified description. EDICT can be used with any existing pretrained text-to-image diffusion model. In image generation, diffusion models are generative models that use a diffusion process to produce new images: the process starts from random noise and iteratively denoises it through a series of learned transformations until a clean image matching the description emerges, as sketched below.
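To make that loop concrete, here is a minimal sketch in Python, assuming a deterministic DDIM-style sampler. `predict_noise` is a hypothetical placeholder for a trained, text-conditioned noise-prediction network, and `alphas` stands in for the model's cumulative noise schedule; neither name comes from the EDICT release.

```python
import torch

def predict_noise(x, t, prompt_embedding):
    """Placeholder for a trained, text-conditioned noise predictor (e.g. a U-Net)."""
    raise NotImplementedError

def ddim_sample(prompt_embedding, alphas, shape=(1, 3, 64, 64)):
    """Deterministic DDIM-style sampling: start from pure noise and
    iteratively denoise until a clean image remains.
    `alphas` is a 1-D tensor of cumulative noise-schedule products."""
    x = torch.randn(shape)  # start from random noise
    for t in range(len(alphas) - 1, 0, -1):
        eps = predict_noise(x, t, prompt_embedding)
        a_t, a_prev = alphas[t], alphas[t - 1]
        # predict the clean image implied by the current noise estimate
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # step toward the previous (less noisy) timestep
        x = a_prev.sqrt() * x0_pred + (1 - a_prev).sqrt() * eps
    return x
```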

Diffusion models are trained to generate a noise-free image from a noisy one with the help of a textual description. To edit an image, the usual approach adds noise to the original image and then denoises this partially noised version using the new text. EDICT instead works on the concept of finding a noisy image that, given the original text prompt, would reproduce exactly the original image; it is essentially an exact noise-inversion technique, sketched below. This way, if the original text is modified slightly, the edited image remains almost unchanged apart from the required modifications.
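Because the sampler above is deterministic, it can be run algebraically in reverse to estimate such a noisy image. The sketch below shows this naive inversion under the same assumptions as before (same hypothetical `predict_noise` and `alphas`); note that it is only approximate, which is precisely the shortcoming EDICT's coupled transformations are designed to fix.

```python
def ddim_invert(x0, prompt_embedding, alphas):
    """Map a clean image x0 back to a noisy latent that approximately
    regenerates it. Naive inversion: each step reuses the less-noisy
    image for the noise prediction, so small errors accumulate."""
    x = x0
    for t in range(1, len(alphas)):
        eps = predict_noise(x, t, prompt_embedding)  # approximation: x is x_{t-1}, not x_t
        a_t, a_prev = alphas[t], alphas[t - 1]
        x0_pred = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()
        x = a_t.sqrt() * x0_pred + (1 - a_t).sqrt() * eps
    return x  # denoising this latent with the original prompt roughly recovers x0
```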
The team behind EDICT illustrates the algorithm's results with an example. When an existing image of a dog surfing is edited into an image of a cat surfing by simply adding noise and regenerating, many fine details are lost, such as the waves and the color of the board. In the EDICT technique, generation is instead run in reverse to find the noisy image that would exactly regenerate the original: starting from the noise-free image of the surfing dog and its caption, the model is traced backwards to recover the corresponding noise. The text is then adjusted simply by replacing the word dog with the word cat, and denoising from the recovered noise yields a far more detailed edited image of a cat surfing. Under the hood, EDICT works by making two identical copies of an image and alternately updating each with details from the other in a fully reversible manner, as shown in the sketch below.
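The following rough sketch of that coupled scheme is based on the description above and the update structure reported in the EDICT paper; the DDIM coefficients and the mixing weight `p` are illustrative, and `predict_noise` and `alphas` are the same hypothetical placeholders as before, so read this as an illustration rather than the authors' released implementation. Because each line is an affine update of one copy given the other, every step can be undone exactly in closed form.

```python
def edict_denoise_step(x, y, t, emb, alphas, p=0.93):
    """One coupled denoising step: each copy is updated using a noise
    prediction computed from the other, then the two are averaged."""
    a_t, a_prev = alphas[t], alphas[t - 1]
    a = (a_prev / a_t).sqrt()
    b = (1 - a_prev).sqrt() - (a_prev * (1 - a_t) / a_t).sqrt()
    x_inter = a * x + b * predict_noise(y, t, emb)
    y_inter = a * y + b * predict_noise(x_inter, t, emb)
    x_next = p * x_inter + (1 - p) * y_inter
    y_next = p * y_inter + (1 - p) * x_next
    return x_next, y_next

def edict_invert_step(x_next, y_next, t, emb, alphas, p=0.93):
    """Exact algebraic reversal of edict_denoise_step, undoing the
    averaging and the affine updates in the opposite order."""
    a_t, a_prev = alphas[t], alphas[t - 1]
    a = (a_prev / a_t).sqrt()
    b = (1 - a_prev).sqrt() - (a_prev * (1 - a_t) / a_t).sqrt()
    y_inter = (y_next - (1 - p) * x_next) / p
    x_inter = (x_next - (1 - p) * y_inter) / p
    y = (y_inter - b * predict_noise(x_inter, t, emb)) / a
    x = (x_inter - b * predict_noise(y, t, emb)) / a
    return x, y
```

In this picture, editing an image means initializing both copies to the original image, running `edict_invert_step` over all timesteps with the original caption to recover the noise exactly, and then running `edict_denoise_step` back down with the edited caption.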
This new approach certainly looks promising, as current text-guided editing methods are inconsistent and do not do justice to the details of the original image. By exactly reversing the generation process, important image content can be preserved. Given the rapid innovation in and demand for these image models, EDICT looks like a strong competitor to existing approaches.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.