🕶️ Upcoming Model Additions

Here are the other AI models that will soon be available to Core AI users:

Video-to-Anime: Mainly based on StyleGAN2 by rosinality, with parts adapted from UGATIT.

Video-Object-Replacement: Based on one-shot tuning of image diffusion models for text-to-video generation (Tune-A-Video).

Video-Colorization: Colorization using a generative color prior for natural images; an implementation of the ECCV 2022 paper (BigColor).

Make-Any-Image-Talk: Based on the pix2pix-pytorch framework, drawing on MakeItTalk, ATVG, RhythmicHead, and Speech-Driven Animation.

Image-to-Text: The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion to create cool art!
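
To illustrate, a minimal sketch using the open-source clip-interrogator package; the CLIP model name and image path are illustrative assumptions:

```python
# Sketch using the open-source clip-interrogator package (pip install clip-interrogator).
from PIL import Image
from clip_interrogator import Config, Interrogator

# BLIP proposes a caption; CLIP then ranks candidate modifier phrases against
# the image to extend the caption into a full prompt.
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))  # CLIP variant is an assumption
image = Image.open("example.jpg").convert("RGB")
prompt = ci.interrogate(image)
print(prompt)  # paste into a text-to-image model such as Stable Diffusion
```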

AI-Logo-Designer: Erlich is the text-to-image latent diffusion model from CompVis (with additions from glid-3-xl), fine-tuned on the Large Logo Dataset, a collection drawn from LAION-5B. The dataset consists of roughly 100K images of logos with captions generated via BLIP using aggressive re-ranking.

Image-Style-Transfer: Image style transfer with a single text condition (CLIPstyler).

Image-Segment: Image segmentation based on the Segment Anything Model (SAM).
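
For reference, a minimal prompt-based segmentation sketch using Meta's segment-anything package; the checkpoint file and click coordinates are placeholders:

```python
# Sketch using the segment-anything package (pip install segment-anything).
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")  # placeholder checkpoint path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click; SAM returns several candidate masks with scores.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 = foreground point
    multimask_output=True,
)
print(masks.shape, scores)
```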

Image-Restoration: Blind face restoration with a vector-quantized dictionary and parallel decoder (VQFR).

Object-Removal: Combines semantic segmentation and EdgeConnect architectures, with minor changes, to remove specified objects from photos.

Speech-to-Text: A general-purpose speech transcription model trained on a large dataset of diverse audio. It is a multi-task model that can perform multilingual speech transcription as well as speech translation and language identification.
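
This description matches OpenAI's open-source Whisper; assuming that, a minimal transcription and translation sketch (model size and file name are placeholders):

```python
# Sketch assuming the model is OpenAI's Whisper (pip install openai-whisper).
import whisper

model = whisper.load_model("base")  # model size is a placeholder

# Multilingual transcription; the spoken language is auto-detected.
result = model.transcribe("speech.mp3")
print(result["language"], result["text"])

# The same model also translates speech into English.
translated = model.transcribe("speech.mp3", task="translate")
print(translated["text"])
```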

Text-to-Video: Based on a multi-stage text-to-video diffusion model that takes a text description as input and returns a video matching it.
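
As a sketch, this is how such a model could be driven from Hugging Face diffusers; the ModelScope-style model id, prompt, and frame count are illustrative assumptions:

```python
# Sketch assuming a ModelScope-style text-to-video checkpoint on Hugging Face.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16  # assumed model id
).to("cuda")

# The multi-stage pipeline turns the description into a short frame sequence.
video_frames = pipe("a panda eating bamboo on a rock", num_frames=16).frames
export_to_video(video_frames, "panda.mp4")  # frame layout varies across diffusers versions
```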

Image-Super-Resolution: Image super-resolution with Stable Diffusion 2.0.
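
A minimal sketch assuming the x4 upscaler from the public Stable Diffusion 2.0 release, via Hugging Face diffusers; file names are placeholders:

```python
# Sketch using the Stable Diffusion 2.0 x4 upscaler via diffusers.
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")

low_res = Image.open("low_res.png").convert("RGB")
# A text prompt guides the diffusion-based upscaling.
upscaled = pipe(prompt="a sharp photo of a cat", image=low_res).images[0]
upscaled.save("high_res.png")
```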

Text-to-Music: A Stable Diffusion 2.0 model fine-tuned on images of spectrograms paired with text. Audio processing happens downstream of the model.
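
To make the downstream audio step concrete, here is a sketch of inverting a generated spectrogram image back into a waveform; the dB scaling, FFT size, and sample rate are illustrative assumptions, not the production pipeline:

```python
# Sketch of spectrogram-image-to-audio conversion with torchaudio.
import numpy as np
import torch
import torchaudio
from PIL import Image

img = np.array(Image.open("spectrogram.png").convert("L"), dtype=np.float32)
# Assume pixel intensity encodes mel magnitude on a dB-like scale (assumption).
mel_db = img / 255.0 * 80.0 - 80.0
mel = torch.tensor(10.0 ** (mel_db / 20.0)).flipud()  # image rows run top-down

n_fft, sample_rate = 2048, 44100  # assumed analysis parameters
to_linear = torchaudio.transforms.InverseMelScale(
    n_stft=n_fft // 2 + 1, n_mels=mel.shape[0], sample_rate=sample_rate
)
vocoder = torchaudio.transforms.GriffinLim(n_fft=n_fft)  # phase reconstruction

waveform = vocoder(to_linear(mel))
torchaudio.save("out.wav", waveform.unsqueeze(0), sample_rate)
```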

Text-Recognition: Based on the PaddleOCR ch_ppocr_server_v2.0_xx model.
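
For reference, a minimal sketch using the PaddleOCR Python package; the language setting and image path are placeholders:

```python
# Sketch using the paddleocr package (pip install paddleocr).
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang="ch")  # server models cover Chinese and English
result = ocr.ocr("receipt.jpg", cls=True)

for box, (text, confidence) in result[0]:  # recent versions nest results per image
    print(f"{text}\t{confidence:.2f}")
```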

Generate-Detailed-Images-from-Scribbled-Drawings: A ControlNet model that adapts Stable Diffusion 2.0 to use a line drawing (or "scribble") in addition to a text prompt when generating an output image.
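
A minimal sketch with Hugging Face diffusers; the checkpoints shown are the public SD 1.5 scribble ControlNet, used purely for illustration since the SD 2.0 variant's id is not given here:

```python
# Sketch of scribble-conditioned generation via a diffusers ControlNet pipeline.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

scribble = Image.open("scribble.png")  # white lines on a black background
# The text prompt and the line drawing jointly condition generation.
image = pipe("a cozy cabin in the woods", image=scribble).images[0]
image.save("cabin.png")
```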

AI-Face-Swap: Based on GHOST (Generative High-fidelity One Shot Transfer).

Voice-Changing: This model adopts the end-to-end framework of VITS for high-quality waveform reconstruction and proposes strategies for extracting clean content information without text annotation.

AI-Clothes-Changer: A Stable Diffusion model fine-tuned for clothes changing.

AI-Interior-Design: Generates interior designs based on Stable Diffusion.

Age-Prediction: Estimates age by computing CLIP similarity between an input image and age-related text prompts.
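
A minimal sketch of the idea: score the image against age-text prompts with CLIP and take the expected age. The prompt template and the expected-value readout are illustrative assumptions:

```python
# Sketch of CLIP-similarity age estimation (pip install git+https://github.com/openai/CLIP.git).
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

ages = list(range(1, 101))
prompts = clip.tokenize([f"a photo of a {a} year old person" for a in ages]).to(device)
image = preprocess(Image.open("face.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, prompts)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

# Expected age under the similarity distribution over age prompts.
predicted_age = sum(p.item() * a for p, a in zip(probs, ages))
print(f"predicted age: {predicted_age:.1f}")
```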

Style-Your-Hair: Latent optimization for pose-invariant hairstyle transfer via local-style-aware hair alignment, built on Barbershop.
