OmniTalker: Video to Video (Alibaba)
Voice Cloner and Text-to-Speech:
"Spark-TTS"
An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
---------------------------------------------------------------------------------------
Image + Audio = Video with Hands and Face movement
"EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation"
- Tested system environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
- Tested GPUs: A100 (80 GB) / RTX 4090D (24 GB) / V100 (16 GB)
- Tested Python versions: 3.8 / 3.10 / 3.11
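A quick way to sanity-check a machine against the tested matrix above before installing. This is a hypothetical helper, not part of the EchoMimicV2 repo; the CUDA version must be supplied manually (e.g. read from `nvcc --version`):

```python
# Hypothetical pre-install check (illustrative, not from the EchoMimicV2 repo):
# compare the local Python and CUDA versions against the tested matrix above.
import sys

TESTED_PYTHON = {(3, 8), (3, 10), (3, 11)}  # tested Python versions
MIN_CUDA = (11, 7)                          # repo states CUDA >= 11.7

def check_python(version):
    """True if (major, minor) matches a tested Python version."""
    return tuple(version[:2]) in TESTED_PYTHON

def check_cuda(cuda_version):
    """True if a (major, minor) CUDA version meets the minimum."""
    return tuple(cuda_version) >= MIN_CUDA

if __name__ == "__main__":
    print(check_python(sys.version_info), check_cuda((12, 1)))
```

Note that 3.9 is absent from the tested list, so `check_python((3, 9))` deliberately returns `False` even though the repo may still work there.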
---------------------------------------------------------------------------------------
Image + Audio = Video with only Face movement
"Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation"
-----------------------------------------------------------------------------------------------
Best Model: "Hunyuan Video"
-------------------------------------------------------------------------------------------------
GPU Poor version by DeepBeepMeep
- Greatly reduces RAM and VRAM requirements
- 5 profiles to run the model at a decent speed on a low-end consumer config (32 GB of RAM and 12 GB of VRAM) and at a very good speed on a high-end consumer config (48 GB of RAM and 24 GB of VRAM)
- Supports multiple pretrained LoRAs with 32 GB of RAM or less
- Easily switch between Hunyuan and Fast Hunyuan models and between quantized and non-quantized models
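The profile idea above can be sketched as a simple lookup from available memory to a profile. Everything here is illustrative: the function name, the profile numbering, and the thresholds are assumptions based only on the two configs mentioned (32 GB RAM / 12 GB VRAM and 48 GB RAM / 24 GB VRAM), not the actual DeepBeepMeep API:

```python
# Hypothetical sketch (names and numbering are assumptions, not the real API):
# map available RAM/VRAM to one of the 5 memory profiles, mirroring the
# low-end and high-end consumer configs mentioned above.

def pick_profile(ram_gb: int, vram_gb: int) -> int:
    """Return a profile index (1 = fastest, 5 = smallest memory footprint)."""
    if ram_gb >= 48 and vram_gb >= 24:
        return 1  # high-end consumer config: very good speed
    if ram_gb >= 32 and vram_gb >= 12:
        return 4  # low-end consumer config: decent speed
    return 5      # below the stated configs: maximum offloading

print(pick_profile(48, 24))  # 1
print(pick_profile(32, 12))  # 4
```

The design point is that profile selection trades speed for offloading: the less VRAM available, the more weights stay in system RAM and stream to the GPU on demand.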
--------------------------------------------------------------------------------------------------------------------
Intel OpenVINO Avatar:
Requires proprietary Intel hardware (Gaudi accelerator)
--------------------------------------------------------------------------------------------------------------------
Reference Video:
Comments