AI विश्व

Posts

Showing posts from April, 2025

Python

- April 24, 2025

Glossary: https://docs.python.org/3/glossary.html#glossary
Tutorial: https://docs.python.org/3/tutorial/index.html
What's New in Python 3.13: https://docs.python.org/3.13/whatsnew/3.13.html
Python Standard Library: https://docs.python.org/3.13/library/index.html
The Python Language Reference: https://docs.python.org/3.13/reference/index.html
Books: Python Distilled, Automate the Boring Stuff, Quick Python


Blogs

- April 21, 2025

https://mohitdagarwal.substack.com/p/from-dominance-to-dilemma-nvidia
https://cloud.google.com/blog
https://ludic.mataroa.blog/blog/i-accidentally-saved-half-a-million-dollars/


OCR

- April 17, 2025

Docling: https://github.com/docling-project
Docling Docs: https://docling-project.github.io/docling/installation/
OCR engines: EasyOCR, Tesseract, OcrMac, RapidOCR, OnnxTR
--
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model (GOT)
https://github.com/Ucas-HaoranWei/GOT-OCR2.0
https://huggingface.co/stepfun-ai/GOT-OCR-2.0-hf
Gives CUDA version and Transformers library errors.
--
olmOCR main page: https://olmocr.allenai.org/
https://github.com/allenai/olmocr
--
★ PDF-Extract-Kit (huge resource requirement): https://github.com/opendatalab/PDF-Extract-Kit
--
Reddit: https://www.reddit.com/r/LocalLLaMA/comments/172k9q2/best_model_for_document_layout_analysis_and_ocr/
--
★ Facebook AI Research Nougat: Neural Optical Understanding for Academic Documents: https://facebookresearch.github.io/nougat/
--
Donut 🍩: Document Understanding Transformer: https://github.com/clovaai/donut/
--
HURIDOCS New open-source AI tool unl...


LLM

- April 10, 2025

MAMBA https://www.datacamp.com/tutorial/introduction-to-the-mamba-llm-architecture


NVIDIA

- April 09, 2025

developer.nvidia.com:
https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/
https://docs.nvidia.com/ai-workbench/user-guide/latest/reference/version-history.html
Books: https://www.nvidia.com/en-in/training/books/
Main Book LDL Coding Examples: https://github.com/NVDLI/LDL
NVIDIA On-Demand: https://www.nvidia.com/en-us/on-demand/
NVIDIA Research: https://www.nvidia.com/en-us/research/


ML

- April 09, 2025

https://mixpeek.com/blog/turning-frames-into-dataframes


Websites of Interest

- April 09, 2025

37signals.com
https://www.paulgraham.com
https://www.paulgraham.com/ace.html
https://artificialanalysis.ai/
https://www.skild.ai/
https://stackblitz.com/careers
Unitree Bilibili: https://space.bilibili.com/521974986
https://lfaidata.foundation/projects/
- Docling: https://github.com/docling-project
- Open Platform for Enterprise AI: https://opea.dev/
  Linux Foundation OPEA Week Event: https://opea.dev/event/opea-genai-streamlined/#GenAI
- OpenVINO: https://openvinotoolkit.github.io/openvino_notebooks/
  https://openvinotoolkit.github.io/openvino_notebooks/?tasks=Image-to-Text
  https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks
https://www.thisiscolossal.com/ (photography news)
https://www.ycombinator.com/launches
https://www.cekura.ai/
https://www.ycombinator.com/launches/M57-cekura-formerly-vocera-testing-monitoring-for-ai-voice-agents
https://www.sievedata.com


Project Management

- April 09, 2025

https://www.paulgraham.com/makersschedule.html
PMBOK


Research Papers

- April 09, 2025

https://garrickbrazil.com/omni3d/
https://github.com/facebookresearch/sam2
Authors: https://scholar.google.com/citations?user=4LWx24UAAAAJ&hl=en
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs (used for post-training Llama to convert it into Nemotron reasoning): https://arxiv.org/abs/2411.19146
OmniTalker: https://arxiv.org/abs/2504.02433v1
Empowering LLMs to Understand and Generate Complex Vector Graphics: https://arxiv.org/abs/2504.02433v1
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
----------------------
https://graphics.stanford.edu/~maneesh/
Scaling In-the-Wild Training for Diffusion-Based Illumination Harmonization and Editing by Imposing Consistent Light Transport (Lvmin Zhang, Anyi Rao, Maneesh Agrawala)
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation (Lvmin Zhang and Maneesh Agrawala, Stanford University)
---------------------


3D Generation

- April 08, 2025

Hi👋3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging https://stable-x.github.io/Hi3DGen/


Image Generation Models and Theory

- April 08, 2025

Image Generation
★ SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation: https://nvlabs.github.io/Sana/Sprint/
★ EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer (uses multiple control nets): https://easycontrolproj.github.io/
Ghibli Studio Control Image Generation with EasyControl: https://huggingface.co/spaces/jamesliu1217/EasyControl_Ghibli
★ Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling: https://github.com/Alpha-VLLM/Lumina-mGPT-2.0
A1111 SD WebUI
Fooocus
Theory:
PixelCNN
Autoregressive decoders
Encoder-decoder architecture
Diffusion, random noise, denoising
Attention mechanism
DepthMap, CannyEdge Map, Pose Map
U-Net, ResNet, ControlNet, VGGNet
VAE (variational autoencoder): https://www.ibm.com/think/topics/variational-autoencoder https://www.tensorflow.org/tutorials/generative/cvae
Gaussian noise
Compressed latent representation
Cross-attention
ViT-L/14
Latent Diffusion Model (LDM) [backbone...


Meta AI Research // Metaverse

- April 08, 2025

Facebook AI Research main page: https://ai.meta.com/research/
Meta Motivo: a first-of-its-kind behavioral foundation model for embodied humanoid virtual agents. https://metamotivo.metademolab.com/
DINOv2: Learning Robust Visual Features Without Supervision. A family of models to encode visual features, evaluated across 30 different benchmarks covering 8 types of visual tasks, from image classification to monocular depth estimation. https://dinov2.metademolab.com/
-----------------------------------------------------------------------------------------
Metaverse: opportunity to begin developing early on
Meta Avatars: https://www.meta.com/avatars/
Meta Developers: https://developers.meta.com/horizon/develop
Meta Horizon Worlds: https://developers.meta.com/horizon-worlds/


AI Video

- April 08, 2025

AI Video Generation Models
Welcome to the Matrix
Hunyuan Video Generation main page: https://aivideo.hunyuan.tencent.com/
Hunyuan GPU requirements for generating 129 frames
-------------------------------------------------------------------------------------------------
Hunyuan Video "GPU Poor" version by DeepBeepMeep: https://github.com/deepbeepmeep/HunyuanVideoGP
- Greatly reduces the RAM and VRAM requirements
- 5 profiles, so the model can run at a decent speed on a low-end consumer config (32 GB of RAM and 12 GB of VRAM) and at a very good speed on a high-end consumer config (48 GB of RAM and 24 GB of VRAM)
- Supports multiple pretrained LoRAs with 32 GB of RAM or less
- Switch easily between Hunyuan and Fast Hunyuan models, and between quantized and non-quantized models
--------------------------------------------------------------------------------------------------------------------
Wan 2.1 Video Generation: https://wan.video/
👍 SOTA Performan...


Physical AI

- April 07, 2025

LeRobot: https://github.com/huggingface/lerobot
SO-100 full tutorial: https://github.com/huggingface/lerobot/blob/main/examples/10_use_so100.md
LeKiwi (with wheels): https://github.com/huggingface/lerobot/blob/main/examples/11_use_lekiwi.md
SO-ARM100 parts list: https://github.com/TheRobotStudio/SO-ARM100
French physical AI company: https://robots.phospho.ai/
ALOHA: https://tonyzhaozh.github.io/aloha/
Skild.ai: https://www.skild.ai/


AI Avatar

- April 05, 2025

OmniTalker: Video to Video (Alibaba): https://humanaigc.github.io/omnitalker/
ACTalker: https://github.com/harlanhong/ACTalker
Voice cloner and text-to-speech: "Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens" https://sparkaudio.github.io/spark-tts/
---------------------------------------------------------------------------------------
Image + Audio = Video with hands and face movement
"EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation" https://github.com/antgroup/echomimic_v2
Tested system environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
Tested GPUs: A100 (80G) / RTX 4090D (24G) / V100 (16G)
Tested Python versions: 3.8 / 3.10 / 3.11
---------------------------------------------------------------------------------------
Image + Audio = Video with only face movement
"Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation" https://fudan-generative-vi...


HACKER NEWS

- April 04, 2025

24 April
https://koomen.dev/essays/horseless-carriages/
https://terriblesoftware.org/2025/04/23/the-hidden-cost-of-ai-coding/
https://fedi.rib.gay/notes/a6xqityngfubsz0f (anti-piracy advertisement font)
https://ludic.mataroa.blog/blog/i-accidentally-saved-half-a-million-dollars/
Morphik RAG: https://github.com/morphik-org/morphik-core
★★★ https://www.pi.website/blog/pi05 (Physical Intelligence robot)
21 April
https://timsh.org/everyone-knows-your-location-part-2-try-it-yourself/
https://timsh.org/tracking-myself-down-through-in-app-ads/
https://www.ycombinator.com/companies/furtherai
https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/
https://ai.google.dev/gemma/gemmaverse
https://www.quiss.org/signal_carnival/ (old AV cables)
The appeal of serving your web pages with a single process
https://github.com/valine/training-hot-swap/
Show HN: Keep your PyTorch model in VRAM by hot swapping code (github.com/valine...


Docker

- April 04, 2025

Watchtower: a container-based solution for automating Docker container base image updates. With Watchtower you can update the running version of your containerized app simply by pushing a new image to Docker Hub or your own image registry. Watchtower will pull down your new image, gracefully shut down your existing container, and restart it with the same options that were used when it was deployed initially.
Documentation, including the command for running the watchtower container: https://containrrr.dev/watchtower/
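
The excerpt mentions a run command but does not show it. A minimal Python sketch of that step follows, assuming the quick-start form `docker run -d --name watchtower -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower`; check the linked documentation for the current command. The helper names `watchtower_command` and `start_watchtower` are hypothetical and only for illustration. By default the sketch prints the command instead of running it.

```python
import shlex
import subprocess


def watchtower_command(name: str = "watchtower") -> list[str]:
    """Build the argv for a basic `docker run` of Watchtower (assumed quick-start form)."""
    return [
        "docker", "run",
        "-d",                     # run detached, in the background
        "--name", name,           # container name (illustrative default)
        # Mount the Docker socket so Watchtower can talk to the Docker daemon.
        "-v", "/var/run/docker.sock:/var/run/docker.sock",
        "containrrr/watchtower",  # image name, assumed from the linked docs
    ]


def start_watchtower(dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    argv = watchtower_command()
    if not dry_run:
        # Assumes Docker is installed and the current user can reach the daemon.
        subprocess.run(argv, check=True)
    return shlex.join(argv)


if __name__ == "__main__":
    # Dry run: show the command without touching Docker.
    print(start_watchtower(dry_run=True))
```

Passing the arguments as a list avoids shell quoting problems; call `start_watchtower(dry_run=False)` only on a machine where starting the container is intended.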


UI UX

- April 03, 2025

Links:
Penpot: open-source Figma alternative, can run on Docker. https://penpot.app/
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers: https://icon-shop.github.io/
OmniSVG: https://github.com/OmniSVG/OmniSVG


MCP Model Context Protocol

- April 03, 2025

Links:
The S in MCP stands for Security: https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands-for-security-91407b33ed6b
OAuth's Role in MCP Security: https://defensiblesystems.substack.com/p/oauths-role-in-mcp-security
***MCP is not secure yet***
- https://github.com/modelcontextprotocol
  https://github.com/modelcontextprotocol/python-sdk
- Open WebUI:
  https://docs.openwebui.com/openapi-servers/mcp/
  https://github.com/open-webui/mcpo
  https://docs.openwebui.com/openapi-servers/open-webui/
  https://docs.openwebui.com/features/plugin/tools/development (tools)
  SearXNG: https://docs.searxng.org/
- MCP Postgres (official): https://github.com/modelcontextprotocol/servers/tree/main/src/postgres
- GitHub: Ollama-MCP-Postgres: https://github.com/robdodson/ollama-mcp-db/tree/main
- Stuzero PG-MCP Server: https://github.com/stuzero/pg-mcp
- Gumloop MCP: Open-Source MCP for All
  https://www.gumloop.com/mcp
  https://www.gumloop.com/blog/announcing-gumcp
- NPM - @modelcontextpro...


Segmentation Models

- April 03, 2025

Note: This page contains a large number of images and videos. Because of the size and quantity of the media, the content may take some time to load fully. Some images and videos may not appear immediately but will load shortly. The page may not work on mobile.
