Posts

Showing posts from April, 2025

Python

Glossary: https://docs.python.org/3/glossary.html#glossary
Tutorial: https://docs.python.org/3/tutorial/index.html
What's New in Python 3.13: https://docs.python.org/3.13/whatsnew/3.13.html
Python Standard Library: https://docs.python.org/3.13/library/index.html
The Python Language Reference: https://docs.python.org/3.13/reference/index.html
Books: Python Distilled, Automate the Boring Stuff, Quick Python

Blogs

https://mohitdagarwal.substack.com/p/from-dominance-to-dilemma-nvidia
https://cloud.google.com/blog
https://ludic.mataroa.blog/blog/i-accidentally-saved-half-a-million-dollars/

OCR

Docling: https://github.com/docling-project
Docling Docs: https://docling-project.github.io/docling/installation/
OCR Engines: EasyOCR, Tesseract, OcrMac, RapidOCR, OnnxTR
--
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model (GOT)
https://github.com/Ucas-HaoranWei/GOT-OCR2.0
https://huggingface.co/stepfun-ai/GOT-OCR-2.0-hf
Gives CUDA version and Transformers library errors
--
OLMOCR Main Page: https://olmocr.allenai.org/
https://github.com/allenai/olmocr
---
★ PDF-Extract-Kit (huge resource requirement)
https://github.com/opendatalab/PDF-Extract-Kit
--
Reddit: https://www.reddit.com/r/LocalLLaMA/comments/172k9q2/best_model_for_document_layout_analysis_and_ocr/
--
★ Facebook AI Research Nougat: Neural Optical Understanding for Academic Documents: https://facebookresearch.github.io/nougat/
--
Donut 🍩 : Document Understanding Transformer: https://github.com/clovaai/donut/
--
HURIDOCS New open-source AI tool unl...

LLM

 MAMBA https://www.datacamp.com/tutorial/introduction-to-the-mamba-llm-architecture

NVIDIA

Developer.Nvidia.Com
https://developer.nvidia.com/blog/build-enterprise-ai-agents-with-advanced-open-nvidia-llama-nemotron-reasoning-models/
https://docs.nvidia.com/ai-workbench/user-guide/latest/reference/version-history.html
Books: https://www.nvidia.com/en-in/training/books/
Main Book LDL Coding Examples: https://github.com/NVDLI/LDL
NVIDIA On-Demand: https://www.nvidia.com/en-us/on-demand/
NVIDIA Research: https://www.nvidia.com/en-us/research/

ML

  https://mixpeek.com/blog/turning-frames-into-dataframes

Websites of Interest

37signals.com
https://www.paulgraham.com
https://www.paulgraham.com/ace.html
https://artificialanalysis.ai/
https://www.skild.ai/
https://stackblitz.com/careers
Unitree Bilibili: https://space.bilibili.com/521974986
https://lfaidata.foundation/projects/
- Docling: https://github.com/docling-project
- Open Platform for Enterprise AI: https://opea.dev/
Linux Foundation OPEA Week Event: https://opea.dev/event/opea-genai-streamlined/#GenAI
- OpenVINO: https://openvinotoolkit.github.io/openvino_notebooks/
https://openvinotoolkit.github.io/openvino_notebooks/?tasks=Image-to-Text
https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks
- https://www.thisiscolossal.com/ Photography News
https://www.ycombinator.com/launches
https://www.cekura.ai/
https://www.ycombinator.com/launches/M57-cekura-formerly-vocera-testing-monitoring-for-ai-voice-agents
https://www.sievedata.com

Project Management

  https://www.paulgraham.com/makersschedule.html PMBOK

Research Papers

https://garrickbrazil.com/omni3d/
https://github.com/facebookresearch/sam2
authors: https://scholar.google.com/citations?user=4LWx24UAAAAJ&hl=en
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs (used for post-training Llama to convert it into Nemotron reasoning): https://arxiv.org/abs/2411.19146
OmniTalker: https://arxiv.org/abs/2504.02433v1
Empowering LLMs to Understand and Generate Complex Vector Graphics: https://arxiv.org/abs/2504.02433v1
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
----------------------
https://graphics.stanford.edu/~maneesh/
Scaling In-the-Wild Training for Diffusion-Based Illumination Harmonization and Editing by Imposing Consistent Light Transport. Lvmin Zhang, Anyi Rao, Maneesh Agrawala
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation. Lvmin Zhang and Maneesh Agrawala, Stanford University
---------------------

3D Generation

Hi👋3DGen: High-fidelity 3D Geometry Generation from Images via Normal Bridging https://stable-x.github.io/Hi3DGen/

Image Generation Models and Theory

Image Generation
★ SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
https://nvlabs.github.io/Sana/Sprint/
★ EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer (uses multiple control nets)
https://easycontrolproj.github.io/
Ghibli Studio Control Image Generation with EasyControl
https://huggingface.co/spaces/jamesliu1217/EasyControl_Ghibli
★ Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling
https://github.com/Alpha-VLLM/Lumina-mGPT-2.0
A1111 SD Webui
Fooocus
Theory:
PixelCNN, Autoregressive Decoders, Encoder-Decoder Architecture, Diffusion, random noise, denoising, attention mechanism, DepthMap, CannyEdge Map, Pose Map, U-Net, ResNet, ControlNet, VGGNet, VAE (Variational Autoencoder)
https://www.ibm.com/think/topics/variational-autoencoder
https://www.tensorflow.org/tutorials/generative/cvae
Gaussian Noise, Compressed Latent Representation, cross-attention, ViT-L/14, Latent Diffusion Model (LDM) [backbone...

Meta AI Research // Metaverse

Facebook AI Research Main Page: https://ai.meta.com/research/
Meta Motivo: A first-of-its-kind behavioral foundation model for embodied humanoid virtual agents.
https://metamotivo.metademolab.com/
DINOv2: Learning Robust Visual Features Without Supervision. A family of models to encode visual features, evaluated across 30 different benchmarks covering 8 types of visual tasks, from image classification to monocular depth estimation.
https://dinov2.metademolab.com/
-----------------------------------------------------------------------------------------
Metaverse: Opportunity to begin developing early on
Meta Avatars: https://www.meta.com/avatars/
Meta Developers: https://developers.meta.com/horizon/develop
Meta Horizon Worlds: https://developers.meta.com/horizon-worlds/

AI Video

AI Video Generation Models
Welcome to the Matrix
Hunyuan Video Generation Main Page: https://aivideo.hunyuan.tencent.com/
Hunyuan GPU Requirements for generating 129 frames
-------------------------------------------------------------------------------------------------
Hunyuan Video GPU Poor version by DeepBeepMeep: https://github.com/deepbeepmeep/HunyuanVideoGP
Greatly reduces the RAM and VRAM requirements
5 profiles, to be able to run the model at a decent speed on a low-end consumer config (32 GB of RAM and 12 GB of VRAM) and at a very good speed on a high-end consumer config (48 GB of RAM and 24 GB of VRAM)
Supports multiple pretrained LoRAs with 32 GB of RAM or less
Switch easily between Hunyuan and Fast Hunyuan models, and between quantized and non-quantized models
--------------------------------------------------------------------------------------------------------------------
Wan 2.1 Video Generation: https://wan.video/
👍 SOTA Performan...

Physical AI

LeRobot: https://github.com/huggingface/lerobot
SO-100 full tutorial: https://github.com/huggingface/lerobot/blob/main/examples/10_use_so100.md
LeKiwi (with wheels): https://github.com/huggingface/lerobot/blob/main/examples/11_use_lekiwi.md
SO-ARM100 Parts List: https://github.com/TheRobotStudio/SO-ARM100
French Physical AI Company: https://robots.phospho.ai/
ALOHA: https://tonyzhaozh.github.io/aloha/
Skild.ai: https://www.skild.ai/

AI Avatar

OmniTalker: Video to Video (Alibaba): https://humanaigc.github.io/omnitalker/
ACTalker: https://github.com/harlanhong/ACTalker
Voice Cloner and Text-to-Speech: "Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens"
https://sparkaudio.github.io/spark-tts/
---------------------------------------------------------------------------------------
Image + Audio = Video with Hands and Face movement
"EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation"
https://github.com/antgroup/echomimic_v2
Tested System Environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
Tested GPUs: A100 (80G) / RTX4090D (24G) / V100 (16G)
Tested Python Version: 3.8 / 3.10 / 3.11
---------------------------------------------------------------------------------------
Image + Audio = Video with only Face movement
"Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation"
https://fudan-generative-vi...

HACKER NEWS

24 April
https://koomen.dev/essays/horseless-carriages/
https://terriblesoftware.org/2025/04/23/the-hidden-cost-of-ai-coding/
https://fedi.rib.gay/notes/a6xqityngfubsz0f Anti piracy advertisement font
https://ludic.mataroa.blog/blog/i-accidentally-saved-half-a-million-dollars/
Morphik RAG: https://github.com/morphik-org/morphik-core
★★★ https://www.pi.website/blog/pi05 Physical Intelligence Robot
21 April
https://timsh.org/everyone-knows-your-location-part-2-try-it-yourself/
https://timsh.org/tracking-myself-down-through-in-app-ads/
https://www.ycombinator.com/companies/furtherai
https://developers.googleblog.com/en/gemma-3-quantized-aware-trained-state-of-the-art-ai-to-consumer-gpus/
https://ai.google.dev/gemma/gemmaverse
https://www.quiss.org/signal_carnival/ av cables old
The appeal of serving your web pages with a single process
https://github.com/valine/training-hot-swap/ Show HN: Keep your PyTorch model in VRAM by hot swapping code (github.com/valine...

Docker

Watchtower: a container-based solution for automating Docker container base image updates. With Watchtower you can update the running version of your containerized app simply by pushing a new image to Docker Hub or your own image registry. Watchtower will pull down your new image, gracefully shut down your existing container, and restart it with the same options that were used when it was deployed initially. The command for running the Watchtower container is given in the docs: https://containrrr.dev/watchtower/
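The run command itself did not survive in the note above. As a minimal sketch, assuming the quick-start invocation from the Watchtower docs (the `containrrr/watchtower` image with the Docker socket mounted), the command can be assembled in Python as below. The helper name `watchtower_command` and the default container name are illustrative choices, not part of Watchtower.

```python
import shlex
from typing import List

# Path of the Docker socket that Watchtower needs in order to talk to the daemon.
DOCKER_SOCKET = "/var/run/docker.sock"


def watchtower_command(name: str = "watchtower") -> List[str]:
    """Build the argv for starting Watchtower as a background container.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    return [
        "docker", "run",
        "--detach",                                      # run in the background
        "--name", name,                                  # container name (assumed default)
        "--volume", f"{DOCKER_SOCKET}:{DOCKER_SOCKET}",  # mount the Docker socket
        "containrrr/watchtower",                         # image name from the Watchtower docs
    ]


if __name__ == "__main__":
    # Print the command as a single shell line for copy-pasting.
    print(shlex.join(watchtower_command()))
```

Passing the list to `subprocess.run(watchtower_command(), check=True)` would start the container on a machine where Docker is installed. Check the linked docs for current flags before relying on this.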

UI UX

Links:
Penpot: open-source Figma alternative, can work on Docker
https://penpot.app/
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
https://icon-shop.github.io/
OmniSVG: https://github.com/OmniSVG/OmniSVG

MCP Model Context Protocol

Links:
The S in MCP stands for Security: https://elenacross7.medium.com/%EF%B8%8F-the-s-in-mcp-stands-for-security-91407b33ed6b
OAuth's Role in MCP Security: https://defensiblesystems.substack.com/p/oauths-role-in-mcp-security
***MCP is not secure yet***
- https://github.com/modelcontextprotocol
https://github.com/modelcontextprotocol/python-sdk
- Open Web UI
https://docs.openwebui.com/openapi-servers/mcp/
https://github.com/open-webui/mcpo
https://docs.openwebui.com/openapi-servers/open-webui/
https://docs.openwebui.com/features/plugin/tools/development //tools
SearXNG: https://docs.searxng.org/
- MCP Postgres Official: https://github.com/modelcontextprotocol/servers/tree/main/src/postgres
- Github: Ollama-MCP-Postgres: https://github.com/robdodson/ollama-mcp-db/tree/main
- Stuzero PG-MCP Server: https://github.com/stuzero/pg-mcp
- Gumloop MCP: Open-Source MCP for All
https://www.gumloop.com/mcp
https://www.gumloop.com/blog/announcing-gumcp
- NPM - @modelcontextpro...

Segmentation Models

Note: This page contains a large number of images and videos. Because of their size and quantity, the content may take some time to load fully. Some images and videos may not appear immediately but will load shortly. The page may not work on mobile.