Hugging Face’s SmolVLM is a compact open multimodal model that accepts arbitrary sequences of image and text inputs and produces text outputs. Its headline feature is efficiency: it requires only 5.02 GB of GPU RAM.

December 2, 2024 // by Finnovate