Robotics' Gemma moment

SmolVLA from Hugging Face is one of the more exciting open-source releases because it moves the AI debate out of the chat box. Here, the focus is on models that can connect vision, language, and action.

VLA stands for vision-language-action. A VLA model takes visual input, such as camera images, together with a language command, and outputs the actions the robot should perform.
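To make the definition concrete, here is a minimal sketch of what a vision-language-action interface could look like in Python. Every name in it (`Observation`, `VLAPolicy`, `HoldStillPolicy`, `control_step`) is a hypothetical placeholder chosen for illustration, not SmolVLA's or LeRobot's actual API, and the "policy" is a stand-in that simply keeps the arm still.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence


@dataclass
class Observation:
    """One time step of input to a VLA policy (hypothetical structure)."""
    image: Sequence[Sequence[float]]  # camera frame, here a tiny grayscale grid
    instruction: str                  # natural-language command
    joint_positions: List[float]      # current robot state, in radians


class VLAPolicy(Protocol):
    """Anything that maps vision + language + state to an action."""

    def select_action(self, obs: Observation) -> List[float]:
        """Return target joint positions for the next control step."""
        ...


class HoldStillPolicy:
    """Placeholder policy: ignores vision and language, keeps the arm in place.

    A real VLA would replace this with a trained network.
    """

    def select_action(self, obs: Observation) -> List[float]:
        return list(obs.joint_positions)


def control_step(policy: VLAPolicy, obs: Observation) -> List[float]:
    """Run one perception-to-action step and check the action's shape."""
    action = policy.select_action(obs)
    if len(action) != len(obs.joint_positions):
        raise ValueError("policy must return one target per joint")
    return action
```

The point of the sketch is the shape of the problem: images and a text instruction go in, and motor targets come out, once per control step.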

Launch: 2025
Ecosystem: LeRobot
VLA: vision, language, and action

Why SmolVLA is interesting

Robotics has traditionally required expensive hardware, closed systems, and specialized labs. SmolVLA points toward a more open approach: shared datasets, more affordable robot arms, code on GitHub, and models on Hugging Face.

This means more people can experiment with robot learning without building everything from scratch.

Open-source robotics matters most when the model, the data, and the hardware can all be studied together.

LeRobot as infrastructure

SmolVLA builds on LeRobot, Hugging Face's open framework for robotics. This makes the model more practical: you don't just get a weight file, but an ecosystem for datasets, training, evaluation, and robot setup.

For Norway, this could prove useful in education, automation, aquaculture, warehousing, laboratories, and small industrial environments that want to test robot AI without purchasing a fully closed platform.

Not humanoid hype

SmolVLA should not be read as a sign that everyone will soon have perfect robots at home. The practical breakthrough is far less dramatic: a lower barrier to experimentation.

Robots fail physically. They can drop objects, collide, be miscalibrated, or misread their environment. This is why open-source robotics must be more safety-oriented than ordinary software.

A robot model doesn't just need to be smart. It needs to be safe in the space it moves through.
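One common way to put a safety boundary between a learned model and the motors is to clamp every commanded action before it is sent to the robot. The sketch below assumes a simple arm controlled by joint position targets; the function name `clamp_action`, the joint limits, and the step size are illustrative assumptions, not values from SmolVLA or any specific robot.

```python
from typing import List, Sequence, Tuple


def clamp_action(
    target: Sequence[float],
    current: Sequence[float],
    limits: Sequence[Tuple[float, float]],
    max_step: float,
) -> List[float]:
    """Return a joint target that respects per-step and absolute limits.

    target   -- joint positions requested by the model (radians)
    current  -- joint positions the robot is in now (radians)
    limits   -- (low, high) allowed range for each joint (assumed values)
    max_step -- largest allowed change per control step (assumed value)
    """
    if not (len(target) == len(current) == len(limits)):
        raise ValueError("target, current and limits must have the same length")
    if max_step <= 0:
        raise ValueError("max_step must be positive")

    safe: List[float] = []
    for tgt, cur, (low, high) in zip(target, current, limits):
        # Limit how far the joint may move in a single control step.
        step = max(-max_step, min(max_step, tgt - cur))
        # Keep the resulting position inside the joint's allowed range.
        safe.append(max(low, min(high, cur + step)))
    return safe


# The model asks for a large jump on joint 0; only a small step is allowed.
print(clamp_action([2.0, -0.05], [0.0, 0.0], [(-1.0, 1.0), (-1.0, 1.0)], 0.1))
# [0.1, -0.05]
```

A filter like this does not make a robot safe on its own. It only bounds what a wrong prediction can do, which is why it belongs alongside a physical emergency stop and a clearly marked workspace.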

Conclusion

SmolVLA makes robotics more accessible, more learnable, and more community-driven. That matters because the AI of the future won't just write text; it will also need to understand and act in the physical world.

For Norwegian communities, this is a good time to start small: a robot arm on the table, open datasets, clear safety boundaries, and hands-on learning.