smolagents


AI & ML interests

None defined yet.

Recent Activity

akseljoonas updated a Space about 7 hours ago
smolagents/ml-agent
akseljoonas published a Space 7 days ago
smolagents/ml-agent
akseljoonas updated a Space 8 days ago
smolagents/ml-agent

victor
posted an update 6 days ago
Interesting article: use Claude Code to help open models write CUDA kernels (for example) by turning CC traces into Skills. They made a library out of it 👀

https://huggingface.co/blog/upskill
evalstate
posted an update 7 days ago
Hugging Face MCP Server v0.3.1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Streamable HTTP used for Gradio Connectivity
- SSE Transport (as Server) removed
- Proxy Configuration added for launch of sub-agent tools

victor
posted an update about 2 months ago
Nvidia is on a roll lately. Nemotron 3 Nano is my new fav local model, but here's the real flex: they published the entire evaluation setup. Configs, prompts, logs, all of it. This is how you do open models 🔥

https://huggingface.co/blog/nvidia/nemotron-3-nano-evaluation-recipe

evalstate
posted an update 3 months ago
Hugging Face MCP Server v0.2.46
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Add "discover" to Dynamic Space tool. Recommend deselecting "space_search" if using dynamic spaces.
evalstate
posted an update 3 months ago
Hugging Face MCP Server v0.2.45
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- New! Experimental dynamic_space tool.
- Default Image Generator changed to Qwen-Image-Fast
evalstate
posted an update 3 months ago
Hugging Face MCP Server v0.2.40
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Improved progressive disclosure and descriptions for Jobs tool.
abidlabs
posted an update 3 months ago
Why I think local, open-source models will eventually win.

The most useful AI applications are moving toward multi-turn agentic behavior: systems that take hundreds or even thousands of iterative steps to complete a task, e.g. Claude Code, computer-control agents that click, type, and test repeatedly.

In these cases, the power of the model lies not in how smart it is per token, but in how quickly it can interact with its environment and tools across many steps. In that regime, model quality becomes secondary to latency.

An open-source model that can call tools quickly, check that the right thing was clicked, or verify that a code change actually passes tests can easily outperform a slightly “smarter” closed model that has to make remote API calls for every move.

Eventually, the balance tips: it becomes impractical for an agent to rely on remote inference for every micro-action. Just as no one would tolerate a keyboard that required a network request per keystroke, users won’t accept agent workflows bottlenecked by latency. All devices will ship with local, open-source models that are “good enough” and the expectation will shift toward everything running locally. It’ll happen sooner than most people think.
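
To make the latency argument concrete, here is a back-of-the-envelope sketch. The step count, per-step inference times, and network round-trip time are illustrative assumptions, not measurements from the post, and `agent_wall_time` is a hypothetical helper.

```python
def agent_wall_time(steps: int, inference_s: float, network_rtt_s: float = 0.0) -> float:
    """Total wall-clock seconds for an agent making `steps` sequential model calls."""
    return steps * (inference_s + network_rtt_s)


# Hypothetical numbers: the remote model is twice as fast per call,
# but every call also pays a network round trip.
STEPS = 1000
local_total = agent_wall_time(STEPS, inference_s=0.25)
remote_total = agent_wall_time(STEPS, inference_s=0.125, network_rtt_s=0.5)

print(f"local:  {local_total:.0f} s")
print(f"remote: {remote_total:.0f} s")
```

Under these assumed numbers the per-step network cost dominates, so the faster remote model still finishes later. Different assumptions would of course shift the balance.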
evalstate
posted an update 3 months ago
Hugging Face MCP Server v0.2.35
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

$HF_TOKEN is expanded in Jobs Secrets environment variables.
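
As a rough illustration of what this kind of expansion means, the sketch below substitutes a `$HF_TOKEN` reference inside a secret value using Python's standard library. It is not the MCP Server's actual implementation; the `expand_secrets` helper, the `MY_SECRET` key, and the token value are hypothetical.

```python
from string import Template


def expand_secrets(secrets: dict[str, str], env: dict[str, str]) -> dict[str, str]:
    """Replace $NAME / ${NAME} references in each secret value with values from `env`.

    Names missing from `env` are left untouched (safe_substitute never raises for them).
    """
    return {key: Template(value).safe_substitute(env) for key, value in secrets.items()}


# Hypothetical secret that refers to the token by name instead of embedding it.
expanded = expand_secrets({"MY_SECRET": "$HF_TOKEN"}, {"HF_TOKEN": "hf_example"})
print(expanded)
```

In a real process, `dict(os.environ)` could be passed as `env` so that the reference resolves against the current environment.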
evalstate
posted an update 4 months ago
Hugging Face MCP Server v0.2.33
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Allow discovery of Product Documentation Library via the Search tool.
andito
posted an update 4 months ago
Finally, our new paper is out! "FineVision: Open Data Is All You Need"! 🥳
FineVision: Open Data Is All You Need (2510.17269)

If you've ever trained a VLM, you know this problem: nobody shares their data mixtures. It's a black box, making replicating SOTA work impossible.
We wanted to change that.

FineVision unifies 200 sources into 24 million samples. With 17.3 million images and 9.5 billion answer tokens, it's the largest open resource of its kind.

In the paper, we share how we built it:
🔍 finding and cleaning data at scale
🧹 removing excessive duplicates across sources
🤗 decontaminating against 66 public benchmarks

My favorite part is Figure 6 (in the video!). It's our visual diversity analysis. It shows that FineVision isn't just bigger; it's more balanced and conceptually richer than other open datasets.
NVIDIA's Eagle 2 paper highlighted just how critical this visual diversity is, and our results confirm it: models trained on FineVision consistently outperform those trained on any other open dataset on 11 benchmarks!

🎉 To celebrate the paper, I’m also releasing a concatenated and shuffled version of the full dataset! 👉 HuggingFaceM4/FineVision_full_shuffled

It’s ready to stream, so you can start training your own models right away:

from datasets import load_dataset
d = load_dataset("HuggingFaceM4/FineVision_full_shuffled", split="train", streaming=True)
print(next(iter(d)))
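
Because a streamed dataset is consumed as an iterator, one way to inspect a handful of samples without pulling the whole dataset is to slice the stream. This is a minimal sketch: `peek` is a hypothetical helper, and it is demonstrated on a stand-in generator so that it runs offline.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def peek(stream: Iterable[T], n: int) -> List[T]:
    """Return the first n items of a (possibly endless) stream without reading further."""
    return list(islice(stream, n))


def fake_stream() -> Iterator[dict]:
    """Stand-in for a streamed dataset: yields records forever."""
    i = 0
    while True:
        yield {"id": i}
        i += 1


first_three = peek(fake_stream(), 3)
print(first_three)
```

With the streamed dataset `d` from the snippet above, `peek(d, 3)` should fetch only the first three samples.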

A big shoutout to the first authors: Luis Wiedmann and Orr Zohar. They are rockstars!
merve
posted an update 4 months ago
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision tokens/performance ratio
> covers 100 languages
evalstate
posted an update 4 months ago
Hugging Face MCP Server v0.2.31
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- OpenAI Apps SDK Support for Gradio Content Generation spaces