Revolutionary 26M Parameter Model Needle Brings Tool Calling to Consumer Devices

Breaking: Needle Open-Sourced—Runs Function-Calling at 6000 Tok/s on Phones

A 26-million-parameter model called Needle has been open-sourced, enabling real-time tool calling on consumer devices such as phones, watches, and glasses. According to its creators, the model reaches a prefill speed of 6,000 tokens per second and a decode speed of 1,200 tokens per second on standard hardware. Cactus, the team behind it, released the weights on Hugging Face under an MIT license.
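To put those throughput figures in perspective, here is a back-of-the-envelope latency estimate. The two speeds come from the figures above; the prompt and output lengths are hypothetical, and the simple "prefill, then decode" model ignores tokenization, model loading, and any other overhead.

```python
# Throughput figures quoted in the article.
PREFILL_TOK_PER_S = 6_000
DECODE_TOK_PER_S = 1_200


def estimate_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency: time to prefill the prompt plus time to decode the output."""
    prefill_s = prompt_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s + decode_s


# Hypothetical request: 600 tokens of tool definitions plus query, 60 tokens of JSON out.
latency = estimate_latency_s(prompt_tokens=600, output_tokens=60)
print(f"{latency * 1000:.0f} ms")  # prints "150 ms"
```

Under these assumptions a single tool call would take about 150 milliseconds, split between 100 ms of prefill and 50 ms of decoding.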

Source: hnrss.org

“We were always frustrated by the little effort made towards building agentic models that run on budget phones,” said Henry, a developer at Cactus. “So we conducted investigations that led to an observation: agentic experiences are built upon tool calling, and massive models are overkill for it.”

What Makes Needle Different

Needle uses a “Simple Attention Network” architecture: it contains only attention and gating layers, with no traditional multilayer perceptrons (MLPs). The team argues that cross-attention is the right primitive for tool calling, which it describes as a retrieval-and-assembly task: matching a query to a tool name, extracting argument values, and emitting JSON.
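The three steps named above (match, extract, emit) can be illustrated with a toy example. This is not how Needle works internally: it uses plain keyword overlap and a regular expression in place of a model, and the tool names, fields, and output layout are hypothetical. It only shows the shape of the input and output in a single-shot function call.

```python
import json
import re

# Hypothetical tool definitions; names and fields are illustrative only.
TOOLS = [
    {"name": "set_timer", "description": "start a countdown timer",
     "parameters": ["minutes"]},
    {"name": "send_message", "description": "send a text message to a contact",
     "parameters": ["recipient", "body"]},
]


def pick_tool(query: str) -> dict:
    """Step 1: match the query to a tool by word overlap with its name and description."""
    words = set(re.findall(r"[a-z]+", query.lower()))

    def overlap(tool: dict) -> int:
        text = (tool["name"] + " " + tool["description"]).lower()
        return len(words & set(re.findall(r"[a-z]+", text)))

    return max(TOOLS, key=overlap)


def extract_arguments(tool: dict, query: str) -> dict:
    """Step 2: pull argument values out of the query (only `minutes` is handled here)."""
    args = {}
    if "minutes" in tool["parameters"]:
        match = re.search(r"(\d+)\s*min", query.lower())
        if match:
            args["minutes"] = int(match.group(1))
    return args


def call_tool(query: str) -> str:
    """Step 3: emit the selected tool and its arguments as JSON."""
    tool = pick_tool(query)
    return json.dumps({"name": tool["name"], "arguments": extract_arguments(tool, query)})


print(call_tool("Set a timer for 10 minutes"))
# prints {"name": "set_timer", "arguments": {"minutes": 10}}
```

A model replaces the hand-written matching and extraction rules, but the contract is the same: tool definitions and a query go in, one JSON call comes out.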

“Tool calling is fundamentally retrieval-and-assembly, not reasoning,” Henry explained. “Cross-attention is the right primitive for this, and FFN parameters are wasted at this scale.”
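As a rough illustration of what a layer with attention and gating but no feed-forward sublayer could look like, here is a minimal pure-Python sketch of a gated cross-attention update. It is an assumption for illustration, not Needle's actual layer: the width `D`, the random weight matrices, and the sigmoid gate are all hypothetical choices.

```python
import math
import random

random.seed(0)
D = 4  # hypothetical model width


def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


W_Q, W_K, W_V, W_G = (rand_matrix(D, D) for _ in range(4))


def gated_cross_attention(x, context):
    """One token state `x` attends over `context` vectors; a sigmoid gate scales the update.

    There is no feed-forward sublayer: the residual update comes from attention alone.
    """
    q = matvec(W_Q, x)
    keys = [matvec(W_K, c) for c in context]
    values = [matvec(W_V, c) for c in context]
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in keys]
    weights = softmax(scores)
    attended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(D)]
    gate = [1 / (1 + math.exp(-g)) for g in matvec(W_G, x)]
    return [xi + gi * ai for xi, gi, ai in zip(x, gate, attended)]


context = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]  # e.g. embedded tool definitions
x = [0.5, -0.5, 0.25, 0.0]                               # e.g. a query token state
print(gated_cross_attention(x, context))
```

The point of the sketch is structural: everything the token learns from the tool definitions arrives through the attention weights, and the gate decides how much of it to keep.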

The model was pretrained on 200 billion tokens using 16 TPU v6e chips over 27 hours, then post-trained on 2 billion tokens of synthesized function-calling data in just 45 minutes. That dataset was generated via Gemini across 15 tool categories, including timers, messaging, navigation, and smart home devices.
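The training figures above imply some useful ratios. The short calculation below uses only the numbers quoted in this article. The post-training throughput is left as an aggregate, since the article does not say how many chips that stage used.

```python
# Figures quoted in the article.
PARAMS = 26e6
PRETRAIN_TOKENS = 200e9
PRETRAIN_HOURS = 27
CHIPS = 16
POSTTRAIN_TOKENS = 2e9
POSTTRAIN_MINUTES = 45

# Training tokens seen per model parameter during pretraining.
tokens_per_param = PRETRAIN_TOKENS / PARAMS

# Aggregate and per-chip pretraining throughput.
pretrain_tok_per_s = PRETRAIN_TOKENS / (PRETRAIN_HOURS * 3600)
pretrain_tok_per_s_per_chip = pretrain_tok_per_s / CHIPS

# Aggregate post-training throughput (chip count not stated, so no per-chip figure).
posttrain_tok_per_s = POSTTRAIN_TOKENS / (POSTTRAIN_MINUTES * 60)

print(f"{tokens_per_param:,.0f} pretraining tokens per parameter")
print(f"{pretrain_tok_per_s:,.0f} tokens/s overall, {pretrain_tok_per_s_per_chip:,.0f} per chip")
print(f"{posttrain_tok_per_s:,.0f} tokens/s during post-training")
```

That works out to roughly 7,700 pretraining tokens per parameter and about 129,000 tokens per second per chip during pretraining.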

Background

Function calling, or tool use, is a key capability for agentic AI systems: it lets a model invoke external APIs and services. Until now, most such models have required billions of parameters, making them too large to run locally on consumer devices. Needle's results suggest that a model less than 1% the size of those multi-billion-parameter counterparts can compete on single-shot function-calling benchmarks.

In the team's own comparative tests, Needle outperformed FunctionGemma-270M, Qwen-0.6B, Granite-350M, and LFM2.5-350M on single-shot function calling. However, the team notes that those models have broader conversational capabilities. “We encourage you to test on your own tools via the playground and finetune accordingly,” Henry said.

What This Means

Needle’s extreme efficiency opens the door to running agentic AI directly on edge devices without cloud dependency. This could enable real-time voice assistants, smart glasses, and wearable task automation that respects user privacy and reduces latency.

The team also argues that the result extends beyond function calling. “We found that the ‘no FFN’ finding generalizes beyond function calling to any task where the model has access to external structured knowledge—like RAG, tool use, or retrieval-augmented generation,” the team wrote. “The model doesn’t need to memorize facts in FFN weights if the facts are provided in the input.”

Availability and Next Steps

Needle is available now on GitHub and Hugging Face. Developers can test it immediately and finetune the model on a standard Mac or PC. It is part of the broader Cactus project, a custom inference engine for mobile, wearables, and custom hardware.

The model is fully MIT licensed, and the team invites the community to experiment and adapt it for their own agentic use cases.
