
Welcome to the second field report from the Build Small Hackathon, where we explored emergent AI economies. Our latest build, “Thousand Token Wood” version 2, turns a simple sandbox into a high-stakes finance drama. This time you are not just an observer; you are a shadow financier pulling strings in a woodland market.
The original “Thousand Token Wood” offered a weather-god sandbox, where five woodland creatures traded goods, and you watched market bubbles and crashes unfold. It was a neat concept, but ultimately a passive experience. Version 2 fundamentally shifts this dynamic, placing you, the Patron of the Wood, squarely in the driver’s seat.
As the Patron, you can lend at interest, whisper market tips (both true and false), short assets, bribe key players, and forge alliances. But beware: a magistrate actively hunts you for insider trading, ready to fine you or freeze your assets. The biggest change, however, lies beneath the surface: each creature now runs on a distinct small model from a different lab, which makes the economy dynamic and unpredictable.
Crafting a Dynamic Digital Economy with Diverse AI Minds
The core innovation in Thousand Token Wood v2 isn’t just about making a game; it’s about embracing genuine heterogeneity. Instead of a council of agents powered by a single model with varied prompts, we deployed four distinct small models: OpenAI’s gpt-oss-20b, OpenBMB’s MiniCPM3-4B, NVIDIA’s Nemotron-Mini-4B, and our own fine-tuned Qwen 0.5B.
This diversity matters because a market comes alive when its participants think and react differently. Models from different labs, trained on different datasets and with different post-training protocols, naturally behave differently. An owl that hoards and a fox that speculates are not reading from one script; the council is a live, evolving argument.
One of our most significant discoveries was that the friction lay almost entirely in the serving layer, not in the modeling. To handle the varied outputs of these heterogeneous models, we implemented a **tolerant JSON parse-and-repair layer**. However a model’s tokenizer or formatting habits malform its output, the layer repairs what it can and drops the unsalvageable parts, so the simulation never crashes. Once this layer was in place, adding a new model became a configuration entry, not a refactor.
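As a rough illustration of what a tolerant parse-and-repair step can look like, here is a minimal Python sketch. It is not the project’s actual code: the function names, the `action`/`qty` schema, the allowed action list, and the fall-back-to-`hold` policy are all assumptions made for the example.

```python
import json
import re
from typing import Optional

# Hypothetical action schema; the real game's schema is not described here.
ALLOWED_ACTIONS = {"buy", "sell", "hold"}

# Matches a comma that sits directly before a closing brace or bracket.
TRAILING_COMMA_RE = re.compile(r",\s*([}\]])")


def extract_braced(text: str) -> Optional[str]:
    """Return the first balanced {...} span, ignoring braces inside strings."""
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None  # unbalanced braces: nothing salvageable


def tolerant_parse(raw: str) -> Optional[dict]:
    """Try strict JSON first, then one cheap repair (trailing commas)."""
    span = extract_braced(raw)
    if span is None:
        return None
    for attempt in (span, TRAILING_COMMA_RE.sub(r"\1", span)):
        try:
            value = json.loads(attempt)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


def salvage_action(raw: str) -> dict:
    """Keep the valid fields, drop the rest, and never raise."""
    parsed = tolerant_parse(raw) or {}
    action = parsed.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        return {"action": "hold"}  # assumed safe default
    clean = {"action": action}
    qty = parsed.get("qty")
    if isinstance(qty, int) and not isinstance(qty, bool) and qty > 0:
        clean["qty"] = qty
    return clean
```

With this shape, a reply wrapped in chatter or carrying a trailing comma still yields a usable action, and a reply with no JSON at all degrades to a no-op instead of an exception.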
Safeguarding Secrets in an AI-Driven World
The dramatic heart of Thousand Token Wood v2 is the insider tip you can whisper to a creature. These tips can be genuinely true forecasts of upcoming market movements or carefully planted bait. Acting on a true tip and profiting raises your “heat” with the magistrate, potentially leading to investigations, fines, or even exile.
For this mechanic to be truly engaging, the truthfulness of a tip must remain absolutely hidden from the creatures themselves. They should only see the rumor text, never a hidden “true” or “false” flag. This isn’t merely a UI nicety; it’s a fundamental security property, especially sharp when dealing with small models that are prone to repeating whatever they find in their prompts.
Our solution is a strict firewall: the truth flag lives entirely **off-prompt**, stored only on the player’s ledger. It is stripped from the public event record when that record is constructed, so the narrator only summarizes publicly observable events. A dedicated test scans *every creature’s full prompt* each turn for banned tokens. This test is arguably the most important in our entire suite, and it reinforces a core principle: when you entrust an AI agent with secret information, assume it will leak unless a rigorous test proves it cannot.
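The firewall described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the project’s implementation: the class names, the `BANNED_TOKENS` list, and the prompt layout are hypothetical, and the creature name “Fenn” is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical banned tokens: names under which the secret flag might be
# serialized. The real list is not given in this report.
BANNED_TOKENS = ("is_true", "tip_truth", "truth_flag")


@dataclass(frozen=True)
class PublicEvent:
    """What creatures and the narrator may see. It has no truth field at all."""
    turn: int
    target: str
    rumor_text: str


@dataclass
class PatronLedger:
    """Player-side record; the only place the truth flag is stored."""
    tip_truth: Dict[int, bool] = field(default_factory=dict)


@dataclass
class World:
    ledger: PatronLedger = field(default_factory=PatronLedger)
    public_events: List[PublicEvent] = field(default_factory=list)
    _next_tip_id: int = 0

    def whisper(self, turn: int, target: str, rumor_text: str, is_true: bool) -> int:
        tip_id = self._next_tip_id
        self._next_tip_id += 1
        # The flag goes to the ledger; the public record gets the text only.
        self.ledger.tip_truth[tip_id] = is_true
        self.public_events.append(PublicEvent(turn, target, rumor_text))
        return tip_id


def build_prompt(creature: str, world: World) -> str:
    """Prompts are built from public events only, never from the ledger."""
    heard = [e.rumor_text for e in world.public_events if e.target == creature]
    lines = [f"You are {creature}, a trader in the wood.", "Rumors you have heard:"]
    lines += [f"- {text}" for text in heard] or ["- (none)"]
    return "\n".join(lines)


def assert_no_leak(prompts: Dict[str, str]) -> None:
    """Scan every full prompt for banned tokens; run this each turn in tests."""
    for creature, prompt in prompts.items():
        lowered = prompt.lower()
        for token in BANNED_TOKENS:
            if token in lowered:
                raise AssertionError(f"{token!r} leaked into {creature}'s prompt")


world = World()
world.whisper(1, "Oona", "Acorn prices will double by dusk", is_true=True)
world.whisper(1, "Fenn", "Honey is about to crash", is_true=False)
prompts = {name: build_prompt(name, world) for name in ("Oona", "Fenn")}
assert_no_leak(prompts)
```

The design choice worth noting is that the public event type cannot carry the flag in the first place, so the scan acts as a second line of defense rather than the only one.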
Building Persistent Personalities Without Prompt Bloat
To give our AI creatures a sense of continuous drama, we implemented persistent relationships. Each creature maintains a signed sentiment toward the Patron and toward other creatures, nudged by game events such as a shorted crop, a repaid loan, or a new alliance. A creature that turns hostile might refuse your loans or offer worse terms, while allied creatures stop undercutting each other and behave like a cartel.
The main challenge here was avoiding “prompt inflation.” Raw history grows without bound, and a small model quickly drowns in it, losing coherence. Our solution was to never place raw history in the prompt. Instead, the model only ever sees a one-line **bucketed summary** of its current feelings.
This summary, such as “you feel warmly toward Oona, wary of the Patron,” is derived from integer sentiment values and capped to the few strongest sentiments. Internal notes are kept for tracing, but they are bounded and never exposed to the prompt. The resulting behavioral bias is partly emergent, coming from the summary’s nudge, and partly mechanical: a strongly hostile creature deterministically refuses certain actions. This makes the system observable and testable rather than a matter of hope.
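To make the idea concrete, here is a minimal Python sketch of bounded integer sentiments, a capped one-line summary, and a deterministic refusal rule. The thresholds, event deltas, bucket labels, and cap are all assumptions chosen for illustration, and “Bram” is an invented creature name; the report does not give the real values.

```python
from typing import Dict

MAX_MENTIONS = 2         # assumed cap on sentiments shown in the prompt
SENTIMENT_LIMIT = 10     # assumed bound on stored sentiment values
HOSTILE_THRESHOLD = -6   # assumed cutoff for mechanical refusal

# Hypothetical nudges applied by game events.
EVENT_DELTAS = {"loan_repaid": 2, "crop_shorted": -4, "alliance_formed": 3}


def nudge(sentiments: Dict[str, int], other: str, event: str) -> None:
    """Apply an event's delta and clamp, so stored state stays bounded."""
    value = sentiments.get(other, 0) + EVENT_DELTAS[event]
    sentiments[other] = max(-SENTIMENT_LIMIT, min(SENTIMENT_LIMIT, value))


def bucket(value: int) -> str:
    """Map an integer sentiment to a coarse label; '' means neutral."""
    if value <= HOSTILE_THRESHOLD:
        return "hostile toward"
    if value <= -2:
        return "wary of"
    if value < 2:
        return ""
    if value < 6:
        return "warmly toward"
    return "devoted to"


def summarize(sentiments: Dict[str, int]) -> str:
    """One prompt line naming only the strongest non-neutral sentiments."""
    strongest = sorted(
        ((name, v) for name, v in sentiments.items() if bucket(v)),
        key=lambda item: (-abs(item[1]), item[0]),
    )[:MAX_MENTIONS]
    if not strongest:
        return "You feel neutral toward everyone."
    return "You feel " + ", ".join(f"{bucket(v)} {name}" for name, v in strongest) + "."


def accepts_loan(sentiments: Dict[str, int], lender: str) -> bool:
    """Mechanical rule: a strongly hostile creature always refuses."""
    return sentiments.get(lender, 0) > HOSTILE_THRESHOLD


feelings = {"Oona": 4, "the Patron": -3, "Bram": 1}
print(summarize(feelings))  # You feel warmly toward Oona, wary of the Patron.
```

Because the summary is a pure function of a small integer table, the prompt line stays the same length however long the game runs, and the refusal rule can be unit-tested without calling a model.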
Key Takeaways for Building with Small Models
Our journey through Thousand Token Wood v2, including a successful seeded run exercising all its complex mechanics, yielded invaluable insights into building with small models. The emergent information warfare, dynamic relationships, and leverage all functioned as designed.
- A small model shines as a reliable format generator but is an unreliable reasoner. This gap is effectively closed through smart structure, precise prompting, and targeted fine-tuning, rather than by simply scaling up the model.
- A heterogeneous council of agents is inherently more engaging and realistic than a homogeneous one. The cost for this added complexity is minimal once a robust serving layer is in place.
- Entrusting secret information to an agent is fundamentally a firewall problem. The solution belongs firmly in the data flow, rigorously proven by tests, not merely as an instruction within a prompt.
- Persistent memory is a cost-effective way to make AI agents feel alive. The trick is ensuring the prompt only ever sees a bounded, summarized version of that memory, which prevents information overload.
Small models, it turns out, can indeed power big adventures. We’re proud to share that the entire council’s operations, along with their detailed traces, are open for exploration.
Source: Hugging Face Blog