
Since its launch in November 2025, the original OlmoEarth model has empowered partners worldwide to tackle critical environmental challenges. From meticulously tracking mangrove deforestation to classifying drivers of forest loss and rapidly producing country-scale crop-type maps, OlmoEarth has scaled deployments to national, continental, and even global areas. Every new release brings us closer to our core mission: equipping organizations and communities with cutting-edge AI to protect people and our precious planet.
When you're processing vast areas of satellite imagery (tens to hundreds of thousands of square kilometers), efficiency isn't just a bonus; it's essential. Across the entire operational lifecycle of OlmoEarth, which covers data export, preprocessing, inference, and post-processing, compute is by far the largest cost. A more efficient model lets us support more partners on the OlmoEarth Platform, and lets anyone running OlmoEarth independently get results faster and at lower expense.
That's precisely why we developed OlmoEarth v1.1, a new family of models designed for efficiency. This update cuts compute costs by up to 3x while maintaining the robust performance of OlmoEarth v1 across a diverse mix of research benchmarks and real-world tasks developed with our partners. It's about doing more with less, empowering even greater impact.
Unlocking Greater Efficiency Through Token Optimization
At its heart, OlmoEarth leverages transformer-based models, an architecture that has become dominant in modern machine learning. To process the rich data from remote sensing, we first convert it into a sequence of discrete tokens that the model can understand and ingest. This tokenization process is a crucial step in preparing complex geospatial information for AI analysis.
Efficiency in transformer models is primarily controlled by two key levers: the overall model size, which is why we offer a family of models to fit various compute budgets, and the token sequence length. The cost of self-attention scales quadratically with the token sequence length, while the remaining layers scale linearly, so shortening the sequence reduces runtime cost at least proportionally. A common metric, Multiply-Accumulate Operations (MACs), helps estimate the computational load for a single model pass, with lower MACs generally indicating cheaper and faster inference.
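To make the relationship between sequence length and MACs concrete, here is a rough sketch of a MAC estimate for a plain transformer encoder. It assumes standard blocks (four attention projections, a two-layer MLP with 4x expansion) and ignores embeddings, normalization, and softmax; the function name and the example sizes are hypothetical and are not taken from the OlmoEarth code or configurations.

```python
def encoder_macs(seq_len: int, d_model: int, n_layers: int, mlp_ratio: int = 4) -> dict:
    """Rough MAC count for one forward pass of a plain transformer encoder.

    Assumption: standard encoder blocks; embeddings, norms, softmax and
    biases are ignored, so this is an estimate rather than an exact count.
    """
    # Q, K, V and output projections: four d_model x d_model matrices per token.
    proj = 4 * seq_len * d_model * d_model
    # MLP: d_model -> mlp_ratio * d_model -> d_model, applied per token.
    mlp = 2 * mlp_ratio * seq_len * d_model * d_model
    # Attention scores (Q K^T) plus weighted sum (A V): quadratic in seq_len.
    attn = 2 * seq_len * seq_len * d_model
    linear = n_layers * (proj + mlp)
    quadratic = n_layers * attn
    return {"linear": linear, "quadratic": quadratic, "total": linear + quadratic}


# Hypothetical sizes, chosen only for illustration.
full = encoder_macs(seq_len=3072, d_model=768, n_layers=12)
third = encoder_macs(seq_len=1024, d_model=768, n_layers=12)
print(f"linear part shrinks by    {full['linear'] / third['linear']:.1f}x")
print(f"attention part shrinks by {full['quadratic'] / third['quadratic']:.1f}x")
print(f"total shrinks by          {full['total'] / third['total']:.2f}x")
```

Under these assumptions, cutting the sequence to a third shrinks the linear terms by 3x and the attention terms by 9x, so the overall saving lands somewhere between the two depending on how long the sequence is relative to the model width.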
Rethinking Remote Sensing Tokens
The design of these tokens raises a fundamental question for transformer-based remote sensing models: what information should a single token represent? Consider Sentinel-2 imagery, a widely used data modality that we frequently process. A typical Sentinel-2 input is a tensor containing height, width (representing latitudinal and longitudinal pixels), a temporal dimension, and 12 distinct Sentinel-2 channels.
Our previous approach, in OlmoEarth v1, involved splitting this data into resolution-based spatial patches. Specifically, we would select a spatial patch size, say ‘p’ x ‘p’, and then create a unique token for each timestep and each resolution within that patch. For example, a Sentinel-2 input with two timesteps would yield six tokens per patch (two timesteps multiplied by three resolutions: 10m, 20m, and 60m), leading to a significant total token count.
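The per-patch arithmetic above can be sketched in a few lines. The grouping of the 12 Sentinel-2 bands by native resolution below follows the standard band definitions (with B10 omitted); whether OlmoEarth v1 uses exactly these groups is an assumption, and the names `BAND_GROUPS` and `tokens_per_patch` are hypothetical.

```python
# Sentinel-2 bands grouped by native resolution (B10 omitted, 12 bands total).
# Assumption: OlmoEarth v1's exact grouping may differ from this one.
BAND_GROUPS = {
    "10m": ["B02", "B03", "B04", "B08"],
    "20m": ["B05", "B06", "B07", "B8A", "B11", "B12"],
    "60m": ["B01", "B09"],
}


def tokens_per_patch(timesteps: int, band_groups: dict = BAND_GROUPS) -> int:
    """One token per (timestep, resolution group) for a single spatial patch."""
    return timesteps * len(band_groups)


def token_labels(timesteps: int, band_groups: dict = BAND_GROUPS) -> list:
    """List the (timestep, resolution) pair that each token in a patch covers."""
    return [(t, res) for t in range(timesteps) for res in band_groups]


print(tokens_per_patch(timesteps=2))
for label in token_labels(timesteps=2):
    print(label)
```

With two timesteps this yields the six tokens per patch described above: one for each combination of timestep and resolution group.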
While using a separate token per resolution is a common technique for processing Sentinel-2 data (used by models like Galileo and SatMAE, with SatMAE reporting improved results), it's not universal. Models like CROMA, for instance, use a single token for all bands regardless of resolution. This cuts the token count to a third, and the savings carry through pre-training, fine-tuning, and inference, because every stage processes the shorter sequence.
However, naively combining these resolution-specific tokens into a single token comes with a major drawback: a significant drop in performance. We observed a 10 percentage point decrease on the m-eurosat kNN benchmark, a standard task for remote sensing models. We hypothesize that separating Sentinel-2 bands into distinct tokens made it easier for OlmoEarth to model cross-band relationships, which are vital for understanding complex environmental phenomena. To merge tokens without sacrificing performance, we made significant modifications to our pre-training regimen, which are documented in detail in our technical report.
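As a rough illustration of how the two tokenization schemes compare over a whole input, the sketch below counts tokens for a single Sentinel-2 tensor. The tile size, patch size, and number of timesteps are hypothetical values chosen for the example, and `total_tokens` is an illustrative helper, not part of any OlmoEarth API.

```python
def total_tokens(height: int, width: int, timesteps: int,
                 patch_size: int, groups_per_patch: int) -> int:
    """Count tokens for one input split into patch_size x patch_size patches.

    groups_per_patch is 3 for one token per resolution (10m, 20m, 60m)
    and 1 when all bands share a single token.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be multiples of patch_size")
    patches = (height // patch_size) * (width // patch_size)
    return patches * timesteps * groups_per_patch


# Hypothetical input: a 128 x 128 pixel tile, 12 timesteps, 8 x 8 patches.
per_resolution = total_tokens(128, 128, 12, 8, groups_per_patch=3)
single_token = total_tokens(128, 128, 12, 8, groups_per_patch=1)

print(per_resolution, single_token)
print("token reduction:", per_resolution // single_token)
# Attention compares every token with every other token, so the number of
# token pairs falls with the square of the reduction.
print("attention pair reduction:", per_resolution**2 // single_token**2)
```

The token count drops by exactly the number of resolution groups, independent of tile size, patch size, or number of timesteps.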
OlmoEarth v1.1: A Win for Everyone
The result of this work is a model family that truly does more with less. At every size, OlmoEarth v1.1 is up to three times cheaper to run than its predecessor, OlmoEarth v1. This makes frequent, planet-scale map refreshes significantly more accessible and cost-effective for every team using OlmoEarth. If you're currently using a model from the original OlmoEarth family, we strongly encourage you to try OlmoEarth v1.1. It delivers comparable performance to v1 while requiring as little as a third of the compute, and although we observed minor regressions in specific cases (detailed in our technical report), most users will see a significant speedup during fine-tuning and inference for their tasks.
For researchers, pretrained remote sensing models present a complex landscape with many degrees of freedom, making it challenging to isolate the impact of specific changes. OlmoEarth v1.1 was trained on the same dataset as OlmoEarth v1, so performance differences between the two can be attributed to our methodological changes rather than to the data. We hope this controlled comparison will advance our understanding of the scientific principles governing the pretraining of models for remote sensing applications.
Ready to experience the enhanced efficiency of OlmoEarth v1.1? You can get started today by exploring the updated weights and training code, which include our versatile Base, Tiny, and Nano models. Join us in making cutting-edge geospatial AI more accessible and sustainable for a better planet.
Source: Hugging Face Blog