Why There's a Tanker in Central Madrid
We ingest about a million raw AIS messages every hour. Roughly four out of ten never make it to our database.
That is not because they are all wrong. Most of those rejected messages aren't position reports at all: they are vessel name broadcasts, safety messages, channel management commands, or interrogation requests that arrive on the same feed. Once you strip those out, you are left with the actual position data, and some of it is plainly impossible. A 300-metre oil tanker reporting its position as central Madrid. A bulk carrier allegedly doing 90 knots, which would make it faster than any warship. A cargo vessel that teleports from the North Sea to the Sahara Desert between two consecutive reports, three seconds apart.
The genuinely bad position data — invalid coordinates, impossible jumps, sentinel values from transponders that lost GPS — is a smaller fraction, probably in the single-digit percentages based on what we see and what academic literature reports. But even a few percent, at the volumes AIS produces, means tens of thousands of phantom ships drifting across continents every day.
This is AIS — the Automatic Identification System — the backbone of global vessel tracking. Every ship over 300 gross tonnes on an international voyage is required by SOLAS to broadcast its identity and location over VHF radio, every few seconds, around the clock. Around 400,000 vessels do this simultaneously, generating over 300 million messages per day. It is one of the largest real-time geospatial data streams on the planet.
Nobody warns you about this part.
We are VesselAPI, a two-person company in Málaga, Spain. We built a REST API that takes this raw radio data and turns it into something you can actually use. What follows is the story of how we learned, the hard way, that maritime data wants to lie to you — and the filtering we built to catch it.
What “Not Available” Looks Like at 161.975 MHz
The AIS specification — ITU-R M.1371-5, if you want to look it up — was designed by people who understood that transponders would sometimes have no idea where they are. So they built in sentinel values: specific numbers that mean “I don’t know.”
For position, the sentinels are latitude 91° and longitude 181°; speed over ground uses 102.3 knots and true heading uses 511. These are not real measurements. They are the AIS equivalent of a shrug. A transponder that has lost its GPS fix, or has just been powered on and hasn’t acquired satellites yet, is supposed to transmit these values. The problem is that plenty of systems downstream (tracking platforms, analytics tools, map renderers) don’t check for them. They plot the point. And suddenly you have a vessel at 91° latitude, which is one degree past the North Pole, in mathematical space that doesn’t physically exist.
We filter these out. Latitude over 90, longitude over 180, SOG at 102.3, heading at 511 — gone before they touch the database. The same goes for (0, 0) — Null Island, a fictional place in the Gulf of Guinea that is the most popular port on Earth if you believe unfiltered GPS data.
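A minimal sketch of that sentinel check, assuming the message has already been decoded into degrees and knots. The `PositionReport` type, its field names, and `hasSentinel` are hypothetical and only illustrate the four values listed above; this is not our production code.

```go
package main

import (
	"fmt"
	"math"
)

// PositionReport is a hypothetical decoded AIS position. The field names
// are illustrative and not taken from any particular decoder.
type PositionReport struct {
	Latitude  float64 // degrees
	Longitude float64 // degrees
	SOG       float64 // speed over ground, knots
	Heading   int     // true heading, degrees
}

// "Not available" sentinels from ITU-R M.1371.
const (
	latNotAvailable     = 91.0
	lonNotAvailable     = 181.0
	sogNotAvailable     = 102.3
	headingNotAvailable = 511
)

// hasSentinel reports whether any field carries a "not available" value.
// Small tolerances guard against decoders that produce float32 values.
func hasSentinel(p PositionReport) bool {
	const eps = 1e-6
	return math.Abs(p.Latitude-latNotAvailable) < eps ||
		math.Abs(p.Longitude-lonNotAvailable) < eps ||
		math.Abs(p.SOG-sogNotAvailable) < 0.05 ||
		p.Heading == headingNotAvailable
}

func main() {
	noFix := PositionReport{Latitude: 91, Longitude: 181, SOG: 102.3, Heading: 511}
	ok := PositionReport{Latitude: 36.72, Longitude: -4.42, SOG: 12.4, Heading: 270}
	fmt.Println(hasSentinel(noFix), hasSentinel(ok)) // prints: true false
}
```

Whether a heading of 511 alone should drop an otherwise valid position is a policy choice; the sketch follows the list above and rejects it.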
The Pipeline
When a raw AIS message arrives, it passes through four stages before it becomes an API response. We did not plan four stages. We started with coordinate bounds checking and kept adding layers as new classes of garbage revealed themselves.
Message Type Filtering
AIS has 27 message types. Types 1, 2, and 3 are Class A position reports, the bread and butter, broadcast every 2 seconds to 3 minutes depending on speed and navigational status. A container ship doing 20 knots and changing course reports every 2 seconds. The same ship at anchor drops to every 3 minutes. Types 18 and 19 are Class B position reports from smaller vessels. The other 22 message types (binary data, safety broadcasts, channel management, interrogation requests and the like) are not the vessel position reports we are after.
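The dependence of the reporting interval on speed and status can be sketched as a lookup. The intervals below are the Class A values from ITU-R M.1371 as we understand them. The function name `classAInterval` and the handling of the exact 3, 14 and 23 knot boundaries are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// classAInterval returns the nominal Class A reporting interval for a given
// speed over ground and status. Boundary handling (exactly 3, 14 or 23 knots)
// is an assumption of this sketch.
func classAInterval(sogKnots float64, anchoredOrMoored, changingCourse bool) time.Duration {
	switch {
	case anchoredOrMoored && sogKnots <= 3:
		return 3 * time.Minute
	case anchoredOrMoored:
		return 10 * time.Second
	case sogKnots > 23:
		return 2 * time.Second
	case sogKnots > 14 && changingCourse:
		return 2 * time.Second
	case sogKnots > 14:
		return 6 * time.Second
	case changingCourse:
		return 10 * time.Second / 3 // 3 1/3 seconds
	default:
		return 10 * time.Second
	}
}

func main() {
	fmt.Println(classAInterval(20, false, true)) // container ship turning at 20 knots
	fmt.Println(classAInterval(0, true, false))  // the same ship at anchor
}
```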
We were surprised how often non-position messages leaked into position processing. A Type 5 message (static and voyage data — ship name, dimensions, destination) has no coordinates but arrives on the same feed. Our first week in production, we had phantom entries with zeroed-out positions because we weren’t filtering on message type.
switch cache.MessageType {
case string(aisstream.POSITION_REPORT),
	string(aisstream.STANDARD_CLASS_B_POSITION_REPORT),
	string(aisstream.EXTENDED_CLASS_B_POSITION_REPORT):
	// valid position type
default:
	return nil
}
Three case labels. They fixed a category of bad data that had cost us two days of debugging.
MMSI Validation
Every AIS transponder has a Maritime Mobile Service Identity, a 9-digit number that encodes what kind of entity is broadcasting. Ship stations use MMSIs in the range 200,000,000 to 799,999,999, where the first three digits (the MID, or Maritime Identification Digits) roughly indicate the flag state’s region: 2xx for Europe, 3xx for the Americas, 4xx for Asia, and so on.
Outside that range, you get coast stations (prefixed 00), SAR aircraft (prefixed 111), man-overboard devices (972), and EPIRBs (974). All of these broadcast AIS, and none of them are ships. Then there are MMSIs that shouldn’t exist at all: misconfigured transponders with default factory values, test transmissions. We reject anything outside the vessel range:
// Ship station MIDs start with digits 2 through 7.
if cache.MMSI < 200000000 || cache.MMSI > 799999999 {
	return nil
}
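The prefixes mentioned above can be sketched as a small classifier, which is handy when you want to count what you are rejecting rather than silently dropping it. `classifyMMSI` and its category strings are hypothetical. The sketch treats a leading digit of 2 through 7 as a ship station, matching the MID regions listed earlier.

```go
package main

import "fmt"

// classifyMMSI maps an MMSI to a coarse station category using the prefix
// conventions described above. Category names are illustrative.
func classifyMMSI(mmsi int) string {
	if mmsi < 0 || mmsi > 999999999 {
		return "invalid"
	}
	s := fmt.Sprintf("%09d", mmsi) // restore leading zeros
	switch {
	case s[:2] == "00":
		return "coast station"
	case s[:3] == "111":
		return "SAR aircraft"
	case s[:3] == "972":
		return "man-overboard device"
	case s[:3] == "974":
		return "EPIRB"
	case s[0] >= '2' && s[0] <= '7':
		return "ship"
	default:
		return "other"
	}
}

func main() {
	// Example MMSIs are made up for illustration.
	for _, m := range []int{224123456, 2241234, 111224001, 972123456, 974123456} {
		fmt.Println(m, classifyMMSI(m))
	}
}
```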
There is also a subtler problem: MMSI sharing. When multiple vessels use the same MMSI — whether through misconfiguration or deliberate sanctions evasion — a single identity appears to teleport across oceans. Your tracking system shows one ship doing 4,000 knots because it is actually two ships on opposite sides of the Indian Ocean, alternating transmissions. This is a documented tactic used by the dark fleet. Kpler identified 261 vessels that spoofed AIS before being sanctioned. An estimated 600 to 1,000 vessels — roughly 10% of the global large oil tanker fleet — operate this way.
Coordinate Validation
After message type and MMSI filtering, we validate the coordinates themselves:
if cache.Latitude == 0 && cache.Longitude == 0 {
	return nil
}
if cache.Latitude < -90 || cache.Latitude > 90 ||
	cache.Longitude < -180 || cache.Longitude > 180 {
	return nil
}
The (0, 0) check catches Null Island — what happens when a GPS chipset defaults to zero. The bounds check catches both corrupted data and the sentinel values from the spec (91° and 181° both fall outside the valid range). Simple, fast, eliminates a remarkable amount of junk.
But it has a fundamental limitation: it cannot tell you whether a position is plausible, only whether it is possible. A ship reporting its position as downtown Lagos passes every check — valid latitude, valid longitude, valid MMSI. It is also on land. We could add coastline polygon checks, but at ~235 messages per second on a single t3.medium instance, the spatial computation cost does not justify the catch rate. Instead, we handle plausibility in the next layer.
Jump Detection
The first three stages examine each message in isolation. This one looks at the sequence.
For every MMSI, we keep the last known good position in a sync.Map — Go’s concurrent map, which fits here because reads vastly outnumber writes and the key set (active vessel MMSIs) is relatively stable. When a new position arrives, we compute the implied speed: how fast would this ship have to be moving to get from the last known position to the new one?
const maxSpeedKnots = 500

func approxDistanceKm(lat1, lon1, lat2, lon2 float64) float64 {
	const kmPerDeg = 111.0
	dLat := (lat2 - lat1) * kmPerDeg
	midLat := (lat1 + lat2) / 2.0 * math.Pi / 180.0
	dLon := (lon2 - lon1) * kmPerDeg * math.Cos(midLat)
	return math.Sqrt(dLat*dLat + dLon*dLon)
}

// In the detection logic:
distKm := approxDistanceKm(last.Latitude, last.Longitude, lat, lon)
speedKnots := (distKm / elapsedSeconds) * 3600.0 / 1.852
if speedKnots > maxSpeedKnots {
	suspectedGlitch = true
}
The threshold is 500 knots — about 926 km/h, roughly 10 times the speed of the fastest commercial vessel on Earth. If the implied speed exceeds that, the position is flagged as a suspected glitch.
Why 500 and not something tighter, like 30 or 50? Because AIS messages arrive out of order, with gaps, from multiple sources with different latencies. A container ship at 20 knots that misses a few reports and then sends a batch can look like a jump. Setting the threshold at physical impossibility means we catch genuine GPS failures — the Madrid tanker, the Sahara cargo ship — without flagging normal transmission delays. We had an earlier version with the threshold at 50 knots, and it was flagging container ships rounding headlands in the English Channel. That lasted about a day.
The equirectangular distance approximation is deliberate. For consecutive AIS reports — typically seconds to minutes apart, so sub-10 km distances — the error is well under 1% at normal shipping latitudes. Even at 70°N, above most commercial routes, it stays under 5%. And against a 500-knot threshold, none of that matters. Haversine would be wasted precision.
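To see how little the approximation costs, here is a hedged comparison against haversine. `approxDistanceKm` is the function shown earlier. `haversineKm`, `impliedSpeedKnots`, the 6,371 km Earth radius, and the sample coordinates are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// approxDistanceKm is the equirectangular approximation from the article.
func approxDistanceKm(lat1, lon1, lat2, lon2 float64) float64 {
	const kmPerDeg = 111.0
	dLat := (lat2 - lat1) * kmPerDeg
	midLat := (lat1 + lat2) / 2.0 * math.Pi / 180.0
	dLon := (lon2 - lon1) * kmPerDeg * math.Cos(midLat)
	return math.Sqrt(dLat*dLat + dLon*dLon)
}

// haversineKm is the great-circle distance on a sphere of radius 6371 km.
func haversineKm(lat1, lon1, lat2, lon2 float64) float64 {
	const earthRadiusKm = 6371.0
	const rad = math.Pi / 180.0
	dLat := (lat2 - lat1) * rad
	dLon := (lon2 - lon1) * rad
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(lat1*rad)*math.Cos(lat2*rad)*math.Sin(dLon/2)*math.Sin(dLon/2)
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(a))
}

// impliedSpeedKnots converts a distance and elapsed time into knots.
func impliedSpeedKnots(distKm, elapsedSeconds float64) float64 {
	return distKm / elapsedSeconds * 3600.0 / 1.852
}

func main() {
	// A short hop in the English Channel (illustrative coordinates).
	a := approxDistanceKm(50.90, 1.40, 50.91, 1.42)
	h := haversineKm(50.90, 1.40, 50.91, 1.42)
	fmt.Printf("approx %.3f km, haversine %.3f km, diff %.2f%%\n", a, h, 100*math.Abs(a-h)/h)

	// North Sea to the Sahara in three seconds (illustrative coordinates).
	jump := approxDistanceKm(56.0, 3.0, 23.0, 10.0)
	fmt.Printf("implied speed %.0f knots\n", impliedSpeedKnots(jump, 3))
}
```

For the short hop the two distances differ by a fraction of a percent, while the North Sea to Sahara jump implies a speed several orders of magnitude above the 500-knot threshold.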
The Cache Trick
The position cache has one design decision that makes the whole thing work: glitch positions do not update the cache.
if !suspectedGlitch {
	c.positionCache.Store(mmsi, &cachedPosition{
		Latitude:  lat,
		Longitude: lon,
		Timestamp: timestamp,
	})
}
If a ship sends a glitched position — say, it briefly appears in the Sahara — and we update the cache with that position, then the next real report from the English Channel would look like an impossible jump from the Sahara. One bad message poisons all future comparisons for that vessel.
By only caching clean positions, the system self-heals. A single GPS spike gets flagged. The next legitimate report compares against the last good position and passes normally. The glitch never propagates.
The cache has no eviction right now. It holds one entry per MMSI seen since startup, which comes to around 60-70K entries in steady state. At ~100 bytes per entry, that is under 10 MB. We should probably add a TTL to evict vessels that stop reporting, but in practice it has not been a problem yet.
Production
Here is our monitoring map from February 5, 2026 — the day we deployed the jump detection layer:
Green dots: valid vessel positions, tracing coastlines and shipping lanes. Red triangles: suspected GPS glitches, scattered across sub-Saharan Africa, the Brazilian interior, the South Atlantic. Production data, February 5, 2026.
Cintia pulled up the map the morning after we deployed it and called me over — “come look at Africa.” We’d been dismissing those positions as weird data for weeks. They were not weird. They were systematic GPS failures that had been flowing through to our API consumers the entire time.
In a 24-hour window: 20.3 million positions, 64,412 unique vessels. 966 H3 cells — a hexagonal spatial index, each cell about 250 km² at resolution 5 — contain only glitch positions. The Sahara, the Congo, landlocked South America. We should have caught it sooner.
Why AIS Data Is This Bad
How does a system used by 400,000 vessels, mandated by international convention, produce this much junk?
Start with the radio layer. Each AIS VHF channel gets exactly 2,250 time slots per minute. Class A transponders use SOTDMA — self-organizing time division multiple access — which lets them reserve a slot through a negotiation protocol. Class B CS transponders (used on smaller vessels) use carrier-sense TDMA, which is less deterministic: they listen for an opening and try to grab one. Newer Class B+ units also use SOTDMA, but the older CS units are still everywhere. In congested waters like the Singapore Strait, there are so many ships that Class B units cannot find empty slots. They fail to transmit, or their messages collide.
Then there is GPS itself. Multipath reflection — signals bouncing off containers, crane structures, bridge superstructure — introduces positioning errors. And there is a configuration problem that nobody talks about enough: if a navigator sets a GPS offset correction on the bridge, the AIS transponder broadcasts the wrong position by exactly that offset for the entire voyage. Not spoofing. Just a button nobody remembered to reset.
And then there is actual spoofing. In June 2017, around 20 vessels in the Black Sea simultaneously reported positions miles inland on Russian territory — the first widely documented case of mass maritime GPS spoofing. In 2019, hundreds of ships near Shanghai saw their positions form strange rotating circles — up to 200 metres in radius — near oil terminals and government buildings. C4ADS documented the pattern. MIT Technology Review described it as a form of GPS spoofing that had never been seen before. And in March 2026, per Windward’s GPS monitoring data, over 1,100 vessels in the Persian Gulf were disrupted in a single day following military strikes on Iran. Ships appeared inside airports, near nuclear facilities, deep in the Iranian interior.
AIS was designed for collision avoidance, not adversarial security. There is no authentication in the protocol. SOLAS can require you to carry a transponder. It cannot require the transponder to be honest.
Flag, Don’t Discard
Our first instinct was to delete glitch positions. Don’t store them, don’t serve them, pretend they never happened. We had it built that way for about three weeks before a customer asked whether we could expose the glitch data — they were building a monitoring tool that specifically needed to detect GPS manipulation patterns.
So we changed it. We store every glitch-flagged position with suspected_glitch: true and return it in API responses. If you are building a map, you filter them out. If you are doing sanctions compliance work, those anomalies are exactly what you are looking for.
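On the consumer side, splitting a response on that flag might look like the sketch below. Only `suspected_glitch` comes from our API as described here. The other JSON field names, the `Position` type, and `splitByGlitch` are assumptions for illustration, so check the actual response schema before relying on them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Position mirrors a hypothetical response item. Only suspected_glitch is
// named in the article; the other JSON field names are assumptions.
type Position struct {
	MMSI            int     `json:"mmsi"`
	Latitude        float64 `json:"latitude"`
	Longitude       float64 `json:"longitude"`
	SuspectedGlitch bool    `json:"suspected_glitch"`
}

// splitByGlitch decodes a JSON array of positions and separates clean
// positions from flagged ones.
func splitByGlitch(raw []byte) ([]Position, []Position, error) {
	var all []Position
	if err := json.Unmarshal(raw, &all); err != nil {
		return nil, nil, err
	}
	var clean, flagged []Position
	for _, p := range all {
		if p.SuspectedGlitch {
			flagged = append(flagged, p)
		} else {
			clean = append(clean, p)
		}
	}
	return clean, flagged, nil
}

func main() {
	raw := []byte(`[
		{"mmsi": 235000001, "latitude": 50.90, "longitude": 1.40, "suspected_glitch": false},
		{"mmsi": 235000001, "latitude": 23.00, "longitude": 10.00, "suspected_glitch": true}
	]`)
	clean, flagged, err := splitByGlitch(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(clean), "for the map,", len(flagged), "for anomaly analysis")
}
```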
Our filter pipeline is maybe 250 lines of Go. It runs on a single EC2 t3.medium (2 vCPUs, 4 GB RAM), the kind of instance you forget exists until the bill shows up. About a million raw messages come in per hour; roughly 600,000 clean positions come out the other end. Nothing about this is impressive infrastructure. We built it in a week and have not thought about it much since, except when it catches something bizarre and we go “huh, a ship in Chad.”
The ITU got the spec right. The problem is everything between the specification and the antenna — the part where the real world gets involved. If you are thinking about building on AIS data, budget more time for filtering than you think you need. We did not, and we spent a month serving phantom ships to paying customers before we noticed.