Hi, I did this. The link that was posted doesn't include presenter notes by default, which is leading to some confusion. Check out https://docs.google.com/presentation/d/1sowJrQQfgxnLCErb-CvU... instead.

IIRC the term the FBI used to describe these activities was "confidential", which is why they registered the planes to front companies they made up. Congress wanted to know more, so the FBI gave them a confidential briefing (https://apnews.com/article/1240a8a42edf4a86aff72a0246525a95):

    The FBI assured Congress in an unusual, confidential briefing that
    its plane surveillance program is a by-the-books operation short
    on high-definition cameras — with some planes equipped with
    binoculars — and said only five times in five years has it tracked
    cellphones from the sky.

    The FBI would not openly answer some questions about its planes,
    which routinely orbit major U.S. cities and rural areas. Although
    the FBI has described the program as unclassified and not secret,
    it declined to disclose during an unclassified portion of a
    Capitol Hill briefing any details about how many planes it flies
    or how much the program costs. In a 2009 budget document, the FBI
    said it had 115 planes in its fleet.

In case you missed it, pretty much the first place I posted about what I'd found was here at HN (https://news.ycombinator.com/item?id=9508812).

Since then, I've done some other stuff in a similar vein.

I created the Advisory Circular network of Twitter bots that post, in real time, whenever they see aircraft circling (https://twitter.com/lemonodor/status/1294002338215034880). The code is all open source. The bots have helped me (and hopefully other people) discover all sorts of interesting things that aircraft are doing, often right over our heads: power line inspections, dropping sterile fruit flies, tests of new military technologies over the Mojave Desert, retired attack helicopters fighting fires, and more.
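
For flavor, here's a minimal sketch of the kind of circling heuristic such a bot could use, assuming you already have time-ordered ADS-B track reports for a single aircraft. The Report type, the thresholds, and the "net turn over a window" rule are illustrative assumptions, not the actual Advisory Circular implementation (which, again, is open source):

    from dataclasses import dataclass

    @dataclass
    class Report:
        timestamp: float  # seconds since epoch
        track: float      # ground track in degrees, 0-360

    def turn_between(a: float, b: float) -> float:
        """Signed smallest turn from track a to track b, in degrees (-180..180)."""
        return (b - a + 180.0) % 360.0 - 180.0

    def is_circling(reports: list[Report],
                    window_s: float = 300.0,
                    min_turn_deg: float = 720.0) -> bool:
        """True if the aircraft's net turn over the last `window_s` seconds
        is at least `min_turn_deg` in one direction (two full circles,
        with these illustrative defaults)."""
        if len(reports) < 2:
            return False
        cutoff = reports[-1].timestamp - window_s
        recent = [r for r in reports if r.timestamp >= cutoff]
        net_turn = sum(turn_between(a.track, b.track)
                       for a, b in zip(recent, recent[1:]))
        return abs(net_turn) >= min_turn_deg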

ADS-B data includes information about navigation accuracy, and it turns out it's pretty easy to see when an aircraft is experiencing GPS/GNSS interference, and even to map it. I created GPSJam (https://gpsjam.org) to make that data accessible to the public (instead of, say, having to pay tens of thousands of dollars to geospatial intelligence companies). On that map you can see things like conflict zones, U.S. military tests and training in the Southwest, and Russia's concern about the increased risk of drone strikes deep inside its territory.
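
To make the idea concrete, here's a rough sketch of the kind of aggregation involved, assuming you have ADS-B reports carrying a navigation integrity field (NIC). The field names, grid size, and "NIC below 6 counts as degraded" threshold are my assumptions for illustration, not GPSJam's exact methodology:

    from collections import defaultdict

    def grid_cell(lat: float, lon: float, size_deg: float = 1.0) -> tuple[int, int]:
        """Bucket a position into a coarse lat/lon grid cell."""
        return (int(lat // size_deg), int(lon // size_deg))

    def interference_fractions(reports, bad_nic_below: int = 6, min_reports: int = 20):
        """For each grid cell, return the fraction of reports with degraded
        navigation integrity -- a crude proxy for GPS/GNSS interference."""
        totals = defaultdict(int)
        degraded = defaultdict(int)
        for r in reports:  # each report assumed to have .lat, .lon, .nic
            cell = grid_cell(r.lat, r.lon)
            totals[cell] += 1
            if r.nic < bad_nic_below:
                degraded[cell] += 1
        return {cell: degraded[cell] / n
                for cell, n in totals.items()
                if n >= min_reports}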

The coolest thing about all this stuff is that it's not really very hard to do. It turns out that as soon as you start paying attention to aircraft over the course of days and weeks, you immediately find mysteries to solve.

One other less well-defined project I'll mention: using Whisper on aircraft radio traffic. I think of ATC radio as a completely unindexed, unsearchable "dark web" of information, and Whisper can open it up and make it searchable. Whisper is the first speech recognition system I've seen that can handle not just the typically low-quality audio but also take contextual information into account. E.g., some of the most useful information in a transmission on an ATC frequency is the call sign of the aircraft, but it's very hard for most speech recognizers to transcribe accurately: "7XY" is essentially just as likely as "1AC". Short, basically random utterances are hell on speech recognizers. Whisper's killer feature, IMO (and weirdly one that people rarely seem to use or even know about), is its powerful language model and its ability to be prompted.

Level 1 prompt engineering for Whisper is simply using a prompt like "Let's pretend we're air traffic controllers" or something to prime it to expect the specialized ATC lingo vs. any other thing people might be talking about. This prompt is specific to ATC, but is otherwise very general.

Level 2 becomes specific to the frequency you're transcribing: "Cessna, El Monte Tower, cleared for the option runway 01." Now Whisper knows that it's ATC, and that the name of the tower (which it will hear a lot) is El Monte, and that there's a runway numbered 01.
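
Here's a minimal sketch of what levels 1 and 2 look like with the open-source whisper package. The model name, file name, and prompt wording are just examples, but initial_prompt is the real parameter doing the priming:

    import whisper

    model = whisper.load_model("small.en")

    # Level 1: generic priming so the decoder expects ATC phraseology at all.
    level1 = model.transcribe(
        "atc_clip.wav",
        initial_prompt="Air traffic control radio communications.",
    )

    # Level 2: frequency-specific priming with the tower name and runway.
    level2 = model.transcribe(
        "atc_clip.wav",
        initial_prompt="Cessna, El Monte Tower, cleared for the option runway 01.",
    )

    print(level2["text"])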

Level 3 is where you add additional time- and situation-dependent prompting to increase accuracy. If you look at ADS-B data, you can figure out which aircraft were in the area when the audio was recorded, and so might be the ones talking on the radio. You can create prompts using those call signs, greatly increasing transcription accuracy. (Some researchers have done work along these lines, pre-Whisper.)
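
Continuing the sketch above, level 3 just means building the prompt dynamically. The call sign list here is hypothetical, standing in for whatever you'd pull out of your own ADS-B logs for the relevant time and place, and the prompt wording is again just an example:

    import whisper

    def build_prompt(callsigns: list[str]) -> str:
        """Fold likely call signs into the prompt so Whisper's language
        model prefers them over acoustically similar alternatives."""
        return ("El Monte Tower traffic includes "
                + ", ".join(callsigns)
                + ". Cleared for the option runway 01.")

    # Hypothetical: call signs of aircraft that ADS-B showed near the
    # receiver around the time the clip was recorded.
    nearby_callsigns = ["N7XY", "N1AC", "Skyhawk 739ER"]

    model = whisper.load_model("small.en")
    result = model.transcribe("atc_clip.wav",
                              initial_prompt=build_prompt(nearby_callsigns))
    print(result["text"])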

An example of what Whisper makes possible: here's a "supercut" of every time a pilot or controller mentioned "laser", across multiple frequencies and multiple days: https://twitter.com/lemonodor/status/1578516727549153280

And here's an example of what I'd like to be able to do (I created it manually, but I don't think it's too far out of reach): a video showing the aircraft map synchronized with ATC audio across frequencies, from a few days ago when a Cessna busted the presidential TFR near Philadelphia: https://twitter.com/lemonodor/status/1605293275333607424

Not Whisper-related, but just a fun proof-of-concept of a browser extension that lets you click on aircraft on the map and listen to them on the radio: https://twitter.com/lemonodor/status/1521551159206416384

Is that project public? I'd love to hack on it.

It’s really just a series of experiments so far, no code to share.