About — Jan. 6 DOJ Archive

This archive is an attempt to restore what was already deleted and to preemptively archive a raft of material that has not yet been deleted but probably will be, given its thematic relationship to the material that was 86ed.

The Jan. 6 investigation, and the prosecutions that grew out of it, were among the largest in Justice Department history. This is the record the Justice Department is now trying to delete.

Any effort to erase history and replace it with lies warrants concerted pushback. In this case, the department has deleted a large repository of accessible public information about the storming of the Capitol and the individuals who did it. Those materials, unlike the court documents that underlie them, are written in plain language. They are easily digestible by anyone interested. And they contain fair-minded summaries of evidence that, in the overwhelming majority of cases, was either proven in court beyond a reasonable doubt or pleaded to by defendants who ultimately conceded its truth.

Our principle

There’s a broad principle here, and we want to state it very clearly: If the administration purges rule-of-law-sensitive materials from government websites, we will do everything in our power to restore them on Lawfare.

The principle, as we instructed Anthropic’s Claude in building the programs that recovered these statements, is that “net loss of information to the public should be zero.”

How we did it

We want to be very candid about how we did this: we used artificial intelligence—Claude, to be precise—and given the volume of material in question, we have not hand-checked every single public statement recovered, though we have spot-checked cases and found the work to be accurate. This project is a work in progress, and mistakes and omissions are both possible. Please bring any such matters to our attention at press@lawfaremedia.org.

The recovery involved three programs. The first conducted a systematic inventory of the deleted materials, determining which of them did and did not have mirrored copies on the Internet Archive's Wayback Machine, a service that takes regular snapshots of websites as a kind of archive of the internet designed to make historical erasures difficult.

Second, we had Claude build a program to ingest all of the recoverable material. Here is how Claude describes its methodology with respect to this project:

The deletions used two different patterns: some pages were straightforwardly removed (the URL now returns a “not found” error), but a large number were soft-deleted—the URL still loads successfully, but the page is now an empty 2.6-kilobyte shell with no content and no title. A naive crawler that only checked for error codes would have missed the soft-deletes entirely and reported the pages “fine.” We had to detect both.
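The difference between hard and soft deletes can be sketched as a small classifier over an HTTP status code and response body. This is a hypothetical illustration, not the project's actual rules: the 3,000-byte threshold and the "no title, no visible text" test are assumptions chosen to match the empty 2.6-kilobyte shell described above.

```python
import re
import urllib.error
import urllib.request

# Assumption: a 200 response at or below this size, with no title and no
# visible text, is treated as an empty "soft-deleted" shell.
SOFT_DELETE_MAX_BYTES = 3_000

_TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)
_HIDDEN_RE = re.compile(
    rb"<(?:script|style)\b[^>]*>.*?</(?:script|style)\s*>", re.IGNORECASE | re.DOTALL
)
_TAG_RE = re.compile(rb"<[^>]+>")


def classify(status: int, body: bytes) -> str:
    """Label a fetched page 'hard_deleted', 'soft_deleted', 'live', or 'other'."""
    if status in (404, 410):
        return "hard_deleted"
    if status != 200:
        return "other"
    if len(body) <= SOFT_DELETE_MAX_BYTES:
        match = _TITLE_RE.search(body)
        title = match.group(1).strip() if match else b""
        # Drop scripts, styles, and tags to see whether any visible text remains.
        visible = _TAG_RE.sub(b"", _HIDDEN_RE.sub(b"", body)).strip()
        if not title and not visible:
            return "soft_deleted"
    return "live"


def fetch(url: str, timeout: float = 30.0) -> tuple[int, bytes]:
    """Network call: return (status, body) without raising on HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()
```

A crawler that looked only at `status` would label every soft-deleted page as live; checking the body as well is what catches the empty shells.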

The recovery worked by reconstructing what should exist from the Internet Archive's Wayback Machine. We queried the Archive's index (its "CDX" interface) for every URL that had ever been captured under the relevant DOJ and FBI URL patterns, filtered to January-6-related content by keyword, and ended up with an inventory of 6,055 unique URLs.

For each one, we asked Wayback for its best pre-deletion snapshot, biasing toward the largest version captured before the May 25 scrub so that the empty post-deletion shells the Archive had also captured wouldn't contaminate the recovery. We then pulled the raw page bytes, extracted the text, and walked the page for every linked document, image, video, and audio file, retrieving each of those from Wayback as well.

A subsequent recovery pass, conducted in collaboration with the Internet Archive itself, audited the full inventory and confirmed that every URL was accounted for: either recovered as a distinct page or documented as a filter variant of a hub page we already had.

The archive comprises 5,772 distinct recovered pages: 4,166 FBI "Wanted" suspect pages, 1,144 USAO-DC defendant case press releases, 387 Capitol Breach case-database pages, 38 Main Justice (OPA) press releases, 31 FBI Washington Field Office press releases, and 6 FBI headquarters press releases. It also includes 4,783 linked images (predominantly the FBI Wanted suspect photographs, ~464 MB) and 1 linked PDF (the D.C. Metropolitan Police Department's January 15, 2021 "Persons of Interest" list, the wanted-faces document linked from the USAO-DC investigation hub).

Every recovered page carries a link back to the exact Wayback snapshot it was derived from, so any reader can independently verify that the recovered content matches the original government capture, byte for byte. A full audit of the inventory is published in this archive as recovery_report.csv.
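The core of that recovery loop can be sketched in a few functions: query the CDX index, keep captures made before the deletion date, prefer the largest, fetch the raw bytes through Wayback's `id_` URL form, and collect linked files. This is a minimal sketch under stated assumptions, not the project's code. The keyword list, file-extension list, and function names are hypothetical, and the CDX `length` field (the compressed size of the stored capture, not the page size) is used here only as a proxy for "largest version."

```python
import urllib.parse
from html.parser import HTMLParser

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Hypothetical keyword filter for January 6 related URLs.
KEYWORDS = ("capitol", "jan-6", "january-6")
# Hypothetical set of file types treated as linked documents or media.
MEDIA_EXTENSIONS = (".pdf", ".jpg", ".jpeg", ".png", ".gif", ".mp4", ".mp3", ".wav")


def cdx_query_url(prefix: str) -> str:
    """Build a CDX query for every HTTP 200 capture under a URL prefix."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original,statuscode,length",
        "filter": "statuscode:200",
    }
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def best_snapshots(rows: list[list[str]], cutoff: str) -> dict[str, tuple[str, int]]:
    """For each matching URL, pick the largest capture taken before `cutoff`.

    `rows` is the decoded CDX JSON, whose first row is the field header.
    `cutoff` is a 14-digit YYYYMMDDhhmmss timestamp, so string comparison
    orders timestamps correctly. Returns {original_url: (timestamp, length)}.
    """
    header, *records = rows
    col = {name: i for i, name in enumerate(header)}
    best: dict[str, tuple[str, int]] = {}
    for rec in records:
        ts, url, size = rec[col["timestamp"]], rec[col["original"]], rec[col["length"]]
        if ts >= cutoff or not size.isdigit():
            continue  # post-deletion capture, or no usable size
        if not any(k in url.lower() for k in KEYWORDS):
            continue  # not January 6 related
        length = int(size)
        current = best.get(url)
        # Prefer the larger capture; break ties with the later timestamp.
        if current is None or (length, ts) > (current[1], current[0]):
            best[url] = (ts, length)
    return best


def raw_snapshot_url(timestamp: str, original: str) -> str:
    """Wayback's id_ form serves the archived bytes without the replay banner."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


class LinkCollector(HTMLParser):
    """Collect href/src targets on a page that point at documents or media."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                absolute = urllib.parse.urljoin(self.base_url, value)
                path = urllib.parse.urlparse(absolute).path.lower()
                if path.endswith(MEDIA_EXTENSIONS):
                    self.links.add(absolute)
```

In use, one would fetch `cdx_query_url(...)` for each DOJ and FBI prefix, decode the JSON, and pass the rows to `best_snapshots` with the deletion date as the cutoff. Each chosen capture is then downloaded from `raw_snapshot_url` and fed to a `LinkCollector` whose base URL is the original page address, so relative links resolve to original government URLs that can themselves be looked up in Wayback. Excluding captures at or after the cutoff is what keeps the empty post-deletion shells out of the result.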

The third program is the website you are reading now, which organizes the recovered material into a searchable archive with full-text search, filters by defendant, charge, and home state, and a verification link back to the Internet Archive on every record.
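For readers curious how full-text search and the defendant, charge, and state filters fit together, a toy in-memory version might look like the sketch below. The `Record` fields and the all-terms-must-match rule are hypothetical simplifications, not the site's actual schema or search engine; a production archive would more likely sit on a dedicated search index.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical shape of one recovered page in the archive."""
    title: str
    text: str
    defendant: str
    state: str
    charges: set[str] = field(default_factory=set)
    wayback_url: str = ""  # verification link back to the source capture


def tokens(s: str) -> set[str]:
    """Lowercase word tokens used for simple full-text matching."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def search(records, query="", defendant=None, charge=None, state=None):
    """Return records containing every query term and matching all filters."""
    terms = tokens(query)
    hits = []
    for r in records:
        if defendant and defendant.lower() not in r.defendant.lower():
            continue
        if charge and charge not in r.charges:
            continue
        if state and state.upper() != r.state.upper():
            continue
        if terms and not terms <= tokens(r.title + " " + r.text):
            continue
        hits.append(r)
    return hits
```

Because every `Record` keeps its `wayback_url`, each search hit can carry the same verification link described above.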

Help us find more

If you know of other rule-of-law-sensitive materials being purged, let us know at press@lawfaremedia.org.
