
Technological Issues

Via Josh Marshall comes a copy of the SSCI Report featuring searchable text. Which leads to the question, why did we need some folks at MIT to put this together? Why didn't the committee release it this way? The document in question, pretty clearly, was typed up on a computer, and the technological process for turning a word processor file into a searchable PDF is neither difficult to master, nor some kind of high-level secret. Instead, though, the committee had someone print out a copy of the report, literally black out the redactions (instead of doing virtual redaction on the computer), and then scan the whole thing to create a non-searchable image file. This procedure seems to be both less useful to the public, and more logistically complicated than the alternative. Maybe it was just done by morons, but I can't help but think that the staff was deliberately trying to create a hard-to-use public version of the text so as to leave reporters maximally dependent on spin briefings from the staff rather than on the primary document. And as I've previously stated, an awful lot of stuff has been redacted -- I think there are people who don't really want us to know what this thing says.

July 13, 2004


Listed below are links to weblogs that reference Technological Issues:

» The Technologically Impaired Congress from PoliBlog
Matthew Yglesias isn't happy that the SSCI Report was in the form of an image, rather than a text document. Complains Matthew: Which leads to the question, why did we need some folks at MIT to put this together? Why didn't the... [Read More]

Tracked on Jul 13, 2004 4:50:55 PM

» Tech Savvy from Outside The Beltway
Matthew Yglesias sees either a conspiracy or gross incompetence on the part of those who released the Senate report on intelligence via a scanned photocopy rather than a searchable PDF document. Steven Taylor offers a simpler and more plausible expla... [Read More]

Tracked on Jul 14, 2004 8:32:10 AM


Who gets to read the non-redacted versions? All other Senators? Just some? All house members? Cabinet members? Just curious.

Posted by: bob mcmanus | Jul 13, 2004 10:09:10 AM

I'm guessing the redactions caused problems. I'm not sure there's a way to prevent the original text from showing up - in some fashion (encrypted?) - that couldn't be decrypted by smart folks.

Of course, they could've just taken the original docs and snipped out all the redactions and substituted "redacted". but then we wouldn't know how much was removed, would we? The current format provides a better feel for that.

Posted by: SKD | Jul 13, 2004 10:12:15 AM

"Over the past few years there have been numerous cases in which classified information has leaked to the public domain because it was censored using Adobe Acrobat's 'black box' feature."



Posted by: Ugh | Jul 13, 2004 10:13:53 AM

Even with the redaction in the scanned version, it's possible to figure out what the words are by their length. "Italian" or "SISMI" can be measured, you know.

Posted by: praktike | Jul 13, 2004 10:18:14 AM

Praktike -

That's easier done with a font like Arial in which all letters are the same width, whether upper or lower case. That eliminates one of the variables when measuring because you know the exact number of characters in the redacted portion (someone did this with the 9/11 commission report). However, the Iraq report was prepared in a font that does not have a uniform width, making it much harder to do because words like:


Can be the same width and yet vary by 3 characters.

Posted by: Ugh | Jul 13, 2004 10:24:35 AM

a font like Arial in which all letters are the same width

nitpick: Arial isn't a fixed width font.

Posted by: cleek | Jul 13, 2004 10:29:00 AM

my bad, was thinking of Courier.

Posted by: Ugh | Jul 13, 2004 10:32:02 AM

Can't you just pick the font and then compare?

Posted by: praktike | Jul 13, 2004 10:49:48 AM

Ugh's right. They don't do it because you can unredact electronic redactions. So set-tle.

Posted by: scda | Jul 13, 2004 10:51:39 AM

Actually, I think variable width fonts are more useful. Fewer words will have exactly the same length, allowing you to more accurately map length to word. For instance, initial and meme aren't actually the same length in whatever font Matt uses. Meme is actually somewhat longer.

With a fixed width font, on the other hand, you are essentially counting letters, which makes it much harder to decrypt.

Posted by: WilileStyle | Jul 13, 2004 10:56:23 AM

Gee whiz, for $350, they can have safe digital redacting!

What's the problem?

Posted by: Royko | Jul 13, 2004 11:01:24 AM

WillieStyle -

I think that might be true if you knew you had a single word and knew the length of the redaction. But most of the redactions in the report are of entire lines. So all you have is a redaction length and have to guess at the number of words and characters, as well as dealing with the fact that the characters are different widths.

In Courier, you at least know the number of characters, which means you have an additional known quantity in the equation. That reduces the number of possible solutions, making it easier to guess what was redacted.

Posted by: Ugh | Jul 13, 2004 11:13:47 AM
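
The two sides of this exchange can be put side by side in a small Python sketch. Everything in it is an assumption made for illustration: the per-character widths are invented units, not measurements of Arial, Courier, or the font used in the report, and the word list is arbitrary. It only shows the mechanics: in a fixed-width font a measured width gives you the character count, while in a proportional font words of quite different lengths can come out equally wide.

```python
# Hypothetical advance widths in arbitrary units. These numbers are
# invented for illustration and are NOT taken from any real font.
NARROW = set("iltfj")
WIDE = set("mw")


def char_width(ch, proportional=True):
    """Width of one character under the toy width model."""
    if not proportional:
        return 6  # fixed-width: every character takes the same space
    if ch in NARROW:
        return 3
    if ch in WIDE:
        return 9
    return 6


def text_width(text, proportional=True):
    """Total width of a string: the sum of its character widths."""
    return sum(char_width(c, proportional) for c in text)


def candidates(words, target, proportional=True):
    """Words whose rendered width exactly matches a measured redaction."""
    return [w for w in words if text_width(w, proportional) == target]


words = ["initial", "mime", "meme", "Italian", "title"]

# Proportional font: a 7-letter and a 4-letter word share the same width.
print(candidates(words, 27))                      # ['initial', 'mime']

# Fixed-width font: width 42 means exactly 7 characters, nothing else.
print(candidates(words, 42, proportional=False))  # ['initial', 'Italian']
```

Under these made-up widths, "meme" (30 units) is wider than "initial" (27 units), which is the effect WillieStyle describes, and "initial" and "mime" collide despite differing by three characters, which is the effect Ugh describes. Which consideration dominates for a real document depends on the actual font metrics and on how many words sit under each black bar.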

Mumble - Law - mumble - has to be a printed report - mumble - no law yet authorizing funds for electronic reporting of special senate committee reports?


Anyway - MIT did it. Open Source Government at work again, like the Miami Herald busting the Florida Felons List.

Posted by: Echo4Mike | Jul 13, 2004 11:40:16 AM

Congrats to the MIT folks. I hope they wrote it down because there's no way on God's green earth that the intel folks would let an electronic document with redactions out into the world.

Example: Back in my mis-spent youth I was a Technical lead in a CASE (Computer Aided Software Environment) company. No Such Agency bought some copies and I came out to install it. Once I got through the intense and remarkably personal screening (apparently I have no colon cancer), I was escorted into an office to install the software onto SUN boxes using QIC tape. The tape's write tab was broken off, and an agency engineer typed in all the commands. But once the install was complete and I reached for the tape, I was told that I could no longer hold the tape. Once the read-only tape had been in the tape drive of a secure machine you had to have the same clearance level as the machine (word level) to even hold the tape. So now there were two escort parties, one for me and one for the tape. Needless to say I had to leave the tape there.

Nope, I can't see anyone with that kind of security training allowing electronic redaction of secure documents.

Posted by: Jon Gallagher | Jul 13, 2004 1:04:08 PM

I think there's a much simpler explanation -- most PCs in the Federal government may barely be capable of running Acrobat. For all the talk about government waste (or more likely because of it), most of their computers are unimaginably ancient compared to what we're used to in the private sector. It was only a few years ago that my wife had a job with the Federal government, and worked on a 286 sitting on a card table, with a Korean-war era office chair to sit on. No exaggeration.

Posted by: Redshift | Jul 13, 2004 2:19:31 PM

I'm sure that they're required to do it this way as there have been some serious problems with computer redactions. Namely, if you redact a Word file, the redacted segments are generally recoverable. Same for a pdf. And probably for a pdfed Word file. Redactions pre-scanning are pretty safe, however.

Posted by: Ethan | Jul 13, 2004 6:43:21 PM

I thought the cool part was the redaction at the top of the title page. WTF?

Posted by: serial catowner | Jul 13, 2004 8:08:41 PM

Matthew - I think this qualifies as "close enough for government work."

Posted by: Crank | Jul 15, 2004 1:51:47 PM

This is insane. It is childishly simple to safely delete from a Word document.

The problem is that this is simple in Unix or Linux, assuming you know, say, vim and groff. Basically, one has to first delete EVERYTHING that Word makes invisible, and at that point you simply have a text file. What you delete from a text file is truly deleted -- what you see is what is there. Then you format it back so the document looks nice; groff is a relatively easy way to do it. Finally, you convert to PDF format by typing the magic word ps2pdf.

OK, so perhaps one needs an adult to do it. I did it on occasion because I avoid using Word, so to edit whocares.doc, I convert it first into whocares.txt, and then I delete irrelevant info like the ownership of the computers on which the document was prepared.

Actually, one can also do almost the same thing with the PDF file that results from scanning the text. Afterwards, every single deletion can appear as [...], while the file has no trace of what was deleted, how long it was, nothing.

Of course, even indicating what were the places of the deletions is optional.

Posted by: piotr berman | Jul 16, 2004 12:59:21 AM
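
The plain-text approach described in this comment can be sketched in a few lines of Python. This is a minimal illustration, not the commenter's actual procedure: the `redact` helper and the sample sentences are hypothetical, and the sketch assumes the document has already been reduced to plain text, so there is no hidden layer for the original words to survive in.

```python
MARKER = "[...]"


def redact(text, secrets):
    """Replace every occurrence of each secret string with a fixed marker.

    The marker has the same length no matter what it replaces, so the
    output reveals neither the removed words nor how long they were.
    """
    for secret in secrets:
        text = text.replace(secret, MARKER)
    return text


# Two different originals (hypothetical sentences) with secrets of
# different lengths produce identical redacted output.
a = redact("The source was SISMI in Rome.", ["SISMI"])
b = redact("The source was an Italian agency in Rome.", ["an Italian agency"])
print(a)  # The source was [...] in Rome.
print(b)  # The source was [...] in Rome.
```

The trade-off is the one SKD raised earlier in the thread: a fixed-size marker defeats width measurement, but readers can no longer tell how much material was removed.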
