Breached Unity

Data Creep

In addition to my PDF Farm, I've now got a pod farm going. In six months, I've managed to amass more than 2 GB of podcasts and audiobooks, the vast majority of which (about 95%) I have actually listened to. What to do with this stuff?

My dream would be an automatic Speech-to-Text engine running behind iTunes (where they all get imported) and in front of a robust indexing database. Then, while I'm searching for other bits of stuff, the indexed transcripts will return hits, too.
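To make the indexing half of that dream concrete, here's a rough sketch in Python. It assumes the speech-to-text step has already happened somewhere upstream and handed us plain-text transcripts; that part is not shown. The `TranscriptIndex` class, the episode IDs, the titles, and the transcript snippets are all made up for illustration, and a real setup would want a proper database rather than an in-memory dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TranscriptIndex:
    """Hypothetical in-memory inverted index: word -> set of episode IDs."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.titles = {}

    def add(self, episode_id, title, transcript):
        """Index one transcript (assumed to come from a speech-to-text step)."""
        self.titles[episode_id] = title
        for word in tokenize(transcript):
            self.postings[word].add(episode_id)

    def search(self, query):
        """Return sorted titles of episodes containing every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        hits = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            hits &= self.postings.get(word, set())
        return sorted(self.titles[e] for e in hits)


# Made-up sample data standing in for real transcripts.
index = TranscriptIndex()
index.add("ep1", "Episode A", "Today we talk about podcasting and RSS enclosures.")
index.add("ep2", "Episode B", "An audiobook chapter about memory and searching.")
index.add("ep3", "Episode C", "Searching audio is hard, but RSS makes delivery easy.")

print(index.search("rss"))        # episodes mentioning RSS
print(index.search("searching"))  # episodes mentioning searching
```

The point is only that once the audio is text, it can sit in the same search path as everything else: a query for a word returns the episodes that said it, right alongside hits from other documents.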

This kind of deep searching would offset some of what Maciej feared would happen if audio took over blogging. Winer and Curry have pushed podcasting through the roof, and for good reason. It's a medium that lends itself to exchanging lots of audio information automatically. It's convenient to listen to and digest, if you're an auditory person (which, I admit, I'm not).

But we don't yet have any way of building this stuff into our external memory banks. We can't search it, we can't deconstruct it, and we can't remix it easily. iTunes does have some cool bookmarking features, but that's certainly not enough. If this stuff is to become useful in a meaningful way (as in, more than just a distribution medium for indie rock and the rants of geeks in their moms' basements), we need to be able to look inside and extract the contents of these podcasts.


Saturday, July 09, 2005