This afternoon at the SIIA NetGain Convergence of Content and Technology conference, Brewster Kahle, the Founder & Digital Librarian of the Internet Archive, is speaking. I am most interested in hearing what Kahle has to say, esp. with regard to the work that's done at Placeblogger, and to the many sites that simply vanished when web hosting services like Geocities were shut down.
I asked Kahle about this before he spoke, and he and I agreed that, unfortunately, a lot of it will just go away. Some is archived and could be found via the WayBack Machine. Still, some is lost forever.
Kahle begins by saying that we could indeed have universal access to all knowledge. He's going to show how technological, copyright, and other issues are coming together...
Looking at text: if we try to put everything online, how much is there already, and what needs to be added? The 26 million books in the Library of Congress? And how hard is that to get online? One book is about 1MB. It would cost too much to put all of the Library of Congress online. But what would you get for it, and would anyone care?
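To get a feel for the scale, here is a quick back-of-the-envelope sketch using only the two figures quoted above (26 million books, about 1MB each). The function name and the choice of decimal terabytes are mine, not Kahle's.

```python
# Figures as quoted in the talk notes above.
BOOKS_IN_LIBRARY_OF_CONGRESS = 26_000_000
MEGABYTES_PER_BOOK = 1  # rough size of one book, per the talk


def total_terabytes(books: int, mb_per_book: float) -> float:
    """Return the total size in decimal terabytes (1 TB = 1,000,000 MB)."""
    return books * mb_per_book / 1_000_000


print(total_terabytes(BOOKS_IN_LIBRARY_OF_CONGRESS, MEGABYTES_PER_BOOK))  # 26.0
```

So, on these numbers, the whole collection comes to roughly 26 terabytes.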
We're starting to get print on demand services, as well as things that let you read online in a manner similar to turning pages in a book. So, we can go from digital to print, and then from print back to digital.
When print on demand went to countries such as Egypt, they found they had lots of old books, but not a lot of new books...
Amazon does much print on demand--we just don't know it!
One Laptop Per Child devices can be used like a Kindle. Kahle shows us one used in this manner.
How do we get all those books online? They shipped books to India and China, but found that scan-your-own was better. Robots broke down. So they created a special scanner with two mirrors that, while looking primitive, is highly effective at $0.10 per page! It takes about 12 hours for the computer to do the processing to a PDF (very time consuming but effective).
There are 18 scanning centers across the U.S., with 200 people working for the Internet Archive and 50 scanning books. They get about 1,000 books scanned a day!
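A rough sketch of what those scanning numbers imply. The $0.10 per page and 1,000 books a day come from the talk, and the 26 million books is the Library of Congress figure quoted earlier; the 300 pages per book is purely my own assumption for illustration.

```python
CENTS_PER_PAGE = 10    # $0.10 per page, from the talk
BOOKS_PER_DAY = 1_000  # combined scanning output, from the talk
PAGES_PER_BOOK = 300   # ASSUMPTION: average length, not stated in the talk


def cost_per_book_dollars(pages: int, cents_per_page: int) -> float:
    """Scanning cost for one book, in dollars."""
    return pages * cents_per_page / 100


def days_to_scan(books: int, books_per_day: int) -> float:
    """Days needed to scan a collection at a steady daily rate."""
    return books / books_per_day


print(cost_per_book_dollars(PAGES_PER_BOOK, CENTS_PER_PAGE))  # 30.0
print(days_to_scan(26_000_000, BOOKS_PER_DAY))                # 26000.0
```

Under that page-count assumption, a book costs about $30 to scan, and at the current rate a Library of Congress-sized collection would take about 26,000 days, or roughly 71 years.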
There are about 1 million books in 8 collections. All are out-of-copyright books. Copyright raises issues.
Audio: How much audio is there, and how does it get processed? About 2-3 million recordings, which could easily be put online, but the area is highly litigious. So, they stay away from highly commercial works.
One big success is among rock musicians, the Grateful Dead in particular. The band allows trading of its music as long as *no money is made from the activity!* But as stuff got online, trading became harder. The Internet Archive offered unlimited storage and bandwidth for free. Turns out there are other bands that do what the Dead does. One to three bands a day signed up, and now there are 3,000-plus bands and their live recordings. Some communities, like this one, can be helped and supported in open ways.
Internet Archive has 200,000 audio items in over 100 collections!
It's smaller than text and different legally, so it's not handled the same way as text.
Moving Images: There are about 150,000 around, and about 1,000 that are not copyrighted. 50 percent of the 150,000 are from India (!?!?!) Formats of moving images keep changing. Movies that are in the IA were converted and re-coded to make them easier to find. Moving image archives need ongoing maintenance.
The Internet Archive has tons of old public service films and educational films that are out of copyright, and more are uploaded regularly. However, the IA doesn't know why people would want these films ;-)
Moving Images also include television programs. Tons of TV programs are recorded daily. See the Television Archive.
On-site staff at the Presidio (where the IA is located) is about 35 people. That's very small.
Software is difficult to archive: copyright, platforms, etc. Storing a lot of it is difficult because of these issues.
Started collecting the WWW in 1996. Nowadays, 4 billion pages are archived.
Want to see your old website? Go to the WayBack Machine to find old sites.
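If you'd rather jump straight to a snapshot, the WayBack Machine's addresses follow a simple pattern: `https://web.archive.org/web/`, then a timestamp, then the original address. A minimal sketch that just builds such an address (it doesn't fetch anything, and `example.com` and the helper name are my own placeholders):

```python
def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a WayBack Machine address for a page near the given time.

    timestamp is digits only, anything from a year ("2005") up to
    a full YYYYMMDDhhmmss value.
    """
    if not timestamp.isdigit():
        raise ValueError("timestamp must contain digits only")
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Hypothetical example site, not one mentioned in the talk.
print(wayback_url("http://example.com/", "2005"))
```

Whether anything comes back depends, of course, on whether the site was ever archived.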
If you're doing something to help people, they won't get angry at what you do with content online. If you do something wrong, you'll get in trouble.
If we're working to re-create the Library of Alexandria, version 1 might not be a great model. If we build this thing up, what should we do differently? MAKE COPIES! Digital copies are easy to make.
They designed their own computer! (I'm very impressed)
So, we can collect and preserve all kinds of media over the long haul.
But the library industry is imploding like many other industries. Monopolies are forming--which isn't good. Libraries are getting to be centrally controlled, not locally controlled. They are ruled by contract now, rather than ruled by law. They are for-profit and not non-profit.
What about the future of books? Books play a different role in our lives than other forms of media. They are like the mind. They are written by one person and are one person's idea.
What's going on with books: Book publishers are having trouble making money. A couple of big players are controlling the aggregation of works, as well as distribution, in order to try to control the distribution of media. Google is aggregating libraries (public domain works) and putting restrictions on what they aggregate! (how awful!)
Class action law was used to control the digitizing of content--so that Google can lock up content. To Kahle, it doesn't seem right that literary "orphan" books should be locked up this way. Class action settlements are making changes in content. Secretly negotiated class actions are effectively making legislative decisions without legislating.
What we need is a set of standards. We're also missing distance lending of copyrighted works.
If we keep our eye on the ball (on the potential monopolies), we can have all this knowledge at our disposal.
At the end, someone confuses Kahle with the digital utopians who believe all content should be free. That's not what he was saying. What he said was that we should be careful that content doesn't get owned and monetized by monopolies--esp. "orphan" books, which no one has made money off of for a very long time. But he is not against paid-for content. He just wants the payment to go to the right people.