Metadata for the Common Man (or Woman)

In July I was honored to be appointed Visiting Scholar at SILS, the School of Information and Library Science at the University of North Carolina, Chapel Hill. The Information and Library Science community and the Open Source community share many common passions, especially the belief that sharing knowledge is important and good work. And increasingly I see a shared fate for both communities…

In February this year I learned the history of a great man, Dr. Fred Kilgour. Dr. Kilgour was curious, courageous, and committed when it came to learning and to sharing knowledge. His pioneering work to establish and defend the library’s right to share both books and information about books made the WorldCat online cataloging system a viable index for discovering and accessing library resources around the world. Created in 1971, it contains more than 1 billion records referencing physical and digital items in more than 360 languages, as of May 2007. And what makes it work? Standards, and a fundamental commitment to sharing information.

While politicians debate whether sea levels are rising, and whether any such rise is caused by human activity, there is no debating that data levels are rising, and that the cause is decidedly human.

Lately I’ve observed that the more data we put into our computers—be it personal photos from the beach, calendar information, address lists, commercial materials, research papers, business plans, etc.—the more we depend on either a good memory or Google to find it for us. Yet for all of Google’s searching power, and for all the power of our own brains, terabytes of data are becoming wasted assets because no concept of information or library science was considered when the data was first generated.

Here is one example where I got really lucky: photography. I have a Canon EOS digital camera system, and some time ago I had to decide which few of my many lenses I should bring on a trip throughout Asia (Australia, China, and Japan). Though I had not thought of it ahead of time, my camera had been annotating my digital images with lens type, aperture, shutter speed, ISO speed, etc., if only I could decode that information from the embedded EXIF data. Using Phil Harvey’s excellent ExifTool and a little Perl hackery (for which I have been credited), I was able to decipher the unpublished data format and learn what lenses I’d most used at what apertures across my 3000+ digital photo history. The result: a method for selecting the optimal lenses given a limited lens knapsack.
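The aggregation step can be sketched in a few lines. This is not my original Perl, just an illustrative Python sketch: the `records` list stands in for lens and aperture tags that a tool like ExifTool would extract from real image files, and the lens names are hypothetical sample data.

```python
from collections import Counter

# Hypothetical (lens, aperture) pairs, standing in for EXIF tags
# extracted from real photos by a tool such as ExifTool.
records = [
    ("EF 24-70mm f/2.8L", 2.8),
    ("EF 24-70mm f/2.8L", 4.0),
    ("EF 70-200mm f/2.8L", 2.8),
    ("EF 50mm f/1.4", 1.4),
    ("EF 24-70mm f/2.8L", 2.8),
]

# Tally how often each lens was used, and at which apertures.
lens_counts = Counter(lens for lens, _ in records)
apertures_by_lens = {}
for lens, aperture in records:
    apertures_by_lens.setdefault(lens, Counter())[aperture] += 1

# Rank lenses by use; the top few are candidates for the travel kit.
for lens, n in lens_counts.most_common():
    favorites = ", ".join(f"f/{a}" for a, _ in apertures_by_lens[lens].most_common())
    print(f"{n:3d}  {lens}  (apertures: {favorites})")
```

The point is not the code but the fact that it is possible at all: only because the camera tagged every frame with metadata can thousands of images be reduced to a ranked answer.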

But increasingly data is being produced without tags, and this lack of tagging makes it difficult or impossible to do intelligent aggregate and selective searches. Folksonomies and taxonomies have become powerful tools in the right hands, but too much data is created without any thought or science about how that data will be maintained or re-purposed in the longer term.

We need to get a lot more serious about metadata definitions and workflows. Fortunately, open source tools make it much easier to reward the metadata creator with an accepting and acceptable workflow. By this I mean that an open source desktop can make metadata tagging a natural part of everyday work. Open source tools that interface with databases can pass metadata to and from the database. Editors (even 2D paint, 2D illustration, and 3D editors) can become part of the metadata workflow. Thus, open source can enable the standards that the information and library scientists can help define.
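To make the idea of a desktop metadata workflow concrete, here is a minimal sketch under an assumed scheme of my own invention: tags are stored in a JSON “sidecar” file next to each document. Real tools would embed EXIF/XMP or use a database, and the function names (`tag_file`, `find_tagged`) are hypothetical.

```python
import json
from pathlib import Path

def tag_file(path, **tags):
    """Merge tags into a JSON sidecar next to the original file
    (a stand-in for embedded EXIF/XMP metadata)."""
    p = Path(path)
    sidecar = p.with_suffix(p.suffix + ".meta.json")
    existing = json.loads(sidecar.read_text()) if sidecar.exists() else {}
    existing.update(tags)
    sidecar.write_text(json.dumps(existing, indent=2))
    return sidecar

def find_tagged(directory, **wanted):
    """Yield file names in a directory whose sidecar tags match
    every requested key/value pair."""
    for sidecar in Path(directory).glob("*.meta.json"):
        tags = json.loads(sidecar.read_text())
        if all(tags.get(k) == v for k, v in wanted.items()):
            yield sidecar.name.removesuffix(".meta.json")
```

The design choice worth noting is that tagging happens where the work happens: any editor or file manager that calls something like `tag_file` at save time rewards the creator immediately, because `find_tagged`-style queries start working the moment the metadata exists.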

I hear in my mind the solo trumpet of Aaron Copland’s Fanfare for the Common Man, at once solitary and hopeful yet now also ubiquitous (thanks to its adoption by network television), and I imagine that some creative soul will one day make metadata as defining, as resonant, and as pervasive as Copland’s composition has become. My only hope is that their genius in information and library science is as great as Copland’s was in modern music.