Code as Data

Google has extended its Sitemap Protocol so that your public code can be searched. This is a cool idea: it takes the philosophy of distributed data one step further by treating code as data for the purposes of search. It is a small innovation, but it makes it easy for code search engines to locate code.

From Code Search Site Map:

We’ve heard from a number of site owners who want to make sure their public source code is searchable via Google Code Search. To help with that, we extended the Sitemap Protocol to support code files. This makes it possible to specify all the code files on your site, as well as the programming language and software license for each file.

To get started, check out the new Code Search tags for Sitemaps. For complete software packages that are archives (.tar, .tar.gz, .tar.bz2, or .zip), you can create a packagemap file to describe all the individual code files in each package.
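As a rough illustration of what the announcement describes, here is a minimal Python sketch that builds a Sitemap listing code files with a language and license for each. The namespace URIs and the element names (codesearch, filetype, license) are written from memory of the Code Search Sitemap documentation and should be treated as assumptions to verify. The function name build_code_sitemap and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; check them against the official documentation.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CODESEARCH_NS = "http://www.google.com/codesearch/schemas/sitemap/1.0"


def build_code_sitemap(files):
    """Build a Sitemap document from (url, language, license) tuples.

    Hypothetical helper: the codesearch element names are assumptions.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("codesearch", CODESEARCH_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, language, license_name in files:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # Per-file metadata: programming language and software license.
        code = ET.SubElement(entry, f"{{{CODESEARCH_NS}}}codesearch")
        ET.SubElement(code, f"{{{CODESEARCH_NS}}}filetype").text = language
        ET.SubElement(code, f"{{{CODESEARCH_NS}}}license").text = license_name
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_code_sitemap([("http://example.com/src/main.c", "C", "GPL")]))
```

Archives would need a separate packagemap file, which this sketch does not attempt to cover.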

The benefits go beyond Google Code Search. The same concept can be used behind the firewall, letting an enterprise share code internally and detect duplicate code.
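To make the duplicate-detection idea concrete, here is a minimal Python sketch, assuming the indexed code is already available as a mapping from file path to source text. It only catches files that are identical after whitespace is normalized. Real duplicate detection would need something more tolerant, such as token-level or fingerprint-based comparison. The names normalize and find_duplicates are hypothetical.

```python
import hashlib
from collections import defaultdict


def normalize(source):
    """Strip leading/trailing whitespace per line and drop blank lines."""
    lines = (line.strip() for line in source.splitlines())
    return "\n".join(line for line in lines if line)


def find_duplicates(files):
    """Group paths whose normalized contents hash to the same value.

    files: mapping of path -> source text (an assumed input shape).
    Returns a sorted list of sorted path groups with two or more members.
    """
    groups = defaultdict(list)
    for path, source in files.items():
        digest = hashlib.sha256(normalize(source).encode("utf-8")).hexdigest()
        groups[digest].append(path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


if __name__ == "__main__":
    indexed = {
        "team_a/util.c": "int x;\n\n",
        "team_b/util.c": "  int x;  \n",
        "team_c/other.c": "int y;\n",
    }
    print(find_duplicates(indexed))
```

Hashing normalized content keeps the comparison cheap, since each file is read once and grouped by digest rather than compared pairwise.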

Code Search combined with AIML (Artificial Intelligence Markup Language) could be used to build an interactive code finder for open source.