Making Red Hat bugzilla indexable by Google

Submitted by dag on Fri, 2008/12/12 - 16:57

Prompted by Joshua Daniel Franklin's comment about searching Red Hat Bugzilla by Component, I rediscovered a comment made by Anonymous.

This got me thinking about something I remember discussing at the last LinuxTag, during a Fedora dinner where Red Hat's Bugzilla was the subject. Currently RHbz is not indexed by Google, and in my opinion this is hurting not only the Red Hat and Fedora communities, but also the CentOS community.

If you search for a problem (e.g. revelation pygtk EL-5) or for an exact error message (I failed to find a good example here), the keywords may bring you to posts elsewhere on the internet, but the most important resource, Red Hat's Bugzilla content, will not be part of your search results.

Yet it benefits Red Hat and Fedora directly if the community discusses its issues and bugs in one location. It benefits users because they can keep track of solutions and share problems with others. And searching Google is faster and easier than using Bugzilla's search.

So don't see this as criticism; think of it as one area where Red Hat could improve and foster its community. I discussed Launchpad yesterday regarding working with upstream, and I think Canonical is doing much better at community building. (Despite Launchpad not being free software, which is a completely different topic.)

If people say Ubuntu is more popular because they get more search results, maybe opening up Red Hat's Bugzilla could help change this belief?

Thanks Dag! Sounds like a simple matter of putting a URL redirect frontend on Bugzilla to make Google think it's not a CGI website.

Bugzilla actually ships a robots.txt file that instructs Google not to index a Bugzilla site. Traditionally this was because Google's crawler would completely kill the Bugzilla server when it tried to crawl it. Now, Bugzilla itself has had a LOT of performance improvements since then. It may very well be that it's safe to let Google crawl the thing now, but we'd probably need a guinea pig to test it. :)
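To make the robots.txt point concrete, here is a minimal sketch using Python's standard urllib.robotparser. Both robots.txt bodies and the bugzilla.example.com URLs are made up for illustration; they are not the file Bugzilla actually ships nor Red Hat's configuration. The first blocks all crawling, the second is one possible way to expose only individual bug pages while keeping expensive query pages off limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, written for this example only.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

ALLOW_BUG_PAGES = """\
User-agent: *
Allow: /show_bug.cgi
Disallow: /
"""

# Hypothetical URLs on a made-up Bugzilla host.
BUG_URL = "https://bugzilla.example.com/show_bug.cgi?id=1"
QUERY_URL = "https://bugzilla.example.com/buglist.cgi?component=kernel"


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for name, rules in (("block all", BLOCK_ALL), ("bug pages only", ALLOW_BUG_PAGES)):
        print(name, can_crawl(rules, BUG_URL), can_crawl(rules, QUERY_URL))
```

Limiting crawlers to single bug pages is only one way to reduce the load concern mentioned above; whether it is enough for a real installation would still need testing.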

Last time we discussed trying to allow Google to crawl, several people voiced concern about the potential for security bugs that were mis-filed without the security flag on them (or someone reports a crash, and the developers discover it's exploitable when trying to fix it) to get picked up and cached by Google before they got secured. In real life, I don't think that actually happens very often, but that's one concern for it anyway.

Launchpad not open source.. yet!

Not open source? Yeah, true. Except that it will be released, Affero GPL and all, by July 21st 2009! Stay tuned for a roadmap document to be released by Karl Fogel and me by the end of this month.

Will be isn't.

Launchpad has been scheduled to be released, but it is not open now. Kind of weak that someone so interested in open source developed a closed source project.

If it's not open now, it's not open. Promises from corporations mean nothing.

Bugzilla's robots.txt

Actually it ships a sitemap which tells Google what to index, and as far as I know, Google indexes it. Isn't that correct?

The one I am looking at right now is from 23/12/2008. As far as I can tell, none of the queries I tested returned anything from Bugzilla.

Even with a included. Maybe Red Hat changed the setup after the blog article?


Yes, Red Hat Bugzilla has indexing enabled now - thank you for pointing it out.