[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Full-disclosure] Google's robots.txt handling
- To: James Lay <jlay@xxxxxxxxxxxxxxxxxxx>
- Subject: Re: [Full-disclosure] Google's robots.txt handling
- From: Gynvael Coldwind <gynvael@xxxxxxxxxxx>
- Date: Tue, 11 Dec 2012 00:33:40 +0100
Hey,
> > Here is an example:
> >
> > An admin has a public webservice running with folders containing
> > sensitive informations. Enter these folders in his robots.txt and
> > "protect" them from the indexing process of spiders. As he doesn't
> > want the /admin/ gui to appear in the search results he also puts his
> > /admin in the robots text and finaly makes a backup to the folder
> > /backup.
If no one would know about a folder, why would one add it to
robots.txt in the first place?
But that's missing the point anyway - robots.txt is not a security mechanism.
If someone uses robots.txt as the only and last line of defense he
plainly doesn't understand what he's doing (especially that it's one
of the first files both pentesters & attackers look at).
If someone has an /admin/ site (which is a really easily guessable
name, checked by every web directory scanner out there) he cannot rely
on concealment*, but on proper user authentication using mechanisms
designed for such purpose (e.g. requiring a password).
(* for historical reasons there is a Polish IT term for such attempts
- "deep hiding", there's even a wiki page on that -
http://pl.wikipedia.org/wiki/G%C5%82%C4%99bokie_ukrycie)
> I'm wondering if, in perhaps .htaccess, one could allow ONLY site
> crawlers access to the robots.txt file. Then add robots.txt to
> robots.txt...would this mitigate some of the risk?
1. It's still missing the point.
2. No, it wouldn't work in case of scanners that try to impersonate robots.
--
gynvael.coldwind//vx
_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/