
Re: [Full-disclosure] Google's robots.txt handling



Is this the case even when robots.txt contains an entry for robots.txt itself?
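
To make the question concrete, here is a minimal sketch of how such an entry
can be checked with a standard parser. It uses Python's urllib.robotparser;
the robots.txt body, the example.com URLs and the is_allowed helper are
illustrative assumptions, not anything taken from Google. It only shows what
one parser reports for those rules and says nothing about how Google's
crawler treats a robots.txt that lists itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /robots.txt
Disallow: /wp-admin/
"""


def is_allowed(robots_body: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/robots.txt", "/wp-admin/index.php", "/index.html"):
        url = "http://example.com" + path
        print(path, "allowed" if is_allowed(ROBOTS_TXT, url) else "disallowed")
```

Whatever a parser reports for /robots.txt, a crawler has to fetch the file
before it can read the rule, so such an entry cannot keep the file itself
private.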

Philip Whitehouse

On 11 Dec 2012, at 12:22, Ulisses Montenegro <ulisses.montenegro@xxxxxxxxx> 
wrote:

> If I understand the OP correctly, he is not stating that listing something in 
> robots.txt would make it inaccessible, but rather that Google indexes the 
> robots.txt files themselves, and makes the contents of those available for 
> query. So, in a way, they make it easier for Google search results harvesters 
> to find sites which host files/directories of known applications, while 
> Google does not index those directories/files themselves because it follows 
> the robots.txt restrictions. In a nutshell:
> 
> [Attacker] Google, show me sites that have public /wp-admin/ directories.
> [Google] I don't know about that, I was not allowed to index those.
> [Attacker] Ok, so show me the hosts that have robots.txt files which disallow 
> indexing /wp-admin/ directories, then...
> [Google] Sure thing, here you go!
> 
> Yes, the fact that those resources are out there in the open makes the effort 
> of hiding them from Google crawlers rather useless, but Google still should 
> not allow queries on the contents of robots.txt files, as that rather defeats 
> the purpose of disallowing stuff from being indexed...
> 
> 
> On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson 
> <scott.ferguson.it.consulting@xxxxxxxxx> wrote:
>> > /From/: Hurgel Bumpf <l0rd_lunatic () yahoo com>
>> > /Date/: Mon, 10 Dec 2012 19:25:39 +0000 (GMT)
>> > ------------------------------------------------------------------------
>> > Hi list,
>> >
>> >
>> > I tried to contact Google, but as they didn't answer my email, I am 
>> > forwarding this to FD.
>> > This "security" feature is not clearly a Google vulnerability, but it 
>> > exposes website information that is not really
>> > intended to be public.
>> >
>> > (Additionally, I have to say that I advocate robots.txt files without 
>> > sensitive content, and working security mechanisms.)
>> >
>> > Here is an example:
>> >
>> > An admin has a public web service running with folders containing 
>> > sensitive information. He enters these folders in his
>> > robots.txt to "protect" them from the indexing process of spiders. As he 
>> > doesn't want the /admin/ GUI to appear in the
>> > search results, he also puts his /admin in the robots.txt and finally makes 
>> > a backup to the folder /backup.
>> >
>> > <snipped>
>> >
>> > This shouldn't be a discussion about bad practice but about the Google 
>> > feature itself.
>> >
>> > Indexing a file which is used to prevent indexing... isn't that just 
>> > paradoxical and hypocritical?
>> >
>> > Thanks,
>> >
>> >
>> > Conan the bavarian
>> 
>> Your point eludes me - Google is indexing something which is publicly
>> available, e.g.: curl http://somesite.tld/robots.txt
>> So it seems the solution to the "question" you raise is, um, nonsensical.
>> 
>> If you don't want something exposed on your web server *don't publish
>> references to it*.
>> 
>> The solution, which should be blindingly obvious, is don't create the
>> problem in the first place. Password-protect sensitive directories
>> (htpasswd) - then they don't have to be excluded from search engines
>> (because listing the inaccessible in robots.txt is redundant). You must
>> have missed the first day of web school.
>> 
>> Kind regards.
>> 
>> 
>> _______________________________________________
>> Full-Disclosure - We believe in it.
>> Charter: http://lists.grok.org.uk/full-disclosure-charter.html
>> Hosted and sponsored by Secunia - http://secunia.com/
> 
> 
> 
> -- 
> “If debugging is the process of removing software bugs, then programming must 
> be the process of putting them in.” - Edsger Dijkstra