Re: [Full-disclosure] Google's robots.txt handling
- To: Ulisses Montenegro <ulisses.montenegro@xxxxxxxxx>
- Subject: Re: [Full-disclosure] Google's robots.txt handling
- From: Philip Whitehouse <philip@xxxxxxxxx>
- Date: Tue, 11 Dec 2012 12:54:37 +0000
Is this the case even when there is an entry in robots.txt for robots.txt?
Philip Whitehouse
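
To make the question concrete, here is a minimal sketch of how a robots.txt that lists itself is read by one standard parser. It uses Python's urllib.robotparser. The robots.txt contents and the helper name allowed_paths are made up for illustration. It only shows how this parser interprets such an entry and says nothing about what Googlebot actually does with it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that lists itself alongside /wp-admin/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /robots.txt
"""


def allowed_paths(robots_txt, paths, agent="*"):
    """Return {path: bool} saying whether `agent` may fetch each path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


result = allowed_paths(ROBOTS_TXT, ["/robots.txt", "/wp-admin/", "/index.html"])
for path, ok in result.items():
    print(path, "allowed" if ok else "disallowed")
```

Note that a crawler still has to fetch robots.txt to learn about the entry at all, so the file stays publicly readable either way.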
On 11 Dec 2012, at 12:22, Ulisses Montenegro <ulisses.montenegro@xxxxxxxxx>
wrote:
> If I understand the OP correctly, he is not stating that listing something in
> robots.txt would make it inaccessible, but rather that Google indexes the
> robots.txt files themselves, and makes the contents of those available for
> query. So, in a way, they make it easier for Google search results harvesters
> to find sites which host files/directories of known applications, while
> Google does not index those directories/files themselves because it follows
> the robots.txt restrictions. In a nutshell:
>
> [Attacker] Google, show me sites that have public /wp-admin/ directories.
> [Google] I don't know about that, I was not allowed to index those.
> [Attacker] Ok, so show me the hosts that have robots.txt files which disallow
> indexing /wp-admin/ directories, then...
> [Google] Sure thing, here you go!
>
> Yes, the fact that those resources are out there in the open makes the effort
> of hiding them from Google crawlers rather useless, but still Google should
> not allow queries on the contents of robots.txt files, as it sort of defeats
> the purpose of disallowing stuff from being indexed...
>
>
> On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson
> <scott.ferguson.it.consulting@xxxxxxxxx> wrote:
>> > /From/: Hurgel Bumpf <l0rd_lunatic () yahoo com>
>> > /Date/: Mon, 10 Dec 2012 19:25:39 +0000 (GMT)
>> > ------------------------------------------------------------------------
>> > Hi list,
>> >
>> >
>> > I tried to contact Google, but as they didn't answer my email, I am
>> > forwarding this to FD.
>> > This "security" feature is not clearly a Google vulnerability, but it exposes
>> > website information that is not really
>> > intended to be public.
>> >
>> > (Additionally i have to say that i advocate robots.txt files without
>> > sensitive content and working security mechanisms.)
>> >
>> > Here is an example:
>> >
>> > An admin has a public web service running with folders containing sensitive
>> > information. He enters these folders in his
>> > robots.txt to "protect" them from the indexing process of spiders. As he
>> > doesn't want the /admin/ GUI to appear in the
>> > search results, he also puts /admin in robots.txt, and finally makes
>> > a backup to the folder /backup.
>> >
>> > <snipped>
>> >
>> > This shouldn't be a discussion about bad practice but the google feature
>> > itself.
>> >
>> > Indexing a file which is used to prevent indexing... isn't that just
>> > paradoxical and hypocritical?
>> >
>> > Thanks,
>> >
>> >
>> > Conan the bavarian
>>
>> Your point eludes me - Google is indexing something which is publicly
>> available, e.g.: curl http://somesite.tld/robots.txt
>> So it seems the solution to the "question" you raise is, um, nonsensical.
>>
>> If you don't want something exposed on your web server *don't publish
>> references to it*.
>>
>> The solution, which should be blindingly obvious, is don't create the
>> problem in the first place. Password-protect sensitive directories (htpasswd) -
>> then they don't have to be excluded from search engines (because listing
>> the inaccessible in robots.txt is redundant). You must have missed the
>> first day of web school.
>>
>> Kind regards.
>>
>>
>> _______________________________________________
>> Full-Disclosure - We believe in it.
>> Charter: http://lists.grok.org.uk/full-disclosure-charter.html
>> Hosted and sponsored by Secunia - http://secunia.com/
>
>
>
> --
> “If debugging is the process of removing software bugs, then programming must
> be the process of putting them in.” - Edsger Dijkstra