[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Full-disclosure] Google's robot.txt handling

To: Christian Sciberras <uuf6429@xxxxxxxxx>
Subject: Re: [Full-disclosure] Google's robot.txt handling
From: Jeffrey Walton <noloader@xxxxxxxxx>
Date: Tue, 11 Dec 2012 17:57:31 -0500

On Tue, Dec 11, 2012 at 5:53 PM, Christian Sciberras <uuf6429@xxxxxxxxx> wrote:
> If you ask me, it's a stupid idea. :)
>
> I prefer to know where I am with a service; and (IMHO) I would prefer to
> query (occasionally) Google for my CC instead of waiting for someone to
> start taking funds off it.
> Hiding it only provides a false sense of security - it will last until
> someone finds the service leaking out CCs.
Agreed. How about search engine data by other crawlers that was not sanitized?

> This is especially the case with robots.txt. Can someone on the list please
> define a "good web crawler"?
Haha! Milk up the nose.

> I think the problem here is that people are plain stupid and throw in direct
> entries inside robots.txt, whereas they should be sending wildcard entries.
> Couple that with actually protecting sensitive areas, and it's a pretty good
> defence.
We now know you don't need a robots.txt for exclusion. Just ask Weev.

Jeff

> On Tue, Dec 11, 2012 at 10:38 PM, Jeffrey Walton <noloader@xxxxxxxxx> wrote:
>>
>> On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas <mvilas@xxxxxxxxx> wrote:
>> > I think we can all agree this is not a vulnerability. Still, I have yet
>> > to
>> > see an argument saying why what the OP is proposing is a bad idea. It
>> > may be
>> > a good idea to stop indexing robots.txt to mitigate the faults of lazy
>> > or
>> > incompetent admins (Google already does this for many specific search
>> > queries) and there's not much point in indexing the robots.txt file for
>> > legitimate uses anyway.
>> I kind of agree here. The information is valuable for the
>> reconnaissance phase of an attack, buts its not a vulnerability per
>> se. But what is to stop the attacker from fetching it himself/herself
>> since its at a known location for all sites? In this case, Google
>> would be removing aggregated search results (which means the attacker
>> would have to compile it himself/herself).
>>
>> Google removed other interesting searches, such as social security
>> numbers and credit card numbers (or does not provide them to the
>> general public).
>>
>> Jeff
>>
>> > On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson
>> > <scott.ferguson.it.consulting@xxxxxxxxx> wrote:
>> >>
>> >> > If I understand the OP correctly, he is not stating that listing
>> >> > something
>> >> > in robots.txt would make it inaccessible, but rather that Google
>> >> > indexes
>> >> > the robots.txt files themselves,
>> >>
>> >> <snipped>
>> >>
>> >> Well, um, yeah - I got that.
>> >>
>> >> So you are what, proposing that moving an open door back a few
>> >> centimetres solves the (non) problem?
>> >>
>> >> Take your proposal to it's logical extension and stop all search
>> >> engines
>> >> (especially the ones that don't respect robots.txt) from indexing
>> >> robots.txt. Now what do you do about Nutch or even some perl script
>> >> that
>> >> anyone can whip up in 2 minutes?
>> >>
>> >> Security through obscurity is fine when couple with actual security -
>> >> but relying on it alone is just daft.
>> >>
>> >> Expecting to world to change so bad habits have no consequence is
>> >> dangerously naive.
>> >>
>> >> I suspect you're looking to hard at finding fault with Google - who are
>> >> complying with the robots.txt. Read the spec. - it's about not
>> >> following
>> >> the listed directories, not about not listing the robots.txt.  Next
>> >> you'll want laws against bad weather and furniture with sharp corners.
>> >>
>> >> Don't put things you don't want seen to see in places that can be seen.
>> >>
>> >> >
>> >> >
>> >> > On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson <
>> >> > scott.ferguson.it.consulting () gmail com> wrote:
>> >> >
>> >> >
>> >> >     /From/: Hurgel Bumpf <l0rd_lunatic () yahoo com>
>> >> >     /Date/: Mon, 10 Dec 2012 19:25:39 +0000 (GMT)
>> >> >
>> >> >
>> >> > ------------------------------------------------------------------------
>> >> >     Hi list,
>> >> >
>> >> >
>> >> >     i tried to contact google, but as they didn't answer my email,  i
>> >> > do
>> >> >
>> >> > forward this to FD.
>> >> >
>> >> >     This "security" feature is not cleary a google vulnerability, but
>> >> >
>> >> > exposes websites informations that are not really
>> >> >
>> >> >     intended to be public.
>> >> >
>> >> >     Conan the bavarian
>> >> >
>> >> > Your point eludes me - Google is indexing something which is publicly
>> >> > available. eg.:- curl http://somesite.tld/robots.txt
>> >> > So it seems the solution to the "question" your raise is, um,
>> >> > nonsensical.
>> >> >
>> >> > If you don't want something exposed on your web server *don't publish
>> >> > references to it*.
>> >> >
>> >> > The solution, which should be blindingly obvious,  is don't create
>> >> > the
>> >> > problem in the first place. Password sensitive directories (htpasswd)
>> >> > -
>> >> > then they don't have to be excluded from search engines (because
>> >> > listing
>> >> > the inaccessible in robots.txt is redundant).  You must of missed the
>> >> > first day of web school.

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/

Follow-Ups:
- Re: [Full-disclosure] Google's robot.txt handling
  - From: Thomas Behrend

References:
- Re: [Full-disclosure] Google's robot.txt handling
  - From: Scott Ferguson
- Re: [Full-disclosure] Google's robot.txt handling
  - From: Mario Vilas
- Re: [Full-disclosure] Google's robot.txt handling
  - From: Jeffrey Walton
- Re: [Full-disclosure] Google's robot.txt handling
  - From: Christian Sciberras

Prev by Date: Re: [Full-disclosure] Google's robot.txt handling
Next by Date: [Full-disclosure] Removing seless email addresses (on FD list)
Previous by thread: Re: [Full-disclosure] Google's robot.txt handling
Next by thread: Re: [Full-disclosure] Google's robot.txt handling
Index(es):
- Date
- Thread