
Re: [Full-disclosure] Google's robot.txt handling

To: <noloader@xxxxxxxxx>
Subject: Re: [Full-disclosure] Google's robot.txt handling
From: Thomas Behrend <webmaster@xxxxxxxxxxxxxxxx>
Date: Wed, 12 Dec 2012 07:20:14 +0100

 We found this "security issue" a long time ago and have used it
 ourselves to find hidden pages.
 The only thing you can do is harden the directory against crawlers
 with mod_rewrite, or in the index.(php|pl|py|asp|etc) itself by
 checking the browser (User-Agent) string. If it doesn't contain something
 like the common browser strings, just send a 404 back, and Google and
 other crawlers will never index it.
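
 A minimal sketch of that User-Agent check in Python, in case it helps.
 The token lists and the function name status_for are assumptions made up
 for illustration, not what we actually run, and the header is
 client-controlled, so this only keeps well-behaved crawlers out.

```python
import re

# Hypothetical allow-list of tokens found in common browser User-Agent
# strings; adjust it for your own visitors.
BROWSER_TOKENS = re.compile(r"(Firefox|Chrome|Safari|Opera|MSIE|Trident)", re.IGNORECASE)

# Crawlers often identify as "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
# and may also carry browser-like tokens, so they are rejected first.
CRAWLER_TOKENS = re.compile(r"(bot|crawl|spider|slurp)", re.IGNORECASE)


def status_for(user_agent):
    """Return the HTTP status the hidden page should answer with."""
    if not user_agent:
        return 404  # no User-Agent at all: pretend the page does not exist
    if CRAWLER_TOKENS.search(user_agent):
        return 404  # self-identified crawler
    if BROWSER_TOKENS.search(user_agent):
        return 200  # looks like a common browser
    return 404      # anything else (scripts, unknown clients)
```

 The index script would call status_for() with the request's User-Agent
 header and send a plain 404 page whenever it does not return 200.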

 Of course, you could just rename /admin/ to /4dm1n/, or even use a
 subdomain you never link to from your website. In that case, split the
 web content and disallow / in the robots.txt, just in case the URL leaks.
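
 To see what a blanket "Disallow: /" means to a crawler that honours
 robots.txt, here is a small sketch using Python's standard
 urllib.robotparser. The helper name is_allowed and the sample paths are
 assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# robots.txt for the unlinked subdomain: ask every crawler to stay out.
HIDE_EVERYTHING = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, path, agent="Googlebot"):
    """Return True if a compliant crawler may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)
```

 With HIDE_EVERYTHING, is_allowed() is False for every path, and nothing
 in the file names the renamed directory. This only binds crawlers that
 choose to respect robots.txt.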

 Another thing we saw working: just lock the directory via the .htaccess
 of your web server software, and Google won't index the page because the
 crawler doesn't get an HTTP 200 back; it gets a 401.
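
 For anyone doing this in the application instead of .htaccess, a rough
 sketch of the same 401 behaviour in Python follows. The credentials, the
 realm name and the function check_basic_auth are hypothetical; a real
 setup should verify against hashed passwords.

```python
import base64
import hmac

# Hypothetical credentials for illustration only.
USERNAME = "admin"
PASSWORD = "correct horse battery staple"


def check_basic_auth(header):
    """Map an Authorization header value to (status, extra response headers)."""
    challenge = [("WWW-Authenticate", 'Basic realm="admin"')]
    if not header or not header.startswith("Basic "):
        return 401, challenge  # crawlers land here: no credentials sent
    try:
        decoded = base64.b64decode(header[6:], validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes alike.
        return 401, challenge
    user, sep, password = decoded.partition(":")
    ok = (
        sep == ":"
        and hmac.compare_digest(user.encode(), USERNAME.encode())
        and hmac.compare_digest(password.encode(), PASSWORD.encode())
    )
    return (200, []) if ok else (401, challenge)
```

 A crawler sends no Authorization header, so it only ever sees the 401
 and has no page content to index.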

 So, that's our way to "hide" our admin interfaces. It has worked so far,
 but even if someone finds one, the interface should be strong enough to
 withstand any attack. And of course, the login credentials shouldn't be
 "password", or at the top of the page "Speak friend and come in" :)

 So long
 Thomas

 On Tue, 11 Dec 2012 17:57:31 -0500, Jeffrey Walton <noloader@xxxxxxxxx> 
 wrote:
> On Tue, Dec 11, 2012 at 5:53 PM, Christian Sciberras
> <uuf6429@xxxxxxxxx> wrote:
>> If you ask me, it's a stupid idea. :)
>>
>> I prefer to know where I am with a service; and (IMHO) I would prefer to
>> query (occasionally) Google for my CC instead of waiting for someone to
>> start taking funds off it.
>> Hiding it only provides a false sense of security - it will last until
>> someone finds the service leaking out CCs.
> Agreed. How about search engine data by other crawlers that was not
> sanitized?
>
>> This is especially the case with robots.txt. Can someone on the list
>> please define a "good web crawler"?
> Haha! Milk up the nose.
>
>> I think the problem here is that people are plain stupid and throw in
>> direct entries inside robots.txt, whereas they should be sending wildcard
>> entries.
>> Couple that with actually protecting sensitive areas, and it's a pretty
>> good defence.
> We now know you don't need a robots.txt for exclusion. Just ask Weev.
>
> Jeff
>
>> On Tue, Dec 11, 2012 at 10:38 PM, Jeffrey Walton 
>> <noloader@xxxxxxxxx> wrote:
>>>
>>> On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas <mvilas@xxxxxxxxx> 
>>> wrote:
>>> > I think we can all agree this is not a vulnerability. Still, I have yet
>>> > to see an argument saying why what the OP is proposing is a bad idea. It
>>> > may be a good idea to stop indexing robots.txt to mitigate the faults of
>>> > lazy or incompetent admins (Google already does this for many specific
>>> > search queries) and there's not much point in indexing the robots.txt
>>> > file for legitimate uses anyway.
>>> I kind of agree here. The information is valuable for the
>>> reconnaissance phase of an attack, but it's not a vulnerability per
>>> se. But what is to stop the attacker from fetching it himself/herself
>>> since it's at a known location for all sites? In this case, Google
>>> would be removing aggregated search results (which means the attacker
>>> would have to compile it himself/herself).
>>>
>>> Google removed other interesting searches, such as social security
>>> numbers and credit card numbers (or does not provide them to the
>>> general public).
>>>
>>> Jeff
>>>
>>> > On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson
>>> > <scott.ferguson.it.consulting@xxxxxxxxx> wrote:
>>> >>
>>> >> > If I understand the OP correctly, he is not stating that listing
>>> >> > something in robots.txt would make it inaccessible, but rather that
>>> >> > Google indexes the robots.txt files themselves,
>>> >>
>>> >> <snipped>
>>> >>
>>> >> Well, um, yeah - I got that.
>>> >>
>>> >> So you are what, proposing that moving an open door back a few
>>> >> centimetres solves the (non) problem?
>>> >>
>>> >> Take your proposal to its logical extension and stop all search
>>> >> engines (especially the ones that don't respect robots.txt) from
>>> >> indexing robots.txt. Now what do you do about Nutch or even some perl
>>> >> script that anyone can whip up in 2 minutes?
>>> >>
>>> >> Security through obscurity is fine when coupled with actual security -
>>> >> but relying on it alone is just daft.
>>> >>
>>> >> Expecting the world to change so bad habits have no consequence is
>>> >> dangerously naive.
>>> >>
>>> >> I suspect you're looking too hard at finding fault with Google - who are
>>> >> complying with the robots.txt. Read the spec. - it's about not
>>> >> following the listed directories, not about not listing the robots.txt.
>>> >> Next you'll want laws against bad weather and furniture with sharp
>>> >> corners.
>>> >>
>>> >> Don't put things you don't want seen in places that can be seen.
>>> >>
>>> >> >
>>> >> >
>>> >> > On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson <
>>> >> > scott.ferguson.it.consulting () gmail com> wrote:
>>> >> >
>>> >> >
>>> >> >     /From/: Hurgel Bumpf <l0rd_lunatic () yahoo com>
>>> >> >     /Date/: Mon, 10 Dec 2012 19:25:39 +0000 (GMT)
>>> >> >
>>> >> >
>>> >> > ------------------------------------------------------------------------
>>> >> >     Hi list,
>>> >> >
>>> >> >     i tried to contact google, but as they didn't answer my email, i
>>> >> >     do forward this to FD.
>>> >> >
>>> >> >     This "security" feature is not clearly a google vulnerability, but
>>> >> >     exposes website information that is not really
>>> >> >     intended to be public.
>>> >> >
>>> >> >     Conan the bavarian
>>> >> >
>>> >> > Your point eludes me - Google is indexing something which is publicly
>>> >> > available. eg.:- curl http://somesite.tld/robots.txt
>>> >> > So it seems the solution to the "question" you raise is, um,
>>> >> > nonsensical.
>>> >> >
>>> >> > If you don't want something exposed on your web server *don't publish
>>> >> > references to it*.
>>> >> >
>>> >> > The solution, which should be blindingly obvious, is don't create
>>> >> > the problem in the first place. Password-protect sensitive directories
>>> >> > (htpasswd) - then they don't have to be excluded from search engines
>>> >> > (because listing the inaccessible in robots.txt is redundant). You must
>>> >> > have missed the first day of web school.
>
> _______________________________________________
> Full-Disclosure - We believe in it.
> Charter: http://lists.grok.org.uk/full-disclosure-charter.html
> Hosted and sponsored by Secunia - http://secunia.com/

