On Sat, 24 Mar 2012 10:26:48 -0000, Dave said:

> Don't the -e robots=off, --page-requisites and -H wget directives enable
> one to collect all the necessary files that are called from a page?

No, not *all* the files, for the same reason that if you visit a page with
NoScript enabled, you may end up with missing content and/or big open spaces
on the page.

Consider a page that has Javascript on it:

    // today's URL is computed at runtime by the browser
    var date_as_string = new Date().toISOString().slice(0, 10);
    var todaysfile = "http://www.news-site.com/" + date_as_string;
    document.write('<iframe src="' + todaysfile + '"></iframe>');

Unless you interpret the Javascript, you don't know what URL will get
loaded, because yesterday and tomorrow will get a different URL. So
basically, if you try to pull it down with wget or similar, you will miss
*all* the stuff that's pulled down via Javascript (and probably via CSS as
well - does wget know how to follow CSS references?). On many modern web
designs, this ends up being the vast majority of the content.
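For reference, the kind of invocation Dave is describing would look roughly
like this (the target URL is just the placeholder from the example above):

    wget -e robots=off --page-requisites -H http://www.news-site.com/

That happily grabs the images, stylesheets, and scripts that the HTML itself
references; it's the URLs manufactured at runtime that it can't see.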
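On the CSS question: a stylesheet can pull in further resources the same
way, e.g. (a minimal sketch, hypothetical file names):

    @import url("http://www.news-site.com/extra.css");
    body { background-image: url("http://www.news-site.com/images/bg.png"); }

If memory serves, wget 1.12 and later do scan CSS for url() references, so
fetches like those should be followed; it's the URLs assembled in Javascript
that stay invisible.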