I_NEED_HELP Posted January 17, 2007

I need some help using an application called interchange on my webserver. I have a URL www.somewebsite.com/questions/*.html and I want to index all files in that questions directory. I am using the following wget command:

wget -r -l2 -A html -X cgi-bin -D www.somewebsite.co.uk/ -P /home/httpd/vhosts/somewebsite.co.uk/catalogs/somewebsite/swish_site/ http://www.somewebsite.co.uk/questions/

This only indexes the index page of that folder. It will not follow the links on the page. What would be the appropriate command to index all pages from that folder?

[moved from Software by spinynorman]
neddie Posted January 17, 2007

Have you checked the result of your index to see whether it actually contains links to the questions? If you look at the URL http://www.somewebsite.co.uk/questions/ in a browser, do you see links to each of the questions, and are they HTML links?

Try a simpler command, with just -r and --no-parent but without the -D, -P, -X and -A options, and see if that works. If it doesn't, use -d for debug output (and -o to write it to a log file) to see why it's not following the links.
I_NEED_HELP Posted January 17, 2007 (edited)

I have tried it with simpler options and it just does not follow the links on the page. I tried

wget -r --no-parent http://www.somesite.co.uk/answers/ -d | more

and that only retrieves the index page. The page has links on it that are just plain ordinary HTML links, yet wget does not follow them. I've tried using -d and sending the output to a logfile. The lines relating to one of the links I want it to follow are below:

DEBUG output created by Wget 1.10.2 (Red Hat modified) on linux-gnu.

Enqueuing http://www.somesite.co.uk/answers/ at depth 0
Queue count 1, maxcount 1.
Dequeuing http://www.somesite.co.uk/answers/ at depth 0
Queue count 0, maxcount 1.
--14:06:39-- http://www.somesite.co.uk/answers/
  => `www.somesite.co.uk/answers/index.html'
Resolving www.somesite.co.uk... 213.218.223.240
Caching www.somesite.co.uk => 213.218.223.240
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).
---request begin---
GET /answers/ HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain
---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39-- http://www.somesite.co.uk/answers/index
  => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).
---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain
---response end---
302 moved
Location: index [following]
Closed fd 4

[the same GET /answers/index request and 302 response repeat, identical apart from the timestamp, until the limit is reached]

20 redirections exceeded.

FINISHED --14:06:40--
Downloaded: 0 bytes in 0 files

Do these lines mean anything? I do not have any experience using wget.

Edited January 17, 2007 by I_NEED_HELP
neddie Posted January 17, 2007

HTTP/1.1 302 moved

The server isn't delivering the page; it's issuing an HTTP redirect to say "the page isn't here, it's over there". It's not quite clear what the redirect URL is in this case, though. It seems like it's just looping until its limit of 20 redirections is reached.

Try with the full URL answers/index.html, and just for giggles try with another URL to make sure this particular server isn't reacting strangely to a wget request.
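The loop in the log can be worked through by hand. Below is a minimal sketch in Python, assuming the relative "Location: index" header is resolved against the current URL by ordinary relative-URL resolution. It uses Python's urllib rather than wget itself, and the follow helper is made up purely for illustration.

```python
from urllib.parse import urljoin


def follow(start_url, location, max_redirects=20):
    """Hypothetical helper: apply the same relative Location header over and
    over, as a server that always answers 'Location: index' would force."""
    url = start_url
    visited = [url]
    for _ in range(max_redirects):
        # Resolve the relative redirect target against the current URL.
        url = urljoin(url, location)
        visited.append(url)
    return visited


hops = follow("http://www.somesite.co.uk/answers/", "index")
print(hops[1])             # first redirect target
print(len(set(hops[1:])))  # number of distinct targets after the first hop
```

Every hop after the first resolves to the same /answers/index URL, which is consistent with the log: the server keeps answering 302 for that URL until the limit of 20 redirections is reached.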
I_NEED_HELP Posted January 17, 2007

OK, I've got this working. In my original command it appeared to be the -D option that was causing retrieval of all the pages to fail, for some reason. Now that it is taken out it works fine, and it is not really needed in this case anyway. Thanks for the help, neddie.
neddie Posted January 17, 2007

Maybe some of the links pointed to somewebsite rather than www.somewebsite. Just a thought. Anyway, you're welcome!
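That guess is easy to check on paper. Here is a rough sketch in Python, assuming -D accepts a host only when the host name ends with one of the listed domains. This is a simplified model written for illustration, not wget's actual code; the host_allowed helper and the 1.html page name are hypothetical.

```python
from urllib.parse import urlsplit


def host_allowed(url, domains):
    """Simplified model (an assumption): accept a URL if its host name
    ends with any of the listed domains, ignoring case."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host.endswith(domain.lower()) for domain in domains)


# 1.html is a made-up page name standing in for /questions/*.html
with_www = "http://www.somewebsite.co.uk/questions/1.html"
without_www = "http://somewebsite.co.uk/questions/1.html"

print(host_allowed(with_www, ["www.somewebsite.co.uk"]))
print(host_allowed(without_www, ["www.somewebsite.co.uk"]))
print(host_allowed(without_www, ["somewebsite.co.uk"]))
print(host_allowed(with_www, ["www.somewebsite.co.uk/"]))
```

Under this model, links to the bare somewebsite.co.uk host are rejected when only www.somewebsite.co.uk is listed, while listing the bare domain covers both forms. If the model holds, the value from the original command, with its trailing slash, would match no host at all, which might be another reason the crawl stopped at the index page.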