
Using WGet [solved]



I need some help using an application called Interchange on my web server. I have a URL, www.somewebsite.com/questions/*.html, and I want to index all files in that questions directory. I am using the following wget command:

 

wget -r -l2  -A html -X cgi-bin -D www.somewebsite.co.uk/ -P /home/httpd/vhosts/somewebsite.co.uk/catalogs/somewebsite/swish_site/ http://www.somewebsite.co.uk/questions/

 

This only indexes the index page of this folder. It will not follow the links on the page. What would be the appropriate command to index all pages in that folder?

 

 

[moved from Software by spinynorman]


Have you checked the result of your index to see whether it actually contains links to the questions? If you look at the URL http://www.somewebsite.co.uk/questions/ in a browser, do you see links to each of the questions, and are they plain HTML links?

 

Try a simpler command, with just -r and --no-parent and without -D, -P, -X and -A, and see if that works. If it doesn't, use the -o or -d options to get extra output showing why it's not following the links.
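For reference, that stripped-down test might look something like the sketch below. The function name `crawl_debug` and the default log name `wget-debug.log` are made up for illustration; `-r`, `--no-parent`, `-d` and `-o` are standard wget options. Without `-o`, wget writes its messages to stderr, so piping straight into a pager will not capture them.

```shell
# Hypothetical helper: recursive fetch of one directory with debug logging.
# Usage: crawl_debug URL [LOGFILE]
crawl_debug() {
    url="$1"
    log="${2:-wget-debug.log}"   # assumed default log name
    # -r           follow links recursively
    # --no-parent  never ascend above the starting directory
    # -d           print debug output (request and response headers)
    # -o FILE      write all messages to FILE instead of stderr
    wget -r --no-parent -d -o "$log" "$url"
}
```

You would then run `crawl_debug http://www.somewebsite.co.uk/questions/` and read the result with `more wget-debug.log`.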


I have tried it with simpler options and it just does not follow the links on the page. I tried:

 

wget -r --no-parent http://www.somesite.co.uk/answers/ -d | more

 

and that only retrieves the index page. The page has links on it that are just plain, ordinary HTML links, yet wget does not follow them. I've tried using -d and sending the output to a log file. The lines relating to one of the links I want it to follow are below:

DEBUG output created by Wget 1.10.2 (Red Hat modified) on linux-gnu.

Enqueuing http://www.somesite.co.uk/answers/ at depth 0
Queue count 1, maxcount 1.
Dequeuing http://www.somesite.co.uk/answers/ at depth 0
Queue count 0, maxcount 1.
--14:06:39--  http://www.somesite.co.uk/answers/
	   => `www.somesite.co.uk/answers/index.html'
Resolving www.somesite.co.uk... 213.218.223.240
Caching www.somesite.co.uk => 213.218.223.240
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/ HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:39--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:39 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:40--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:40 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
--14:06:40--  http://www.somesite.co.uk/answers/index
	   => `www.somesite.co.uk/answers/index'
Found www.somesite.co.uk in host_name_addresses_map (0x9dc14a8)
Connecting to www.somesite.co.uk|213.218.223.240|:80... connected.
Created socket 4.
Releasing 0x09dc14a8 (new refcount 1).

---request begin---
GET /answers/index HTTP/1.0
User-Agent: Wget/1.10.2 (Red Hat modified)
Accept: */*
Host: www.somesite.co.uk
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response... 
---response begin---
HTTP/1.1 302 moved
Date: Wed, 17 Jan 2007 14:06:40 GMT
Server: Apache/2.0.52 (Red Hat)
Location: index
Connection: close
Content-Type: text/plain

---response end---
302 moved
Location: index [following]
Closed fd 4
20 redirections exceeded.

FINISHED --14:06:40--
Downloaded: 0 bytes in 0 files

 

Do these lines mean anything? I do not have any experience with wget.

Edited by I_NEED_HELP

HTTP/1.1 302 moved

The server isn't delivering the page; it's issuing an HTTP redirect to say "the page isn't here, it's over there". It's not quite clear what the redirect URL is in this case, though. It seems to be just looping until wget's limit of 20 redirections is reached.
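Looking at the log again, the `Location: index` header is a relative reference, and the request wget sends next is `GET /answers/index`, which suggests the page is redirecting to itself. The sketch below mimics that resolution for the simple case only (a bare relative name, with no `../`, leading slash or full URL). `resolve_relative` is a hypothetical helper written for this post, not something wget provides.

```shell
# Hypothetical helper: resolve a bare relative Location value against the
# URL that was just requested. Simplified: it does not handle "../",
# absolute paths, or full URLs.
resolve_relative() {
    base_url="$1"
    location="$2"
    # Strip the last path segment (everything after the final slash),
    # then append the relative reference.
    printf '%s\n' "${base_url%/*}/${location}"
}

# First hop: /answers/ redirects to "index".
first=$(resolve_relative "http://www.somesite.co.uk/answers/" "index")
# Second hop: /answers/index redirects to "index" again.
second=$(resolve_relative "$first" "index")

echo "$first"
echo "$second"
if [ "$first" = "$second" ]; then
    echo "redirect points back at itself"
fi
```

If both hops come out as the same URL, as they appear to in the log, no number of retries will get past the redirect, and the fix has to happen on the server side or by requesting a different URL.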

 

Try the full URL answers/index.html, and just for giggles try another URL as well, to make sure this particular server isn't reacting strangely to a wget request.
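To compare those runs without reading every line of debug output, you could tally the status lines and Location headers in each log. This is only a rough sketch: `summarise_log` is a made-up name, and it assumes the log was produced with `-d -o`, so that response headers appear one per line as in the output posted above.

```shell
# Hypothetical helper: count HTTP status lines and Location headers
# in a wget debug log. Usage: summarise_log LOGFILE
summarise_log() {
    # Headers in the debug dump may end in a carriage return, so strip those first.
    # The Location pattern matches only the raw header line, not wget's own
    # "Location: ... [following]" message.
    tr -d '\r' < "$1" \
        | grep -E '^(HTTP/[0-9.]+ [0-9]{3}|Location: [^ ]+$)' \
        | sort | uniq -c
}
```

A log that shows nothing but `302` status lines, all with the same Location value, points to a redirect loop rather than a problem with the wget options.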
