
Looking for software to download an entire website (non-FTP)


fuzzylizard

Hello,

 

I am looking for software that I can point at a website and have it capture the entire thing, something like BlackWidow on Windows.

 

Preferably, I would like to point it at a website, tell it how deep to go and whether it may leave the server, and have it download the entire site to a specified folder.

 

Anyone have any suggestions?

 

Thanks in advance.
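For anyone finding this thread later: the three requirements above (how deep to go, whether to leave the server, which folder to save into) correspond roughly to GNU wget's --level, --span-hosts and --directory-prefix options. Below is a minimal sketch, assuming a POSIX shell and GNU wget; the helper name mirror_site, its argument order and the DRY_RUN switch (print the command instead of running it) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: mirror_site URL DEPTH DEST [yes|no]
#   DEPTH -> --level            (how many links deep to recurse)
#   DEST  -> --directory-prefix (folder to save the site into)
#   yes   -> --span-hosts       (allow leaving the starting server; off by default)
# With DRY_RUN=1 the wget command is printed instead of executed.
mirror_site() {
    url=$1
    depth=$2
    dest=$3
    span=${4:-no}
    # Rebuild the positional parameters as the wget argument list.
    set -- --recursive --level="$depth" --directory-prefix="$dest"
    if [ "$span" = yes ]; then
        set -- "$@" --span-hosts
    fi
    set -- "$@" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo wget "$@"
    else
        wget "$@"
    fi
}
```

For example, DRY_RUN=1 mirror_site http://www.example.com/ 3 ./mirror prints the command without downloading anything. If you do turn on --span-hosts, it is usually wise to also pass --domains=... so a recursive crawl does not wander across unrelated sites.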


Thanks but no.

 

Warning: wildcards not supported in HTTP.

 

Anyone else?

 

BlackWidow is Windows software that lets you download an entire website according to a set of rules. It downloads all files while preserving the website's directory structure. "All files" includes HTML, JPEG, GIF, and anything else that is not in a password-protected directory or script-driven; i.e., it won't download PHP or CFM files.
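A rough wget counterpart to that rule set, as a sketch assuming GNU wget and a POSIX shell: recursive retrieval already recreates the server's directory layout under a folder named after the host, and --accept keeps only files whose names end in the listed suffixes. The function name fetch_types and its default suffix list are hypothetical, chosen to match the file types mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: fetch_types URL [SUFFIXES]
# Approximates the BlackWidow rules described above with GNU wget:
#   --recursive  follows links and saves files under host/path/...,
#                mirroring the server's directory layout
#   --accept     keeps only files whose names end in one of the
#                comma-separated suffixes
# With DRY_RUN=1 the command is printed instead of executed.
fetch_types() {
    url=$1
    exts=${2:-html,htm,jpg,jpeg,gif}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo wget --recursive --accept="$exts" "$url"
    else
        wget --recursive --accept="$exts" "$url"
    fi
}
```

For example, DRY_RUN=1 fetch_types http://www.example.com/ pdf shows the command that would keep only PDF files. wget still has to fetch HTML pages to find the links in them; pages that don't match the accept list are deleted after they have been scanned.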


I downloaded the whole IceWM site in about a minute.

 

[gd@localhost tmp]$ wget -r http://www.icewm.org/

--21:52:29--  http://www.icewm.org/
          => `www.icewm.org/index.html'
Resolving www.icewm.org... done.
Connecting to www.icewm.org[66.35.250.210]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

   [ <=>                                                                 ] 9,452         56.98K/s

21:52:29 (56.98 KB/s) - `www.icewm.org/index.html' saved [9452]

...
...
...
...

--21:53:08--  http://www.icewm.org/files/es/FAQ/IceWM-FAQ-12.html
          => `www.icewm.org/files/es/FAQ/IceWM-FAQ-12.html'
Connecting to www.icewm.org[66.35.250.210]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4,692 [text/html]

100%[====================================================================>] 4,692         33.45K/s    ETA 00:00

21:53:09 (33.45 KB/s) - `www.icewm.org/files/es/FAQ/IceWM-FAQ-12.html' saved [4692/4692]

FINISHED --21:53:09--
Downloaded: 1,535,694 bytes in 180 files

 

[gd@localhost tmp]$ du -sh www.icewm.org/

1.9M    www.icewm.org

 

????

 

MOttS
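On the sizes in that output: wget's summary counts the bytes it actually wrote, while du -sh reports allocated disk space, which rounds every file up to whole filesystem blocks and also counts the directories, so the two numbers normally differ. A minimal sketch for checking a mirror against wget's summary line, assuming a POSIX shell; mirror_stats is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: mirror_stats DIR
# Prints the number of regular files under DIR and the sum of their
# lengths in bytes (apparent size), the quantity wget's
# "Downloaded: N bytes in M files" line reports. du -sh measures
# allocated blocks instead, so it is usually somewhat larger.
mirror_stats() {
    dir=$1
    # tr strips the padding some wc implementations add.
    files=$(find "$dir" -type f | wc -l | tr -d ' ')
    bytes=$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')
    echo "$files files, $bytes bytes"
}
```

Running mirror_stats www.icewm.org right after the download should give numbers close to wget's own summary, assuming nothing else was added to that folder in the meantime.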


Getting closer. However, I need to download everything below a certain level on a website:

 

www.foo.com/~html/foo/bar/...

 

When I used the wget command, it downloaded everything from the root on down. I need something with more control.


Alright, after a little reading of the man page I have figured it out. I know, I know, RTFM!!!

 

Anyway, here is the command I needed to use:

 

wget -r -np -l 3 http://www.foo.com/~html/foo/bar/

 

-np  no parent: it never ascends into the parent directory.

 

-l 3  the maximum number of levels to recurse, in this case 3. The default is 5.

 

wget is a very cool command that I am going to have to look at more.

 

Thanks for the help.
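The same command written with wget's long option names, which can be easier to read in scripts. This is a sketch assuming GNU wget and a POSIX shell; mirror_subtree and DRY_RUN are hypothetical names. Keep the trailing slash on the URL: with --no-parent, wget treats whatever follows the last slash as a file name, so http://www.foo.com/~html/foo/bar (no slash) would make .../foo/ the starting directory instead of .../bar/.

```shell
#!/bin/sh
# Hypothetical helper: mirror_subtree URL [DEPTH]
# Long-option form of:  wget -r -np -l 3 URL
#   --recursive  (-r)   follow links and download recursively
#   --no-parent  (-np)  never ascend above the starting directory
#   --level=N    (-l N) maximum recursion depth; wget's default is 5
# DEPTH defaults to 3, as in the post. With DRY_RUN=1 the command is
# printed instead of executed.
mirror_subtree() {
    url=$1
    depth=${2:-3}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo wget --recursive --no-parent --level="$depth" "$url"
    else
        wget --recursive --no-parent --level="$depth" "$url"
    fi
}
```

For example, mirror_subtree http://www.foo.com/~html/foo/bar/ runs the same download as the command in the post.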


Boy, you were quick to jump. wget is a wonderful tool, and the right answer was given immediately. I use wget DAILY and cannot say enough nice things about it.

 

Yeah, I know. Hopefully I can be forgiven for that. I guess I was looking for a graphical solution. Hey, at least I checked the man page and saw the error of my ways; that must count for something. :)

