GNU Wget Tutorial

As a student, you may find yourself wanting to download lots of lecture slides and other materials from a module homepage, which can become quite an arduous task. Thankfully, the GNU project provides Wget, which is already installed on most Linux machines. It is best demonstrated by example:

wget -r -l5 -np -k -nH --cut-dirs=5 --load-cookies cookies.txt http://www2.warwick.ac.uk/fac/sci/physics/current/teach/module_home/px421/

-r
Means wget acts recursively, i.e. it follows links found on the current page (much like a search engine spider).

-l
Specifies the depth, which means how many of these links it can follow. If you imagine all the links on the current page forming branches away from it, then the links on those pages forming branches away from those, then -l5 sets the maximum branch distance away from the current page.

-np
No Parent, means wget will only progress down the directory tree i.e. it will not work its way back into http://www2.warwick.ac.uk/fac/sci/physics/current/teach/module_home/

-k
Convert Hyperlinks. When wget downloads a page, say index.html, the links on that page still point to the web server, just as when viewed in your browser. -k converts them to local links once the download has finished, so that you can navigate your way through the pages on your local machine.

-nH
No host directories. Otherwise wget would create a folder named “www2.warwick.ac.uk” and store everything it downloads in there, which is normally undesirable.

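--cut-dirs=5
Skips the first 5 directory components of the remote path when saving files. Otherwise wget would recreate the nested directories

fac/
sci/
physics/
current/
teach/
module_home/

as a directory tree which you don’t want to have to click through. Note that --cut-dirs=5 removes only the first five of these (fac/ to teach/), so the files end up in module_home/px421/; --cut-dirs=7 would place them directly in the current directory.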

--load-cookies
Content is often restricted to logged-in users, so you need to supply wget with the cookies from your browser session. If you are a Firefox user, there is an extension called ‘cookie exporter’ which you can use to write your cookies to a file called cookies.txt.
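
As a recap, the same command can be spelled with wget's long option names, which are easier to read in a script. The sketch below only assembles and prints the command rather than downloading anything; cookies.txt is assumed to exist in the current directory when you run it for real.

```shell
#!/bin/sh
# Sketch: the example command written with wget's long option names.
url="http://www2.warwick.ac.uk/fac/sci/physics/current/teach/module_home/px421/"

# Store the options as positional parameters so that each stays one word.
set -- \
    --recursive \
    --level=5 \
    --no-parent \
    --convert-links \
    --no-host-directories \
    --cut-dirs=5 \
    --load-cookies cookies.txt \
    "$url"

# Print the command instead of running it; use `wget "$@"` to download.
cmd="wget $*"
echo "$cmd"
```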

That’s it!

Libraries

Compiling converts your source code into object code, i.e. machine code which the processor can understand. The compiler produces an object file (.o) from each source file. The linker then pieces the object files together and produces an executable from them. If you wish to “compile only”, i.e. to obtain merely the object file, you can add the “-c” flag at compilation:

michael@michael-laptop:~$ gcc -c test.c

This produces the object file “test.o”. You can inspect this object file with the nm command. It lists the symbols in the object file, e.g. the functions and global variables which it defines or refers to.

Static Libraries

A static library is an archive of object files. All we do is include this archive in the compile line just like we would do for the .o files. Any executable created by linking with a static library contains its own copy of all the binary object files found inside, so only the executable need be distributed. This archive is created with the ar command e.g.

michael@michael-laptop:~$ ar r library.a file1.o file2.o file3.o 

The man page describes the r option as “Insert the files member... into archive (with replacement).”
You can then display which files are in an archive with the t option.

michael@michael-laptop:~$ ar t library.a
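
The whole static library workflow can be sketched end to end as follows. The file and function names (add.c, mul.c, main.c, libdemo.a) are made up for illustration, and the script assumes gcc and binutils (ar, nm) are installed; it skips the build otherwise.

```shell
#!/bin/sh
# Sketch: build a static library from two object files and link against it.
# All file names here are hypothetical examples.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > add.c <<'EOF'
int add(int a, int b) { return a + b; }
EOF
cat > mul.c <<'EOF'
int mul(int a, int b) { return a * b; }
EOF
cat > main.c <<'EOF'
#include <stdio.h>
int add(int a, int b);
int mul(int a, int b);
int main(void) { printf("%d %d\n", add(2, 3), mul(2, 3)); return 0; }
EOF

if command -v gcc >/dev/null 2>&1 && command -v ar >/dev/null 2>&1; then
    gcc -c add.c mul.c            # compile only: produces add.o and mul.o
    nm add.o                      # list the symbols in add.o
    ar r libdemo.a add.o mul.o    # insert the object files into the archive
    members=$(ar t libdemo.a)     # list the archive members
    gcc main.c libdemo.a -o demo  # link the archive just like a .o file
    result=$(./demo)
    echo "$members"
    echo "$result"
else
    echo "gcc or ar not found; skipping the build"
fi
```

Because the archive is copied into the executable at link time, demo keeps working even if libdemo.a is deleted afterwards.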

Shared Libraries

Shared libraries are often preferable to static libraries. The modules that a program requires are loaded into memory from shared objects at load time or run time, whereas static libraries are copied into the executable at link time. This has the advantage that I can change the code in a shared library without needing to relink my executables, provided the interface stays compatible. If the library were static and I made a change, then I would need to relink every executable which depends on that library. Shared libraries have 3 names:

soname

has the prefix “lib” and the suffix .so followed by a full stop and then the major version number i.e. libtest.so.1
We would only increment the major version number if we make a change which breaks backward compatibility e.g. changing the number of arguments that a function has

realname

is the name of the file which contains the actual library code. It consists of the soname plus a minor version number and a release number, i.e. libtest.so.1.0.1

linker name

is the name which the linker looks for when you link a program against the library. It is the same as the soname, just without the version number i.e. libtest.so
The linker name is basically a symbolic link to the soname, which is itself a symbolic link to the real name.

To create a shared library, we need to compile our source code as follows:

michael@michael-laptop:~$ gcc -fPIC -c test.c

The “-fPIC” option tells the compiler to produce Position Independent Code, which means the code can function regardless of where in memory it is loaded. We can then proceed by passing the “-shared” option to gcc and passing the soname on to the linker with the -Wl option.

michael@michael-laptop:~$ gcc -shared -Wl,-soname,libtest.so.1 -o libtest.so.1.0.1 test.o

The -shared option tells the compiler that the output file should be a shared library.
-Wl,option
Passes option as an option to the linker. If option contains commas, it is split into multiple options at the commas. It must not contain whitespace.
The -soname option specifies the soname.
The -o option specifies the real name.

Now that the shared library has been created, we need to install it, namely with ldconfig. The ldconfig program generates a symbolic link, named as the soname, to the real name. The -n option tells it to process only the directories given on the command line, e.g. ldconfig -n . for a library in the current directory.
Finally, we need to create a new symbolic link, named as the linker name, to the soname.

michael@michael-laptop:~$ ln -s libtest.so.1 libtest.so

where the -s option stands for symbolic.
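
The resulting chain of names can be checked with readlink. The sketch below uses an empty placeholder file in a scratch directory instead of a real compiled library, and creates the soname link by hand where ldconfig would normally do so.

```shell
#!/bin/sh
# Sketch of the three-name symlink chain, using an empty placeholder file
# in place of a real library (names follow the libtest example in the text).
libdir=$(mktemp -d)
cd "$libdir" || exit 1

touch libtest.so.1.0.1                 # real name: stands in for the library file
ln -s libtest.so.1.0.1 libtest.so.1    # soname -> real name (normally made by ldconfig)
ln -s libtest.so.1 libtest.so          # linker name -> soname

soname_target=$(readlink libtest.so.1)
linker_target=$(readlink libtest.so)
resolved=$(basename "$(readlink -f libtest.so)")   # follow the whole chain

echo "libtest.so -> $linker_target -> $soname_target"
echo "fully resolved: $resolved"
```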

at command

Scheduling a process to run automatically at a certain date and time can be quite useful. This is achieved with the at command. The at command reads a series of commands from the standard input and lumps them together into one single at-job to be executed at some point in the future.

Syntax: at [-V] [-q queue] [-f file] [-mkdv] [-t time]

Run man at to find out about all the options.

Here is an example:

michael@michael-laptop:~$ at -mv 4:44 sep 21
Wed Sep 21 04:44:00 2011

warning: commands will be executed using /bin/sh
at> g++ main.cpp -o test
at> nohup ./test &
at> <EOT>
job 13 at Wed Sep 21 04:44:00 2011
michael@michael-laptop:~$ 

The -m option mails the user when the at-job has been executed. The -v option simply produces the first line of output, i.e. it displays when the job will be executed. One does not type <EOT>; it appears when you end the input by pressing CTRL+D on a new line.
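
Instead of typing the commands at the at> prompt, they can be read from a file with the -f option listed in the syntax above. The following is a minimal sketch, assuming a hypothetical job file called job.sh that holds the same two commands; the job is only submitted if at is installed.

```shell
#!/bin/sh
# Sketch: queue the same at-job non-interactively from a file.
# job.sh is a hypothetical file name; its contents mirror the example above.
jobdir=$(mktemp -d)
jobfile="$jobdir/job.sh"

cat > "$jobfile" <<'EOF'
g++ main.cpp -o test
nohup ./test &
EOF

if command -v at >/dev/null 2>&1; then
    # -f reads the commands from the file instead of standard input
    at -m -f "$jobfile" 4:44 sep 21
else
    echo "at is not installed; the job file is at $jobfile"
fi
```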

To see which at-jobs have been scheduled, enter atq (alternatively at -l):

michael@michael-laptop:~$ atq
13	Wed Sep 21 04:44:00 2011 a michael
michael@michael-laptop:~$ at -l
13	Wed Sep 21 04:44:00 2011 a michael
michael@michael-laptop:~$ 

To remove this job, enter atrm and the ID:

michael@michael-laptop:~$ atrm 13
michael@michael-laptop:~$ atq
michael@michael-laptop:~$

The at-job inherits the environment of the terminal that schedules it and hence has the same working directory and environment variables. If at some point in the future you forget what commands an at-job contains, you can view them with at -c job. This lists the environment variables of the at-job, with the commands at the bottom.

Kill – Killing Processes and the top Command

Sooner or later you will want to kill a process, whether it be some code executed in the background that is taking too long or simply a program that is misbehaving.

If the code has been executed by the user in the background, then one can use the jobs command with the kill command:

michael@michael-laptop:~$ jobs
[1]+  Running                 ./test &
michael@michael-laptop:~$ kill -9 %1
michael@michael-laptop:~$ 
[1]+  Killed                  ./test

The second argument specifies which signal to send; kill -l lists the available signals. As an aside, CTRL + C sends the SIGINT interrupt signal to the program and CTRL + Z sends the SIGTSTP stop signal, which pauses it. The -9 sends the SIGKILL signal, which is more forceful than the default SIGTERM, which is obtained with -15 or by omitting this field. SIGKILL forces the program to end immediately, whereas SIGTERM can be intercepted or ignored by the program. The latter is the gentler approach as it allows the program to clean up after itself before finishing. Using SIGTERM would have produced the line [1]+ Terminated ./test as opposed to Killed.
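
The difference between the two signals also shows up in the exit status that the shell reports for the process. The sketch below uses sleep as a stand-in for a long-running program and collects the status with wait.

```shell
#!/bin/sh
# Sketch: end background processes with SIGTERM and SIGKILL and inspect
# the exit status the shell reports. sleep stands in for a real program.

sleep 60 &
term_pid=$!                 # PID of the process just started
kill -15 "$term_pid"        # same as plain kill: SIGTERM, which a program can catch
wait "$term_pid"
term_status=$?              # 128 + 15 = 143 for a process ended by SIGTERM

sleep 60 &
kill_pid=$!
kill -9 "$kill_pid"         # SIGKILL: cannot be caught or ignored
wait "$kill_pid"
kill_status=$?              # 128 + 9 = 137 for a process ended by SIGKILL

echo "SIGTERM exit status: $term_status"
echo "SIGKILL exit status: $kill_status"
```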

top command

Alternatively, you can use the top command to display processes in order of CPU demand. N.B. It is called “top” because only the top most demanding processes are shown.

top - 12:26:34 up 23 days, 12:39,  2 users,  load average: 8.00, 8.01, 8.05
Tasks: 152 total,  10 running, 142 sleeping,   0 stopped,   0 zombie
Cpu(s): 64.4%us,  0.2%sy,  0.0%ni, 35.4%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  24732896k total, 24581112k used,   151784k free,   531980k buffers
Swap:  3999676k total,     9080k used,  3990596k free, 21636732k cached

  PID  USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 2447  laemmel   20   0  122m 111m  972 R  100  0.5   9880:20 BD-concn-semifl    
 2486  lindon    20   0 58676  40m 1492 R  100  0.2   9875:24 hing                
 4053  laemmel   20   0  122m 111m  972 R  100  0.5   9672:33 BD-concn-semifl    
 7782  sturm     20   0 19496 7100 1220 R  100  0.0   1520:41 tfpulling          
 20031 sturm     20   0 23384  10m 1220 R  100  0.0   5330:51 tfpulling          
 26339 sturm     20   0 22616  10m 1220 R  100  0.0   4374:08 tfpulling          
 30841 glaser    20   0 44.5g 473m 344m R  100  2.0  10974:30 hoomd              
 1093  sturm     20   0 58764  45m 1184 R  100  0.2  10176:04 tfpulling          

Within top, you can press k (for kill) and then enter the PID (process ID) of the process you would like to kill. To exit top, press q. If you know the PID, which you can get from top, you can use the kill command in a regular terminal by entering kill -9 PID

Background Processes and the jobs Command

1. Running Processes in the Background

If your code takes a long time to run, you may consider running it in the background. Consider the executable “test”, which I run as follows:

michael@michael-laptop:~$ ./test

While it is running, you cannot do much with the terminal. You can abort it with CTRL + C or you can pause it with CTRL + Z:

michael@michael-laptop:~$ ./test
^Z
[1]+  Stopped                 ./test

The last line is exactly what you would get if you typed the jobs command. You can now send this process to run in the background by typing bg:

michael@michael-laptop:~$ bg
[1]+ ./test &
michael@michael-laptop:~$ 

Typing jobs again will now show you that the process is running in the background

michael@michael-laptop:~$ jobs
[1]+  Running                 ./test &
michael@michael-laptop:~$

When the process finishes, a confirmation will be displayed in the terminal like this:

[1]+  Done                    ./test
michael@michael-laptop:~$ 

In addition, you can bring a process back to the foreground by entering fg job_id, where the job ID is found from the jobs command (it is the number before the process, e.g. 1 in this example).
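
A process can also be started in the background straight away by appending & to the command, which is what the ./test & in the jobs output denotes. Here is a minimal sketch, again with sleep standing in for a long-running executable:

```shell
#!/bin/sh
# Sketch: start a command directly in the background with a trailing &.
# sleep stands in for a long-running executable such as ./test.
sleep 1 &
bg_pid=$!                   # PID of the most recent background process
echo "started background process $bg_pid"
jobs                        # list the shell's background jobs
wait "$bg_pid"              # block until the process has finished
bg_status=$?
echo "finished with exit status $bg_status"
```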

You may now also wish to know how to kill background processes, which is covered in the section on killing processes above.

Creating SSH Keys with ssh-keygen and ssh-copy-id

Have you ever gotten tired of constantly entering your password whenever you SSH into a remote computer? You don’t need to! You can create a pair of SSH keys, namely a private and a public one, which will save you from re-entering your password in the future. The private key you keep secret in your home folder and the public key you copy to every server or remote computer which you wish to SSH into. The authentication then proceeds as follows: when you connect, the remote computer uses its copy of your public key to check that you hold the matching private key, and if so, the authentication succeeds. Let’s proceed with how we create SSH keys.

 

1. Create SSH Keys with ssh-keygen

Open up a terminal and type ssh-keygen

lindon@michael-laptop:~$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/lindon/.ssh/id_rsa):  
Created directory '/home/lindon/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/lindon/.ssh/id_rsa.
Your public key has been saved in /home/lindon/.ssh/id_rsa.pub.

When prompted for a file, a passphrase and the same passphrase again, simply press return to accept the default file and an empty passphrase. The last line will then be followed by the key’s fingerprint and randomart image. You’ll notice that the public and private keys are found in the /home/user/.ssh folder (it’s hidden, so make sure you can view hidden folders). The next step is to copy the public key over to the remote computer…
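
The same key generation can be done without any prompts by passing the answers as options. This sketch writes the keys to a scratch directory under a hypothetical file name (id_rsa_demo) so that an existing ~/.ssh/id_rsa is never overwritten; it assumes the OpenSSH ssh-keygen is installed.

```shell
#!/bin/sh
# Sketch: generate a key pair non-interactively in a scratch directory.
# id_rsa_demo is a hypothetical file name chosen for this example.
keydir=$(mktemp -d)
keyfile="$keydir/id_rsa_demo"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -t rsa: key type, -N "": empty passphrase, -f: output file, -q: quiet
    ssh-keygen -q -t rsa -N "" -f "$keyfile"
    ls "$keydir"                       # private key and matching .pub file
    ssh-keygen -l -f "$keyfile.pub"    # print the public key's fingerprint
else
    echo "ssh-keygen is not installed"
fi
```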

 

2. Transfer the public key with the ssh-copy-id command

Syntax: ssh-copy-id [-i [identity_file]] [user@]machine

lindon@michael-laptop:~$ ssh-copy-id -i lindon@remotecomputer.com
The authenticity of host 'remotecomputer (xx.xxx.xxx.xx)' can't be established.
RSA key fingerprint is   :  :  :  :  :  :  :  :  :  :  :  :  :  :  :  :.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'remotecomputer,xx.xxx.xxx.xx' (RSA) to the list of known hosts.
lindon@remotecomputer's password:
Now try logging into the machine, with "ssh 'lindon@remotecomputer.com'", and check in:

~/.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

lindon@michael-laptop:~$

The -i option selects the public key to copy; given without a file name, as here, the default public key, i.e. ~/.ssh/id_rsa.pub, is used.

 

3. Login without Password

lindon@michael-laptop:~$ ssh lindon@remotecomputer.com
Welcome to Ubuntu 11.04 (GNU/Linux 2.6.38-11-generic x86_64)

* Documentation:  https://help.ubuntu.com/

Last login: Sun Sep 11 15:57:14 2011 from xx.xxx.xxx.xx
lindon@remotecomputer:~$

And there we go, login without password!