
HOWTO how 'ln' can make a better understanding of Unix



 

arusabal

Moderator

Joined: 17 Apr 2002

Posts: 836

Location: Spain

 

Posted: Fri Sep 20, 2002 2:59 pm Post subject: how 'ln' can make a better understanding of Unix (Linux)

_________________________________________________________________

 

 

This morning I was reading a thread from comp.unix.shell named "replacing a string in all the files under a directory (and subdirectories)".

 

I started reading it because I saw a bunch of answers to what looked like an easy question (it seemed to me a simple case of combining 'find' and 'sed'). That piqued my curiosity.

 

Within that thread there was a discussion (among others) about the use of the 'ln' command instead of 'cp' while making backups of the files which were modified by 'sed'.

 

The backup step was something like this:

sed '<command>' file > file.tmp
cp file file.bak
mv file.tmp file

 

That's the way I've always done it! But another guy showed it this way:

sed '<command>' file > file.tmp
ln file file.bak
mv file.tmp file

 

So what? Why use 'ln' there? I had always thought that 'ln' was just for linking one file to another, making both files identical, so that anything that happens to one of them happens to the other (I understand soft links perfectly well, since they are just pointers to a 'real file', and I never cared about hard ones). So where the hell is the backup?

 

Evidently I don't understand many things about UNIX, and hard links are one of them; that was my first thought. My second thought was the same one many of you are having right now, *if you are still reading this post*: what a stupid discussion!

 

Despite those thoughts I kept reading the thread, and to my surprise I found this brilliant explanation of the 'ln' command, i-nodes, the way UNIX filesystems are designed, some historical hints and, of course, why using 'ln' was better than 'cp'.

 

Here is the post, from David Thompson (dat1965@yahoo.com) at comp.unix.shell (pasted here without permission)

 

-----------------------------------------------------------

[begin of quote]

 

From: David Thompson (dat1965@yahoo.com)

 

Subject: Re: replacing a string in all the files under a directory (and subdirectories)

Newsgroups: comp.unix.shell


Date: 2002-09-18 01:42:39 PST

 

"Bruce Burhans" <bburhan1@earthlink.net> wrote

> Would you mind explaining that use of ln? Never

> seen the like...

 

Hi Bruce,

[WARNING: over zealous typist with too much free time.]

 

The concept of a "link" is somewhat hard to fathom, esp

since many people are introduced to the similar concept

of a "file", and never quite learn to separate the two

very well. Hopefully, what I've written here will help

to clarify the distinction, which in turn will be useful

in explaining the ln command. I know this is greatly

simplified, but hopefully the "big picture" is evident.

 

Sooner or later in the world of Unix, you come across

the term "i-node". What is an i-node? Well, think

of an i-node as the index into an array that specifies

everything about a file EXCEPT the file's name. This

array is stored on disk, and managing this disk-based

array is one of the primary responsibilities of Unix.

 

Ok, so imagine an array of 1000 i-nodes,

 

  index            information
+--------+-----------------------------+
|  0001  | owner,group,perms,dates,etc |
+--------+-----------------------------+
|  0002  | owner,group,perms,dates,etc |
+--------+-----------------------------+
|  0003  | owner,group,perms,dates,etc |
+--------+-----------------------------+
   ...
+--------+-----------------------------+
|  0999  | owner,group,perms,dates,etc |
+--------+-----------------------------+
|  1000  | owner,group,perms,dates,etc |
+--------+-----------------------------+

 

Most people understand the i-node as a record stored

on disk, but the key to that AH-HAH feeling is to focus

on the index column above. Think of an i-node as this

index. Therefore, an i-node is an integer number, unique

for each file system. I'll use the term "i-node" to

represent the index into the above array.

Why is an i-node important? Because, Unix stores directory

information in a way that associates the filename with an

i-node. That is, a directory is simply an association between

the index in the i-node table and the string that represents

the filename. This association is called a link. Actually,

then, a directory is a list of these associations. Each

entry in the directory is one association between an i-node

number and a string filename.
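
One way to look at these associations directly is to ask 'ls' for them. The following is only a rough sketch: it assumes 'mktemp -d' is available on your system, and the scratch directory and file names are made up for the illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
# (mktemp -d is assumed to be available; it is not part of strict POSIX.)
dir=$(mktemp -d)
: > "$dir/file1.c"
: > "$dir/Makefile"

# ls -i prints each directory entry as "<i-node number> <filename>",
# which is the pair the directory stores.
ls -i "$dir"

# The i-node number is a plain integer; pull one out with awk.
inode=$(ls -i "$dir/Makefile" | awk '{print $1}')
echo "Makefile is linked to i-node $inode"

rm -r "$dir"
```

The actual numbers will differ on every system, since they depend on which i-nodes happen to be free.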

 

Ok, so imagine the directory itself stored on disk, something

like an array, where each directory entry contains two pieces

of information: the i-node and the filename.

 

 i-node         filename
+--------+--------------------+
|  0021  | file1.c            |
+--------+--------------------+
|  0257  | file2.c            |
+--------+--------------------+
|  0008  | Makefile           |
+--------+--------------------+
   ...
+--------+--------------------+
|  0834  | XYZ.c              |
+--------+--------------------+
|  0172  | cmd.sh             |
+--------+--------------------+

 

Notice how the directory seemingly contains i-nodes in no

particular order. This is easily understood because as files

are created on disk, the next available i-node number is used

in the i-node table, and then the association between a filename

and the i-node number is stored in the directory. On a very

active disk, files may be created and deleted often, so i-nodes

are freed and re-used often as well. [Also, note that the i-node

table is not concerned with the tree-like directory structure; this

organization is conceptually a level above the i-node table. That

is, the two entries . and .. in each directory are special files

that help to maintain that hierarchy. Think of i-nodes as a simple

array of records stored on disk, where each record tracks only the

information (most of, anyways) you see from the 'ls -l' command.]

 

AH-HAH!

 

The directory is storing links! Think of it like this: "i-node

834 is associated with filename XYZ.c", which is what you mean

when you say "filename XYZ.c is linked to i-node 834". Same

difference. This is the origin of the term "link" in Unix.

 

Try typing 'ls -il' in your home directory. See that first column

showing i-node numbers? The 'ls' command doesn't normally show

you the i-node number because it's not terribly useful in every

day life. Let's create a simple file,

 

$ echo hello david > hello.txt
$ ls -il
50175 -rw-rw-r-- 1 davidt eng 12 Sep 17 23:16 hello.txt

 

So, I just created a file with the name hello.txt, and Unix used

i-node 50175 to store the owner, group, permissions, etc. We say

that the i-node 50175 has 1 link. In fact, look carefully and you

can see the 3rd column, it has a value of 1; which is telling us

that the i-node referenced by hello.txt has 1 link, er, 1 filename

associated with it.

What if we did this neat-o trick: Let's store another entry in

the directory for the same i-node 50175 but use a different name.

 

Ie, we want our directory to look something like,

 

 i-node         filename
+--------+--------------------+
| 50175  | hello.txt          |
+--------+--------------------+
| 50175  | goodbye.txt        |
+--------+--------------------+

 

Because of how the Unix directory was designed over 20 years ago, it's

entirely possible for the i-node number to be the same, as long as

the filename is different. Pretty cool design.

 

Ok, so how do you do that? How do you make 2 different filenames point

to the same i-node? By using the ln command, like this,

 

$ ln hello.txt goodbye.txt
$ ls -il hello.txt goodbye.txt
50175 -rw-rw-r-- 2 davidt eng 12 Sep 17 23:16 goodbye.txt
50175 -rw-rw-r-- 2 davidt eng 12 Sep 17 23:16 hello.txt

 

See how both filenames have the same i-node number? We now say that

i-node 50175 has 2 links, which are hello.txt and goodbye.txt. Look

above at the 3rd column, see how it now says 2?
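
If you would rather have a script confirm this than eyeball the listing, something like the sketch below should do. It assumes 'mktemp -d' and 'awk' are available; the variable names are only for illustration.

```shell
# Scratch directory (mktemp -d is an assumption about your system).
dir=$(mktemp -d)
echo hello david > "$dir/hello.txt"
ln "$dir/hello.txt" "$dir/goodbye.txt"

# First field of "ls -i" is the i-node number.
ino_hello=$(ls -i "$dir/hello.txt" | awk '{print $1}')
ino_goodbye=$(ls -i "$dir/goodbye.txt" | awk '{print $1}')

# Second field of "ls -l" is the link count
# (it is the 3rd column of "ls -il", because -i adds the i-node in front).
links=$(ls -l "$dir/hello.txt" | awk '{print $2}')

if [ "$ino_hello" = "$ino_goodbye" ]; then
    echo "same i-node ($ino_hello), link count $links"
fi

rm -r "$dir"
```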

 

The ln command above created what is known today as a "hard" link.
This term came into vogue to distinguish it from a "symbolic" link,
which is a newer feature of the ln command. Until symbolic links were
invented by BSD (I think), hard links weren't called "hard" links,
they were just plain links. So, nowadays, we have to distinguish
between the two ideas, so the old-fashioned traditional idea of a link
is now popularly known as a hard link. But beware, most Unix
documentation and writings won't necessarily mention that; they'll
just say link.

 

Ok, a hard link is very interesting. Note what happens if I append
more text to the end of (either) file,

$ echo more more more >> hello.txt
$ ls -il hello.txt goodbye.txt
50175 -rw-rw-r-- 2 davidt eng 27 Sep 17 23:29 goodbye.txt
50175 -rw-rw-r-- 2 davidt eng 27 Sep 17 23:29 hello.txt

 

See how both files have the same number of 27 bytes, and even the same
timestamp? That's because they're the same i-node, so the 'ls' command
is being forced to read the identical information from the i-node
table for both files. Even if you change the permissions on one of the
files,

 

$ chmod 777 goodbye.txt
$ ls -il hello.txt goodbye.txt
50175 -rwxrwxrwx 2 davidt eng 27 Sep 17 23:29 goodbye.txt
50175 -rwxrwxrwx 2 davidt eng 27 Sep 17 23:29 hello.txt

 

the permissions for both files are changed. Can you see why? It's

because the permissions are stored in the i-node itself. So, if you

change permissions of one, you change the other. Say this statement

until it sinks in: "Files don't have permissions, only i-nodes

have permissions". This is a very (very) strictly technical

statement. But it makes sense. The 'ls' command has to read

the information from the i-node record, and inside this i-node

record is the permissions. When you type 'ls xyz.c', what happens

under the hood is,

 

1. The ls command finds filename xyz.c in the current directory,

2. The i-node for filename xyz.c is found,

3. The i-node is used to lookup the information in the i-node table.

4. The ls command formats this i-node information and prints it.
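
The point that permissions live in the i-node, and not in the name, can be checked with a short experiment. This is just a sketch: 'mktemp -d' is assumed to be available, and mode 600 is an arbitrary choice for the demonstration.

```shell
# Scratch directory (mktemp -d is an assumption about your system).
dir=$(mktemp -d)
echo hello david > "$dir/hello.txt"
ln "$dir/hello.txt" "$dir/goodbye.txt"

# Change the mode through ONE of the two names...
chmod 600 "$dir/goodbye.txt"

# ...then read the mode string (first field of "ls -l") through BOTH names.
mode_hello=$(ls -l "$dir/hello.txt" | awk '{print $1}')
mode_goodbye=$(ls -l "$dir/goodbye.txt" | awk '{print $1}')
echo "hello.txt: $mode_hello  goodbye.txt: $mode_goodbye"

rm -r "$dir"
```

Both names report the same mode because 'ls' reads it from the one i-node they share.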

 

Another interesting feature of hard links is this: when you remove one
of the links, the i-node is not freed until all filenames referencing
that i-node are removed. In fact, this helps to explain the historical
roots of the unlink() system call. The Unix rm command is a program
that calls the unlink() system call, which removes the link association
in the directory and, if that was the last link, frees the i-node and
disk space.

 

However, whereas the link association between an i-node and filename

is always removed from the directory, Unix won't free the i-node in

the i-node table (or free up the disk space) until no other directory

entry references that i-node. This is a very important realization,

and is a useful trick to the clever Unix (C or shell) programmer.
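
A small experiment along these lines shows the data outliving one of its names. It is only a sketch, assuming 'mktemp -d' is available; the variable names are invented for the example.

```shell
# Scratch directory (mktemp -d is an assumption about your system).
dir=$(mktemp -d)
echo hello david > "$dir/hello.txt"
ln "$dir/hello.txt" "$dir/goodbye.txt"

# Remove one name. This only unlinks hello.txt from the directory;
# the i-node still has another link, so the data stays on disk.
rm "$dir/hello.txt"

survivor=$(cat "$dir/goodbye.txt")
links_after=$(ls -l "$dir/goodbye.txt" | awk '{print $2}')
echo "goodbye.txt still says: $survivor (link count $links_after)"

# Removing the last name is what finally lets Unix free the i-node.
rm -r "$dir"
```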

 

Andreas takes advantage of this trick. The neat thing about the
two system calls link() and unlink() is that they are atomic. That
is, they are guaranteed to update the directory structure and the
i-node table without interference from other processes. The ln
command calls the Unix link() system call, and rm calls the
unlink() system call.

 

Andreas knows that whenever you use sed to edit a file, and you want

to save the original file as a backup, you should do it this way,

 

sed 'your-commands' filename > filename.tmp
ln filename filename.bak
mv filename.tmp filename

 

because if you press CTRL-C anywhere in the middle of any of these

commands, you won't lose your original data. Use of the ln command

to create a hard link takes advantage of the atomic nature of the

Unix link() system call.

 

Let's study what the 3 commands above must be doing to the directory.

If you focus on the i-node numbers, it should help you understand the

ln command a lot better,

 

1. sed 'your-commands' filename > filename.tmp

 

 i-node         filename
+--------+--------------------+
| 63527  | filename           |  original file
+--------+--------------------+
| 73626  | filename.tmp       |  newer file, with edits
+--------+--------------------+

 

2. ln filename filename.bak

 

 i-node         filename
+--------+--------------------+
| 63527  | filename           |  original file
+--------+--------------------+
| 73626  | filename.tmp       |  newer file, with edits
+--------+--------------------+
| 63527  | filename.bak       |  original file
+--------+--------------------+

 

3. mv filename.tmp filename

 

 i-node         filename
+--------+--------------------+
| 73626  | filename           |  newer file, with edits
+--------+--------------------+
| 63527  | filename.bak       |  original file
+--------+--------------------+

 

See how the original filename (with same i-node number) physically

became the backup file? The ln command provides a safer way to

achieve this.
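
The three steps could be wrapped up roughly as below. Treat it as a sketch, not a finished tool: edit_with_backup is a made-up name, the .tmp and .bak suffixes simply follow the commands above, and 'mktemp -d' is assumed to exist for the demonstration.

```shell
# edit_with_backup SED_SCRIPT FILE   (hypothetical helper, not a standard command)
edit_with_backup() {
    ewb_script=$1
    ewb_file=$2
    # Step 1: write the edited text to a new file (a new i-node).
    sed "$ewb_script" "$ewb_file" > "$ewb_file.tmp" || { rm -f "$ewb_file.tmp"; return 1; }
    # Step 2: give the ORIGINAL i-node a second name; no data is copied.
    # Plain ln refuses to overwrite an existing backup, so we stop in that case.
    ln "$ewb_file" "$ewb_file.bak" || { rm -f "$ewb_file.tmp"; return 1; }
    # Step 3: point the original name at the new i-node.
    mv "$ewb_file.tmp" "$ewb_file"
}

# Demonstration in a scratch directory.
dir=$(mktemp -d)
echo "hello world" > "$dir/note.txt"
orig_inode=$(ls -i "$dir/note.txt" | awk '{print $1}')

edit_with_backup 's/hello/goodbye/' "$dir/note.txt"

bak_inode=$(ls -i "$dir/note.txt.bak" | awk '{print $1}')
new_text=$(cat "$dir/note.txt")
bak_text=$(cat "$dir/note.txt.bak")
echo "note.txt: $new_text / note.txt.bak: $bak_text"
rm -r "$dir"
```

Note that the edited file is a brand new i-node, so it gets its own owner and permissions instead of inheriting those of the original.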

 

I enjoyed typing this. Hopefully, it is worthy.

 

--

David Thompson

 

--------------------------------------------------

[end of quote]

 

I really hope that you enjoyed this post as much as I did and, more importantly, that it has contributed to a better understanding of how Unixes work.

 

arusabal

 

 

rolf

Moderator

Joined: 16 Apr 2002

Posts: 968

Location: Oakland, CA USA

Posted: Fri Sep 20, 2002 7:16 pm Post subject:

_________________________________________________________________

 

 

Thanks, arusabal. That made inodes (plus files, permissions, links) a lot more understandable.

 

 

UnTamed

Frequent user

Joined: 01 May 2002

Posts: 126

Location: gmt-5:00

Posted: Sat Sep 21, 2002 4:12 pm Post subject:

_________________________________________________________________

 

 

_Very_ interesting read.

 

Thanks!

 

 

theYinYeti

Senior user

Joined: 13 May 2002

Posts: 452

Location: Cannes (France)

Posted: Mon Sep 23, 2002 10:25 am Post subject:

_________________________________________________________________

 

 

Very good article.

For all of you wanting to experiment with this wonderful tool (hard links), keep this in mind:

Notice this phrase: "an i-node is an integer number, unique for each file system".

This means that you cannot hard-link a file into another filesystem. For example, if you have:

/dev/hda1 -> /

/dev/hda2 -> /home

then you cannot do this:

 

Code:

ln /home/me/importantfile.ps /tmp/importantfile-save.ps
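
If you want a backup that works either way, one possible approach is to try the hard link first and fall back to a real copy. This is only a sketch: backup_link_or_copy is a made-up name, and the demonstration runs inside a single scratch directory (assuming 'mktemp -d' exists), so there it will always take the 'ln' branch.

```shell
# backup_link_or_copy SRC DST   (hypothetical helper, not a standard command)
backup_link_or_copy() {
    blc_src=$1
    blc_dst=$2
    # Refuse to clobber an existing backup; plain ln would refuse too.
    if [ -e "$blc_dst" ]; then
        return 1
    fi
    # ln fails when SRC and DST are on different file systems,
    # so fall back to a real copy (-p keeps mode and timestamps).
    ln "$blc_src" "$blc_dst" 2>/dev/null || cp -p "$blc_src" "$blc_dst"
}

# Demonstration in a scratch directory (same file system, so ln succeeds).
dir=$(mktemp -d)
echo "important" > "$dir/importantfile.ps"
backup_link_or_copy "$dir/importantfile.ps" "$dir/importantfile-save.ps"
saved=$(cat "$dir/importantfile-save.ps")
echo "backup contains: $saved"
rm -r "$dir"
```

When the fallback is used, the backup is a separate i-node with its own data, so it no longer behaves like the hard link described in the article.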



 

 

Yves.

 

 

 

Editor's note: This thread was originally posted at the old MUB (Mandrake User Board at club-nihil). This post is the result of a 99% automatic backup, so due to its nature some text may be lost (improbable but possible).
