aru
Posted January 13, 2003

arusabal, Moderator. Joined: 17 Apr 2002. Posts: 836. Location: Spain
Posted: Fri Sep 20, 2002 2:59 pm
Post subject: how 'ln' can make a better understanding of Unix (Linux)
_________________________________________________________________

This morning I was reading a thread in comp.unix.shell (its subject line is in the quoted header below). I started reading it because such an apparently easy question had drawn a bunch of answers (it seemed to me a simple case of combining 'find' and 'sed'). That made me curious.

Within that thread there was a discussion (among others) about using the 'ln' command instead of 'cp' to back up the files modified by 'sed'. The backup went something like this:

    sed '<command>' file > file.tmp
    cp file file.bak
    mv file.tmp file

That's the way I've always done it! But another guy showed this way:

    sed '<command>' file > file.tmp
    ln file file.bak
    mv file.tmp file

So what? Why use 'ln' there? I had always thought that 'ln' just linked one file to another, making the two identical, so that whatever happens to one happens to the other (I understand soft links perfectly well, which are just pointers to a 'real file', and never cared about hard ones). So where the hell is the backup?

Evidently there are many things about UNIX I don't understand, and hard links are one of them. That was my first thought. My second thought was the one many of you are having right now, *if you are still reading this post*: what a stupid discussion!

Despite those thoughts I kept reading the thread, and to my surprise I found this brilliant explanation of the 'ln' command, i-nodes, the way UNIX filesystems are designed, some historical hints and, of course, why 'ln' was the better choice than 'cp' here.

Here is the post, from David Thompson (dat1965@yahoo.com) at comp.unix.shell (pasted here without permission):

-----------------------------------------------------------
[begin of quote]

From: David Thompson (dat1965@yahoo.com)
Subject: Re: replacing a string in all the files under a directory (and subdirectories)
Newsgroups: comp.unix.shell
Date: 2002-09-18 01:42:39 PST

"Bruce Burhans" <bburhan1@earthlink.net> wrote
> Would you mind explaining that use of ln? Never
> seen the like...

Hi Bruce,

[WARNING: over zealous typist with too much free time.]

The concept of a "link" is somewhat hard to fathom, esp since many people are introduced to the similar concept of a "file", and never quite learn to separate the two very well. Hopefully, what I've written here will help to clarify the distinction, which in turn will be useful in explaining the ln command. I know this is greatly simplified, but hopefully the "big picture" is evident.

Sooner or later in the world of Unix, you come across the term "i-node". What is an i-node? Well, think of an i-node as the index into an array that specifies everything about a file EXCEPT the file's name. This array is stored on disk, and managing this disk-based array is one of the primary responsibilities of Unix.

Ok, so imagine an array of 1000 i-nodes,

     index    information
    +--------+-----------------------------+
    | 0001   | owner,group,perms,dates,etc |
    +--------+-----------------------------+
    | 0002   | owner,group,perms,dates,etc |
    +--------+-----------------------------+
    | 0003   | owner,group,perms,dates,etc |
    +--------+-----------------------------+
      ...
    +--------+-----------------------------+
    | 0999   | owner,group,perms,dates,etc |
    +--------+-----------------------------+
    | 1000   | owner,group,perms,dates,etc |
    +--------+-----------------------------+

Most people understand the i-node as a record stored on disk, but the key to that AH-HAH feeling is to focus on the index column above. Think of an i-node as this index. Therefore, an i-node is an integer number, unique for each file system. I'll use the term "i-node" to represent the index into the above array.

Why is an i-node important? Because Unix stores directory information in a way that associates the filename with an i-node. That is, a directory is simply an association between the index in the i-node table and the string that represents the filename. This association is called a link.

Actually, then, a directory is a list of these associations. Each entry in the directory is one association between an i-node number and a string filename.

Ok, so imagine the directory itself stored on disk, something like an array, where each directory entry contains two pieces of information: the i-node and the filename.

     i-node   filename
    +--------+--------------------+
    | 0021   | file1.c            |
    +--------+--------------------+
    | 0257   | file2.c            |
    +--------+--------------------+
    | 0008   | Makefile           |
    +--------+--------------------+
      ...
    +--------+--------------------+
    | 0834   | XYZ.c              |
    +--------+--------------------+
    | 0172   | cmd.sh             |
    +--------+--------------------+

Notice how the directory seemingly contains i-nodes in no particular order. This is easily understood because as files are created on disk, the next available i-node number is used in the i-node table, and then the association between a filename and the i-node number is stored in the directory. On a very active disk, files may be created and deleted often, so i-nodes are freed and re-used often as well.

[Also, note that the i-node table is not concerned with the tree-like directory structure; this organization is conceptually a level above the i-node table. That is, the two entries . and .. in each directory are special files that help to maintain that hierarchy. Think of i-nodes as a simple array of records stored on disk, where each record tracks only the information (most of it, anyways) you see from the 'ls -l' command.]

AH-HAH! The directory is storing links!

Think of it like this: "i-node 834 is associated with filename XYZ.c", which is what you mean when you say "filename XYZ.c is linked to i-node 834". Same difference. This is the origin of the term "link" in Unix.

Try typing 'ls -il' in your home directory. See that first column showing i-node numbers? The 'ls' command doesn't normally show you the i-node number because it's not terribly useful in everyday life.

Let's create a simple file,

    $ echo hello david > hello.txt
    $ ls -il
    50175 -rw-rw-r-- 1 davidt eng 12 Sep 17 23:16 hello.txt

So, I just created a file with the name hello.txt, and Unix used i-node 50175 to store the owner, group, permissions, etc. We say that the i-node 50175 has 1 link. In fact, look carefully and you can see the 3rd column, it has a value of 1; which is telling us that the i-node referenced by hello.txt has 1 link, er, 1 filename associated with it.

What if we did this neat-o trick: let's store another entry in the directory for the same i-node 50175 but use a different name. Ie, we want our directory to look something like,

     i-node   filename
    +--------+--------------------+
    | 50175  | hello.txt          |
    +--------+--------------------+
    | 50175  | goodbye.txt        |
    +--------+--------------------+

Because of how the Unix directory was designed over 20 years ago, it's entirely possible for the i-node number to be the same, as long as the filename is different. Pretty cool design.

Ok, so how do you do that? How do you make 2 different filenames point to the same i-node?

By using the ln command, like this,

    $ ln hello.txt goodbye.txt
    $ ls -il hello.txt goodbye.txt
    50175 -rw-rw-r-- 2 davidt eng 12 Sep 17 23:16 goodbye.txt
    50175 -rw-rw-r-- 2 davidt eng 12 Sep 17 23:16 hello.txt

See how both filenames have the same i-node number? We now say that i-node 50175 has 2 links, which are hello.txt and goodbye.txt. Look above at the 3rd column, see how it now says 2?

The ln command above created what is known today as a "hard" link. This term came into vogue to distinguish it from a "symbolic" link, which is a newer feature of the ln command. Until symbolic links were invented by BSD (I think), hard links weren't called "hard" links, they were just plain links. So, nowadays, we have to distinguish between the two ideas, and the old-fashioned traditional idea of a link is now popularly known as a hard link. But beware, most Unix documentation and writings won't necessarily mention that; they'll just say link.

Ok, a hard link is very interesting. Note what happens if I append more text to the end of (either) file,

    $ echo more more more >> hello.txt
    $ ls -il hello.txt goodbye.txt
    50175 -rw-rw-r-- 2 davidt eng 27 Sep 17 23:29 goodbye.txt
    50175 -rw-rw-r-- 2 davidt eng 27 Sep 17 23:29 hello.txt

See how both files have the same size of 27 bytes, and even the same timestamp? That's because they're the same i-node, so the 'ls' command is being forced to read the identical information from the i-node table for both files.

Even if you change the permissions on one of the files,

    $ chmod 777 goodbye.txt
    $ ls -il hello.txt goodbye.txt
    50175 -rwxrwxrwx 2 davidt eng 27 Sep 17 23:29 goodbye.txt
    50175 -rwxrwxrwx 2 davidt eng 27 Sep 17 23:29 hello.txt

the permissions for both files are changed. Can you see why? It's because the permissions are stored in the i-node itself. So, if you change permissions of one, you change the other.

Say this statement until it sinks in: "Files don't have permissions, only i-nodes have permissions".
This is a very (very) strictly technical statement. But it makes sense. The 'ls' command has to read the information from the i-node record, and inside this i-node record are the permissions.

When you type 'ls xyz.c', what happens under the hood is,

1. The ls command finds filename xyz.c in the current directory,
2. The i-node for filename xyz.c is found,
3. The i-node is used to look up the information in the i-node table.
4. The ls command formats this i-node information and prints it.

Another interesting feature of hard links is this: when you remove one of the links, the i-node is not freed until all filenames referencing that i-node are removed.

In fact, this helps to explain the historical roots of the unlink() system call. The Unix rm command is a program that calls the unlink() system call, which removes the link association in the directory, as well as frees the i-node and disk space. However, whereas the link association between an i-node and filename is always removed from the directory, Unix won't free the i-node in the i-node table (or free up the disk space) until no other directory entry references that i-node.

This is a very important realization, and is a useful trick to the clever Unix (C or shell) programmer.

Andreas takes advantage of this trick.

The neat thing about the two system calls link() and unlink() is that they are atomic. That is, they are guaranteed to update the directory structure and the i-node table without interference from other processes. The ln command calls the Unix link() system call, and rm calls the unlink() system call.

Andreas knows that whenever you use sed to edit a file, and you want to save the original file as a backup, you should do it this way,

    sed 'your-commands' filename > filename.tmp
    ln filename filename.bak
    mv filename.tmp filename

because if you press CTRL-C anywhere in the middle of any of these commands, you won't lose your original data.
Use of the ln command to create a hard link takes advantage of the atomic nature of the Unix link() system call.

Let's study what the 3 commands above must be doing to the directory. If you focus on the i-node numbers, it should help you understand the ln command a lot better,

1. sed 'your-commands' filename > filename.tmp

     i-node   filename
    +--------+--------------------+
    | 63527  | filename           |  original file
    +--------+--------------------+
    | 73626  | filename.tmp       |  newer file, with edits
    +--------+--------------------+

2. ln filename filename.bak

     i-node   filename
    +--------+--------------------+
    | 63527  | filename           |  original file
    +--------+--------------------+
    | 73626  | filename.tmp       |  newer file, with edits
    +--------+--------------------+
    | 63527  | filename.bak       |  original file
    +--------+--------------------+

3. mv filename.tmp filename

     i-node   filename
    +--------+--------------------+
    | 73626  | filename           |  newer file, with edits
    +--------+--------------------+
    | 63527  | filename.bak       |  original file
    +--------+--------------------+

See how the original file (with the same i-node number) physically became the backup file? The ln command provides a safer way to achieve this.

I enjoyed typing this. Hopefully, it is worthy.

--
David Thompson

--------------------------------------------------
[end of quote]

I really hope you enjoyed this post as much as I did and, more importantly, that it contributes to a better understanding of how Unixes work.

arusabal


rolf, Moderator. Joined: 16 Apr 2002. Posts: 968. Location: Oakland, CA USA
Posted: Fri Sep 20, 2002 7:16 pm
_________________________________________________________________

Thanks, arusabal. That made inodes (plus files, permissions, links) a lot more understandable.


UnTamed, Frequent user. Joined: 01 May 2002. Posts: 126. Location: gmt-5:00
Posted: Sat Sep 21, 2002 4:12 pm
_________________________________________________________________

_Very_ interesting read. Thanks!


theYinYeti, Senior user. Joined: 13 May 2002. Posts: 452. Location: Cannes (France)
Posted: Mon Sep 23, 2002 10:25 am
_________________________________________________________________

Very good article.

For all of you wanting to experiment with this wonderful tool (hard links), be careful about one thing. Notice this phrase: "an i-node is an integer number, unique for each file system". This means that you cannot hard-link a file into another filesystem. For example, if you have:

    /dev/hda1 -> /
    /dev/hda2 -> /home

you cannot do this:

Code:
    ln /home/me/importantfile.ps /tmp/importantfile-save.ps

Yves.


Editor's note: This thread was originally posted at the old MUB (Mandrake User Board at club-nihil). This post is the result of a 99% automatic backup, so due to its nature some text may be lost (improbable but possible).
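
The three-step backup from the quoted post can be tried safely in a scratch directory. What follows is a minimal sketch, not something taken from the thread: the mktemp scratch directory, the inode_of helper and the sample text are assumptions made for illustration, and falling back to cp is just one possible way to cope with the cross-filesystem case Yves describes.

```shell
#!/bin/sh
# Sketch of the sed + ln backup idiom, run in a throwaway directory.
# Assumptions (not from the thread): mktemp -d is available, the helper
# name inode_of, and the sample file contents.

workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1

# Print the i-node number of a file (first column of 'ls -i').
inode_of() {
    ls -i "$1" | awk '{print $1}'
}

printf 'hello david\n' > filename
orig_inode=$(inode_of filename)

# 1. Write the edited copy under a temporary name; the original is untouched.
sed 's/david/world/' filename > filename.tmp

# 2. Give the original i-node a second name. If the hard link cannot be
#    made (for example, the backup path is on another filesystem),
#    fall back to an ordinary copy.
ln filename filename.bak 2>/dev/null || cp -p filename filename.bak

# 3. Rename the edited copy over the original name.
mv filename.tmp filename

bak_inode=$(inode_of filename.bak)
new_inode=$(inode_of filename)
bak_text=$(cat filename.bak)
new_text=$(cat filename)

echo "original i-node: $orig_inode"
echo "backup i-node:   $bak_inode"
echo "edited i-node:   $new_inode"
```

When everything lives in one directory, the backup reports the same i-node number the original had, and the edited file reports a different one, which matches the three tables in the quoted post. Two caveats: ln also refuses to link if filename.bak already exists, in which case this sketch falls back to cp and overwrites the old backup; and the backup is only safe because mv replaces the name, since anything that writes into the original i-node in place would change filename.bak too.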