How does `pwd` operate?

After re-reading some of the papers from Bell Labs, something clicked in my mind, and I'm hooked. I'm now reading "The UNIX Programming Environment", by Kernighan and Pike. It's got that fun style you're probably familiar with if you've read K&R or the blue book. One of the first exercises in the chapter on file systems:

(harder) How does the pwd command operate?

Seems like a fun one. My first guess is that it uses $PWD from the environment. Let's test that.

~ % PWD=/usr/local pwd
/home/gg

There's a -L flag that seems to use $PWD. Maybe that one would do?

~ % PWD=/usr/local pwd -L
/home/gg

Not that either. Wait a second, what's pwd again?

~ % type pwd
pwd is a shell builtin

Hah, so I was calling the wrong one. I rerun the commands above with /bin/pwd instead, but the results are the same.

My next hypothesis is that it somehow expands . to an absolute path. I'm not aware of a UNIX command that performs such an expansion, so I try man -k with a few keywords. Nothing.

Maybe pwd(1)? It's not terribly descriptive (it's such a simple utility, after all). It doesn't explain the implementation at all, but it points me to getcwd(3). Alright, let's just look at the source.

OpenBSD implementation

int
main(int argc, char *argv[])
{
    int ch, lFlag = 0;
    const char *p;

    /* pledge(), parse flags... */

    if (lFlag)
        p = getcwd_logical();
    else
        p = NULL;
    if (p == NULL)
        p = getcwd(NULL, 0);

    if (p == NULL)
        err(EXIT_FAILURE, NULL);

    puts(p);

    exit(EXIT_SUCCESS);
}

Unless -L is passed, it just calls getcwd. Let's see what that "logical" function does:

static char *
getcwd_logical(void)
{
    char *pwd, *p;
    struct stat s_pwd, s_dot;

    /* Check $PWD -- if it's right, it's fast. */
    pwd = getenv("PWD");
    if (pwd == NULL)
        return NULL;
    if (pwd[0] != '/')
        return NULL;

    /* check for . or .. components, including trailing ones */
    for (p = pwd; *p != '\0'; p++)
        if (p[0] == '/' && p[1] == '.') {
            if (p[2] == '.')
                p++;
            if (p[2] == '\0' || p[2] == '/')
                return NULL;
        }

    if (stat(pwd, &s_pwd) == -1 || stat(".", &s_dot) == -1)
        return NULL;
    if (s_pwd.st_dev != s_dot.st_dev || s_pwd.st_ino != s_dot.st_ino)
        return NULL;
    return pwd;
}

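So -L does consult $PWD, but it only returns it under three conditions: it must be an absolute path, it must contain no . or .. components, and it must refer to the same inode on the same device as the current directory. You can't just override it to be anything you want. If any of those checks fails, it falls back to the libc call, getcwd.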

Makes me wonder what use this -L flag is in the first place. Maybe it has to do with symlinks?

/tmp % mkdir one
/tmp % ln -s one two
/tmp % cd one
/tmp/one % /bin/pwd
/tmp/one
/tmp/one % cd ../two
/tmp/two % /bin/pwd
/tmp/one
/tmp/two % /bin/pwd -L
/tmp/two

Makes sense. Still, "it calls getcwd" is not a very satisfying answer. I doubt the authors' intended answer was "defer to libc".

Plan9

OK, the OpenBSD source didn't help. But Plan9 is Unicibus ipsis Unicior (more Unix than the Unixes themselves), so maybe we can find the answer there. Let's inspect pwd(1):

     DESCRIPTION
          Pwd prints the path name of the working (current) directory.
          Pwd is guaranteed to return the same path that was used to
          enter the directory.  If, however, the name space has
          changed, or directory names have been changed, this path
          name may no longer be valid.  (See fd2path(2) for a
          description of pwd's mechanism.)

Hah, that was helpful! Now, from fd2path(2):

          As an example, getwd(2) is implemented by opening . and
          executing fd2path on the resulting file descriptor.

So my hypothesis above was correct, at least on Plan9: pwd opens . and asks the kernel for its path. Another cool thing about Plan9 is that it lets me read a directory like any other file ("everything is a file", right?):

% cat . > foo
% cat foo

I can then run foo through a hex dumper (xd(1) on Plan9) and see the raw directory entries, which use the encoding described in stat(5).

GNU

Let's see how coreutils implements it... nope. Just nope.

Wrapping up

So that was it, a brief excursion into different implementations of a simple command in UNIX. The difference in complexity is palpable. The Plan9 documentation is fun to read, and so is the code.

More from gclv