Features common to all traditional Bourne shell versions, distinguishing them from Bourne compatible shells
The purpose of this page is not to point out portability problems (that is a side effect) but to identify characteristic features, probably just for fun. Otherwise this page would also have to list what the Bourne shell might share with only one or a few other shells. Instead, if you're interested in portability: Paul Jarc tries to document all the suspicious or nonportable constructs that can impact portable script-writing, and the autoconf documentation contains a chapter about portable shell programming. (The "#! /" issue in earlier releases likely is a myth, though.) Keep in mind that autoconf strictly aims at maximizing portability for install scripts; thus earlier versions even suggested to avoid all unportable extensions after Version 7.
Content:
Common to all versions
Common to all versions since SVR1
Common to all versions since SVR2 - related to functions
Common to all versions since SVR2 - related to the unset built-in
Common to all versions since SVR2 - related to the hash built-in
Common to all versions since SVR2 - other issues
Not exclusive to traditional Bourne shells (might wrongly be considered characteristic)
"export VAR=value" isn't accepted (but "VAR=value; export VAR" is accepted and portable).
The circumflex "^" is accepted as an alternative pipe symbol, equivalent to "|".
However, this feature is not documented in all manuals
(e.g., not for Version 7, the BSDs, later Ultrix sh, System III,
AIX 3 ff., IRIX 3 ff., Reliant and Solaris).
Some manuals which don't document this alternative also wrongly fail to list it
as having "a special meaning to the shell" ("Quoting"),
e.g., Version 7, the BSDs, System III, AIX 3 ff. and Reliant --
although '^' is actually useful in practice for, e.g., "stty(1)".
On System III, /etc/wall
is a script and it contains a line like the following,
interestingly even with both ^
and |
:
who^sed -e '[..build a commandline..]' | sh
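As a minimal illustration of the alternative pipe symbol (a sketch, not taken from any manual), the following two command lines are equivalent in such a shell:

who | wc -l
who ^ wc -l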
Redirecting control structures like "while", "for" and "if",
but also "{ list;}",
causes a sub-shell to be created. The main impacts are
the isolated environment (i.e., variable assignments), and
that "exit" does not leave the script but only the
redirected structure.
Workaround for variable assignments (example taken from the Heirloom shell package):
exec 5<&0 <input
while read line; do
  ...
  variable=value
  ...
done
exec <&5 5<&-
(The variable
in a redirected
"while read variable
"
construct gets assigned the empty value when reading EOF,
i.e., if you don't break
out before that.
This may appear as if the variable was localized even in
modern shells.)
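A minimal sketch of the effect the workaround addresses; the file name "input" and the variable "found" are only placeholders:

found=no
while read line; do
  found=yes
done < input
echo "found: $found"
# because the redirected loop ran in a sub-shell, a traditional
# Bourne shell still prints "found: no" here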
A variable imported from the environment and then modified is not automatically re-exported:
the built-in set will see the local value,
whereas the command env will see the exported value.
outer-shell$ VAR=outer sh
$ echo VAR:$VAR; set|grep VAR; env|grep VAR
VAR:outer
VAR=outer
VAR=outer
$ VAR=inner
$ echo VAR:$VAR; set|grep VAR; env|grep VAR
VAR:inner       # echo and set are built-ins and see the local copy of VAR
VAR=inner
VAR=outer       # env is a command and gets the unmodified VAR passed on
$ export VAR
$ echo VAR:$VAR; set|grep VAR; env|grep VAR
VAR:inner       # "export" marked the local copy to be passed on from now
VAR=inner
VAR=inner
$ VAR=other
$ echo VAR:$VAR; set|grep VAR; env|grep VAR
VAR:other       # You do not have to re-"export" further changes
VAR=other
VAR=other
This can even affect scripts being called:
if you call a script directly from a Bourne shell
("./script" without shebang),
then the shell only forks off a subshell and reads in the script.
The split between original and local copy of the variable is still present in the subshell.
But if the script is a real executable with #! magic, or if another sh is called,
then fork and exec are used and only the original, unmodified variable will be visible.
$ cat > plain-script
VAR=inner
env|grep VAR
<ctrl-D>
$ cat > shebang-script
#!/bin/sh
VAR=inner
env|grep VAR
<ctrl-D>
$ chmod +x plain-script shebang-script

outer-shell$ VAR=outer sh
$ ./plain-script        # the executable bit has to be set
VAR=inner
$ ./shebang-script
VAR=outer
$ sh plain-script
VAR=global
If the shell did an exec()
to sh with the script as argument after the fork()
anyway, this side effect wouldn't occur.
This whole item has been corrected and improved a lot, thanks to suggestions from Bela Lubkin.
IFS is even used to split (unquoted) literal words, thus
"IFS=X; echoXfoo" calls echo with
the argument foo.
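A short transcript sketch of the same effect:

$ IFS=X
$ echoXfoo
foo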
Positional parameters beyond $9 are not
directly accessible, but only via "shift".
With ">>" redirection, open(2)
is not called with O_APPEND; instead the shell jumps
to the end with lseek(2).
See service.c:initio() and search for IOAPP.
In fact, a few other early shells, like csh and ksh86 (see io.c:fdopen()),
behave similarly. But none of the later Bourne compatible shells do so.
The reason for
all this (pointed out in <3DAF47B9.PR1RJ8L6@bigfoot.de>) is
that O_APPEND
had not been introduced until
System III and 4.2BSD ("4.1c.2").
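A sketch of a practical consequence (the file name shared.log is only a placeholder): because such a shell seeks to the end of the file only once, when the redirection is opened, two concurrent writers can overwrite each other's data, which O_APPEND would prevent:

: > shared.log
( exec >> shared.log; sleep 2; echo "first writer" ) &
( exec >> shared.log; echo "second writer" ) &
wait
cat shared.log
# with the lseek(2)-style ">>" the first writer overwrites the
# second writer's line; with O_APPEND both lines would survive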
Only the first argument is scanned for options, i.e., you have to use
"sh -vx" instead of "sh -v -x".
Errors in certain commands are fatal, and a non-interactive shell exits at once,
even where the failure seems to be handled. Bourne shell specific examples are:
cd nonexistent || echo error
test x -unknownoperator y
read (without any arguments)
But these are not fatal:
exec 0<&-; read var (reading with a closed STDIN)
kill (without any arguments)
Fatal examples which are not specific to Bourne shells:
readonly var; unset var (pdksh and mksh abort)
. non-accessible-file (most shells and POSIX, except bash)
An appropriate workaround for the cd example is:
(cd nonexistent) && cd nonexistent || echo error
You can also see this in an interactive shell: if you type several commands separated by semicolon,
the execution of the command line is aborted immediately.
(Thanks to Jilles Tjoelker for pointing out special builtins and the last two exceptions.)
Command substitution inside a here-document works with an external
command like cat, but it fails with built-ins
like "read a" or ":". You
get "/tmp/sh12345: cannot open", except on OpenServer5.x
where this has been fixed:

command<<EOF
`pwd`
EOF

command<<EOF &
`pwd`
EOF
A variable assignment preceding a built-in command remains in effect after the command,
e.g., "IFS= read variable".
POSIX later adopted this behaviour, but only for special built-ins.
D. Korn gave an explanation for that on the austin group mailing list.
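A small sketch of the effect:

IFS= read variable </etc/passwd
echo "IFS afterwards: [$IFS]"
# a traditional Bourne shell keeps IFS empty after the "read";
# most modern shells restore it, and POSIX requires the assignment
# to persist only for special built-ins (read is not one)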
"for i do" (omitting the optional
"in word") is a portable way to iterate over all
arguments. "for i; do" is not accepted in traditional
Bourne shells, although, interestingly, you may insert a newline instead
of the semicolon.
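For illustration, the portable form written out in full; it simply echoes every argument:

for i do
  echo "$i"
done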
for i in var=value x; do echo "$i"; done
only prints x, but the following does not (unless the flag -k is set):
for i in x var=value; do echo "$i"; done
(Pointed out by Stefano Lattarini on the autoconf-patches mailing list.)
The shell does not wait for all elements of a pipeline, e.g.:
true|sleep 5|true
Be aware that the early 7th edition and System III variants certainly do wait for a process if they are interested in its exit status.
There are a few other shells which behaved similarly, bash-1.05 and ksh86, but these were changed soon.
echo x 1>&3
).
[spotted in the autoconf documentation]
A default value containing blanks does not work unquoted, as in "${var=literal value}",
i.e., a literal value with blanks but not surrounded by quotes.
Workaround: use a variable instead:
tmp="literal value"; echo ${var=$tmp},
or quote the value like
${var="literal value"}
or ${var=literal\ value}.
Be careful with the \ method: quoting the whole expression, as in
var=set; echo "${var+literal\ value}",
may or may not include the backslash in the output,
depending on the shell (thanks to Martin Väth for pointing this out).
The positional parameters cannot be emptied with "set --",
but only by shifting them away
(or shorter: set dummy; shift).
If the same variable is assigned more than once in front of a command,
the first assignment wins (echo `var=old var=new env` prints var=old).
You cannot trap SIGSEGV,
because this signal is usually used for the internal memory management.
However, as it doesn't work this way on the 68K architecture, memory management is implemented differently on SunOS 3.1 (and ff.), for example. (And trapping SIGSEGV was allowed then on SunOS 3 and 4.)
See more about this memory management at the bottom of the main page.
"echo a > file1 b > file2" isn't evaluated
correctly: "a b" will end up in file1, although it should be
in file2. It works correctly if nothing is inserted
between the redirections: "echo a b > file1 > file2".
"2>&1 program >/dev/null" isn't evaluated
correctly: the order is reversed and thus stderr is also redirected.
"case x in in)" fails with
"syntax error at line x: `in' unexpected"
(exception: fixed on SunOS 5.x).
The "<>" (read-write) redirection is recognized by the parser but not actually implemented:
compare cmd.c, inout(), keyword IORDW,
with service.c, initio():
here, IORDW (and thus O_RDWR) is missing.
(But note that some other, special combinations are even accepted
by ksh88 and ksh93, too:
echo "`pwd"
echo `'pwd`
echo `"pwd`
)
set -e
changed)
As exception, this is fixed on SunOS since 5.5.
Variable assignments preceding a function call
("var=1 function") are without effect,
that is, the variable remains unchanged both in the function
and in the current environment.
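A small sketch of this behaviour (function and variable names are only placeholders):

f() { echo "in function: $var"; }
var=old
var=new f
echo "afterwards: $var"
# per the behaviour described above, a traditional Bourne shell
# prints "old" both times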
A function with a here-document in its body, like
my_function() {
cat<<EOF
$@
EOF
}
will already create a temporary file at the time of the definition.
As illustration: "type my_function
" then will result in something like
my_function is a function my_function(){ cat 0<</tmp/sh123450 }with /tmp/sh123450 containing the body of the here-document,
$@
.
Two notes in advance:
The problem, however: the function is visible in called scripts if they are created as a subshell.
This happens for example if you call an executable shell script (without the #! mechanism).
The called script then fails with:
script: /tmp/sh123456: cannot open
(A workaround for all this is to put the function body into "eval '...'".)
The following variables cannot be removed with the "unset" built-in:
IFS PATH PS1 PS2 MAILCHECK.
As an exception, bash-1.x also doesn't allow this, for all of them but MAILCHECK.
"hash" built-in: For the following code there's a race condition:
the right command of the pipeline (grep) might start before
the temporary file containing the here-document has been created.
The shell then errors out with "/tmp/sh12345: cannot open":
echo `cat <<EOF|grep x
x
EOF
`
If the here-document is also the right part of a pipeline at the same time,
the probability of triggering the race was much higher for me:
echo `:|cat <<EOF|grep x
x
EOF
`
As exception, this was fixed on SunOS 5.5.
Using here-documents in functions will create a temporary file at the time of definition already (see above).
(A workaround is to put the function body into "eval '...'
")
If such a function from above is executed in the background and the script itself then exits immediately, an error is displayed instead:
$ cat f
my_function() {
cat<<EOF
test
EOF
}
(sleep 1; my_function) &
$ sh f
f: /tmp/sh123456: cannot open
The function definition (containing the temp file) is exported to a subshell.
However the parent (exiting before the subshell)
wrongly cleans up the temp file.
(Workaround again: put the function body into "eval '...'
")
Redirections to an empty file name are ignored:
echo x >''
echo x > $unset_or_empty_variable
('ksh93-l' just ignore these redirections, too.)
echo "`pwd"
IFS=''; echo "$*"
Arguments given to the dot command, ". script <arguments>", are discarded.
Many ash variants don't use the arguments either.
A workaround exploits the fact that the positional parameters are delivered instead:
my_function() {
. script
}
my_function <arguments>
Globbing in the target of a redirection, as in "echo x > a*":
Jilles Tjoelker pointed out to me that POSIX allows but does not require expanding this in interactive mode, and forbids expanding this in non-interactive mode.
"$@"
in connection with the flag u, without any arguments,
errors out with "@: parameter not set"
set x; shift; set -u; echo "$@"
$@
and $*
)
ksh88, ksh93 until -t, pdksh also error here.
Workaround: use ${1+"$@"}
"test -n $variable" errors out if variable is set to
"=" (also keep in mind the values "-" and "!").
The exit status of an erroring "test" is 1 ("false") instead of 2 ("error").
"nonex > file" doesn't truncate the file,
but "/nonex > file" does so.
"var=pre; var=post echo $var > $var" creates a file named post
(containing pre).
The "test" built-in operators -a and -o check both conditions,
even if stopping after the first would be sufficient.
"case x in (x)" is not accepted (i.e., the optional opening parenthesis before a case pattern).
The "$(...)" form of command substitution is not known.
cat<<EOF
echo ${var="quotes for blanks"}
EOF
The quotes are not only printed in later Bourne shells, but also in ksh88.
"--" indicating the end of options for built-ins is not known;
ksh86 and bash-1.x also don't know this.
"{ ...; }" is also accepted as the body of a "for" loop.
This is not documented.
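For illustration, a sketch of such a loop:

for i in a b c; { echo "$i"; }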
2014-10-11, <http://www.in-ulm.de/~mascheck/bourne/common.html>