We take great care over the interface and operation of the GNU coreutils, but unfortunately, for backwards compatibility reasons, some behaviours or defaults of these utilities can be confusing.

This information will continue to be updated and overlaps somewhat with the coreutils FAQ, with this list focusing on less frequent potential issues.

chmod

chmod -R 644 is redundant and tricky. If for example you copy a directory from VFAT and want to turn off the executable bits on the files using chmod -R 644, that will fail to recurse, as it also removes the executable (search) bits from the directories. This is achievable in various ways:

find -type f -print0 | xargs -r0 chmod 644 # Traditional method
find -type f -exec chmod 644 {} +          # Newer POSIX equivalent
chmod -R a-x,a+X                           # Using GNU extensions to in-tool recursion

cut

cut doesn't work with fields separated by arbitrary whitespace. It's often better to use awk, or even:

join -a1 -o1.$field $file /dev/null
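A quick illustration of why awk is often preferable here (the sample strings are arbitrary):

```shell
# cut treats each single space as a delimiter, so runs of spaces
# produce empty fields; awk splits on any run of whitespace
printf 'a   b\n' | cut -d' ' -f2     # empty field
printf 'a   b\n' | awk '{print $2}'  # prints "b"
```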

cut -s only suppresses lines without delimiters. Therefore if you have a line with a missing field, but which does contain some delimiters, a blank line is output.
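For example (the delimiter and field numbers are arbitrary):

```shell
# "a:b" contains a delimiter but has no third field, so a blank
# line is output; "c" has no delimiter at all, so -s drops it
printf '%s\n' a:b c | cut -s -d: -f3
```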

Similarly, if you want a blank line to be output even when there are no delimiters, one needs to append a delimiter first, like:

printf '%s\n' a:b c d:e | sed '/:/!s/$/:/' | cut -d: -f2-

dd

dd iflag=fullblock is usually what you want, because when reading from a fifo/pipe you often get a short read, which means you get too little data if you specify "count", or too much data if you specify "skip". For example:

$ dd status=none count=1 if=/dev/random bs=512 | wc -c
78

Note we do warn in certain cases since version 8.10, but not with count=1 as above, since a short read with count=1 is often used as an idiom to "consume available data", though perhaps dd iflag=nonblock would be a more direct and general way to do that?
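A sketch of the difference iflag=fullblock makes; a pipe from seq is used here rather than /dev/random so the byte count is deterministic:

```shell
# Without fullblock a short pipe read can end count=1 early;
# with iflag=fullblock dd accumulates reads until bs bytes arrive
seq 100000 | dd status=none iflag=fullblock count=1 bs=512 | wc -c  # 512
```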

dd conv=noerror really also needs conv=sync, so that when reading from a failing disk one gets correctly aligned data, with the unreadable parts replaced with NULs. Note if there is a read error anywhere in a block, the whole block is discarded, so one needs to balance between speed (bigger blocks) and minimised data loss (smaller blocks). This is simpler and more dynamic in a more dedicated tool like ddrescue.
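The alignment behaviour of conv=sync can be seen even without a failing disk; every (possibly short) input block is padded with NULs to a full bs. The input string here is arbitrary:

```shell
# "short" is only 5 bytes, but conv=sync pads the block to bs
# bytes with NULs, so output offsets stay aligned with input
printf '%s' short | dd status=none bs=512 count=1 conv=sync | wc -c  # 512
```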

dd skip=0x100 doesn't skip anything, as the "0x" prefix is treated as a zero multiplier. coreutils >= 8.26 will at least warn about this, suggesting to use "00x" if a zero multiplier really was the intention.

df

For full portability the -P option is needed when parsing the output from df, so that line wrapping is avoided, though df will no longer wrap lines since version 8.11 (Apr 2011) to help avoid this gotcha. Also if one needs to parse the header, the -P option will use more standardised (but ambiguous) wording. See also the Block size issue.

du

If two or more hard links point to the same file, only one of the hard links is counted. The FILE argument order affects which links are counted, and changing the argument order may change the numbers that du outputs. Note this also impacts specified directories, which is confusing:

$ cd git/coreutils
$ du -s ./ ./tests
593120  ./
$ du -s ./tests ./
10036   ./tests
583084  ./
$ du -s --separate-dirs ./tests ./
128     ./tests
16268   ./
$ du -s --separate-dirs ./ ./tests
16268   ./

Note du doesn't handle reflinked files specially, and thus will count all instances of a reflinked file.

echo

echo is non portable and its behaviour diverges between systems, shell builtins, etc. One should really consider using printf instead. This shell session illustrates some inconsistencies. Where you see env being used, that is selecting the coreutils standalone version:

$ echo -e -n
$ echo -n -e
$ echo -- -n
-- -n
$ POSIXLY_CORRECT=1 env echo -e -n
-e -n
$ POSIXLY_CORRECT=1 env echo -n -e

expr

The exit status of expr is a confusing gotcha. POSIX states that an exit status of 1 is used if "the expression evaluates to null or zero", which you can see in these examples:

$ expr 2 - 1; echo $?
1
0
$ expr 2 - 2; echo $?
0
1
$ expr substr 01 1 1; echo $?
0
1
$ expr ' ' : '^ *$'; echo $?
1
0
$ expr '' : '^ *$'; echo $?
0
1
$ expr 0 : '[0-9]$'; echo $?
1
0
$ expr 0 : '\([0-9]\)$'; echo $?
0
1

The string matching above is especially confusing, though it does conform to POSIX, and is consistent across Solaris, FreeBSD and the GNU utils.
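A practical consequence of these exit statuses, as a sketch: a numerically valid result of 0 looks like failure to the shell, which can abort scripts running under set -e:

```shell
# expr prints 0 but exits with status 1 (result is zero),
# so under `set -e` the script stops before the echo
sh -c 'set -e; expr 2 - 2; echo reached'
```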

Note, using a leading ^ in the expression is redundant and non portable.

As for changing the behaviour, it's probably not possible due to backwards compatibility issues. For example the '^..*$' case would need to change the handling of the '*' in the expression, which would break a script like:

printf '%s\n' 1 2 '' 3 |
while read line; do
  expr "$line" : '^[0-9]*$' >/dev/null ||
    break # at first blank line
  echo process "$line"
done

ls

ls -lrt will also reverse sort the names for files with matching timestamps (common in /dev/ and /proc/ etc.) This is as per POSIX but probably not what the user wanted. There is no way to reverse by time while keeping non reversed name sorting.

ln

ln -nsf is needed to update symlinks, though note that this will overwrite existing files, and cause links to be created within existing directories.

*sum

The checksum utilities like md5sum, sha1sum, etc. add backslashes to the output names if those names contain '\n' or '\' characters. Also '*' is added to the output where O_BINARY is significant (CYGWIN). Therefore automatic processing of the output of these utilities requires one to unescape it first.

rm

rm -rf does not mean "delete as much as possible". It only avoids prompts. For example, with a non writeable directory you will not be able to remove any of its contents. Therefore something like this is sometimes necessary:

find "$dir" -depth -type d -exec chmod +wx {} + && rm -rf "$dir"

sort

A very common issue encountered is with the default ordering of the sort utility. Usually what is required is a simple byte comparison, though by default the collation order of the current locale is used. To get the simple comparison logic you can use:

LC_ALL=C sort ...

as detailed in the FAQ. As well as being slower, the locale based ordering can often be surprising. For example some character representations, like the full width forms of latin numbers, compare equal to each other.

equal comparisons

$ printf '%s\n' ２ １ | ltrace -e strcoll sort
sort->strcoll("\357\274\222", "\357\274\221") = 0
２
１
$ printf '%s\n' ２ １ | sort -u
２

The equal comparison issue with --unique can even have an impact in the "C" locale, for example with the following dropping items unexpectedly:

$ printf '%s\n' 1 zero 0 .0 | sort -nu
zero
1

Note this example also demonstrates that --unique implies --stable, selecting the first encountered item in the matching set.

i18n patch issues

Related to locale ordering, there is the i18n patch on Fedora/RHEL/SUSE which has its own issues. Note disabling the locale specific handling as described above effectively avoids these issues.

Example 1: leading spaces are mishandled with --human-numeric-sort:

$ printf ' %s\n' 4.0K 1.7K | sort -s -h
 4.0K
 1.7K

Example 2: case folding results in incorrect ordering:

$ printf '%s\n' Dániel Dylan | sort
Dániel
Dylan
$ printf '%s\n' Dániel Dylan | sort -f
Dylan
Dániel

field handling

Fields specified with -k are separated by default by runs of blank characters (space and tab), and by default the blank characters preceding a field are included in the comparison, which depending on your locale could be significant to the sorting order. This is confusing enough on its own, but is compounded with the -t and -b options. Ignoring leading blanks (-b) is particularly confusing, because:

If specified like -k1,1b it only applies to the end character count
If specified like -b -k1,1.2n -k2,2 then it's not used for the first field
If specified like -n -k1,1b -k2,2 then -n is not used for the first field
In general you usually want -b with -k, but it's off by default
sort -k1,1.endpos only works for fixed width fields (a key can span multiple fields)
Empty fields are problematic

Also, precisely specifying a particular field requires both the start and end fields to be specified. I.e. to sort on only field 2 you use -k2,2, as -k2 alone means from field 2 to the end of the line.

These field delineation issues, along with others, are so confusing that the sort --debug option was added in version 8.6 to highlight the matching extents and other consequences of the various options.
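For example, --debug annotates each line to show which characters each key comparison actually used (the sample data here is arbitrary):

```shell
# sort --debug writes underscore lines beneath each output line,
# marking the extent of each key; with -k2,2 the marker shows the
# leading blank is included in the key's comparison
printf '%s\n' '10 b' '2 a' | sort --debug -k2,2
```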

--random-sort

sort -R does randomize the input similarly to the shuf command, but also ensures that matching keys are grouped together. shuf also provides optimizations when outputting a subset of the input.

split

split produces file names that may be surprising, as it defaults to a two letter suffix for the initial files, but to support an arbitrary number of files it has a feature to widen the suffix letters in the output file names as needed. The widening scheme ensures file names are sorted correctly when using standard shell sorting, when subsequently listing or concatenating the resultant files. For example:

$ seq 1000 | split -l 1 - foo_
$ ls foo_*
...
foo_yy
foo_yz
foo_zaaa
foo_zaab
...

split behaves the same with the --numeric-suffixes/-d option, which could lead to unexpected numeric sequences. This was done again to ease sorting of the output (which is usually not inspected by humans for large sequences), but mainly for backwards compatibility with existing concatenation scripts that use standard sorting:

$ seq 1000 | split -l 10 -d - bar_
$ ls bar_*
...
bar_88
bar_89
bar_9000
bar_9001
...

The recommended solution is to specify an -a/--suffix-length parameter, which still allows for standard ordering at the shell, but with more natural numbers:

$ seq 1000 | split -a5 -l 10 -d - baz_
$ ls baz_*
baz_00000
baz_00001
...
baz_00098
baz_00099

tac

tac, like wc, has issues dealing with files without a last '\n' character:

$ printf '1\n2' | tac
21

tail

tail -F is probably what you want rather than -f, as the latter doesn't follow log rotations etc.

tee

tee by default will exit immediately upon receiving SIGPIPE, to be POSIX compliant and to support applications like:

yes | tee log | timeout process

Now this is problematic in the presence of "early close" pipes, often seen when combining tee with bash >(process substitutions). Starting with coreutils 8.24 (Jul 2015), tee has the new -p, --output-error option to control the operation in such cases:

$ seq 100000 | tee >(head -n1) > >(tail -n1)
1
14139
$ seq 100000 | tee -p >(head -n1) > >(tail -n1)
1
100000

test

The mode of operation of test depends on the number of arguments. Therefore you will not get the expected error in cases like the following, if "$file" is empty or unset:

test -s $file || echo no data >&2

That's because test(1) will then be operating in string testing mode, which will return success due to "-s" being interpreted as a true (non-empty string) expression. Instead ensure the variable is appropriately quoted to avoid such issues:

test -s "$file" || echo no data >&2

wc

wc -l on a file in which the last line doesn't end with a '\n' character will return a value one less than might be expected, as wc is standardised to just count '\n' characters. POSIX in fact doesn't consider a file without a '\n' as the last character to be a text file at all. Also, only counting '\n' characters results in consistent counts whether counting concatenated files, or totaling individual files:

$ printf 'hello\nworld' | wc -l
1

wc -L counts the maximum display width of a line, considering only valid, printable characters, but not terminal control codes:

$ printf '\xe2\xf2\xa5' | wc -L
0
$ printf '\xe2\x99\xa5' | LC_ALL=C wc -L
0
$ printf '\xe2\x99\xa5' | LC_ALL=C sed 's/././g' | wc -L
3
$ printf '\x1b[33mf\bred\x1b[m\n' | tee /dev/tty | wc -L
red
10

Unit representations

The --block-size option is unusual in that appending a "B" to the unit changes it from binary to decimal. I.e. KB means 1000, while K means 1024.

In general the unit representations in coreutils are unfortunate, but an accident of history. POSIX specifies 'k' and 'b' to mean 1024 and 512 respectively. Standards wise, 'k' should really mean 1000 and 'K' 1024. Then extending from that we now have (which we can't change for compatibility reasons):

k=K=kiB=KiB=1024
kB=KB=1000
M=MiB=1024^2
MB=1000^2
...
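The two conventions can be illustrated with numfmt, which names them explicitly (iec for powers of 1024, si for powers of 1000):

```shell
numfmt --from=iec 1K   # 1024
numfmt --from=si 1K    # 1000
```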

Note there is new flexibility to consider when controlling the output of numeric units, by leveraging the numfmt utility. For example to control the output of du you could define a function like:

du() { env du -B1 "$@" | numfmt --to=iec-i --suffix=B; }

Timezones

Discussed separately at Time zone ambiguities.

© Nov 29 2015