sort(C)


sort -- sort and merge files

Syntax

sort [-m] [-bdfiMnru] [-o output] [-k keydef] ... [-t x] [-T tmpdir]
[-y [kmem]] [-z recsz] [file ... ]

sort -c [-bdfiMnru] [-k keydef] ... [-t x] [-T tmpdir] [-y [kmem]] [-z recsz]
[file]

sort [-mu] [-bdfiMnr] [-o output] [-t x] [-T tmpdir] [-y [kmem]] [-z recsz]
[+pos1 [-pos2]] ... [file ...]

sort -c [-u] [-bdfiMnr] [-t x] [-T tmpdir] [-y [kmem]] [-z recsz]
[+pos1 [-pos2]] ... [file]

Description

sort sorts lines of all the named files together and writes the result on the standard output. The standard input is read if ``-'' is used as a filename or if no input files are named.

Comparisons are based on one or more sort keys extracted from each line of input. A sort key defines a minimal sequence of characters which are to be used in sorting. By default, there is one sort key, the entire input line, and ordering is determined by the collating sequence defined by the locale (see locale(M)).

The following options alter the default behavior:


-c
Check that the input file is sorted according to the ordering rules. This option produces no output; it only affects the exit value.

-m
Merge only; the input files should already be sorted.

-o output
The argument output is the name of a file to use instead of the standard output. This file may be the same as one of the input files. There may be optional blanks between -o and output.

-T tmpdir
tmpdir is the pathname of a directory to be used for temporary files. By default, sort tries /usr/tmp and /tmp. If -T is specified, tmpdir and /tmp are tried. There must be a space between -T and tmpdir.

-u
Unique: suppress all but one in each set of lines having equal keys. This option can result in unwanted characters being placed at the end of the sorted file.

-y [kmem]
The amount of memory used by sort has a large impact on its performance; for example, sorting a small file in a large amount of memory is inefficient. If the -y option is omitted, sort starts with the default memory size (32KB) and allocates more memory as needed. If kmem is specified, sort starts with that number of kilobytes of memory. If kmem is below the administrative minimum (32KB) or above the administrative maximum (1MB), sort uses that minimum or maximum instead.

If kmem is 0, sort uses the minimum memory requirement of 16KB.

By convention, specifying -y with no argument makes sort start with the maximum memory size of 1MB.


-z recsz
Causes sort to use a buffer size of recsz bytes for the merge phase. Input lines longer than the buffer size will cause sort to terminate abnormally. Normally, the size of the longest line read during the sort phase is recorded and this maximum is used as the record size during the merge phase, eliminating the need for the -z option. However, when the sort phase is omitted (-c or -m options) a system default buffer size is used, and if this is not large enough, the -z option should be used to prevent abnormal termination.
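As a rough illustration of what -m assumes, the Python sketch below merges inputs that are each already sorted, using the standard heapq.merge function. It is a model, not sort's implementation: Python compares strings by code point, which matches the collating sequence only for ASCII text in the POSIX locale, and merge_sorted is a hypothetical name.

```python
import heapq


def merge_sorted(*inputs):
    """Model of `sort -m`: merge inputs that are each already sorted.

    heapq.merge reads the inputs lazily and never re-sorts them.
    """
    return list(heapq.merge(*inputs))


if __name__ == "__main__":
    fruit_a = ["apple\n", "cherry\n", "plum\n"]
    fruit_b = ["banana\n", "grape\n"]
    print("".join(merge_sorted(fruit_a, fruit_b)), end="")
```

Because the merge never re-sorts, an unsorted input simply produces unsorted output, which is why the input files given to -m should already be sorted.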

The following options override the default ordering rules.

-d
``Dictionary'' order: only letters, digits, and blanks (spaces and tabs) are significant in comparisons. Which characters count as letters, digits, and blanks is determined by the current setting of LC_CTYPE (see locale(M)).

-f
Fold lowercase letters into uppercase. Conversion between lowercase and uppercase letters is governed by the current setting of LC_CTYPE (see locale(M)).

-i
Ignore non-printable characters in non-numeric comparisons. Non-printable characters are defined by the current setting of LC_CTYPE (see locale(M)).

-M
Compare as months according to the current setting of LC_TIME (see locale(M)). The first month of the year compares lower than the second month, and so on; for example, in the POSIX locale, ``JAN'' < ``FEB'' < ... < ``DEC'', and invalid fields compare lower than ``JAN''. The -M option implies the -b option.

-n
An initial numeric string, consisting of optional blanks, an optional minus sign, and zero or more digits with an optional decimal point, is sorted by arithmetic value. The -n option implies the -b option. Note that the -b option is only effective when restricted sort key specifications are in effect.

-r
Reverse the sense of comparisons.
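The Python sketch below models the key transformations performed by -d, -f, -i, -M, and -n, assuming the POSIX locale and ASCII input; the real sort takes these character classes from LC_CTYPE and the month names from LC_TIME. The function names are hypothetical, and matching month names case-sensitively is an assumption this page does not settle.

```python
import re
import string

# Assumption: POSIX locale and ASCII input. The real sort takes its
# character classes from LC_CTYPE and month names from LC_TIME.
BLANKS = " \t"
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
# Optional blanks, optional minus sign, digits with an optional decimal point.
_NUMERIC = re.compile(r"[ \t]*(-?)([0-9]*)(?:\.([0-9]*))?")


def dictionary_key(s):
    """-d: keep only letters, digits and blanks."""
    keep = string.ascii_letters + string.digits + BLANKS
    return "".join(c for c in s if c in keep)


def fold_key(s):
    """-f: fold lowercase letters into uppercase."""
    return s.upper()


def printable_key(s):
    """-i: drop non-printable characters (anything outside ' ' through '~')."""
    return "".join(c for c in s if " " <= c <= "~")


def month_key(s):
    """-M: unknown < JAN < ... < DEC; leading blanks ignored (-M implies -b).

    Assumption: the first three characters are matched case-sensitively.
    """
    name = s.lstrip(BLANKS)[:3]
    return MONTHS.index(name) + 1 if name in MONTHS else 0


def numeric_key(s):
    """-n: arithmetic value of the initial numeric string; no digits -> 0."""
    sign, whole, frac = _NUMERIC.match(s).groups()
    frac = frac or ""
    if not whole and not frac:
        return 0.0
    value = float((whole or "0") + "." + (frac or "0"))
    return -value if sign else value
```

For example, sorted(lines, key=numeric_key) approximates sort -n applied to whole lines.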

The treatment of field separators can be altered using the options:

-b
Ignore leading blanks when determining the starting and ending positions of a restricted sort key. If the -b option is specified before the first sort key argument, it will be applied to all sort keys.

-t x
Use x as the field separator character; x is not considered to be part of a field (although it may be included in a sort key). If x is a space, specified as -t " ", all spaces (including those at the beginning of a line) are treated as field separators. Each occurrence of x is significant (for example, xx delimits an empty field).

When ordering options appear before restricted sort key specifications, the requested ordering rules are applied globally to all sort keys. When one or more of the flags b, d, f, i, n, or r is attached to a specific sort key (see ``Sort key field definition''), the specified ordering options override all global ordering options for that key.

When there are multiple sort keys, later keys are compared only after all earlier keys compare equal. Lines that otherwise compare equal are ordered with all bytes significant.
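One way to picture the rule above is a Python tuple key: tuples compare element by element, so a later key is consulted only when all earlier keys are equal, and a final element holding the line's bytes breaks any remaining tie. sort_lines and field are hypothetical helpers, and field uses a plain str.split, which only approximates sort's field rules for a -t separator.

```python
def field(n, sep=":"):
    """Hypothetical helper: key function for the nth (1-based) field."""
    return lambda line: line.split(sep)[n - 1]


def sort_lines(lines, key_funcs):
    """Compare key by key; when every key is equal, compare all bytes."""
    return sorted(
        lines,
        key=lambda line: (*(k(line) for k in key_funcs), line.encode()),
    )


if __name__ == "__main__":
    rows = ["x:1:b", "y:0:c", "x:1:a"]
    # Roughly `sort -t: -k 2,2 -k 1,1`; the final tie falls back to the whole line.
    print(sort_lines(rows, [field(2), field(1)]))
```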

Input files are treated as sequences of records (lines), each of which contains one or more fields. By default, the first blank character (space or tab) of a sequence of blank characters acts as the field separator. Remaining blank characters in the sequence are treated as part of the field unless the -b option (ignore leading blanks) is specified. If the -t option is used to specify a field separating character, all occurrences of that character are interpreted as separating fields.

The option -t " " specifies that a space character is to be used as the field separator. In this case, any tab characters are interpreted as being part of a field; any leading tab characters are ignored if the -b option is specified. All space characters are interpreted as field separators and are unaffected by the -b option.

Sort key field definition

Sort key fields may be defined in two ways:

-k keydef
keydef is a key field definition for a restricted sort. There may be more than one key field defined. Each takes the form:

start[flag][,end[flag]]

start and end restrict a key field to part of a line. flag is one of the modifiers b, d, f, i, n, or r. These modifiers act like the options -b, -d, -f, -i, -n, and -r respectively, but they apply only to the key field to which they are attached. The exception is b, which applies only to the start or end position to which it is attached.

A key field start is specified in the form field[.first], where field numbers start at 1 for the first field on a line and first is the number of the character that starts the key field. Characters in fields are also numbered from 1; if first is missing, 1 is assumed. Similarly, a key field end has the form field[.last], where last specifies the last character of the key field; the default is the last character of the field. If end is missing, the key field extends to the end of the line.

The -b option and the b modifier cause characters in a field to be counted from the first non-blank character.


+pos1 [ -pos2 ]
This notation restricts a sort key to one beginning at pos1 and ending at pos2. The characters at positions pos1 and pos2 are included in the sort key (provided that pos2 does not precede pos1). A missing -pos2 means the end of the line.

In this form of key field specification, fields are numbered in ascending order, starting from 0. The character position in a field can also be referenced, starting from 0 (for the first character). All blanks in a sequence of blanks are considered to be part of the next field. For example, all blanks at the beginning of a line are considered to be part of the first field.

pos1 and pos2 each have the form:

m[.n][flag]

A starting position specified by +m.n is interpreted to mean the (n+1)th character in the (m+1)th field. A missing .n means .0, indicating the first character of the (m+1)th field. flag is one of the modifiers b, d, f, i, n, or r. If the b flag is in effect, n is counted from the first non-blank in the (m+1)th field; +m.0b refers to the first non-blank character in the (m+1)th field.

A last position specified by -m.n is interpreted to mean the nth character (including separators) after the last character of the mth field. A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect, n is counted from the last leading blank in the (m+1)th field; -m.1b refers to the first non-blank character in the (m+1)th field.
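The correspondence between the two notations can be written as a small Python function. It is derived from the numbering rules above (fields and characters counted from 0 in +pos1, and -m.0 meaning the last character of field m) and reproduces the paired commands under ``Examples''. obsolete_to_keydef is a hypothetical name; it emits an explicit .first whenever pos1 gives one, so +1.0 -1.1 becomes 2.1,2.1, which selects the same key as the 2,2.1 shown in the examples.

```python
import re

# +m[.n][flags] or -m[.n][flags], using the documented modifiers b, d, f, i, n, r.
_POS = re.compile(r"([+-])([0-9]+)(?:\.([0-9]+))?([bdfinr]*)")


def _parse(pos, sign):
    """Split one position argument into (m, n or None, flags)."""
    match = _POS.fullmatch(pos)
    if match is None or match.group(1) != sign:
        raise ValueError(f"not a {sign}pos argument: {pos!r}")
    _, m, n, flags = match.groups()
    return int(m), (None if n is None else int(n)), flags


def obsolete_to_keydef(pos1, pos2=None):
    """Translate `+pos1 [-pos2]` into an equivalent `-k` keydef.

    +m.n -> start at character n+1 of field m+1 (+pos counts from 0)
    -m.0 -> end at the last character of field m
    -m.n -> end at character n of field m+1, for n > 0
    """
    m, n, flags = _parse(pos1, "+")
    keydef = str(m + 1) + ("" if n is None else f".{n + 1}") + flags
    if pos2 is None:
        return keydef  # no -pos2: the key runs to the end of the line
    m, n, flags = _parse(pos2, "-")
    end = str(m) if not n else f"{m + 1}.{n}"
    return f"{keydef},{end}{flags}"
```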

It is not possible to use a sort key field to extend the span of a field outside the separator characters that delimit the field. Use the -t option if you need to specify a key field based on column position alone; see ``Sorting a file by columns'' for an example that uses this method.
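The Python sketch below models how a -t separator and a -k start[,end] definition select a key, including the rule that a key cannot extend past the separators around its field. It assumes no b modifier and treats an end field beyond the last field as the end of the line; field_spans and extract_key are hypothetical names.

```python
def field_spans(line, sep):
    """(start, end) offsets of each sep-delimited field; end is exclusive.

    Every occurrence of sep is significant, so "a::b" has an empty field.
    """
    spans, start = [], 0
    while True:
        idx = line.find(sep, start)
        if idx == -1:
            spans.append((start, len(line)))
            return spans
        spans.append((start, idx))
        start = idx + 1


def extract_key(line, sep, start_field, first=1, end_field=None, last=None):
    """Key for `-t sep -k start_field.first[,end_field[.last]]` (all 1-based).

    Character positions are clamped to their own field, so the key never
    reaches past the separators around the field it names.
    """
    spans = field_spans(line, sep)
    if start_field > len(spans):
        return ""
    fs, fe = spans[start_field - 1]
    begin = min(fs + first - 1, fe)
    if end_field is None or end_field > len(spans):
        stop = len(line)  # no end (or beyond the last field): end of line
    else:
        es, ee = spans[end_field - 1]
        stop = ee if last is None else min(es + last, ee)
    return line[begin:stop] if stop > begin else ""
```

For a line containing no colon, extract_key(line, ":", 1, 9, 1, 72) returns columns 9 through 72, which is the technique described in ``Sorting a file by columns''.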

Exit values

sort returns the following exit values:

0
sort processed all input successfully; with the -c option, the input file was correctly sorted.

1
Using the -c option, sort found that the file was not ordered as specified. Using options -c and -u, sort found two input lines with identical keys.

>1
An error occurred in sort, such as input lines being too long.
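A Python model of how -c, optionally with -u, arrives at exit value 0 or 1 might look like this. check_sorted is a hypothetical name, comparison uses Python string order rather than the locale's collating sequence, and error exits (greater than 1) are not modeled.

```python
def check_sorted(lines, key=lambda line: line, unique=False):
    """Return the exit value `sort -c` (or `sort -c -u`) would report.

    0: every line is in order; 1: a line sorts before its predecessor,
    or, when unique is true, two adjacent lines have equal keys.
    """
    for prev, cur in zip(lines, lines[1:]):
        if key(cur) < key(prev) or (unique and key(cur) == key(prev)):
            return 1
    return 0
```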

Diagnostics

When the last line of an input file is missing a newline character, sort appends one, prints a warning message, and continues.

Examples

All examples are given for both forms of sort key field syntax.

Sort the contents of infile with the second field as the sort key:

sort -k 2,2 infile

sort +1 -2 infile

Sort, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the first character of the second field as the sort key:

sort -r -o outfile -k 2,2.1 infile1 infile2

sort -r -o outfile +1.0 -1.1 infile1 infile2

Sort, in reverse order, the contents of infile1 and infile2 using the first two non-blank characters of the second field as the sort key:

sort -r -k 2.1b,2.2b infile1 infile2

sort -r +1.0b -1.2b infile1 infile2

Print the password file (passwd(F)) sorted by the numeric user ID (the third colon-separated field):

sort -t: -k 3n,3 /etc/passwd

sort -t: +2n -3 /etc/passwd

Print the lines of the already sorted file infile, suppressing all but the first occurrence of lines having the same third field. Using -um with a single input file makes it predictable which line is kept from each set of equal lines:

sort -um -k 3,3 infile

sort -um +2 -3 infile
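The predictability comes from -m not reordering anything: the first line of each run of equal keys is the one that survives. The Python sketch below shows that behavior with itertools.groupby; unique_merge and third_field are hypothetical names, and splitting on whitespace only approximates sort's default blank-separated fields.

```python
from itertools import groupby


def unique_merge(sorted_lines, key):
    """Keep the first line of each run of lines with equal keys.

    The input must already be sorted on key, as with `sort -um`.
    """
    return [next(group) for _, group in groupby(sorted_lines, key=key)]


def third_field(line):
    """Hypothetical key: the third blank-separated field."""
    return line.split()[2]
```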

Sorting a file by columns

To sort a file based on columns, use the -t option to specify a field separator character which does not appear in the input. This will cause each line to be treated as a single field. The -k option or the +pos1 and -pos2 specifiers can then be used to sort on particular ranges of columns. The -b option and b modifier flag can also be used to ignore leading blanks (spaces or tabs).

For example, if the character ``:'' does not appear in the file infile, sort this file on the contents of columns 9 through 72 using:

sort -t: -k 1.9,1.72 infile

sort -t: +0.8 -0.72 infile
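In Python terms, the column technique amounts to slicing a fixed range of characters out of each line. The sketch below assumes the chosen separator never occurs, so the whole line is field 1; sort_by_columns is a hypothetical name, and the full line is used as a tie-breaker to mirror the rule that lines with equal keys are ordered with all bytes significant.

```python
def sort_by_columns(lines, first, last):
    """Sort on columns first..last (1-based, inclusive) of each line.

    Models `sort -t: -k 1.first,1.last` for input containing no ':';
    the whole line breaks ties between equal keys.
    """
    return sorted(lines, key=lambda line: (line[first - 1:last], line))
```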

Files

/usr/tmp/stm???

Open UNIX 8 compatibility notes

When running ACP on Open UNIX 8 and UnixWare 7 systems, set OSRCMDS=on to use the SCO OpenServer version of the sort command. This provides the expected behaviors for SCO OpenServer applications. The SCO OpenServer version of this command is also provided on Open UNIX 8 systems under the OSP feature. See the ``Running SCO OpenServer Applications'' topic in the Open UNIX 8 documentation set.

See also

coltbl(M), comm(C), join(C), locale(M), uniq(C)

Standards conformance

sort is conformant with:

ISO/IEC DIS 9945-2:1992, Information technology - Portable Operating System Interface (POSIX) - Part 2: Shell and Utilities (IEEE Std 1003.2-1992);
AT&T SVID Issue 2;
X/Open CAE Specification, Commands and Utilities, Issue 4, 1992.


© 2005 The SCO Group, Inc. All rights reserved.
SCO OpenServer Release 6.0.0 -- 03 June 2005