Sed

Introduction

This page is where I note the GNU sed commands I found difficult to work out.

Left-padding numbers with zeros

Let’s say we have a text file called numbers containing integer numbers (one per line, and without any sign):

123
4
5678
99999

The following command outputs the same numbers, but with leading zeros (for the numbers having fewer than 4 digits):

sed -e ":redo;s/^\([0-9]\{1,3\}\)$/0\1/; t redo" numbers
0123
0004
5678
99999

The s command here prepends a zero to any line consisting of one to three digits. The t command branches back to the :redo label whenever the s command has performed a substitution, so zeros keep being prepended until the line has four digits and the s command no longer matches.

Here is the same command with the parameter (number of digits) set as a shell variable:

N=4; sed -e ":redo; s/^\([0-9]\{1,$(($N-1))\}\)$/0\1/; t redo" numbers
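
The parameterized command can also be wrapped in a small shell function. The following is only a sketch: the name pad_zeros is hypothetical (it is not a standard utility), it assumes a width of at least 2, and it rebuilds the sample file in a temporary location so that it can be run as is.

```shell
# pad_zeros WIDTH [FILE...]: left-pad unsigned integers (one per line) with zeros.
# "pad_zeros" is a hypothetical helper name; WIDTH is assumed to be >= 2.
pad_zeros() {
    width=$1
    shift
    # Each pass of the loop prepends one zero to a line made of
    # 1 to WIDTH-1 digits; longer lines are left untouched.
    sed -e ":redo; s/^\([0-9]\{1,$((width - 1))\}\)\$/0\1/; t redo" "$@"
}

# Sample data from this section, written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 123 4 5678 99999 > "$tmpfile"

padded=$(pad_zeros 6 "$tmpfile")
printf '%s\n' "$padded"

rm -f "$tmpfile"
```

With a width of 6, every number in the sample file is padded, including 99999, which becomes 099999.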

Substituting starting at a specific line number

Taking the same example file as in the previous section, if you need to add a leading minus sign starting at line 3, you can use the following command:

sed "3,/\d0/s/^/-/" numbers

Here “3” is the first line where the substitution should be done and “/\d0/” is a regular expression used to match the last line where the substitution should be done (sed starts looking for this match on the line following line 3). In GNU sed, “\d0” denotes the character with decimal value 0, so “/\d0/” does not match any line (well, unless you have null characters in your input) and the substitution is done on every remaining line.

See the “addresses” section of the GNU sed manual for all the details about line selection.
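
When the range simply has to run to the end of the input, the “$” address (last line) may be a more direct way to say so than a regular expression that never matches. The sketch below assumes the same sample data, recreated in a temporary file; the single quotes matter, since inside double quotes the shell would try to expand “$s”.

```shell
# Sample data from the previous section, written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 123 4 5678 99999 > "$tmpfile"

# "3,$" selects every line from line 3 to the last line of the input.
negated=$(sed '3,$s/^/-/' "$tmpfile")
printf '%s\n' "$negated"

rm -f "$tmpfile"
```

Lines 1 and 2 (123 and 4) are left untouched; lines 3 and 4 come out as -5678 and -99999.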

Joining lines matching a pattern

Still taking the same example file, the following command joins a line to the previous one, replacing the end of line sequence that precedes it with a space character, if the line matches a pattern (here pattern “67”):

sed -e ':redo; N; s/\n\(.*67\)/ \1/; t redo; P; D' numbers

The N command appends the next input line to the pattern space. The s command substitutes the end of line sequence with a space if the added line contains the searched pattern. The t command causes sed to restart at the :redo label as long as the s command performs a substitution. After exiting the loop, the P command outputs the pattern space up to the first end of line sequence, and the D command deletes the pattern space up to the first end of line sequence and restarts the cycle with what remains, without reading a new input line. When the last input line is reached, N has nothing left to read, and GNU sed then prints the pattern space and exits.
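
As with the zero-padding command, the pattern can be set as a shell variable. This is a minimal sketch under a few assumptions: the variable name PATTERN is arbitrary, its value is inserted into the regular expression unescaped (so it must not contain “/” or characters you do not want sed to interpret), and the sample file is recreated in a temporary location.

```shell
# Sample data from the first section, written to a temporary file.
tmpfile=$(mktemp)
printf '%s\n' 123 4 5678 99999 > "$tmpfile"

# PATTERN is an arbitrary variable name; its value goes into the regex as is.
PATTERN=67

# Double quotes let the shell expand $PATTERN; \n, \( and \1 reach sed unchanged.
joined=$(sed -e ":redo; N; s/\n\(.*$PATTERN\)/ \1/; t redo; P; D" "$tmpfile")
printf '%s\n' "$joined"

rm -f "$tmpfile"
```

On the sample file, only 5678 contains “67”, so it is joined to the line before it: the output is 123, then “4 5678”, then 99999.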