Sed search by regular expression. Stream SED editor. sed program options

NAME

sed - stream editor

SYNTAX

sed [-Ealn] command [file ...]

sed [-Ealn] [-e command] [-f command_file] [-i extension] [file ...]

DESCRIPTION

The sed utility reads the specified files, or the standard input if no files are specified, modifying the input as specified by a list of commands. The input is then written to the standard output. A single command may be specified as the first argument to sed. Multiple commands may be specified by using the -e or -f options. All commands are applied to the input in the order they are specified, regardless of their origin. The following options are available:

-E Interprets regular expressions as extended (modern) regular expressions rather than basic regular expressions. The re_format(7) manual page fully describes both formats.

-a Files listed as parameters to the "w" function are, by default, created (or truncated) before any processing begins. The -a option causes sed to delay opening each file until a command containing the related "w" function is applied to a line of input.

-e command Appends the editing commands specified by the command argument to the list of commands.

-f command_file Appends the editing commands found in command_file to the list of commands. Each editing command should be listed on a separate line.

-i extension Edits files in place, saving backups with the specified extension. If a zero-length extension is given, no backup is saved. Giving a zero-length extension when editing files in place is not recommended, because you risk corrupting the file, completely or partially, if disk space is exhausted.

-l Makes output line buffered.

-n By default, each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.

The form of a sed command is as follows:

[address[,address]]function[arguments]

Whitespace may be inserted before the first address and before the function portion of the command.

Normally, sed cyclically copies a line of input, not including its terminating newline character, into a pattern space (unless something is left over after a "D" function), applies all of the commands whose addresses select that pattern space, copies the pattern space to the standard output, appending a newline, and then deletes the pattern space.

Some of the functions use a hold space to save all or part of the pattern space for later retrieval.

Sed addresses

An address is not required, but if specified it must be a number (which counts input lines cumulatively across input files), a dollar sign ('$') which addresses the last line of input, or a context address (which consists of a regular expression preceded and followed by a delimiter).

A command line with no addresses selects every pattern space. A command line with one address selects all pattern spaces that match the address.

A command line with two addresses selects an inclusive range. The range starts with the first pattern space that matches the first address. The end of the range is the next following pattern space that matches the second address. If the second address is a number less than or equal to the line number first selected, only that line is selected.

In the case where the second address is a context address, sed does not re-match the second address against the pattern space that matched the first address. Starting at the first line following the selected range, sed begins looking again for the first address.

Editing commands can be applied to non-selected pattern spaces by use of the exclamation character ("!") function.
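
As a rough illustration of the address forms described above, here is a minimal sketch. The five-line sample input and the function names are invented for this example, and the commands are limited to syntax that both BSD and GNU sed are expected to accept:

```shell
# Hypothetical five-line sample input used by every example below.
sample() { printf '%s\n' one two three four five; }

# A numeric address: print only line 2.
line_two() { sample | sed -n '2p'; }

# The last-line address '$'.
last_line() { sample | sed -n '$p'; }

# A context address: print the lines that match the regular expression.
lines_with_f() { sample | sed -n '/^f/p'; }

# An inclusive range from a line number to a context address.
two_to_four() { sample | sed -n '2,/four/p'; }

# '!' applies the function to the lines NOT selected by the address.
mark_all_but_first() { sample | sed '1!s/^/> /'; }

line_two; last_line; lines_with_f; two_to_four; mark_all_but_first
```

The -n option is used here so that only the explicitly printed lines appear; without it every input line would also be echoed by default.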

Sed regular expressions

The regular expressions used in sed are, by default, basic regular expressions (BREs; see re_format(7) for more information), but extended (modern) regular expressions can be used instead if the -E flag is given.

In addition, sed has the following two additions to regular expressions:

1. In a context address, any character other than a backslash ("\") or newline character may be used to delimit the regular expression. Also, putting a backslash character before the delimiting character causes the character to be treated literally. For example, in the context address \xabc\xdefx, the RE delimiter is an "x" and the second "x" stands for itself, so that the regular expression is "abcxdef".

2. The escape sequence \n matches a newline character embedded in the pattern space. You cannot, however, use a literal newline character in an address or in the substitute command.

One special feature of sed regular expressions is that they can default to the last regular expression used. If a regular expression is empty, that is, just the delimiter characters are specified, the last regular expression encountered is used instead. The last regular expression is defined as the last regular expression used as part of an address or substitute command, and at run time, not compile time. For example, the command "/abc/s//XXX/" will substitute "XXX" for the pattern "abc".
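
A small sketch of the empty-regular-expression behaviour, assuming a made-up two-line input (the function name is illustrative only):

```shell
# The empty regular expression in s//XXX/ reuses the last one used at run time,
# which here is the /abc/ from the address.
reuse_last_re() { sed '/abc/s//XXX/'; }

printf '%s\n' 'abc def' 'no match' | reuse_last_re
```

Only the line selected by /abc/ is changed; the second line passes through untouched.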

sed functions

In the following list of commands, the maximum number of permissible addresses for each command is indicated by [0addr], [1addr], or [2addr], representing zero, one, or two addresses. The argument text consists of one or more lines. To embed a newline in the text, precede it with a backslash. Other backslashes in text are deleted and the following character is taken literally.

The "r" and "w" functions take an optional file parameter, which should be separated from the function letter by white space. Each file given as an argument to sed is created (or its contents truncated) before any input processing begins.

The "b", "r", "s", "t", "w", "y", "!", and ":" functions all accept additional arguments. The following synopses indicate which arguments have to be separated from the function letters by white space characters.

Two of the functions take a function-list. This is a list of sed functions separated by newlines, as follows:

{ function
  function
  ...
  function
}

The "{" can be preceded by white space and can be followed by white space. The function can be preceded by white space. The terminating "}" must be preceded by a newline or optional white space.

[2addr] function-list Executes function-list only when the pattern space is selected.

[1addr]a\ text Writes text to standard output immediately before each attempt to read a line of input, whether by executing the "N" function or by beginning a new cycle.

[2addr]b[label] Branches to the ":" function with the specified label. If no label is specified, branches to the end of the script.

[2addr]c\ text Deletes the pattern space. With 0 or 1 address, or at the end of a 2-address range, text is written to the standard output.

[2addr]d Deletes the pattern space and starts the next cycle.

[2addr]D Deletes the initial segment of the pattern space through the first newline character and starts the next cycle.

[2addr]g Replaces the contents of the pattern space with the contents of the hold space.

[2addr]G Appends a newline character followed by the contents of the hold space to the pattern space.

[2addr]h Replaces the contents of the hold space with the contents of the pattern space.

[2addr]H Appends a newline character followed by the contents of the pattern space to the hold space.

[1addr]i\ text Writes text to the standard output.

[2addr]l (The letter ell.) Writes the pattern space to the standard output in a visually unambiguous form. This form is as follows:

backslash \\

carriage return \r

tab \t

vertical tab \v

Non-printable characters are written as three-digit octal numbers (with a preceding backslash) for each byte in the character (most significant byte first).

Long lines are folded, with the point of folding indicated by a backslash followed by a newline. The end of each line is marked with a "$".

[2addr]n Writes the pattern space to the standard output if the default output has not been suppressed, and replaces the pattern space with the next line of input.

[2addr]N Appends the next line of input to the pattern space, using an embedded newline character to separate the appended material from the original contents.

[2addr]p Writes the pattern space to standard output.

[2addr]P Writes the pattern space, up to the first newline character, to the standard output.

[1addr]q Branches to the end of the script and quits without starting a new cycle.

[1addr]r file Copies the contents of file to the standard output immediately before the next attempt to read a line of input. If file cannot be read for any reason, it is silently ignored and no error condition is set.

[2addr]s/regular expression/replacement/flags

Substitutes the replacement string for the first instance of the regular expression in the pattern space. Any character other than a backslash or newline can be used instead of a slash to delimit the regular expression and the replacement. Within the regular expression and the replacement, the delimiter itself can be used as a literal character if it is preceded by a backslash.

An ampersand ("&") appearing in the replacement is replaced by the string matching the regular expression. The special meaning of "&" in this context can be suppressed by preceding it with a backslash. The string "\#", where "#" is a digit, is replaced by the text matched by the corresponding backreference expression.

A line can be split by substituting a newline character into it. To specify a newline character in the replacement string, precede it with a backslash.

The value of flags in the substitute function is zero or more of the following:

N Make the substitution only for the N'th occurrence of the regular expression in the pattern space.

g Make the substitution for all non-overlapping matches of the regular expression, not just the first one.

p Write the pattern space to standard output if a replacement was made. If the replacement string is identical to the text it replaces, it is still considered to have been a replacement.

w file Append the pattern space to file if a replacement was made. If the replacement string is identical to the text it replaces, it is still considered to have been a replacement.

[2addr]t [label] Branches to the ":" function bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a "t" function. If no label is specified, branches to the end of the script.

[2addr]w file Appends the pattern space to the file.

[2addr]x Swaps the contents of the pattern and hold spaces.

[2addr]y/string1/string2/

Replaces all occurrences of characters in string1 in the pattern space with the corresponding characters from string2. Any character other than a backslash or newline can be used instead of a slash to delimit the strings. Within string1 and string2, a backslash followed by any character other than a newline is that literal character, and a backslash followed by an "n" is replaced by a newline character.

[2addr]!function

[2addr]!function-list

Applies the function or function-list only to the lines that are not selected by the address(es).

[0addr]:label This function does nothing; it bears a label to which the "b" and "t" commands may branch.

[1addr]= Writes the line number to the standard output followed by a newline character.

[0addr] Empty lines are ignored.

[0addr]# The "#" and the remainder of the line are ignored (treated as a comment), with the single exception that if the first two characters in the file are "#n", the default output is suppressed. This is the same as specifying the -n option on the command line.
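
To make the interplay between the pattern space and the hold space more concrete, here is a minimal sketch that reverses the order of the input lines using only the "G", "h" and "p" functions described above. The function name and the sample input are assumptions made for this example:

```shell
# Reverse the order of input lines using the hold space:
#   1!G  on every line except the first, append the hold space to the pattern space
#   h    copy the pattern space (current line plus all earlier lines) to the hold space
#   $p   on the last line, print the accumulated result
reverse_lines() { sed -n '1!G;h;$p'; }

printf '%s\n' first second third | reverse_lines
```

Because the whole input is accumulated in the hold space, this approach is only reasonable for inputs that fit comfortably in memory.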

Environment

The COLUMNS, LANG, LC_ALL, LC_CTYPE and LC_COLLATE environment variables affect the execution of sed as described in environ(7).

Exit status

The sed utility exits with a value of 0 on success, and >0 if an error occurs.

See also

awk(1), ed(1), grep(1), regex(3), re_format(7)

Standards

The sed utility is expected to be a superset of the IEEE Std 1003.2 ("POSIX.2") specification.

The -E, -a, and -i options are non-standard FreeBSD extensions and may not be available on other operating systems.

The sed command was written by L. E. McMahon and appeared in Version 7 AT&T UNIX.

Multibyte characters containing a byte with the value 0x5C (ASCII `\') may be incorrectly treated as line continuation characters in arguments to the "a", "c" and "i" commands.

Multibyte characters cannot be used as command delimiters.

And I can't find well written tutorials.

Let me say that I have worked with regex in other languages (Python, JavaScript, Java), so this shouldn't be a problem.

So here are my questions (theoretical and more practical):

    Are the regular expressions used in sed exactly the same as those used by Python/JS/Java? I've read about BREs and EREs, but how different are they? Shouldn't ERE be an extension of BRE?

    If I want to, say, just extract something from the output of a pipe, what is the sed syntax?

Details on the second question: let's say I pipe the output of uptime into sed:

uptime | sed ...

Given an example output from uptime: 18:13 up 5:12, 2 users, load averages: 0.45 0.37 0.40 , I want to parse the uptime as two separate numbers (hours and minutes) and then display them as xxhyym (xx is hours, yy is minutes).

And to finish, here's what I would do in Python:

hh, mm = re.match(r"\s+ up \s+(\d{1,2}):(\d{1,2})").groups()
print "%sh%sm" % (hh, mm)

2 solutions collected from the web for “Regular commands and commands with sed command”

Traditional Unix tools support BRE or ERE (basic or extended regular expressions). POSIX codifies both, and Wikipedia explains them. Most modern tools extend ERE, often with additional features first introduced in Perl (hence the name PCRE, Perl Compatible Regular Expressions).

ERE extends the functionality of BRE, but it is not a syntactic extension. In BRE, only the characters \[.*^$ have a special meaning, and some operators, such as grouping \(…\), use backslashes. In ERE, +?|() are also special, and a backslash followed by a non-alphanumeric character is never special.

BRE does not have the Python/PCRE \d and \s . You can express these character sets using traditional bracket expressions and character classes: \d is [[:digit:]] and \s is [[:space:]] . Note the double brackets: one pair to delimit the character set and one pair to delimit the character class; for example, "letters, dashes, or underscores" can be written [-_[:alpha:]] .

BRE does not have a + operator (some sed implementations support \+ as an extension of BRE syntax); X+ is the same as XX* . Groups and repetition counts require an extra backslash.

So the BRE equivalent of Python's \s+ up \s+(\d{1,2}):(\d{1,2}) is [[:space:]][[:space:]]* up [[:space:]][[:space:]]*\([[:digit:]]\{1,2\}\):\([[:digit:]]\{1,2\}\) . Note that you're matching too much: \s+ followed by a space means at least two whitespace characters.

You need to match the entire line, since the sed command rewrites the line. There is no separate command for printing a string assembled from captured groups. With the extra spaces corrected, the equivalent of your Python snippet is:

uptime | sed "s/^.*[[:space:]][[:space:]]*up[[:space:]][[:space:]]*\([[:digit:]]\{1,2\}\):\([[:digit:]]\{1,2\}\).*$/\1h\2m/"

Unlike the Python snippet, this retrieves the last match rather than the first (the leading .* is greedy), but that doesn't matter here.

The uptime output contains only space characters and ASCII digits, so you can simplify the regex:

uptime | sed "s/^.* up *\([0-9]\{1,2\}\):\([0-9]\{1,2\}\).*$/\1h\2m/"

This will only match the uptime output if the machine has been up for less than 1 day. I leave matching the number of days as an exercise. (Hint: write two expressions: sed -e "s/AS ABOVE/\1h\2m/" -e "s/EXERCISE/\1d\2h\3m/")
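
For readers who want to try this without depending on the local uptime format, here is a sketch that runs the simplified substitution against the sample line quoted in the question. The function names are invented for this example, and the -E variant assumes a sed that accepts that option (BSD sed and reasonably recent GNU sed do):

```shell
# Sample line copied from the question; real uptime output varies by system.
sample='18:13 up 5:12, 2 users, load averages: 0.45 0.37 0.40'

# BRE: capture one or two digits on each side of the colon that follows " up ".
format_uptime() { sed 's/^.* up *\([0-9]\{1,2\}\):\([0-9]\{1,2\}\).*$/\1h\2m/'; }

# The same substitution written as an ERE: no backslashes on groups or counts.
format_uptime_ere() { sed -E 's/^.* up *([0-9]{1,2}):([0-9]{1,2}).*$/\1h\2m/'; }

echo "$sample" | format_uptime
echo "$sample" | format_uptime_ere
```

Comparing the two functions side by side shows the practical BRE/ERE difference discussed above: the operators are the same, only the escaping changes.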

Each tool uses (mostly) its own RE library. Even among different versions of sed you will find differences here. Two popular standards exist: POSIX regular expressions, which many tools accept (at least with some option), and the Perl Compatible Regular Expressions (PCRE) library. But the latter are a little different from "vanilla" REs...

In your case:

uptime | sed -e "s/^ \(\):\(\).*$/\1h\2m/"

(Tried on Fedora 18, sed-4.2.1-10.fc18.x86_64, GNU sed).

Update: what's wrong with the extensive documentation on the GNU sed home page? Or this tutorial? The documentation for GNU sed is a little long, but complete.

Introduction

The sed command is a Stream EDitor for automated text editing. It is a "stream editor" in the sense that it can edit an incoming data stream continuously, say, as part of a pipeline. "Automated" means that once you have set the editing rules, the rest happens without your tedious participation. In other words, the sed editor is not interactive.

The sed program is more complex than the commands that we have already discussed in previous articles in the HuMan series. It includes an arsenal of its own commands; therefore, to avoid tautology and confusion, in this article the sed command will henceforth be called a "program" or "editor", and the sed editor's commands will simply be called commands.

The sed program is capable of complex tasks, and it takes time to learn how to formulate these tasks.

But along with complex actions, the sed program has simple but very useful capabilities, which are no more difficult to master than other Unix commands. Don't let yourself be put off by the complexity of mastering the entire program.

We'll start from simple to complex, so you can always figure out where to stop.

Command s - substitution (replacement)

The sed program has many commands of its own. Most users know only the s command, and that is enough to work with the sed editor. The s command replaces PATTERN with REPLACEMENT:

sed s/PATTERN/REPLACEMENT/

$ echo day | sed s/day/night/
night

It couldn't be simpler. And here is an example with input from the zar.txt file:

$ cat zar.txt
In the mornings he did exercises.
Lightning is an electric charge.
$ sed s/charge/discharge/ zar.txt
In the mornings he did exercises.
Lightning is an electric discharge.

I didn't put the expression s/PATTERN/REPLACEMENT/ in quotes because this example does not need them, but if it contained metacharacters, quotes would be required. So as not to rack your brain every time or make an accidental mistake, always use quotation marks, preferably the "stronger" single ones; it is a good habit. You can't spoil porridge with butter. I, too, will not skimp on quotation marks in all subsequent examples.

As we can see, the substitution command s has four components:

s             the command itself
/.../.../     the delimiter
PATTERN       the pattern to search for and replace
REPLACEMENT   the expression that replaces PATTERN if a match is found

The forward slash (/) is used as the delimiter by tradition, since sed's ancestor, the ed editor, uses it (as does the vi editor). In some cases this delimiter is very inconvenient, for example, when you need to change directory paths that themselves contain forward slashes (/usr/local/bin). In this case, you have to escape the forward slashes with backslashes:

sed "s/\/usr\/local\/bin/\/common\/bin/"

This is called a “picket fence” and looks very ugly, and most importantly, incomprehensible.

A convenient feature of sed is that it lets you use almost any character as the delimiter, such as the underscore:

$ echo day | sed s_day_night_
night

or the colon:

$ echo day | sed s:day:night:
night

If, while trying out a delimiter you like, you get the message "unterminated `s' command", then either this character is not a good delimiter or you simply forgot to put a delimiter or two.

In this article, I have to use the traditional delimiter (/) to avoid confusing the reader, but if necessary, I will use the tilde (~) as a delimiter.
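
As a quick check of the delimiter discussion, the sketch below rewrites the same directory path twice, once with the tilde as the delimiter and once in "picket fence" form. The function names and the sample PATH value are made up for this example:

```shell
# Replace a directory path using "~" as the delimiter, so slashes need no escaping.
rewrite_path() { sed 's~/usr/local/bin~/common/bin~'; }

# The equivalent "picket fence" form with "/" as the delimiter.
rewrite_path_fence() { sed 's/\/usr\/local\/bin/\/common\/bin/'; }

echo 'PATH=/usr/local/bin:/bin' | rewrite_path
echo 'PATH=/usr/local/bin:/bin' | rewrite_path_fence
```

Both commands produce the same result; the first is simply easier to read and harder to get wrong.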

Regular expressions (RE)

(Regular expressions, regexp, RE)

The topic of regular expressions is so vast that entire books are devoted to it (see links at the end of the article). However, talking seriously about the sed editor without using regular expressions is as counterproductive as talking about trigonometry using counting sticks. Therefore, it is necessary to talk at least about those regular expressions that are often used with the sed program.

c Or any other letter. Most letters, digits, and other non-special characters are regular expressions that match themselves.

* An asterisk following any symbol or regular expression means any number (including zero) repetitions of this symbol or regular expression.

\+ Indicates one or more repetitions of a character or regular expression.

\? Means none or one repeat.

\{i\} Means exactly i repetitions.

\{i,j\} The number of repetitions is in the range from i to j inclusive.

\{i,\} The number of repetitions is greater than or equal to i.

\{,j\} The number of repetitions is less than or equal to j.

\(RE\) Remembers a regular expression or part of it for further use as a whole. For example, \([a-z]\)* will search for any combination of any number (including zero) of lowercase letters.

. Matches any character, including newline.

^ Matches the empty string at the beginning of a line. In other words, whatever follows this sign must appear at the beginning of the line. For example, ^#include will search for lines starting with #include.

$ The same as the previous one, only it applies to the end of the line.

[LIST] Means any character from the LIST. For example, [aeiou] will search for any English vowel.

[^LIST] Means any character except those in the list. For example, [^aeiou] will search for any character that is not a vowel, such as a consonant. Note: LIST can be an interval, for example [a-z], which will mean any lowercase letter. If you need to include ] (square bracket) in the LIST, put it first in the list; if you need to include - (hyphen) in the LIST, put it first or last in the list.

RE1\|RE2 Means RE1 or RE2.

RE1RE2 Means the concatenation of the regular expressions RE1 and RE2.

\n Indicates a newline character.

\$; \*; \.; \[; \\; \^ Mean accordingly: $; *; .; [; \; ^

Note: the other backslash-based escape sequences adopted in the C language are not supported by the sed program.

\1 \2 \3 \4 \5 \6 \7 \8 \9 Indicates the corresponding part of the regular expression, remembered using the signs \( and \).

Some examples:

abcdef Means abcdef

a*b Means zero or more a's followed by one b. For example, aaaaaab; ab; or b.

a\?b Means b or ab

a\+b\+ Represents one or more a's and one or more b's. For example: ab; aaaab; abbbbb; or aaaaaabbbbbbb.

.* Means all characters on a line, on all lines, including empty ones.

.\+ Matches all characters on a line, but only on lines that contain at least one character. Empty strings do not match this regular expression.

^main.*(.*) It will search for lines that begin with the word main, and also contain opening and closing brackets, and there may be any number of characters before and after the opening bracket (or there may not be any).

^# Will search for lines starting with a # sign (eg comments).

\\$ Will search for lines ending with a backslash (\).

[a-zA-Z0-9] Any letter or digit.

[^ ]\+ (The square brackets contain, in addition to the ^ symbol, a space and a tab.) Means one or more of any characters except a space and a tab. Usually this means a word.

^.*A.*$ Means a capital letter A anywhere in the line.

A.\{9\}$ Means a capital letter A that is exactly the tenth character from the end of the line.

^.\{15\}A Means a capital letter A that is exactly the sixteenth character from the beginning of the line.
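
The interval expressions in the last two examples can be checked with a short sketch. It uses sed -n with the p command as a filter; the function names and test strings are invented for this example:

```shell
# Print only lines whose 16th character is a capital A (15 arbitrary characters, then A).
sixteenth_is_A() { sed -n '/^.\{15\}A/p'; }

# Print only lines in which a capital A is followed by exactly 9 characters and the end of line.
tenth_from_end_is_A() { sed -n '/A.\{9\}$/p'; }

printf '%s\n' '123456789012345A tail' 'A at the start' | sixteenth_is_A
printf '%s\n' 'xA123456789' 'A12345678' | tenth_from_end_is_A
```

In each pair of input lines only the first one satisfies the position requirement, so only that line is printed.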

Now that we've seen some regular expressions, let's return to the s command in sed.

Using the & symbol when the PATTERN is unknown

"How can it be unknown?" you may ask. "Don't you know what you want to replace?" I will answer: I want to put in brackets any numbers found in the text. How to do this? Answer: use the & symbol.

The symbol & (ampersand), when placed in the REPLACEMENT, stands for whatever text the PATTERN matched. For example:

$ echo 1234 | sed "s/[0-9]*/(&)/"
(1234)

The asterisk after the interval is needed so that all the digits found are taken as the match. Without it, the result would be:

$ echo 1234 | sed "s/[0-9]/(&)/"
(1)234

That is, only the first digit found was taken as the match.

Here is an example that does something meaningful. Let's create a formula.txt file:

a+432-10=n

and apply the command to it:

$ sed "s/[0-9]*-[0-9]*/(&)/" formula.txt
a+(432-10)=n

The mathematical formula acquired an unambiguous meaning.

The ampersand can also be used to double the PATTERN:

$ echo 123 | sed "s/[0-9]*/& &/"
123 123

There is one subtlety here. If we complicate the example a little:

$ echo "123 abc" | sed "s/[0-9]*/& &/"
123 123 abc

as you would expect, only the digits are doubled, since there are no letters in the PATTERN. But if we swap the parts of the text:

$ echo "abc 123" | sed "s/[0-9]*/& &/"
 abc 123

then the digits are not doubled. This is a feature of the regular expression [0-9]*: it can match zero characters, so it matches the empty string at the very beginning of the line, and only a space is inserted there. If we want to double the digits no matter where they are, we need to modify the regular expression in PATTERN:

$ echo "abc defg 123" | sed "s/[0-9][0-9]*/& &/"
abc defg 123 123

then the digits are doubled regardless of the number of preceding "words".
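
The behaviour of & can be tried out with a small sketch like the following. It requires at least one digit ([0-9][0-9]*) so that the match cannot be empty; the function names and inputs are assumptions for this example:

```shell
# Wrap the first run of digits in parentheses; & stands for the matched text.
bracket_digits() { sed 's/[0-9][0-9]*/(&)/'; }

# Double the first run of digits, wherever it appears on the line.
double_digits() { sed 's/[0-9][0-9]*/& &/'; }

echo 'a+432-10=n' | bracket_digits
echo 'abc defg 123' | double_digits
```

Requiring one digit before the starred one is the portable way to say "one or more" in a basic regular expression.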

Using the escaped parentheses \(, \) and \1 to process part of a PATTERN

The escaped parentheses \( and \) are used to remember part of a regular expression.

The symbol \1 means the first remembered part, \2 the second, and so on, up to nine remembered parts (the program does not support more). Let's look at an example:

$ echo abcd123 | sed "s/\([a-z]*\).*/\1/"
abcd

Here \([a-z]*\) means that the program must remember any number of lowercase alphabetic characters; .* means any number of characters after the first remembered part; and \1 means we only want to see the first remembered part. Sure enough, in the program output we see only letters and no digits.

In order to swap words, you need to remember two sub-PATTERNs and then swap them:

$ echo stupid penguin | sed "s/\([a-z]*\) \([a-z]*\)/\2 \1/"
penguin stupid

Here \2 means the second sub-PATTERN, and \1 means the first. Note the space between the first expression \([a-z]*\) and the second expression \([a-z]*\). It is necessary for two words to be found.

The \1 sign does not have to appear only in the REPLACEMENT; it can also appear in the PATTERN, for example, when we want to remove duplicate words:

$ echo penguin penguin | sed "s/\([a-z]*\) \1/\1/"
penguin
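
The grouping examples above can be collected into a runnable sketch. The function names are invented for this example, and the duplicate-word variant requires at least one letter so that an empty match cannot be selected:

```shell
# Keep only the leading run of lowercase letters.
leading_letters() { sed 's/\([a-z]*\).*/\1/'; }

# Swap the first two lowercase words on the line.
swap_words() { sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'; }

# Collapse a word that is immediately repeated; \1 is used inside the PATTERN itself.
dedupe_word() { sed 's/\([a-z][a-z]*\) \1/\1/'; }

echo 'abcd123' | leading_letters
echo 'stupid penguin' | swap_words
echo 'penguin penguin' | dedupe_word
```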

Modifiers of the substitution command s

Modifiers are placed after the last delimiter. They determine what the program does when there is more than one match for PATTERN in a line, and how the replacement is performed.

Modifier /g

Global replacement

The sed program, like most Unix utilities, reads one line at a time when working with files. If we ask for a word to be replaced, the program replaces only the first word on the line that matches the PATTERN. If we want to change every word that matches the pattern, we should add the /g modifier.

Without the /g modifier:

$ echo this cat was the most ordinary cat | sed "s/cat/kitten/"
this kitten was the most ordinary cat

The editor replaced only the first word that matched.

And now with the global replacement modifier:

$ echo this cat was the most ordinary cat | sed "s/cat/kitten/g"
this kitten was the most ordinary kitten

All matches in this line have been replaced.

And what if you need to change all the words, say, put them in brackets? Then regular expressions come to the rescue again. To select all alphabetic characters, both upper and lower case, you can use the construction [A-Za-z], but it will not cover words that contain a hyphen or an apostrophe, such as "well-known" or "don't". A much more convenient construction is [^ ]*, which matches any run of characters other than a space. So:

$ echo stupid penguin timidly hides | sed "s/[^ ]*/(&)/g"
(stupid) (penguin) (timidly) (hides)

How to choose the right match from several

If you do not apply modifiers, the sed program replaces only the first word that matches the PATTERN. If you apply the /g modifier, the program replaces every matching word. How can you select one of the matches if there are several of them on the line? Using the already familiar symbols \( and \), remember the sub-PATTERNs and select the one you need using the signs \1 to \9.

$ echo stupid penguin | sed "s/\([a-z]*\) \([a-z]*\)/\2 /"
penguin

In this example, we remembered both words and, after putting the second (penguin) in first place, removed the first (stupid) by leaving only a space in its place in the REPLACEMENT section. If we put a word after the space, it takes the place of the first word (stupid):

$ echo stupid penguin | sed "s/\([a-z]*\) \([a-z]*\)/\2 smart /"
penguin smart

Numeric modifier

This is a one-, two-, or three-digit number placed after the last delimiter that indicates which match is to be replaced.

$ echo very stupid penguin | sed "s/[a-z]*/good/2"
very good penguin

In this example, each word is a match, and we have told the editor which word we want to replace by placing the modifier 2 after the REPLACEMENT section.

You can combine the numeric modifier with the /g modifier. If you need to leave the first word unchanged and replace the second and subsequent ones with the word "(deleted)", the command will be like this:

$ echo very stupid penguin | sed "s/[a-z]*/(deleted)/2g"
very (deleted) (deleted)

If you really want to remove all matches after the first, you should put a space in the REPLACEMENT section:

$ echo very stupid penguin | sed "s/[a-z]*/ /2g"
very

Or don't put anything at all:

$ echo very stupid penguin | sed "s/[^ ]*//2g"
very

The numeric modifier can be any integer from 1 to 512. For example, if you need to put a colon after the 80th character of each line, this command will help:

$ sed "s/./&:/80" filename
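
A short sketch of the numeric modifier on its own. To keep it portable, the word pattern requires at least one letter, and the colon example uses position 3 instead of 80 so the effect is easy to see; the function names and inputs are assumptions for this example:

```shell
# Replace only the second word on the line (a number N selects the Nth match).
replace_second() { sed 's/[a-z][a-z]*/good/2'; }

# Put a colon after the 3rd character of each line.
colon_after_third() { sed 's/./&:/3'; }

echo 'very stupid penguin' | replace_second
echo 'abcdef' | colon_after_third
```

The combination of a number with g (as in 2g above) is left out here because its behaviour is not specified by POSIX and may differ between sed implementations.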

Modifier /p - output to standard output (print)

By default, the sed program writes the result to standard output (for example, the screen). The /p modifier additionally prints the line when a replacement has been made, so it is normally used together with the sed -n option, which suppresses the default output.

Modifier /w

Writes the lines in which a replacement was made to the specified file:

$ sed "s/PATTERN/REPLACEMENT/w filename"

/e modifier (GNU extension)

Allows you to run a shell command (not a sed command) from the REPLACEMENT. If a match for the PATTERN is found, the replacement is made, the resulting line is executed as a shell command, and the line is replaced with that command's output. Example:

$ echo night | sed "s/night/echo day/e"
day

/I and /i modifiers (GNU extension)

Make the matching of PATTERN case insensitive.

$ echo Night | sed "s/night/day/i"
day

Modifier Combinations

Modifiers can be combined when it makes sense. In this case, the w modifier should be placed last.

Conventions (GNU extension)

There are only five of them:

\L converts the REPLACEMENT characters to lowercase
\l converts the next REPLACEMENT character to lowercase
\U converts the REPLACEMENT characters to uppercase
\u converts the next REPLACEMENT character to uppercase
\E ends a conversion started by \L or \U

For obvious reasons, these conventions are used alone. For example:

$ echo stupid penguin | sed "s/stupid/\u&/"
Stupid penguin

$ echo little puppy | sed "s/[a-z]*/\u&/2"
little Puppy

We've covered almost every aspect of the s command. Now it's time to look at the options of the sed program.

sed program options

The program has surprisingly few options (which somewhat compensates for the abundance of commands, modifiers and other functions). Apart from the well-known options --help (-h) and --version (-V), which we will not consider, there are only three of them:

Option -e, --expression=command_set

One way to execute multiple commands is to use the -e option. For example:

sed -e "s/a/A/" -e "s/b/B/" filename

None of the previous examples in this article required the -e option, simply because each contained a single command. We could have used the -e option in them; it would not have changed anything.

Option -f

If you need to execute a large number of commands, it is more convenient to write them to a file and use the -f option:

sed -f sedscript filename

Here sedscript is the name of the file containing the commands. This file is called a sed script (hereinafter simply a script). Each script command should occupy a separate line. For example:

# comment - This script will change all lowercase vowels to uppercase vowels
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g

You can name the script whatever you want; the important thing is not to confuse the script file with the file being processed.
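
To see -f and -e side by side, here is a sketch that writes the vowel script to a temporary file and runs it. The temporary file created with mktemp, the function names and the sample text are assumptions made for this example:

```shell
# Write a small script file (same commands as above) to a temporary location.
script=$(mktemp)
cat > "$script" <<'EOF'
# This script changes all lowercase vowels to uppercase vowels
s/a/A/g
s/e/E/g
s/i/I/g
s/o/O/g
s/u/U/g
EOF

# Run the whole script with -f.
upper_vowels() { sed -f "$script"; }

# Run just the first two commands with -e, for comparison.
upper_a_e() { sed -e 's/a/A/g' -e 's/e/E/g'; }

echo 'stream editor' | upper_vowels
echo 'stream editor' | upper_a_e
```

When you are done experimenting, the temporary script can be removed with rm -f "$script".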

Option -n The sed -n program does not print anything to standard output. To receive a withdrawal you need a special instruction. We have already become familiar with the /p modifier, which can be used to give such an indication. Let's remember the zar.txt file:

$ sed "s/[1-9]/&/p" zar.txt
In the mornings he did exercises.
Lightning is an electric charge.

Since the PATTERN matched nothing (there are no digits in the file), the s command with the /p modifier and the & sign as the REPLACEMENT (remember that the ampersand stands for the text matched by the PATTERN) works like the cat command.

If PATTERN is found in the file, then lines containing PATTERN will be doubled:

$ sed "s/exercise/&/p" zar.txt
In the mornings he did exercises.
In the mornings he did exercises.
Lightning is an electric charge.

Now let's add the -n option:

$ sed -n "s/exercise/&/p" zar.txt
In the mornings he did exercises.

Now our program works like the grep command - it returns only lines containing PATTERN.
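The comparison with grep can be checked directly. In this sketch a temporary file stands in for zar.txt, with the two lines shown above:

```shell
# A stand-in for zar.txt with the two lines from the examples above.
zar=$(mktemp)
printf '%s\n' 'In the mornings he did exercises.' \
              'Lightning is an electric charge.' > "$zar"

# -n suppresses the default output; the p flag prints only lines where s matched.
sed_out=$(sed -n 's/exercise/&/p' "$zar")
grep_out=$(grep 'exercise' "$zar")

echo "$sed_out"
echo "$grep_out"
rm -f "$zar"
```

Both commands print the same single line.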

Selecting the desired elements of the edited text

Using the s command alone, we have seen how extraordinarily broad the capabilities of the sed editor are. But everything it does comes down to search and replace. Moreover, sed edits each line in turn, without paying attention to the others. It would be convenient to limit the lines that need to be changed, for example:

  • Select lines by numbers
  • Select lines in a certain range of numbers
  • Select only lines containing a certain expression
  • Select only lines between some expressions
  • Select only lines from the beginning of the file to some expression
  • Select only lines from some expression to the end of the file

The sed program does all this and more. Any sed editor command can be applied to a single address, to a range of addresses, or with the restrictions on the set of lines listed above. The address or restriction must immediately precede the command:

sed "address/restriction command"

Selecting lines by numbers

This is the simplest case. Just indicate the number of the required line before the command:

$ sed "4 s/[a-z]*//i" gumilev.txt
What a strange bliss
In the early twilight of the morning,
In the melting of spring snow,
 everything that perishes and is wise.

$ sed "3 s/In/(In)/" gumilev.txt
What a strange bliss
In the early twilight of the morning,
(In) the melting of spring snow,
In everything that perishes and is wise.

Selecting lines in a range of numbers

The range is written, not surprisingly, with a comma:

$ sed "2,3 s/In/(In)/" gumilev.txt
What a strange bliss
(In) the early twilight of the morning,
(In) the melting of spring snow,
In everything that perishes and is wise.

If you need to specify a range up to the last line of a file, but you don’t know how many lines there are, then use the $ sign:

$ sed "2,$ s/in/(in)/i" gumilev.txt
What a strange bliss
(in) the early twilight of the morning,
(in) the melting of spring snow,
(in) everything that perishes and is wise.

Selecting lines containing an expression

The search expression is enclosed in forward slashes (/) and placed before the command:

$ sed "/morning/ s/in/(in)/i" gumilev.txt
What a strange bliss
(in) the early twilight of the morning,
In the melting of spring snow,
In everything that perishes and is wise.

Selecting lines in the range between two expressions

Just as with line numbers, the two ends of the range are separated by a comma:

$ sed "/morning/,/wise/ s/in/(in)/i" gumilev.txt
What a strange bliss
(in) the early twilight of the morning,
(in) the melting of spring snow,
(in) everything that perishes and is wise.

Selecting lines from the beginning of the file to a certain expression

$ sed "1,/snow/ s/in/(in)/i" gumilev.txt
What a strange bliss
(in) the early twilight of the morning,
(in) the melting of spring snow,
In everything that perishes and is wise.

Selecting lines from a certain expression to the end of the file

$ sed "/snow/,$ s/in/(in)/i" gumilev.txt
What a strange bliss
In the early twilight of the morning,
(in) the melting of spring snow,
(in) everything that perishes and is wise.
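All of these address forms can be tried side by side. The sketch below recreates the quatrain in a temporary file (a stand-in for gumilev.txt) and uses -n with the p command, so that only the selected lines are printed; the variable names are just for illustration.

```shell
# A stand-in for gumilev.txt.
poem=$(mktemp)
printf '%s\n' 'What a strange bliss' \
              'In the early twilight of the morning,' \
              'In the melting of spring snow,' \
              'In everything that perishes and is wise.' > "$poem"

# Count how many lines a given address selects.
count() { sed -n "$1" "$poem" | wc -l | tr -d ' '; }

by_number=$(sed -n '3p' "$poem")       # a single line number
by_regex=$(sed -n '/snow/p' "$poem")   # lines matching an expression
range_numbers=$(count '2,3p')          # a range of numbers
to_last=$(count '2,$p')                # from line 2 to the last line
between=$(count '/morning/,/wise/p')   # between two expressions
from_start=$(count '1,/snow/p')        # from the start to an expression
to_end=$(count '/snow/,$p')            # from an expression to the end

echo "$by_number"
echo "$range_numbers $to_last $between $from_start $to_end"    # 2 3 3 3 2
rm -f "$poem"
```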

Other sed editor commands

d (delete) command

Removes the specified lines from the output:

$ sed "2 d" gumilev.txt
What a strange bliss
In the melting of spring snow,
In everything that perishes and is wise.

More often it is written more simply (without a space):

sed "2d" gumilev.txt

Everything that was said in the previous section about line addressing applies to the d command (as well as to almost all sed commands).

Using the d command, it is convenient to throw away the unnecessary "header" of some mail message:

$ sed "1,/^$/d" filename

(This deletes the lines from the first line through the first empty line.)

Get rid of comments in a configuration file:

$ sed "/^#/d" /boot/grub/menu.lst

And you never know where you need to remove extra lines!
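Both uses of d can be sketched on throwaway data. The mail message and the configuration lines below are invented for the example:

```shell
# Drop the mail header: everything from line 1 through the first empty line.
body=$(printf '%s\n' 'From: a@example.org' 'Subject: test' '' 'Body line' \
       | sed '1,/^$/d')
echo "$body"     # Body line

# Drop comment lines and empty lines from a configuration fragment.
clean=$(printf '%s\n' '# sample comment' '' 'timeout 20' '# another' 'default 1' \
        | sed -e '/^#/d' -e '/^$/d')
echo "$clean"
```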

p (print) command

The word "print" tends to be associated with a printer, or at least with a keyboard. In fact, in this context it often simply means output to the screen. So the p command does not print anything on paper; it simply displays the specified lines.

When used by itself, the p command doubles the lines in the output (after all, the sed program prints a line to the screen by default, but the p command prints the same line a second time).

$ echo I have a cat | sed "p"
I have a cat
I have a cat

There are uses for this property, such as doubling empty lines to improve the appearance of text:

$ sed "/^$/ p" filename

But the p command reveals its true colors in combination with the -n option, which, as you remember, prevents lines from being printed on the screen. By combining the -n option with the p command, you can get only the required lines in the output.

For example, look at lines one through ten:

$ sed -n "1,10 p" filename

Or just comments:

$ sed -n "/^#/ p" /boot/grub/menu.lst
# GRUB configuration file "/boot/grub/menu.lst".
# generated by "grubconfig". Sun 23 Mar 2008 21:45:41
#
# Start GRUB global section
# End GRUB global section
# Linux bootable partition config begins
# Linux bootable partition config ends
# Linux bootable partition config begins
# Linux bootable partition config ends

This is very reminiscent of the grep program, which we already encountered when we talked about the -n option with the /p modifier. But, unlike the grep command, the sed editor makes it possible not only to find these lines, but also to change them, replacing, for example, everywhere Linux with Unix:

$ sed -n "/^#/ p" /boot/grub/menu.lst | sed "s/Linux/Unix/"
# GRUB configuration file "/boot/grub/menu.lst".
# generated by "grubconfig". Sun 23 Mar 2008 21:45:41
#
# Start GRUB global section
# End GRUB global section
# Unix bootable partition config begins
# Unix bootable partition config ends
# Unix bootable partition config begins
# Unix bootable partition config ends
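The second sed in the pipe is not strictly necessary. One possible variant, sketched here on three made-up lines in the style of menu.lst, groups the substitution and the p command in curly braces under a single address:

```shell
# For comment lines only: replace Linux with Unix, then print the line.
comments=$(printf '%s\n' \
    '# Linux bootable partition config begins' \
    'title Linux on (/dev/hda4)' \
    '# Linux bootable partition config ends' \
  | sed -n '/^#/{s/Linux/Unix/;p;}')
echo "$comments"
```

The title line is neither changed nor printed, because the address /^#/ does not select it.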

Command !

Sometimes you need to edit all lines except those that match the PATTERN or selection. The exclamation mark (!) inverts the selection. For example, let’s delete all lines except the second from Gumilyov’s quatrain:

$ sed "2 !d" gumilev.txt
In the early twilight of the morning,

Or select all lines, except comments, from the /boot/grub/menu.lst file:

$ sed -n "/^#/ !p" /boot/grub/menu.lst
default 1
timeout 20
gfxmenu (hd0,3)/boot/message
title SuSe on (/dev/hda3)
root (hd0,2)
kernel /boot/vmlinuz root=/dev/hda3 ro vga=773 acpi=off
title Linux on (/dev/hda4)
root (hd0,3)
kernel /boot/vmlinuz root=/dev/hda4 ro vga=0x317
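Printing everything except the comments is the same as deleting the comments, which a short sketch on invented configuration lines can confirm. Single quotes are used here because an interactive bash performs history expansion on !p inside double quotes.

```shell
conf=$(mktemp)
printf '%s\n' '# comment' 'default 1' '# another comment' 'timeout 20' > "$conf"

# Print the lines that do NOT match /^#/ ...
negated_print=$(sed -n '/^#/!p' "$conf")
# ... or delete the lines that do match; the result is the same.
plain_delete=$(sed '/^#/d' "$conf")

echo "$negated_print"
rm -f "$conf"
```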

Command q (quit)

The q command terminates the sed program after the specified line. This is convenient if you need to stop editing after reaching a certain point in the text:

$ sed "11 q" filename

This command will finish when it reaches the 11th line.

The q command is one of the few sed commands that does not accept line ranges. After all, the command cannot stop working 10 times in a row if we enter:

sed "1,10 q"

Absurd!
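A common use of q is to show the first few lines of the input. In this sketch (with made-up input) the output is the same as with -n and a range for p, but q stops reading as soon as the line is reached:

```shell
# Print lines 1-3 and quit; the rest of the input is never processed.
first_three=$(printf '%s\n' one two three four five | sed '3q')

# The same output, but sed still reads the input to the end.
same_with_p=$(printf '%s\n' one two three four five | sed -n '1,3p')

echo "$first_three"
```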

w (write) command

Like the w modifier of the s command, this command allows you to write the program's output to a file:

$ sed -n "3,$ w gum.txt" gumilev.txt

We will receive a gum.txt file containing the last two lines of Gumilyov's quatrain from the gumilev.txt file. Moreover, if such a file already exists, it will be overwritten. If you do not enter the -n option, then the program, in addition to creating the gum.txt file, will also display the entire contents of the gumilev.txt file.

For working on the command line, it is more convenient to use regular output redirection (> or >>), but in sed scripts, the w command will probably find its use.
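A minimal sketch of the w command in a temporary directory (the file names in.txt and out.txt are invented for the example). Double quotes are used so that the shell substitutes the directory name, which is why the $ address has to be escaped:

```shell
dir=$(mktemp -d)
printf '%s\n' 'line 1' 'line 2' 'line 3' 'line 4' > "$dir/in.txt"

# Write lines 3 through the last one to out.txt; -n keeps the screen quiet.
sed -n "3,\$ w $dir/out.txt" "$dir/in.txt"

saved=$(cat "$dir/out.txt")
echo "$saved"
rm -rf "$dir"
```

Everything after w up to the end of the line is taken as the file name, so w has to be the last command on its line.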

r (read) command

This command not only reads the specified file, but also inserts its contents at the desired location in the edited text. To select the “right place”, the addressing already familiar to us is used (by line numbers, by expressions, etc.). Example:

$ echo "From Gumilyov's poem:" | sed "r gumilev.txt"
From Gumilyov's poem:
What a strange bliss
In the early twilight of the morning,
In the melting of spring snow,
In everything that perishes and is wise.
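With an address, r inserts the file after every line that the address selects. A sketch with two invented files, main.txt and part.txt, in a temporary directory:

```shell
dir=$(mktemp -d)
printf '%s\n' 'Header' 'MARKER' 'Footer' > "$dir/main.txt"
printf '%s\n' 'inserted 1' 'inserted 2' > "$dir/part.txt"

# The contents of part.txt are output right after the line matching MARKER.
merged=$(sed "/MARKER/ r $dir/part.txt" "$dir/main.txt")

echo "$merged"
rm -rf "$dir"
```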

Command =

Prints the number of the specified line:

$ sed "/snow/=" gumilev.txt
What a strange bliss
In the early twilight of the morning,
3
In the melting of spring snow,
In everything that perishes and is wise.

$ sed -n "/snow/=" gumilev.txt
3

The command accepts only one address; it does not accept ranges.
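Combined with the $ address, the = command gives a simple way to count lines. A sketch with made-up input:

```shell
# On the last line ($), print its number: this is the number of lines.
count=$(printf '%s\n' a b c d | sed -n '$=')
echo "$count"      # 4

# The number of the line that contains an expression.
number=$(printf '%s\n' a b snow d | sed -n '/snow/=')
echo "$number"     # 3
```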

Command y

This command replaces characters from the PATTERN section with the corresponding characters from the REPLACEMENT section, working like the tr program.

$ echo Automobile | sed "y/Auto/Paro/"
Paromobile

The y command works only if the number of characters in the PATTERN is equal to the number of characters in the REPLACEMENT.
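The similarity to tr can be seen in a small comparison (the input string is made up). Unlike tr, the y command has no ranges such as a-z, so every character has to be listed:

```shell
# y maps each character of the first list to the character at the same
# position in the second list.
with_y=$(echo "hello world" \
  | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# The same conversion with tr, which does understand ranges.
with_tr=$(echo "hello world" | tr 'a-z' 'A-Z')

echo "$with_y"     # HELLO WORLD
```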

sed program scripts

In order to use the sed editor as a full-fledged text editor, you need to master writing sed scripts. The sed program has its own simple programming language that allows you to write scripts that can work wonders.

This article cannot cover sed scripts, and its author does not set himself the task of mastering the sed programming language. In this article, I focused on using the sed editor on the command line, with an eye to using it as a filter in pipes. For this reason, I have omitted numerous sed commands that are only used in sed scripts.

There are many fans of the sed editor, and many articles on the topic of scripting, including on the RuNet. So for those interested in this wonderful program it will not be difficult to expand their knowledge.

The sed program and Cyrillic characters

As can be seen from the examples in this article, the sed program on a properly Russified system is fluent in the “great and powerful” language.

sed Program Summary

The sed program is a multifunctional stream editor, indispensable for:

  • Editing large text arrays
  • Editing files of any size when the sequence of editing actions is too complex
  • Editing data as it becomes available, including in real time - that is, in cases where it is difficult or completely impossible to use interactive text editors.

Fully mastering the sed program takes weeks or even months of work, since this requires you to:

  • Learn regular expressions
  • Learn to write sed scripts by mastering the simple programming language used in these scripts

On the other hand, mastering several of the most common commands in the sed editor is no more difficult than any Unix command; I hope this article will help you with this.

Afterword

Until now, in the articles of the HuMan series, I have tried to describe, at least briefly, each option and each parameter of the command in question, so that the article could replace the man page. In the future I will continue to adhere to this principle.

This article is an exception, as it does not describe all the features of the program. A complete description of them would require not an article, but a book. However, the article allows you to get an idea of the sed editor and get started with this amazing program using its most common commands.

sed processes the stream sequentially, line by line, from the first line to the last (unless the sed script specifies otherwise; for example, you can process only the first lines and stop once some condition is met). Usually each line is processed separately, in three stages.

Important

The scheme below will be mentioned often later on; for example, “at the first stage of line processing” means precisely the first stage of this scheme.

Procedure 2.1. Text processing with the sed utility.

  1. Loading a line into the buffer

    At this stage, a line is loaded into the buffer. The buffer is a memory area allocated by sed whose size is not limited (in the GNU version of sed; in practice, of course, the size is limited by the amount of RAM and swap).

    Loading ends when a newline character (\n) is read from the stream, or when the stream ends. Although the newline character is read from the stream, it is not written to the buffer.

  2. Line processing

    At this stage, the sed script is executed, and the contents of the buffer are usually changed. A sed script consists of sed commands, each denoted by a single character, usually a letter of the Latin alphabet. As usual, lowercase and UPPERCASE letters differ: n and N are different commands. The easiest way is to write sed commands on the command line, right after sed and its options, for example:

    sed -n 'p;p;p'

    A semicolon (;) is used to separate commands.

    Warning

    If you write scripts directly after the command, always enclose them in single quotes: only then will the shell leave the script untouched. In some cases double quotes are acceptable, for example if you want to use a shell variable inside the script, but be careful: the shell will try to expand many special characters in your script.
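The difference between the two kinds of quotes is easy to see with a shell variable. In this sketch the variable name word and the test string are made up:

```shell
word="bliss"

# Double quotes: the shell expands $word before sed sees the script,
# so sed receives s/bliss/joy/.
double=$(echo "What a strange bliss" | sed "s/$word/joy/")

# Single quotes: sed receives the text $word literally, which matches nothing,
# so the line is left unchanged.
single=$(echo "What a strange bliss" | sed 's/$word/joy/')

echo "$double"    # What a strange joy
echo "$single"    # What a strange bliss
```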

    Tip

    If you need a single quote character inside a script, you can use its hexadecimal representation: "\x27".

    The sed commands can change the contents of the buffer; in addition, as in other programming languages, sed scripts can use conditional and unconditional jump commands (b, t, and T), and there are also commands that stop processing (q and Q). Some commands affect not only the line processing stage but other stages as well; moreover, one or more further lines from the input stream can be read inside the script (as in the first stage).

    Important

    You need to understand that sed works with a stream, not a file; it only looks at lines sequentially, from beginning to end. Therefore, it is impossible to read the tenth line after the twentieth: if this is necessary, the tenth line should be saved while it is being processed. It is also impossible to know that a given line is, for example, the penultimate one, although it is quite possible to determine the line number counted from the beginning of the stream. You can find out that a line is the last one, but only at the moment it is being processed.
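One way to "save a line while it is being processed" is sed's second buffer, the hold space, which is not described above. The sketch below uses the standard h command (copy the buffer to the hold space) and x command (swap the two buffers) on made-up input:

```shell
# Line 2 is stashed in the hold space when it passes by; at line 4 we print
# line 4, swap the buffers, and print the stashed line 2 after it.
reordered=$(printf '%s\n' one two three four | sed -n '2h; 4{p;x;p;}')
echo "$reordered"

# The last line can be recognized with the $ address, but only while it is
# being processed.
last=$(printf '%s\n' one two three four | sed -n '$p')
echo "$last"      # four
```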

    You can precede (almost) any sed command with an address expression, in which case the command will execute if and only if the address expression is true. The following can be used as an address expression:

    Line number

    The command will be executed only for the line whose number is specified.

    Line number range

    The command will be executed for all lines in the specified range (the two numbers are separated by a comma; instead of the second number you may specify `$', which denotes the last line).

    Regular expression

    The command will be executed only if this RE is found in the buffer.

    Combined range.

    You can create a more complex condition, for example from a given RE to the line $ (to the end), or from the first RE to the second (inclusive).

    Note

    Lines in sed are numbered starting from one. For a range, sed first looks for the line that matches the first address, and the command is executed for that line. The second address of the range is searched for starting from the next line. However, for the first line this behavior can be changed (a GNU sed extension): if you write `0,RE' instead of `1,RE', the regular expression will also be checked against the first line.
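The difference between the two forms shows up only when the first line itself matches the RE. A sketch on made-up input, assuming GNU sed (the 0,RE address is a GNU extension):

```shell
# 1,/a/ : the range starts on line 1, and the end is searched for from line 2,
# so it stretches to the second "a" on line 3.
from_one=$(printf '%s\n' a b a c | sed '1,/a/d')
echo "$from_one"     # c

# 0,/a/ : the RE is also tried on line 1, so the range ends there.
from_zero=$(printf '%s\n' a b a c | sed '0,/a/d')
echo "$from_zero"
```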

    Tip

    If you want one address expression to apply to several commands at once, enclose these commands in curly braces ({ }).

    Note

    Nested blocks are acceptable; in addition, address expressions may be used within a block, including within nested blocks.
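A block with an address of its own inside might look like this (a sketch; the four input lines are made up):

```shell
# For lines 2-3: substitute first, then print only those that mention snow.
block=$(printf '%s\n' 'What a strange bliss' 'In the early twilight' \
                      'In the melting snow' 'In everything' \
        | sed -n '2,3{s/In/(In)/;/snow/p;}')
echo "$block"     # (In) the melting snow
```

The outer address 2,3 selects the lines for the whole block, and the inner address /snow/ further restricts the p command.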

    In addition to writing the script immediately after the command, you can write it to a file; to execute such a file, you can use the -f option, for example:

    $ sed -f my_script.sed test_file.txt

    This command will execute the sed script my_script.sed for the file test_file.txt. Also, by using the sha-bang line

    #!/bin/sed -f

    you can have the shell execute your scripts directly. For example, if you add this sha-bang as the first line of your script and you have permission to execute the script, the previous example can be run like this:

    $ ./my_script.sed test_file.txt

    Warning

    At first, I would recommend writing all sed scripts that contain more than one command into files. Firstly, they are much easier to understand; secondly, they are saved this way; and thirdly, they are much more convenient to edit. Also, you can use any quotes in sed files. Keep in mind, too, that some commands must be the last thing on their line, which is often impossible if there is only one line. Finally, you can use comments in the file, which will allow you to quickly adapt an existing script for a new use.
  3. Buffer output

    After the script completes, sed prints the contents of the buffer to the output stream. However, this is not always necessary; if you do not need it, use the -n option, which suppresses buffer output. In addition, some other information is output at this stage if the a, c, and/or i commands were executed in the script: these commands also write to the output stream, not at the moment they are executed but at this stage. There are also three commands (d, D, and Q) that suppress buffer output at this stage.

The first sed scripts.

To start studying sed we need some simple text, like this:

Example 2.1. Text used to test scripts.
