sed awk
12 Jun 2015 • Leave CommentsThese tools do a big favor when editing streams. Both programs use regular expressions for selecting and processing text.
For instance, a configuration file contains lots of comment lines starting with #
and blank lines. When applying the configuration to systems, you might need to remove those lines for a clean and tidy configuration file.
sed '/^ *#/d' nginx_with_comments.conf | sed '/^$/d' > nginx.conf
Use sed
(stream editor, take pipes as input) command to delete d
specific pattern lines. The 1st part removes comment lines while the 2nd remove blank lines.
awk
considers each file as a set of records, which by default are the lines in the file. "awk" enables you to create a condition and action pair, and for each record that matches the condition, the action will fire. Most "awk" commands follow this pattern – you have a condition, followed by an action in curly braces.
ls -al ~/ | awk '/zachary/{print $1, $9}'
This command will first file lines contain zachary and then print the 1st and 8th items/columns - to print permissions and filenames. awk is really useful when dealing with columns. You can use other conditions; for example, checking whether or not a certain field is above or below a given threshold (i.e. $5 > 2), or checking whether a record has a certain column.
ls -al | awk '$9 ~/^\./ {gsub(/zachary/, "awkSub"); print;}'
akw
check wheter the 9th item (filename) starts wity dot. Then use gsub
(global substitute) to substitue every occurence of zachary with awkSub.
awk VS sed
Both are tools that transform text. BUT awk
can do more things besides just manipulating text. Its a programming language by itself with most of the things you learn in programming, like arrays, loops, if/else flow control etc You can "program" in sed
as well, but you won't want to maintain the code written in it.
sed
is a stream editor. It works with streams of characters on a per-line basis. It has a primitive programming language that includes goto-style loops and simple conditionals (in addition to pattern matching and address matching). There are essentially only two "variables": pattern space and hold space. Readability of scripts can be difficult. Mathematical operations are extraordinarily awkward at best.
awk
is oriented toward delimited fields on a per-line basis. It has much more robust programming constructs including if/else, while, do/while and for (C-style and array iteration). There is complete support for variables and single-dimension associative arrays plus (IMO) kludgey multi-dimension arrays. Mathematical operations resemble those in C. It has printf and functions. The "K" in "AWK" stands for "Kernighan" as in "Kernighan and Ritchie" of the book "C Programming Language" fame (not to forget Aho and Weinberger). One could conceivably write a detector of academic plagiarism using awk.
I would tend to use sed
where there are patterns in the text. For example, you could replace all the negative numbers in some text that are in the form "minus-sign followed by a sequence of digits" (e.g. "-231.45") with the "accountant's brackets" form (e.g. "(231.45)") using this (which has room for improvement):
echo "-123.456 -345.678" sed 's/-\([0-9.]\+\)/(\1)/g'
I would use awk
when the text looks more like rows and columns or, as awk
refers to them "records" and "fields" - matrix. Refer to the example above.
Use sed
for very simple text parsing. Anything beyond that, awk
is better. In fact, you can ditch sed
altogether and just use awk
. Since their functions overlap and awk
can do more, just use awk
. You will reduce your learning curve as well.
总之,sed能实现的awk也能实现;简单的模式匹配用awk;有矩阵行列特征用awk。