Best Practices for Writing Linux Shell Scripts

Preface

Recently I started cleaning up shell scripts again for work. Although I use most of the commands all the time, they always turn ugly once they are written into a script, and scripts written by other people are just as hard for me to read. After all, shell is not a serious programming language; it is more like a tool for gluing different programs together. As a result, many people write scripts as the thoughts come to them, and the result looks like one super-long main function that is painful to look at. On top of that, for historical reasons there are many different shells, and many commands that do the same job, so it is hard to agree on a single coding standard.

For these reasons I looked through some related material and found that many people have already thought about these problems and written good articles about them, but the advice is rather scattered. So I am collecting it here as a technical specification for my own scripts in the future.

Code style specification

Start with a shebang

The so-called shebang is the comment beginning with "#!" that appears on the first line of many scripts. It names the interpreter to use when we do not specify one explicitly, and typically looks like #!/bin/bash.

Of course, there are many interpreters besides bash. We can list the shells available on the local machine with cat /etc/shells.

When we run the script directly with ./a.sh and there is no shebang, it is run by a default shell, usually the one specified by $SHELL. Otherwise, the interpreter named in the shebang is used.

However, a hard-coded interpreter path may not be portable, so we usually write the shebang as #!/usr/bin/env bash instead.

This is the form we recommend.
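
As a minimal sketch of the recommended form, a script might start like this. The body is only illustrative and is not taken from the original article; it assumes bash is installed and on PATH.

```shell
#!/usr/bin/env bash
# env looks bash up in PATH, so the script does not depend on bash living in /bin.

# Show which bash that lookup resolves to on this machine.
resolved_bash="$(command -v bash)"
echo "bash resolves to: $resolved_bash"

# List the shells registered on this system, if the file exists.
if [ -r /etc/shells ]; then
    cat /etc/shells
fi
```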

Comment your code

Writing comments is common sense, but it is worth stressing again for shell scripts in particular. Many one-line shell commands are not easy to understand, and without comments maintenance becomes especially painful.

Comments should not only explain the purpose of the code but also point out things to watch out for, much like a README.
Specifically, comments in a shell script generally cover the following:

shebang
Parameters of the script
Purpose of the script
Caveats for using the script
Creation date, author, copyright, etc.
A note before each function
Comments on complex one-line commands
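
The items in the list above might be laid out roughly as follows. This is only a sketch: the script name old_logs.sh, the functions and the default of 7 days are all made up for illustration.

```shell
#!/usr/bin/env bash
#
# old_logs.sh - list log files older than a given number of days
#
# Usage:   old_logs.sh <log_dir> [days]
# Notes:   needs read permission on <log_dir>
# Author:  (your name), (date), (license)

# Print the number of days to use, falling back to a default of 7.
# Arguments: $1 - requested number of days (optional)
days_or_default() {
    echo "${1:-7}"
}

# List *.log files under $1 that were modified more than $2 days ago.
# find's "-mtime +N" matches files whose age in whole days is greater than N.
list_old_logs() {
    find "$1" -type f -name '*.log' -mtime +"$2"
}
```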

Validate parameters

This is very important. When a script accepts parameters, it must first check that they are valid and print a helpful message so that users understand how to use them.

At the very least, check the number of parameters.
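
A minimal sketch of such a check, assuming a hypothetical script that takes exactly two arguments (the argument names are invented for this example):

```shell
#!/usr/bin/env bash

# Print a short usage message to stderr.
usage() {
    echo "Usage: ${0##*/} <source_file> <target_dir>" >&2
}

# Succeed only when exactly two arguments were passed.
check_arg_count() {
    if [ "$#" -ne 2 ]; then
        usage
        return 1
    fi
}

# In a real script the next line would sit near the top:
# check_arg_count "$@" || exit 1
```

Putting the check in a function keeps the usage text in one place, so it can be reused when later validation steps fail.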

Variables and magic numbers

Generally, we define important environment variables at the beginning of the script to make sure they exist.

This has a common use. The typical case is a machine with several Java versions installed, where we need to pick one. We then redefine JAVA_HOME and PATH at the beginning of the script to control which one is used.

Likewise, good code rarely has "magic numbers" hard-coded in it. If such values are needed, define them as variables at the beginning and refer to the variables afterwards, which makes later changes easy.
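
Both ideas can be sketched together. The JDK path and the two constants below are hypothetical values chosen only for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical JDK location; point it at the Java version you actually want.
JAVA_HOME="/opt/java/jdk-17"
PATH="$JAVA_HOME/bin:$PATH"
export JAVA_HOME PATH

# Named constants instead of magic numbers scattered through the script.
readonly MAX_RETRIES=3
readonly RETRY_INTERVAL_SECONDS=5

echo "will retry up to $MAX_RETRIES times, waiting $RETRY_INTERVAL_SECONDS seconds between attempts"
```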

Indent consistently

Indentation is a real problem in shell scripts. Many of the places that should be indented (such as if and for statements) are short, so many people are too lazy to indent them, and many people are not used to writing functions, which makes indentation seem even less necessary.

In fact, correct indentation is very important, especially when writing functions. Otherwise it is easy to confuse a function body with the commands that are executed directly.

There are two common indentation methods: "soft tab" and "hard tab".

A soft tab indents with n spaces (n is usually 2 or 4)
A hard tab is a real "\t" character
We won't argue about which is best here; each has its advantages and disadvantages. Personally, I am used to hard tabs.
For if and for statements, it is better not to put the then and do keywords on a separate line, which looks ugly...
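
A small sketch of this layout, with then and do kept on the same line as if and for. It uses four-space soft tabs only because tabs do not survive copy and paste well, and the function itself is invented for the example.

```shell
#!/usr/bin/env bash

# Count how many of the given integers are even.
count_even() {
    local n
    local count=0
    for n in "$@"; do
        if [ $((n % 2)) -eq 0 ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

count_even 1 2 3 4
```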

Naming conventions

The naming convention basically covers the following points:

File names should end with .sh for easy identification
Variable names should be meaningful and spelled correctly
Use one naming style throughout; shell scripts usually use lowercase letters and underscores

Use a consistent encoding

When writing scripts, try to use UTF-8 so that Chinese and other non-ASCII characters are supported. Even so, although I could write in Chinese, I still try to write comments and logs in English, because many machines do not support Chinese out of the box and the text may come out garbled.

Also note that when writing UTF-8 shell scripts on Windows, we must check whether the file has a BOM. Windows tools commonly mark UTF-8 files by putting the three bytes EF BB BF at the start of the file, whereas files on Linux have no BOM by default. So when writing scripts on Windows, be sure to save them as UTF-8 without BOM, which editors such as Notepad++ can do. Otherwise those three leading bytes are treated as part of the first line when the script runs on Linux, and you get errors about an unrecognized command.
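
If a file has already picked up a BOM, it can also be detected and removed on the Linux side. The helpers below are a sketch of my own rather than something from the original article, and they assume head -c is available (it is in GNU coreutils and on the BSDs).

```shell
#!/usr/bin/env bash

# Succeed if the file given as $1 starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Remove a leading BOM in place; files without a BOM are left untouched.
strip_bom() {
    local tmp
    if has_bom "$1"; then
        tmp="$(mktemp)"
        # "tail -c +4" starts output at byte 4, skipping the 3-byte BOM.
        # Writing back with cat keeps the original file's permissions.
        tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1"
        rm -f "$tmp"
    fi
}
```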

Remember to add permissions

This is a small thing, but I often forget it: without execute permission (granted, for example, with chmod +x), the script cannot be run directly, which is a bit annoying...

Logging and Echo

It goes without saying that logs matter, especially in large projects, where they make troubleshooting much easier.

If the script is meant to be run by users directly on the command line, it should echo its progress in real time so that users can follow what is happening.

Sometimes, to improve the user experience, we add effects such as color or blinking to the output. For details, see an introduction to ANSI/VT100 control sequences.
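
A minimal sketch of both ideas: a log helper that echoes to the terminal and appends to a file, and a colored message. The function names and the log file location are assumptions made for this example.

```shell
#!/usr/bin/env bash

# Hypothetical log location chosen for this sketch.
LOG_FILE="${LOG_FILE:-${TMPDIR:-/tmp}/example_script.log}"

# Write a timestamped message to the terminal and append it to the log file.
log() {
    local level="$1"
    shift
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$level" "$*" | tee -a "$LOG_FILE"
}

# Print a message in green: ESC[32m switches the color on, ESC[0m resets it.
echo_green() {
    printf '\033[32m%s\033[0m\n' "$*"
}

log INFO "backup started"
echo_green "backup finished"
```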

Remove passwords

Do not hard-code passwords in scripts. Do not hard-code passwords in scripts. Do not hard-code passwords in scripts.

Important things are worth saying three times, especially when the script is hosted on a platform like GitHub...
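
One common alternative, which the original article does not spell out, is to read the secret from the environment and refuse to continue without it. DB_PASSWORD is a hypothetical variable name used only in this sketch.

```shell
#!/usr/bin/env bash

# Succeed only if the caller supplied the password through the environment.
require_db_password() {
    if [ -z "${DB_PASSWORD:-}" ]; then
        echo "DB_PASSWORD is not set; export it before running this script" >&2
        return 1
    fi
}

# In a real script:
# require_db_password || exit 1
```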

Break long lines

When calling some programs, the argument list can be very long. To keep the script readable, we can split the command across several lines with backslashes.

Note that there is a space before each backslash.
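
For instance, a long argument list can be laid out one pair per line. printf stands in here for whatever long command you actually need to call.

```shell
#!/usr/bin/env bash

# Each continued line ends with a space and a backslash, with nothing after the backslash.
summary="$(printf '%s=%s %s=%s %s=%s' \
    "host" "localhost" \
    "port" "8080" \
    "mode" "debug")"

echo "$summary"
```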

Coding detail specification

Code efficiency

When you use a command, you should know how it actually works. Especially when processing large amounts of data, always consider whether the command will hurt efficiency.

For example, consider two sed commands, sed -n '1p' file and sed -n '1p;1q' file.

Both print the first line of the file. But the first command reads the entire file, while the second stops after the first line. When the file is large, this small difference causes a huge difference in efficiency.

Of course, this is just an example. The proper way to do this is the head -n1 file command...
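
The comparison can be tried on a small throwaway file. On three lines the difference is invisible; it only shows up on large inputs.

```shell
#!/usr/bin/env bash

sample="$(mktemp)"
printf 'line1\nline2\nline3\n' > "$sample"

# Prints line 1, but sed still reads the rest of the file.
sed -n '1p' "$sample"

# Prints line 1 and then quits immediately.
sed -n '1p;1q' "$sample"

# The idiomatic tool for the job.
head -n 1 "$sample"
```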

Use double quotes liberally

Almost all the experts recommend wrapping variables in double quotes whenever you use "$" to read them.

Leaving the quotes out causes a lot of trouble in many cases. Why? Take an example: suppose the current directory contains a file a.sh, and a script sets var="*.sh" and then runs both echo $var and echo "$var".

The unquoted echo prints a.sh, while the quoted one prints *.sh.

Why is this so? The unquoted variable is expanded first and the result is then subject to filename expansion, so the shell effectively executes echo *.sh.

Whenever you pass variables as arguments, pay attention to this point and make sure you understand the difference. The above is only a tiny example; in practice, this detail causes far too many problems...
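
The effect is easy to reproduce in a scratch directory. The directory and file here are created only for the demonstration.

```shell
#!/usr/bin/env bash

demo_dir="$(mktemp -d)"
touch "$demo_dir/a.sh"
cd "$demo_dir" || exit 1

var="*.sh"

# Unquoted on purpose: the expanded value goes through filename expansion.
unquoted="$(echo $var)"

# Quoted: the value is passed to echo exactly as stored.
quoted="$(echo "$var")"

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```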

Make good use of a main function

Compiled languages such as Java and C have a function entry point. This structure makes code very readable: we know which parts are executed directly and which are functions. Scripts are different. They are interpreted and run straight from the first line to the last, so if commands and functions are mixed together, the script becomes very hard to read.

Those who use Python know that a standard Python script usually ends with the guard if __name__ == '__main__': followed by a call to main().

This is a clever way to get the main function we are used to, and it makes the code more readable.

In the shell we have a similar trick: put the top-level logic in a function called main and call it on the last line with main "$@".

Written this way, the script gets a main-like entry point and a clearer structure.
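
A minimal sketch of this layout. The two step functions are placeholders for real work.

```shell
#!/usr/bin/env bash

prepare() {
    echo "step one"
}

process() {
    echo "step two"
}

# All top-level logic lives here, so nothing else runs at file scope.
main() {
    prepare
    process
}

# Forward the script's arguments to main unchanged.
main "$@"
```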

Consider Scope

Variables in the shell are global by default. Consider a script that sets var=1, calls a function that assigns var=2, and then prints var.

The output is 2 instead of 1, which obviously does not match our coding habits and can easily cause problems.

Therefore, rather than using global variables directly, it is better to use commands such as local and readonly. We can also declare variables with declare. All of these are better than defining variables globally.
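
The difference between the two behaviors can be sketched side by side. The variable and function names are arbitrary.

```shell
#!/usr/bin/env bash

var=1
set_global() {
    var=2                # no "local": this overwrites the global variable
}
set_global
echo "after set_global: $var"

counter=1
set_local() {
    local counter=2      # confined to the function body
}
set_local
echo "after set_local: $counter"

# readonly guards a constant against accidental reassignment.
readonly MAX_JOBS=4
```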

Function return value

Be careful when using functions: the return value of a shell function can only be an integer (an exit status from 0 to 255). Presumably this is because the return value usually represents whether the function succeeded, and 0 or 1 is generally enough. If you do need to pass a string back, you can print it inside the function and capture the output with command substitution.

In this way, you can pass back additional data through echo or printf.
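
A short sketch of the pattern, with an invented function name:

```shell
#!/usr/bin/env bash

# "Return" a string by printing it; the numeric status still reports success.
get_greeting() {
    echo "hello from get_greeting"
    return 0
}

# Command substitution captures the printed text; $? holds the function's status.
result="$(get_greeting)"
exit_code=$?

echo "captured: $result (exit status $exit_code)"
```

Because everything the function prints to standard output is captured, diagnostic messages inside such a function are better sent to standard error.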

Indirect references

What is an indirect reference? Consider the following scenario.

We have a variable VAR1 and another variable VAR2 whose value is the name of VAR1. Now we want to get the value of VAR1 through VAR2. What should we do?

The clumsy way is to use eval, for example eval echo \$$VAR2.

This does work, but it looks awkward and is hard to understand, so we do not recommend it. In fact, we do not recommend using eval at all.

A cleaner way is to write ${!VAR2}.

Putting ! before the variable name gives a simple indirect reference.

Note, however, that this form can only read a value, not assign one. If you want to assign, you still have to fall back on eval.
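
The three cases can be sketched as follows. The ${!VAR2} form needs bash, and the eval lines are only safe because VAR2 holds a name we control.

```shell
#!/usr/bin/env bash

VAR1="hello"
VAR2="VAR1"

# Clumsy way: eval sees the string via_eval=$VAR1 and executes it.
eval "via_eval=\$$VAR2"

# Cleaner way: bash indirect expansion.
via_bang="${!VAR2}"

echo "eval: $via_eval, indirect: $via_bang"

# Assigning through the name still needs eval: this runs VAR1="world".
eval "$VAR2=\"world\""
echo "VAR1 is now: $VAR1"
```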

Use heredocs skillfully

A heredoc is a way of supplying multi-line input: we put an identifier after "<<" and then write as many lines as we like until the identifier appears again on a line of its own.

Using heredocs, we can easily generate template files.
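
For example, a small configuration file can be generated from variables. The keys and values below are made up for the sketch.

```shell
#!/usr/bin/env bash

app_name="demo"
listen_port=8080
config_file="$(mktemp)"

# With an unquoted delimiter (EOF), variables inside the heredoc are expanded.
cat > "$config_file" <<EOF
name = ${app_name}
port = ${listen_port}
EOF

cat "$config_file"
```

Quoting the delimiter, as in <<'EOF', turns expansion off, which is handy when the template itself contains dollar signs.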

Learn to check the path

In many cases we first get the path of the current script and then use it as the base for finding other paths. Usually people just call pwd to get the script's path.

Strictly speaking, this is wrong: pwd returns the working directory of the current shell, not the directory containing the script.

There are two correct approaches.

Either cd into the script's directory first and then call pwd, or read the script's path directly.
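
Both approaches might be written like this. The second relies on readlink -f, which I am assuming is the GNU coreutils version; it is not available on every platform.

```shell
#!/usr/bin/env bash

# Approach 1: cd into the script's directory in a subshell, then ask pwd.
script_dir="$(cd -- "$(dirname -- "$0")" && pwd)"

# Approach 2: resolve the script's own path (following symlinks), then take its directory.
script_path="$(readlink -f -- "$0")"
script_dir_resolved="$(dirname -- "$script_path")"

echo "via cd and pwd:  $script_dir"
echo "via readlink -f: $script_dir_resolved"
```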

Keep the code short

Brevity here refers not only to the length of the code but also to the number of commands used. In principle, we should never use two commands for a problem that one command can solve. This affects both readability and execution efficiency.

The classic example is piping cat into grep, as in cat /etc/passwd | grep root, when grep root /etc/passwd does the same job.

This is the most despised use of cat. It serves no purpose: one command clearly solves the problem, yet a pipe is added anyway...
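
Side by side, on a throwaway file so the sketch does not depend on the contents of /etc/passwd:

```shell
#!/usr/bin/env bash

sample="$(mktemp)"
printf 'root:x:0:0\nalice:x:1000:1000\n' > "$sample"

# Discouraged: an extra process and a pipe for nothing.
cat "$sample" | grep root

# Preferred: grep reads the file itself.
grep root "$sample"
```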

In fact, short code can also improve efficiency to some extent. Consider replacing several strings in a set of files, either with one find and sed pipeline per substitution or with a single pipeline whose sed script contains all the substitutions.

Both methods do the same thing: find all files with the .txt suffix and apply a series of replacements. The former runs find several times, while the latter runs find once but with a longer sed script. The first is more readable, but as the number of replacements grows, the second becomes much faster, because it only has to run the pipeline once while the first has to run it many times.
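
A sketch of the two methods on scratch directories. It assumes GNU sed (for -i without a suffix argument); the -print0 and -0 options are my addition to cope with unusual file names, and the sample text is invented.

```shell
#!/usr/bin/env bash

# Create a scratch directory with two sample files and print its path.
make_sample() {
    local dir
    dir="$(mktemp -d)"
    printf 'red green blue\n' > "$dir/a.txt"
    printf 'blue green red\n' > "$dir/b.txt"
    echo "$dir"
}

dir_one="$(make_sample)"
dir_two="$(make_sample)"

# Method 1: one find + sed pass per substitution.
find "$dir_one" -name '*.txt' -print0 | xargs -0 sed -i 's/red/1/g'
find "$dir_one" -name '*.txt' -print0 | xargs -0 sed -i 's/green/2/g'
find "$dir_one" -name '*.txt' -print0 | xargs -0 sed -i 's/blue/3/g'

# Method 2: a single pass with all substitutions in one sed script.
find "$dir_two" -name '*.txt' -print0 | xargs -0 sed -i 's/red/1/g;s/green/2/g;s/blue/3/g'

cat "$dir_one/a.txt" "$dir_two/a.txt"
```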

In addition, we can easily parallelize the work with xargs.

Specifying the degree of parallelism with the -P parameter can speed up execution further.
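
A possible form, again assuming GNU sed and an xargs that supports -P. Note that -P only helps when xargs starts several commands, so this sketch adds -n 1 to hand each file to its own sed process; nproc (GNU coreutils) supplies the job count, with a fallback of 2.

```shell
#!/usr/bin/env bash

workdir="$(mktemp -d)"
for i in 1 2 3 4; do
    printf 'foo\n' > "$workdir/file$i.txt"
done

# Number of parallel jobs: one per CPU if nproc exists, otherwise 2.
parallel_jobs="$(nproc 2>/dev/null || echo 2)"

# -n 1 runs one sed per file; -P lets up to $parallel_jobs of them run at once.
find "$workdir" -name '*.txt' -print0 | xargs -0 -n 1 -P "$parallel_jobs" sed -i 's/foo/bar/g'

cat "$workdir"/file*.txt
```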

Command parallelization

When execution efficiency really matters, we may need to run commands in parallel. The simplest way to parallelize in the shell is to start each job in the background with "&" and then call wait.

Of course, the number of parallel jobs must not be too large, or the machine will grind to a halt. Limiting it properly is more complicated, and I will discuss that another time. If you want to save effort, you can use the parallel command.
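
The "&" plus wait pattern might look like this. The task function is a stand-in that just sleeps for a second and writes a marker file.

```shell
#!/usr/bin/env bash

outdir="$(mktemp -d)"

# Stand-in for real work: pause briefly, then record that the task finished.
task() {
    sleep 1
    echo "task $1 done" > "$outdir/$1.out"
}

# Start all tasks in the background...
for i in 1 2 3; do
    task "$i" &
done

# ...and block until every background job has finished.
wait

cat "$outdir"/*.out
```

Since the three tasks sleep at the same time, the whole run takes about one second rather than three.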

Use the newer syntax

"Newer" here does not mean fancier. It means preferring more recently introduced syntax, mainly as a matter of coding style. For example:

Define functions with func() {} rather than function func {}
Use [[ ]] instead of [ ]
Use $() rather than backquotes to assign a command's output to a variable
In complex scenarios, use printf instead of echo for output

Many of these newer forms are also more powerful than the old ones, as you will find once you use them.
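
The four recommendations can be seen together in one small function. The function and its behavior are invented for the sketch, and [[ ]] requires bash.

```shell
#!/usr/bin/env bash

# func() {} style definition.
classify() {
    local name="$1"
    local upper
    # $() instead of backquotes.
    upper="$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')"
    # [[ ]] supports pattern matching on the right-hand side of ==.
    if [[ "$name" == *.sh ]]; then
        # printf gives predictable formatting where echo options vary between shells.
        printf '%s: %s\n' "$upper" "shell script"
    else
        printf '%s: %s\n' "$upper" "other"
    fi
}

classify deploy.sh
classify notes.txt
```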

Other small tips

There are many more small points, so I will not expand on each of them. Here is a brief list.

Use absolute paths where possible; they are much less error-prone. If you must use a relative path, prefix it with ./
Prefer bash's built-in parameter expansion to awk or sed; it is shorter
For a simple if, use && and || where possible and write it on a single line, for example [[ $x -gt 2 ]] && echo "$x"
When exporting variables, add the script's own namespace as a prefix so that they do not conflict with others
Use trap to catch signals and do cleanup when a termination signal is received
Use mktemp to create temporary files or directories
Use /dev/null to discard unfriendly output
Use the return value of a command to judge whether it succeeded
Before using a file, check that it exists, or handle the failure
Do not parse the output of ls (for example ls -l | awk '{print $8}'); the output of ls is unpredictable and platform-dependent
Do not read a file with a for loop; use while read instead
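
Several of these tips fit into one short sketch: mktemp, trap, while read, parameter expansion, /dev/null and checking a command's exit status. The file contents and function names are made up.

```shell
#!/usr/bin/env bash

# Temporary file from mktemp, removed again when the script exits.
work_file="$(mktemp)"
cleanup() {
    rm -f "$work_file"
}
trap cleanup EXIT

# Read the file given as $1 line by line and strip each name's extension
# with parameter expansion instead of calling sed or awk.
strip_extension() {
    local line
    while IFS= read -r line; do
        echo "${line%.*}"
    done < "$1"
}

printf 'report.txt\nnotes.md\n' > "$work_file"
strip_extension "$work_file"

# Discard unwanted output and branch on the command's exit status.
if grep 'report' "$work_file" > /dev/null 2>&1; then
    echo "found report"
fi
```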

Static check tool shellcheck

Overview

To guarantee the quality of the scripts in a system, the simplest idea is probably to adopt a static checking tool, which makes up for the blind spots that developers may have.

There are not many static checkers for shell on the market. After looking around, you will find a tool called shellcheck. It is open source on GitHub with more than 8K stars and looks very reliable. Installation and usage details are on its home page.

Installation

This tool is available on many platforms, including at least the major package managers for Debian, Arch, Gentoo, EPEL, Fedora, OS X, openSUSE and others. Installation is easy; see the installation documentation for details.

Integration

Since it is a static checker, it naturally belongs in a CI pipeline. shellcheck can easily be integrated into Travis CI to statically check projects whose main language is shell.

Sample

The Gallery of bad code in the documentation also catalogs "bad code" in great detail, which is a very useful reference. It makes pleasant reading in your spare time, much like "Java Puzzlers".

The best part

However, I think the real essence of this project is not the features above but its extremely thorough wiki. The wiki documents the reasoning behind every check the tool performs: each problem it reports has a numbered entry there. It tells us not only "this is bad" but also "why it is bad" and "how it should be written", which is ideal for those who like to dig deeper.
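
To get a feel for this, one can run shellcheck on a deliberately sloppy script. The sketch below only runs the tool if it is installed; the sample script is invented, and its unquoted variable is the kind of thing shellcheck typically reports under code SC2086, which has its own wiki page.

```shell
#!/usr/bin/env bash

sample_script="$(mktemp)"

# A deliberately sloppy script: $target is used without double quotes.
cat > "$sample_script" <<'EOF'
#!/bin/sh
target=$1
ls -l $target
EOF

if command -v shellcheck > /dev/null 2>&1; then
    # shellcheck exits non-zero when it finds issues, so do not treat that as fatal here.
    shellcheck "$sample_script" || true
else
    echo "shellcheck is not installed; skipping" >&2
fi
```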

Link: https://blog.mythsman.com/2017/07/23/1/
