Shell scripts
- Write and run shell scripts to repeat complex tasks.
At this point you’re pretty familiar with using the command line; you should be able to do a variety of things:
- Navigate and create folder structures.
- Edit text files.
- Convert files between various formats.
- Filter lines from text files.
- Read and change permissions on files.
- Use version control software.
- Find files by name or properties.
- Redirect standard input, output, and error.
- Construct pipelines of commands.
While there’s a whole world of things you can still learn about using the command line, and more command line tools than you can imagine to learn about (which sometimes themselves contain entire programming languages!), you’ve learned and demonstrated a lot! 🎉
As you’re getting more used to using the command line, you may find yourself repeating similar, complex commands over and over again. Or maybe you find yourself doing the same kinds of things over and over again, maybe running multiple commands repeatedly. Or maybe you’ve built a complex pipeline that you want to keep for use later because you’ll need to use it again.
In their very simplest form, shell scripts are plain text files that contain a sequence of commands or statements, separated by newlines or semi-colons. As you grow shell scripts, they can contain things like conditional statements, loops, functions, and… uh, hey, wait a minute. That, uh, that sounds an awful lot like programming.
Most of the time we spend interacting with shells is interactive: the shell is waiting for us to enter a command, and when we press Enter, the shell runs the command, waits for it to finish (it actually wait(2)s; wait is a C function), then patiently waits for us to enter another command.
Most shells can also run non-interactively: you give the shell the name of a file that contains a sequence of commands and the shell will just interpret, then run the sequence of commands like it’s a program.
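As a minimal sketch of non-interactive use, the following writes two commands into a temporary file and then hands that file to bash. The file name commands.txt and the scratch directory are made up for illustration.

```shell
# Make a scratch directory so nothing in your own folders is touched.
scratch=$(mktemp -d)

# Put two commands into a plain text file (hypothetical name).
printf 'echo "first command"\necho "second command"\n' > "$scratch/commands.txt"

# Give bash the file name; it reads and runs each command in order.
output=$(bash "$scratch/commands.txt")
echo "$output"

# Clean up the scratch directory.
rm -r "$scratch"
```

Nothing here is interactive: bash never shows a prompt, it just works through the file from top to bottom and exits.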
Shells and languages
Similar to different programming languages, different shells use different syntax to express similar ideas. Deciding which shell program to use defines what syntax you’re going to use and the kinds of keywords you should be using when you’re looking for help (in the manual page for the shell, on the shell’s website, or generally online).
While your default shell on Aviary is tcsh, we’re going to be using bash as a shell interpreter for scripting. This is an opinionated choice, just as using vim instead of emacs is an opinionated choice. As you get more comfortable using the command line, you may want to choose a different shell (like fish) and thus a different shell language, but bash and its syntax are common enough that we’ll treat it like a “lingua franca”.
Basic scripting
Let’s start with the basics: the general structure of a shell script and some very simple shell scripts.
General structure
Shell scripts all start with a “shebang” line: a line that starts with the symbols #!. The first line indicates which program is going to be run to interpret the rest of the file.
#!/usr/bin/env bash
# this is a bash script
Everything after the first line is the actual contents of the script, the sequence of commands to be executed.
Simple scripts
Here’s an example of a very simple script:
#!/usr/bin/env bash
ls -la
Copy and paste this into a new plain text file on Aviary; give it a file name that represents what this does (I recommend la, “list all”).
Copying and pasting into vim is tedious; having to enter and exit paste mode is a real bother. Another way to quickly copy and paste into your terminal to create a new file is using the program cat. As you’ve seen, cat is a program that will read standard input and write to standard output. But we can redirect standard output to a file!
cat > la
Once you enter that, press Enter, then paste, then press Control+D.
That’s it!
Once you’ve written the file, exit your editor. Before we can run a shell script, we need to mark it as executable using chmod:
chmod a+x la
Then we can run the script:
./la
🎉, your first shell script!
Shell scripts consist of one or more lines of commands to run. Add another line to your script:
find . -name "*.md"
Then run it again (you don’t need to chmod again). Now two commands’ worth of output appear.
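The whole cycle (write the file, mark it executable, run it) can be sketched in one go. This uses a scratch directory and printf instead of an editor, purely so the example is self-contained; the file name hello is made up.

```shell
scratch=$(mktemp -d)

# Write a two-line script: the shebang, then one command.
printf '#!/usr/bin/env bash\necho "hello from a script"\n' > "$scratch/hello"

ls -l "$scratch/hello"      # permissions before chmod: no x in the mode
chmod a+x "$scratch/hello"
ls -l "$scratch/hello"      # permissions after chmod: x for user, group, and other

# Now the shell is allowed to run the file as a program.
result=$("$scratch/hello")
echo "$result"

rm -r "$scratch"
```

Comparing the two ls -l lines shows exactly what chmod a+x changed.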
One command that you might find helpful when you’re scripting is the echo command; this is a “print” command for shells:
echo "Hello, world!"
You can also write comments using the # symbol; anything following it on the line is a comment:
echo "Hello, world" # a friendly message
find . -name "*.docx" -delete # clean up Word files
Here’s a more complete (and completely contrived) example of a script:
#!/usr/bin/env bash
echo "Here's what's in the current directory:"
# list all with long listings
ls -al
echo "Here are all the Markdown files:"
# find all files with the `.md` extension
find . -name "*.md"
Environment variables and the $PATH
Shell languages, like other programming languages, support variables. Almost all shells follow the same convention for naming variables:
- Variable names start with a $ when you read their value (you leave the $ off when you set one),
- Variable names are all UPPERCASE,
- Variable names are SNAKE_CASE (they use underscores to separate words).
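A minimal sketch of setting and reading a variable in bash. The names FILE_PREFIX and BACKUP_NAME and their values are made up for illustration. Note that there is no $ and there are no spaces around = when assigning.

```shell
# Assign: no $ in front of the name, no spaces around the equals sign.
FILE_PREFIX="notes"

# Read: put a $ in front of the name.
echo "The prefix is $FILE_PREFIX"

# Curly braces mark where the name ends when other text follows directly.
BACKUP_NAME="${FILE_PREFIX}_backup.txt"
echo "The backup file is $BACKUP_NAME"
```

Without the braces, the shell would look for a variable named FILE_PREFIX_backup, which does not exist.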
All shells have special variables, and those special variables help your shell make decisions, or help define the behaviour that your shell has. These special variables are called “environment variables”.
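Try running this in your shell to find out what your default shell is:
echo $SHELL
$SHELL is an environment variable that contains the path to your login shell (the one that starts when you log in), which is usually, but not always, the shell you’re currently running.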
When you enter the names of programs on the command line, your shell has to figure out where that program actually is in a folder structure.
This might seem obvious when you think about it, but think about it: the programs that you’re running on the command line are just files with bits in them. They were written in a programming language (often C), then compiled and put in a folder somewhere.
A lot of programs on Linux and UNIX systems live in the folders /bin and /usr/bin. You can learn a bit more about where files live on a Linux system by reading a manual page:
man hier
### OR
man 7 hier # hier is in section 7 for miscellaneous
Your shell uses a special environment variable called $PATH (often called “the $PATH”) to find where the command you just entered exists as a file. The $PATH contains a list of directories that your shell will look in to find the file representing the command you asked it to run.
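One way to see the result of this lookup is the command -v builtin, which prints the file the shell would run for a given name. A small sketch, assuming ls is installed somewhere on your $PATH (the exact directory varies between systems):

```shell
# Ask the shell where it found the ls program.
ls_location=$(command -v ls)
echo "ls lives at: $ls_location"
```

The directory part of that answer is one of the entries in your $PATH.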
Different shells use different separators for directory entries in the path. Both tcsh and bash use a colon (:) to separate directories.
As above, use the echo command to print out what the $PATH is right now in your shell:
echo $PATH
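The colon-separated value can be hard to read on one line. As a small sketch, tr can swap each colon for a newline so that every directory appears on its own line:

```shell
# Replace every ":" with a newline to list one directory per line.
echo "$PATH" | tr ':' '\n'

# Count how many directories the shell will search.
entry_count=$(echo "$PATH" | tr ':' '\n' | wc -l)
echo "directories on the PATH: $entry_count"
```

The shell searches these directories in the order shown, from top to bottom, and runs the first match it finds.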
Your $PATH will contain many folders on Aviary, but it importantly includes two directories:
- .: the current directory, and this is why you don’t have to type ./ in front of programs you’ve written and compiled yourself.
- ~/bin: a directory named bin relative to your home directory. When you echo $PATH, the ~ will be listed as an absolute path (e.g., /home/student/you/bin).
What this means is that we can put scripts we write into the directory ~/bin, then we can run them anywhere.
The directory ~/bin may not exist in your user directory on Aviary. Create the directory, then move the la script you wrote above into this directory.
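The commands for this step might look like the sketch below. To keep it safe to experiment with, it uses a scratch directory as a stand-in for your home directory and adds its bin folder to the $PATH by hand; on Aviary you would use ~/bin itself, which is described above as already being on your $PATH. The name fake_home is made up.

```shell
# Stand-in for your home directory (hypothetical; use ~ for the real thing).
fake_home=$(mktemp -d)

# Create the bin directory; -p means "no error if it already exists".
mkdir -p "$fake_home/bin"

# Write the la script and mark it executable.
printf '#!/usr/bin/env bash\nls -la\n' > "$fake_home/la"
chmod a+x "$fake_home/la"

# Move the script into bin.
mv "$fake_home/la" "$fake_home/bin/"

# Put the stand-in bin directory on the PATH, for this shell only.
PATH="$fake_home/bin:$PATH"

# The shell can now find la by name alone.
la_location=$(command -v la)
echo "la found at: $la_location"

rm -r "$fake_home"
```

With the real ~/bin the two steps that matter are mkdir -p ~/bin and mv la ~/bin/.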
This applies only to tcsh: tcsh caches the names of commands that are in folders on the $PATH. After you add something to ~/bin (which is on your $PATH), you’ve got to get tcsh to regenerate this cache. You can regenerate the cache in tcsh by running the command:
rehash
You must do this any time you add programs or commands to folders on the $PATH that you want to work in other places, but this only applies to tcsh. If you’re using a different shell like bash or fish, then your shell will almost certainly do this for you.
Now change back to your user directory:
cd ~
### Or just `cd` with no arguments
cd
And run la. 🎉, now you can run la in any directory.
You can find out which environment variables are currently set and what their values are using the env command.
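env prints every variable, which can be a long list. A small sketch that narrows the list down with grep, and reads a single variable with printenv; PATH is used here because it was introduced above:

```shell
# Show only the line for the PATH variable.
env | grep '^PATH='

# printenv prints the value of one named variable (no $ in front of the name).
path_value=$(printenv PATH)
echo "PATH is $path_value"
```

The ^ in the grep pattern anchors the match to the start of the line, so other variables that merely contain the text PATH= are not shown.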
Arguments
Shell scripts that you write can accept arguments, just like programs you write in other programming languages. In both C and Java (and Python, technically), you can access arguments passed on the command line to your program as arrays of strings.
Shell scripts can also access command line arguments using variables, but instead of indexing into an array, you directly access each argument as a numbered variable like $1.
Here’s a small program that will print out the values passed to it as arguments on the command line:
#!/usr/bin/env bash
echo "The first argument is $1"
echo "The second argument is $2"
echo "The third argument is $3"
echo "All arguments are $*"
Write this script and try running it with different arguments to see how the output changes. Don’t forget to use chmod to set execute permissions for your script!
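To see how the numbered variables fill in, here is a sketch that writes a tiny script to a scratch file and runs it with three made-up arguments. It also prints $#, a special variable (not covered above) that holds the number of arguments. The file name args.sh is hypothetical.

```shell
scratch=$(mktemp -d)

# A one-line script body; the single quotes stop $1, $2, and $# from being
# expanded now, so they are written into the file as-is.
printf '%s\n' 'echo "first: $1, second: $2, count: $#"' > "$scratch/args.sh"

# Run it with three arguments; the quoted pair of words counts as one argument.
output=$(bash "$scratch/args.sh" apple "banana split" cherry)
echo "$output"

rm -r "$scratch"
```

Try changing the number of arguments: any numbered variable without a matching argument simply expands to nothing.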
A common use of arguments on the command line for shell scripts is to pass the name of a file or directory you want to operate on.
Let’s upgrade our la script a little bit.
Remember that ls can run with no arguments, and when run with no arguments it defaults to printing out the contents of the directory . (the current directory). But ls can accept arguments. Our la script doesn’t right now.
Change your la script to accept an argument and pass it to ls:
#!/usr/bin/env bash
ls -al "$1" # quotes in case of spaces!
Now run la again, but pass it an argument:
la .
Neat.
Why “quotes in case of spaces”? Try this:
- Remove the quotes around $1 in your script.
- Create a directory that has a space in its name (mkdir "space dir").
- Try running la "space dir".
- Put back the quotes around $1 in your script.
When the shell “expands” the variable $1, it’s substituting the value of that variable into the command literally. If the variable contains spaces, it will be substituted into the command spaces and all. In other words,
ls -al $1 # becomes:
ls -al space dir
If you remember way back a long time ago, we had to put quotes around names with spaces when using mkdir because mkdir would turn space dir into two directories. Similarly, ls is looking for two separate directories.
Including the quotes around $1 makes sure that even if the variable contains spaces, it’s going to be quoted when it’s passed to the command:
ls -al "$1" # becomes
ls -al "space dir"
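You can watch this splitting happen without involving ls at all. The sketch below defines a small helper function, count_args (a made-up name), that reports how many arguments it received:

```shell
# Helper: print the number of arguments this function received.
count_args() {
    echo "$#"
}

dir_name="space dir"

unquoted=$(count_args $dir_name)     # expands to two words: space and dir
quoted=$(count_args "$dir_name")     # stays one word: space dir

echo "without quotes: $unquoted argument(s)"
echo "with quotes: $quoted argument(s)"
```

The unquoted call sees two arguments and the quoted call sees one, which is exactly the difference ls experiences in the la script.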
While we’ve improved la slightly here, we’ve also broken it. Try running la by itself with no arguments.
…
Oops. Now we need to test for the special case of no arguments being passed. We’re going to need some more tools for that: conditional statements.
Structures: conditional statements and loops
Shell scripting languages are fully featured programming languages and include structures like conditional statements and loops. They contain other structures, too, but let’s stick to the basics.
Conditional statements
Conditional statements in bash use the familiar if keyword and resemble the expressions you’ve seen in other languages.
One of the major differences in bash is the expressions themselves: most of the questions you’re going to be asking about a variable use unary operators.
Here’s what a bash conditional statement looks like:
if [[ -a hello.c ]]; then
    echo "hello.c exists"
else
    echo "hello.c does not exist"
fi
The -a is a unary operator on file names. -a returns true if the file exists, and returns false if the file does not exist.
Spacing is important here! bash is not a very smart language. You might be tempted to leave out spaces between [[ and -a or between hello.c and ]], but you must have spaces between these symbols.
WHY?!
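bash is, uh, weird. [[ behaves like a command that takes arguments (technically it’s a keyword in bash, but its older sibling [ really is a command). The arguments [[ is getting in the above example are -a, hello.c, and ]]. The shell only recognizes [[ and ]] when they stand alone as separate words, which is why the spaces are required. The ; is a command separator (like in Python, it’s optional at the end of a line, but here it’s needed because then is on the same line).
Yeah, weird.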
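A sketch of what goes wrong without the spaces: bash reads [[-a as the name of a single (nonexistent) command. Each test runs in a child bash so that the error does not interrupt the rest of the example. $? (not covered above) holds the exit status of the previous command, and 127 is the status a shell reports for “command not found”. The path /no/such/file/hello.c is assumed not to exist.

```shell
# Missing spaces: bash looks for a program literally named "[[-a".
no_spaces_status=0
bash -c '[[-a hello.c]]' 2>/dev/null || no_spaces_status=$?

# With spaces: a real test, which is false (status 1) because the file is missing.
with_spaces_status=0
bash -c '[[ -a /no/such/file/hello.c ]]' || with_spaces_status=$?

echo "without spaces: exit status $no_spaces_status"
echo "with spaces: exit status $with_spaces_status"
```

Remove the 2>/dev/null to see the error message bash prints in the first case.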
bash has many unary operators that you can use to test files or variables. The one we care about right now is the -n operator, asking if a string is non-zero in length.
Let’s add a conditional statement to la to test for the presence of arguments:
#!/usr/bin/env bash
if [[ -n "$1" ]] ; then
    ls -al "$1"
else
    ls -al
fi
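Bash also has a shorthand for “use this value if the variable is empty”: ${1:-.} expands to $1 when an argument was given and to . otherwise. This is an alternative sketch, not a replacement for the if version; it is written as a function named la_default (a made-up name) so you can try it without editing your script.

```shell
# List a directory, falling back to the current one when no argument is given.
la_default() {
    ls -al "${1:-.}"
}

la_default        # same as: ls -al .
la_default /      # same as: ls -al /
```

The if version is easier to extend when you need to do something different in each case; the shorthand is handy when all you need is a default value.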
You can find more unary operators in bash by reading the CONDITIONAL EXPRESSIONS section of its manual page, but here are some examples:
Operator | Meaning |
---|---|
-a file | True if file exists. |
-d file | True if file exists and is a directory. |
-r file | True if file exists and is readable. |
-s file | True if file exists and has a size greater than 0. |
string1 == string2 | True if the strings are equal. |
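A sketch that tries a few of these operators on files it creates itself, so the answers are predictable. The names notes.txt, empty.txt, and nope.txt are made up, describe is a hypothetical helper function, and elif (“else if”) is a keyword not covered above.

```shell
# Report what kind of thing a path is, using unary file operators.
describe() {
    if [[ -d "$1" ]]; then
        echo "directory"
    elif [[ -s "$1" ]]; then
        echo "non-empty file"
    elif [[ -a "$1" ]]; then
        echo "empty file"
    else
        echo "missing"
    fi
}

scratch=$(mktemp -d)
echo "some text" > "$scratch/notes.txt"   # a file with content
touch "$scratch/empty.txt"                # a file with size 0

describe "$scratch"
describe "$scratch/notes.txt"
describe "$scratch/empty.txt"
describe "$scratch/nope.txt"

rm -r "$scratch"
```

The order of the tests matters: -d is checked first because a directory would also pass the -s and -a tests.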
Loops
We can’t talk about conditional statements without at least saying something about loops!
Similar to conditional statements, loops use the familiar for keyword. Bash also supports while and until loops, but most of the time you’re using a loop in Bash, you’re operating on some sequence of file names rather than waiting until some event happens.
The structure of a for loop is frustratingly different from conditional statements in a way it’s not in other programming languages: the conditional statements you saw above use the [[ and ]] brackets for wrapping the expression, but for loops generally do not use brackets or parentheses in Bash.
Here’s what a for loop looks like in Bash:
for f in * ; do
    echo $f
done
- The for is… for; it’s the start of the loop.
- The f is the name of the variable that holds the current value on each iteration of the loop over the sequence.
- The in is a separator between the variable name and the sequence.
- In this case * is the sequence. This is a “glob” or a pattern, and this glob in the shell means “all files in the current directory” (hidden files, whose names start with a ., are left out by default).
- The semi-colon ;, like in conditional statements above, ends the current statement.
- do, then, is the beginning of the body of the loop.
- echo $f is one command you want run on the variable. This will print out the variable’s value (here, a file name).
- done ends the body of the loop.
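A sketch of the same kind of loop with a narrower glob, wrapped in a helper function named list_markdown (a made-up name) and run on a scratch directory so the result is predictable. The file names are hypothetical. Here $f is quoted for the same reason $1 was quoted earlier: file names can contain spaces.

```shell
# Print the name of every Markdown file in the directory given as $1.
list_markdown() {
    for f in "$1"/*.md ; do
        basename "$f"      # strip the directory part, keep the file name
    done
}

scratch=$(mktemp -d)
touch "$scratch/readme.md" "$scratch/to do.md" "$scratch/photo.png"

list_markdown "$scratch"

rm -r "$scratch"
```

The loop body runs twice: photo.png does not match the *.md pattern, and “to do.md” is handled as a single file name because of the quotes.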
Maybe this looks sort of familiar. Maybe this looks like what we were doing with find and -exec. They do accomplish similar results!
Both work, and both are effective. One way to think about this matching of ideas is that find and -exec are more of a functional programming paradigm (this is a map operation), while for loops are more of a procedural paradigm.
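To make the comparison concrete, here is the same job done both ways in a scratch directory: print the path of every .md file. The file names are made up, and sort is only there so the two results can be compared line for line.

```shell
scratch=$(mktemp -d)
touch "$scratch/one.md" "$scratch/two.md" "$scratch/skip.txt"

# "Map" style: find runs echo once for each matching file.
with_find=$(find "$scratch" -maxdepth 1 -name "*.md" -exec echo {} \; | sort)

# Procedural style: the loop body runs once for each matching file.
with_loop=$(for f in "$scratch"/*.md ; do echo "$f" ; done | sort)

echo "find and -exec:"
echo "$with_find"
echo "for loop:"
echo "$with_loop"

rm -r "$scratch"
```

Both approaches visit the same two files; which one to use is mostly a matter of which reads more clearly for the task at hand.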
When you’re writing for loops, the sequence can either use the patterns you’ve seen before (like *.md), or can be the result of a command.
In fact, we can rewrite the for loop from above using find!
for f in $(find . -maxdepth 1) ; do
    echo $f
done
The output looks a little bit different (find prints each name with a leading ./ and also lists . itself and any hidden files), but the loop works the same way.
Another common kind of loop you may want to write is one that iterates over a sequence of numbers (like the traditional for loop you’ve seen in languages like Java). To do that you can use the seq command:
for num in $(seq 1 10) ; do
    echo $num
done
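A sketch that does something with each number rather than only printing it: adding the numbers 1 to 10 together. The $(( )) syntax is arithmetic expansion, which has not been covered above; the variable name total is made up.

```shell
total=0
for num in $(seq 1 10) ; do
    # Arithmetic expansion: compute total + num and store it back in total.
    total=$((total + num))
done
echo "The sum of 1 to 10 is $total"
```

Inside $(( )) variable names can be written without the leading $.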
Further reading
Just like programming, shell scripting goes way beyond what you’ve been introduced to here. You’ve got a good start, but as you keep working with shell scripting, you’ll find yourself running into situations where you need to get some more help.
- You can read the manual page for your shell to learn more about its scripting language (e.g., man bash or man tcsh).
  - Sections of interest in the manual page for bash include the CONDITIONAL EXPRESSIONS section and the Compound Commands subsection.
- Joshua Levy’s “The Art of Command Line” is a very good resource that’s been translated into many languages. It’s a good reference to keep in your bookmarks.
- ShellCheck is a tool for identifying and then helping you fix possible bugs in your shell scripts.
- The Advanced Bash-Scripting Guide is a comprehensive guide for shell scripting with Bash.