---
title: "Finding files"
author: Franklin Bristow
---
Finding files
=============
::: outcomes
* [X] Find files on the command line by name using patterns.
:::
You might be saying to yourself "[wat]". "Why would I ever need to find files on
the command line?"
I'd say to you: "You're right, we just spent time figuring out an appropriate
folder structure for your courses!"
But then I'd also say: "You're not always going to be working with folder
structures that you made. You're not always going to be working with files
that you've created. Sometimes you'll know what the name of a file is (or
something like what the file name should look like), but not where it is within
a deeply nested or complex folder structure."
So here I am saying it: You're not always going to be working with folder
structures that you made. You're not always going to be working with files
that you've created. Sometimes you'll know what the name of a file is (or
something like what the file name should look like), but not where it is within
a deeply nested or complex folder structure.
We're going to be using a program that has an amazingly appropriate name:
`find`.
::: aside
If you're using macOS, *some* of these examples will work on your local
computer. If you're using Windows, **none** of these examples will work on your
local computer. If you're using a Linux distribution, ***all*** of these
examples will work on your local computer.
Somewhat annoyingly, some of the examples that we'll look at will work
*differently* between macOS and Linux. We will try to use examples that work on
both the same way, but...
Please connect to Aviary before you try to run any of these examples.
:::
::: example
The simplest way to run `find` is to run it with a single argument: the
directory that you want to find files in:
```bash
[you@bird ~]> find .
```
`find` will proceed to list all of the files that it can find starting in the
current directory, and that's probably a lot more than you expected.
:::
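If you'd like to know *how much* `find` found without watching it all scroll by, here's a minimal sketch. It assumes the standard `wc` and `head` programs are available (they are on Linux and macOS), and that you send `find`'s output to them with a pipe (`|`):

```bash
# Count the results instead of printing every one of them.
find . | wc -l

# Or look at only the first five results.
find . | head -n 5
```

The numbers you see will depend entirely on which directory you run this in.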
[wat]:
https://i.kym-cdn.com/photos/images/original/000/000/151/n725075089_288918_2774.jpg
Finding files by name
---------------------
Normally this isn't how you would use `find`. This is certainly *a* way you can
run `find`, but it's not the typical usage.
One fairly common way to use `find` is to search for files by their name. We're
going to look at this in several ways:
* Finding files by an exact file name that we know in advance.
* Finding files by a file name that we know matches a pattern.
* Finding files by a file name that we know matches a pattern, but we don't know
the *case* of the file's name (UPPER CASE or lower case).
You're going to be working with a directory structure that you don't know
anything about and is too deep and complex for you to manually search through.
I really hope the directories you see in your job or your academic life don't
look anything like this!
Start by downloading this file, but make sure that you **do not** download it into
your repository:
https://www.cs.umanitoba.ca/~fbristow/crazy-directories.tar
**Note**: There are over 7000 directories in this folder structure.
Trying to find anything in here using only `cd` and `ls` isn't worth your time.
This is a `.tar` file (a "[Tape ARchive]"). Similar to a `.zip` file, this is
an archive: there are many files *within* this single file, and you need to use
a program to "expand" the archive. (Unlike a `.zip` file, a plain `.tar` file
isn't actually compressed; it only bundles the files together.) For `.tar`
files, the program's name is `tar`:
```bash
tar -xf crazy-directories.tar # -x: eXtract, -f: from this File
```
**Note**: **There are over 7000 directories in this folder
structure**. Extracting the entire directory structure on Aviary will probably
take 5 or more minutes to complete.
[Tape ARchive]: https://en.wikipedia.org/wiki/Tar_(computing)
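If you want to peek inside an archive *before* extracting it, `tar` can list the contents instead. This is only a sketch: it builds a tiny throwaway archive (the `fruit` names are made up for illustration) so that it doesn't depend on the download.

```bash
# Build a tiny throwaway archive so this sketch stands on its own.
demo=$(mktemp -d)
mkdir -p "$demo/fruit/a"
touch "$demo/fruit/a/bananas.md"
tar -cf "$demo/fruit.tar" -C "$demo" fruit   # -c: Create an archive

# -t: list the Table of contents, -f: from this File.
# Nothing is extracted; tar only prints the names stored in the archive.
tar -tf "$demo/fruit.tar"
```

With the real download, the same idea would be `tar -tf crazy-directories.tar | head -n 5`.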
::: aside
A **single command** was used to create this directory structure:
```bash
mkdir -p {a..z}/{0..9}/{a..z}
```
This uses a feature that's present in *some* shells called brace expansion,
and specifically the "range" form of it (`{a..z}`).
This **will not** work on Aviary when you first connect. Aviary, by default,
uses a shell program called [tcsh]. tcsh can expand comma-separated lists like
`{a,b,c}`, but it does not support ranges like `{a..z}`.
[Bash] supports ranges in brace expansion, and that's the shell that was used
to create this directory structure. You can run `bash` on the command line to
start the Bash shell.
::: aside
An aside *within an aside*? Is that even legal?
Bash and tcsh are not the only shell programs. There are [many shell programs].
We're unfortunately not going to spend time talking about shells. Most people
don't even think about shells and just use the one that starts automatically
when the terminal opens or when they connect to a remote computer.
When you take the time to try out different shells, though, it's a similar
choice to choosing a text editor.
[many shell programs]:
https://en.wikipedia.org/wiki/List_of_command-line_interpreters
:::
[tcsh]: https://en.wikipedia.org/wiki/Tcsh
[Bash]: https://en.wikipedia.org/wiki/Bash_(Unix_shell)
:::
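To see what brace expansion produces without creating 7000 directories, you can hand the braces to `echo` instead of `mkdir`. This is a small sketch with a much smaller range than the real command; it uses `bash -c` so that it behaves the same way even if the shell you're typing in is tcsh.

```bash
# bash -c runs a single command in Bash, whatever shell you are typing in.
# The single quotes keep your current shell from touching the braces.
bash -c 'echo {a..c}/{0..1}'
# prints: a/0 a/1 b/0 b/1 c/0 c/1
```

Every combination of the first range and the second range shows up once, which is how three short ranges turn into thousands of directories.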
### Finding an exact file name
Let's imagine that we know the exact name of a file. I'm going to tell it to
you. The file name that you want to find is "`bananas.md`".
::: example
Change into the `crazy-directories` directory (`cd`).
We can use the `-name` option on `find` to look for a file by name when we know
the exact name of the file:
```bash
find . -name "bananas.md"
```
:::
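One thing that might surprise you: `-name` matches *anything* with that name, including directories. Here's a sketch in a throwaway sandbox (the directory layout is made up for illustration) that adds the `-type f` option to ask for regular files only:

```bash
# A throwaway sandbox with one FILE and one DIRECTORY named bananas.md.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/a/1" "$sandbox/b/2/bananas.md"
touch "$sandbox/a/1/bananas.md"

# Matches both the file and the directory.
find "$sandbox" -name "bananas.md"

# -type f keeps only regular files, so the directory is left out.
find "$sandbox" -name "bananas.md" -type f
```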
### Finding a file name matching a pattern
Finding a file using a name that we know is straightforward.
Sometimes (maybe more frequently), we only know what the file name *looks like*.
Examples of situations like this might be:
* We want to find all files that have a name ending in "`.java`".
* We want to find all files that have a name containing "`List`".
* We want to find all files that have a name starting with "`Hello`".
Thankfully, we can use "patterns" to help us find files with names where we
don't know the *exact* name of the file.
Patterns require using characters that have special meanings:
* `*` means "anything": any number of characters (0 or more) and any character.
* `?` means "one character": exactly one character, but it can be any character.
* `[` and `]` make a "character class"; `[abc]` means "any one of `a`, `b`, or `c`".
::: example
Starting from the `crazy-directories` directory, we're going to be looking for
files that match some specific patterns.
You can find the same file that we saw before (`bananas.md`) using patterns:
```bash
find . -name "*nanas.md" # anything ending with "nanas.md"
find . -name "*.md" # anything ending with ".md"
find . -name "?ananas.md" # anything that has exactly one character, then
# ananas.md
find . -name "*bananas*" # anything that contains "bananas"
find . -name "banana*" # anything that starts with "banana"
find . -name "[AbC]ananas.md"
# files named "bananas.md", "Aananas.md", or
# "Cananas.md"
```
:::
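You may have noticed that every pattern above is wrapped in quotes. The sketch below shows why, using a made-up sandbox: without quotes, the *shell* gets to expand the `*` before `find` ever sees it.

```bash
# A throwaway sandbox: one .md file at the top, one buried deeper.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/deep/er"
touch "$sandbox/top.md" "$sandbox/deep/er/bananas.md"
cd "$sandbox"

# Quoted: find receives the pattern itself and checks every file against it.
find . -name "*.md"     # finds ./top.md and ./deep/er/bananas.md

# Unquoted: the shell replaces *.md with "top.md" first, so find is really
# being asked for the exact name top.md.
find . -name *.md       # prints: ./top.md
```

If the current directory happened to contain *several* matching files, the unquoted version would fail with an error instead, so quoting the pattern is the safer habit.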
### Finding file names with unknown case
You may have encountered this before while working on the command line,
especially if you're moving between Windows on your personal computer and
Aviary: file names on Unix or Unix-like systems are usually case sensitive
(Linux normally is; macOS is only when the disk has been formatted that way).
That means that "`BANANAS.md`" and "`bananas.md`" and "`bAnAnAs.md`" are three
different file names that can all exist in the same directory on a
case-sensitive system like Aviary.
We *could* use the character class pattern to find files that have any case, but
typing this out would be extraordinarily tedious:
```bash
find . -name "[Bb][Aa][Nn][Aa][Nn][Aa][Ss].md"
```
Thankfully, there's another option for `find`: `-iname`.
::: example
When you know the general pattern that your file has for its name, but you don't
know what case the file name uses, you can use the `-iname` option; the `i` in
`-iname` stands for "ignore case".
```bash
find . -iname "bananas.md"
```
You can use any of the special characters (`*`, `?`, or `[]`) with the `-iname`
option.
:::
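Here's a small sketch comparing `-name` and `-iname` with the same pattern. The file names are invented for the example, and they were chosen so that the sandbox works the same way on case-sensitive and case-insensitive file systems.

```bash
# A throwaway sandbox with differently-cased names.
sandbox=$(mktemp -d)
touch "$sandbox/bananas.md" "$sandbox/BANANA-bread.md" \
      "$sandbox/Banana.txt" "$sandbox/apples.md"

# Case matters: only bananas.md starts with a lower case "banana".
find "$sandbox" -name "banana*"

# Case ignored: bananas.md, BANANA-bread.md, and Banana.txt all match.
find "$sandbox" -iname "banana*"
```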
Running programs on the files you find
--------------------------------------
::: outcomes
* [X] Execute commands on the files that match the pattern.
:::
Sometimes finding the files is enough (i.e., "Great! Now I can open the file
with `vi` or `vim` or `emacs` or ..."). Sometimes you actually want to *do
something* with the files that you find. Let's first explore some of the things
that we might want to *do* to files, and then look at the way we would use
`find` to do those things on our behalf.
### What would we want to do?
What kinds of things might you want to do with the files that you find on the
command line using `find`?
* Converting between file formats (e.g., you like to download music using the
[FLAC] format and need to convert to [MP3]).
* Or... you have many "`.md`" files that you want to convert to "`.docx`".
* Changing files (e.g., you have many very large photos that you want to also
have smaller versions of, or "thumbnails").
* Deleting files (e.g., you have many `.class` files, and you want to delete
them all).
[FLAC]: https://en.wikipedia.org/wiki/FLAC
[MP3]: https://en.wikipedia.org/wiki/MP3
### Using `find` to accomplish our goal
We're going to use two options on `find` to accomplish our goal: `-exec` and
`-delete`.
#### `-delete`
Deleting files is pretty straightforward using the `-delete` option.
::: warning
Remember: **There is no "Recycle Bin" or "Trash Can" on the command line**. When
you delete something, either with `rm` or with the option we're about to see,
it's **gone**.
:::
::: example
Let's delete the file with the exact file name `bananas.md`:
```bash
find . -name "bananas.md" -delete
```
The `-delete` option on `find` deletes any file that has a name matching the
pattern that you've described to `find`. That means that if multiple files
match the pattern, then all of those files would be deleted with the `-delete`
option.
:::
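Since there's no undo, a cautious habit is to run the command *without* `-delete` first, read the list, and only then add `-delete`. Here's a sketch of that two-step routine in a throwaway sandbox (the Java file names are made up for illustration):

```bash
# A throwaway sandbox with one source file and two compiled files.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/src"
touch "$sandbox/src/Hello.java" "$sandbox/src/Hello.class" \
      "$sandbox/src/List.class"

# Step 1: dry run. Same pattern, no -delete, so this only prints names.
find "$sandbox" -name "*.class"

# Step 2: happy with the list? Repeat the command with -delete at the END.
# Order matters: find evaluates options left to right, so a -delete placed
# before -name would apply to everything find visits.
find "$sandbox" -name "*.class" -delete

# Only Hello.java is left.
find "$sandbox" -type f
```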
#### `-exec`
More often than deleting stuff, you're going to want to run a program on the
things that you find.
Similar to the way that the `-delete` option works (it deletes all files that
have a name matching the pattern), the `-exec` option will run a program for
each file that has a name matching the pattern.
The `-exec` option is *slightly* different from `-delete` in that you need to
include how to run the command, but the best way to describe this is by example.
::: example
Let's start by finding all Markdown files in the `crazy-directories` directory:
```bash
find . -name "*.md" # find all files with the extension .md
```
There are a few more than expected!
We want to convert all of these Markdown files to Word files (`.docx`). Doing
this manually is possible, but tedious, so we'll let `find` do the work for us:
```bash
find . -name "*.md" -exec pandoc '{}' -f markdown+emoji -o '{}'.docx \;
```
There's some weird looking stuff in there, so let's break it down:
* ```bash
find . -name "*.md" # find all files with the extension .md
```
This is the same as what we've seen before: find all files with names that
match the pattern.
* ```bash
-exec
```
Here's the start of our new option for `find`. Everything that follows the
`-exec` option is another complete command.
* ```bash
pandoc '{}' -f markdown+emoji -o '{}'.docx \;
# run pandoc on the name of the file
# quote {} so the shell leaves it alone;
# The output file is the same file name
# with a .docx extension (Word), and the
# command is terminated with \;
```
* We've seen `pandoc` before lots of times.
* One new thing here is the `'{}'` (twice).
* When `pandoc` is run by `find`, the `{}` is replaced with the name of the
file that `find` found. The quotes around `{}` keep the shell from treating
the braces as something special; `find` hands each file name to `pandoc` as
a single argument, even if the name contains spaces. In our example, one of
the files `find` finds is `bAnAnAs.md`, and this would make the command that
actually runs look like:
```bash
pandoc ./b/5/s/bAnAnAs.md -f markdown+emoji -o ./b/5/s/bAnAnAs.md.docx
```
* The `-f markdown+emoji` is an option to `pandoc` to say that we want it to
actually render emoji using shortcodes.
* The `\;` is the "end of command" marker that the `-exec` option looks for so
it knows where the command ends.
If everything works out, we shouldn't see any output (`pandoc` is pretty quiet).
Now check out the new files you have in your directory:
```bash
find . -name "*.docx"
```
:::
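If you're ever unsure what `-exec` is about to run, one trick is to put `echo` in front of the command: `find` then *prints* each command instead of running it. This is a sketch in a throwaway sandbox that borrows the `b/5/s/bAnAnAs.md` path from the example above; it doesn't need `pandoc` to be installed.

```bash
# A throwaway sandbox with a single Markdown file.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/b/5/s"
touch "$sandbox/b/5/s/bAnAnAs.md"
cd "$sandbox"

# echo prints its arguments, so this shows the command without running it.
find . -name "*.md" -exec echo pandoc '{}' -f markdown+emoji -o '{}'.docx \;
# prints: pandoc ./b/5/s/bAnAnAs.md -f markdown+emoji -o ./b/5/s/bAnAnAs.md.docx
```

Once the printed commands look right, remove the `echo` and run it for real.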
The `-exec` option sure looks *weird*, but it's a pretty powerful option: we can
find files that have a name matching a pattern, and then run commands on those
files with a single command.