# Problem: Finder

Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, the `#cs50ap` channel on CS50x Slack (after signing up), or the CS50 Facebook group.

## tl;dr

Implement a program to recursively search for files, per the below:

``````$./finder hello ~/$ cat staff_found.txt
/nix/.node-gyp/4.3.1/include/node/openssl/symhacks.h
/nix/.node-gyp/4.3.1/include/node/openssl/ssl.h
[...]
/home/ubuntu/.nvm/versions/node/v4.7.3/include/node/openssl/ssl.h``````

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

### Reasonable

• Communicating with classmates about problems in English (or some other spoken language).

• Discussing the course’s material with others in order to understand it better.

• Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.

• Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.

• Reviewing past years' quizzes, tests, and solutions thereto.

• Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.

• Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.

• Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.

• Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.

• Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

### Not Reasonable

• Accessing a solution to some problem prior to (re-)submitting your own.

• Asking a classmate to see his or her solution to a problem before (re-)submitting your own.

• Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.

• Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.

• Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.

• Looking at another individual’s work during a quiz or test.

• Paying or offering to pay an individual for work that you may submit as (part of) your own.

• Providing or making available solutions to problems to individuals who might take this course in the future.

• Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.

• Searching for or soliciting outright solutions to problems online or elsewhere.

• Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).

• Submitting (after possibly modifying) the work of another individual beyond allowed snippets.

• Submitting the same or similar work to this course that you have submitted or will submit to another.

• Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.

• Viewing another’s solution to a problem and basing your own solution on it.

## Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope

To what extent does your code implement the features required by our specification?

Correctness

To what extent is your code consistent with our specifications and free of bugs?

Design

To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?

Style

To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

Curl up with some shorts to re-familiarize yourself with file I/O and recursion.

First, here’s Zamyla with recursion:

And Jason, with file I/O:

These two concepts are the basis of this problem, so make sure you understand how both work before proceeding further.

## Getting Started

As usual, after logging into cs50.io, execute

``update50``

within a terminal window to make sure your workspace is up-to-date.

Next, execute

``cd ~/workspace/chapter4``

at your prompt to ensure that you’re inside of `chapter4` (which is inside of `workspace` which is inside of your home directory). Then execute

``wget http://docs.cs50.net/2017/ap/problems/finder/finder.zip``

Then,

``unzip finder.zip``

to unzip the file. Remove the ZIP file (remember how?), then proceed to execute

``cd finder``

followed by

``ls``

and you should see that the directory contains a number of files as well as directories. If you `cd` into some of those directories as well, you’ll see each of them contain their own files and/or directories. Feel free to open some of the files up; save for `finder.c`, you’ll see that they are each text files containing some strings. Looks like we’ve got a lot to work with here! Return to the finder directory by using

``cd ..``

as needed, to "move up" one level back to the `finder` directory, or go there directly by providing an absolute path as the argument to `cd`:

``cd ~/workspace/chapter4/finder``

## (g)report

We have seen a good number of terminal commmands so far in this course that have proved to be very useful. Some, like `cd`, `unzip`, `ls`, and `wget` we have used on nearly every single problem.

There are a multitude of other terminal commands, however, with which you may still be unfamiliar but that, once known, can prove to be very useful as well. One of these is called `grep`.[1].

`grep` allows a user to search any given input files, selecting lines that match one or more string patterns given. For example, if you were to run

``grep "foo" bar.txt``

or, if as here you are just searching for a single word, the quotation marks could be omitted:

``grep foo bar.txt``

The call would print to the terminal all lines in `bar.txt` that contain the string `foo`.

`grep` can also search through entire directories, rather than just individual files, searching every file in the directory for the requested string (even if the string is deeply nested several directory levels down from the starting location). This requires an extra `-r` flag in the command. Can you guess what the `-r` stands for?

That’s right, recursion! How fun.

In any case, executing

``grep -r foo mydirectory``

will print to the terminal a list of lines containing the string `foo` found in all files within `mydirectory`, including those files that might exist in subdirectories thereof. Thankfully, it also prints the name of the file containing the line, so you can easily find whatever it is you’re looking for.

You can imagine scenarios in which `grep` comes in handy. Imagine having to look through thousands of files, and you can search through all of them for any particular string you’d like with one simple command. In fact, if a Mac OS user[2], you may be familiar with a similar GUI tool called "Finder". In this example, we’re searching our entire `Dropbox` folder for the string "I took CS50":

In this problem, you will be implementing your own pared-down version of `grep`. Once implemented, your program should output (in a separate file!) a list of all files containing the search string in whatever directory you’ve looked through.

Your version, unlike the native `grep` utility, will only support searching through directories (and therefore the files contained within them); you will not be able to search only one individual file unless that file is the only one contained in the directory. We also won’t worry about finding the specific line number or printing the line once the string is found. In that sense, this implementation of `grep` will behave somewhat more like the way "Finder" does.

## Seek and Find

Let’s dive into the distribution code for the `finder` program, all found in `finder.c`. Open up `finder.c` and take a moment to scroll through it. Notice there are three functions contained in the program, one of which has been implemented for you. Let’s start from the top.

• Atop the file you’ll see some cryptic `#define`s. These simply give special instructions to the compiler to allow the program to work as intended, and you can ignore them.

• Next up are some familiar header files, along with a new one we haven’t yet seen. More on that later.

• Now, onto the interesting stuff! The next part of the file is a definition for a `struct` we’ve called `path`. This `struct` contains two elements, `name` and `type`. The `name` field will hold the filepath to a particular file or directory, relative to the directory `finder.c` is in.

For example, there is a file inside of the `finder` directory called `hello.txt`, nested within severage directories. `name` for this particular file might be `./this/is/cs50/hello.txt`. Make sure you understand why that is the case before moving on.

The type field will hold one of two strings: either "file" or "directory", depending on the type of the file stored at the filepath in `name`. `hello.txt`, of course, would be a "file". Were our `name` to be `./this/is/cs50`, however, the `type` in that case would be "directory".

• Below that you will see another `struct` definition, this one for a type called `directory`. This struct also has a field called `name`, which will once again hold the relative filepath to the directory. `npaths` is an `int` that will hold the number of files (both regular files and directories) found inside the directory.

At this point, we come across something new: `path*`. The `*` character tells us that this variable is actually something called a pointer to a `path`, rather than simply a `path` itself. We will go into more detail about pointers later in the course, but for now, treat this variable as nothing more than an array of `path`s. The number of elements in the array is equal to `npaths`, and each element of that array is itself a `path`, an entry for a file in the directory (containing that file’s name and type). You can index into it as normal.

• Finally, we see a global variable called `key` which will hold the string we’d like to search for in our files, and two function declarations that will be covered below.

On to `main`!

### main

Recall what the variables `argc` and `argv` represent in terms of command line arguments. All `main` will do in this program (except for a little bit of setup) is call `seek` at the bottom; but we need to make sure the inputs to the function are valid before we can call it!

Your program should handle up to two command line arguments; the first, the search string, is mandatory. The second, the directory in which to search, is optional. For example, then, the following variations of command line inputs to seek for a string `foo` are acceptable:

``./finder foo``

and

``./finder foo bar/``

Note that if the second command line argument is entered, it must end with a trailing slash, or the program will not work.

So before you call `seek` at the bottom of `main`, you must do the following:

1. Ensure you have the correct number of command line arguments, remembering that the user can input either just the string to search for or a string and a specific directory. Remember how? If the correct number of arguments have not been entered, you should print a helpful message to the terminal and return `1`.

2. Set the value of the global variable `key` referenced above.

3. Set the `name` value of the directory struct `dir` that has been created for you. If the user has input a directory, you should set `name` to that string; else, the name variable should be set to your current directory, which you do by way of the string `"./"`.

And that’s all for `main`! Let’s move on.

### populate

The entirety of the populate function has been written for you and contains some complex pieces, so no need to worry about understanding everything fully in this part - though you should understand the gist of what the function does. Let’s take a quick tour.

• We first begin by simply initializing the values that will be used later on in the function. Note that `npaths` is set to 0, and the value of our pointer `paths` is set to `NULL`.

• The next piece of code involves a system call, `opendir`, defined in the library `dirent.h` we saw at the top of the program.[3] If the filepath contained in `dir.name` is valid, this function effectively opens that directory for reading, much like files are opened with the call `fopen` (remember that from Jason’s short? If not, it might be a good idea to go back and review!).

The call `readdir` acts much like a call to either `fscanf` or `fgets`, and grabs each element in the directory one at a time. These directory entries contain the file names, types, and other information, as we would expect.

• The next section of this function checks whether or not the directory entry is a file or a directory, and updates the `path` structure `newPath` accordingly. The `name` field of `newPath` is set to the relative filepath, and the `type` field is set to either "file" or "directory".

Of note in this section as well are the function calls `realloc` and `calloc`, which are variations of a call named `malloc`. You will become very familiar with `malloc` later on in this course, but for now, understand that these calls explictly request and set aside for use some memory to hold the variables declared.[4] In the future, you will be expected to free all memory you have allocated. Feel free to type `man malloc`, `man realloc`, or `man free` into the terminal if curious for more information.

• Finally, we close the directory we have opened. This is good practice and goes for files as well; in the `seek` function below, make sure to close every file you open using `fclose`.

### seek

Darn! This one’s just a big `TODO`. Here is where your coding skills and newfound expertise with file I/O and recursion will come into play. This function should eventually take a `directory` structure as its parameter and iterate over all files in that directory (good thing you know how many files there are in each directory!), searching for the string stored in the global variable `key`. If there exists a directory inside another, we must search everything inside that one too. Do you have a guess how? Let’s break it down.

1. Recall that our first call to `seek` up in `main` takes in either the directory the user has input at the command line or `./`. But having merely a name for a directory does not do us much good; we need to populate it with information about what’s inside it first! Wonder if there’s a function that might help with that…​

2. Once you have a full `directory` structure, you can begin iterating through the entries contained in it.

1. If the entry is of type "file", you should open it up to read from (remember how?). Then you will want to begin scanning through the file searching for the string contained in `key`. You will likely find the functions `fgets` and `strstr` to be of use to you here (declaring in `stdio.h` and `string.h`, respectively). If using `fgets`, it is worth noting that the maximum length of a string in the distribution text files is 50 characters long, though you can use a larger value if you’d like.

1. If indeed you successfully found `key` inside the file, open an outfile called `found.txt` for appending, not writing. (This way every time you open and write to the file, you will not overwrite any file names already stored in there.)

Copy the name of the file where you found the string into `found.txt`. You should additionally add a newline character at the end to ensure separation of each file name.

Additionally, you will only want to write a file name into your outfile once, even if the string is found in multiple locations within that same file, so perhaps you can devise an easy way keep track of whether or not[5] you’ve already output that file.

2. Ensure you close every file you open, but also ensure that if you try to close a file, it has indeed been opened. Some error checking around `fopen` should help you here.

2. If, on the other hand, the entry is of type `directory`, you should create a new `directory` structure, populate it, and set the `name` field accordingly. You then will need to iterate through this new directory you’ve created in the same way you’ve iterated through the others. Think recursively!

3. Once you’ve iterated through everything, you should return `0` at the end of `seek` to indicate success.

Phew, we made it. If unsure of how to start, review the shorts above! And as always, it helps to break it down into pieces. Handle one case (file or directory) first, and then the other.

You should also feel free to test your code by creating new directories and files containing any strings your heart desires, and searching for those strings to see if your program can find them correctly.

### `check50`

``check50 cs50/2017/ap/finder``

If you’d like to play around with the staff’s solution, execute

``~cs50/chapter4/finder``

That program will generate a file in the same directory as `finder.c` entitled `staff_found.txt`. If you open the file and look at it, you’ll see the names of all the files (with file paths relative to the directory `finder.c` is in) that contain the search string you sought out.

## How to Submit

### Step 1 of 3

``update50``

### Step 2 of 3

Be sure that each of your files are in `~/workspace/chapter4/finder`, as with:

``````cd ~/workspace/chapter4/finder
ls``````

If any file is not in `~/workspace/chapter4/finder`, move it into that directory, as via `mv` (or via CS50 IDE’s lefthand file browser).

### Step 3 of 3

Submit `finder`:

``````cd ~/workspace/chapter4/finder
submit50 cs50/2017/ap/finder``````

If you run into any trouble, email sysadmins@cs50.harvard.edu!

You may resubmit any problem as many times as you’d like.

Your submission should be graded for correctness within 2 minutes, at which point your score will appear at cs50.me!

This was Finder.

1. If curious, this is an acronym for the rather cumbersome "globally search for a regular expression and print"
2. This feature exists in Windows as well, and has become more prominent in more recent versions
3. You may recall that C was originally developed in the early 1970s alongside the operating system UNIX. Though its ability to directly manipulate the operating system like this is not used terribly frequently in CS50, it’s a very powerful tool when used correctly!
4. Incidentally, when you do allocate memory explicitly as we do here, you must later free the memory to avoid what are called memory leaks. For this problem, we will not require any explicit calls to `free` in your code, and therefore will be okay with your program suffering from memory leaks.
5. Hmm…​ only two options for this. Seems a bit like yes or no, on or off, `true` or `false` …​