Implement a program to recursively search for files, per the below:
$ ./finder hello ~/ $ cat staff_found.txt /nix/.node-gyp/4.3.1/include/node/openssl/symhacks.h /nix/.node-gyp/4.3.1/include/node/openssl/ssl.h [...] /home/ubuntu/.nvm/versions/node/v4.7.3/include/node/openssl/ssl.h
Chapters 28-29 of Programming in C.
This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.
The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.
Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.
Communicating with classmates about problems in English (or some other spoken language).
Discussing the course’s material with others in order to understand it better.
Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.
Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.
Reviewing past years' quizzes, tests, and solutions thereto.
Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.
Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.
Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.
Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.
Accessing a solution to some problem prior to (re-)submitting your own.
Asking a classmate to see his or her solution to a problem before (re-)submitting your own.
Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.
Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.
Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.
Looking at another individual’s work during a quiz or test.
Paying or offering to pay an individual for work that you may submit as (part of) your own.
Providing or making available solutions to problems to individuals who might take this course in the future.
Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.
Searching for or soliciting outright solutions to problems online or elsewhere.
Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).
Submitting (after possibly modifying) the work of another individual beyond allowed snippets.
Submitting the same or similar work to this course that you have submitted or will submit to another.
Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.
Viewing another’s solution to a problem and basing your own solution on it.
Your work on this problem set will be evaluated along four axes primarily.
To what extent does your code implement the features required by our specification?
To what extent is your code consistent with our specifications and free of bugs?
To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?
To what extent is your code readable (i.e., commented and indented with variables aptly named)?
To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.
Curl up with some shorts to re-familiarize yourself with file I/O and recursion.
First, here’s Zamyla with recursion:
And Jason, with file I/O:
These two concepts are the basis of this problem, so make sure you understand how both work before proceeding further.
As usual, after logging into cs50.io, execute
within a terminal window to make sure your workspace is up-to-date.
at your prompt to ensure that you’re inside of
chapter4 (which is inside of
workspace which is inside of your home directory). Then execute
to unzip the file. Remove the ZIP file (remember how?), then proceed to execute
and you should see that the directory contains a number of files as well as directories. If you
cd into some of those directories as well, you’ll see each of them contain their own files and/or directories. Feel free to open some of the files up; save for
finder.c, you’ll see that they are each text files containing some strings. Looks like we’ve got a lot to work with here! Return to the finder directory by using
as needed, to "move up" one level back to the
finder directory, or go there directly by providing an absolute path as the argument to
and you’re ready to start.
We have seen a good number of terminal commmands so far in this course that have proved to be very useful. Some, like
wget we have used on nearly every single problem.
There are a multitude of other terminal commands, however, with which you may still be unfamiliar but that, once known, can prove to be very useful as well. One of these is called
grep allows a user to search any given input files, selecting lines that match one or more string patterns given. For example, if you were to run
grep "foo" bar.txt
or, if as here you are just searching for a single word, the quotation marks could be omitted:
grep foo bar.txt
The call would print to the terminal all lines in
bar.txt that contain the string
grep can also search through entire directories, rather than just individual files, searching every file in the directory for the requested string (even if the string is deeply nested several directory levels down from the starting location). This requires an extra
-r flag in the command. Can you guess what the
-r stands for?
That’s right, recursion! How fun.
In any case, executing
grep -r foo mydirectory
will print to the terminal a list of lines containing the string
foo found in all files within
mydirectory, including those files that might exist in subdirectories thereof. Thankfully, it also prints the name of the file containing the line, so you can easily find whatever it is you’re looking for.
You can imagine scenarios in which
grep comes in handy. Imagine having to look through thousands of files, and you can search through all of them for any particular string you’d like with one simple command. In fact, if a Mac OS user, you may be familiar with a similar GUI tool called "Finder". In this example, we’re searching our entire
Dropbox folder for the string "I took CS50":
In this problem, you will be implementing your own pared-down version of
grep. Once implemented, your program should output (in a separate file!) a list of all files containing the search string in whatever directory you’ve looked through.
Your version, unlike the native
grep utility, will only support searching through directories (and therefore the files contained within them); you will not be able to search only one individual file unless that file is the only one contained in the directory. We also won’t worry about finding the specific line number or printing the line once the string is found. In that sense, this implementation of
grep will behave somewhat more like the way "Finder" does.
Let’s dive into the distribution code for the
finder program, all found in
finder.c. Open up
finder.c and take a moment to scroll through it. Notice there are three functions contained in the program, one of which has been implemented for you. Let’s start from the top.
Atop the file you’ll see some cryptic
#defines. These simply give special instructions to the compiler to allow the program to work as intended, and you can ignore them.
Next up are some familiar header files, along with a new one we haven’t yet seen. More on that later.
Now, onto the interesting stuff! The next part of the file is a definition for a
structcontains two elements,
namefield will hold the filepath to a particular file or directory, relative to the directory
For example, there is a file inside of the
hello.txt, nested within severage directories.
namefor this particular file might be
./this/is/cs50/hello.txt. Make sure you understand why that is the case before moving on.
The type field will hold one of two strings: either "file" or "directory", depending on the type of the file stored at the filepath in
hello.txt, of course, would be a "file". Were our
./this/is/cs50, however, the
typein that case would be "directory".
Below that you will see another
structdefinition, this one for a type called
directory. This struct also has a field called
name, which will once again hold the relative filepath to the directory.
intthat will hold the number of files (both regular files and directories) found inside the directory.
At this point, we come across something new:
*character tells us that this variable is actually something called a pointer to a
path, rather than simply a
pathitself. We will go into more detail about pointers later in the course, but for now, treat this variable as nothing more than an array of
paths. The number of elements in the array is equal to
npaths, and each element of that array is itself a
path, an entry for a file in the directory (containing that file’s name and type). You can index into it as normal.
Finally, we see a global variable called
keywhich will hold the string we’d like to search for in our files, and two function declarations that will be covered below.
Recall what the variables
argv represent in terms of command line arguments. All
main will do in this program (except for a little bit of setup) is call
seek at the bottom; but we need to make sure the inputs to the function are valid before we can call it!
Your program should handle up to two command line arguments; the first, the search string, is mandatory. The second, the directory in which to search, is optional. For example, then, the following variations of command line inputs to seek for a string
foo are acceptable:
./finder foo bar/
Note that if the second command line argument is entered, it must end with a trailing slash, or the program will not work.
So before you call
seek at the bottom of
main, you must do the following:
Ensure you have the correct number of command line arguments, remembering that the user can input either just the string to search for or a string and a specific directory. Remember how? If the correct number of arguments have not been entered, you should print a helpful message to the terminal and return
Set the value of the global variable
namevalue of the directory struct
dirthat has been created for you. If the user has input a directory, you should set
nameto that string; else, the name variable should be set to your current directory, which you do by way of the string
And that’s all for
main! Let’s move on.
The entirety of the populate function has been written for you and contains some complex pieces, so no need to worry about understanding everything fully in this part - though you should understand the gist of what the function does. Let’s take a quick tour.
We first begin by simply initializing the values that will be used later on in the function. Note that
npathsis set to 0, and the value of our pointer
pathsis set to
The next piece of code involves a system call,
opendir, defined in the library
dirent.hwe saw at the top of the program. If the filepath contained in
dir.nameis valid, this function effectively opens that directory for reading, much like files are opened with the call
fopen(remember that from Jason’s short? If not, it might be a good idea to go back and review!).
readdiracts much like a call to either
fgets, and grabs each element in the directory one at a time. These directory entries contain the file names, types, and other information, as we would expect.
The next section of this function checks whether or not the directory entry is a file or a directory, and updates the
newPathis set to the relative filepath, and the
typefield is set to either "file" or "directory".
Of note in this section as well are the function calls
calloc, which are variations of a call named
malloc. You will become very familiar with
malloclater on in this course, but for now, understand that these calls explictly request and set aside for use some memory to hold the variables declared. In the future, you will be expected to free all memory you have allocated. Feel free to type
man realloc, or
man freeinto the terminal if curious for more information.
Finally, we close the directory we have opened. This is good practice and goes for files as well; in the
seekfunction below, make sure to close every file you open using
Darn! This one’s just a big
TODO. Here is where your coding skills and newfound expertise with file I/O and recursion will come into play. This function should eventually take a
directory structure as its parameter and iterate over all files in that directory (good thing you know how many files there are in each directory!), searching for the string stored in the global variable
key. If there exists a directory inside another, we must search everything inside that one too. Do you have a guess how? Let’s break it down.
Recall that our first call to
maintakes in either the directory the user has input at the command line or
./. But having merely a name for a directory does not do us much good; we need to populate it with information about what’s inside it first! Wonder if there’s a function that might help with that…
Once you have a full
directorystructure, you can begin iterating through the entries contained in it.
If the entry is of type "file", you should open it up to read from (remember how?). Then you will want to begin scanning through the file searching for the string contained in
key. You will likely find the functions
strstrto be of use to you here (declaring in
string.h, respectively). If using
fgets, it is worth noting that the maximum length of a string in the distribution text files is 50 characters long, though you can use a larger value if you’d like.
If indeed you successfully found
keyinside the file, open an outfile called
found.txtfor appending, not writing. (This way every time you open and write to the file, you will not overwrite any file names already stored in there.)
Copy the name of the file where you found the string into
found.txt. You should additionally add a newline character at the end to ensure separation of each file name.
Additionally, you will only want to write a file name into your outfile once, even if the string is found in multiple locations within that same file, so perhaps you can devise an easy way keep track of whether or not you’ve already output that file.
Ensure you close every file you open, but also ensure that if you try to close a file, it has indeed been opened. Some error checking around
fopenshould help you here.
If, on the other hand, the entry is of type
directory, you should create a new
directorystructure, populate it, and set the
namefield accordingly. You then will need to iterate through this new directory you’ve created in the same way you’ve iterated through the others. Think recursively!
Once you’ve iterated through everything, you should return
0at the end of
seekto indicate success.
Phew, we made it. If unsure of how to start, review the shorts above! And as always, it helps to break it down into pieces. Handle one case (file or directory) first, and then the other.
You should also feel free to test your code by creating new directories and files containing any strings your heart desires, and searching for those strings to see if your program can find them correctly.
If you’d like to play around with the staff’s solution, execute
That program will generate a file in the same directory as
staff_found.txt. If you open the file and look at it, you’ll see the names of all the files (with file paths relative to the directory
finder.c is in) that contain the search string you sought out.
Update your IDE:
Be sure that each of your files are in
~/workspace/chapter4/finder, as with:
cd ~/workspace/chapter4/finder ls
If any file is not in
~/workspace/chapter4/finder, move it into that directory, as via
mv (or via CS50 IDE’s lefthand file browser).
cd ~/workspace/chapter4/finder submit50 cs50/2017/ap/finder
inputting your GitHub username and GitHub password as prompted.
If you run into any trouble, email email@example.com!
You may resubmit any problem as many times as you’d like.
Your submission should be graded for correctness within 2 minutes, at which point your score will appear at cs50.me!
This was Finder.
freein your code, and therefore will be okay with your program suffering from memory leaks.