Problem: Find

Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, the #cs50ap channel on CS50x Slack (after signing up), or the CS50 Facebook group.

tl;dr

Implement a program that finds a number among numbers, per the below.

$ ./generate 1000 | ./find 42
Didn't find needle in haystack.

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

Reasonable

Communicating with classmates about problems in English (or some other spoken language).
Discussing the course’s material with others in order to understand it better.
Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.
Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.
Reviewing past years' quizzes, tests, and solutions thereto.
Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.
Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.
Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.
Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

Not Reasonable

Accessing a solution to some problem prior to (re-)submitting your own.
Asking a classmate to see his or her solution to a problem before (re-)submitting your own.
Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.
Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.
Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.
Looking at another individual’s work during a quiz or test.
Paying or offering to pay an individual for work that you may submit as (part of) your own.
Providing or making available solutions to problems to individuals who might take this course in the future.
Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.
Searching for or soliciting outright solutions to problems online or elsewhere.
Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).
Submitting (after possibly modifying) the work of another individual beyond allowed snippets.
Submitting the same or similar work to this course that you have submitted or will submit to another.
Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.
Viewing another’s solution to a problem and basing your own solution on it.

Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope: To what extent does your code implement the features required by our specification?
Correctness: To what extent is your code consistent with our specifications and free of bugs?
Design: To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?
Style: To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

Getting Ready

Lots of recommended videos for you for this problem, although a few of them you’ve seen before and should feel free to skip over. Chances are at least a few of these will come in handy.

First up, sorting algorithms with Jackson and Tommy:

And then a discussion of linear and binary search with Patrick (don’t worry too much about when Patrick turns the discussion toward binary search trees in the second half of the binary search video… we’ll get there soon enough, though!):

Getting Started

Log into cs50.io and execute

update50

within a terminal window to make sure your workspace is up-to-date.

Then, execute

cd ~/workspace/chapter3

at your prompt to ensure that you’re inside of chapter3 (which is inside of workspace which is inside of your home directory). Then execute

wget http://docs.cs50.net/2017/ap/problems/find/find.zip

to download a ZIP of this problem’s distro into your workspace (with a command-line program called wget). You should see a bunch of output followed by:

'find.zip' saved

Confirm that you’ve indeed downloaded find.zip by executing

ls

and then run

unzip find.zip

to unzip the file. If you then run ls again, you should see that you have a newly unzipped directory called find as well. You can now delete the ZIP, with:

rm find.zip

confirming your intent to delete that file, then proceed to execute

cd find

followed by

ls

and you should see that the directory contains five files:

Makefile  generate.c  helpers.c  helpers.h  find.c

Understanding

Implemented in generate.c is a program that uses a "pseudorandom-number generator" (via a function called drand48) to generate a whole bunch of random (well, pseudorandom, since computers can’t actually generate truly random) numbers, one per line, each of which is in [0, LIMIT), where LIMIT is a constant defined within the file, so to speak. That is, each is greater than or equal to 0 and less than LIMIT.

Go ahead and compile this program by executing the command below.

make generate

Now run the program you just compiled by executing the command below.

./generate

You should be informed of the program’s proper usage, per the below.

Usage: generate n [s]

As this output suggests, this program expects one or two command-line arguments. The first, n, is required; it indicates how many pseudorandom numbers you’d like to generate. The second, s, is optional, as square brackets imply; if supplied, it represents the value that the pseudorandom-number generator should use as its "seed." A seed is simply an input to a pseudorandom-number generator that influences its outputs. For instance, if you seed drand48 by first calling srand48 (another function whose purpose is to "seed" drand48) with an argument of, say, 0, and then call drand48 itself three times, drand48 might return 0.170828, then 0.749902, then 0.096372. But if you instead seed drand48 by first calling srand48 with an argument of, say, 1, and then call drand48 itself three times, drand48 might instead return 0.041630, then 0.454492, then 0.834817. But if you re-seed drand48 by calling srand48 again with an argument of 0, the next three times you call drand48, you’ll again get 0.170828, then 0.749902, then 0.096372! See, not so random.

Go ahead and run this program again, this time with a value of, say, 10 for n, as in the below; you should see a list of 10 pseudorandom numbers.

./generate 10

Run the program a third time using that same value for n; you should see a different list of 10 numbers. Now try running the program with a value for s too (e.g., 0), as in the below.

./generate 10 0

Now run that same command again:

./generate 10 0

Bet you saw the same "random" sequence of ten numbers again? Yup, that’s what happens if you don’t vary a pseudorandom number generator’s initial seed.

Now take a look at generate.c itself. (Remember how?) Comments atop that file explain the program’s overall functionality. But it looks like we forgot to comment the code itself. Read over the code carefully until you understand each line and then comment our code for us, replacing each TODO with a phrase that describes the purpose or functionality of the corresponding line(s) of code. (Know that an unsigned int is just an int that cannot be negative.) And for more details on drand48 and srand48, recall that you can execute:

man drand48

and:

man srand48

Once done commenting generate.c, re-compile the program to be sure you didn’t break anything by re-executing the command below.

make generate

If generate no longer compiles properly, take a moment to fix what you broke!

Now, recall that make automates compilation of your code so that you don’t have to execute clang manually along with a whole bunch of switches. Notice, in fact, how make just executed a pretty long command for you, per the tool’s output. However, as your programs grow in size, make won’t be able to infer from context anymore how to compile your code; you’ll need to start telling make how to compile your program, particularly when they involve multiple source (i.e., .c) files. And so we’ll start relying on "Makefiles," configuration files that tell make exactly what to do.

How did make know how to compile generate in this case? It actually used a configuration file that we wrote. Go ahead and look at the file called Makefile that’s in the same directory as generate.c. This Makefile is essentially a list of rules that we wrote for you that tells make how to build generate from generate.c for you. The relevant lines appear below.

generate:
    clang -ggdb3 -O0 -std=c11 -Wall -Werror -o generate generate.c

The first line tells make that the "target" called generate should be built by invoking the second line’s command. Know that the leading whitespace on that second line is not a sequence of spaces but, rather, a tab. Unfortunately, make requires that commands be preceded by tabs, so be careful not to change them to spaces, else you may encounter strange errors! The -Werror flag, recall, tells clang to treat warnings (bad) as though they’re errors (worse) so that you’re forced (in a good, instructive way!) to fix them.

Now take a look at find.c. Notice that this program expects a single command-line argument: a "needle" to search for in a "haystack" of values. Once done looking over the code, go ahead and compile the program by executing the command below.

make find

Notice, per that command’s output, that make actually executed the below for you.

clang -ggdb3 -O0 -std=c11 -Wall -Werror -o find find.c helpers.c -lcs50 -lm

Notice further that you just compiled a program comprising not one but two .c files: helpers.c and find.c. How did make know what to do? Well, again, open up Makefile to see the man behind the curtain. The relevant lines appear below.

find: find.c helpers.c helpers.h
    clang -ggdb3 -O0 -std=c11 -Wall -Werror -o find find.c helpers.c -lcs50 -lm

Per the dependencies implied above (after the colon), any changes to find.c, helpers.c, or helpers.h will compel make to rebuild find the next time it’s invoked for this target.

Go ahead and run this program by executing, say, the below.

./find 13

You’ll be prompted to provide some hay (i.e., some integers), one "straw" at a time. As soon as you tire of providing integers, hit ctrl-d to send the program an EOF (end-of-file) character. That character will compel get_int from the CS50 Library to return INT_MAX, a constant that, per find.c, will compel find to stop prompting for hay. The program will then look for that needle in the hay you provided, ultimately reporting whether the former was found in the latter. In short, this program searches an array for some value. At least, it should, but it won’t find anything yet! That’s where you come in. More on your role in a bit.

In turns out you can automate this process of providing hay, though, by "piping" the output of generate into find as input. For instance, the command below passes 1,000 pseudorandom numbers to find, which then searches those values for 42.

./generate 1000 | ./find 42

Note that, when piping output from generate into find in this manner, you won’t actually see generate's numbers, but you will see find's prompts.

Alternatively, you can "redirect" generate's output to a file with a command like the below.

./generate 1000 > numbers.txt

You can then redirect that file’s contents as input to find with the command below.

./find 42 < numbers.txt

Let’s finish looking at that Makefile. Notice the line below.

all: find generate

This target implies that you can build both generate and find simply by executing the below.

make all

Even better, the below is equivalent (because make builds a Makefile's first target by default).

make

If only you could whittle this whole problem set down to a single command! Finally, notice these last lines in Makefile:

clean:
    rm -f *.o a.out core find generate

This target allows you to delete all files ending in .o or called core (more on that soon!), find, or generate simply by executing the command below.

make clean

Be careful not to add, say, *.c to that last line in Makefile! (Why?)

Notice now that, in find.c, main calls search, a function declared in helpers.h. Unfortunately, we forgot to implement that function fully in helpers.c! Indeed, take a peek at helpers.c, and you’ll see that search always returns false, whether or not value is in values. If you do plan on using additional functions in helpers.c, be sure to include the function prototypes in helpers.c ^[1].To be sure, we could have put the contents of helpers.h and helpers.c in find.c itself. But it’s sometimes better to organize programs into multiple files, especially when some functions are essentially "utility functions" that might later prove useful to other programs as well, much like those in the CS50 Library.

Notice too, per helpers.h, that the prototype for search is:

bool search(int value, int values[], int n);

And the prototype for sort is:

void sort(int values[], int n);

Both functions take an array, values, as one of their arguments as well as an integer, n, the size of that array. That’s because, when passing an array to a function, you have to pass in its size separately; you can’t infer an array’s size from the array itself.

Specification

Complete the implementation of find by completing the implementation of search and sort in helpers.c.

`search`

Your implementation must return false immediately if n is non-positive.
Your implementation must return true if value is in values and false if value is not in values.
The running time of your implementation must be in O(log n).
You may not alter the function’s declaration. Its prototype must remain:
```
bool search(int value, int values[], int n);
```

`sort`

Your implementation must sort, from smallest to largest, the array of numbers that it’s passed.
Assume that each of the array’s numbers will be non-negative and less than 65,536. But the array might contain duplicates.
The running time of your implementation must be in O(n), where n is the array’s size. Yes, linear! Keep in mind that 65,536 is a constant.
You may not alter the function’s declaration. Its prototype must remain:
```
void sort(int values[], int n);
```

Walkthroughs

search

sort

Usage

Your program should behave per the examples below. Assumed that the underlined text is what some user has typed. (^d represents the ctrl-d character described above)

$ ./find 42
50
43
^d
Didn't find needle in haystack.

$ ./find 42
50
42
^d
Found needle in haystack!

Testing

When ready to check the correctness of your program, try running the command below.

./generate 1000 50 | ./find 127

Because one of the numbers outputted by generate, when seeded with 50, is 127, your code should find that "needle"! By contrast, try running the command below as well.

./generate 1000 50 | ./find 128

Because 128 is not among the numbers outputted by generate, when seeded with 50, your code shouldn’t find that needle. Best to try some other tests as well, as by running generate with some seed, taking a look at its output, then piping that same output to find, looking for a "needle" you know to be among the "hay".

Incidentally, note that main in find.c is written in such a way that find returns 0 if the needle is found, else it returns 1. You can check the so-called "exit code" with which main returns by executing

echo $?

after running some other command. For instance, assuming your implementation of search is correct, if you run

./generate 1000 50 | ./find 127
echo $?

you should see 0, since 127 is, again, among the 1,000 numbers outputted by generate when seeded with 50, and so search (written by you) should return true, in which case main (written by us) should return (i.e., exit with) 0. By contrast, assuming your implementation of search is correct, if you run

./generate 1000 50 | ./find 128
echo $?

you should see 1, since 128 is, again, not among the 1,000 numbers outputted by generate when seeded with 50, and so search (written by you) should return false, in which case main (written by us) should return (i.e., exit with) 1. Make sense?

`check50`

check50 cs50/2017/ap/find/more

Staff’s Solution

~cs50/pset3/find

Hints

Before you implement search in O(log n) time, you might want to implement it temporarily in O(n) time, as with linear search, if only because it’s a bit easier to get right. That way, you can move on to sort, knowing that search already works. And once sort works, you can go back and re-implement search in O(log n) time, as with binary search. Just remember to!

Ultimately, you are welcome to implement search iteratively (with a loop) or recursively (wherein a function calls itself). If you pursue the latter, though, know that you may not change our declaration of search, but you may write a new, recursive function (that perhaps takes different parameters) that search itself calls.

We leave it to you to determine how best to test your implementation of search and sort. But don’t forget that eprintf is your friend while debugging! And don’t forget that you can generate the same sequence of pseudorandom numbers again and again by explicitly specifying generate's seed.

How to Submit

Step 1 of 2

Ensure that find.c is in ~/workspace/chapter3/find, as with:

cd ~/workspace/chapter3/find
ls

If find.c is not in ~/workspace/chapter3/find, move it into that directory, as via mv (or via CS50 IDE’s lefthand file browser).

Step 2 of 2

To submit find, execute
```
cd ~/workspace/chapter3/find
submit50 cs50/2017/ap/find/more
```
If you run into any trouble, email sysadmins@cs50.harvard.edu!

You may resubmit any problem as many times as you’d like before the deadline.

Your submission should be graded for correctness within 2 minutes, at which point your score will appear at cs50.me

This was Find.

FAQs

None so far! Reload this page periodically to check if any arise!

Changelog

2017-03-09
- Clarified usage
2016-09-21
- Update to search walkthrough
2016-09-17
- Corrected "non-negative" to "non-positive."
- Clarified that the prototype of search cannot be alterered either.
- Removed incorrect mention of O(n²).
2016-09-16
- Initial release.

1. This isn’t necessarily the best practice; you’d typically have all your prototypes in helpers.h, but since check50 references our own helpers.h, for checking purposes, you’ll have to include the prototypes in helpers.c, which is passed in directly when running check50