Problem: Sort Race

Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, the #cs50ap channel on CS50x Slack (after signing up), or the CS50 Facebook group.

Objectives

  • Implement several O(n2) sorting algorithms.

  • Experience principles of abstraction.

  • Write code that must conform into an existing framework.

  • Experience how real-world run time of algorithms differs from theoretical run time.

Academic Honesty

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

Reasonable

  • Communicating with classmates about problems in English (or some other spoken language).

  • Discussing the course’s material with others in order to understand it better.

  • Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.

  • Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.

  • Reviewing past years' quizzes, tests, and solutions thereto.

  • Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.

  • Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.

  • Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.

  • Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.

  • Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

Not Reasonable

  • Accessing a solution to some problem prior to (re-)submitting your own.

  • Asking a classmate to see his or her solution to a problem before (re-)submitting your own.

  • Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.

  • Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.

  • Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.

  • Looking at another individual’s work during a quiz or test.

  • Paying or offering to pay an individual for work that you may submit as (part of) your own.

  • Providing or making available solutions to problems to individuals who might take this course in the future.

  • Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.

  • Searching for or soliciting outright solutions to problems online or elsewhere.

  • Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).

  • Submitting (after possibly modifying) the work of another individual beyond allowed snippets.

  • Submitting the same or similar work to this course that you have submitted or will submit to another.

  • Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.

  • Viewing another’s solution to a problem and basing your own solution on it.

Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope

To what extent does your code implement the features required by our specification?

Correctness

To what extent is your code consistent with our specifications and free of bugs?

Design

To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?

Style

To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

Getting Ready

First, refresh your memory on the O(n2) sorting algorithms we’ve learned about so far in the course. Here are Jackson and Tommy to explain:

Before moving on, be sure you’re comfortable answering the following questions:

  • Why is bubble sort in O(n2)?

  • Why is insertion sort in Ω(n)?

  • How does selection sort work?

Getting Started

As always, first log into your CS50 IDE at cs50.io and execute

update50

within a terminal window to make sure your workspace is up-to-date. Next, execute

cd ~/workspace/chapter3

at your prompt to ensure that you’re inside of the chapter3 directory within your workspace directory. Then execute

wget http://docs.cs50.net/2016/ap/problems/race/race.zip

to download a ZIP of this problem’s distro into your workspace. You should see a bunch of output followed by:

'race.zip' saved

Confirm that you’ve indeed downloaded race.zip by executing

ls

and then run

unzip race.zip

to unzip the file. If you then run ls again, you should see that you have a newly unzipped directory called race as well. You can now delete the ZIP, with:

rm -f race.zip

Careful! We’ve added the -f flag this time, so rm will not confirm that you want to delete the file. If you like the comfort of having the system double-check with you, just omit -f from your command. Lastly, execute

cd race

followed by

ls

and you should see that the directory contains four files:

Makefile  helpers.c  helpers.h  race.o

Off we go!

Object Orienting

In this problem, you’ll be racing the three O(n2) sorting algorithms we’ve seen under a few different test conditions, to see how they perform against one another. Those test conditions will be:

  • arrays that are almost sorted, with two elements out of place,

  • arrays in reverse order,

  • arrays in completely random order, and

  • arrays that are already sorted.

The good news is that you don’t have to implement anything involving populating the arrays! You only have to do a tiny amount of command-line validation and implement the three sorting algorithms themselves.

But we’re getting a bit ahead of ourselves. First we need to deal with the contents of the directory you just unzipped. Have a peek at the Makefile we’ve prepared for you. In particular, focus on this portion.

race: race.o helpers.c helpers.h
	clang -ggdb3 -O0 -std=c11 -Wall -Werror -o race race.o helpers.c -lcs50 -lm

Per the dependencies implied above (after the colon), any changes to either race.o, helpers.c, or helpers.h will compel make to rebuild race the next time it’s invoked for this target. In other words, this means that race is not simply comprised of a single source file, but rather of three separate files. helpers.c and helpers.h you probably can figure out. But what the heck is race.o? A refresher on the compilation process might be in order here, first. Take it away, Rob:

Interesting…​ so it turns out that race.o contains the object code that was generated from a source file (presumably called race.c) that we wrote and partially compiled, but then chose not to include in the distro. We made this choice on the one hand because the code therein is a bit complicated at this stage of the course (thus allowing us to abstract some of the complex detail), but further because it provides you with an opportunity to write code that simply must conform to a precise specification, or it won’t work!

It turns out that race.o contains the object code for, among other things, main. And if you can’t change main then you can’t change the way that main calls any functions, including the functions you’ll be tasked with writing in this problem. Bummer!

If this seems unfair, know that it’s also a really good indicator of what you’ll experience in the real world if you continue with programming as a career. Often times large groups of people collaborate on a single project, and there are standards and specifications that must be adhered to so that everyone’s components interoperate smoothly. Veering from those standards will mean your code is incompatible with the project at large, which will mean you will have wasted some of your valuable time (and probably irritated your colleagues, too)!

Incidentally, because it is not a so-called "target" specified in the Makefile, if when working on this problem you (inadvertently or intentionally) try to

make helpers

you’ll actually default to using the standard Makefile included with CS50 IDE[1] which will just try to compile the helpers.c file alone into its own program. Problem is, if you open up helpers.c, there’s no main function, so you’ll probably get a whole bunch of cryptic error messages concluding with one along these lines:

/usr/bin/../lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'

So do be sure that whenever you try to compile this program, you do so with

make race

or, in fact, because of the all target specified in Makefile, you could also just

make all

or, in fact, because race is the first target listed in Makefile and absent any other command-line arguments supplied to make it will simply default to compiling the first target listed in the Makefile, you can even say just

make

Convenient, eh?

The Race Begins

All of the work you’ll be doing in this problem will be confined to helpers.c and possibly helpers.h. In particular, you have to implement the four functions prototyped therein: check_flag, bubble, selection, and insertion.

check_flag

If you try to compile race from the distro and run it without any command line arguments, you’re immediately notified of the proper usage of the program.

Usage: ./race array-type size

And then, if you supply it with three command line arguments, regardless of what those arguments are, you’ll see the following:

Invalid array-type. Must be -a, -b, -r, or -s

Why? Because right now if you have a look at check_flag in helpers.c, you’ll see that it always returns false. But eventually what check_flag should do, per the comment atop its prototype, is return true if the argument passed in (which happens to be argv[1]) is -a, -b, -r, or -s, and return false if the argument passed in is anything other than that.

Does the format of those strings remind you of anything you’ve seen recently? Recall the command we recommended you use to get rid of race.zip above:

rm -f race.zip

In this case, we would term -f a flag, which is just another way of describing a command-line argument to a particular program or Linux command (in this case, rm) that modifies the behavior of that program. In the case of -f and rm, that flag tells rm to not confirm with you whether you intend to delete the file(s) in question; it just deletes them right away.

So all check_flag is doing is confirming whether argv[1] is one of those four things. If doesn’t report out which one it is, just that it’s one of them. Odds are there’s a function that might help with checking that.

Incidentally, what do these four flags represent? They determine what type of array will be the test case for the various sorting algorithms:

  • -a for almost sorted arrays. These arrays are already sorted except for two elements which have been randomly switched.

  • -b for backwards arrays. These arrays are sorted, but in reverse order: left-to-right, largest-to-smallest.

  • -r for random arrays. These arrays have no particular order.

  • -s for sorted arrays. These arrays are already properly sorted in order from left-to-right, smallest-to-largest.

Again, you needn’t worry about implementing the functionality of populating the arrays. That was done by us, and the object code resulting from that implementation lives in race.o.

bubble, selection, and insertion

In the functions bubble, selection, and insertion you will be implementing, respectively, bubble sort, selection sort, and insertion sort. Remember that you are not allowed to modify the prototypes of bubble, selection, or insertion, but you are welcome to create any additional "helper" functions that you wish, placing their prototypes in helpers.h and their definitions in helpers.c.

We’re not going to give you much more than that! But do make sure to implement all three algorithms which, per the shorts atop this specification, behave quite differently even though all three have the same ultimate result. As a tip, you may want to start with fairly small size (aka argv[2]) arguments, and you may want to do some debugging to ensure that your sorting algorithms are actually sorting the array properly.

Showcase

Once you’ve implemented check_flag, if you try to run your program you’ll see that when it runs you get output like the following:

bubble sort benchmark:         0.000 seconds
selection sort benchmark:      0.000 seconds
insertion sort benchmark:      0.000 seconds

So this is where the "race" really happens. Time to see which of these algorithms is the fastest. But…​ wait? Aren’t they all O(n2) algorithms? Shouldn’t they all run at exactly the same speed? Well, not exactly.

Theoretically, as n gets larger and larger, yes, these three algorithms will tend to run at closer and closer speeds. But theoretical runtime is not the same as real-world runtime, and so under average and varying test conditions, the performance of these three algorithms will differ, sometimes substantially. For example, see the below wherein underlined text represents user input to the program.

~/workspace/chapter3/race $ ./race -b 2000
bubble sort benchmark:         0.012 seconds
selection sort benchmark:      0.008 seconds
insertion sort benchmark:      0.004 seconds

Not too much of a difference. But what about

~/workspace/chapter3/race $ ./race -r 100000
bubble sort benchmark:         52.032 seconds
selection sort benchmark:      28.944 seconds
insertion sort benchmark:      8.808 seconds

Ouch! Or lastly

username@ide50:~/workspace/unit3/race $ ./race -s 20000
bubble sort benchmark:         0.000 seconds
selection sort benchmark:      0.592 seconds
insertion sort benchmark:      0.000 seconds

Hmm…​ selection sort still took that much time to "sort" an already-sorted array? Is the difference between Ω(n2) and Ω(n) now a bit more clear?

And know that because of varying processor performance and system load, under otherwise-identical conditions from run-to-run the running time of these algorithms may vary somewhat.

check50 is not capable of detecting whether you are implementing bubble, selection, or insertion sort correctly. It is only capable of determining whether your output is indeed sorted. Because the crux of this problem lies in implementing these sorts correctly, we leave it to you (and GDB!) to ensure that your three functions are implemented properly.

To run the staff solution, simply execute:

~cs50/chapter3/race

passing in appropriate command-line arguments.

In no part of this problem are you expected to optimize your runtimes for any of these algorithms (beyond, of course, implementing them correctly). Rather, after you get them implemented you should test different arrays of different sizes and different configurations to see under which circumstances each algorithm "shines". So you can see actual differences between these algorithms, we recommend that your size argument be at least 1000, as that way they’ll tend to take at least a few thousandths of a second to sort.

This was Sort Race.


1. The existence of this file is why your programs in Units 1 and 2 compiled with no trouble despite not having a Makefile in the directory.