Problem: Mispellings (Part 2)
Implementation of a complex data structure.
Pages 18 – 20, 27 – 30, 33, 36, and 37 of http://www.howstuffworks.com/c.htm.
Chapter 26 of Absolute Beginner’s Guide to C.
Chapter 17 of Programming in C.
This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.
The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.
Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.
Communicating with classmates about problems in English (or some other spoken language).
Discussing the course’s material with others in order to understand it better.
Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.
Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.
Reviewing past years' quizzes, tests, and solutions thereto.
Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.
Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.
Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.
Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.
Accessing a solution to some problem prior to (re-)submitting your own.
Asking a classmate to see his or her solution to a problem before (re-)submitting your own.
Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.
Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.
Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.
Looking at another individual’s work during a quiz or test.
Paying or offering to pay an individual for work that you may submit as (part of) your own.
Providing or making available solutions to problems to individuals who might take this course in the future.
Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.
Searching for or soliciting outright solutions to problems online or elsewhere.
Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).
Submitting (after possibly modifying) the work of another individual beyond allowed snippets.
Submitting the same or similar work to this course that you have submitted or will submit to another.
Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.
Viewing another’s solution to a problem and basing your own solution on it.
Your work on this problem set will be evaluated along four axes primarily.
To what extent does your code implement the features required by our specification?
To what extent is your code consistent with our specifications and free of bugs?
To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?
To what extent is your code readable (i.e., commented and indented with variables aptly named)?
To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.
Let’s brush up on a few recent topics before diving in. First, join Jackson and Lauren for tours of singly linked lists and hash tables.
Then, brush up your
valgrind knowledge with Nate!
If still desiring more practice with linked lists or hash tables, Doug is here to help!
Below are two options for getting started with this problem. The first option is for those who wish to start with the staff’s implementation of the features from Problem 5-5 already implemented for them. The second option is for those who wish to complete their own implementation of the spell checker, relying on their own solution to the first part of the problem as their foundation. Only choose one of the below two options.
Regardless, log into CS50 IDE and, in a terminal window, execute
to ensure that your workspace is up-to-date!
In your terminal window,
cd into the directory where you have been doing your work for the problems in Chapter A, then execute:
in order to download a ZIP (i.e., compressed version) of this problem distro. Unzip the file and navigate to the newly-created directory. You should see the below inside:
Makefile dictionaries/ dictionary.c dictionary.h keys/ questions.txt speller.c staff.o texts/
We’ve switched the contents of staff.o and dictionary.c from the previous problem, as you now have to implement the other half!
Take the same steps as in Option 1, and make sure you are in the
spelling directory. Execute the following two commands in order:
rm -f staff.o cp ../mispellings/dictionary.c pt1.c
Then open up the
Makefile and change the
SRCS line to read:
SRCS = speller.c dictionary.c pt1.c
And change the line under
rm -f core $(EXE) *.o
Making sure not to delete the TAB character at the front of that line. Now you can simply extend your spell checker!
Alright, the challenge now before you is to implement
size as efficiently as possible, in such a way that
TIME IN load,
TIME IN check,
TIME IN size, and
TIME IN unload are all minimized. To be sure, it’s not obvious what it even means to be minimized, inasmuch as these benchmarks will certainly vary as you feed
speller different values for
dictionary and for
text. But therein lies the challenge, if not the fun, of this problem set. This problem set is your chance to design and optimize for the real world! Although we also invite you to minimize space, your ultimate enemy is time. But before you dive in, some specifications from us.
You may not alter
You may alter
dictionary.c(and, in fact, must in order to complete the implementations of
size), but you may not alter the declarations of
You may alter
dictionary.h, but you may not alter the declarations of
unload. You may alter the declaration of
You may alter
You may add functions to
dictionary.cor to files of your own creation so long as all of your code compiles via
Your implementation of
checkmust be case-insensitive. In other words, if
foois in dictionary, then
checkshould return true given any capitalization thereof; none of
FOOshould be considered misspelled.
Capitalization aside, your implementation of
checkshould only return
truefor words actually in
dictionary. Beware hard-coding common words (e.g.,
the), lest we pass your implementation a
dictionarywithout those same words. Moreover, the only possessives allowed are those actually in
dictionary. In other words, even if
foo’sis not also in
You may assume that
checkwill only be passed strings with alphabetical characters and/or apostrophes.
You may assume that any
dictionarypassed to your program will be structured exactly like ours, lexicographically sorted from top to bottom with one word per line, each of which ends with
\n. You may also assume that
dictionarywill contain at least one word, that no word will be longer than
LENGTH(a constant defined in
dictionary.h) characters, that no word will appear more than once, and that each word will contain only lowercase alphabetical characters and possibly apostrophes.
Your spell-checker may only take
dictionaryas input. Although you might be inclined (particularly if among those more comfortable) to "pre-process" our default dictionary in order to derive an "ideal hash function" for it, you may not save the output of any such pre-processing to disk in order to load it back into memory on subsequent runs of your spell-checker in order to gain an advantage.
You MAY alter
hashand may look up hash functions on the internet, but make sure to cite your source!
Alright, ready to go?
Allow us to suggest that you whip up some dictionaries smaller than the 143,091-word default with which to test your code during development. And here’s Zamyla with some additional guidance:
If you planned ahead (and you should have), this one is easy! Hint: check the globals!
Take a look at
hash. It’s actually a pretty bad hash function, all things considered. Feel free to do some research into better hash functions, and replace it, though do take care to cite your source. Also note that there is a defined constant (
BUCKETS) that you should probably change if you alter the hash function.
As before, be sure that your spell-checker doesn’t leak any memory at all. Know that
valgrind watches for leaks while your program is actually running, so be sure to provide command-line arguments if you want
valgrind to analyze
speller while you use a particular
dictionary and/or text, as in the below.
valgrind --leak-check=full ./speller texts/austinpowers.txt
If you run
valgrind without specifying a
speller, your implementations of
unload won’t actually get called (and thus analyzed).
And don’t forget about your other good buddy,
How to check whether your program is outting the right misspelled words? Well, you’re welcome to consult the "answer keys" that are inside of the
keys directory that’s inside of your
spelling directory. For instance, inside of
keys/austinpowers.txt are all of the words that your program should think are misspelled.
You could therefore run your program on some text in one window, as with the below.
And you could then run the staff’s solution on the same text in another window, as with the below.
And you could then compare the windows visually side by side. That could get tedious quickly, though. So you might instead want to "redirect" your program’s output to a file (just like you may have done with
generate in Problem Set 3), as with the below.
./speller texts/austinpowers.txt > student.txt ~cs50/pset5/speller texts/austinpowers.txt > staff.txt
You can then compare both files side by side in the same window with a program like
diff, as with the below.
diff -y student.txt staff.txt
Alternatively, to save time, you could just compare your program’s output (assuming you redirected it to, e.g.,
student.txt) against one of the answer keys without running the staff’s solution, as with the below.
diff -y student.txt keys/austinpowers.txt
If your program’s output matches the staff’s,
diff will output two columns that should be identical except for, perhaps, the running times at the bottom. If the columns differ, though, you’ll see a
| where they differ. For instance, if you see
MISSPELLED WORDS MISSPELLED WORDS FOTTAGE FOTTAGE INT INT > EVIL'S s s > EVIL'S Farbissina Farbissina
that means your program (whose output is on the left) does not think that
EVIL’s is misspelled, even though the staff’s output (on the right) does, as is implied by the absence of
EVIL’s in the lefthand column and the presence of
EVIL’s in the righthand column.
To test your code less manually (though still not exhaustively), you may also execute the below.
check50 2015.fall.pset5.speller dictionary.c dictionary.h staff.o Makefile
check50 does not check for memory leaks, so be sure to run
valgrind as prescribed as well.
How to assess just how fast (and correct) your code is? Well, as always, feel free to play with the staff’s solution, as with the below, and compare its numbers against yours.
This was Misspellings (Part 2).