# Problem: Mispellings[1] (Part 1)

## Objectives

• Navigate a data structure of someone else’s design.

• Plug the memory leaks.

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

### Reasonable

• Communicating with classmates about problems in English (or some other spoken language).

• Discussing the course’s material with others in order to understand it better.

• Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.

• Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.

• Reviewing past years' quizzes, tests, and solutions thereto.

• Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.

• Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.

• Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.

• Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.

• Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

### Not Reasonable

• Accessing a solution to some problem prior to (re-)submitting your own.

• Asking a classmate to see his or her solution to a problem before (re-)submitting your own.

• Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.

• Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.

• Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.

• Looking at another individual’s work during a quiz or test.

• Paying or offering to pay an individual for work that you may submit as (part of) your own.

• Providing or making available solutions to problems to individuals who might take this course in the future.

• Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.

• Searching for or soliciting outright solutions to problems online or elsewhere.

• Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).

• Submitting (after possibly modifying) the work of another individual beyond allowed snippets.

• Submitting the same or similar work to this course that you have submitted or will submit to another.

• Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.

• Viewing another’s solution to a problem and basing your own solution on it.

## Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope

To what extent does your code implement the features required by our specification?

Correctness

To what extent is your code consistent with our specifications and free of bugs?

Design

To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?

Style

To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

Let’s brush up on a few recent topics before diving in. First, join Jackson and Lauren for tours of singly linked lists and hash tables.

Then, brush up your `valgrind` knowledge with Nate!

If still desiring more practice with linked lists or hash tables, Doug is here to help!

## Getting Started

``update50``

to ensure that your workspace is up-to-date! Next, navigate to the location in your IDE where you are writing code for Chapter A and execute the below:

``wget http://docs.cs50.net/2016/ap/problems/mispellings/mispellings.zip``

If you unzip the file and navigate inside the directory, you’ll see that it contains quite a few things!

``Makefile  dictionaries/  dictionary.c  dictionary.h  keys/  questions.txt  speller.c  staff.o  texts/``

The challenge ahead of you is to complete our already-started implementation of a spell-checker!

In `speller.c`, we’ve put together a program that’s designed to spell-check a file after loading a dictionary of words from disk into memory. Unfortunately, we didn’t get around to implementing the checking part. That (and a bit more) we leave to you!

Before we walk you through `speller.c`, go ahead and open up `dictionary.h`. Declared in that file are some functions; take note of what each should do. Now open up `dictionary.c`. Notice that we’ve implemented some of those functions, but only barely, just enough for this code to compile (the others, for now, are implemented completely in staff.o, you’ll have to write those in the second part of this problem).

Your job for now is to re-implement those functions as cleverly as possible so that this spell-checker works as advertised, based on the information we’ve left behind for you as to how this data structure is organized under the hood.

Let’s get you started.

### Making a Makefile

Recall that `make` automates compilation of your code so that you don’t have to execute `clang` manually along with a whole bunch of switches. However, as your programs grow in size, make won’t be able to infer from context anymore how to compile your code; you’ll need to start telling make how to compile your program, particularly when they involve multiple source (i.e., `.c`) files, as in the case of this problem set. And so we’ll utilize a `Makefile`, a configuration file that tells make exactly what to do. Open up `Makefile`, and let’s take a tour of its lines; notice that this `Makefile` in particular seems quite a bit more complex than others we have seen.

The line below defines a variable called `CC` that specifies that make should use `clang` for compiling.

``CC = clang``

The line below defines a variable called `CFLAGS` that specifies, in turn, that `clang` should use some flags, most of which should look familiar.

``CFLAGS = -ggdb3 -O0 -Qunused-arguments -std=c11 -Wall -Werror``

The line below defines a variable called `EXE`, the value of which will be our program’s name.

``EXE = speller``

The line below defines a variable called `HDRS`, the value of which is a space-separated list of header files used by `speller`.

``HDRS = dictionary.h``

The line below defines a variable called `LIBS`, the value of which is should be a space-separated list of libraries, each of which should be prefixed with `-l`. (Recall our use of `-lcs50` earlier this term.) Odds are you won’t need to enumerate any libraries for this problem set, but we’ve included the variable just in case.

``LIBS =``

The line below defines a variable called `SRCS`, the value of which is a space-separated list of C files (and in this case, also an object file for which you do not have the source code) that will collectively implement speller.

``SRCS = speller.c dictionary.c staff.o``

The line below defines a variable called `OBJS`, the value of which is identical to that of `SRCS`, except that each file’s extension is not `.c` but `.o`.

``OBJS = $(SRCS:.c=.o)`` The lines below define a "target" using these variables that tells make how to compile speller. ``````$(EXE): $(OBJS) Makefile$(CC) $(CFLAGS) -o$@ $(OBJS)$(LIBS)``````

The line below specifies that our `.o` files all "depend on" `dictionary.h` and `Makefile` so that changes to either induce recompilation of the former when you run `make`.

``$(OBJS):$(HDRS) Makefile``

Finally, the lines below define another target for cleaning up this problem set’s directory.

``````clean:
rm -f core \$(EXE) dictionary.o speller.o``````

Know that you’re welcome to modify this `Makefile` as you see fit. In fact, you should if you create any `.c` or `.h` files of your own. But be sure not to change any tabs (i.e., `\t`) to spaces, since `make` expects the former to be present below each target.

The net effect of all these lines is that you can compile `speller` with a single command, even though it comprises quite a few files:

``make speller``

Even better, you can also just execute:

``make``

And if you ever want to delete speller plus any `core` or `.o` files, you can do so with a single command:

``make clean``

In general, though, anytime you want to compile your code for this problem set, it should suffice to run:

``make``

### speller.c

Okay, next open up `speller.c` and spend some time looking over the code and comments therein. You won’t need to (and indeed, shouldn’t!) change anything in this file, but you should understand it nonetheless. Notice how, by way of `getrusage`, we’ll be "benchmarking" (i.e., timing the execution of) your implementations of `check`, `load`, `size`, and `unload`. Also notice how we go about passing `check`, word by word, the contents of some file to be spell-checked. Ultimately, we report each misspelling in that file along with a bunch of statistics.

Notice, incidentally, that we have defined the usage of `speller` to be

``Usage: speller [dictionary] text``

where `dictionary` is assumed to be a file containing a list of lowercase words, one per line, and `text` is a file to be spell-checked. As the brackets suggest, provision of `dictionary` is optional; if this argument is omitted, `speller` will use `dictionaries/large` by default. In other words, running

``./speller text``

will be equivalent to running

``./speller dictionaries/large text``

where `text` is the file you wish to spell-check. Suffice it to say, the former is easier to type! (Of course, `speller` will not be able to load any dictionaries until you implement `load` in `dictionary.c`! Until then, you’ll see Could not load.)

Within the default dictionary, mind you, are 143,091 words, all of which must be loaded into memory! In fact, take a peek at that file to get a sense of its structure and size. Notice that every word in that file appears in lowercase (even, for simplicity, proper nouns and acronyms). From top to bottom, the file is sorted lexicographically, with only one word per line (each of which ends with `\n`). No word is longer than 45 characters, and no word appears more than once. During development, you may find it helpful to provide `speller` with a `dictionary` of your own that contains far fewer words, lest you struggle to debug an otherwise enormous structure in memory. In `dictionaries/small` is one such dictionary. To use it, execute

``./speller dictionaries/small text``

where `text` is the file you wish to spell-check. Don’t move on until you’re sure you understand how `speller` itself works!

Odds are, you didn’t spend enough time looking over `speller.c`. Go back one square and walk yourself through it again!

### texts

So that you can test your implementation of `speller`, we’ve also provided you with a whole bunch of texts, among them the script from Austin Powers: International Man of Mystery, a sound bite from Ralph Wiggum, three million bytes from Tolstoy, some excerpts from Machiavelli and Shakespeare, the entirety of the King James V Bible, and more. So that you know what to expect, open and skim each of those files, all of which are in a directory called `texts` within your `mispellings` directory.

Now, as you should know from having read over `speller.c` carefully, the output of `speller`, if executed with, say,

``./speller texts/austinpowers.txt``

will eventually resemble the below. For now, try executing the staff’s solution (using the default dictionary) with the below.

``~cs50/pset5/speller texts/austinpowers.txt``

Below’s some of the output you’ll see. For amusement’s sake, we’ve excerpted some of our favorite "misspellings." And lest we spoil the fun, we’ve omitted our own statistics for now.

``````MISSPELLED WORDS

[...]
Bigglesworth
[...]
Virtucon
[...]
friggin'
[...]
trippy
[...]

WORDS MISSPELLED:
WORDS IN DICTIONARY:
WORDS IN TEXT:
TIME IN check:
TIME IN size:
TIME IN TOTAL:``````

`TIME IN load` represents the number of seconds that `speller` spends executing your implementation of `load`. `TIME IN check` represents the number of seconds that `speller` spends, in total, executing your implementation of `check`. `TIME IN size` represents the number of seconds that `speller` spends executing your implementation of `size`. `TIME IN unload` represents the number of seconds that `speller` spends executing your implementation of `unload`. `TIME IN TOTAL` is the sum of those four measurements.

Note that these times may vary somewhat across executions of `speller`, depending on what else CS50 IDE is doing, even if you don’t change your code.

Incidentally, to be clear, by "misspelled" we simply mean that some word is not in the `dictionary` provided.

And now this:

## Spell Checking

Alright, the challenge now before you is to implement `check` and `unload` in such a way that the program indeed outputs the correct list of misspelled words and the program suffers no memory leaks. But before you dive in, some specifications from us.

• You may not alter `speller.c`.

• You may not alter the function `hash` (yet!)

• You may alter `dictionary.c` (and, in fact, must in order to complete the implementations of `check` and `unload`), but you may not alter the declarations of either `check` or `unload` themselves.

• You may alter `dictionary.h`, but you may not alter the declarations of `hash`, `load`, `check`, `size`, or `unload`.

• You may alter `Makefile`.

• You may add functions to `dictionary.c` or to files of your own creation so long as all of your code compiles via `make`.

• Your implementation of `check` must be case-insensitive. In other words, if `foo` is in dictionary, then `check` should return true given any capitalization thereof; none of `foo`, `foO`, `fOo`, `fOO`, `fOO`, `Foo`, `FoO`, `FOo`, and `FOO` should be considered misspelled.

• Capitalization aside, your implementation of `check` should only return `true` for words actually in `dictionary`. Beware hard-coding common words (e.g., `the`), lest we pass your implementation a `dictionary` without those same words. Moreover, the only possessives allowed are those actually in `dictionary`. In other words, even if `foo` is in `dictionary`, `check` should return `false` given `foo’s` if `foo’s` is not also in `dictionary`.

• You may assume that `check` will only be passed strings with alphabetical characters and/or apostrophes.

• You may assume that any `dictionary` passed to your program will be structured exactly like ours, lexicographically sorted from top to bottom with one word per line, each of which ends with `\n`. You may also assume that `dictionary` will contain at least one word, that no word will be longer than `LENGTH` (a constant defined in `dictionary.h`) characters, that no word will appear more than once, and that each word will contain only lowercase alphabetical characters and possibly apostrophes.

• Your spell-checker may only take `text` and, optionally, `dictionary` as input. Although you might be inclined (particularly if among those more comfortable) to "pre-process" our default dictionary in order to derive an "ideal hash function" for it[2], you may not save the output of any such pre-processing to disk in order to load it back into memory on subsequent runs of your spell-checker in order to gain an advantage.

### check

Implement `check`!

Allow us to suggest that you whip up some small files to spell-check before trying out, oh, War and Peace. Here’s Zamyla:

Implement `unload`!

Be sure to free any memory that we allocated in `load`! (Per `dictionary.h`, the hash table itself is declared on the stack, but all of the nodes that are inserted into the hash table are dynamically allocated.) Remember that we used a hash table, and that you know how all the data in that hash table is organized and stored by virtue of its declaration and the definition of each node, even if you can’t see the exact mechanics we used to store it there. Here’s Zamyla with some final suggestions!

In fact, be sure that your spell-checker doesn’t leak any memory at all. Recall that `valgrind` is your newest best friend. Know that `valgrind` watches for leaks while your program is actually running, so be sure to provide command-line arguments if you want `valgrind` to analyze `speller` while you use a particular `dictionary` and/or text, as in the below.

``valgrind --leak-check=full ./speller texts/austinpowers.txt``

And don’t forget about your other good buddy, `gdb`.

## Checking Spell Checking

How to check whether your program is outting the right misspelled words? Well, you’re welcome to consult the "answer keys" that are inside of the `keys` directory that’s inside of your `mispellings` directory. For instance, inside of `keys/austinpowers.txt` are all of the words that your program should think are misspelled.

You could therefore run your program on some text in one window, as with the below.

``./speller texts/austinpowers.txt``

And you could then run the staff’s solution on the same text in another window, as with the below.

``~cs50/pset5/speller texts/austinpowers.txt``

And you could then compare the windows visually side by side. That could get tedious quickly, though. So you might instead want to "redirect" your program’s output to a file (just like you may have done quite a ways back, around Chapter 3), as with the below.

``````./speller texts/austinpowers.txt > student.txt
~cs50/pset5/speller texts/austinpowers.txt > staff.txt``````

You can then compare both files side by side in the same window with a utility like `diff`, as with the below.

``diff -y student.txt staff.txt``

Alternatively, to save time, you could just compare your program’s output (assuming you redirected it to, e.g., `student.txt`) against one of the answer keys without running the staff’s solution, as with the below.

``diff -y student.txt keys/austinpowers.txt``

If your program’s output matches the staff’s, `diff` will output two columns that should be identical except for, perhaps, the running times at the bottom. If the columns differ, though, you’ll see a `>` or `|` where they differ. For instance, if you see

``````MISSPELLED WORDS                                                MISSPELLED WORDS

FOTTAGE                                                         FOTTAGE
INT                                                             INT
> EVIL'S
s                                                               s
> EVIL'S
Farbissina                                                      Farbissina``````

that means your program (whose output is on the left) does not think that `EVIL’s` is misspelled, even though the staff’s output (on the right) does, as is implied by the absence of `EVIL’s` in the lefthand column and the presence of `EVIL’s` in the righthand column.

To test your code less manually (though still not exhaustively), you may also execute the below.

``check50 2015.fall.pset5.speller dictionary.c dictionary.h staff.o Makefile``

Note that `check50` does not check for memory leaks, so be sure to run `valgrind` as prescribed as well.

How to assess just how fast (and correct) your code is? Well, as always, feel free to play with the staff’s solution, as with the below, and compare its numbers against yours.

``~cs50/pset5/speller texts/austinpowers.txt``

This was Misspellings (Part 1).

1. Hope you appreciate the irony.
2. Don’t worry if you have no idea what we’re talking about, that would be some particularly advanced trickery!