# Problem: Server (Part 1)

Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, the `#cs50ap` channel on CS50x Slack (after signing up), or the CS50 Facebook group.

## Objectives

• Become familiar with HTTP.

• Apply familiar techniques in unfamiliar contexts.

• Transition from C to web programming

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

### Reasonable

• Communicating with classmates about problems in English (or some other spoken language).

• Discussing the course’s material with others in order to understand it better.

• Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.

• Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.

• Reviewing past years' quizzes, tests, and solutions thereto.

• Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.

• Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.

• Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.

• Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.

• Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

### Not Reasonable

• Accessing a solution to some problem prior to (re-)submitting your own.

• Asking a classmate to see his or her solution to a problem before (re-)submitting your own.

• Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.

• Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.

• Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.

• Looking at another individual’s work during a quiz or test.

• Paying or offering to pay an individual for work that you may submit as (part of) your own.

• Providing or making available solutions to problems to individuals who might take this course in the future.

• Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.

• Searching for or soliciting outright solutions to problems online or elsewhere.

• Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).

• Submitting (after possibly modifying) the work of another individual beyond allowed snippets.

• Submitting the same or similar work to this course that you have submitted or will submit to another.

• Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.

• Viewing another’s solution to a problem and basing your own solution on it.

## Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope

To what extent does your code implement the features required by our specification?

Correctness

To what extent is your code consistent with our specifications and free of bugs?

Design

To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?

Style

To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

First, join David for a tour of HTTP, the “protocol” via which web browsers and web servers communicate.

Next, consider reviewing some of these examples from Week 7, via which we introduced HTML, the language in which web pages are written.

And also some of these examples, via which we introduced CSS, the language with which web pages can be stylized.

Next, consider reviewing some of these examples, via which we introduced HTML forms, which we used to submit GET queries to Google.

For another perspective altogether, join Daven for a tour of HTML too. Don’t miss the bloopers at the end!

Finally, join Joseph (and Rob) for a closer look at CSS.

## Getting Started

``update50``

within a terminal window to make sure your workspace is up-to-date.

Then execute

``cd ~/workspace/chapterB``

at your prompt to ensure that you’re inside of `unit6` (which is inside of `workspace` which is inside of your home directory). Then execute

``wget http://docs.cs50.net/2016/ap/problems/server/server.zip``

to download this problem’s distro. Unzip the ZIP file (remember how?) and then delete the ZIP file from your `unit6` directory. Navigate into your newly-created `server` directory (remember how?) and type:

``ls``

You should see that your directory contains five files and a folder.

``Makefile	parser.c	parser.h	pointers.c	public/		server.o``

Now execute

``tree``

(which is a hierarchical, recursive variant of `ls`), and you should see that the directory contains the below.

``````.
├── Makefile
├── public
│   ├── cat.html
│   ├── cat.jpg
│   ├── favicon.ico
│   ├── hello.html
│   ├── hello.php
│   └── test
│       └── index.html
└── server.o``````

Dang it, still C. But some other stuff too!

Go ahead and take a look at `cat.html`. Pretty simple, right? Looks like it has an `img` tag, the value of whose `src` attribute is `cat.jpg`.

Next, take a look at `hello.html`. Notice how it has a `form` that’s configured to submit via GET a `text` field called `name` to `hello.php`. Make sense? IIf not, try taking another look at the walkthrough for `search-0.html`.

Now take a look at `hello.php`. Notice how it’s mostly HTML but inside its `body` is a bit of PHP code:

``<?= htmlspecialchars($_GET[“name”]) ?>`` The `<?=` notation just means “echo the following value here”. `htmlspecialchars`, meanwhile, is just an atrociously named function whose purpose in life is to ensure that special (even dangerous!) characters like `<` are properly “escaped” as HTML “entities”. See http://php.net/manual/en/function.htmlspecialchars.php for more details if curious. Anyhow `$_GET` is a “superglobal” variable inside of which are any HTTP parameters that were passed via GET to `hello.php`. More specifically, it’s an “associative array” (i.e., hash table) with keys and values. Per that HTML form in `hello.html`, one such key should be `name`! But more on all that in a bit.

Now the fun part. Open up `parser.h` and `parser.c`.

The challenge ahead is to implement the parsing part of a web server that knows how to serve static content (i.e., files ending in `.html`, `.jpg`, et al.) and dynamic content (i.e., files ending in `.php`).

Want to try out the staff’s solution before we dive into distribution code? Execute the below to download the latest version of the staff’s solution, as the version in CS50 IDE by default is outdated. Note that the `O` in `-O` is a capitalized letter `O`, not a zero.

``````sudo wget –O ~cs50/chapterB/server http://docs.cs50.net/2016/ap/problems/server/server
sudo chmod a+x ~cs50/chapterB/server``````

Then execute the below to run the staff’s implementation of `server`.

``~cs50/chapterB/server``

You should see these instructions:

``Usage: server [-p port] /path/to/root``

Looks a bit complex, but that’s just a conventional way of saying:

• This program’s name is `server`.

• To specify a (TCP) port number on which `server` should listen for HTTP requests, include `-p` as a command-line argument, followed by (presumably) a number. The brackets imply that specifying a port is optional. (If you don’t specify, the program will default to port 8080, which is required by CS50 IDE.)

• The last command-line argument to `server` should be the path to your server’s “root” (the directory from which files will be served).

Let’s try it out. Execute the below within your own `~/workspace/chapterB` directory so that the staff’s solution uses your own copy of `public` as its root.

``~cs50/chapterB/server public``

You should see output like the below.

``````Using /home/Ubuntu/workspace/chapterB/public for server’s root
Listening on port 8080``````

Toward the top-right corner of CS50 IDE, meanwhile, you should see your workspace’s “fully qualified domain name,” an address of the form `ide50-username.cs50.io`, where `username` is your own username. Visit `https://ide50-username.cs50.io/` (where `username` is your own username) in another tab. You should see a “directory listing” (i.e. an unordered list) of everything that’s in `public`, yes? And if you click cat.jpg, you should see a happy cat. If not, do just reach out to classmates or staff for a hand!

Incidentally, even though `server` is running on port 8080, CS50-IDE’s is “port-forwarding” port 80 (which, recall, is browsers’ default) to 8080 for you. That’s why you don’t need to specify 8080 in the URL you just visited.

Anyhow, assuming you indeed saw a happy cat in that tab, you should also see

``GET /cat.jpg HTTP/1.1``

in your terminal window, which is the “request line” that your browser sent to the server (which is being outputted by `server` via `printf` for diagnostics’ sake). Below that you should see all of the headers that your browser sent to `server` followed by

``HTTP/1.1 200 OK``

which is the server’s response to the browser (which is also being outputted by `server` via `printf` for diagnostics’ sake).

Next, just like I did in that short on HTTP, open up Chrome’s developer tools, per the instructions at https://developer.chrome.com/devtools. Then, once open, click the tools’ Network tab, and then, while holding down Shift, reload the page.

Not only should you see Happy Cat again, you should also see the below in your terminal window.

``````GET /cat.jpg HTTP/1.1
HTTP/1.1 200 OK``````

You might also see the below.

``````GET /favicon.ico HTTP/1.1
HTTP/1.1 200 OK``````

What’s going on if so? Well, by convention, a lot of websites have in their root directory a `favicon.ico` file, which is just a tiny icon that’s meant to be displayed in a browser’s address bar or tab. If you do see those lines in your terminal window, that just means Chrome is guessing that your server, too , might have `favicon.ico` file, which it does!

Here’s a quick walkthrough if a demo may help.

Alright, now try visiting `https://ide50-username.cs50.io/cat.html`. (Note the `.html` instead of `.jpg` this time.) You should see Happy Cat again, possibly with a bit of margin around him (simply because of Chrome’s default CSS properties). If you look at the developer tools’ Network tab (possibly after reloading, if they weren’t still open), you should see that Chrome first requested `cat.html` followed by `cat.jpg`, since the latter, really, was specified as the value of the `img` element’s `src` attribute that we saw earlier in `cat.html`. To confirm as much, take a look at the developer tools’ Elements tab, wherein you’ll see a pretty-printed version of the HTML in `cat.html`. You can even change the HTML, but only Chrome’s in-memory copy thereof. To change the actual file, you’d need to do so in the usual way within CS50 IDE. Incidentally, you might find it interesting to tinker with the developer tools’ Styles tab, too. Even though this page doesn’t’ have any CSS of its own, you can see and change (temporarily) Chrome’s default CSS properties via that tab.

Okay, one last test. Try visiting `https://ide50-username.cs50.io/hello.html`. Go ahead and input your name into the form and then submit it, as by clicking the button or hitting Enter. You should find yourself at a URL like `https://ide50-username.cs50.io/hello.php?name=Alice` (albeit with your name, not Alice’s, unless your name is also Alice), where a personalized hello awaits! That’s what we mean by “dynamic” content. By submitting that form, you provided input (i.e., your name) to the server, which then generated output just for you. (That input was in the form of an “HTTP parameter” called `name`, the value of which was your name.) Indeed, if you look at the page’s source code (as via the developer tools’ Elements tab), you’ll see your name embedded within the HTML! By contrast, files like `cat.jpg` and `cat.html` (and even `hello.html`) are “static” content, since they’re not dynamically generated.

Neat, eh?? Though odds are you’ll find it easier to test your own code via a command line than with a browser. So let’s show you one other technique.

Open up a second terminal window and position it alongside your first. In the first terminal window, execute

``~cs50/chapterB/server public``

from within your `~/workspace/unit6` directory, if the server isn’t already running. Then, in the second terminal window, execute the below. (Note the `http://` this time instead of `https://`.)

``curl –i http://localhost:8080/``

If you haven’t used `curl` before, it’s a command-line program with which you can send HTTP requests (and more) to a server in order to see its responses. The `-i` flag tells `curl` to include responses’ HTTP headers in the output. Odds are, whilst debugging your server, you’ll find it more convenient (and revealing!) to see all of that via `curl` than by poking around Chrome’s developer tools.

Incidentally, take care not to request `cat.jpg` (or any binary file) via `curl`, else you’ll see quite a mess! (You’re about to try, aren’t you.)

Unfortunately, your server right now doesn’t have functionality… yet! If you open up `parser.h`, you’ll see declarations and descriptions for six functions, `error`, `extract_request`, `extract_headers`, `parse`, `extract_query`, and `respond`. While implementing your server, do take care to note the follow.

• You may alter `Makefile`

• You may alter `parser.h`, but may not alter the declarations of any of the functions therein. Odds are, you won’t need to change `parser.h`.

• You may alter `parser.c`, and in fact, must in order to complete the implementation of your server.

• Do not delete the `server.o` file as that’s where most of the server’s implementation is held!

## OmgLikeWhut

Recall that HTTP messages adhere to a “grammar,” which is to say they’re formatted according to a set of rules, patterns that web servers and web browsers can parse. Consider a (simplified) grammar below, wherein any bold symbol is further defined by some other rule. Know that `CRLF` represents `\r\n`, that `SP` represents a single white space, that `*` means “zero or more” (of whatever’s in parentheses), that `/` means “or” and that square brackets mean something’s optional.

``````HTTP-message	=	*start-line*
CRLF
[ message-body ]
start-line	=	*request-line* /  *status-line*
request-line	=	method SP *request-target* SP HTTP-version CRLF
status-line	=	HTTP-version SP status-code SP reason-phrase CRLF

One important task of web servers is to parse the HTTP message to ensure the messages are “grammatically correct”.

Before we jump into coding, let’s review a old friend from our past: pointers (you’ll thank us later)! Recall that a `string` is just a `char*`, which is a pointer to single character. The program will then continue reading 1 byte (the size of a `char`) at a time until a `\0` is reached. So if we have the following string:

``char* word = “hello, world”;``

Then the `char*` pointer, word, would point to the first letter in the string, which in this case, is `h`. Let’s arbitrarily say that `h` is located in memory at location `0x05` (remember hex?). Then the `e` would be at `0x06`, the `l` at `0x07` and so on and so forth until the NULL terminator is read at `0x11`. So if we had a pointer that pointed to the NULL terminator, and the pointer, word, we can get the length of the string by subtracting the pointer, word, from the pointer that points to the NULL terminator (`0x11 – 0x05 = 0x0C` which is 12, the length of the string word). Remember, all strings end with a NULL terminator, even an empty string such as `””`!

Still confused? Open up `pointers.c` and take a look at the code inside. Here, we parse a sentence, `”hello, world”` to find the first and second words of the sentence. We isolate the variables, once in a char array and once using dynamically allocated memory. Note that `strcpy` copies in the `’\0’` whereas `strncpy` does not. You’re more than welcome, and in fact encouraged, to use `pointers.c` as a reference throughout this problem.

## Divide and Conquer

Feel free to divide the work in whichever manner you believe to be the most efficient, but our recommendation is for one person to take on the `extract_request`, `extract_headers`, and `extract_query`, while the other person tackles `parse`. Be sure to communicate throughout the whole coding and problem-solving process to make the problem much easier! After all, two heads are better than one.

## Such Parsing, Much Wow

If you try to run your version of `server`, you’ll see that it doesn’t do much at all. Now that we had that little review of pointers, your job, as a team, is to implement the parsing functionality. Let’s dive in.

### extract_request

Complete the implementation of `extract_request` in such a way that the function takes the parameter `message`, which is the HTTP message, and returns the request-line.

The request-line of the message is the part of HTTP message up to and including the first `CRLF` (or `\r\n`). If no `CRLF` is found, respond to the browser with 500 Internal Server Error by calling

``error(500);``

and returning NULL. Else, if a `CRLF` is found, “extract” the request-line by dynamically allocating (remember how?) a `char*` and taking care to copy only the request-line into the new `char*`. Do take care to ensure all your strings end in a NULL terminator and not to worry about freeing your dynamically allocated memory! We take care of that for you in `server.o`.

Odds are you’ll find functions like `strchr`, `strstr`, `strcpy`, `strncpy`, `strncmp`, `strcmp`, and/or `strcasecmp` of help!

Complete the implementation of `extract_headers` in such a way that the function takes the parameter `content` and extracts the headers, responding with the interpreter’s content.

Similarly to `extract_request`, the headers are the part up to and including the first `CRLF CRLF` in `content`. If no `CRLF CRLF` is found, `free` `content`, respond to the browser with 500 Internal Server Error, and stop the function by simply calling `return`.

Else, copy in the header information to a new variable, though this time the variable need not be in dynamically allocated memory since it is not being returned by the function.

Finally, call `respond` with a `200`. If you look at `parser.h`, you’ll see that `respond` takes four arguments, `code`, `headers`, `body`, and `length`, where `code` is your response code to the browser, `headers` are the headers you extracted in the function, `body` is the part of the HTTP message that comes after the headers (after the `CRLF CRLF`), and `length` is the length of the body. Odds are your call to `respond` will look something like

``respond(200, headers, needle + 4, length – size of headers);``

where `headers` is the extracted headers and `needle` (we use the variable name needle to represent finding a needle in a haystack, wherein `content` is the haystack here) is a pointer to `CRLF CRLF`. To calculate the length of the body, calculate the size of the headers and subtract that from `length`, which was passed to `extract_headers` as an argument.

Odds are you’ll find functions like `strchr`, `strstr`, `strcpy`, `strncpy`, `strncmp`, `strcmp`, and/or `strcasecmp` of help!

### extract_query

Complete the implementation of `extract_query` in such a way that the function takes in `target`, the request-target of the HTTP message, and stores the absolute path and query in `abs_path` and `query` respectively.

Per 5.3 of http://tools.ietf.org/html/rfc7230, a `request-target` can take several forms, the only one of which your server needs to support is

``absolute-path [ “?” query ]``

whereby `absolute-path` (which will not contain `?`) might optionally be followed by a `?` followed by a `query`. For example, a `request-target` may look like

``````/hello.php
/hello.php?
/hello.php?q=Alice``````

where `/hello.php` is the `absolute-path` for all three above, but the only query present in any of the three `request-target’s above is `q=Alice`. Do take note that the `?` is part of neither the `absolute-path` nor the `query`. The presence of a `?` simply indicates that there may possibly be a `query` that follows afterwards.

Per `target`, the `request-target` passed into `extract_query` as an argument, so that the `absolute-path` is stored at the address in `abs_path` and the `query` is stored at the address in `query`. If that substring is absent (even if a `?` is present), then `query` should be `””`, thereby consuming one byte, whereby `query[0]` is `’\0’`.

For instance, if `request-target` is `/hello.php` or `/hello.php?` then `query` should have a value of `””`. And if `request-target` is `/hello.php?q=Alice`, then `query` should have a value of `q=Alice`.

Odds are you’ll find functions like `strchr`, `strstr`, `strcpy`, `strncpy`, `strncmp`, `strcmp`, and/or `strcasecmp` of help!

### parse

Complete the implementation of `parse` in such a way that the function parses (i.e. iterates over) `line`, ensuring that the `request-line` is “grammatically correct”.

Per 3.1.1 of http://tools.ietf.org/html/rfc7230, a `request-line` is defined as

``method SP request-target SP HTTP-version CRLF``

wherein `SP` represents a single space (` `) and `CRLF` represents `\r\n`. None of `method`, `request-target`, and `HTTP-version`, meanwhile, may contain `SP`.

Parse `line` by isolating the `method`, `request-target`, `HTTP-version` and `CRLF` and storing them into separate `char` array variables, so that a variable named `target` would only hold the `request-target`.

Ensure that `request-line` (which is passed into `parse` as `line`) is consistent with the following rules. If is not, respond to the browser with the appropriate response code, by calling the `error` with the response code as shown below.

``error(400);``
• If `method` is not `GET`, respond to the browser with 405 Method Not Allowed and return `false`;

• If `request-target` does not begin with `/`, respond to the browser with 501 Not Implemented and return `false`;

• If `request-target` contains a `”`, respond to the browser with 400 Bad Request and return `false`;

• If `HTTP-version` is not `HTTP/1.1`, respond to the browser with 505 HTTP Version Not Supported and return `false`;

• If there is no `\r\n`, respond to the browser with 414 Request-URI Too Long and return `false`;

• If there are more or fewer than two `SP`, respond to the browser with 400 Bad Request and return `false`;

• If anything else in the `request-line` is incorrectly formatted, respond to the browser with 400 Bad Request and return `false`;

• Else if everything is properly formatted, call `extract_query` and return `true` as per below

``````extract_query(target, abs_path, query);
return true;``````

wherein `target` is the `request-target` you extracted earlier, and `abs_path` and `query` are the arguments passed into `parse`.

Odds are you’ll find functions like `strchr`, `strstr`, `strcpy`, `strncpy`, `strncmp`, `strcmp`, and/or `strcasecmp` of help!

This was Server (Part 1).