Problem: Server (Part 2)

Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, the #cs50ap channel on CS50x Slack (after signing up), or the CS50 Facebook group.

Objectives

  • Become familiar with HTTP.

  • Apply familiar techniques in unfamiliar contexts.

  • Transition from C to web programming.

Academic Honesty

This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.

The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.

Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.

Reasonable

  • Communicating with classmates about problems in English (or some other spoken language).

  • Discussing the course’s material with others in order to understand it better.

  • Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.

  • Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.

  • Reviewing past years' quizzes, tests, and solutions thereto.

  • Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.

  • Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.

  • Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.

  • Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.

  • Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.

Not Reasonable

  • Accessing a solution to some problem prior to (re-)submitting your own.

  • Asking a classmate to see his or her solution to a problem before (re-)submitting your own.

  • Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.

  • Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.

  • Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.

  • Looking at another individual’s work during a quiz or test.

  • Paying or offering to pay an individual for work that you may submit as (part of) your own.

  • Providing or making available solutions to problems to individuals who might take this course in the future.

  • Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.

  • Searching for or soliciting outright solutions to problems online or elsewhere.

  • Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).

  • Submitting (after possibly modifying) the work of another individual beyond allowed snippets.

  • Submitting the same or similar work to this course that you have submitted or will submit to another.

  • Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.

  • Viewing another’s solution to a problem and basing your own solution on it.

Assessment

Your work on this problem set will be evaluated along four axes primarily.

Scope

To what extent does your code implement the features required by our specification?

Correctness

To what extent is your code consistent with our specifications and free of bugs?

Design

To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?

Style

To what extent is your code readable (i.e., commented and indented with variables aptly named)?

To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.

Getting Started

First, log into cs50.io and execute

update50

within a terminal window to make sure your workspace is up-to-date.

Below are two options for getting started with this problem. The first option is for those who wish to start with the staff’s implementation of the features from Server (Part 1) already implemented for them. The second option is for those who wish to complete their own implementation of Server, relying on their own solution to Part 1 as their foundation. Only choose one of the below two options.

Then, after having chosen your option and followed all the steps therein, pick up at "Get Servered".

Option 1: Start from a Clean Slate

In your terminal window, execute

cd ~/workspace/chapterB

Then execute

wget http://docs.cs50.net/2016/ap/problems/server/2/server2.zip

Confirm you’ve downloaded that file, then unzip server2.zip (remember how?) and remove the ZIP file (remember how?).

Then navigate into the server2 directory and list its contents (remember how?), and you should find that the directory contains four files and one folder:

Makefile	parser.c	parser.h	public/		server.c

Take care to only edit server.c for this problem, and not parser.c and parser.h. Though you are free to alter Makefile, you will most likely not find a need to do so.

Option 2: Extend Your Server

In your terminal window, execute

cd ~/workspace/chapterB

Then execute

wget http://docs.cs50.net/ap/2016/ap/problems/server/2/server2.zip

Confirm you’ve downloaded that file, then unzip server2.zip (remember how?) and remove the ZIP file (remember how?).

Then navigate into the server2 directory and list its contents (remember how?), and you should find that the directory contains four files and one folder:

Makefile	parser.c	parser.h	public/		server.c

Copy and paste your parser.c from Part 1 into the distro’s parser.c, and if relevant, also copy in your parser.h. Though you are free to alter this problem’s Makefile, do not copy in the Makefile from Part 1, since this Makefile is a bit different from that of Server (Part 1).

Get Servered

The files in public/ as well as parser.c and parser.h should be familiar to you from Problem 6-6. Less familiar, perhaps, is server.c. In this problem, you will implement the serving side of the server in order to create a complete server! Let’s dive into that distribution code, starting with a high-level overview.

And now a lower-level tour through the code.

server.c

Open up server.c, if not open already. Let’s take a tour.

  • Atop the file are a bunch of "feature test macro requirements" that allow us to use certain functions that are declared (conditionally) in the header files further below.

  • Defined next are a few constants that specify limits on HTTP requests’ sizes. We’ve (arbitrarily) based their values on defaults used by Apache, a popular web server. See http://httpd.apache.org/docs/2.2/mod/core.html if curious.

  • Defined next is BYTES, a constant that specifies how many bytes we’ll eventually be reading into buffers at a time.

  • Next are a bunch of header files, including parser.h, followed by a definition of BYTE, which we’ve indeed defined as an 8-bit char, followed by a bunch of prototypes.

  • Finally, just above main are just a few global variables.

main

Let’s now walk through main.

  • Atop main is an initialization of what appears to be a global variable called errno. In fact, errno is defined in errno.h and is used by quite a few functions to indicate (via an int), in cases of error, precisely which error has occurred. See man errno for more details.

  • Shortly thereafter is a call to getopt, which is a function declared in unistd.h that makes it easier to parse command-line arguments. See man 3 getopt if curious. Notice how we use getopt (and some Boolean expressions) to ensure that server is used properly.

  • Next notice the call to start (for which you may have noticed a prototype earlier). More on that later.

  • Below that is a declaration of a struct sigaction via which we’ll listen for SIGINT (i.e., control-c), calling handler (a function defined by us elsewhere in server.c) if heard.

  • And then, after declaring some variables, main enters an infinite while loop.

    • Atop that loop, we first free any memory that might have been allocated by a previous iteration of the loop.

    • We then check whether we’ve been "signaled" via control-c to stop the server. Thereafter, within an if statement, is a call to connected, a function you will implement so that it returns true if a client (e.g., a browser or even curl) has connected to the server.

    • After that are calls to extract_request and parse, which you implemented in Problem 6-6.

    • Next is a bunch of code that decodes that path (decoding any URL-encoded characters like %20) and "resolves" the path to a local path, figuring out exactly what file was requested on the server itself.

    • Below that, we ascertain whether that path leads to a directory or to a file and handle the request accordingly, ultimately calling list, interpret, or transfer.

      • For directories (that don’t have an index.php or index.html file inside them), we call list in order to display the directory’s contents.

      • For files ending in .php (whose "MIME type" is text/x-php), we call interpret.

      • For other (supported) files, we call transfer.

And that’s it for main! Notice, though, that throughout main are a few uses of continue, the effect of which is to jump back to the start of that infinite loop. Just before continue in some cases, too, is a call to error (another function we wrote) with an HTTP status code. Together, those lines allow the server to handle and respond to errors just before returning its attention to new requests.

connected

Oh no, seems like we didn’t implement this one. Back to this later.

error

Spend a bit of time looking through error, which is that function via which we respond to browsers with errors (e.g., 404). This function, though perhaps a bit long, should consist mostly of familiar constructs. (If curious, we’re using log10 simply to figure out how many digits, and thus chars, code is.)

freedir

This function exists simply to facilitate freeing memory that’s allocated by a function called scandir that we call in list.

handler

Thankfully, a short one! This function (called whenever a user hits control-c) essentially tells main to call stop by setting signaled, a global variable, to true.

htmlspecialchars

This function, named identically to that PHP function we saw earlier, escapes characters (e.g., < as &lt;) that might otherwise "break" an HTML page. We call it from list, lest some file or directory we’re listing have a "dangerous" character in its name.

indexes

Though perhaps a bit confusing, this function checks to see if index.php or index.html exists in a directory or folder, and if indeed one exists, returns the path to that file so that the server can load the page.

interpret

This function enables the server to interpret PHP files. It’s a bit cryptic at first glance, but in a nutshell, all we’re doing, upon receiving a request for, say, hello.php, is executing a line like

QUERY_STRING="name=Alice" REDIRECT_STATUS=200 SCRIPT_FILENAME=/home/ubuntu/workspace/unit6/server2/public/hello.php php-cgi

the effect of which is to pass the contents of hello.php to PHP’s interpreter (i.e., php-cgi), with any HTTP parameters supplied via an "environment variable" called QUERY_STRING. Via load (a function we wrote), we then read the interpreter’s output into memory. And then we respond to the browser with (dynamically generated) output like

HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.9-1ubuntu4.12
Content-type: text/html

<!DOCTYPE html>

<html>
	<head>
		<title>hello</title>
	</head>
	<body>
		hello, Alice
	</body>
</html>

Even though the PHP code in hello.php is pretty-printed, its output isn’t quite as pretty. (Take a look at hello.php. Can you deduce why?)

Odds are you’re unfamiliar with popen. That function opens a pipe to a process (php-cgi in our case), which provides us with a FILE pointer via which we can read that process’s standard output (as though it were an actual file).

Notice how this function calls load, though, in order to read the PHP interpreter’s output into memory, while the extraction of headers from that output is done by extract_headers, a function you implemented in the previous problem.

list

Ah, here’s that function that generates a directory listing. Notice how much code it takes to generate HTML using C, thanks to requisite memory management.

load

This function loads a file into dynamically allocated memory, storing the address of the loaded file in content (notice how the argument passed into load is BYTE** content, a pointer that points to a pointer that points to a BYTE) and storing the length of the loaded file in length.

lookup

A simple function that looks up the MIME type of a file, returning that type (e.g., text/html) if the file’s extension is supported, else NULL.

reason

This function simply maps HTTP "status codes" (e.g., 200) to "reason phrases" (e.g., OK).

redirect

Ah, neat, this function redirects a client to another location (i.e., URL) by sending a status code of 301 plus a Location header.

request

Ah, this one’s a biggie. But worth reading through. When the server receives a request from a client, the server doesn’t know in advance how many characters the request will comprise. And so this function iteratively reads bytes from the client, one buffer’s worth at a time, calling realloc as needed to store the entire message (i.e., request).

Notice this function’s use of pointers, dynamic memory allocation, pointer arithmetic, and more. All somewhat familiar by now, but definitely a lot of it all in one place! Do try to understand each and every line, if only for the practice. Ultimately, it keeps reading bytes from the client until it encounters \r\n\r\n (aka CRLF CRLF), which, according to HTTP’s specs, marks the end of a request’s headers.

If curious, know that read is quite like fread except that it reads from a "file descriptor" (i.e., an int) instead of from a FILE pointer (i.e., FILE*). See its man page for more.

respond

It’s this function that actually sends to a client an HTTP response, given a status code, headers, a body, and that body’s length. For instance, it’s this function that sends a response like the below.

HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.9-1ubuntu4.12
Content-type: text/html

<!DOCTYPE html>

<html>
	<head>
		<title>hello</title>
	</head>
	<body>
		hello, Alice
	</body>
</html>

Know that dprintf is quite like printf (or, really, fprintf) except that the former, like read, operates on a "file descriptor" instead of a FILE*.

create

Hmm, seems like this function isn’t fully implemented yet.

listener

Darn, another TODO. More on that later.

start

Here’s the function that started it all (pun intended). This function finds the path to the server’s root, ensuring it is executable, then calls create and listener to help start the server.

stop

Drat, another one. But at least it’s our last!

transfer

This function’s purpose in life is to transfer a file from the server to a client. Whereas interpret handles dynamic content (generated by PHP scripts), transfer handles static content (e.g., JPEGs). Notice how this function calls load in order to read some file from disk.

urldecode

This function, also named after a PHP function, URL-decodes a string, converting special characters like %20 back to their original values.

Service Check

Phew. Now that we’re done with our tour of the distro code, let’s implement the broken parts of the server! As with Part 1, this is a collaboration problem, and you may thus divide the work between the two of you however you see fit. Our recommendation is for one person to tackle connected and listener while the other handles create and stop.

Though this problem may appear to be conceptually challenging, you will not actually write many lines of code. If ever struggling, consult your partner for help!

Recall from Part 1 that, if you’d like to play with the staff’s implementation of server, you can execute in a terminal window

~cs50/chapterB/server public

and if you’d like to test out your server with curl, execute

curl -i http://localhost:8080/

connected

Complete the implementation of connected in such a way that the function checks whether a client has connected to the server.

First, define a client address of type struct sockaddr_in and set all of its bytes to 0. Odds are, you’ll find the function memset of use.

Then, create a variable of type socklen_t that holds the length, or size, of the client address variable. Your code should be something reminiscent of

size_t length = x;

with size_t replaced with socklen_t and the length of the client address determined by invoking a function reminiscent of a past problem (how do you determine the size of a variable?).

Now, assign to cfd, a global variable, the return value of accept, passing in the appropriate arguments. accept takes three arguments: a socket file descriptor (which, if you remember, we’ve declared as a global), the address of a sockaddr variable, and the address of that variable’s length. (Its man page also describes accept4, a variant that takes a fourth argument, which will not be needed in this situation.) If curious, take a look at the man page for accept. Odds are, your line of code will look something like

cfd = accept(x, y, z);

where x, y, and z are substituted with the appropriate arguments.

Finally, if the value of cfd has not changed from -1, return false, else true.

listener

Complete the implementation of listener in such a way that the function listens for a connection and announces the port in use when connected.

Let’s first check the man page for listen. It seems listen takes two arguments: a socket file descriptor and a backlog. Let’s first call listen with our socket file descriptor and SOMAXCONN as our backlog. For those curious, SOMAXCONN is simply the max limit, defined in sys/socket.h, for the number of pending connections the server will queue. If the return value of listen is -1, call stop to terminate the server.

Next, let’s do something similar to what we did in connected. Let’s define another struct sockaddr_in address without assigning to it any value, then get the length of your new address.

Once more, let’s pull up the man page for getsockname. Looks familiar, eh? Let’s call getsockname and if the return value is -1, call stop.

If stop isn’t invoked, then our printf statements will print to the terminal the port on which the server is listening. But if you look at our second print statement, -50 is definitely not the right port that we’re listening on! So delete that TODO and -50 and replace it with

ntohs(addr.sin_port)

where addr is the name of your variable of type sockaddr_in that you declared earlier.

For those curious, ntohs is a function that converts a 16-bit value from network byte order to the host’s byte order (i.e., that of the machine on which your server is running). Some machines are "little endian" while others are "big endian", which are just names for the byte ordering a system uses, so it’s important for machines on a network to be able to read data regardless of their own byte order. For more information on ntohs, feel free to use its man page. And if curious about little and big endian, feel free to read up on them, but no need to feel obliged!

create

Complete the implementation of create in such a way that the function creates a server socket.

Let’s first take a look at the man page for socket.

Hmm, it seems that socket takes a domain, type, and protocol, as its arguments. Let’s call socket, using AF_INET and SOCK_STREAM for the domain and type and let’s leave protocol as 0, since we don’t have a specified protocol for the socket. Let’s also assign the return value of socket to our global socket variable, sfd.

If curious, AF_INET simply means we’d like TCP/IP as our communication domain and SOCK_STREAM lets socket know that we’d like our socket to provide a sequenced, reliable stream of bytes for our connection.

According to the man page of socket, socket returns -1 upon failure, so let’s check for failure and if indeed socket returned -1, call stop to terminate the server.

Now, let’s allow for reuse of address, to avoid any "Address already in use" messages. We can do that by writing the following code.

int optval = 1;
setsockopt(x, SOL_SOCKET, SO_REUSEADDR, y, z);

where x, y, and z are the socket file descriptor, the address of the optval variable, and the size of the optval variable respectively.

Finally, let’s bind a name, or more accurately an address, to the socket. Create a server address variable of type struct sockaddr_in and set all of its bytes to 0. Odds are, you’ll find the function memset of use. Because your server address is of type struct sockaddr_in, we can access the fields inside the struct (similar to RGBTriple way back when) by using dot notation. If a struct had a field named name, for example, then I could assign "David" to that field by doing

serv_addr.name = "David";

if my variable’s name were serv_addr. Now assign

  • The sin_family field of your variable to AF_INET

  • The sin_port field to htons(port)

  • And the sin_addr.s_addr field to htonl(INADDR_ANY)

For those curious, htons and htonl are functions that convert a value (16 bits and 32 bits long, respectively) from the host’s byte order to network byte order, the reverse of what ntohs does. Some machines are "little endian" while others are "big endian", which are just names for the byte ordering a system uses, so it’s important for machines on a network to be able to read data regardless of their own byte order. For more information on the functions, feel free to use their man pages. And if curious about little and big endian, feel free to read up on them, but no need to feel obliged!

Afterwards, let’s replace that ugly /* TODO */ false inside the if statement with a real Boolean expression. We’ve created a name for the socket but have yet to assign the name to the socket, so let’s do that now.

Check out the man page for bind. Seems like bind takes three arguments: a socket file descriptor (hmm, sounds like something we’ve already seen in this function), the address of (i.e., a pointer to) a socket address like your sockaddr_in, and the length, or size, of that address. Upon failure, bind returns -1, so let’s change our if statement to check if bind fails. Your expression should look something like

if (bind(x, y, z) == -1)

where x, y, and z are replaced by the appropriate arguments. If finding yourself in a pickle, consult your partner and/or connected and listener for inspiration!

stop

Complete the implementation of stop in such a way that the function frees any allocated memory not freed and closes any file descriptors still open.

Here’s how.

  • Check to see if the global variable, root, is NULL. If not, then call free on it.

  • Then check to see if the server socket (now, which global variable is that?) is open. If still open, call close on the server socket.

  • Finally, stop the server by calling the exit function and passing in errsv as its argument.

This was Server (Part 2).