Problem: Server (Part 2)
Questions? Feel free to head to CS50 on Reddit, CS50 on StackExchange, the #cs50ap
channel on CS50x Slack (after signing up), or the CS50 Facebook group.
Objectives
-
Become familiar with HTTP.
-
Apply familiar techniques in unfamiliar contexts.
-
Transition from C to web programming.
Academic Honesty
This course’s philosophy on academic honesty is best stated as "be reasonable." The course recognizes that interactions with classmates and others can facilitate mastery of the course’s material. However, there remains a line between enlisting the help of another and submitting the work of another. This policy characterizes both sides of that line.
The essence of all work that you submit to this course must be your own. Collaboration on problems is not permitted (unless explicitly stated otherwise) except to the extent that you may ask classmates and others for help so long as that help does not reduce to another doing your work for you. Generally speaking, when asking for help, you may show your code or writing to others, but you may not view theirs, so long as you and they respect this policy’s other constraints. Collaboration on quizzes and tests is not permitted at all. Collaboration on the final project is permitted to the extent prescribed by its specification.
Below are rules of thumb that (inexhaustively) characterize acts that the course considers reasonable and not reasonable. If in doubt as to whether some act is reasonable, do not commit it until you solicit and receive approval in writing from your instructor. If a violation of this policy is suspected and confirmed, your instructor reserves the right to impose local sanctions on top of any disciplinary outcome that may include an unsatisfactory or failing grade for work submitted or for the course itself.
Reasonable
-
Communicating with classmates about problems in English (or some other spoken language).
-
Discussing the course’s material with others in order to understand it better.
-
Helping a classmate identify a bug in his or her code, such as by viewing, compiling, or running his or her code, even on your own computer.
-
Incorporating snippets of code that you find online or elsewhere into your own code, provided that those snippets are not themselves solutions to assigned problems and that you cite the snippets' origins.
-
Reviewing past years' quizzes, tests, and solutions thereto.
-
Sending or showing code that you’ve written to someone, possibly a classmate, so that he or she might help you identify and fix a bug.
-
Sharing snippets of your own solutions to problems online so that others might help you identify and fix a bug or other issue.
-
Turning to the web or elsewhere for instruction beyond the course’s own, for references, and for solutions to technical difficulties, but not for outright solutions to problems or your own final project.
-
Whiteboarding solutions to problems with others using diagrams or pseudocode but not actual code.
-
Working with (and even paying) a tutor to help you with the course, provided the tutor does not do your work for you.
Not Reasonable
-
Accessing a solution to some problem prior to (re-)submitting your own.
-
Asking a classmate to see his or her solution to a problem before (re-)submitting your own.
-
Decompiling, deobfuscating, or disassembling the staff’s solutions to problems.
-
Failing to cite (as with comments) the origins of code, writing, or techniques that you discover outside of the course’s own lessons and integrate into your own work, even while respecting this policy’s other constraints.
-
Giving or showing to a classmate a solution to a problem when it is he or she, and not you, who is struggling to solve it.
-
Looking at another individual’s work during a quiz or test.
-
Paying or offering to pay an individual for work that you may submit as (part of) your own.
-
Providing or making available solutions to problems to individuals who might take this course in the future.
-
Searching for, soliciting, or viewing a quiz’s questions or answers prior to taking the quiz.
-
Searching for or soliciting outright solutions to problems online or elsewhere.
-
Splitting a problem’s workload with another individual and combining your work (unless explicitly authorized by the problem itself).
-
Submitting (after possibly modifying) the work of another individual beyond allowed snippets.
-
Submitting the same or similar work to this course that you have submitted or will submit to another.
-
Using resources during a quiz beyond those explicitly allowed in the quiz’s instructions.
-
Viewing another’s solution to a problem and basing your own solution on it.
Assessment
Your work on this problem set will be evaluated along four axes primarily.
- Scope
-
To what extent does your code implement the features required by our specification?
- Correctness
-
To what extent is your code consistent with our specifications and free of bugs?
- Design
-
To what extent is your code written well (i.e., clearly, efficiently, elegantly, and/or logically)?
- Style
-
To what extent is your code readable (i.e., commented and indented with variables aptly named)?
To obtain a passing grade in this course, all students must ordinarily submit all assigned problems unless granted an exception in writing by the instructor.
Getting Started
First, log into cs50.io and execute
update50
within a terminal window to make sure your workspace is up-to-date.
Below are two options for getting started with this problem. The first option is for those who wish to start with the staff’s implementation of the features from Server (Part 1) already implemented for them. The second option is for those who wish to complete their own implementation of Server, relying on their own solution to Part 1 as their foundation. Only choose one of the below two options.
Then, after having chosen your option and followed all the steps therein, pick up at "Get Servered".
Option 1: Start from a Clean Slate
In your terminal window, execute
cd ~/workspace/chapterB
Then execute
wget http://docs.cs50.net/2016/ap/problems/server/2/server2.zip
Confirm you’ve downloaded that file, then unzip server2.zip (remember how?) and remove the ZIP file (remember how?).
Then navigate into the server2 directory and list its contents (remember how?) and you should find that the directory contains four files and one folder:
Makefile parser.c parser.h public/ server.c
Take care to only edit server.c for this problem, and not parser.c and parser.h. Though you are free to alter Makefile, you will most likely not find a need to do so.
Option 2: Extend Your Server
In your terminal window, execute
cd ~/workspace/chapterB
Then execute
wget http://docs.cs50.net/ap/2016/ap/problems/server/2/server2.zip
Confirm you’ve downloaded that file, then unzip server2.zip (remember how?) and remove the ZIP file (remember how?).
Then navigate into the server2 directory and list its contents (remember how?) and you should find that the directory contains four files and one folder:
Makefile parser.c parser.h public/ server.c
Copy and paste your parser.c from Part 1 into the distro’s parser.c, and if relevant, also copy in your parser.h. Though you are again free to alter this problem’s Makefile, do not copy in the Makefile from Part 1, since this Makefile is a bit different from Part 1’s.
Get Servered
The files in public/ as well as parser.c and parser.h should be familiar to you from Problem 6-6. But server.c, maybe not so much. In this problem, you will implement the serving side of the server in order to create a complete server! Let’s dive into that distribution code, starting with a high-level overview.
And now a lower-level tour through the code.
server.c
Open up server.c, if not open already. Let’s take a tour.
-
Atop the file are a bunch of "feature test macro requirements" that allow us to use certain functions that are declared (conditionally) in the header files further below.
-
Defined next are a few constants that specify limits on HTTP request sizes. We’ve (arbitrarily) based their values on defaults used by Apache, a popular web server. See http://httpd.apache.org/docs/2.2/mod/core.html if curious.
-
Defined next is BYTES, a constant that specifies how many bytes we’ll eventually be reading into buffers at a time.
-
Next are a bunch of header files, including parser.h, followed by a definition of BYTE, which we’ve indeed defined as an 8-bit char, followed by a bunch of prototypes.
-
Finally, just above main are just a few global variables.
main
Let’s now walk through main.
-
Atop main is an initialization of what appears to be a global variable called errno. In fact, errno is defined in errno.h and is used by quite a few functions to indicate (via an int), in cases of error, precisely which error has occurred. See man errno for more details.
-
Shortly thereafter is a call to getopt, which is a function declared in unistd.h that makes it easier to parse command-line arguments. See man 3 getopt if curious. Notice how we use getopt (and some Boolean expressions) to ensure that server is used properly.
-
Next notice the call to start (for which you may have noticed a prototype earlier). More on that later.
-
Below that is a declaration of a struct sigaction via which we’ll listen for SIGINT (i.e., control-c), calling handler (a function defined by us elsewhere in server.c) if heard.
-
And then, after declaring some variables, main enters an infinite while loop.
  -
  Atop that loop, we first free any memory that might have been allocated by a previous iteration of the loop.
  -
  We then check whether we’ve been "signaled" via control-c to stop the server. Thereafter, within an if statement, is a call to connected, a function you will implement so that it returns true if a client (e.g., a browser or even curl) has connected to the server.
  -
  After that are calls to extract_request and parse, which you implemented in Problem 6-6.
  -
  Next is a bunch of code that decodes that path (decoding any URL-encoded characters like %20) and "resolves" the path to a local path, figuring out exactly what file was requested on the server itself.
  -
  Below that, we ascertain whether that path leads to a directory or to a file and handle the request accordingly, ultimately calling list, interpret, or transfer.
    -
    For directories (that don’t have an index.php or index.html file inside them), we call list in order to display the directory’s contents.
    -
    For files ending in .php (whose "MIME type" is text/x-php), we call interpret.
    -
    For other (supported) files, we call transfer.
And that’s it for main! Notice, though, that throughout main are a few uses of continue, the effect of which is to jump back to the start of that infinite loop. Just before continue in some cases, too, is a call to error (another function we wrote) with an HTTP status code. Together, those lines allow the server to handle and respond to errors just before returning its attention to new requests.
connected
Oh no, seems like we didn’t implement this one. Back to this later.
error
Spend a bit of time looking through error, which is the function via which we respond to browsers with errors (e.g., 404). This function, though perhaps a bit long, should consist mostly of familiar constructs. (If curious, we’re using log10 simply to figure out how many digits, and thus chars, code is.)
freedir
This function exists simply to facilitate freeing memory that’s allocated by a function called scandir that we call in list.
handler
Thankfully, a short one! This function (called whenever a user hits control-c) essentially tells main to call stop by setting signaled, a global variable, to true.
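To see how a struct sigaction and a handler like this one fit together, consider the sketch below. The names signaled and handler come from the tour above, but the helper listen_for_sigint and the choice of volatile sig_atomic_t (rather than a bool) for the flag are assumptions for illustration, not necessarily what server.c does.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdbool.h>
#include <string.h>

// flag that the handler sets and that a main loop would poll
volatile sig_atomic_t signaled = 0;

// called by the operating system when SIGINT (control-c) arrives;
// it does as little as possible and returns
void handler(int signal)
{
    (void) signal;
    signaled = 1;
}

// hypothetical helper: registers handler for SIGINT, returning true on success
bool listen_for_sigint(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGINT, &act, NULL) == 0;
}
```

A loop can then check signaled at the top of each iteration and clean up when it becomes nonzero, much as main does.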
htmlspecialchars
This function, named identically to that PHP function we saw earlier, escapes characters (e.g., < as &lt;) that might otherwise "break" an HTML page. We call it from list, lest some file or directory we’re listing have a "dangerous" character in its name.
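For a feel of what such escaping involves in C, here is a small sketch. The name escape_html and the particular set of characters handled (&, <, >, and ") are assumptions for illustration; the staff’s htmlspecialchars may handle more or fewer.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated copy of s in which
// &, <, >, and " are replaced by HTML entities, or NULL if out of memory.
// The caller is responsible for freeing the result.
char *escape_html(const char *s)
{
    // worst case: every character becomes the 6-character "&quot;"
    char *out = malloc(strlen(s) * 6 + 1);
    if (out == NULL)
    {
        return NULL;
    }

    char *p = out;
    for (; *s != '\0'; s++)
    {
        const char *entity = NULL;
        switch (*s)
        {
            case '&': entity = "&amp;"; break;
            case '<': entity = "&lt;"; break;
            case '>': entity = "&gt;"; break;
            case '"': entity = "&quot;"; break;
        }

        if (entity != NULL)
        {
            // copy the entity in place of the dangerous character
            size_t n = strlen(entity);
            memcpy(p, entity, n);
            p += n;
        }
        else
        {
            *p++ = *s;
        }
    }
    *p = '\0';
    return out;
}
```

Notice that even this little transformation requires sizing a buffer up front and remembering to free it afterward.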
indexes
Though perhaps a bit confusing, this function checks to see if index.php or index.html exists in a directory or folder, and if indeed one exists, returns the path to that file so that the server can load the page.
interpret
This function enables the server to interpret PHP files. It’s a bit cryptic at first glance, but in a nutshell, all we’re doing, upon receiving a request for, say, hello.php, is executing a line like
QUERY_STRING="name=Alice" REDIRECT_STATUS=200 SCRIPT_FILENAME=/home/ubuntu/workspace/unit6/server2/public/hello.php php-cgi
the effect of which is to pass the contents of hello.php to PHP’s interpreter (i.e., php-cgi), with any HTTP parameters supplied via an "environment variable" called QUERY_STRING. Via load (a function we wrote), we then read the interpreter’s output into memory. And then we respond to the browser with (dynamically generated) output like
HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.9-1ubuntu4.12
Content-type: text/html
<!DOCTYPE html>
<html>
<head>
<title>hello</title>
</head>
<body>
hello, Alice
</body>
</html>
Even though the PHP code in hello.php is pretty-printed, its output isn’t quite as pretty. (Take a look at hello.php. Can you deduce why?)
Odds are you’re unfamiliar with popen. That function opens a "pipe" to a process (php-cgi in our case), which provides us with a FILE pointer via which we can read that process’s standard output (as though it were an actual file).
Notice how this function calls load, though, in order to read the PHP interpreter’s output into memory, after which the headers are extracted by extract_headers, a function you implemented in the previous problem.
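To get a feel for reading a process’s standard output through a pipe, here is a minimal sketch built on popen and pclose. The helper name read_command is hypothetical, and unlike interpret it reads into a fixed-size buffer rather than calling load.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>

// Hypothetical helper: runs command, copies up to size - 1 bytes of its
// standard output into buffer, and NUL-terminates it.
// Returns the number of bytes read, or -1 on error.
long read_command(const char *command, char *buffer, size_t size)
{
    if (size == 0)
    {
        return -1;
    }

    // "r" means we want to read what the process writes
    FILE *stream = popen(command, "r");
    if (stream == NULL)
    {
        return -1;
    }

    // from here on, stream behaves much like a file opened with fopen
    size_t n = fread(buffer, 1, size - 1, stream);
    buffer[n] = '\0';

    // streams from popen are closed with pclose, not fclose
    if (pclose(stream) == -1)
    {
        return -1;
    }
    return (long) n;
}
```

For instance, read_command("echo hello", buffer, sizeof(buffer)) would leave "hello\n" in buffer on a system with a standard shell.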
list
Ah, here’s that function that generates a directory listing. Notice how much code it takes to generate HTML using C, thanks to requisite memory management.
load
This function loads a file into dynamically allocated memory, storing the address of the loaded file in content (notice how the argument passed into load is BYTE** content, a pointer that points to a pointer that points to a BYTE) and storing the length of the loaded file in length.
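The pointer-to-pointer pattern can be easier to digest in a stripped-down form. Below is a sketch, not the staff’s load: the name load_sketch and the value 512 for BYTES are assumptions, and BYTE is defined here as an unsigned char purely for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// assumed values for illustration; server.c defines its own
#define BYTES 512
typedef unsigned char BYTE;

// Reads all of file into newly allocated memory. On success, stores the
// memory's address in *content and its size in *length and returns true.
bool load_sketch(FILE *file, BYTE **content, size_t *length)
{
    *content = NULL;
    *length = 0;

    BYTE buffer[BYTES];
    size_t n;
    while ((n = fread(buffer, 1, BYTES, file)) > 0)
    {
        // grow the allocation by exactly as many bytes as were just read
        BYTE *grown = realloc(*content, *length + n);
        if (grown == NULL)
        {
            free(*content);
            *content = NULL;
            *length = 0;
            return false;
        }
        memcpy(grown + *length, buffer, n);
        *content = grown;
        *length += n;
    }

    // fread also returns 0 on error, so distinguish that from end of file
    if (ferror(file))
    {
        free(*content);
        *content = NULL;
        *length = 0;
        return false;
    }
    return true;
}
```

Because the caller passes the address of its own BYTE* (e.g., load_sketch(file, &content, &length)), the function can change where that pointer points, which a plain BYTE* parameter would not allow.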
lookup
A simple function that looks up the MIME type of a file based on its extension, returning that type (e.g., text/html) for supported extensions, else returning NULL.
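One way such a lookup might be written is sketched below. The name lookup_sketch and the handful of extensions it supports are assumptions for illustration (only text/html and text/x-php are mentioned in this problem); the staff’s lookup may support a different set.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <string.h>
#include <strings.h>

// Hypothetical helper: returns the MIME type for path's extension,
// or NULL if the extension is missing or unsupported.
const char *lookup_sketch(const char *path)
{
    if (path == NULL)
    {
        return NULL;
    }

    // find the last dot, where the extension begins
    const char *dot = strrchr(path, '.');
    if (dot == NULL)
    {
        return NULL;
    }

    // compare case-insensitively so that .HTML and .html are alike
    if (strcasecmp(dot, ".html") == 0) return "text/html";
    if (strcasecmp(dot, ".css") == 0) return "text/css";
    if (strcasecmp(dot, ".php") == 0) return "text/x-php";
    if (strcasecmp(dot, ".jpg") == 0) return "image/jpeg";
    return NULL;
}
```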
reason
This function simply maps HTTP "status codes" (e.g., 200) to "reason phrases" (e.g., OK).
redirect
Ah, neat, this function redirects a client to another location (i.e., URL) by sending a status code of 301 plus a Location header.
request
Ah, this one’s a biggie. But worth reading through. When the server receives a request from a client, the server doesn’t know in advance how many characters the request will comprise. And so this function iteratively reads bytes from the client, one buffer’s worth at a time, calling realloc as needed to store the entire message (i.e., request).
Notice this function’s use of pointers, dynamic memory allocation, pointer arithmetic, and more. All somewhat familiar by now, but definitely a lot of it all in one place! Do try to understand each and every line, if only for the practice. Ultimately, it keeps reading bytes from the client until it encounters \r\n\r\n (aka CRLF CRLF), which, according to HTTP’s specs, marks the end of a request’s headers.
If curious, know that read is quite like fread except that it reads from a "file descriptor" (i.e., an int) instead of from a FILE pointer (i.e., FILE*). See its man page for more.
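The search for CRLF CRLF is simple enough to sketch on its own. The helper below, whose name headers_end is hypothetical, scans a buffer of known length (which need not be NUL-terminated) and reports where the headers end; the staff’s request may go about it differently.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns the offset just past the first "\r\n\r\n"
// in message (i.e., the length of the request line plus headers),
// or -1 if that sequence hasn't arrived yet.
long headers_end(const char *message, size_t length)
{
    const char *needle = "\r\n\r\n";
    for (size_t i = 0; i + 4 <= length; i++)
    {
        // compare 4 raw bytes starting at offset i
        if (memcmp(message + i, needle, 4) == 0)
        {
            return (long) (i + 4);
        }
    }
    return -1;
}
```

A reading loop could call something like this after each read and keep going for as long as the result is -1.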
respond
It’s this function that actually sends to a client an HTTP response, given a status code, headers, a body, and that body’s length. For instance, it’s this function that sends a response like the below.
HTTP/1.1 200 OK
X-Powered-By: PHP/5.5.9-1ubuntu4.12
Content-type: text/html
<!DOCTYPE html>
<html>
<head>
<title>hello</title>
</head>
<body>
hello, Alice
</body>
</html>
Know that dprintf is quite like printf (or, really, fprintf) except that the former, like read, works with a "file descriptor" instead of a FILE*.
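As a tiny illustration of writing formatted text to a file descriptor, the sketch below sends just a response’s first line. The helper name send_status_line is hypothetical; respond itself sends headers and a body as well.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdio.h>

// Hypothetical helper: writes a status line such as "HTTP/1.1 200 OK\r\n"
// to the file descriptor fd. Returns the number of bytes written,
// or a negative value on error.
int send_status_line(int fd, int code, const char *phrase)
{
    // same format strings as printf, but the first argument is an int
    return dprintf(fd, "HTTP/1.1 %i %s\r\n", code, phrase);
}
```

Passing a connected client’s file descriptor as fd would send that line over the network, since sockets are file descriptors too.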
create
Hmm, seems like this function isn’t fully implemented yet.
listener
Darn, another TODO. More on that later.
start
Here’s the function that started it all (pun intended). This function finds the path to the server’s root, ensuring it is executable, then calls create and listener to help start the server.
stop
Drat, another one. But at least it’s our last!
transfer
This function’s purpose in life is to transfer a file from the server to a client. Whereas interpret handles dynamic content (generated by PHP scripts), transfer handles static content (e.g., JPEGs). Notice how this function calls load in order to read some file from disk.
urldecode
This function, also named after a PHP function, URL-decodes a string, converting special characters like %20 back to their original values.
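URL-decoding boils down to spotting a % followed by two hexadecimal digits. Here is a sketch, with the caveat that the name urldecode_sketch is hypothetical and that its handling of + (as a space, per PHP’s urldecode) and of malformed sequences (copied through untouched) are assumptions about behavior, not a description of the staff’s code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated, URL-decoded copy of s,
// or NULL if out of memory. The caller is responsible for freeing it.
char *urldecode_sketch(const char *s)
{
    // decoding never lengthens a string, so strlen(s) + 1 bytes suffice
    char *out = malloc(strlen(s) + 1);
    if (out == NULL)
    {
        return NULL;
    }

    char *p = out;
    while (*s != '\0')
    {
        if (s[0] == '%' && isxdigit((unsigned char) s[1]) && isxdigit((unsigned char) s[2]))
        {
            // convert the two hex digits after % into a single character
            char hex[3] = {s[1], s[2], '\0'};
            *p++ = (char) strtol(hex, NULL, 16);
            s += 3;
        }
        else if (*s == '+')
        {
            *p++ = ' ';
            s++;
        }
        else
        {
            *p++ = *s++;
        }
    }
    *p = '\0';
    return out;
}
```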
Service Check
Phew. Now that we’re done with our tour of the distro code, let’s implement the broken parts of the server! As in Part 1, this is a collaboration problem, and you may thus divide the work between the two of you however you see fit. Our recommendation is for one person to tackle connected and listener while the other handles create and stop.
Though this problem may appear to be conceptually challenging, the number of lines of code you will actually write will not be much. If ever struggling, consult your partner for help!
Recall from Part 1, if you’d like to play with the staff’s implementation of server, execute in a terminal window
~cs50/chapterB/server public
and if you’d like to test out your server with curl, execute
curl -i http://localhost:8080/
connected
Complete the implementation of connected in such a way that the function checks whether a client has connected to the server.
First, declare a client address of type struct sockaddr_in and set all of its bytes to 0. Odds are, you’ll find the function memset of use.
Then, create a variable of type socklen_t that holds the length, or size, of the client address variable. Your code should be something reminiscent of
size_t length = x;
with size_t replaced with socklen_t and the length of the client address determined by an operator you’ve seen in past problems (how do you determine the size of a variable?).
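If memset is rusty, here is a sketch of those first two steps wrapped in a hypothetical helper called reset_address. It is meant only to show the shape of the calls; in your own code you’d likely do this inline with your own variable names.

```c
#define _POSIX_C_SOURCE 200809L

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

// Hypothetical helper: sets every byte of *addr to 0 and returns the
// struct's size as a socklen_t.
socklen_t reset_address(struct sockaddr_in *addr)
{
    // memset takes a pointer, a byte value, and a count of bytes
    memset(addr, 0, sizeof(*addr));

    // sizeof yields a size_t, which converts to socklen_t here
    return sizeof(*addr);
}
```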
Now, assign to cfd, a global variable, the return value of accept, passing in the appropriate arguments. accept takes three arguments: a socket file descriptor (which, if you remember, we’ve declared as a global), the address of a sockaddr variable, and the address of the variable that holds that address’s length. If curious, take a look at the man page for accept. Odds are, your line of code will look something like
cfd = accept(x, y, z);
where x, y, and z are substituted with the appropriate arguments.
Finally, if the value of cfd is still -1, return false, else true.
listener
Complete the implementation of listener in such a way that the function listens for a connection and announces the port in use when connected.
Let’s first check the man page for listen. It seems listen takes two arguments: a socket file descriptor and a backlog. Let’s call listen with our socket file descriptor and SOMAXCONN as our backlog. For those curious, SOMAXCONN is simply the maximum, defined in sys/socket.h, for the number of pending connections the server will queue. If the return value of listen is -1, call stop to terminate the server.
Next, let’s do something similar to what we did in connected. Let’s declare another struct sockaddr_in address without assigning to it any value, then get the length of your new address.
Once more, let’s pull up the man page for getsockname. Looks familiar, eh? Let’s call getsockname and if the return value is -1, call stop.
If stop isn’t invoked, then our printf statements will print to the terminal the port on which the server is listening. But if you look at our second print statement, -50 is definitely not the right port that we’re listening on! So delete that TODO and -50 and replace them with
ntohs(addr.sin_port)
where addr is the name of your variable of type struct sockaddr_in that you declared earlier.
For those curious, ntohs is a function that converts a 16-bit value’s byte order from network byte order to the host’s (i.e., your own machine’s) byte order. Some machines are "little endian" while others are "big endian", which are just names for the byte ordering a system uses, so it’s important for machines on a network to agree on one byte order for the data they exchange. For more information on ntohs, feel free to use its man page. And if curious about little and big endian, do feel free to read up on them, but no need to feel obliged!
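Byte order is easier to believe once you look at actual bytes. The sketch below, whose helper name port_bytes is hypothetical, converts a port to network byte order and copies out its two bytes exactly as they sit in memory.

```c
#define _POSIX_C_SOURCE 200809L

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: stores in out the two bytes of port as laid out
// in memory after conversion to network byte order.
void port_bytes(uint16_t port, unsigned char out[2])
{
    // host-to-network for a 16-bit ("short") value
    uint16_t network = htons(port);
    memcpy(out, &network, sizeof(network));
}
```

Since 8080 is 0x1F90 in hexadecimal and network byte order is big endian, port_bytes(8080, out) leaves 0x1F in out[0] and 0x90 in out[1] on any machine, and ntohs(htons(8080)) gets you back 8080.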
create
Complete the implementation of create in such a way that the function creates a server socket.
Let’s first take a look at the man page for socket.
Hmm, it seems that socket takes a domain, type, and protocol as its arguments. Let’s call socket, using AF_INET and SOCK_STREAM for the domain and type, and let’s leave protocol as 0, since we don’t need to specify a particular protocol for the socket. Let’s also assign the return value of socket to our global socket variable, sfd.
If curious, AF_INET simply means we’d like TCP/IP (specifically, IPv4) as our communication domain, and SOCK_STREAM lets socket know that we’d like our socket to provide a sequenced, reliable stream of bytes for our connection.
According to the man page of socket, socket returns -1 upon failure, so let’s check for failure and if indeed socket returned -1, call stop to terminate the server.
Now, let’s allow for reuse of address, to avoid any "Address already in use" messages. We can do that by writing the following code.
int optval = 1;
setsockopt(x, SOL_SOCKET, SO_REUSEADDR, y, z);
where x, y, and z are the socket file descriptor, the address of the optval variable, and the size of the optval variable, respectively.
Finally, let’s bind a name, or more accurately an address, to the socket. Declare a server address variable of type struct sockaddr_in and set all of its bytes to 0. Odds are, you’ll find the function memset of use. Because your server address is of type struct sockaddr_in, we can access the fields inside the struct (similar to RGBTriple way back when) by using dot notation. If I wanted to set a field inside the struct called name, for example, then I could assign "David" to it by doing
serv_addr.name = "David";
if my variable’s name were serv_addr. Now assign
-
The sin_family field of your variable to AF_INET
-
The sin_port field to htons(port)
-
And the sin_addr.s_addr field to htonl(INADDR_ANY)
For those curious, htons and htonl are functions that convert a value’s byte order from the host’s (i.e., your own machine’s) byte order to network byte order, the reverse of ntohs; htons handles 16-bit values and htonl 32-bit values. Some machines are "little endian" while others are "big endian", which are just names for the byte ordering a system uses, so it’s important for machines on a network to agree on one byte order for the data they exchange. For more information on the functions, feel free to use their man pages. And if curious about little and big endian, do feel free to read up on them, but no need to feel obliged!
Afterwards, let’s replace that ugly /* TODO */ false inside the if statement with a real Boolean expression. We’ve created a name for the socket but have yet to assign the name to the socket, so let’s do that now.
Check out the man page for bind. Seems like bind takes three arguments: a socket file descriptor (hmm, sounds like something we’ve already seen in this function), the address of (i.e., a pointer to) a sockaddr_in, and the length, or size, of that address. Upon failure, bind returns -1, so let’s change our if statement to check if bind fails. Your expression should look something like
if (bind(x, y, z) == -1)
where x, y, and z are replaced by the appropriate arguments. If finding yourself in a pickle, consult your partner and/or connected and listener for inspiration!
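If it helps to see the system calls from this problem working together outside of server.c, below is a standalone sketch. It is deliberately not a drop-in solution: the function name listen_on_any_port is hypothetical, it uses a local file descriptor rather than the distribution’s globals, it returns -1 rather than calling stop, and it asks the kernel to choose a free port by binding to port 0.

```c
#define _POSIX_C_SOURCE 200809L

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: opens a listening TCP socket on a port chosen by
// the kernel. On success, stores that port in *port and returns the
// socket's file descriptor; on failure, returns -1.
int listen_on_any_port(int *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
    {
        return -1;
    }

    // allow reuse of the address
    int optval = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

    // describe the address to bind: any local interface, port 0
    // (port 0 means "kernel, please pick a free port")
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    // bind, listen, then ask which port we actually got
    socklen_t length = sizeof(addr);
    if (bind(fd, (struct sockaddr *) &addr, length) == -1 ||
        listen(fd, SOMAXCONN) == -1 ||
        getsockname(fd, (struct sockaddr *) &addr, &length) == -1)
    {
        close(fd);
        return -1;
    }

    *port = ntohs(addr.sin_port);
    return fd;
}
```

Notice the casts to struct sockaddr *: bind and getsockname are declared in terms of the generic struct sockaddr, so a pointer to a struct sockaddr_in is cast on the way in.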
stop
Complete the implementation of stop in such a way that the function frees any allocated memory not freed and closes any file descriptors still open.
Here’s how.
-
Check to see if the global variable, root, is NULL. If not, then call free on it.
-
Then check to see if the server socket (now, which global variable is that?) is open. If still open, call close on the server socket.
-
Finally, stop the server by calling the exit function and passing in errsv as its argument.
This was Server (Part 2).