Week 2
Last Time
-
We introduced programming with Scratch, and last week we learned a little bit of C, a textual language.
-
Our first, simple program just printed out
hello, world
:#include <stdio.h> int main(void) { printf("hello, world\n"); }
-
But there was lots of magic going on, and by introducing Scratch first we were able to learn the important concepts first.
-
The extra syntax is a common source of frustration and errors, so today we’ll talk a little bit about how to solve those problems.
-
-
We also learned to use CS50 IDE.
-
And concepts like:
-
functions
-
loops
-
variables
-
Boolean expressions
-
conditions
-
-
And functions in C like
printf
, but really in essence they had analogies in Scratch, like the purplesay
block. -
We learned to include the CS50 library, with helpful functions like:
-
get_char
-
get_double
-
get_float
-
get_int
-
get_long_long
-
get_string
-
…
-
-
And data types like:
-
bool
-
char
-
double
-
float
-
int
-
long long
-
string
-
…
-
-
We also ran into some limits of computing, like integer overflow or float imprecision, which could have severe real-world impact.
Debugging
-
Commonly, errors come from our compiler,
make
orclang
, as we’ll demonstrate:int main(void) { printf("hello, world\n"); }
-
When we try to compile just this, we get an error that looks like:
~/workspace/ $ make buggy0 clang -ggdb3 -O0 -std=c11 -Wall -Werror -Wshadow buggy0.c -lcs50 -lm -o buggy0 buggy0.c:3:5: error: implicitly declaring library function 'printf' with type 'int (const char *, ...)' [-Werror] printf("hello, world\n"); ^ buggy0.c:3:5: note: include the header <stdio.h> or explicitly provide a declaration for 'printf' 1 error generated. make: *** [buggy0] Error 1
-
The first line under our command that starts with
clang
is normal, and just whatmake
is doing for us. -
The next line tells us, in the file
buggy0.c
, line3
, character5
, there is an error. We are implicitly declaring a library function 'printf' … and we might not understand the entire message or every part of it, but we can guess at what parts mean. Declaring a function is like mentioning it or defining it, and recall last time we had to declare a function’s prototype before we could use it. -
Since we didn’t write
printf
, we shouldn’t try to write it ourselves, but rather include the library file that has the implementation, with#include <stdio.h>
in the first line. (Indeed, the last line of the error tells us as much!)
-
-
Now if we
make
our new file, everything should work.#include <stdio.h> int main(void) { printf("hello, world\n"); }
-
The first problem set introduced you to
help50
, a command written by staff which will help explain some errors that you still don’t understand. -
Now let’s try another buggy program:
#include <stdio.h> int main(void) { string s = get_string(); printf("hello, %s\n", s); }
-
Oh boy, trying to
make
that resulted in7 errors generated
. But scrolling up a little and fixing the first one first is generally a good idea. We see:buggy0.c:5:5: error: use of undeclared identifier 'string'; did you mean 'stdin'? string s = get_string(); ^~~~~~ stdin
-
We definitely meant to say
string
, so why is the compiler a little confused? Well,string
andget_string()
both come from the CS50 library, so we need to#include
that too.
-
-
Now let’s try to print a
#
10 times:#include <stdio.h> int main(void) { for (int i = 0; i <= 10; i++) { printf("#\n"); } }
-
Remember that a
for
loop like this one sets some variable to a starting count, checks to see whether we’ve reached the number of repeats, and then runs the code inside and adds to the counter.
-
-
We can compile this without any issues, but when we run it, we see 11
#
symbols, not 10 like we wanted. -
So let’s include a new function from the CS50 Library,
eprintf
, that prints out errors (or anything we want to mark as special) to the screen:#include <cs50.h> #include <stdio.h> int main(void) { for (int i = 0; i <= 10; i++) { printf("#\n"); eprintf("i is now %i\n", i); } }
-
Now we see a bunch of input:
~/workspace/ $ ./buggy0 # buggy0:buggy0.c:9: i is now 0 # buggy0:buggy0.c:9: i is now 1 # buggy0:buggy0.c:9: i is now 2 # buggy0:buggy0.c:9: i is now 3 # buggy0:buggy0.c:9: i is now 4 # buggy0:buggy0.c:9: i is now 5 # buggy0:buggy0.c:9: i is now 6 # buggy0:buggy0.c:9: i is now 7 # buggy0:buggy0.c:9: i is now 8 # buggy0:buggy0.c:9: i is now 9 # buggy0:buggy0.c:9: i is now 10
-
So it seems like
eprintf
is telling us the name of our program, the name of the file it’s from, and the line number it’s on. -
Hm, between 0 and 10, there are actually 11 numbers since we started at 0. So we could change our loop to stop before 10 with
i < 10
, or start withint i = 1
. But conventionally, we like to start counting at the lowest number, 0, and stop before we reach the number of repeats we want. -
So now we can fix our error, remove the
eprintf
, and call this program done.
-
-
Let’s try one more:
#include <cs50.h> #include <stdio.h> int get_negative_int(); int main(void) { int i = get_negative_int(); printf("%i is a negative integer\n", i); } int get_negative_int(void) { int n; do { printf("n is "); n = get_int(); } while (n > 0); return n; }
-
A lot going on here, but we can figure it out. Inside
get_negative_int
, we create a new variablen
, and get an int from the user whilen > 0
. Then we return it. -
get_negative_int()
hasvoid
inside its parentheses, since it doesn’t take any arguments, but it has anint
in front, since that’s the type of data we get back from it. -
Our program compiles, so we probably have a logical error. Let’s run it and try a few numbers:
~/workspace/ $ ./buggy3 n is 1 n is 2 n is 3 n is 50 n is -50 -50 is a negative integer ~/workspace/ $ ./buggy3 n is -1 -1 is a negative integer ~/workspace/ $ ./buggy3 n is -2 -2 is a negative integer ~/workspace/ $ ./buggy3 n is 0 0 is a negative integer
-
Hmm, it seems to work except for that very last case. Well, we could look at our code to try and figure out where that error happens, but once our program gets more complex we’ll need some better tools too.
-
We can use another CS50 tool called
debug50
, which lets us run our program step by step, line by line, and look at what’s happening.
-
-
Let’s run:
~/workspace/ $ debug50 ./buggy3 n is
-
A panel just popped up to the right, with a lot going on:
-
-
But if we type in a number for our program, it seems to do the normal thing and tell us it’s exiting:
~/workspace/ $ debug50 ./buggy3 n is -1 -1 is a negative integer Child exited with status 0 GDBserver exiting ~/workspace/ $
-
This really just means that our program finished and so the debugger will stop running, too.
-
-
So we can click in the extra space next to a line, and a red dot will appear, which will tell the debugger to pause the program:
-
Now as soon as we run the same command, we’ll stop at that line and the debugger will tell us about our variables:
-
So now let’s click the button next to the blue triangle at the top (which reads
Step Over
if you hover over it), which runs this line. -
And now in our terminal we’ll see the prompt, and if we type in
-1
to the prompt again, we’ll see that our variable has changed in the debugger. We’ll also see the line we’re on underCall Stack
in the debugger, and if we press the Step Over button again, we’ll see our-1
printed out. Finally, if we press Step Over one last time (on the ending brace formain
in our program), everything is complete and exits like before. -
Let’s run it again with the same command, but this time instead of clicking
Step Over
, we’ll click the button next to it (that looks like an arrow pointing downwards), andStep Into
the function. -
And all of a sudden we’ll jump to the first statement (the others are declaring variables and constructs, but not doing anything) in our
get_negative_int
function:-
We’ll put in a number,
0
, and pressStep Over
since the next line is justget_int
which we know works. -
But now we’re on the last line,
return n;
, instead of inside the loop like we’d otherwise want. (Remember the whole point of all this was to get a negative number!) -
So we see that our variable
n
is0
, and the condition waswhile (n > 0)
. -
Since
0
isn’t greater than0
, the loop doesn’t continue, and we aren’t asked for another number. -
So all we need was to change the condition to
while (n >= 0)
, and now our program should work correctly.
-
-
There’s another, less technical way to debug called rubber duck debugging. When a programmer is working alone, it’s a useful technique for them to imagine having a toy rubber duck, and explaining their code to the rubber duck. Sometimes, hearing ourselves give an explanation out loud can be useful in realizing where our errors might be!
-
And if that doesn’t work out, CS50 has a whole support staff ready to help!
Things to Know
-
CS50 Problem Sets are evaluated on 4 axes, scope, correctness, design, and style.
-
Scope is the amount of the problem set you’ve attempted.
-
Correctness is whether your program works as it’s supposed to.
-
Design is how well written your program is, based on qualities like efficiency and duplicated code, etc.
-
Style is how well-formatted your code is, where your indents are the same and your variables have appropriate names.
-
-
Each of these axes will have a grade ranging from 1 to 5, and there’s no need to worry about scores of 3s, with some 2s and 4s, in the beginning, since you’ll have lots of room to grow and improve over the semester.
-
The axes are also weighted with the following formula:
-
scope x (correctness x 3 + design x 2 + style x 1)
-
-
CS50 also takes academic honesty very seriously, and over years there’s been a small, but significant fraction of students who have been involved in issues. Since we have the technological ability to compare all submissions to problem sets to each other, and past years, and any sources online, we tend to catch more cases than other courses.
-
Our policy can be shortened to just "Be reasonable." And more descriptively, "The essence of all work you submit to this course must be your own."
-
We’re happy to allow classmates, TFs, friends to help you, but "… when asking for help, you may show your code to others, but you may not view theirs …"
-
This is explained more in the syllabus, and if you still have questions, please reach out to David or our head TFs for guidance!
-
We also have a regret clause, "If you commit some act that is not reasonable but bring it to the attention of the course’s heads within 72 hours, the course may impose local sanctions that may include an unsatisfactory or failing grade for work submitted, but the course will not refer the matter for further disciplinary action except in cases of repeated acts."
Cryptography
-
Now we dive into our first real-world application, crytography, or the ability to send and receive secret (encrypted) messages.
-
We watch a short clip from the movie A Christmas Story, where a child, Ralphie, excitedly decodes a secret message from the radio with a ring that maps letters to other letters, only to find that it is just an advertisement for Ovaltine, a beverage popular many years ago.
-
A method like that can be categorized as secret-key cryptography, where the sender and the receiver know some secret value, code, or phrase that can be used to encrypt and decrypt information.
-
Cryptography can also be reduced to an algorithm that takes inputs and produces outputs.
-
Inputs are the key and plaintext, or the message in unencrypted form that anyone can read, and the output is the ciphertext, or the encrypted message that only someone with the key can decrypt, or unscramble.
-
So let’s start by figuring out what a
string
really is. It’s a sequence of characters, in an array (a list of things right next to each other) in memory. -
If we wanted to store Zamyla’s name,
Zamyla
, we’d put each character in a box:------------------------- | Z | a | m | y | l | a | -------------------------
-
And this is important beacause we want to change one character at a time, say
A
toB
andB
toC
. -
We can see this in action with the following program:
#include <cs50.h> #include <stdio.h> #include <string.h> int main(void) { string s = get_string(); for (int i = 0; i < strlen(s); i++) { printf("%c\n", s[i]); } }
-
First, we include a new library,
string.h
, that includes functions to help us work with strings. Then we get a strings
from the user, and for the length of the string,strlen(s)
, we print out a character, whatevers[i]
is. Ands[i]
is just the notation to get whatever is in that position of an array. So the loop will start withi
set to0
, meaning we gets[0]
, the first character in the string, thens[1]
, thens[2]
, and so on, until each character of the string is printed out:Z a m y l a
-
-
But what happens if, say, a user types in a really really long string or does something that causes an error? Well,
get_string()
, and certain other functions in C, can return a special value calledNULL
. And so to be safe, it’s good practice to check thats
is notNULL
before we try to do something with it:#include <cs50.h> #include <stdio.h> #include <string.h> int main(void) { string s = get_string(); if (s != NULL) { for (int i = 0; i < strlen(s); i++) { printf("%c\n", s[i]); } } }
-
A
!=
means "not equal to" in C, andget_string
can either return a string value orNULL
, so we can proceed ifs != NULL
since it has to be a string value if it isn’tNULL
.
-
-
So now our program, if we just wanted it to print each character on a line, would be correct. But how might we improve the design? Well, recall that a
for
loop first initializes a value, checks a condition, and after each iteration, increments the value. So on each pass of the loop, we’re checking ifi < strlen(s)
. Butstrlen()
is a function that we call, passing its
as an argument, and using the value it returns to compare withi
. Each time, we’re calculating the length of the string even though we don’t have to. A better solution might look like this:#include <cs50.h> #include <stdio.h> #include <string.h> int main(void) { string s = get_string(); if (s != NULL) { for (int i = 0, n = strlen(s); i < n; i++) { printf("%c\n", s[i]); } } }
-
This is a little fancier, but we’re just setting another variable,
n
, to the length ofs
at the beginning, and now we compare two numbers each time and not have to recalculate the length of the string.
-
-
For style, our variable names are short, since we only have a few of them. We could comment our code:
#include <cs50.h> #include <stdio.h> #include <string.h> int main(void) { // ask user for input string s = get_string(); // make sure get_string returned a string if (s != NULL) { // iterate over the characters in s one at a time for (int i = 0, n = strlen(s); i < n; i++) { // print i'th character in s printf("%c\n", s[i]); } } }
-
We explain the more interesting lines of our code in words. The
//
at the beginning of a line marks it as a comment, which means the compiler will ignore it. But now the code is understandable to humans.
-
-
In C, there’s another feature called
typecasting
that lets you convert one type of data to another. Characters are stored in memory as binary numbers, so we can convert them back and forth. -
Remember that ASCII is a standard for mapping characters to letters. Here are some sample ones:
A B C D E F G H I ... 65 66 67 68 69 70 71 72 73 ... a b c d e f g h i ... 97 98 99 100 101 102 103 104 105 ...
-
We can experiment with this program:
#include <stdio.h> int main(void) { for (int i = 65; i < 65 + 26; i++) { printf("%c is %i\n", (char) i, i); } }
-
We print out
i
as a character by typecasting it, using(char) i
to tell our program to treati
as a character.
-
-
Now if we compile and run our program, we get:
A is 65 B is 66 C is 67 ... Z is 90
-
But we can actually just say
printf("%c is %i\n", i, i);
, andi
will be printed out as a character too, sinceprintf
knows%c
means thati
should be formatted as a character. -
But wait, if we can treat numbers like characters, we can also treat characters like numbers:
#include <stdio.h> int main(void) { for (char c = 'A'; c <= 'Z'; c++) { printf("%c is %i\n", c, c); } }
-
Now we’re using
c
like we would an integeri
, and this iterates through all the capital letters like before. And since characters have an integer value due to ASCII, we can compare them to each other.
-
-
There’s another pattern with ASCII characters: a lowercase letter has a value exactly 32 higher than the same letter in uppercase.
-
Maybe we can apply this:
#include <cs50.h> #include <stdio.h> #include <string.h> int main(void) { string s = get_string(); if (s != NULL) { for (int i = 0, n = strlen(s); i < n; i++) { if (s[i] >= 'a' && s[i] <= 'z') { printf("%c", s[i] - ('a' - 'A')); } else { printf("%c", s[i]); } } printf("\n"); } }
-
The first few lines we’re familiar with already, where we’re getting a string from the user, and iterating over each character of the string.
-
Inside the loop, for each character, if the character is between
a
andz
inclusive, which means if it’s lowercase, we print outs[i] - ('a' - 'A')
, which is the character minus the difference between a lowercase letter and an uppercase letter. Which makes it uppercase! (We could have also just used32
, but it’s more understandable to show where we got that value from.) -
Otherwise, if it’s not a lowercase value, we just print it out.
-
-
But we could even use a function that comes with C,
toupper
, in the libraryctype.h
(and we’d discover it through looking around reference books or online):#include <cs50.h> #include <ctype.h> #include <stdio.h> #include <string.h> int main(void) { string s = get_string(); if (s != NULL) { for (int i = 0, n = strlen(s); i < n; i++) { if (islower(s[i])) { printf("%c", toupper(s[i])); } else { printf("%c", s[i]); } } printf("\n"); } }
-
And in fact,
toupper
only changes lowercase letters to uppercase, so we can really just:#include <cs50.h> #include <ctype.h> #include <stdio.h> #include <string.h> int main(void) { string s = get_string(); if (s != NULL) { for (int i = 0, n = strlen(s); i < n; i++) { printf("%c", toupper(s[i])); } printf("\n"); } }
-
We can use the command
man toupper
in our terminal to read about functions or commands, likeman strlen
orman printf
. We can use our arrow keys to scroll up and down (it’s pretty old school), andq
to quit. -
And even if
toupper
didn’t come with C, it would have been better design for us to implement it as a separate function, since ourmain
function is much easier to understand now.
-
-
Let’s go even further in exploring strings by trying to implement a different function,
strlen
, ourselves:#include <cs50.h> #include <stdio.h> int main(void) { string s = get_string(); int n = 0; while (s[n] != '\0') { n++; } printf("%i\n", n); }
-
We’ll get a string as usual, and create a variable
n
to store some number. We’ll start it at0
, and whiles[n]
, the character at that index ins
, isn’t something called\0
, we’ll increasen
. -
But why does this work? Turns out, strings are stored with a character at the end marking the end of a string, since there’s no predetermined length, so a string in memory really looks like:
------------------------------ | Z | a | m | y | l | a | \0 | ------------------------------
-
-
We can represent more of our computer’s memory as a grid:
----------------------------------- | Z | a | m | y | l | a | \0 | A | ----------------------------------- | n | d | i | \0 | | | | | ----------------------------------- | | | | | | | | | ----------------------------------- | | | | | | | | | -----------------------------------
-
We can imagine each byte (each box in this grid) of memory as labeled from
0
to31
, since there are 32 bytes total. And to keep track of where strings start, we can simply remember the start of our string in memory, in the case ofZamyla
,0
, andAndi
,7
. Indeed, a string in C is just the location of the first character in memory. -
And with
\0
, C indicates the end of our string.
-
-
Integers, and other data types, too, are stored in the same way in memory, even if they take up more bytes.
-
Once we understand that data is just bytes in memory, we can manipulate all of it and can do everything by writing a program.
-
And https://reference.cs50.net has lots of useful information too, about functions that come with C.
Command-Line Arguments
-
Let’s use what we’ve learned to dig deeper into command-line arguments.
-
So far, we’ve used
int main(void)
to start our programs. Thevoid
keyword in particular is indicating that our program takes no arguments. -
But what if we want to write programs that take input from the command line, or words after the program name when we run it in the terminal? For example, we might run
make hello
ormake cough0
, and the second word there is an argument to our programhello
. -
It turns out, we can start our program with this:
int main(int argc, string argv[])
and it will receive those command-line arguments. -
Now our program will receive two arguments. The first is an integer named
argc
, as in argument count, that tells us how many arguments we got. The second is an array, or list, of strings, calledargv
, as in argument vector. This list of strings can be accessed with the same syntax as we do for characters in a string (since a string is just an array of characters), likeargv[0]
. -
Let’s see this in action:
#include <cs50.h> #include <stdio.h> int main(int argc, string argv[]) { if (argc == 2) { printf("hello, %s\n", argv[1]); } else { printf("hello, world\n"); } }
-
This program, when we run it, will return something like the following if we give it a command-line argument:
~/workspace/ $ ./argv0 hello hello, hello
-
We used
argv[1]
becauseargv[0]
is always the name of the program itself. -
When we run just
./argv0
,argc
passed to our program will be1
, so it will just sayhello, world
.
-
-
Let’s do this, to see how we can iterate over an array:
#include <cs50.h> #include <stdio.h> int main(int argc, string argv[]) { for (int i = 0; i < argc; i++) { printf("%s\n", argv[i]); } }
-
This program prints out each argument, or each string in
argv
, as it goes through the indexes from0
toargc
, which tells us how many strings are inargv
.
-
-
We can be even cooler. Since we know
argv
is an array of strings and each string is an array of characters, we can directly access characters fromargv
:#include <cs50.h> #include <stdio.h> #include <string.h> int main(int argc, string argv[]) { // iterate over strings in argv for (int i = 0; i < argc; i++) { // iterate over characters in current string for (int j = 0, n = strlen(argv[i]); j < n; j++) { // print j'th character in i'th string printf("%c\n", argv[i][j]); } printf("\n"); } }
-
The outer
for
loop, withi
, is iterating over each string inargv
. -
The inner
for
loop, withj
, looks atargv[i]
, and for each character in it, prints it on a new line. -
Then the inner loop repeats for the next string.
-
With
argv[i][j]
we can get an individual character inargv
.
-
-
So what about
main
's output? It turns out,main
also returns some value by default. When a program exits successfully, it returns a number0
to indicate as much. A non-zero number, on the other hand, is used to represent an error. -
Of course, we want to see this firsthand:
#include <cs50.h> #include <stdio.h> int main(int argc, string argv[]) { if (argc != 2) { printf("missing command-line argument\n"); return 1; } printf("hello, %s\n", argv[1]); return 0; }
-
Now, if the program doesn’t get a command-line argument, the program will quit by returning, and with the value
1
. -
Otherwise, we’ll print the argument and explicitly return the value
0
as we exit.
-
-
We can see the exit code in terminal like this:
~/workspace/ $ ./exit missing command-line argument ~/workspace/ $ echo $? 1
-
$?
is a magic symbol for the previous program’s exit code, andecho
is a command-line program that just prints out values. -
We might not look for this often, but debuggers and other programs might look for it to determine if there were any errors.
-
-
Don’t forget, an array is a chunk of continuous memory, with each of the elements inside contiguous, back-to-back-to-back. And those elements are generally the same data type, as we commonly have arrays of characters or integers. We’ll need this construct to solve more complex problems, like sorting and searching.
-
Phew, what an exciting day! More fun to come next week!