Lecture 10
Cookies
-
Don’t forget that we can import a function from CS50’s Python library called
eprint, which helps us debug our program by printing the filename and line number from where it was called. -
There are many websites like Facebook where we log in when we first visit, and not have to log in every time we move around that website after. Somehow, the website must remember who we are. And they store information on the server, like in CSV files or local variables like we saw last time. The server can’t necessarily rely on the IP address, since one household or network might have one public IP address that many devices share.
-
A cookie (in this context) is a small piece of information unique to our login that a web server gives to our browser, that our browser can send back in the future, to verify who we are. We can think of it like a digital handstamp that we might get at an amusement park, so we can get in or out easily.
-
Our browser stores cookies for us and sends it back as part of an HTTP request in the headers. When we first log in, some of the headers in the reply from the server might look like this:
HTTP/1.1 200 OK Content-Type: text/html Set-Cookie: session=value ...-
sessionis the key for the cookie, like a variable name, andvaluewill be our unique handstamp, which might be a large random number that’s hard for others to guess.
-
-
Now our browser, in future requests, will include that cookie in the headers:
GET / HTTP/1.1 Host: www.facebook.com Cookie: session=value ...-
Facebook’s servers stores the values of each account’s cookies, so it can know who we are from that cookie.
-
-
As individual users, we might want to delete our cookies. If someone has access to the cookies our browser is storing, they could pretend to be us by including them in requests.
-
HTTPS encrypts the messages between our browser and the receiving server, so someone along the network won’t be able to steal our cookies. A few years ago, a security researcher created a program that anyone can run, to capture all wireless traffic in the room, and show all the unencrypted cookies that were sent. That motivated most websites to use only HTTPS for sending information like this. And we ourselves might clear the cookies our browser has stored, so there’s no trace of which websites we’ve visited in the past.
-
Let’s take a look at
application.pyfrom ourstoredemo:from flask import Flask, redirect, render_template, request, session from flask_session import Session # Configure app app = Flask(__name__) # Configure sessions app.config["SESSION_PERMANENT"] = False app.config["SESSION_TYPE"] = "filesystem" Session(app) @app.route("/") def index(): return render_template("index.html") @app.route("/update", methods=["POST"]) def update(): for item in request.form: session[item] = int(request.form.get(item)) return redirect("/cart") @app.route("/cart") def cart(): return render_template("cart.html", cart=session)-
If we had a web application like a store, where we wanted to add items to a virtual cart, we could store that information on our server, linked to each user’s cookie.
-
We create a Flask application called
app, and set up one of its built-in features, Session, which allows us to set information about a user’s session. -
We set the default route to be
index.htmlas usual. -
The
/cartroute leads to a template where we pass in thesessionvariable. Andsessionis tracked automatically by our server’s Flask application. Incart.html, we see how that variable is used:{% extends "layout.html" %} {% block body %} <h1>Cart</h1> {% for item in cart %} {{ item }} : {{ cart[item] }} <br/> {% endfor %} <p> <a href="/">Continue shopping</a>. </p> {% endblock %}-
We include the standard HTML from
layout.html, and we add a heading ofCartto our page. Then, for eachitemin the variablecart(which wassessionwhen we passed it in), we print the key ofitemand the value of it incart.cartis a dictionary, or hash table, in which we can store values for keys. Here we are iterating over all the keys, like"foo","bar", and"baz", so this loop would translate that tocart["foo"],cart["bar"], andcart["baz"]to print each one’s value.
-
-
-
If we visit our store, we see this page:
-
"Foo", "bar", and "baz" are generic names in Computer Science that we can use to refer to some objects that don’t need a specific name.
-
We have a form where we can set the quantity of each item that we want, and if we go our cart, we’ll see the quantities for each item as we input it.
-
-
To see how, we can open Chrome’s Developer Tools, and use the Network tab to see that our server is indeed responding with some values for setting a cookie with a
sessionin the Response Headers: -
And if we go back to the relevant section of our server’s code, we can see how we update the quantities stored every time the form is submitted, with the
updateroute:@app.route("/update", methods=["POST"]) def update(): for item in request.form: session[item] = int(request.form.get(item)) return redirect("/cart")-
For each
itemin the form, we set the value in oursessiondictionary to theintvalue (like casting in C) of the value sent through the form.
-
-
And if we looked in the directory where our
storeapplication is, we would see a directory calledflask_session, in which there are files for eachsessionour server has created.
Databases
-
Many web applications, including the ones we’ve demonstrated in lectures, follow a certain paradigm called MVC, Model-View-Controller. We have been writing Python code that act as controllers, the logic that controls where the user goes and what our application actually does with their inputs, etc. The HTML templates, in which we placed our data to display to the user, are views. This separates the logic of our application from the aesthetics and formatting. And finally, today we will see how models, or data, are stored and accessed.
-
SQL, Structured Query Language, is a programming language that we use to talk to a database, a program that stores data and responds to requests for data, like a web server. And it has advanced features like searching and filtering data quickly.
-
Spreadsheet programs like Excel or Google Sheets allow us to store data in rows and columns, and this layout is shared with relational databases, which we will be using. We commonly use the columns in the top row for headers like "Name" and "Dorm", and each row after will be an entry:
-
But with spreadsheet programs, we run into limitations with how many rows we can store, and we can only search linearly.
-
-
The database program we will use is SQLite, which stores our data in binary format and provides advanced, efficient features.
-
We can use yet another program, called phpLiteAdmin, to manage our databases, or use a command-line program,
sqlite3. -
And we can connect and talk to our database with Python code.
-
In SQL, we’ll use a few basic operations:
-
CREATE … -
INSERT … -
SELECT … -
UPDATE … -
DELETE … -
…
-
-
With those verbs, we can write statements like these, and run them one at a time:
-
CREATE TABLE 'registrants' ('id' INTEGER PRIMARY KEY, 'name' TEXT, 'dorm' TEXT)-
This is the command we’d use to create a table named
registrants, and we specify the columns we’d like to have in our table. More interestingly, we specify the type of data that each column will be, so our database can optimize how it stores and accesses data. Theidcolumn is an integer we can use to uniquely identify each row, andnameanddormwill be text fields.
-
-
INSERT INTO "registrants" ("id", "name", "dorm") VALUES(1, 'David', 'Matthews')-
Now we specify the columns of data we want to provide for a new row, and the values for each column.
-
-
SELECT * FROM "registrants"-
*allows us to select all rows, since*matches all values.
-
-
UPDATE "registrants" SET "name" = 'David Malan' where id = 1-
We can also change values for rows, but ensure that we only set it for the row we want to change with
where id = 1.
-
-
DELETE FROM "registrants" WHERE id = 1-
And we can delete rows, too.
-
-
-
And our database, SQLite, can have other data types, too:
-
BLOBstands for "binary large object", or binary data. -
INTEGER -
NULLis something we can use to specify that there is no value for a particular column, like in C. -
REALis a real number, or floating-point value. -
TEXT -
DATETIMEstores dates and times. -
NUMERICstores numbers, whether they are integers or floating-point.
-
-
SQL has functions to help us figure out the time, like
date,time, anddatetime. -
For phone numbers and zip codes, we might want to store the data as text, since leading zeroes are usually removed from numbers.
-
We also have a few options we can apply to storing data like names and addresses. We might want separate first name and last name columns, as well as street, city, and state, so we can search or filter on each of those pieces individually.
-
And if we have lots of people sharing the same address, we’ll have redundant information in our database. With just the zip code, for example, we should be able to figure out the city and state.
-
So if we were using a spreadsheet program, we can have another sheet, "cities", and store that data separately:
-
Now, our "customers" sheet has less redundancy:
-
And we can assign an ID to each row in our cities sheet, and only refer to each one in our customers sheet by the ID, in case any of the other values change.
-
Now, to find the actual city and state of someone, we need to take the added step of looking up their zip code in the cities sheet, but we gain the advantage of using less space and not having to update all the rows if something changes.
-
In today’s lecture code, we have a file called
lecture.db. If we upload that to our IDE, and double-click it, it opens in a program called phpLiteAdmin: -
This shows us the content of that database, and allows us to visualize it. Now we’ll see all the tables, and all the rows in each of the tables. We can click the SQL tab, and run commands:
-
In this particular command, we are selecting everything in the
Albumtable, but we are joining theArtisttable, based on theArtistIdstored in each row of theAlbumtable. So instead of seeing some ID for theArtistIdcolumn, we’ll see all the fields stored in theArtisttable.
-
-
And we can have more complex relations. A many-to-many relationship might be between our
Playlisttable and ourSongtable, where a playlist has many songs, and a particular song might be on many playlists. ThePlaylistTracktable represents this, by indicating thePlaylistIdandTrackIDfor every song on a playlist: -
We can also assign other attributes to columns:
-
PRIMARY KEY, where this column will be used to uniquely identify rows. -
UNIQUEmeans that the field will be unique for every row, but not used to identify rows in joins. -
INDEXmeans that we want the database to store the field in some index to speed up searches in the future, if we anticipate searching on that field frequently. -
NOT NULLmeans that the field has to have some value, and can’t be blank. -
FOREIGN KEYwe’ll come back to again later, but means that it is referring to a row in some other table.
-
-
Let’s run a command to create an empty file,
touch customers.db. Now if we double-click it, phpLiteAdmin will open with a blank database, and we can create a new table calledusers: -
We can specify the columns in our table like so:
-
Autoincrement tells our database to automatically increase the
idfield for us, and we can also specify the default value for a column if we want it to not be null.
-
-
After we click the create button, phpLiteAdmin also tells us the SQL command of what it did:
-
We can do the same in the terminal by running the
sqlite3command. We’ll runsqlite3 alternative.dbto use a different database file, and run the sameCREATEcommand: -
By running
.schema, a built-in SQLite command, we can see what our tables look like. We can do the same for ourlecture.dbfile:-
We see our
Albumtable, and each of the columns inside. AFOREIGN KEYis a reference to a row in another table, and we can indicate how that reference is made.
-
-
Now that we have an idea of how databases work, let’s see how we can access them from our web applications.
-
Let’s write a simple program to test out accessing databases:
from cs50 import SQL db = SQL("sqlite:///lecture.db") # Query database for all albums rows = db.execute("SELECT * FROM Album") # For each album in database for row in rows: # Print title of album print(row["Title"])-
By using the
SQLclass from the CS50 Library, we can easily connect to a local database file with the SQLite program. Then we can usedb.executeto run commands on that database, that we pass in as strings. Androwswill be a list of dictionaries, with fields we can access. -
Now if we run this program, we’ll see all the album titles printed out. And we’ll also see a debugging line printed out by the CS50 Library, that shows what query was actually executed.
-
-
We can also pass variables into the command we want to execute, which will then be substituted:
rows = db.execute("SELECT * FROM Album WHERE Title = :t", t=sys.argv[1])-
sys.argv[1]is the command-line argument our program gets.
-
-
We pass in the variables individually so the CS50 Library can sanitize, or remove dangerous characters from the user’s input, for us. If we had passed in a formatted string like so:
rows = db.execute(f"SELECT * FROM Album WHERE Title = '{sys.argv[1]}'")-
Someone could pass in code that would then be added into the query. They could run
haha ; DELETE FROM Albumand the semicolon would end theSELECTcommand, and then run anotherDELETEcommand.
-
-
The name for this class of issues is an SQL injection attack. For example, when we log in to some website we might fill out the username and password fields of some form, and the back-end might try to select from the database a row that has a matching username and password.
-
But if the server uses SQL, then what we pass in to the form might be directly included in the query. For example, suppose we pass in
me@examplemailprovider.comas the username and' OR '1' = '1as our password: -
SQL happens to have the keyword
OR, and it looks like this password value will change the meaning of a SQL query, if it’s included directly with single quotes. -
Suppose the back-end code looks like this:
username = request.form.get("username") password = request.form.get("password") db.execute(f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"-
So if
usernameandpasswordis substituted in, our query will actually end up being:db.execute("SELECT * FROM users WHERE username = 'me@examplemailprovider.com' AND password = '' OR '1' = '1'") -
And that will select the row with
username = 'me@examplemailprovider.com'even ifpassworddoesn’t match, since1is always equal to1.
-
-
If we sanitize our input by passing in variables to the CS50 Library separately, the query after the input is substituted will look like this, with dangerous characters escaped with backslashes:
db.execute("SELECT * FROM users WHERE username = 'me@examplemailprovider.com' AND password = '\' OR \'1\' = \'1'") -
Let’s combine this into a web application, with lecture0 in the source directory:
from flask import Flask, render_template, request from cs50 import eprint, SQL app = Flask(__name__) db = SQL("sqlite:///lecture.db") @app.route("/") def index(): rows = db.execute("SELECT * FROM Album") return render_template("index.html", albums=rows)-
We connect to our local database, and pass all the rows we get from the
Albumtable into our template, which looks like:{% extends "layout.html" %} {% block body %} <ul> {% for album in albums %} <li>{{ album["Title"] }}</li> {% endfor %} </ul> {% endblock %} -
Now if we run our webserver, we’ll see an unordered, bulleted list of our album titles.
-
-
We can change the query to be from the URL:
@app.route("/") def index(): query = request.args.get("q") eprint(query) rows = db.execute("SELECT * FROM Album WHERE Title = :q", q=query) return render_template("index.html", albums=rows)-
Now, if we add
?q=Somethingto the URL, we’ll see only the album titles that match that query.
-
-
With these building blocks, we’ll build a web application in Problem Set 7 that simulates buying and selling stocks, with real prices!