Lecture 12
Duke University
STA 199 - Fall 2022
10/6/22
ae-08
Let’s tidy up our plot a bit more!
Every time we use apps, websites, and devices, our data is being collected and used or sold to others.
More importantly, law enforcement, financial institutions, and governments use these data to make decisions that directly affect people's lives.
What pieces of data have you left on the internet today? Think through everything you've logged into, clicked on, or checked in to, whether actively or automatically, that might be tracking you. Do you know where that data is stored? Who can access it? Whether it's shared with others?
What are you OK with sharing?
Have you ever thought about why you’re seeing an ad on Google? Google it! Try to figure out if you have ad personalization on and how your ads are personalized.
Which of the following are you OK with your browsing history being used for?
Suppose you create a profile on a social media site and share your personal information on your profile. Who else gets to use that data?
"Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form."
— Researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekær
Randomly select 10 words from the Gettysburg Address and calculate the mean number of letters in these 10 words. Submit your answer at bit.ly/gburg199.
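For comparison, here is a minimal R sketch of what a truly random sample of 10 words looks like. The file path and object names are placeholders, assuming the full text of the address is saved locally; they are not part of the activity repo.

```r
library(tidyverse)

# Placeholder path: assumes the full text of the address is saved locally
gettysburg_text <- read_file("data/gettysburg.txt")

# Split into words, strip punctuation, and drop empty strings
gettysburg_words <- tibble(word = unlist(str_split(gettysburg_text, "\\s+"))) |>
  mutate(word = str_remove_all(word, "[[:punct:]]")) |>
  filter(word != "")

# One truly random sample of 10 words and its mean word length
set.seed(199)
gettysburg_words |>
  slice_sample(n = 10) |>
  summarise(mean_letters = mean(str_length(word)))
```

Repeating the random draw many times and comparing the results to the hand-picked means submitted at bit.ly/gburg199 is usually where the sampling-bias discussion starts: people tend to pick longer, more memorable words.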
What might be the reason for Google’s gendered translation? How do ethics play into this situation?
ae-09 - Part 1
ae-09 (repo name will be suffixed with your GitHub name).

2016 ProPublica article on the algorithm used for rating a defendant's risk of future crime:
In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways.
The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
White defendants were mislabeled as low risk more often than black defendants.
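To make "mistakes at roughly the same rate but in very different ways" concrete, here is a small R sketch that computes a false positive rate (flagged high risk but did not reoffend) and a false negative rate (labeled low risk but did reoffend). The counts are hypothetical illustrations, not the ProPublica figures.

```r
library(tidyverse)

# Hypothetical counts for one group of defendants -- illustrative only,
# not the ProPublica figures
outcomes <- tribble(
  ~risk_label, ~reoffended, ~n,
  "high",      "no",        30,  # false positives: flagged high risk, did not reoffend
  "high",      "yes",       45,
  "low",       "no",        60,
  "low",       "yes",       15   # false negatives: labeled low risk, did reoffend
)

outcomes |>
  summarise(
    false_positive_rate = sum(n[risk_label == "high" & reoffended == "no"]) /
      sum(n[reoffended == "no"]),
    false_negative_rate = sum(n[risk_label == "low" & reoffended == "yes"]) /
      sum(n[reoffended == "yes"]),
    accuracy = sum(n[(risk_label == "high") == (reoffended == "yes")]) / sum(n)
  )
```

Two groups can have similar overall accuracy while their false positive and false negative rates differ sharply, which is the pattern described in the ProPublica findings above.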
What do the defendants who were assigned a high (or low) risk score for reoffending have in common?
How can an algorithm that doesn’t use race as input data be racist?
ae-09 - Part 2
ae-09 (repo name will be suffixed with your GitHub name).