Full Transcript

From the TED Talk by Mainak Mazumdar: How bad data keeps us from good AI

Unscramble the Blue Letters

In June of this year, we saw embarrassing bias in the Duke University AI meodl claled PULSE, which enhanced a blurry image into a recognizable photograph of a preson. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American iamges were underrepresented in the training set, leading to wrong decisions and prteiincods. Probably this is not the first time you have seen an AI misidentify a Black person's igame. Despite an improved AI methodology, the underrepresentation of raaicl and ethnic populations still left us with biased results.

Open Cloze

In June of this year, we saw embarrassing bias in the Duke University AI _____ ______ PULSE, which enhanced a blurry image into a recognizable photograph of a ______. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American ______ were underrepresented in the training set, leading to wrong decisions and ___________. Probably this is not the first time you have seen an AI misidentify a Black person's _____. Despite an improved AI methodology, the underrepresentation of ______ and ethnic populations still left us with biased results.

Solution

  1. model
  2. called
  3. person
  4. images
  5. predictions
  6. image
  7. racial

Original Text

In June of this year, we saw embarrassing bias in the Duke University AI model called PULSE, which enhanced a blurry image into a recognizable photograph of a person. This algorithm incorrectly enhanced a nonwhite image into a Caucasian image. African-American images were underrepresented in the training set, leading to wrong decisions and predictions. Probably this is not the first time you have seen an AI misidentify a Black person's image. Despite an improved AI methodology, the underrepresentation of racial and ethnic populations still left us with biased results.

Frequently Occurring Word Combinations

n-grams of length 2

collocation frequency
data quality 3
data infrastructure 3
biased data 2
wrong decisions 2
million people 2
census data 2

Important Words

  1. ai
  2. algorithm
  3. bias
  4. biased
  5. black
  6. blurry
  7. called
  8. caucasian
  9. decisions
  10. duke
  11. embarrassing
  12. enhanced
  13. ethnic
  14. image
  15. images
  16. improved
  17. incorrectly
  18. june
  19. leading
  20. left
  21. methodology
  22. misidentify
  23. model
  24. nonwhite
  25. person
  26. photograph
  27. populations
  28. predictions
  29. pulse
  30. racial
  31. recognizable
  32. results
  33. set
  34. time
  35. training
  36. underrepresentation
  37. underrepresented
  38. university
  39. wrong
  40. year