From the TED Talk by Kalika Bali: The giant leaps in language technology -- and who's left behind

Unscramble the Blue Letters

So some time back, I wekrod on a project called VideoKheti that allowed Hindi-speaking farmers in crtaenl India to search for artuclauirgl videos by speaking into a phone-based app. So we went to Madhya Pradesh to cocellt data for this, and we came back and we were training our models and we discovered we're getting very bad results. This is not working. So we were very confused. Why is this happening? So we lkooed deeper and deeper into the data and discovered that, yes, we had collected data from what we tgohuht was a very silent, quiet village in the evening. But what we hadn't heard while we were doing this was that there was this constant buzz of nhigt insects, you know? So throughout the recordings, we had this "bzz" of the insects, which was actually distorting our speech.

Open Cloze

So some time back, I ______ on a project called VideoKheti that allowed Hindi-speaking farmers in _______ India to search for ____________ videos by speaking into a phone-based app. So we went to Madhya Pradesh to _______ data for this, and we came back and we were training our models and we discovered we're getting very bad results. This is not working. So we were very confused. Why is this happening? So we ______ deeper and deeper into the data and discovered that, yes, we had collected data from what we _______ was a very silent, quiet village in the evening. But what we hadn't heard while we were doing this was that there was this constant buzz of _____ insects, you know? So throughout the recordings, we had this "bzz" of the insects, which was actually distorting our speech.

Solution

  1. thought
  2. collect
  3. central
  4. looked
  5. worked
  6. agricultural
  7. night

Original Text

So some time back, I worked on a project called VideoKheti that allowed Hindi-speaking farmers in Central India to search for agricultural videos by speaking into a phone-based app. So we went to Madhya Pradesh to collect data for this, and we came back and we were training our models and we discovered we're getting very bad results. This is not working. So we were very confused. Why is this happening? So we looked deeper and deeper into the data and discovered that, yes, we had collected data from what we thought was a very silent, quiet village in the evening. But what we hadn't heard while we were doing this was that there was this constant buzz of night insects, you know? So throughout the recordings, we had this "bzz" of the insects, which was actually distorting our speech.

Frequently Occurring Word Combinations

ngrams of length 2

collocation frequency
natural language 6
language technology 5
language processing 3
gond tribals 3
machine translation 2
giant leaps 2
million people 2
cgnet swara 2
gond community 2
language community 2

ngrams of length 3

collocation frequency
natural language processing 3

Important Words

  1. agricultural
  2. allowed
  3. app
  4. bad
  5. buzz
  6. called
  7. central
  8. collect
  9. collected
  10. confused
  11. constant
  12. data
  13. deeper
  14. discovered
  15. distorting
  16. evening
  17. farmers
  18. happening
  19. heard
  20. india
  21. insects
  22. looked
  23. madhya
  24. models
  25. night
  26. pradesh
  27. project
  28. quiet
  29. recordings
  30. results
  31. search
  32. silent
  33. speaking
  34. speech
  35. thought
  36. time
  37. training
  38. videokheti
  39. videos
  40. village
  41. worked
  42. working