Machine Learning

Main Types of Machine Learning

  • Unsupervised learning:
    – Discovering hidden properties of data

    Google search, Amazon “if you like X…”, Netflix “Top picks for you”

  • Supervised learning:
    – Classifying new data from known properties

    BoA automatic fraud detection, Apple Siri, Microsoft WhisperID

  • Reinforcement learning:
    – Making the best decisions now to maximize long-term reward

    Deep Blue, IBM Watson, Geico instant online quotes
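As a rough illustration of supervised learning ("classifying new data from known properties"), the sketch below labels a new data point with the label of its closest known example. This is a nearest-neighbor classifier chosen only for brevity. The transaction data, the feature choice (amount and hour of day), and the name `classify` are all invented for this example and do not describe any real fraud-detection system.

```python
import math

# Hypothetical labeled examples: (amount in dollars, hour of day) -> label.
# The numbers are invented for illustration only.
TRAINING_DATA = [
    ((12.0, 14), "legitimate"),
    ((30.0, 10), "legitimate"),
    ((25.0, 19), "legitimate"),
    ((950.0, 3), "fraud"),
    ((1200.0, 2), "fraud"),
]


def classify(point, training_data=TRAINING_DATA):
    """Return the label of the training example closest to `point`."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], point),
    )
    return nearest_label


if __name__ == "__main__":
    print(classify((20.0, 13)))
    print(classify((1000.0, 4)))
```

The known properties here are the labels attached to past examples. An unsupervised method would receive the same points without labels and would have to find the groupings on its own.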

Natural Language Processing

  • There are three basic types of processing going on during human/computer voice interaction
    – Voice recognition: recognizing human words

    – Natural language comprehension: interpreting human communication

    – Voice synthesis: recreating human speech

  • Common to all of these problems is the fact that we are using a natural language, which can be any language that humans use to communicate

Voice Recognition

Problems with understanding speech

    – Each person's sounds are unique

    – The shapes of each person's mouth, tongue, throat, and nasal cavities, which affect the pitch and resonance of the spoken voice, are unique

    – Speech impediments, mumbling, volume, regional accents, and the health of the speaker are further complications

Other problems

    – Humans speak in a continuous, flowing manner, stringing words together

    – Sound-alike phrases like “ice cream” and “I scream”

    – Homonyms such as “I” and “eye” or “see” and “sea”

    Humans can often clarify these situations by the context of the sentence, but that processing requires another level of comprehension
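One very simple way to approximate this use of context is to keep a list of clue words for each spelling and pick the spelling whose clues overlap most with the rest of the sentence. The sketch below assumes a tiny hand-made table (`CONTEXT_CLUES`) and a hypothetical helper `pick_spelling`. Real systems rely on much richer language models.

```python
# Hypothetical context clues: words that tend to appear near each spelling.
CONTEXT_CLUES = {
    "see": {"can", "you", "look", "watch"},
    "sea": {"ship", "salt", "waves", "sailed"},
}


def pick_spelling(candidates, sentence_words):
    """Choose the candidate whose clue words overlap most with the sentence.

    On a tie, the first candidate in the list is returned.
    """
    words = {w.lower() for w in sentence_words}
    return max(candidates, key=lambda c: len(CONTEXT_CLUES[c] & words))


if __name__ == "__main__":
    heard = ["see", "sea"]  # both sound the same to the recognizer
    print(pick_spelling(heard, "the ship sailed across the".split()))
    print(pick_spelling(heard, "can you look".split()))
```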

    Modern voice-recognition systems still do not do well with continuous, conversational speech


    Voiceprint: the plot of frequency changes over time representing the sound of human speech

    A human trains a voice-recognition system by speaking a word several times so the computer gets an average voiceprint for a word

    A voiceprint can also be used to authenticate the declared sender of a voice message
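The training and authentication steps above can be sketched as follows. This assumes each spoken sample has already been reduced to a short list of numbers, which is a heavy simplification of a real frequency-over-time plot. The function names, the sample values, and the distance threshold are all hypothetical.

```python
import math


def average_voiceprint(samples):
    """Average several equal-length feature vectors, element by element."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def matches(voiceprint, new_sample, threshold=1.0):
    """Accept the sample if it lies within `threshold` of the stored voiceprint."""
    return math.dist(voiceprint, new_sample) <= threshold


# Three hypothetical recordings of the same word, each reduced to 3 numbers.
recordings = [[1.0, 4.0, 2.0], [1.2, 3.8, 2.2], [0.8, 4.2, 1.8]]
template = average_voiceprint(recordings)

if __name__ == "__main__":
    print(matches(template, [1.1, 3.9, 2.1]))  # close to the average
    print(matches(template, [5.0, 1.0, 7.0]))  # far from the average
```

Averaging smooths out the small differences between repetitions. The threshold controls how strict the match is: a smaller value rejects more impostors but also more genuine attempts.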

Voice Synthesis

One Approach to Voice Synthesis

   Dynamic voice generation

    A computer examines the letters that make up a word and produces the sequence of sounds that correspond to those letters in an attempt to vocalize the word


    Phonemes: the sound units into which human speech has been categorized
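A minimal sketch of dynamic voice generation, assuming a small hand-written table that maps letters and letter pairs to sound labels. The table, the labels, and the function `to_sounds` are invented for illustration. English spelling is far too irregular for a table this small, so this shows only the general idea of turning letters into a sequence of sounds.

```python
# Hypothetical letter-to-sound table; real systems use far richer rules.
LETTER_SOUNDS = {
    "sh": "SH", "ch": "CH", "th": "TH",
    "a": "AE", "c": "K", "h": "HH", "i": "IH",
    "n": "N", "p": "P", "s": "S", "t": "T",
}


def to_sounds(word):
    """Scan left to right, preferring two-letter groups over single letters."""
    word = word.lower()
    sounds = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in LETTER_SOUNDS:
            sounds.append(LETTER_SOUNDS[pair])
            i += 2
        elif word[i] in LETTER_SOUNDS:
            sounds.append(LETTER_SOUNDS[word[i]])
            i += 1
        else:
            raise ValueError(f"no sound known for {word[i]!r}")
    return sounds


if __name__ == "__main__":
    print(to_sounds("ship"))
    print(to_sounds("chat"))
```

Checking letter pairs first matters because "sh" is one sound, not an "s" followed by an "h".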

Another Approach to Voice Synthesis

   Recorded speech

    A large collection of words is recorded digitally and individual words are selected to make up a message

    Since words are pronounced differently in different contexts, some words may have to be recorded multiple times

    – For example, a word at the end of a question rises in pitch compared to its use in the middle of a sentence
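The recorded-speech approach can be sketched as a lookup keyed by both the word and the context in which it is spoken. The recording library, the file names, and the two context labels below are placeholders made up for this example. A real system would distinguish many more contexts.

```python
# Hypothetical library of recordings, keyed by (word, context).
# File names are placeholders, not real audio files.
RECORDINGS = {
    ("are", "middle"): "are_middle.wav",
    ("you", "middle"): "you_middle.wav",
    ("ready", "middle"): "ready_middle.wav",
    ("ready", "question_end"): "ready_rising.wav",
}


def select_clips(sentence):
    """Pick one recording per word, using a rising-pitch clip to end a question."""
    text = sentence.strip()
    is_question = text.endswith("?")
    words = text.rstrip("?.!").lower().split()
    clips = []
    for position, word in enumerate(words):
        is_last = position == len(words) - 1
        context = "question_end" if is_question and is_last else "middle"
        # Fall back to the mid-sentence recording if no special one exists.
        # A word missing from the library entirely raises KeyError.
        clip = RECORDINGS.get((word, context)) or RECORDINGS[(word, "middle")]
        clips.append(clip)
    return clips


if __name__ == "__main__":
    print(select_clips("Are you ready?"))
    print(select_clips("Ready you are."))
```

The word "ready" needs two recordings here because it sounds different at the end of a question than in the middle of a statement.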