Introduction
One of the main questions in speech research is how humans successfully recognize speech despite its variability. For example, speakers differ in speaking rate and dialect, and even languages differ in syllable rate (Newman & Sawusch, 1996). Spoken words are also often distorted (Dilley & Pitt, 2010), as in ordinary continuous speech, where the speaker runs adjacent words together. In some cases the overlap between adjacent words becomes so severe that words may appear to disappear (for example,
"Characteristics of Language Perception" (Sternberg, p. 352): Since the word "special" itself is ambiguous, it is difficult to evaluate this description. The word "special" can suggest at least three possibilities. Because speech perception is special in audition, or speech perception is special in auditions. Production and processing tracking (ncbi 2009). There is not enough evidence to support this claim, so it should be accelerated (talkbrains 2008).
Speech is a complex phenomenon. People rarely understand how it is produced and perceived. The naive view is that speech is composed of words, and that each word is composed of phones (distinct speech sounds or gestures, a concept different from the phoneme). The reality is quite different. Speech is a dynamic process without clearly distinguishable parts. A useful exercise is to open a speech recording in a sound editor and both look at it and listen to it. Descriptions of speech are to some degree probabilistic, which means there are no certain boundaries between units or between words. Conversion from speech to text is never 100% correct. This idea is unfamiliar to software developers, who generally work with deterministic systems, and it raises many issues that are specific to speech technology.
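To make the probabilistic view concrete for readers used to deterministic systems, here is a minimal Python sketch. It assumes a recognizer that returns several candidate transcriptions, each with an unnormalized log score. The candidate strings, the scores, and the function name `normalize_log_scores` are hypothetical and chosen only for illustration; they do not come from any real recognizer.

```python
import math


def normalize_log_scores(log_scores):
    """Turn unnormalized log scores into probabilities that sum to 1."""
    # Subtract the largest score before exponentiating, for numerical stability.
    peak = max(log_scores.values())
    exp_scores = {text: math.exp(score - peak) for text, score in log_scores.items()}
    total = sum(exp_scores.values())
    return {text: value / total for text, value in exp_scores.items()}


# Hypothetical log scores for three candidate transcriptions of one utterance.
hypotheses = {
    "recognize speech": -12.1,
    "wreck a nice beach": -12.9,
    "recognise peach": -15.4,
}

posteriors = normalize_log_scores(hypotheses)
best = max(posteriors, key=posteriors.get)

# Print candidates from most to least probable.
for text, prob in sorted(posteriors.items(), key=lambda item: item[1], reverse=True):
    print(f"{prob:.3f}  {text}")
```

Even the top-ranked candidate receives a probability below 1, so code that consumes the result has to deal with a ranked list of alternatives and a confidence value rather than a single guaranteed answer.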