Tempo sensation of beginner learners of Spanish: comparison of authentic native speech, the production of artificial intelligence and Spanish coursebook recordings

Vol. 47, No. 1 (2026)

Abstract
In this study, I examine whether tempo or pauses contribute more to the perception of speed and intelligibility, and whether this perception is influenced by the source of the audio file: a coursebook recording, an everyday speaker, or artificial intelligence. Hungarian beginner-level learners of Spanish completed four questionnaires in which they listened to audio samples and rated them for perceived speed and intelligibility. The recordings came from the three sources above and differed in tempo (normal or slow) and pauses (present or absent), but all contained the same text. The results show that most students correctly identified the fastest and slowest samples and selected these recordings as the easiest and the most difficult to understand. However, the responses do not appear to be related to the source of the recordings. Although no clear tendencies were found, the data suggest that tempo, rather than pauses, influences the perception of speed and intelligibility.

Keywords:
artificial intelligence; intelligibility; perception; Spanish as a foreign language; speech rate

Pages:
413–436