Responses to video about monosyllabicizing English
December 19, 2023
About
This is a collection of some rather detailed comments I left on the following YouTube video:
I Removed Most of the Syllables from English and It's 30% Faster Now
On the issues with the linguistics in the video
2023-12-18
This is a curation of three replies I made. The two long comments were replies to an individual asking why I thought the video bordered on being incorrect. The last quote block here was part of the first long comment. A couple of contextual sentences were omitted to adapt these comments to a blog, but no actual content was removed.
NOTE: Unfortunately, Listed does not seem to show where these block quotes start and end, so it is not easy to discern the boundaries between these comments.
Really cool from a programming perspective, but the linguistics in this is so oversimplified that it at times borders on incorrect. As a programmer and linguist, I am conflicted.
One of the most obvious things, which even non-linguists could have noticed during the course of the video, is that [s] is higher on the sonority hierarchy ("more sonorant") than [t]; indeed, this is a very common and well-known exception to the rule of thumb that is the Sonority Sequencing Principle in many Indo-European languages. Yet the presenter makes no mention of this, instead (as I recall) presenting the SSP as a kind of universal law of language. (It's more of a universal guideline, and many (most? Hmm, this is a good question...) languages have explicit exceptions to it.) No mention of there even being such exceptions was made in the video (at least that I heard), and worse: 3:06 shows [s] as being less sonorant than [p]! This is plainly factually incorrect, but I assume the author just wanted a prettier squiggle, and decided to say "chàbùduō" and draw it incorrectly anyway, either to avoid having to talk about there being exceptions, or because he didn't know that exceptions existed.

Another thing that turned me off was that the author, as I recall, reduced the sonority hierarchy to a matter of how loud phones are, which is... not correct, and beyond mere oversimplification. It's actually kinda difficult to explain sonority to a layperson. One way you can think of it is in terms of distinctive features, which are the characteristics of a phone; some features make a phone more sonorant than others. The ¿best? way to think of sonority is probably in terms of acoustics. Unfortunately, I'm not an acoustic phonetician (though I'd like to dive deep into it someday); but if I had to hazard an acoustic definition of sonority, I'd perhaps say that the more well-defined and steady the formants are, the more sonorant a sound is. If there's an acoustic phonetician out there, please correct me if I'm missing something with this definition. The loudness that the author said was the defining characteristic of sonority is at worst more of a side-effect of sonority, and at best just one small part of the puzzle.
If you want to quickly grok the sonority hierarchy, you can essentially do so by going row-by-row in the IPA chart; the rows are the manners of articulation, and they're mostly ordered by sonority (though this is not true for some rows, such as the laterals, which aren't more sonorant than their unlateralized counterparts).

An additional, though very minor, point is that the author exclusively uses "Sonority Sequencing Principle" in places where he meant "sonority hierarchy" or just "sonority". This is really not a real problem at all, since people can figure things out from context, or just reduce everything to the word "sonority"; but I bring it up because it's one of many tell-tale signs that the author is inexperienced with the subject matter. Which, I want to stress, is fine; we're allowed to go outside our fields of expertise (Good heavens, imagine if we couldn't!). But what was covered was lackluster, akin to being tutored by someone who is still, themselves, learning the material they are trying to tutor you on.
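To make the row-by-row idea concrete, here is a minimal sketch in Python. The sonority ranks below are a common textbook scale (stops < fricatives < nasals < liquids < glides), not anything from the video, and the inventory is deliberately tiny:

```python
# A common textbook sonority scale, roughly mirroring the rows of the
# IPA chart: stops < fricatives < nasals < liquids < glides.
SONORITY = {
    "p": 1, "t": 1, "k": 1, "b": 1, "d": 1, "g": 1,  # stops
    "f": 2, "s": 2, "z": 2, "v": 2,                   # fricatives
    "m": 3, "n": 3,                                   # nasals
    "l": 4, "r": 4,                                   # liquids
    "j": 5, "w": 5,                                   # glides
}

def onset_obeys_ssp(onset):
    """True if sonority strictly rises through the onset cluster."""
    ranks = [SONORITY[c] for c in onset]
    return all(a < b for a, b in zip(ranks, ranks[1:]))

print(onset_obeys_ssp("pl"))  # True: stop -> liquid rises
print(onset_obeys_ssp("st"))  # False: /st/ is the classic exception
```

Note how the perfectly ordinary English onset /st/ fails the check; any serious SSP-based model of English needs an explicit carve-out for exactly these s-clusters.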
The author also had this idea that fewer syllables means faster communication, but per my understanding, this isn't true. I don't have a study off-hand to point to, but my recollection is that the rate of information transfer during human speech is consistent regardless of how syllable-laden the language is; that is to say: languages with more syllables are simply articulated faster than languages with fewer syllables. As an English speaker, you may have experienced this phenomenon when hearing Spanish spoken: it sounds really fast, because each syllable really is being pronounced faster; but Spanish words have on average so many syllables that speakers aren't actually communicating more rapidly than you with your less-syllabic English. The language faculties of the brain can only handle so much information at once, regardless of how quickly your mouth is able to move.
There are also issues with the project methodology, generally with the author's clever solutions (I mean this earnestly; they were clever) to problems he did not understand. This had similar vibes to "I don't really understand problem X, and it would take way too much time to understand it; but I am good at programming, so I'm just going to build a neural net and let the PC figure it out for me." (I know this wasn't a neural net; it's just that the vibes, so to speak, were similar.) There's nothing wrong with this, imho; it's completely natural to lean into your strengths. But it would have been good had it been called out, which it wasn't in this video; still, it's not a sin or anything.

Anyways, you can very much see from the results of the presenter's project ("the proof is in the pudding") that limited author knowledge translated into limited model efficacy. A rules-based system that properly takes the phonotactic constraints of the language into consideration, instead of the very cool but misguided directed graph used by the author, would have resulted in much better nonsense words. The biggest problem with the directed graph is that it just isn't a good model of the underlying rules of the language -- it's a quick-and-dirty corpus analysis; it's not generative. When you want to reduce the number of syllables in a word (as the author did), the letter frequencies, and the probabilities that one letter will follow another, will change; keeping them static, as the author did, doesn't make sense, and indeed won't really work. There are other things to critique about the methodology, but this is perhaps the biggest piece.
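For readers who haven't watched the video: here is my own rough reconstruction of the kind of weighted directed graph I understand it to use (this is a sketch of the general technique, not the author's actual code). Bigram transition counts are learned from a corpus of ordinary polysyllabic words and then sampled to build short words:

```python
import random
from collections import defaultdict

def learn_bigrams(words):
    """Count letter-to-letter transitions; '^' and '$' mark word edges."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        for a, b in zip("^" + w, w + "$"):
            counts[a][b] += 1
    return counts

def sample_word(counts, rng, max_len=8):
    """Walk the graph from '^', picking successors by learned weight."""
    out, cur = [], "^"
    while len(out) < max_len:
        succs = counts[cur]
        cur = rng.choices(list(succs), weights=list(succs.values()))[0]
        if cur == "$":
            break
        out.append(cur)
    return "".join(out)

corpus = ["helicopter", "establish", "syllable", "language"]
graph = learn_bigrams(corpus)
rng = random.Random(0)
print([sample_word(graph, rng) for _ in range(5)])
```

The weights here describe the original polysyllabic corpus; nothing in the sampler knows that heavily shortened words need different statistics, and nothing enforces phonotactic validity -- which is exactly the gibberish problem described above.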
I would have liked to have seen the author approach this problem like a historical linguist describing the descendants of PIE: There is a whole library of different sorts of sound changes that can be applied to a language to achieve the goals the author was seeking to achieve. The most obvious place to start is in eliminating all reduced vowels, as this will result in a pretty dramatic decrease in syllables without a big impact on comprehension.
It would have been nice had the author thought to simplify the grammar before monosyllabicizing everything. English has inflections, and making every different inflection a different randomly-generated word is a very inefficient use of a limited set of phonotactically-valid single-syllable words.
There was also imho a huge missed opportunity for the author to move complexity out of the morphology and into the syntax. This would have been probably the most exciting part of the whole project. Noun adjuncts could replace adjectives, noun adjuncts could allow for basically compound words without officially being compound words (so that we can keep to the goal of "every word is one syllable"). Alas, though.
In any case: these were the main things I remember having noticed when I watched it a couple days ago.
Please don't read this message as a total condemnation of the author, because it isn't; I'm just answering why I felt the linguistic side of this video was so lackluster and why I was so underwhelmed by it, despite finding the software part quite cool.
On better ways to achieve the main goal
2023-12-16
This is much less-formal than the above comment. I get a bit carried away at the end, playing with the word "antidisestablishmentarianism", which the video mentioned at its outset.
The directed graph thing was really cool, but it's also... not the best way to go about this? It's clever and original, though, and as a programmer I appreciate what you've done. But you can do all of this with rules-based systems and likely get better results. What you've made feels like using a neural net as an alternative to fully, deeply understanding something and building it from the ground up, if that makes sense. Using data structures as a way of avoiding interacting with the core of the problem is certainly a better use of your time than doing it a "right" way; but it's still a shortcut, and you get shortcut results unless you increase the complexity of the model by... a lot, or you switch to a different paradigm. This corner-cutting shows quite clearly at 10:41, where you have a number of words that violate English phonotactics, like "zleengz" (I am unaware of /zlV/ being valid in English, and I'm pretty sure that /i/ does not occur before /ŋ/ in GA (General American) or RP (Received Pronunciation) -- it's always /ɪ/).
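To show what I mean by a rules-based check, here is a tiny sketch: list the legal English onsets explicitly and reject anything else. (The onset list below is a small illustrative subset I wrote for this example, not a complete inventory, and it works on spelling rather than on a real phonemic analysis.)

```python
# A small, illustrative subset of legal English onset clusters.
LEGAL_ONSETS = {
    "", "p", "t", "k", "b", "d", "g", "s", "z", "l", "r",
    "pl", "pr", "tr", "kl", "kr", "bl", "br", "dr", "gl", "gr",
    "sp", "st", "sk", "sl", "sm", "sn", "spr", "str", "skr",
}

VOWELS = set("aeiou")

def onset_of(word):
    """Everything before the first vowel letter -- a crude stand-in
    for a proper phonemic segmentation."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    return word

def has_legal_onset(word):
    return onset_of(word) in LEGAL_ONSETS

print(has_legal_onset("strap"))    # True
print(has_legal_onset("zleengz"))  # False: "zl" is not an English onset
```

Even a filter this crude, bolted onto the graph's output, would have caught "zleengz"; a real system would do the same for codas and vowel nuclei.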
Another problem is that your directed graph can't account for the weights changing as you reduce the syllable count. You could condense "helicopter" to "helcopr" and native English speakers would very likely still understand you. If you condense enough words (which, like, you're literally doing to the entire language), then your graph is going to look quite different. Your use of a weighted directed graph preserves weights in a context where those weights are no longer relevant... and so you get a lot of gibberish.
Another issue, one which I don't recall you addressing, is that going from IPA to English orthography is... fraught.
My first step in tackling this problem of monosyllabicization would be to axe reduced vowels. You can significantly reduce syllable counts this way basically without harming comprehension, at the cost of increasing the complexity of consonant clusters. You can then manually get rid of additional consonants and vowels that aren't super important for comprehension. Example word: nidistabshmntarinism for antidisestablishmentarianism. It kinda sounds almost like a drunk slur of the original, and amounts to an elimination of 5 entire syllables. To go lower, we could start removing all but the vowels that are absolutely necessary for the word to be pronounceable: nid-stab-shmn-trin-is. We can then start moving things around and assimilating features: nit-stab-smin-trims. To go any lower than that, you really either need to start modifying English phonotactics (ntst-psmnt-rmnz, which eliminates 9 of the 12 syllables of antidisestablishmentarianism), or you have to start allowing compound words (like "backpack") or making extensive use of noun adjuncts ("county firemen" is an example of this). Your need to eliminate syllables can be achieved in part by moving complexity out of the morphology and into the syntax.
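The schwa-axing step is easy to sketch on an IPA transcription. This is my own toy illustration (the broad transcription is mine, stress marks omitted); a real pass would also need to handle syllabic consonants and resyllabify the leftovers:

```python
# Reduced vowels to delete; schwa is the main target.
REDUCED = {"ə", "ɐ"}

def axe_reduced_vowels(ipa):
    """Drop reduced vowels from an IPA string, keeping everything else."""
    return "".join(ch for ch in ipa if ch not in REDUCED)

# A broad transcription of "antidisestablishmentarianism":
word = "æntidɪsəstæblɪʃməntɛəriənɪzəm"
print(axe_reduced_vowels(word))  # æntidɪsstæblɪʃmntɛrinɪzm
```

Five schwas vanish, and with them a big chunk of the word's syllable count, while the stressed vowels that carry most of the word's identity all survive.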
Alternatively, you could try to go for the MUCH more attainable goal of one syllable per morpheme. That allows you to have "ant-dis-stabsh-ment-tair-an-imz" (with a decorative "i" added to "tar" to make it look more like "tari", and also to make it clear that it's pronounced as /teɪr/ instead of /tar/).

Another angle that could be interesting is if you can somehow use Optimality Theory to produce your monosyllabic words.
On the fact that what the author wanted already exists in Classical Chinese
2023-12-15
This is basically what modern Mandarin Chinese does! Most words are just two-syllable compound words composed of one-syllable words.
And Classical Chinese was basically just entirely monosyllabic. OP should learn Classical Chinese.