Compression-based models
July 15, 2023•64 words
Regarding compression-based language models (like https://github.com/Futrell/ziplm): I've always wondered whether the achievable compression ratio of source code wouldn't make a good crude proxy metric for code readability and maintainability. Too high, and you have a corpus with lots of boilerplate; too low, and you have weird write-only code (think Perl code golf or APL). The acceptable range would have to be tuned to the concrete language, of course.
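
To make the idea concrete, here is a minimal sketch of how such a ratio could be measured. It assumes zlib as the compressor (ziplm itself may do things differently) and defines the ratio as uncompressed size divided by compressed size. The `classify` helper and its `low`/`high` thresholds are hypothetical names of mine; the actual cut-offs would have to be found per language.

```python
import zlib


def compression_ratio(source: str, level: int = 9) -> float:
    """Uncompressed size / compressed size; higher means more redundancy."""
    raw = source.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty source")
    return len(raw) / len(zlib.compress(raw, level))


def classify(ratio: float, low: float, high: float) -> str:
    """Bucket a ratio using thresholds (hypothetical, to be tuned per language)."""
    if ratio > high:
        return "boilerplate-heavy"
    if ratio < low:
        return "dense"
    return "ok"


if __name__ == "__main__":
    import sys

    # Usage: python ratio.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(path, round(compression_ratio(f.read()), 2))
```

Very short files will look artificially "dense" because of the compressor's fixed overhead, so this probably only makes sense on whole files or, better, a concatenated corpus.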