Compression-based models

Regarding compression-based language models (like https://github.com/Futrell/ziplm): I've always wondered whether the achievable compression ratio of source code (original size divided by compressed size) wouldn't make a good, if crude, proxy metric for readability and maintainability. If the ratio is too high, you have a corpus with lots of boilerplate; if it is too low, you have weird write-only code (think Perl code golf or APL). The thresholds would have to be tuned to the concrete language, of course.
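To make the idea concrete, here is a minimal sketch of how such a ratio could be measured. It assumes zlib as the compressor and defines the ratio as raw bytes over compressed bytes; the function name `compression_ratio` and both sample snippets are made up for illustration and are not taken from ziplm.

```python
import zlib


def compression_ratio(source: str, level: int = 9) -> float:
    """Return raw size / compressed size for UTF-8 encoded source text.

    Assumption: zlib stands in for "the" compressor; any general-purpose
    compressor could be swapped in, and the absolute numbers would shift.
    """
    raw = source.encode("utf-8")
    if not raw:
        raise ValueError("cannot compute a ratio for empty source")
    return len(raw) / len(zlib.compress(raw, level))


# Hypothetical samples: repetitive getter boilerplate vs. a terse one-liner.
boilerplate = "def get_x(self):\n    return self._x\n" * 200
terse = "print(sum(i*i for i in range(10)))"

if __name__ == "__main__":
    print(f"boilerplate: {compression_ratio(boilerplate):.1f}")
    print(f"terse:       {compression_ratio(terse):.1f}")
```

Very short inputs are dominated by the compressor's fixed overhead, so a comparison like this only starts to mean something on whole files or whole corpora of comparable size.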

