2024-12-05 at 14:44

need to figure out difference betw LLMs and smth like alphago

has anyone done the equivalent of like . training a tiny LLM/NN to use a tiny LLM

and how different is that from smth like . training a NN to play go
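rough sketch of what that could even look like (everything here is a made-up toy, not a real setup): a one-parameter "controller" learns via REINFORCE whether to answer with its own cheap intuition or pay a cost to query a frozen stronger model. the accuracies, the query cost, and the task are all invented assumptions.

```python
import random, math

random.seed(0)

# frozen "big model": a stub that answers correctly 95% of the time
def oracle(question):
    return question["answer"] if random.random() < 0.95 else 1 - question["answer"]

# the controller's own weak intuition: 60% accurate
def intuition(question):
    return question["answer"] if random.random() < 0.6 else 1 - question["answer"]

QUERY_COST = 0.2  # querying the frozen model isn't free

# one scalar policy parameter: the logit of P(query the oracle)
theta = 0.0

def p_query():
    return 1 / (1 + math.exp(-theta))

lr = 0.1
for step in range(5000):
    q = {"answer": random.randint(0, 1)}
    use_oracle = random.random() < p_query()
    guess = oracle(q) if use_oracle else intuition(q)
    reward = (1.0 if guess == q["answer"] else 0.0) - (QUERY_COST if use_oracle else 0.0)
    # REINFORCE: gradient of log-prob of the chosen action wrt theta
    grad = (1 - p_query()) if use_oracle else -p_query()
    theta += lr * reward * grad

print(round(p_query(), 2))
```

with these made-up numbers the policy should drift toward querying, because 0.95 accuracy minus the 0.2 cost still beats the 0.6 intuition; flip the numbers and it learns the opposite. the structural point is that "use another model" is just another action in an RL loop, same shape as "place a stone".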

==========

also . like humans, will it need to start from scratch for each new skill ? like will it just come out of the box sorta tabula rasa about how to reason in that skill ?

Hm Perhaps but also surely some things will carry over from prev skills like humans

for agi to be agi . will it need to learn to learn ???

or does it just need to know how to reason

==========

also dont get too focused on having a human-shaped NN

==========

also think about the compute of reasoning w huge ass LLMs like 4o .... i rly think we need to pivot to extremely tiny LLMs

==========

also 4o is prompted to make too many jumps that its not even sure about . its an overconfident guy who just one-shots an immediate answer using only intuition - and sure his intuition is like 99.99999 percentile but the fact that he has ZERO (ABSOLUTELY ZERO) reasoning just kills him

O1 IS STILL VERY BABY STEPS FOR REASONING.

==========

need some sort of .. Uncertainty detection . need 4o to not be so fucking certain . or maybe we can use another model to supervise that, ykwim?
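one crude but standard proxy for "the model isn't sure": the entropy of its next-token distribution. real APIs expose token logprobs, so the toy distributions below would be replaced by actual model outputs; the 1-bit threshold is an arbitrary assumption.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def is_uncertain(probs, threshold_bits=1.0):
    # flag the answer for a slower second pass instead of one-shotting it
    return entropy(probs) > threshold_bits

confident = [0.97, 0.01, 0.01, 0.01]   # model basically sure
hedging   = [0.4, 0.3, 0.2, 0.1]       # model guessing

print(round(entropy(confident), 2), is_uncertain(confident))   # 0.24 False
print(round(entropy(hedging), 2), is_uncertain(hedging))       # 1.85 True
```

this only catches uncertainty the model itself expresses in its output distribution; confidently-wrong answers need something else (like the supervising-model idea above).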

also it seems like CPO (chain of preference optimization, see https://arxiv.org/abs/2406.09136) is sorta what i was thinking of

==========

also is there a way that we can model human reasoning but then allow it to go beyond the way that humans reason and figure it out itself??

sorta like how alphazero was trained on pro human chess players but then learned by playing itself a bunch of times
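a tiny toy version of that recipe (nim standing in for go, a tabular Q-table standing in for the NN, all numbers invented): seed the value table from imperfect "human" games, then let self-play overwrite it and go past the seed data.

```python
import random
random.seed(1)

# toy game: nim with 7 stones, take 1 or 2, taking the last stone wins
ACTIONS = (1, 2)

def legal(stones):
    return [a for a in ACTIONS if a <= stones]

Q = {}

# "imitation" phase: seed the table with moves seen in (imperfect) human games
for s, a in [(7, 1), (6, 1), (5, 2), (4, 1), (2, 2)]:
    Q[(s, a)] = 0.1

def best(stones):
    return max(legal(stones), key=lambda a: Q.get((stones, a), 0.0))

# "self-play" phase: negamax-style Q-learning, both sides share the same table
alpha, epsilon = 0.5, 0.2
for episode in range(20000):
    stones = 7
    while stones > 0:
        a = random.choice(legal(stones)) if random.random() < epsilon else best(stones)
        nxt = stones - a
        if nxt == 0:
            target = 1.0  # we took the last stone: win
        else:
            # the opponent moves next, so our value is minus their best option
            target = -max(Q.get((nxt, b), 0.0) for b in legal(nxt))
        q = Q.get((stones, a), 0.0)
        Q[(stones, a)] = q + alpha * (target - q)
        stones = nxt

print(best(7), best(5), best(4))  # optimal nim play leaves a multiple of 3
```

the self-play phase even corrects the bad (6, 1) move that the "human data" seeded in, which is exactly the AlphaZero-style point: imitation gives a starting point, self-play goes beyond it.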

==========

also how cheap of a system 2 model (like an llm) can we use and still produce good results? like. how much can we nerf their intuition and still produce good results? — because for a game of chess or go, one move has a latency and compute of like 0.000000000001 because it’s literally just like a line of code . for training how to reason, that will take like . a wholeass inference from another model, ykwim????
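back-of-envelope version of that compute gap (every number below is a made-up assumption, just to put a rough scale on it): a transformer forward pass costs about 2 FLOPs per parameter per token, vs some generous budget for a handcrafted move evaluator.

```python
# rough back-of-envelope, standard ~2 FLOPs per parameter per token approximation
PARAMS = 500e6            # a hypothetical "<500M param" intuition model
flops_per_token = 2 * PARAMS
tokens_per_step = 100     # assume one "reasoning move" emits ~100 tokens
flops_per_reasoning_move = flops_per_token * tokens_per_step

flops_per_game_move = 1e4  # generous guess for one rule-based board evaluation

ratio = flops_per_reasoning_move / flops_per_game_move
print(f"{ratio:.0e}")  # each reasoning "move" costs ~1e7x a rule-based game move
```

so even with a deliberately tiny model, one "move" in reasoning-space is many orders of magnitude more expensive than one move in game-tree-space, which is exactly why the choice of system-2 model size matters so much.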

basically i wanna train a NN to use another preset model, in the same way
this model will not be learning actual new skills/knowledge; the only thing we're training it to do is reason

==========

related to the above question, i feel like if there was an LLM analogous to base intuition for humans, it would literally be like one of the dumbest things ever . wait actually is that true? nvm idk if thats true. well im almost certain we r like 1000x dumber than 4o for solving math problems. but for Regurgitating Factoids, actually we have a lot of space for that. wait but we only specialize in like a handful of areas (whereas 4o has deep knowledge in like all topics) and even with those few areas we cant even remember much perfectly. like without our reasoning, we r actually Terrible LLMs with Very Little training data (compared to SOTA LLMs rn)

i think we only need like a <500M param model
then train another thing to use it
also it is very important to give it a scratchpad. humans cannot fucking reason without pen and paper.
(and then BTW! - after that is done, you can also give it tools like 4o and internet search and python, so it can be like a human who reasons not just by himself but with the help of computer tools)
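toy illustration of why the scratchpad matters: digit-by-digit addition where every carry gets written down, so each step only needs local information instead of holding the whole computation in working memory (which is the thing LLMs and humans are both bad at).

```python
def add_with_scratchpad(a: str, b: str):
    """Digit-at-a-time addition, writing every intermediate carry to a
    scratchpad, the way a model (or a human) offloads working memory to paper."""
    scratchpad = []
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
    carry, digits = 0, []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
        scratchpad.append(f"{da}+{db}+carry -> write {total % 10}, carry {carry}")
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), scratchpad

result, pad = add_with_scratchpad("478", "964")
print(result)  # 1442
```

the analogous move for an LLM is letting it emit intermediate tokens it can re-read, rather than forcing the whole answer out of one forward pass.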

==========

i also wanna train smth to learn how to learn - like not just make it learn, but make it learn how to learn

EDIT 2024-12-25 01:46:48
wait, each human can barely even do this for most things

==========

wait but also . during retrieval (eg trying to remember a history fact or a cooking recipe or a formula), sometimes our immediate one-shot intuition is wrong (and in some of those cases we can Feel that its prob wrong) - but then we’re like oh lets redo the inference with more depth of grokking this time (how tf does that even happen?? and wtf does more depth of grokking even mean?? but we all know how it feels like…)

i think there's something to be said about like . you might remember a thing only partially and then those neurons activate their friends . but idk how true that is and idk how much that contributes .

===

god i wonder how that intuition process works . how do we Feel that its wrong . it is a Feeling . something produces a feeling .

wait also how does memory retrieval to language work . like i feel like it reactivates the part of the visual+motor cortex that was activated during that experience, and that is the pre/semi-conscious qualia when trying to articulate a thought

==========

i think it will be pretty hard to train reasoning for many things at a time . but that is what people want . people want AGI that can already do like Everything and so we're trying to train it on Everything Now Now I want All of it Now -- but i dont think we should be doing that.

i think it will be pretty beneficial to just focus on a simple area like physics or ARC-AGI (or perhaps also sparse reward environments) and then use a tiny LLM to train reasoning for this one tiny thing

===

why dont we train many small models? why do we train a rly big model that tries to be an expert of everything? i feel like the former would be a much cheaper approach, bc my impression is that when you use gpt-4o, youre using the same amount of hugeass compute each time, even though usually each question is about approx 1 field. sometimes we have questions that need to connect 2+ fields. but we almost never have questions that must connect ALL THE FIELDS IN THE ENTIRE INTERNET... so then why are we only using a HUGE model with HUGE knowledge and HUGE compute?? it's like the equivalent of using terence tao to do elementary school arithmetic, except in the AI case it has to use the same amt of compute each time.

unless my assumption about compute usage is just wrong. idk im rly new to all this stuff.

also what is the Mixture of Experts approach? i've heard about it
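for reference, the MoE idea in one toy picture: a learned gate routes each query to one of several small specialist subnetworks, so per-query compute stays small even though total capacity is large. real MoEs live inside transformer FFN layers and use softmax gates with top-k routing; the expert names and key vectors below are entirely made up.

```python
# each "expert" is a tiny specialist; the gate routes a query to one of them
EXPERTS = {
    "math":    lambda x: x * x,
    "cooking": lambda x: x + 1,
    "history": lambda x: -x,
}

# each expert has a key vector; the gate scores query-key affinity
EXPERT_KEYS = {"math": [1.0, 0.0], "cooking": [0.0, 1.0], "history": [-1.0, 0.0]}

def gate_scores(query_vec, expert_keys):
    # dot-product affinity between the query and each expert's key vector
    return {name: sum(q * k for q, k in zip(query_vec, key))
            for name, key in expert_keys.items()}

def moe_forward(query_vec, x):
    scores = gate_scores(query_vec, EXPERT_KEYS)
    best = max(scores, key=scores.get)  # top-1 routing: only one expert runs
    return best, EXPERTS[best](x)

print(moe_forward([0.9, 0.1], 3))  # routed to "math"
```

so MoE is a real, already-deployed answer to the "why pay for ALL the fields every query" complaint: all the parameters exist, but only the routed fraction is active per token.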

==========

i feel like what is happening in the AI world is very very isomorphic to what is happening in the music world.

to summarize what i think about the music world:
i think that the vast majority of current classical musicians have basically zero mastery over their instrument, in the sense of how we measure linguistic fluency.

like, yes, they have mastery in the sense that they can recite a written piece.
but the vast majority of these classical musicians have BASICALLY ZERO ability to produce their own sounds from their own ideas/feelings (i.e. IMPROVISE), nor can they even do the simpler task of parroting what they hear.

in the linguistic sense, this would be equivalent to a chinese learner who can ONLY recite pinyin from a page, and they CANNOT even repeat things they hear, and they CANNOT have any conversation nor produce their own sentences.
in the linguistic sense of fluency, we would NEVER call this person fluent in chinese.

now im not saying it's bad to learn music like how classical musicians go about music right now -- it's fine to do that and many get true enjoyment from that exact thing. and yes they are masters of their instrument, but ONLY in ONE specific sense. it becomes a problem when you confuse that SPECIFIC mastery for GENERAL FLUENCY.

if you don't confuse it, it's fine and everyone can enjoy what they want to enjoy. but then some people confuse it, and it becomes a bigger problem when that mental mistake carries over to the larger community and infects the pedagogical consensus. like, what used to just be a personal preference of "yeah i dont rly want piano fluency in the linguistic sense, i enjoy just reciting pieces!" has now turned into "i wanna become good at the piano and the only way i can conceive of that goal and that path is just learning how to recite pieces"

and now the vast majority of people wanting to learn music will say "oh! i will become fluent in piano by learning how to read sheet music!!" but no!!!! you will only learn how to read sheet music! thats it!

like, when you wanna learn music, there is usually very little emphasis/encouragement on experimentation and improv. like, yeah, there are some groups of people who do this, but it's sort of like . only Rogue people that do it . there are like Rogue communities that do this . there arent any organizations or programs or common methods (other than Gordon's attempt at creating MLT (music learning theory)) that facilitate this to be actually commonplace in a non-marginalized way.

and i feel like this whole phenomenon is caused and/or perpetuated by the fake sense of fluency you get when you recite a whole piece. when you learn a piece youre like wow!! it really feels like im masterful!! im playing the piano!!!!
and yes, to an extent / in a sense, you truly are!
but do NOT confuse that with fluency/mastery in the linguistic sense. if i was learning french and all i did was just read a page of french poetry, it would be CRAZY to think that demonstrates mastery.

===

ANYWAY that is my summary of what i think about the music world. and i feel like that is what is going on with the AI community.

everyone got swept into LLM stuff. i mean sure there are groups that arent, but theyre kinda sorta rogue and/or marginalized, ykwim?

and LLMs are isomorphic to just reciting sheet music. they memorize the whole internet, but they cannot go past that. and then you get a fake sense of mastery and youre like Wow!!!! So Cool! let's do more LLM stuff!! let's make them bigger and more efficient!!!! but that is just like learning more sheet music and playing the sheet music better -- it is Cool and Nice but you must not confuse it with ACTUAL GENERAL FLUENCY.

we need to create NNs that actually REASON. which is harder and more first-principles, but it compounds exponentially. this is all isomorphic to the people who do piano improv.

we've trained NNs to play all the pieces on the internet. but we havent trained them to improv.

==========

okay anyway now i wanna make a plan for what i want to experiment to train NNs

there are a lot of cool papers and models and videos to look at, but i feel like i need to sorta sit down for a little bit and solidify what will actually be relevant for what im trying to do.

im just kinda all over the place rn . very hard to balance the divergent and convergent thinking .

==========

wait okay if i really believe in the whole .. my isomorphism connection between the music community and the AI community . then like . that would imply we are nowhere near AGI because we have been improving the fundamentally categorically wrong thing

okay but the thing is . at least with chatgpt-4o it CAN create its own sentences that are novel and new . it does have novelty, but the novelty factor isnt large, like it sorta just rearranges existing text with a relatively surface-level algorithm (it just rearranges text, not the underlying ideas)

and so i guess a better analogy is actually like . LLMs are good at improv but sorta kinda only in the way that Melody is good at improv (her improv is closer to a concatenated string of Quotes than 100% genuine improv) (not the best analogy but yeah)

now, does that mean we're near AGI?
i think it means we can remix existing Text Excerpts and create new things like that. just like melody Can make novel songs, but the novelty factor isnt large, and she's sorta just cutting and pasting existing phrases

but it will be very very hard/improbable/impossible to use Existing Tech / Existing Approach to actually go Beyond the level of things that we currently have. the same way that melody cant

a model like gpt-4o is like . it sums together all of human knowledge and it can remix it into novel ways. but it cant go beyond humanity in any significant meaningful way right now.

===

god i love it when a field is based on a faulty first principle bc i can easily attack it without much actual skill or knowledge related to the rest of the field

but also idk if im just being incredibly naive lol . we'll see . but at least it does sorta seeeem like i have potentially novel contributions

===

note that even big AI researchers are very prone to bias . perhaps even especially them bc to get there youre usually a tweaker

okay wait but im also very prone to bias. i am a tweaker... Lmfao i wonder if ill actually contribute anything novel . battle of the tweakers.
