2024-12-27 yapping to friend about arc
this is a monologue im gonna text u things bc writing it out to another person helps me process my thoughts better than just writing it to myself but u dont have to respond like u can fully just not respond at all like i can just pretend u r reading these okay so . ive been trying to flesh out a program for ARC . and its rly crazy how salena is such a good analogy for an LLM like . she hallucinates just like an LLM (e.g. she confidently does the wrong equation) she litera...
Read post
2024-12-21 yapping to friend about o1/o3/search
this is a monologue so the way i like thinking abt it is models like 4o, claude sonnet, etc .. their pattern matching / interpolation / wtvr is all prettyyyy analogous to human type 1 thinking .. like imagine if u were asked a question and you gave zero conscious thought about it and just immediately spewed out words based on intuition (the analogy is not perfect but i think its pretty good so ill continue using it here) humans are actually pretty bad at type 1 thinking for things like...
Read post
2024-12-25 at 14:10 summarizing past notes and what to do going forward
i have gone through my past notes and this note is me copypasting excerpts that i wanna keep in mind, trying to inform what to do rn / going forward this note is unfinished (but i guess all of my notes on this blog are unfinished n messy anyway lol so wtvr) ===== 2024-12-24 at 18:01 okay i know ive been getting Slightly more into learning about ai, and i want to make sure im not making my ARC solution hypotheses too complex — i.e. i dont want the equivalent of feature creep i wanna try the ...
Read post
2024-12-25 at 02:13
wait dude. the thing is . humans r ALWAYS doing test-time training. it's not like training -> static level of performance... even our performance is training. i mean i guess Yes we can train and then have some static level of performance that we stagnate at, but even that performance is still solidifying smth (perhaps eg ur current bad habits in ur technique) ...
Read post
2024-12-23 misc3
if u dont understand smth u need to know that u dont fully understand it and that youve blackboxed it etc how does a good human reasoner learn that? and then how do we update our beliefs/understanding after learning it === u also need to learn the threshold of blackboxing, and the threshold of how blackboxed of a tool can u still be satisfied using how does a good human reasoner learn that? === id like an ai to be able to go through Purcell n Morin EM textbook, do all the practice problems...
Read post
2024-12-23 misc2
understand constitutional ai approach can i do RLAIF via prompting? like it wont change the model but it will change the prompt evolutionary approach is dumb if it's only evolutionary and you have set compute cutoffs. allow your time to be managed like a human would manage their time. okay maybe im being too harsh to jeremy berman evolutionary approach is actually pretty cool (picbreeder type beat) but i just feel like it's not needed for this problem i guess? i feel like it would be much more...
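(a minimal sketch of the "RLAIF via prompting" idea: hill-climb the prompt w a critic's score, zero weight updates. the critic here is a toy stand-in -- a real one would be a model call judging against a constitution -- and the mutation list is made up:)

```python
import random

rng = random.Random(0)

# toy stand-in for an AI critic; real version = a model call scoring the
# prompt's outputs against a constitution
def critic_score(prompt: str) -> float:
    return ("example:" in prompt) + ("step by step" in prompt) - len(prompt) / 500

MUTATIONS = [" think step by step.", " example: [worked case].", " be concise."]

def mutate(prompt: str) -> str:
    return prompt + rng.choice(MUTATIONS)

# evolve the PROMPT, not the weights: keep a revision only if the critic
# prefers it, so the feedback accumulates in the prompt instead of the model
prompt = "solve the puzzle."
for _ in range(50):
    candidate = mutate(prompt)
    if critic_score(candidate) > critic_score(prompt):
        prompt = candidate

print(prompt)
```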
Read post
2024-12-22 at 17:38 more notes on Parables on the Power of Planning in AI (noam brown)
And this was, at the time, state of the art for predicting human moves in chess. Now, one thing that's really interesting about MAIA is that for high Elo models, it was about 100 to 300 Elo points below the target Elo rating. So if you were to train it on 2,000 Elo-rated humans, it would only be about 1,700 Elo. For the lower Elo ratings, this ended up not being a problem. For the higher Elo ratings, it was a challenge. Now, one hypothesis for why this is the c...
Read post
2024-12-22 at 16:21 3b1b
cost function is average over all examples backprop gives you gradient of C(w1, w2, ...) (how?) but to calculate that youd need all examples instead, we use only a few examples at a time then calculate not the exact gradient but instead a Stochastic Gradient using backprop using those few examples . this makes sense bc it's also how humans learn . we dont need to retrain on 50000 examples before adjusting our strategy/intuition/wtvr (whether consciously or subconsciously) . we adjust as we go, ...
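(a minimal sketch of the full-batch vs minibatch thing in numpy -- the toy data, cost, n learning rate are all made up for illustration:)

```python
import numpy as np

# toy data: fit y = w*x with squared-error cost averaged over all examples
rng = np.random.default_rng(0)
x = rng.normal(size=50_000)
y = 3.0 * x + rng.normal(scale=0.1, size=50_000)

def grad(w, xb, yb):
    # gradient of mean((w*x - y)^2) with respect to w
    return 2.0 * np.mean((w * xb - yb) * xb)

w, lr = 0.0, 0.1
for step in range(200):
    # stochastic gradient: estimate the gradient from a 32-example minibatch
    # instead of all 50,000 examples, then adjust immediately n move on
    idx = rng.integers(0, len(x), size=32)
    w -= lr * grad(w, x[idx], y[idx])

print(w)  # ≈ 3.0, without ever computing the exact full-dataset gradient
```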
Read post
2024-12-22 misc
it's crazy to think about how babies learn language . like Wtf the brain is just able to do that???? and humans' first learning experiences are like . model free RL isnt that fucking crazy . its not model based with rules n wtvr . its fucking model free deep RL... === ive been thinking about how to copy human reasoning and RL it like a human teacher -- but perhaps also think about how can we make reasoning emerge in the first place... i mean, even with the former method we can go beyond human r...
Read post
2024-12-22 at 02:07
i feel like i wouldnt have gotten most of these ideas if o1 didnt exist yet to inspire me ...
Read post
2024-12-22 at 00:03
for things like tennis n rock climbing, perhaps the partial derivatives are approximately only dependent on that variable like, at any point, you can improve any part of your technique and expect a reasonable gain in performance, and it doesn't rly matter what order you do this in, because the improvement from URGH im gonna stop explaining it in english i already understand it . whatever. a little more formally, let's say you have a cost function C which depends on factors x, y, ... let's say ...
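(my guess at the formalization this note was headed toward: if the cost separates additively across factors, each partial only depends on its own variable:)

```latex
C(x, y, \dots) = f(x) + g(y) + \cdots
\qquad\Longrightarrow\qquad
\frac{\partial C}{\partial x} = f'(x), \quad
\frac{\partial C}{\partial y} = g'(y), \ \dots
```

so improving any one factor pays off the same no matter where the others currently are, and order doesn't matter. compare a coupled cost like C(x, y) = xy, where ∂C/∂x = y: there the payoff from working on x depends entirely on where y is.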
Read post
2024-12-21 at 21:14
if we want ai to make serendipitously great discoveries we need it to play and that means we need “useless” ai agents that “waste” compute ...
Read post
2024-12-21 at 20:55
remember the thing shane said about adding more params just allows us to gradient descent even more instead of getting stuck at a minimum and it seems to work rly rly rly well its rly interesting bc once i was talking w jason and i thought more params = harder to find gradient and less params = rly easy to find gradient we were comparing Life Happiness vs Tennis, and we made that analogy ... happiness is very hard to make progress on, tennis is very easy to make progress on.. (or at least it seems like ...
Read post
2024-12-21 at 20:45
BFS vs DFS play shows bfs ...
Read post
2024-12-21 at 20:30
wait humans r extremely bad at type 1 thinking but also sometimes we get a kamikaze divine gift and even more than sometimes we also somehow r rly good at connecting things or wtvr theres some sort of creativity process, ykwim? so cool ...
Read post
2024-12-21 at 18:54
learning by association vs learning by gradient descent etc ??? learning by RL , ai vs humans ??? ...
Read post
2024-12-17 at 16:09
why is everyone being weird about scaling test-time compute vs training compute https://x.com/ClementDelangue/status/1868740932251844806 or maybe thats just tweakers on AI twitter and not ppl in actual research communities === also im curious how the phenomenon of "for every 10x increase in training compute, we decrease 15x in inference compute" maps onto humans re: https://yellow-apartment-148.notion.site/AI-Search-The-Bitter-er-Lesson-44c11acd27294f4495c3de778cd09c8d "Moreover, the brillia...
Read post
2024-12-16 at 16:01
why are we not worried about human interpretability why is there no worry about human superintelligence like . why is there the assumption of . AI becomes compoundingly infinitely smart right after we get reasoning is it just bc . humans are limited by their speed and AI is assumed to have the ability to scale speed+knowledge with enough compute? also .. i'm realizing how human reasoning is so slow n unoptimal actually compared to the ideal that we sometimes think it is also . even if we get ...
Read post
2024-12-15 at 06:21
wait humans r actually so bad at first principles n reasoning for super open ended questions like we use so many heuristics and it just rly depends if ur lucky that ur heuristics work out and u only change heuristics when ur rly sure they dont work out and also emotions change ur rationalization so easily like . you cant verify each step from first principles for an emotional problem .. u can try but its impossible to really have some axiomatic base that u can build up from . all axioms in p...
Read post
2024-12-12 at 20:02
how do you reason for things that feel genuinely out of your reasoning ability? ...
Read post
2024-12-12 at 01:20
we need a tiny LLM to be trained on the brainstorming, not the output actually this is smth that i alr said but . idk i forgot it then rederived it the brainstorming is the actual interesting part the output is something that we can easily calculate using python we need to train the brainstorming to have that sort of . "oh we know how iq tests work and i know how to sorta kinda reason through it like a human sorta kinda reasons through it" ...
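(a minimal sketch of the data-prep side of this, w hypothetical "task" / "brainstorm" / "answer" record fields -- the point is just that the supervision target is the messy trace, not the final grid:)

```python
# hypothetical logged traces; irl these would come from recorded
# brainstorming sessions on ARC-style tasks
dataset = [
    {
        "task": "blue cells become red, grid otherwise unchanged",
        "brainstorm": "hmm colors swap? check the counts.. yep, a palette map",
        "answer": "[[2,0],[0,2]]",
    },
]

def to_training_example(record: dict) -> dict:
    # train the tiny LLM on the brainstorm itself -- the final answer is
    # cheap to recompute w python once the reasoning is right
    prompt = f"task:\n{record['task']}\n\nthink out loud:\n"
    return {"input": prompt, "target": record["brainstorm"]}

examples = [to_training_example(r) for r in dataset]
```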
Read post
2024-12-11 at 20:03
i dont think i wrote abt this yet but . i remember when reading the Voyager minecraft thing and how they used embeddings of past solutions to help current/future solutions, i remember i was like "oh i've been thinking of that" (but i forgot what Exactly i was actually originally thinking and how i originally wanted to go about it, so now the earliest version i actually remember is just the voyager implementation) ========== constitutional ai not simple RL but like nuanced RL could we implement...
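(a minimal sketch of the voyager-style retrieval idea -- embed() here is a toy stand-in for whatever embedding model you'd actually call:)

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # placeholder hash-based bag-of-words embedding, purely for illustration
    v = np.zeros(256)
    for tok in text.lower().split():
        v[hash(tok) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

library = []  # (embedding, description, solution) from past tasks

def add_solution(description: str, solution: str):
    library.append((embed(description), description, solution))

def retrieve(query: str, k: int = 3):
    # nearest past solutions by cosine similarity, to seed the new attempt
    q = embed(query)
    scored = sorted(library, key=lambda e: -float(e[0] @ q))
    return [(d, s) for _, d, s in scored[:k]]

add_solution("recolor every blue cell red", "def solve(g): ...")
print(retrieve("swap the colors in the grid", k=1))  # feed into the prompt
```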
Read post
2024-12-11 at 03:51
summary of convo w shane today i met this guy beside the pool tables in faunce while i was talking about Hidden Markov Models w moses and then he chimed in and started teaching us about HMMs and then i asked him what year he was and he said he's a phd student and i asked him what research and he said ai rl and so we started talking about ai here's his website https://sparr.io/ double descent penalize large coeffs language changes perception not bc of the language but bc the language is a sc...
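(pinning down one line of that list: "penalize large coeffs" is, i believe, just ridge regression -- minimal numpy sketch:)

```python
import numpy as np

def ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    # least squares plus a penalty on coefficient size:
    # w = argmin ||Xw - y||^2 + lam*||w||^2  =>  w = (X'X + lam*I)^-1 X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = X @ rng.normal(size=20) + rng.normal(scale=0.1, size=100)

# the penalty shrinks the coefficients (prints True)
print(np.linalg.norm(ridge(X, y, 10.0)) < np.linalg.norm(ridge(X, y, 0.0)))
```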
Read post
2024-12-10 at 21:47
instead of providing the reasoner with a human-built cheat sheet / human-curated guidelines / etc, we want to allow the reasoner to struggle a lot we can give the reasoner tips on how to reason, but not shortcuts to reason same thing with training humans! like if we were teaching salena, we wouldnt wanna give her human-curated tips like "oh try noticing connected blocks, and try shearing, then try symmetry, then try rotation, then try [...]" we would wanna let her struggle through it, but give...
Read post
2024-12-09 at 19:20
humans learn not from repeated correct examples but from the edge cases of mistakes (wait we also do learn from repeating correct things like w spaced repetition, right? and also w reviewing the same practice problem ... oh wait but . perhaps that is also still actually traveling the edge case of mistakes, bc otherwise tbh we just skip the example when we're studying) eg dekeyser's skill building oh wait i guess this is what RL does (?) ========== also why havent we created a multimodal mode...
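(a toy sketch of the "edge cases of mistakes" loop -- a threshold classifier that only ever updates on points it currently gets wrong; the data n update rule are made up:)

```python
import numpy as np

rng = np.random.default_rng(0)

# toy task: true label is (x > 0.7); the model is a learned threshold w
w = 0.0
data = [(x, float(x > 0.7)) for x in rng.uniform(0, 1, 500)]

for _ in range(20):
    # keep ONLY the mistakes -- repeating already-correct examples is
    # skipped, matching the note
    hard = [(x, y) for x, y in data if float(x > w) != y]
    if not hard:
        break
    # nudge the threshold toward the mean of the misclassified points
    w += 0.5 * (np.mean([x for x, _ in hard]) - w)

print(w)  # ≈ 0.7: the boundary is learned entirely from the mistakes
```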
Read post
2024-12-09 at 14:22
arxiv paper that i found via youtube shorts yt = https://youtube.com/shorts/ZlvdInrdAYE arxiv link = https://arxiv.org/abs/2309.05660 kinda surprised that this was published late 2023 and presented in mid 2024 . also this is literally not discrete program search, right? but in the yt short, chollet says it is ? this is a naive implementation of literally what i was thinking about a few days ago (re: "brainstorming") i guess they were just trying to demonstrate base level capabilities, and per...
Read post
2024-12-09 at 13:31
i wanna be curious and just explore whatever but like also i dont wanna waste time,, i wanna actually be fast and speedy and actually catch up to things so that i can actually think smart about these things and actually start experimenting on novel things quickly, like ofc it's very useful to just sit around and think about hypotheses and we need that to get New Creative Ideas (re: greatness cannot be planned, etc) and that is very intangible but i also wanna be able to respond to tangible measu...
Read post
2024-12-06 at 03:41 mimi import
active inference how can an ai eventually do divergent thinking and connecting two seemingly unrelated things? how do WE interpolate for that? r we only capable of interpolation and our extrapolation is just built from interpolation? ...
Read post
2024-12-06 at 12:09 mimi import
humans r so dumb but we have 999999x distributed compute n teamwork ...
Read post
2024-12-08 at 13:31 mimi import
when youve honed your reasoning you can sorta Feel if an equation is off that feeling is an action tugging (just like how u feel a word on the tip of ur tongue or how you feel) (i rly like the model of . viewing feelings as action tuggings … its not perfect but i like it) and thats bc your Type 2 reasoning is built using Type 1, and Type 1 is multimodal whose teleology (and thus ontology) is to control Actions how did humans figure out how to reason ? maybe instead of teaching a model how to reas...
Read post
2024-12-08 at 22:20 mimi import
train type 1 w type 2 (as in at the same time, just like how we train the word embedding matrix with the rest of the model), but type 2 is constructed from type 1 tho one new sorta reason ive thought of that it’s good to train type 1 and type 2 together is . this combined training will also determine how type 2 is constructed from type 1, and otherwise youre setting a sort of demarcation that is probably rly naive, but if u allow the demarcation to happen on its own then it follows how humans l...
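(the embedding-matrix analogy, concretely: a minimal pytorch sketch where the lookup table is trained jointly w the layers on top, so gradients decide the division of labor instead of a hand-set demarcation:)

```python
import torch
import torch.nn as nn

# the embedding table is a parameter like any other: gradients from the
# layers built on top of it flow back into it, shaping both together
model = nn.Sequential(
    nn.Embedding(num_embeddings=1000, embedding_dim=32),  # learned lookup
    nn.Flatten(),            # (batch, 8, 32) -> (batch, 256)
    nn.Linear(32 * 8, 2),    # the part constructed ON TOP of the embeddings
)

opt = torch.optim.Adam(model.parameters())  # one optimizer over both parts
tokens = torch.randint(0, 1000, (16, 8))    # 16 sequences of 8 token ids
labels = torch.randint(0, 2, (16,))

loss = nn.functional.cross_entropy(model(tokens), labels)
loss.backward()  # gradients reach the embedding table too
opt.step()
```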
Read post
2024-12-09 at 13:12
tokenization ...
Read post
2024-12-05 at 20:18 Untitledaksdjfhdas
i swear AI research is like a profitable tweaker's hobby.. like sure it's Valuable and you are contributing value to ppl. but like . ppl do it moreso for the prestige and the thrill (or at least me) . and i would feel more of the Contribution Feeling if i was doing smth like . actually personal . at least i think . ...
Read post
2024-12-07 at 17:27 Untitled 5
do we only interpolate? how do we interpolate on ideas how limited are LLMs? bc they do not interpolate on ideas wait but what if they Do interpolate on ideas . like what if with 170B params it actually interpolates on ideas in a similar way as us bc like . we are only trained on our output not our hidden layers (ideas) right ...? i mean sorta but also no. okay but with next word prediction it obviously does not interpolate on an Entire idea (or maybe it is?? and maybe our sentence generation...
Read post
2024-12-05 at 20:18 Untitled 3
ah . god im realizing how new i am to thinking about AI because im making many silly mistakes. but its okay. playing devil's advocate though: how do we make sure it doesn't optimize for: (1) taking shortcuts that look efficient but aren't actually good reasoning (2) finding patterns in the training problems rather than learning general reasoning (3) memorizing solution templates instead of actually reasoning oh wow all of these are problems we have with Humans too... (note to self: 2 and 3 are basica...
Read post
2024-12-07 at 17:27 Untitled 7
wait i swear ARC AGI tasks are interpolative on the goal but then you need a reasoning thing to try out the solutions at test time just like we do ========== if humans are strictly interpolative but we use that to do extrapolation, we have to ask how that happens rn LLMs are surface level interpolative but extremely good at doing surface level interpolation across superhuman amounts of data i think there's something about . humans interpolate on a much finer resolution, like on a much finer...
Read post
2024-12-05 at 14:44
need to figure out difference betw LLMs and smth like alphago has anyone done the equivalent of like . training a tiny LLM/NN to use a tiny LLM and how different is that from smth like . training a NN to play go ========== also . like humans, will it need to start from scratch for each new skill ? like will it just come out of the box sorta tabula rasa about how to reason in that skill ? Hm Perhaps but also surely some things will carry over from prev skills like humans for agi to be agi ....
Read post
2024-12-23 misc
LLMs are too confident and they need to not be they need to know their limits and they need to know to what threshold they can make intuitive jumps vs need to break it into smaller parts good human reasoners do this but actually this is not a closed problem for humans too like . for something like doing the problem of 572 * 205 ... we know we cant immediately do it so we don't just immediately say "oh the answer is 170380" or wtvr. we have to break it up into like 572 * 2 * 100 + 572 * 10 / 2 ...
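(the decomposition written out -- both routes land on the same value, and note how far off the confident type-1 guess above is:)

```latex
572 \times 205
= \underbrace{572 \times 2 \times 100}_{572 \times 200}
+ \underbrace{\tfrac{572 \times 10}{2}}_{572 \times 5}
= 114{,}400 + 2{,}860
= 117{,}260
```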
Read post