2024-12-27 yapping to friend about arc

this is a monologue




im gonna text u things bc writing it out to another person helps me process my thoughts better than just writing it to myself


but u dont have to respond

like u can fully just not respond at all like i can just pretend u r reading these


okay so . ive been trying to flesh out a program for ARC . and its rly crazy how salena is such a good analogy for an LLM


like .

  • she hallucinates just like an LLM (e.g. she confidently does the wrong equation)
  • she literally has no idea how to reason through a math problem and she kinda flops around like a fish unless she has been explicitly trained on that specific kind of math problem
  • there’s a common trick to get LLMs to give slightly better answers, and that trick is to make them write out their thought process before arriving at a final answer .. ofc they havent rly been trained on reasoning, but just writing out their thought process reduces some hallucinations and increases accuracy bc now theyre decreasing their huge leaps of intuition so they have more stepping stones to the correct answer as opposed to just trying to leap straight to the correct answer .. but a naive implementation of this trick obviously doesnt produce much more success because they're still not checking their work . and even if u tell them to check their work, lowk theyre still rly bad at that ..... just like salena
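a minimal sketch of that chain-of-thought trick, just to make it concrete -- same question, two prompts (the prompt wording here is mine, not from any particular paper):

```python
# naive prompt: asks the model to leap straight to the answer
naive_prompt = "What is 17 * 24? Answer with just the number."

# chain-of-thought prompt: asks for stepping stones before the final answer,
# which tends to reduce those "huge leaps of intuition"
cot_prompt = (
    "What is 17 * 24?\n"
    "First write out your reasoning step by step "
    "(e.g. break 24 into 20 + 4), then give the final answer."
)

# and the 'check your work' variant -- which, as noted above,
# models (and salenas) are still lowk bad at actually doing
cot_with_check = (
    cot_prompt
    + "\nFinally, verify the result by redoing the calculation a different way."
)
```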

but another crazy thing is . actually even the smartest humans are just like salena, relative to some hypothetical 99999 IQ level . even pelcovits and rly good scientists and whatever, we all do the same mistakes as salena/LLMs, but it's just that pelcovits is like 100x less bad, but even pelcovits would be salena-level compared to some hypothetical 99999 IQ titan.. like e.g. even pelcovits and einstein and bohr make a shitton of mistakes and even they are sorta flopping around like a fish compared to Perfection, but it just turns out that their flopping around churns out a new scientific discovery every once in a while


anyway . trying to think about how to solve ARC is so humbling bc it's like . wow this is at the limits of my own problem-solving . it's making me realize how imperfect i am at problem-solving . it's making me realize i am just a salena too . cuz i dont know how to tackle this problem .

sure im really good at reasoning through math psets or wtvr, and sure im good enough at reasoning to solve slightly new problems that are a bit beyond my training . i like to think i have a pretty decent scientific problem-solving skill of breaking down new problems into things that i know how to do .

but this ARC problem seems to be outside of my reasoning capability . and now i don't know how to solve it . and like by definition i can't solve it because it's out of my reasoning capability . and i dont rly know how to expand my reasoning purview or increase my reasoning capability!!!!! kinda like salena .

like i sorta know how i would teach salena to expand her reasoning capabilities to get good at math (it would just require a lot of practice on her end) ... but how tf do you teach someone how to expand their reasoning capabilities to tackle a super new problem that is completely out of their reach????


wait i was typing more things then i realized maybe im wrong

hold on


hmmmm to try to articulate it better / more accurately ... Hmm god idk how to put this into words but i have a visual for it LOL .

like let's say salena's Familiar Reasoning Purview is a circle of radius 5, and she can push her boundaries to solve new problems outside of her familiar zone for an extra 0.5 radius . so her Total Reasoning Purview is radius 5.5 , and her "Novel Reasoning Ability" ratio is 0.5/5 = 10% ... if that makes sense .. and then perhaps my Familiar Reasoning Purview is a circle of radius 10, and let's say i can push my boundaries to solve new problems outside of my familiar zone for an extra 5 radius . so my total reasoning purview is radius 15 and my "Novel Reasoning Ability" ratio is 5/10 = 50% ... so my "Novel Reasoning Ability" ratio is 5x better than salena.

im pretty confident that i could train a salena to increase her Familiar Reasoning Purview to radius 7, and assuming her "Novel Reasoning Ability" ratio stays the same then her Total Reasoning Purview immediately becomes radius 7.7 .. and i could probably increase her "Novel Reasoning Ability" ratio to something like 25% so that now her Total Reasoning Purview immediately becomes radius 8.75 .. but idk how to increase anyone's "Novel Reasoning Ability" ratio beyond 50% because that is my own "Novel Reasoning Ability" ratio and TBH i just don't know how to go beyond that myself and i don't know how to make someone else go beyond that
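the visual above is basically one formula, so here it is as a tiny toy model (all the names are my own made-up terms from this monologue, obv not standard):

```python
def total_purview(familiar_radius, novel_ratio):
    """Total Reasoning Purview = Familiar Reasoning Purview plus the extra
    ring you can stretch into, expressed as a fraction of the familiar radius."""
    return familiar_radius * (1 + novel_ratio)

# salena today: familiar 5, ratio 10%  -> total 5 + 0.5 = 5.5
# me:           familiar 10, ratio 50% -> total 10 + 5 = 15
# salena after training, ratio bumped to 25%: 7 + 7*0.25 = 8.75
salena_now     = total_purview(5, 0.10)
me             = total_purview(10, 0.50)
salena_trained = total_purview(7, 0.25)
```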

now ARC is a problem that is super far outside that circle

ARC requires me to increase my Novel Reasoning Ability by a fuckton, which i dont know how to do! as stated earlier.

or Waitt.... another option is it requires me to increase my Familiar Reasoning Purview by a fuckton .. which i do know how to do, i think! maybe? perhaps?

Hmmmmmm...... hmmm.......... hmmmmm........ i lost my train of thought but i will perhaps come back to this later


Also. as stated earlier, ARC is a problem that really has me at the edge of my own reasoning ability, and it's sorta requiring me to stretch/expand my reasoning ability (whether the Familiar Reasoning Purview or the Novel Reasoning Ability Ratio) .... but aside from that, ARC IS A PROBLEM ABOUT STRETCHING/EXPANDING THE REASONING ABILITY OF AI ...... so there is almost a catch22-esque situation!!!


also, semirelated --
ppl noticed that o3's performance on ARC is actually not rly correlated with how difficult humans found the task, but is actually correlated with the task's grid size/length !! isnt that interesting !!!!

and im lowk realizing that for myself too like fuck i just cant keep the whole fucking context in my head for ARC bc there's just so many goddamn unknowns and so many goddamn things to experiment on and think about


https://x.com/mikb0b/status/1871573534201536861
https://www.reddit.com/r/singularity/comments/1hlsh1p/o3_failure_rate_on_arc_agi_correlates_with_grid/

(same chart just different caption)


bfs maze gif
dfs maze gif

here are two gifs showing DFS vs BFS (depth first search vs breadth first search)

rn one of my bottlenecks is i am mentally managing a fuckton of BFS branches and it’s taking up too much mental RAM and im somehow not able to prune the branches and/or focus on only a couple things to DFS
and not just that but there’s so many future dead ends that im keeping in mind (even tho i havent reached them yet, but i hypothesize that they will be dead ends), and that is taking up a lot of mental space too
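the "mental RAM" thing is actually a real property of the two searches, not just a vibe -- here's a little sketch (on a made-up full binary tree, nodes are just depth numbers) measuring how big the BFS frontier gets vs how big the DFS stack gets:

```python
from collections import deque

def max_frontier_bfs(depth, branching=2):
    """Peak BFS frontier size on a full tree: blows up to branching**depth."""
    frontier, peak = deque([0]), 1
    while frontier:
        level = frontier.popleft()
        if level < depth:
            frontier.extend([level + 1] * branching)
        peak = max(peak, len(frontier))
    return peak

def max_stack_dfs(depth, branching=2):
    """Peak DFS stack size on the same tree: only grows like depth + 1."""
    stack, peak = [0], 1
    while stack:
        level = stack.pop()
        if level < depth:
            stack.extend([level + 1] * branching)
        peak = max(peak, len(stack))
    return peak

# at depth 10 on a binary tree: BFS peaks at 2**10 = 1024 nodes in memory,
# DFS peaks at 11 -- which is exactly the "too many branches in mental RAM" problem
```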


anyway . so i feel like there are really two main categorical approaches to getting salena to pass APMA1650

== category 1 ==
train her actual reasoning

  • choice 1, help her improve her Familiar Reasoning Purview (via grinding out practice problems + deep understanding)
  • choice 2, help her improve her Novel Reasoning Ability Ratio (via making her enjoyably struggle through the reasoning process for novel problems)
  • choice 3, help her improve both (the AI equivalent of any of these would require me to be actually cracked at AI, like i would be working at OpenAI/Anthropic/Deepmind already if i was that good)

== category 2 ==
work with her so i can improve her static cheat sheet
(which will include not just knowledge but also directions to try to emulate reasoning -- obv not as good as actually having reasoning skill, but perhaps a decent substitute. like, e.g. when i made her memorize a step-by-step problem-solving approach for one kind of PDF CDF problem for a stats midterm)
(btw, this static cheat sheet could also include directions for her to, during the test, update a dynamic cheat sheet)
(the AI equivalent of this is just prompt engineering + creating a program to facilitate the AI's workflow)
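a hypothetical sketch of what that AI equivalent could look like -- a static cheat sheet baked into the prompt plus a dynamic scratchpad updated during the test (the cheat-sheet text and function names here are my own illustration, not any real system):

```python
# static cheat sheet: memorized step-by-step procedure, like the PDF/CDF one
STATIC_CHEAT_SHEET = """\
PDF/CDF problems, step by step:
1. identify whether you're given a PDF or a CDF
2. to get the CDF from a PDF, integrate; to get the PDF from a CDF, differentiate
3. sanity check: a CDF must go from 0 to 1 and be non-decreasing
"""

def build_prompt(problem, scratchpad_notes):
    """Assemble one prompt: static cheat sheet + dynamic notes + the problem."""
    # scratchpad_notes is the dynamic cheat sheet: stuff learned mid-test
    notes = "\n".join(f"- {n}" for n in scratchpad_notes) or "- (none yet)"
    return (
        f"Cheat sheet:\n{STATIC_CHEAT_SHEET}\n"
        f"Notes so far:\n{notes}\n\n"
        f"Problem:\n{problem}\n"
        "Solve step by step, then add any new insight to your notes."
    )
```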


okay . i feel like this helped me straighten out some of my thoughts .
idk how but i think it helped.

god i just wanna give up but also i feel like it’s actually trivial to get a higher score on arc-agi-pub than the top score that isnt o3 so i wanna continue but also that is lowk not even impactful other than the fact that i will get clout so i wanna give up but also then i will have ethos so i will be able to preach my awesome ideas so i wanna continue


wait it’s actually crazy how much articulating my thoughts to another person helps me

like sometimes it just feels like “oh thats a stupid cliche study tip” and then ill either Not articulate it or just articulate it to Myself … but no like it acc helps so much 😭

i should do this more often
