2024-12-23 misc2

understand constitutional ai approach

can i do RLAIF via prompting?
like it won't change the model weights, it'll only change the prompt
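
rough sketch of what "RLAIF via prompting" could look like: the model is never updated, but feedback from an AI critic gets folded back into the prompt as a list of lessons. everything here is made up for illustration: `refine_prompt`, `build_prompt`, and the `generate` / `critique` callables are hypothetical stand-ins for real LLM calls, and this is not actual RLAIF since there is no reward model or gradient step.

```python
from typing import Callable, List


def build_prompt(base_prompt: str, notes: List[str]) -> str:
    """Append the collected feedback notes to the base prompt."""
    if not notes:
        return base_prompt
    lessons = "\n".join(f"- {n}" for n in notes)
    return f"{base_prompt}\n\nlessons from earlier attempts:\n{lessons}"


def refine_prompt(
    base_prompt: str,
    generate: Callable[[str], str],       # hypothetical: prompt -> model answer
    critique: Callable[[str, str], str],  # hypothetical: (prompt, answer) -> feedback
    rounds: int = 3,
) -> str:
    """Return base_prompt plus the feedback gathered over at most `rounds` tries."""
    notes: List[str] = []
    for _ in range(rounds):
        prompt = build_prompt(base_prompt, notes)
        answer = generate(prompt)
        feedback = critique(prompt, answer)
        if not feedback:  # assumption: empty feedback means the critic is satisfied
            break
        notes.append(feedback)
    return build_prompt(base_prompt, notes)
```

the only thing that "learns" here is the notes list, so it is cheap to try before any real finetuning.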

evolutionary approach is dumb if it's only evolutionary and you have set compute cutoffs. let the system manage its time the way a human would manage theirs.
okay maybe im being too harsh to jeremy berman
evolutionary approach is actually pretty cool (picbreeder type beat) but i just feel like it's not needed for this problem i guess? i feel like it would be much more efficient to have some sort of chain of thought with a list of attempted subrules and attempted full transform functions.
and sampling is not that bad actually. we do sampling too. i just think it wouldve been better to sample more strategically. like sample subrules instead of the full transform function.

ugh wait but if we sample subrules then we have to generate and run a whole python program for each one
okay wait but i still think it would be much more efficient than sampling 200 full-ass transform functions actually
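
toy sketch of the subrule idea: instead of sampling whole transform functions, score small subrules one at a time against the training pairs, chain the ones that help, and keep a log of everything attempted. all names here (`SUBRULES`, `cell_accuracy`, `greedy_search`) are hypothetical, the three subrules are hand-written placeholders for whatever an LLM would propose, and greedy chaining is just one simple choice that can get stuck on tasks where no single step improves the score.

```python
from typing import Callable, Dict, List, Tuple

Grid = List[List[int]]

# placeholder subrules; in practice these would be sampled from a model
SUBRULES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": lambda g: [row[::-1] for row in g],
    "flip_v": lambda g: [list(row) for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def cell_accuracy(pred: Grid, target: Grid) -> float:
    """Fraction of matching cells; 0.0 if the shapes differ."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        return 0.0
    cells = [p == t for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(cells) / len(cells) if cells else 1.0


def score(chain: List[str], pairs: List[Tuple[Grid, Grid]]) -> float:
    """Mean cell accuracy of applying `chain` in order to every input grid."""
    total = 0.0
    for grid, target in pairs:
        for name in chain:
            grid = SUBRULES[name](grid)
        total += cell_accuracy(grid, target)
    return total / len(pairs)


def greedy_search(pairs: List[Tuple[Grid, Grid]], max_depth: int = 3):
    """Greedily extend a chain of subrules; return (chain, score, attempts log)."""
    chain: List[str] = []
    attempts: List[Tuple[List[str], float]] = []  # memory of everything tried
    best = score(chain, pairs)
    for _ in range(max_depth):
        if best == 1.0:
            break
        scored = [(score(chain + [n], pairs), n) for n in SUBRULES]
        attempts.extend((chain + [n], s) for s, n in scored)
        top_score, top_name = max(scored)
        if top_score <= best:  # no subrule helps, stop early
            break
        chain.append(top_name)
        best = top_score
    return chain, best, attempts
```

each step here costs one run per subrule, so a depth-3 search over 3 subrules is at most 9 runs, and the attempts log is the kind of thing that could be fed back into the prompt.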

===

i should evolutionarily sample a bunch of different simple approaches first, and get a baseline for how things perform. just do a lot of fast experiments.

  • test out qwen 32b coding. efficacy + efficiency
  • test out vision models
  • 20 questions, finetuning using only prompts, memory of guesses for this task, memory of guesses for all tasks, etc

exhaust the simple approaches first. dont get lost in math n complicated RL stuff. make the minimum simplest solution that gets us there

naive approach -> rlaif with prompting -> rlaif on reasoning?

===

remember, 7.2 mins per task
but surely it should take less than that
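
quick sketch of enforcing that per-task budget: try the cheap strategies first, stop at the first one that returns an answer, and bail when the clock runs out. the 7.2 minute figure is from the note above (432 seconds); `solve_with_budget` and the strategy callables are hypothetical, and a real strategy would have to respect the remaining time it is handed since nothing here interrupts it.

```python
import time
from typing import Callable, Iterable, Optional

TASK_BUDGET_S = 7.2 * 60  # 7.2 minutes per task, in seconds


def solve_with_budget(
    strategies: Iterable[Callable[[float], Optional[object]]],
    budget_s: float = TASK_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[object]:
    """Run strategies in order (cheapest first) until one succeeds or time is up.

    Each strategy receives the seconds remaining and returns an answer or None.
    """
    deadline = clock() + budget_s
    for strategy in strategies:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # out of time, give up on this task
        result = strategy(remaining)
        if result is not None:
            return result
    return None
```

ordering the list from naive to expensive means easy tasks finish well under the cutoff.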

===

dude we just need an LLM or LLM system that is as smart as salena and as malleable as salena. then we'll be bing chilling

these surely must exist
if not, it surely must be finetunable to exist

gosh i rly need to learn how constitutional ai works and then try replicating it so that i can see how i might be able to use it for ARC.

ugh but also constitutional ai only works for a set of principles that you have beforehand. what if these principles are dynamic!!!
