Discussion about this post

Maxime @Storen

The empiricism take reminds me of Stephen Wolfram's post (What Is ChatGPT Doing … and Why Does It Work?). He gives examples like the temperature parameter, the embedding architecture (adding a token's embedding to its position embedding), and the split of the embedding vector in attention blocks, all of which are part of a "lore" rather than a rigorous scientific theory! As long as it works, all is fine :))
