| [Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff) | 1 | 5 | 1 November 2025 |
| [Video Response] What Cloudflare's code mode misses about MCP and tool calling | 1 | 2 | 19 October 2025 |
| [Paper Analysis] On the Theoretical Limitations of Embedding-Based Retrieval (Warning: Rant) | 1 | 1 | 11 October 2025 |
| AGI is not coming! | 1 | 3 | 9 August 2025 |
| Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis) | 1 | 1 | 23 July 2025 |
| Energy-Based Transformers are Scalable Learners and Thinkers (Paper Review) | 2 | 3 | 20 July 2025 |
| On the Biology of a Large Language Model (Part 2) | 1 | 5 | 3 May 2025 |
| On the Biology of a Large Language Model (Part 1) | 1 | 6 | 5 April 2025 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Paper Explained) | 1 | 2 | 26 January 2025 |
| Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained) | 1 | 22 | 24 December 2024 |
| Safety Alignment Should be Made More Than Just a Few Tokens Deep (Paper Explained) | 1 | 10 | 10 December 2024 |
| TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters (Paper Explained) | 1 | 2 | 23 November 2024 |
| GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | 1 | 1 | 19 October 2024 |
| Were RNNs All We Needed? (Paper Explained) | 1 | 6 | 12 October 2024 |
| Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper) | 1 | 2 | 5 October 2024 |
| Privacy Backdoors: Stealing Data with Corrupted Pretrained Models (Paper Explained) | 1 | 3 | 4 August 2024 |
| Scalable MatMul-free Language Modeling (Paper Explained) | 1 | 6 | 8 July 2024 |
| Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (Paper Explained) | 1 | 19 | 26 June 2024 |
| xLSTM: Extended Long Short-Term Memory | 1 | 17 | 1 June 2024 |
| [ML News] OpenAI is in hot waters (GPT-4o, Ilya Leaving, Scarlett Johansson legal action) | 1 | 51 | 21 May 2024 |
| ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained) | 1 | 27 | 1 May 2024 |
| [ML News] Chips, Robots, and Models | 1 | 61 | 30 April 2024 |
| TransformerFAM: Feedback attention is working memory | 1 | 25 | 28 April 2024 |
| [ML News] Devin exposed \| NeurIPS track for high school students | 1 | 30 | 27 April 2024 |
| Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | 1 | 44 | 24 April 2024 |
| [ML News] Llama 3 changes the game | 1 | 65 | 24 April 2024 |
| Hugging Face got hacked | 1 | 20 | 17 April 2024 |
| [ML News] Microsoft to spend 100 BILLION DOLLARS on supercomputer (& more industry news) | 1 | 58 | 15 April 2024 |
| [ML News] Jamba, CMD-R+, and other new models (yes, I know this is like a week behind 🙃) | 1 | 27 | 13 April 2024 |
| Flow Matching for Generative Modeling (Paper Explained) | 1 | 26 | 8 April 2024 |