Embeddings Walkthrough (Part 2): Context-Dependent Embeddings, Shifting Embedding Space
We'll talk about how to make the Transformer's next-token prediction objective align more closely with a sentence-meaning objective.
- Joint query and key similarity retrieval, e.g. Cohere Reranker
- Shifting embedding space via generating hypothetical documents or via hinting, e.g. Hypothetical Document Embeddings (HyDE), Recitation Augmented Language Models
- My experiments with changing context for embeddings: prepending context, appending context, modifying the text chunk by context
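The context-modification idea in the last bullet can be sketched as follows. This is a minimal illustration, not the notebook's actual code: `embed` here is a toy placeholder (a real sentence-embedding model would be swapped in), and the variant names are labels I've chosen for the three modifications.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a normalized bag-of-characters vector.
    # In practice, replace this with a real sentence-embedding model.
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return float(a @ b)

def context_variants(chunk: str, context: str) -> dict:
    # Three ways of injecting context into a chunk before embedding,
    # mirroring the experiments listed above: prepend, append,
    # and the unmodified chunk as a baseline.
    return {
        "prepend": embed(f"{context}\n{chunk}"),
        "append": embed(f"{chunk}\n{context}"),
        "plain": embed(chunk),
    }

# Toy usage: score a query against each context-modified embedding.
chunk = "The bank raised interest rates."
context = "Topic: finance and monetary policy"
query_vec = embed("central bank rate hike")
for name, vec in context_variants(chunk, context).items():
    print(name, round(cosine(query_vec, vec), 3))
```

Storing several context-modified embeddings per chunk lets retrieval pick whichever variant best matches the query's framing.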
~~~
Part 1: https://www.youtube.com/watch?v=gVZryxJRdSY
Slides: https://github.com/tanchongmin/strictjson/blob/main/Experiments/Embeddings%20Walkthrough.pdf
Jupyter Notebook for my experiments: https://github.com/tanchongmin/strictjson/blob/main/Experiments/Context-Dependent-Embeddings.ipynb
OpenAI Sentence Embedding Paper: https://arxiv.org/abs/2201.10005
Cohere Reranker: https://docs.cohere.com/docs/reranking
HyDE: https://arxiv.org/abs/2212.10496
Recitation Augmented Language Models: https://arxiv.org/abs/2210.01296
~~~
0:00 Issues with Sentence Embeddings
15:30 How Retrieval Augmented Generation (RAG) is typically done
17:29 Joint Query and Key Processing without External Embeddings
36:16 Hypothetical Document Embeddings (HyDE)
41:55 Guiding LLM by Hinting
50:00 My idea: Context-Dependent Embeddings
53:16 Key idea: Multiple embeddings by context modification
1:05:50 Discussion
~~~
AI and ML enthusiast. Likes to think about the essences behind breakthroughs in AI and explain them in a simple and relatable way. Also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin