Welcome to this Map of Content.
Notes
- 0. Glossary (NLP) - Natural Language Processing (NLP): NLP is a field of linguistics and machine learning focused …
- Keyword Extraction & Topic Modelling - 1. Keyword Extraction 1. TF 2. TF-IDF 3. RAKE ([rake-keyword](https://github.com/u-prashant/RAKE…
- 3. Word Vectors - Knowledge-based representation (e.g. WordNet) - Might miss nuance (e.g. “proficient” is listed…
- 5. Tokenization - Character Tokenization: The simplest tokenization scheme is to feed each character individuall…
- 2. Language Modeling & N-grams - In general, language modeling is the task of predicting what word comes next. - Statistical LM (N-…
- 4. RNNs & CNNs for Text Classification - Improvements over n-gram - No sparsity problem - Model size is not Remainin…
- LM Benchmarks - For Coding: - HumanEval: Python coding tasks (higher % = better) - MBPP: Python programm…
- Large Language Models - Characteristics of LLMs: - Scale: They contain millions, billions, or even hundreds of billi…
- 1. What is NLP? - NLP is a field of linguistics and machine learning focused on understanding everything related to hu…
- KV Cache - A performance optimization technique used in Large Language Models (LLMs) to speed up autoregressive text generation by storing the Key and Value vectors of previous tokens, so they do not have to be recomputed at every decoding step.
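The KV Cache note above can be illustrated with a toy, single-head attention decoder. This is a minimal sketch under stated assumptions, not how any particular LLM library implements it: it uses plain Python lists instead of tensors, random weights, and no batching, multi-head split, or positional encoding. The names `KVCache`, `decode_with_cache`, `decode_without_cache`, and `random_matrix` are hypothetical. The sketch shows that caching only changes *how much* work each step does, not the result. Each new token projects only its own key and value, and earlier ones are read from the cache.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all given keys/values."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class KVCache:
    """Hypothetical cache: stores one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_with_cache(xs, Wq, Wk, Wv):
    cache = KVCache()
    outputs = []
    for x in xs:
        # Only the newest token is projected; earlier K/V come from the cache.
        q = matvec(Wq, x)
        cache.append(matvec(Wk, x), matvec(Wv, x))
        outputs.append(attend(q, cache.keys, cache.values))
    return outputs, cache


def decode_without_cache(xs, Wq, Wk, Wv):
    outputs = []
    for t in range(len(xs)):
        prefix = xs[: t + 1]
        # Without a cache, K and V for the whole prefix are recomputed every step.
        keys = [matvec(Wk, x) for x in prefix]
        values = [matvec(Wv, x) for x in prefix]
        q = matvec(Wq, xs[t])
        outputs.append(attend(q, keys, values))
    return outputs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4
    Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))
    xs = random_matrix(5, d, rng)  # five toy "token embeddings"
    cached, cache = decode_with_cache(xs, Wq, Wk, Wv)
    uncached = decode_without_cache(xs, Wq, Wk, Wv)
    print("cached entries:", len(cache.keys))
    print("same outputs:", all(
        math.isclose(a, b, abs_tol=1e-12)
        for ra, rb in zip(cached, uncached)
        for a, b in zip(ra, rb)
    ))
```

The trade-off is memory for compute. Without the cache, step *t* re-projects all *t* prefix tokens, so total projection work grows quadratically with sequence length. With the cache, each step projects one token, but the stored keys and values grow linearly with the sequence.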