• Natural Language Processing (NLP): NLP is a field at the intersection of linguistics and machine learning focused on enabling computers to process, understand, and generate human language.
  • Large Language Model (LLM): An AI model trained on massive amounts of text data that can understand and generate human-like text, recognize patterns in language, and perform a wide variety of language tasks with little or no task-specific training.
  • RNN: A recurrent neural network processes text sequentially, one word at a time, maintaining a hidden state that is updated at each step so the network remembers information from earlier in the sequence (see the first sketch after this list).
  • Transformer: A transformer processes all words in parallel and uses attention mechanisms to let each word directly look at and gather information from every other word in the sequence (see the attention sketch after this list).
  • LLM Quantization: LLM quantization shrinks a large language model by reducing the precision of its numerical weights (e.g., from 32-bit floating point to 8-bit or 4-bit integers). This lowers memory usage, speeds up inference, and allows models to run on consumer hardware with limited memory, at the cost of a potential minor loss in accuracy (see the int8 sketch after this list).
  • Embedding: In machine learning, embeddings are dense, low-dimensional numerical vector representations of complex, high-dimensional data (such as words, images, or users) that capture semantic meaning, placing similar items close together in a vector space (see the similarity sketch after this list).
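
To make the RNN recurrence concrete, here is a minimal sketch of a single Elman-style recurrent cell in NumPy. The dimensions, random weight initialization, and toy input sequence are all invented for illustration; a real RNN learns its weights during training.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 8, 4  # illustrative sizes, not from any real model

W_xh = rng.normal(scale=0.1, size=(hidden_size, embed_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    """One recurrence step: combine the previous state with the current input."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a toy sequence of 5 token embeddings, strictly one at a time.
sequence = rng.normal(size=(5, embed_size))
h = np.zeros(hidden_size)  # initial hidden state
for x_t in sequence:
    h = rnn_step(h, x_t)   # the state carries information forward

print(h)  # final hidden state summarizes the whole sequence
```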
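
The attention mechanism behind transformers can likewise be sketched in a few lines. This shows plain scaled dot-product attention over a toy sequence; in a real transformer the queries, keys, and values come from learned linear projections of the inputs, which are omitted here to keep the sketch short.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Every position attends to every position in parallel."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V               # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                 # illustrative sizes
X = rng.normal(size=(seq_len, d_model)) # 5 toy token representations

# Reusing X as Q, K, and V directly is a simplification for this sketch.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (5, 8): each token gathered info from all tokens at once
```

Note the contrast with the RNN above: there is no loop over positions, so the whole sequence is processed in one batched matrix operation.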
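
A minimal sketch of the quantization idea, assuming simple symmetric per-tensor int8 quantization; production schemes (per-channel scales, 4-bit group quantization, etc.) are more elaborate, but the core float-to-integer mapping is the same.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale factor."""
    scale = np.abs(weights).max() / 127.0        # map the largest weight to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction used at inference time."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)   # toy float32 weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(q.nbytes, "bytes vs", w.nbytes)  # 16 vs 64: the int8 copy is 4x smaller
print(np.abs(w - w_hat).max())         # small rounding error = the accuracy trade-off
```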
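
Finally, a sketch of how embedding similarity works, using tiny hand-made 3-dimensional vectors. Learned embeddings typically have hundreds of dimensions, but cosine similarity measures closeness in the vector space the same way.

```python
import numpy as np

# Hypothetical, hand-made embeddings chosen so that "cat" and "dog" point in
# a similar direction while "car" does not; real embeddings are learned.
embeddings = {
    "cat": np.array([0.90, 0.80, 0.10]),
    "dog": np.array([0.85, 0.75, 0.15]),
    "car": np.array([0.10, 0.20, 0.95]),
}

def cosine_similarity(a, b):
    """Close to 1.0 = similar direction (similar meaning); near 0.0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: similar
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low: dissimilar
```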