Posts tagged Quantization

3 posts

LLM Distillation vs Quantization: Making Models Smaller, Smarter, Cheaper

Two strategies to shrink LLMs — one compresses weights, the other transfers knowledge. A practical guide to distillation and quantization: when to use each, how to implement them with Hugging Face, and why the real answer is both.

25 Feb 2026·9 MIN READ Read →

Getting Hands-On with Mistral AI: From API to Self-Hosted in One Afternoon

A practical walkthrough of two paths to working with Mistral — the managed API for fast prototyping and self-hosted deployment for full control — with real code covering prompting, model selection, function calling, RAG, and INT8 quantization.

24 Feb 2026·9 MIN READ Read →

Python, Transformers, and SageMaker: A Practical Guide for Cloud Engineers

Everything a cloud/AWS engineer needs to know about Python, the Hugging Face Transformers framework, SageMaker integration, quantization, CUDA, and AWS Inferentia — without being a data scientist.

24 Feb 2026·14 MIN READ Read →

Back to Blog