Demystifying Quantization: Shrinking Models for Efficient AI