
On-device LLM Deployment-3

This week I focused on paper reading and learning basic background knowledge.

First, I summarized the differences between Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ).

Quantization-Aware Training (QAT) Core Method

The core ideas: simulated quantization during training, full-model fine-tuning, and quantizing both weights and activations. A sketch of the simulated (fake) quantization step follows below.
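
Here is a minimal sketch of what "simulated quantization during training" can look like (a PyTorch-style illustration; the `FakeQuantLinear` module, 8-bit setting, and straight-through estimator trick are my own simplified assumptions, not a specific library API). The forward pass uses quantize-dequantize ("fake quant") values so the model learns weights that survive rounding, while gradients still flow to the full-precision parameters:

```python
import torch
import torch.nn as nn

def fake_quantize(x, num_bits=8):
    # Simulate integer quantization: scale to the int grid, round, map back to float.
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses the quantized value,
    # backward treats quantization as the identity function.
    return x + (x_q - x).detach()

class FakeQuantLinear(nn.Module):
    """Linear layer whose weights and activations are fake-quantized during training."""
    def __init__(self, in_features, out_features, num_bits=8):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.num_bits = num_bits

    def forward(self, x):
        w_q = fake_quantize(self.linear.weight, self.num_bits)  # quantize weights
        x_q = fake_quantize(x, self.num_bits)                   # quantize activations
        return nn.functional.linear(x_q, w_q, self.linear.bias)

# Usage: fine-tune the whole model with layers like this, then export real int8 weights.
layer = FakeQuantLinear(16, 4)
out = layer(torch.randn(2, 16))
out.sum().backward()  # gradients reach the full-precision weights via the STE
```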

Post-Training Quantization (PTQ) Core Method

The core ideas: quantization is applied after training, using either static quantization (ranges calibrated on a small sample dataset) or dynamic quantization (activation ranges computed at runtime). A sketch follows below.
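
As a minimal sketch of static PTQ on an already-trained tensor (the symmetric per-tensor scheme and the toy "calibration" here are my own simplified assumptions): compute a scale from the observed value range, round to int8 once, and measure the resulting error, with no retraining at all:

```python
import torch

def quantize_tensor(x, num_bits=8):
    """Symmetric per-tensor quantization: float -> int8 values plus a scale factor."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    x_int = torch.round(x / scale).clamp(-qmax - 1, qmax).to(torch.int8)
    return x_int, scale

def dequantize_tensor(x_int, scale):
    """Map int8 values back to float for comparison or mixed-precision compute."""
    return x_int.to(torch.float32) * scale

# Pretend these are trained weights; for static PTQ, activation ranges would
# likewise be collected from a few calibration batches before deployment.
weights = torch.randn(256, 256)
w_int8, w_scale = quantize_tensor(weights)

# Check how much accuracy the rounding costs (PTQ never retrains to recover it).
error = (weights - dequantize_tensor(w_int8, w_scale)).abs().mean()
print(f"scale={w_scale.item():.5f}, mean abs error={error.item():.5f}")
```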

Here is a table comparing the two:

| Feature | Quantization-Aware Training (QAT) | Post-Training Quantization (PTQ) |
| --- | --- | --- |
| Training | Requires retraining with quantization | No retraining needed; applied after training |
| Use Case | Best for high-precision tasks on resource-constrained devices | Fast deployment, suited for simpler quantization tasks |
| Accuracy Loss | Minimal, close to floating-point accuracy | Potential for higher accuracy loss |
| Efficiency Gains | High efficiency on low-precision hardware | Also boosts efficiency, but less optimal for some models |
| Complexity | Higher complexity due to simulated quantization during training | Simpler implementation |

Secondly, I went over the Transformer; my notes are in GoodNotes.
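
Since the detailed notes live in GoodNotes, here is just a minimal sketch of the Transformer's core operation, scaled dot-product attention (shapes and names are illustrative, not from the notes):

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)              # attention distribution
    return weights @ v                                   # weighted sum of values

# Toy shapes: batch of 2 sequences, length 5, model dimension 8.
q = k = v = torch.randn(2, 5, 8)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 8])
```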

"I always feel as if I were living on the open sea, threatened, yet with an immense happiness in my heart." (Camus)