Unlike model compression, which requires expensive retraining, sparse activation can effectively reduce a neural network model's inference cost at runtime without any prior retraining or adaptation effort. Although sparse activation has been shown to be effective on Large Language Models (LLMs), which are usually redundant (e.g., the OPT and BLOOMZ models), its applicability to recent Small Language Models (SLMs) with higher parameter efficiency remains questionable. Our recent work verified this possibility by using gradient-based attribution scores to evaluate neurons' importance in inference, from both analytical and experimental perspectives. Our results show that we can achieve up to 80% sparsity in major SLMs, including Phi-1.5/2 and MobiLlama-0.5B/1B, with less than 5% model accuracy loss on QA tasks.
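To make the idea of gradient-based attribution concrete, below is a minimal sketch (not the exact method from our work) of scoring neurons in one MLP layer by |activation × gradient| and keeping only the top 20% at inference time. The model name, the module path `model.model.layers[0].mlp`, and the 80% sparsity threshold are illustrative assumptions; the exact module names depend on the architecture.

```python
# Sketch: gradient-based neuron attribution for sparse activation.
# Assumes a Hugging Face causal LM; module paths may differ per architecture.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "microsoft/phi-1_5"  # illustrative choice of a small language model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Sparse activation reduces inference cost at runtime."
inputs = tokenizer(text, return_tensors="pt")

# Capture the hidden activations of one MLP layer via a forward hook.
captured = {}
def save_activation(module, inp, out):
    out.retain_grad()          # keep the gradient of this intermediate tensor
    captured["act"] = out

# Hypothetical layer choice: the first decoder block's MLP output.
hook = model.model.layers[0].mlp.register_forward_hook(save_activation)

outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()        # gradients flow back to the hooked activation
hook.remove()

act = captured["act"]                            # (batch, seq_len, hidden)
score = (act * act.grad).abs().mean(dim=(0, 1))  # per-neuron attribution score

# Deactivate the least important 80% of neurons (the target sparsity level).
k = int(0.2 * score.numel())
keep = torch.topk(score, k).indices
mask = torch.zeros_like(score)
mask[keep] = 1.0
print(f"Keeping {k}/{score.numel()} neurons in layer 0 MLP")
```

In an actual deployment, the resulting mask would be applied to the layer's output (e.g., via another hook that multiplies activations by `mask`), so the deactivated neurons contribute nothing to downstream computation without any retraining.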