DARE (Drop And REscale) is a technique used to prepare fine-tuned models for merging. It works by reducing the redundancy of the model's learned changes (delta parameters) before they are combined. The process involves two steps: first, it randomly 'drops' a large percentage (e.g., 90% or more) of the delta parameters by setting them to zero. Second, it 'rescales' the remaining non-zero parameters by a factor of 1/(1-p), where p is the drop rate. This rescaling preserves the expected value of the delta parameters, so the overall magnitude of the model's changes is maintained on average. By aggressively pruning less critical parameters, DARE reduces interference between the skills of different models, enabling the successful merging of many specialized models into a single, multi-talented one.
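The two-step procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation; the function name and the toy merging loop at the bottom are our own.

```python
import numpy as np

def dare(delta: np.ndarray, p: float, seed: int = 0) -> np.ndarray:
    """Drop each delta parameter with probability p, then rescale
    the survivors by 1/(1-p) so the expected value is unchanged."""
    rng = np.random.default_rng(seed)
    mask = rng.random(delta.shape) >= p          # keep with probability 1-p
    return np.where(mask, delta / (1.0 - p), 0.0)

# Toy merge: base weights plus the sum of DARE-processed deltas
# (deltas here are illustrative stand-ins for fine-tuned minus base weights).
base = np.zeros(8)
deltas = [np.full(8, 0.5), np.full(8, -0.2)]
merged = base + sum(dare(d, p=0.9, seed=i) for i, d in enumerate(deltas))
```

Note that because roughly 90% of each delta is zeroed, the chance that two models' surviving parameters collide at the same position is small, which is why interference drops.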
DARE was introduced in the 2023 paper 'Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch' by Yu et al. The technique was developed as an efficient method to combine the abilities of multiple fine-tuned models without the performance degradation typically seen when averaging many models. The authors found that a large portion of a fine-tuned model's parameter changes could be removed without harming its specialized abilities, which is the key insight that makes DARE effective.
DARE has been quickly adopted by the open-source AI community as a powerful tool for model merging, often used in conjunction with other methods like TIES-Merging. It is a key feature in popular model merging toolkits like `mergekit`. The technique has enabled the creation of highly capable 'frankenmerges' by combining numerous models that are each specialized in different domains (e.g., coding, creative writing, reasoning). It provides a computationally cheap way to create versatile, state-of-the-art models without retraining from scratch.
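In `mergekit`, DARE is typically invoked through a YAML configuration using the `dare_ties` merge method, where `density` is the fraction of delta parameters kept (i.e., 1 minus the drop rate). The sketch below assumes this schema; the model names are placeholders, and exact parameter names should be checked against the toolkit's documentation.

```yaml
# Illustrative mergekit-style config (model names are placeholders)
merge_method: dare_ties
base_model: example/base-model
models:
  - model: example/coding-model
    parameters:
      density: 0.1    # keep ~10% of delta parameters
      weight: 0.5
  - model: example/writing-model
    parameters:
      density: 0.1
      weight: 0.5
dtype: bfloat16
```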