We propose Chain-of-Experts (CoE), which fundamentally changes how sparse Large Language Models (LLMs) process information by introducing sequential communication among the experts within each layer of a Mixture-of-Experts (MoE) model.
In standard MoE models, experts process tokens independently and in parallel, and the models carry high memory requirements. CoE instead introduces an iterative mechanism that lets experts "communicate": each expert processes tokens on top of the outputs of other experts.
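To make the contrast concrete, here is a minimal sketch, assuming a PyTorch-style implementation, of how a CoE-style layer could re-route tokens over several iterations so that each pass builds on the experts' previous outputs. All class names, routing details, and hyperparameters below are illustrative assumptions, not the reference implementation.

```python
# Minimal sketch (assumes PyTorch). Names, routing, and residual details are
# illustrative assumptions, not the reference CoE implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """A standard feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ff(x)

class ChainOfExpertsLayer(nn.Module):
    """Applies experts over several iterations: each iteration re-routes tokens
    and processes them on top of the previous iteration's expert outputs,
    rather than performing a single independent, parallel expert pass."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int,
                 top_k: int = 2, n_iterations: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k
        self.n_iterations = n_iterations

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (n_tokens, d_model)
        h = x
        for _ in range(self.n_iterations):
            # Re-route at every iteration, so routing can depend on what
            # other experts produced in the previous pass.
            weights, idx = torch.topk(
                F.softmax(self.router(h), dim=-1), self.top_k, dim=-1
            )
            out = torch.zeros_like(h)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, k] == e
                    if mask.any():
                        out[mask] += weights[mask, k].unsqueeze(-1) * expert(h[mask])
            h = h + out  # the next iteration sees the experts' combined output
        return h
```

In this sketch, setting `n_iterations = 1` reduces the layer to an ordinary top-k MoE pass, which highlights where the sequential, communicative behavior comes from: the same set of experts is revisited within the layer, each time conditioned on what the others produced.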
Experiments show that CoE significantly outperforms previous MoE models in multiple aspects.
Together, these advantages constitute a "free lunch" effect, enabling more efficient scaling of LLMs.
Large Language Models (LLMs) continue to push the boundaries of what artificial intelligence can achieve, but scaling these models efficiently remains a major challenge. Mixture-of-Experts (MoE) models have emerged as a promising way to address this challenge: by activating only a subset of the parameters for each token, they can, in principle, scale more efficiently. However, MoE models have the following limitations: