Mixture of Agents (MoA)
Hey everyone! Welcome to a channel all about AI agents!
First, let's talk about LLMs. Pretrained on massive datasets, these models have shown astonishing capability at both understanding and generating natural language. But here's the problem: their scale and training costs are enormous, which makes them somewhat impractical in real-world applications.
That's where MoA comes in! MoA offers an innovative solution by tapping the collective strengths of multiple LLMs. Imagine every agent contributing its own share of the work: how powerful would the final output be!
MoA is structured like a layered building, with multiple LLM agents on every layer. Each agent takes the previous layer's outputs and generates a more refined response. This repeats layer after layer until a final, much stronger output emerges.
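To make that layered flow concrete, here's a minimal sketch of the loop (my own illustration, not Together's implementation; call_model is a hypothetical placeholder for any chat-completion API call):

# Minimal sketch of the layered MoA loop described above.
# call_model() is a hypothetical stand-in for a real LLM API call.

def call_model(model: str, prompt: str) -> str:
    # Replace this stub with a real chat-completion request.
    return f"[{model}] answer to: {prompt[:40]}..."

def moa(user_prompt: str, layers: list[list[str]], aggregator: str) -> str:
    previous: list[str] = []
    for agents in layers:
        # Every agent in this layer sees the user prompt plus the
        # previous layer's responses, and produces a refined answer.
        prompt = user_prompt
        if previous:
            prompt += "\n\nResponses from the previous layer:\n" + "\n".join(previous)
        previous = [call_model(agent, prompt) for agent in agents]
    # A final aggregator synthesizes the last layer into one output.
    final_prompt = (user_prompt + "\n\nResponses from the last layer:\n"
                    + "\n".join(previous))
    return call_model(aggregator, final_prompt)

print(moa("Summarize Mixture-of-Agents in one sentence.",
          layers=[["model-a", "model-b"], ["model-c", "model-d"]],
          aggregator="model-e"))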
The beauty of this structure is that LLMs have a natural collaborativeness. Research shows that when an LLM can refer to other models' outputs, it produces higher-quality responses, even when those auxiliary responses are worse than what the model could produce on its own.
Together.ai is a flagship example of MoA in action. Their Together MoA scored 65.1% on AlpacaEval 2.0, beating the previous leader GPT-4o at 57.5%. That's not just a win on a number; it's proof that the MoA approach succeeds in practice.
And MoA's advantages aren't only about performance; there's also cost-effectiveness and flexibility. By combining multiple open-source models and tuning the number of layers and agents, MoA can keep costs under control while maintaining high performance.
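As a rough back-of-the-envelope (my own arithmetic, not Together's pricing): the number of model calls per query scales directly with the layer and agent counts, which is exactly the knob you tune:

layers = [3, 3]                    # agents per layer -- the tunable knob
calls_per_query = sum(layers) + 1  # +1 for the final aggregator call
print(calls_per_query)             # 7 calls for a 2-layer, 3-agent setup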
Together's MoA framework is already open source, and here's the example code:
import os
import asyncio
from together import Together, AsyncTogether

os.environ["TOGETHER_API_KEY"] = "your_api_key_here"

# Sync client for the streaming aggregator call, async client for the fan-out.
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
async_client = AsyncTogether(api_key=os.environ.get("TOGETHER_API_KEY"))

user_prompt = "What is Karma Yoga as per Bhagavad Gita, Vyadha Gita, Yoga Vasistham and Tripura Rahasya?"

reference_models = [
    "Qwen/Qwen2-72B-Instruct",
    "Qwen/Qwen1.5-72B-Chat",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
    "databricks/dbrx-instruct",
]
aggregator_model = "mistralai/Mixtral-8x22B-Instruct-v0.1"
aggregator_system_prompt = """You have been provided with a set of responses from various open-source models to the latest user query. Your task is to synthesize these responses into a single, high-quality response. It is crucial to critically evaluate the information provided in these responses, recognizing that some of it may be biased or incorrect. Your response should not simply replicate the given answers but should offer a refined, accurate, and comprehensive reply to the instruction. Ensure your response is well-structured, coherent, and adheres to the highest standards of accuracy and reliability.

Responses from models:"""


async def run_llm(model):
    """Run a single LLM call with a reference model."""
    response = await async_client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0.7,
        max_tokens=512,
    )
    print(f"Response from {model}: {response.choices[0].message.content}\n")
    return response.choices[0].message.content


async def main():
    # Query all reference models concurrently.
    results = await asyncio.gather(*[run_llm(model) for model in reference_models])

    # Hand the collected answers to the aggregator and stream its synthesis.
    finalStream = client.chat.completions.create(
        model=aggregator_model,
        messages=[
            {"role": "system", "content": aggregator_system_prompt},
            {"role": "user", "content": ",".join(str(element) for element in results)},
        ],
        stream=True,
    )
    for chunk in finalStream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)


asyncio.run(main())  # in a notebook, use `await main()` instead
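To try it, install the SDK with pip install together and put a real key in TOGETHER_API_KEY. Note the design here: this example is effectively a single MoA layer, where four reference models answer in parallel and one aggregator synthesizes their answers; a deeper MoA would just repeat that fan-out-and-refine step before aggregating.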