Google DeepMind Introduces Mind Evolution: Enhancing Natural Language Planning with Evolutionary Search in Large Language Models

It can significantly enhance LLMs’ problem-solving capabilities by guiding them to think more deeply about complex problems and effectively utilize inference-time computation. Prior research has explored various strategies, including chain-of-thought reasoning, self-consistency, sequential revision with feedback, and search mechanisms guided by auxiliary verifiers or evaluators. Search-based methods, particularly when paired with solution evaluators, leverage additional […] The post Google DeepMind Introduces Mind Evolution: Enhancing Natural Language Planning with Evolutionary Search in Large Language Models appeared first on MarkTechPost.

Jan 22, 2025 - 19:00

Google DeepMind Introduces Mind Evolution: Enhancing Natural Language Planning with Evolutionary Search in Large Language Models

It can significantly enhance LLMs’ problem-solving capabilities by guiding them to think more deeply about complex problems and effectively utilize inference-time computation. Prior research has explored various strategies, including chain-of-thought reasoning, self-consistency, sequential revision with feedback, and search mechanisms guided by auxiliary verifiers or evaluators. Search-based methods, particularly when paired with solution evaluators, leverage additional computational resources to explore a broader set of solution candidates. Techniques like best-of-N and tree search harness this capability to increase the likelihood of identifying successful solutions by examining a more extensive solution space.

Recent efforts have combined LLMs with evolutionary search for optimization tasks, such as numerical and combinatorial problems and natural language planning. Unlike earlier studies that required task formalization in structured spaces, these approaches evolve solutions directly in natural language, bypassing the need for expert knowledge in formalizing tasks. Evolutionary search has also been applied to prompt optimization and multi-agent system design, such as EvoAgent, which evolved agents for problem-solving. However, these approaches often achieved limited success compared to methods like Gemini 1.5 Flash, demonstrating significant improvements in tasks like the TravelPlanner benchmark. Additionally, program-based evaluators integrated during evolutionary search provide reliable feedback to refine solutions, a technique widely adopted in code generation and response refinement across various domains. While learned feedback models or self-evaluators have been explored, they often suffer from noise and unreliability, presenting opportunities for future advancements.

Researchers from Google DeepMind, UC San Diego, and the University of Alberta introduced Mind Evolution, an evolutionary search strategy designed to enhance inference-time computation for LLMs. Unlike previous methods like Best-of-N or sequential refinement, Mind Evolution uses a genetic approach to iteratively generate, refine, and recombine candidate solutions in natural language. It avoids formalizing tasks by relying on a solution evaluator, enabling higher success rates in natural language planning tasks like TravelPlanner and Natural Plan. Mind Evolution achieved 95.6% success on TravelPlanner and introduced new benchmarks like StegPoet, showcasing its versatility across challenging, non-formalized domains.

Mind Evolution integrates a genetic search approach with an LLM and customized prompts to efficiently address natural language planning tasks. It employs language-based genetic algorithms, where solutions are represented in natural language, enabling LLMs to facilitate key operations like crossover, mutation, and island reset. The process begins by generating initial solutions through LLM-driven prompts. Solutions are iteratively refined using a “Refinement through Critical Conversation” (RCC) process involving critic and author roles for evaluation and improvement. The framework incorporates Boltzmann tournament selection, cyclic migration between islands, and periodic island resets to sustain diversity and optimize solutions effectively.

The experiments evaluate Mind Evolution on three natural language planning benchmarks: TravelPlanner, Trip Planning, and Meeting Planning, excluding Calendar Scheduling due to its simplicity. The primary model, Gemini 1.5 Flash, is used with specified hyperparameters, while a two-stage approach incorporates Gemini 1.5 Pro for unsolved cases, improving cost efficiency. Mind Evolution outperforms baselines, achieving over 95% success in TravelPlanner and Trip Planning and 85% in Meeting Planning, with near-perfect results using the two-stage approach. Metrics such as success rate, LLM calls, token usage, and API costs highlight the efficiency of Mind Evolution’s evolutionary search strategy compared to baselines.

In conclusion, Mind Evolution introduces an evolutionary search strategy to enhance inference-time computation for complex natural language planning tasks, focusing on stochastic exploration and iterative refinement. Unlike methods relying on formal solvers, Mind Evolution leverages language models to generate, recombine, and refine candidate solutions, requiring only a solution evaluator. It outperforms strategies like Best-of-N and Sequential Revision in benchmarks such as TravelPlanner, Natural Plan, and the newly introduced StegPoet. Controlling for inference costs, it achieves remarkable success, solving over 98% of problem instances in TravelPlanner and Natural Plan benchmarks using Gemini 1.5 Pro, demonstrating its effectiveness without formal solver dependency.

https://platform.twitter.com/embed/Tweet.html?creatorScreenName=https%3A%2F%2Ftwitter.com%2Fasifrazzaq1988&dnt=true&embedId=twitter-widget-1&features=eyJ0ZndfdGltZWxpbmVfbGlzdCI6eyJidWNrZXQiOltdLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X2ZvbGxvd2VyX2NvdW50X3N1bnNldCI6eyJidWNrZXQiOnRydWUsInZlcnNpb24iOm51bGx9LCJ0ZndfdHdlZXRfZWRpdF9iYWNrZW5kIjp7ImJ1Y2tldCI6Im9uIiwidmVyc2lvbiI6bnVsbH0sInRmd19yZWZzcmNfc2Vzc2lvbiI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfZm9zbnJfc29mdF9pbnRlcnZlbnRpb25zX2VuYWJsZWQiOnsiYnVja2V0Ijoib24iLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X21peGVkX21lZGlhXzE1ODk3Ijp7ImJ1Y2tldCI6InRyZWF0bWVudCIsInZlcnNpb24iOm51bGx9LCJ0ZndfZXhwZXJpbWVudHNfY29va2llX2V4cGlyYXRpb24iOnsiYnVja2V0IjoxMjA5NjAwLCJ2ZXJzaW9uIjpudWxsfSwidGZ3X3Nob3dfYmlyZHdhdGNoX3Bpdm90c19lbmFibGVkIjp7ImJ1Y2tldCI6Im9uIiwidmVyc2lvbiI6bnVsbH0sInRmd19kdXBsaWNhdGVfc2NyaWJlc190b19zZXR0aW5ncyI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfdXNlX3Byb2ZpbGVfaW1hZ2Vfc2hhcGVfZW5hYmxlZCI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9LCJ0ZndfdmlkZW9faGxzX2R5bmFtaWNfbWFuaWZlc3RzXzE1MDgyIjp7ImJ1Y2tldCI6InRydWVfYml0cmF0ZSIsInZlcnNpb24iOm51bGx9LCJ0ZndfbGVnYWN5X3RpbWVsaW5lX3N1bnNldCI6eyJidWNrZXQiOnRydWUsInZlcnNpb24iOm51bGx9LCJ0ZndfdHdlZXRfZWRpdF9mcm9udGVuZCI6eyJidWNrZXQiOiJvbiIsInZlcnNpb24iOm51bGx9fQ%3D%3D&frame=false&hideCard=false&hideThread=false&id=1881845919643279686&lang=en&origin=https%3A%2F%2Fwww.marktechpost.com%2F2025%2F01%2F21%2Fgoogle-ai-releases-gemini-2-0-flash-thinking-model-gemini-2-0-flash-thinking-exp-01-21-scoring-73-3-on-aime-math-and-74-2-on-gpqa-diamond-science-benchmarks%2F&sessionId=1ed5eb4f69a672c17d5876887b650b313bb00bf6&siteScreenName=Marktechpost&theme=light&widgetsVersion=2615f7e52b7e0%3A1702314776716&width=550px

Check out the Paper. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don’t Forget to join our 65k+ ML SubReddit.

read more

Tags:

Previous Article

The Rise of LLMOps in the Age of AI

Next Article

Samsung’s Galaxy S25 arrives with a better Google Gemini

What's Your Reaction?

0

Like

0

Dislike

0

Love

0

Funny

0

Angry

0

Sad

0

Wow

Related Posts

Blown Away

Jan 14, 2025 0

Top AI Newsletters to Follow in 2024

Jan 14, 2025 0

Building a Local Face Search Engine — A Step by Step Guide

Jan 14, 2025 0

Name

Email

Comment

Google DeepMind Introduces Mind Evolution: Enhancing Natural Language Planning with Evolutionary Search in Large Language Models

Tags:

What's Your Reaction?

Related Posts

Popular Posts