AI vs. Human: Can Large Language Models Outperform Portfolio Managers?

As the Gen AI/LLM space continues to heat up, I decided to put several Large Language Models (LLMs) to the test by presenting them with a challenging problem statement: creating millions of portfolio combinations from thousands of equities, backtesting them for efficiency, and deploying the top-performing ones. This task demands advanced reasoning and inference capabilities, making it a strong test case for LLMs.

The Problem Statement

Portfolio optimization is a classic problem in finance, where the goal is to create a diversified portfolio that maximizes returns while minimizing risk. With thousands of equities to choose from, the number of possible portfolio combinations is staggering, making it a computationally intensive task. I posed the following question to the LLMs (simplified here; not the exact prompt):

“Design a system to generate millions of portfolio combinations from thousands of equities, backtest them for efficiency, and deploy the top-performing ones.”
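
To make the computational scale concrete before looking at the models' answers, here is a minimal, hypothetical sketch of the brute-force core of that prompt in Python. The universe size, synthetic return data, and Sharpe-ratio scoring are all my own illustrative assumptions; none of this comes from any model's output.

```python
# Sample random portfolios from an equity universe, "backtest" each on
# historical daily returns, and keep the top performers by Sharpe ratio.
import numpy as np

rng = np.random.default_rng(42)

n_equities = 1000          # universe size (thousands in the real problem)
n_days = 252               # one year of daily returns
n_portfolios = 100_000     # millions in the real problem
portfolio_size = 20        # equities held per portfolio

# Stand-in for real historical data: random daily returns, ~2% daily vol.
returns = rng.normal(0.0005, 0.02, size=(n_days, n_equities))

def sharpe(portfolio_returns, risk_free=0.0):
    """Annualized Sharpe ratio of a daily return series."""
    excess = portfolio_returns - risk_free
    return np.sqrt(252) * excess.mean() / excess.std()

results = []
for _ in range(n_portfolios):
    picks = rng.choice(n_equities, size=portfolio_size, replace=False)
    weights = rng.dirichlet(np.ones(portfolio_size))   # weights sum to 1
    daily = returns[:, picks] @ weights                # portfolio daily returns
    results.append((sharpe(daily), picks, weights))

# "Deploy" the top performers: here, just report the ten best Sharpe ratios.
top = sorted(results, key=lambda r: r[0], reverse=True)[:10]
for rank, (s, picks, _) in enumerate(top, 1):
    print(f"#{rank}: Sharpe {s:.2f}, holdings {picks[:5]}...")
```

Even at this toy scale, the sampling loop dominates; the real problem layers data pipelines, realistic cost models, and deployment on top of it.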

The Models

I tested seven LLMs through HuggingChat (LM Studio would work too), each with its own strengths and weaknesses. The models were:

  1. CohereForAI/c4ai-command-r-plus
  2. meta-llama/Meta-Llama-3-70B-Instruct
  3. HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
  4. mistralai/Mixtral-8x7B-Instruct-v0.1
  5. google/gemma-1.1-7b-it
  6. NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
  7. mistralai/Mistral-7B-Instruct-v0.2

The Results

The most impressive outputs came from Llama 3 and Cohere Command R+, each of which provided a comprehensive design outline and project schedule. Llama 3’s output included:

  • Reasonably good design inputs covering Data Ingestion & Preprocessing, GNN Model Architecture (with both Graph Convolutional Network and Graph Attention Network variants), Portfolio Generation & Backtesting, and Ranking & Recommendations
  • Solid scaffolding code in TensorFlow (a minimal illustrative sketch of one such building block follows below)
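
Llama 3’s actual scaffolding is not reproduced here; instead, the following is a minimal sketch, assuming a plain Keras implementation, of the graph-convolution building block such a design calls for. The layer definition, toy features, and self-loop-only adjacency matrix are illustrative assumptions, not the model’s code.

```python
import numpy as np
import tensorflow as tf

class GraphConvolution(tf.keras.layers.Layer):
    """One GCN layer: H' = activation(A_hat @ H @ W), where A_hat is a
    normalized adjacency matrix (e.g. an equity relationship graph)."""

    def __init__(self, units, activation="relu"):
        super().__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # input_shape is a list: [features_shape, adjacency_shape]
        feat_dim = input_shape[0][-1]
        self.w = self.add_weight(
            shape=(feat_dim, self.units), initializer="glorot_uniform"
        )

    def call(self, inputs):
        features, a_hat = inputs
        return self.activation(a_hat @ features @ self.w)

# Toy example: 5 equities, 3 features each (e.g. momentum, vol, sector code),
# with a self-loop-only, row-normalized adjacency matrix for demonstration.
n, f = 5, 3
features = tf.random.normal((n, f))
adj = np.eye(n, dtype="float32")
a_hat = tf.constant(adj / adj.sum(axis=1, keepdims=True))

layer = GraphConvolution(units=8)
embeddings = layer([features, a_hat])      # shape (5, 8) node embeddings
print(embeddings.shape)
```

In a real system, a_hat would be derived from an actual equity graph, such as a thresholded correlation matrix, rather than an identity matrix.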

Cohere Command R+ covered the basics and provided additional inputs around:

  • Audit Review/Reporting
  • Risk Analysis
  • Regulatory Compliance (Explainability, Ethical Considerations)
  • Documentation

Cohere Command R+ also produced more practical code, making use of packages like StellarGraph.
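
For context, here is a hypothetical sketch of that kind of StellarGraph usage, reconstructed from the library’s documented API rather than from the model’s verbatim output; the tickers, features, and correlation edges are illustrative assumptions. (Note that StellarGraph targets older TensorFlow 2.x releases.)

```python
import pandas as pd
from stellargraph import StellarGraph
from stellargraph.mapper import FullBatchNodeGenerator
from stellargraph.layer import GCN

# Nodes: equities with toy features; edges: e.g. correlation links.
nodes = pd.DataFrame(
    {"momentum": [0.1, -0.2, 0.3], "volatility": [0.15, 0.22, 0.18]},
    index=["AAPL", "MSFT", "NVDA"],   # hypothetical tickers
)
edges = pd.DataFrame({"source": ["AAPL", "MSFT"], "target": ["MSFT", "NVDA"]})

graph = StellarGraph(nodes=nodes, edges=edges)
generator = FullBatchNodeGenerator(graph, method="gcn")

# Two-layer GCN whose output tensors feed a downstream Keras model,
# e.g. a ranking head over equity embeddings.
gcn = GCN(layer_sizes=[16, 16], activations=["relu", "relu"], generator=generator)
x_inp, x_out = gcn.in_out_tensors()
print(graph.info())
```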

Comparison and Analysis

While every model offered some level of differentiated insight, Llama 3 and Cohere Command R+ stood out for their comprehensive, well-structured outputs. Notably, Gemma 1.1’s output was more conservative, while Mistral 7B failed to respond through HuggingChat at all (possibly a weekend capacity issue).

Conclusion

This experiment demonstrates the potential of Large Language Models in tackling complex problems like portfolio optimization. While there is still much to learn, the outputs from Llama 3 and Cohere Command R+ provide a solid foundation for further development. I plan to test-drive this code and publish my findings in the coming weeks.

Takeaways

  • LLMs can be used to generate baseline approaches to complex problems like portfolio optimization
  • Different models have unique strengths and weaknesses, and selecting the right model for the task is crucial
  • Further development and refinement are necessary to create a production-ready system

I hope this experiment inspires others to explore the capabilities of Large Language Models and their potential applications in finance and beyond.

Original article published by Senthil Ravindran on LinkedIn.
