As the Gen AI/LLM space continues to heat up, I decided to put several Large Language Models (LLMs) to the test by presenting them with a challenging problem statement: creating millions of portfolio combinations from thousands of equities, backtesting them for efficiency, and deploying the top performers. The task demands advanced reasoning and inference capabilities, making it an ideal test case for LLMs.
The Problem Statement
Portfolio optimization is a classic problem in finance: the goal is to build a diversified portfolio that maximizes returns while minimizing risk. With thousands of equities to choose from, the number of possible portfolio combinations is staggering, making exhaustive evaluation computationally intractable. I posed the following question (simplified, not the exact prompt) to the LLMs:
“Design a system to generate millions of portfolio combinations from thousands of equities, backtest them for efficiency, and deploy the top-performing ones.”
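To get a feel for the scale involved, here is a quick back-of-the-envelope sketch in Python. Everything in it is my own illustration, not part of the prompt: the universe size, the equal-weight portfolios, and the annualized Sharpe ratio as the backtest metric are all assumptions.

```python
import math
import numpy as np

# Even modest parameters rule out exhaustive search: choosing 20
# equities out of 2,000 already yields on the order of 10^47 portfolios.
print(f"C(2000, 20) = {math.comb(2000, 20):.3e}")

# A naive baseline instead samples random portfolios and ranks them
# by a simple backtest metric (here, an annualized Sharpe ratio).
rng = np.random.default_rng(42)
n_assets, n_days, n_samples, k = 2000, 252, 10_000, 20

# Placeholder for real historical daily returns (assets x days).
daily_returns = rng.normal(0.0005, 0.02, size=(n_assets, n_days))

def sharpe(portfolio: np.ndarray) -> float:
    """Equal-weight the selected assets and annualize the Sharpe ratio."""
    series = daily_returns[portfolio].mean(axis=0)  # daily portfolio returns
    return float(series.mean() / series.std() * np.sqrt(252))

candidates = [rng.choice(n_assets, size=k, replace=False) for _ in range(n_samples)]
best = max(candidates, key=sharpe)
print("Best sampled portfolio Sharpe:", round(sharpe(best), 2))
```

A production system would replace random sampling with something far smarter, which is exactly where the models' GNN-based suggestions come in, but even this toy version makes the computational challenge concrete.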
The Models
I tested seven different LLMs through HuggingChat (LM Studio is another option), each with unique strengths and weaknesses. The models were:
- CohereForAI/c4ai-command-r-plus
- meta-llama/Meta-Llama-3-70B-Instruct
- HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
- google/gemma-1.1-7b-it
- NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO
- mistralai/Mistral-7B-Instruct-v0.2
The Results
The most impressive outputs came from Llama 3 and Cohere Command R+, both of which provided a comprehensive outline design and project schedule. Llama 3’s output included:
- Reasonably good design inputs around Data Ingestion & Pre-Processing, GNN Model Architecture (covering both Graph Convolutional Networks and Graph Attention Networks), Portfolio Generation & Backtesting, and Ranking & Recommendations
- Solid scaffolding code in TensorFlow (a minimal sketch in that spirit follows this list)
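To illustrate what such TensorFlow scaffolding can look like, here is a minimal full-batch GCN sketch. This is my own reconstruction, not Llama 3’s actual output: the layer sizes, the per-equity score head, and the assumption of a pre-normalized adjacency matrix (e.g. correlation-based edges between equities) are all illustrative.

```python
import tensorflow as tf

class GCNLayer(tf.keras.layers.Layer):
    """One graph-convolution layer: H' = act(A_hat @ H @ W), where A_hat
    is a pre-normalized adjacency matrix, e.g. D^-1/2 (A + I) D^-1/2,
    over the equity graph."""

    def __init__(self, units, activation="relu"):
        super().__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        # input_shape is a pair: (node-feature shape, adjacency shape)
        feature_dim = input_shape[0][-1]
        self.w = self.add_weight(shape=(feature_dim, self.units),
                                 initializer="glorot_uniform")

    def call(self, inputs):
        features, a_hat = inputs
        return self.activation(a_hat @ features @ self.w)

# Minimal model: two GCN layers plus a per-equity scoring head.
n_nodes, n_features = 2000, 16  # illustrative sizes
features_in = tf.keras.Input(shape=(n_features,), batch_size=n_nodes)
adjacency_in = tf.keras.Input(shape=(n_nodes,), batch_size=n_nodes)
h = GCNLayer(32)([features_in, adjacency_in])
h = GCNLayer(32)([h, adjacency_in])
scores = tf.keras.layers.Dense(1)(h)  # ranking score per equity
model = tf.keras.Model([features_in, adjacency_in], scores)
model.summary()
```

Here the batch dimension is the full equity universe, so every forward pass scores all equities at once; swapping GCNLayer for an attention-based variant would give the GAT flavour the model also suggested.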
Cohere Command R+ covered the basics and provided additional inputs around:
- Audit Review/Reporting
- Risk Analysis
- Regulatory Compliance (Explainability, Ethical Considerations)
- Documentation
Cohere Command R+ also provided more practical code, including the use of packages like StellarGraph.
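For readers who have not used StellarGraph, the sketch below shows the shape of a GCN built with it. It is written against the StellarGraph 1.2 API (which sits on top of TensorFlow 2); the tickers, node features, and correlation edges are toy placeholders of my own, not Cohere’s actual code.

```python
import pandas as pd
from tensorflow.keras import layers, Model
from stellargraph import StellarGraph
from stellargraph.mapper import FullBatchNodeGenerator
from stellargraph.layer import GCN

# Toy equity graph: three tickers joined by (say) correlation edges,
# each node carrying a couple of illustrative features.
nodes = pd.DataFrame(
    {"momentum": [0.10, -0.20, 0.05], "volatility": [0.30, 0.25, 0.40]},
    index=["AAPL", "MSFT", "GOOG"],
)
edges = pd.DataFrame({"source": ["AAPL", "MSFT"], "target": ["MSFT", "GOOG"]})
graph = StellarGraph(nodes, edges)

# Full-batch generator: the whole graph is fed to the model at once.
generator = FullBatchNodeGenerator(graph, method="gcn")
gcn = GCN(layer_sizes=[16, 16], activations=["relu", "relu"], generator=generator)
x_inp, x_out = gcn.in_out_tensors()

# Per-equity score head on top of the GCN embeddings.
scores = layers.Dense(1)(x_out)
model = Model(inputs=x_inp, outputs=scores)
model.summary()
```

The appeal of a library like this is that the graph plumbing (adjacency normalization, data generators, Keras integration) comes for free, which is presumably why Command R+ reached for it.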
Comparison and Analysis
While all of the models provided some level of insight that set them apart from one another, Llama 3 and Cohere Command R+ stood out for their comprehensive, well-structured outputs. Notably, Gemma 1.1’s output was more conservative, while Mistral 7B failed to respond through HuggingChat at all (possibly a capacity issue, given it was a weekend).
Conclusion
This experiment demonstrates the potential of Large Language Models in tackling complex problems like portfolio optimization. While there is still much to be learned, the outputs from Llama 3 and Cohere Command R+ provide a solid foundation for further development. I plan to test-drive this code and publish my findings in the coming weeks.
Takeaways
- LLMs can be used to generate baseline approaches to complex problems like portfolio optimization
- Different models have unique strengths and weaknesses, and selecting the right model for the task is crucial
- Further development and refinement are necessary to create a production-ready system
I hope this experiment inspires others to explore the capabilities of Large Language Models and their potential applications in finance and beyond.
Original article published by Senthil Ravindran on LinkedIn.