Large language models such as ChatGPT have demonstrated proficiency in selecting multiple-choice answers on financial licensing exams, yet they struggle with more complex, nuanced tasks. This was highlighted in a study by Washington State University, which scrutinised more than 10,000 responses from artificial intelligence models, including Google's Bard, Llama, and ChatGPT, to financial exam questions.
The study, spearheaded by DJ Fairhurst from WSU’s Carson College of Business, involved the models not only selecting answers but also articulating the reasoning behind their choices. These explanations were then evaluated against those given by human finance professionals. Among the models, two iterations of ChatGPT emerged as the most adept at these tasks. Nevertheless, even these versions exhibited significant inaccuracies when addressing more intricate subjects.
DJ Fairhurst remarked, “It’s far too early to be concerned about ChatGPT completely taking over finance jobs.” He explained that while the model performs admirably with well-documented broad concepts, it struggles significantly with unique, specific issues. This study, published in the Financial Analysts Journal, included questions from licensing exams like the Securities Industry Essentials exam and the Series 6, 7, 65, and 66, aiming to mirror tasks that financial professionals might actually undertake.
Requiring written explanations took the assessment a step further than answer selection alone. Fairhurst emphasised that probing beyond the models' ability to pick correct answers was necessary to understand their true capabilities.
Of all the models tested, the paid version, ChatGPT 4.0, aligned most closely with human expert responses, with accuracy 18 to 28 percentage points higher than its counterparts. Interestingly, when the researchers fine-tuned the earlier ChatGPT 3.5 by providing examples of correct responses and explanations, it nearly matched, and occasionally exceeded, the performance of version 4.0.
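The paper summarised here does not describe the tooling behind that fine-tuning step. For readers curious about the mechanics, the sketch below shows how fine-tuning gpt-3.5-turbo on question, answer, and explanation examples might look using OpenAI's fine-tuning API; the file name, system prompt, and example content are illustrative assumptions, not the researchers' actual data or code.

# Hypothetical sketch of fine-tuning gpt-3.5-turbo on exam-style
# question/answer/explanation examples via OpenAI's fine-tuning API.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Training data must be chat-formatted JSONL: one {"messages": [...]} object per line.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a financial licensing exam tutor."},
            {"role": "user", "content": "Sample multiple-choice exam question goes here."},
            {"role": "assistant", "content": "Correct answer, followed by a written explanation."},
        ]
    },
    # ... more graded question/answer/explanation examples ...
]

with open("finance_exam_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Upload the training file, then launch a fine-tuning job against it.
train_file = client.files.create(
    file=open("finance_exam_train.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # poll client.fine_tuning.jobs.retrieve(job.id) until the job completes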
Despite these advances, both versions of ChatGPT still fell short in certain areas. They performed well when reviewing securities transactions and monitoring financial market trends, but their responses were less accurate in specialised scenarios such as assessing clients' insurance coverage and tax status.
Fairhurst, study co-author Greene, and WSU doctoral student Adam Bozman are further exploring ChatGPT's limitations and capabilities in a new project evaluating potential merger deals. By focusing on deals concluded after September 2021, beyond the cutoff of ChatGPT's training data, they aim to gauge the model's effectiveness in real-world scenarios. Preliminary results suggest the model is not particularly adept at these tasks.
The researchers concluded that while ChatGPT may alter the employment landscape for entry-level analysts in investment banks, it is better suited as a supportive tool than as a replacement for seasoned financial professionals. The evolving role of AI could lead to a reduction in junior analyst positions, not because ChatGPT outperforms them but because it can handle the routine tasks that have traditionally been assigned to them. This shift, as Fairhurst notes, could make the traditional practice of hiring a large number of junior analysts and retaining only the best a more costly approach.
More information: Douglas (DJ) Fairhurst et al., "How Much Does ChatGPT Know about Finance?", Financial Analysts Journal (2024). DOI: 10.1080/0015198X.2024.2411941
Provided by Washington State University