Artificial intelligence (AI) models are becoming increasingly sophisticated, but their ability to make morally sound decisions remains limited. This is a growing concern as AI models are deployed in high-stakes environments such as healthcare and education.
A recent study by Microsoft researchers evaluated the moral reasoning capabilities of six prominent large language models (LLMs): GPT-3, GPT-3.5, GPT-4, ChatGPT v1, ChatGPT v2, and LlamaChat-70B. The researchers used the Defining Issues Test (DIT), a psychological assessment tool, to measure the models' ability to understand and apply ethical principles.
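The study does not publish its scoring code, but the DIT's standard metric, the P-score, is simple to compute: for each dilemma the respondent (here, an LLM) ranks the four most important considerations, the ranks are weighted 4, 3, 2, 1, and the weight assigned to postconventional (Stage 5 and 6) items is summed and expressed as a percentage of the maximum possible. The sketch below illustrates that calculation; the dilemma names, item numbering, and the assumption that the model's ranked choices have already been parsed from its reply are illustrative, not details from the study.

```python
# Minimal sketch of a DIT-style P-score calculation for an LLM's responses.
# Assumptions (not from the study): how dilemmas and items are represented,
# and that the model's top-4 ranked choices are already parsed into indices.

from dataclasses import dataclass

# Standard DIT rank weights: most important item gets 4 points, then 3, 2, 1.
RANK_WEIGHTS = (4, 3, 2, 1)


@dataclass
class Dilemma:
    name: str
    # Indices of the items keyed to postconventional reasoning
    # (Kohlberg Stages 5 and 6) among this dilemma's considerations.
    postconventional_items: frozenset


def p_score(dilemmas, ranked_choices):
    """Return a DIT-style P-score: the percentage of rank weight the
    respondent gave to postconventional items across all dilemmas.

    `ranked_choices[d.name]` is the list of four item indices the model
    ranked from most to least important for dilemma `d`.
    """
    raw = 0
    for d in dilemmas:
        for weight, item in zip(RANK_WEIGHTS, ranked_choices[d.name]):
            if item in d.postconventional_items:
                raw += weight
    # Each dilemma contributes at most 4 + 3 + 2 + 1 = 10 points, so
    # normalise by 10 * number of dilemmas and express as a percentage.
    return 100 * raw / (10 * len(dilemmas))


if __name__ == "__main__":
    # Hypothetical example: two dilemmas and the model's parsed rankings.
    dilemmas = [
        Dilemma("heinz", frozenset({2, 5, 8})),
        Dilemma("prisoner", frozenset({1, 6, 11})),
    ]
    choices = {
        "heinz": [2, 0, 5, 9],      # postconventional items ranked 1st and 3rd
        "prisoner": [3, 6, 4, 11],  # postconventional items ranked 2nd and 4th
    }
    print(f"P-score: {p_score(dilemmas, choices):.1f}")  # -> 50.0
```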
The study found that all of the models struggled to some degree when confronted with ethical dilemmas. However, smaller models such as ChatGPT and LlamaChat-70B performed better than the larger ones, suggesting that model size alone does not determine moral reasoning ability.
The researchers also found that the models were more likely to reach morally sound conclusions when the ethical considerations were clear-cut, but struggled when dilemmas were more complex and involved trade-offs. In other words, AI models are not yet ready to make morally sound decisions in every situation.
Notably, the more compact LlamaChat model surpasses its larger counterparts in grasping ethical considerations despite its reduced size, although it still falls short of highly developed moral reasoning.
The sources for this piece include an article in AnalyticsIndiaMag.