With the release of Casual Conversations v2, Meta hopes to assist AI researchers in making their tools and processes more universally inclusive.
The dataset consists of face-to-face video clips featuring a diverse range of people, and it is meant to help developers assess how well their models work for different demographic groups. Meta describes it as “a consent-driven, publicly available dataset that enables researchers to better evaluate the fairness and robustness of certain types of AI models, with the goal of making them more inclusive.”
A thorough review of the literature on relevant demographic categories informed the dataset, which was developed in consultation with experienced practitioners in fields such as civil rights. The dataset provides a granular list of 11 self-provided and annotated categories that can be used to measure the algorithmic fairness and robustness of AI systems.
It includes 26,467 video monologues recorded in seven countries, featuring 5,567 paid participants who provided self-identified attributes such as age and gender. It is the follow-up to the original consent-driven Casual Conversations dataset, which was released in 2021.
The inclusion of participant monologues recorded outside the United States distinguishes Casual Conversations v2 from the first version. The seven countries represented in v2 are Brazil, India, Indonesia, Mexico, Vietnam, the Philippines, and the United States. Meta hopes to expand the dataset to include more geographies in the future. Another distinction in the most recent dataset is that participants were allowed to speak in both their primary and secondary languages.
The new dataset will aid AI developers in addressing concerns about language barriers, as well as physical diversity, which has been a source of contention in some AI contexts. It will be accessible both within Meta and to external researchers.
The sources for this piece include an article in Axios.