New AI tool Sphere to find errors in Wikipedia

Meta has released Sphere, a new AI tool for information retrieval that belongs to the field of knowledge-intensive natural language processing (KI-NLP). The field deals with answering questions such as “Who won the first Nobel Prize?”, which can usually be answered only with additional context.

Sphere is an open text corpus made up entirely of publicly accessible websites. According to Meta, its advantage is that the data is uncurated and unstructured: no search engine with an opaque ranking is involved, and the corpus does not rely on already prepared knowledge such as Wikipedia. Meta took data collected by the CommonCrawl project, then processed and ranked it. 134 million documents went into the corpus and were split into 960 million text passages of 100 tokens each. Search is handled by FAISS, Facebook’s similarity search library, for which a distributed version was developed.
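To make the passage splitting concrete, here is a minimal Python sketch of cutting a document into 100-token passages. It is an illustration only, not Meta's pipeline: the function name `split_into_passages` is hypothetical, and plain whitespace splitting is assumed as a stand-in for whatever tokenizer Meta actually used.

```python
from typing import Iterator, List

PASSAGE_LEN = 100  # tokens per passage, the figure reported for Sphere


def split_into_passages(text: str, passage_len: int = PASSAGE_LEN) -> Iterator[List[str]]:
    """Yield consecutive, non-overlapping passages of at most `passage_len` tokens."""
    # Assumption: whitespace tokenization stands in for the real tokenizer.
    tokens = text.split()
    for start in range(0, len(tokens), passage_len):
        yield tokens[start:start + passage_len]


# A toy document of 250 "tokens" (w0 ... w249).
document = " ".join(f"w{i}" for i in range(250))
passages = list(split_into_passages(document))
print([len(p) for p in passages])  # prints [100, 100, 50]
```

In this sketch the last passage of a document can be shorter than 100 tokens. As a rough check on the reported figures, 960 million passages from 134 million documents works out to about 7.2 passages per document on average.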

Sphere is currently limited to common knowledge. Questions from science are deliberately left out, because a corpus of publicly accessible web texts is likely to be only partly suitable for that area.

Wikipedia served as the first use case. The researchers trained the system with 4 million references. Sphere was then able to determine whether or not a given source actually supports the information in the Wikipedia article. As an example, Meta cites the Wikipedia article on boxer Joe Hipp. A member of the Blackfeet tribe, he was the first Native American to compete for a world heavyweight title.

However, the source given for this in the Wikipedia article had nothing to do with Hipp or with boxing at all. Sphere instead found a passage on the website of a regional newspaper that used different wording (“challenge” instead of “compete”) and did not explicitly mention the sport, but nevertheless confirmed the claim in the article. More traditional tests were carried out using the KILT benchmark (Knowledge Intensive Language Tasks), which was also developed at Facebook.

Meta has released the entire project as open source. Future versions could not only suggest references to Wikipedia authors in real time, but also automatically propose text on their topic or take over proofreading.
