The story of Karen Hunter asking ChatGPT whether Bessie Smith influenced Mahalia Jackson is pretty telling. Because ChatGPT was never fed information connecting the musical careers of these two influential Black women, it couldn't shed any light on how they were related. It could only offer snippets of their biographies, gleaned from Wikipedia (or similar sources).
Despite their seemingly magical “knowledge,” Large Language Models (LLMs) can only respond with things they know (or think they know). They can't create novel connections between subjects the way people can.
That's why it's absolutely critical that LLMs be trained on content from a wide variety of sources and perspectives. An LLM is only as good as the data it's fed. To create truly powerful, creative, and exhaustive LLMs, we need to train them on content created by people whose voices aren't often centered, and on subjects that extend far into the long tail.