Accusations of Plagiarism and Unethical Practices Surround Perplexity AI

July 5, 2024

Perplexity AI, a startup that merges a search engine with a large language model to provide detailed answers instead of links, is facing serious accusations. Major news outlets accuse the company of both plagiarism and unethical web scraping, raising questions about the fine line between fair use and intellectual property theft in the era of generative AI.

Forbes and Wired have accused Perplexity of plagiarizing their content. Forbes claims Perplexity republished an investigative report with minimal changes, while Wired alleges the AI produced a summary closely mirroring its original article. Both news outlets assert that Perplexity’s actions constitute plagiarism, as it used significant portions of their content without proper attribution.

Wired has also accused Perplexity of ignoring the Robots Exclusion Protocol to scrape restricted web content. The protocol, which sites implement through a robots.txt file, tells web crawlers which parts of a website they should not access; compliance is voluntary, and Wired claims Perplexity bypassed these restrictions to gather information for its summaries. Wired supports the accusation with observations that Perplexity’s IP addresses accessed restricted areas of websites to summarize URLs provided by users.

Perplexity’s Defense Against the Allegations

Perplexity denies any wrongdoing. The company argues that its practices qualify as fair use under copyright law and that it honors publishers’ requests not to scrape content. According to Perplexity, summarizing a URL provided by a user does not equate to web crawling, since the company is responding to a specific user request rather than systematically indexing web content.

Robots Exclusion Protocol:

Perplexity claims that visiting a URL to summarize its content at a user’s request does not constitute web crawling. However, critics argue that this distinction is meaningless, as the practice effectively replicates the results of web scraping.
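For readers unfamiliar with the protocol, the following is a minimal sketch of how a compliant crawler might consult a site's robots.txt rules before fetching a page. It uses Python's standard `urllib.robotparser` module. The bot names (`ExampleBot`, `OtherBot`), the `example.com` URLs, the rules themselves, and the `is_allowed` helper are all hypothetical, chosen only for illustration; nothing here describes how Perplexity or any publisher actually operates.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is asked to stay out of /premium/,
# while every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /premium/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleBot", "https://example.com/premium/story"),
        ("ExampleBot", "https://example.com/news/today"),
        ("OtherBot", "https://example.com/premium/story"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The check is advisory: nothing in the protocol technically stops a client from skipping it, which is why the dispute turns on whether a fetch made at a user's request should be expected to perform this check at all.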

Fair Use:

Fair use allows for limited use of copyrighted material for purposes like commentary, criticism, and news reporting. Perplexity maintains that its summaries are within these bounds. The crucial question is how much the summaries reuse the original expression of the articles rather than just their ideas, since copyright protects expression, not ideas. If the summaries too closely mimic the original text, they could be seen as infringing on copyright.

Potential Impact on Publishers

The dispute highlights broader concerns about the impact of AI on journalism. If AI systems like Perplexity continue to summarize content without proper attribution, they could undermine the financial viability of news outlets by reducing web traffic and ad revenue. Weaker newsrooms would in turn produce less original content for AI systems to draw on, potentially leaving those systems to train on synthetic data, which might degrade the quality and reliability of generated content.

Perplexity aims to address these concerns by negotiating advertising revenue-sharing deals with publishers. These deals would involve sharing ad revenue generated from query responses that cite content from participating publishers. Perplexity is also exploring ways to allow publishers to use its technology to enhance their own websites and products.

In conclusion, the allegations against Perplexity AI underscore the complex ethical and legal challenges that arise as generative AI becomes more prevalent. The resolution of these issues will likely shape the future relationship between AI companies and content creators, determining how they can coexist in a rapidly evolving digital landscape.
