![A 6x4 black and white contact sheet print of a miner performing the act of swinging a pickaxe. Each frame shows the course of the swing over his head until it finally meets the ground.](https://jweberle.com/wp-content/uploads/2023/10/chjpdmf0zs9zdgf0awmvzmlszxmvd2vic2l0zs8ymdiylta4l21pytyzota5lwltywdllmpwzw.jpg)
My website was used to train an A.I. chatbot. Now what?
I started this site in 2011 as a place to house my personal and professional writing. Over the past 12 years, I’ve poured my heart and soul into this website and in the process, published just shy of 180,000 words—nearly the length of The Fellowship of the Rings by J.R.R. Tolkien. My intention when I signed up for WordPress and purchased a domain name all those years ago was to practice writing and build my brand as an author and storyteller. I never expected that someone would steal my work for their own profit.
As a digital native, I’ve grown up with the idea that most things on the Internet are free. There are pros and cons to this, but many websites can be accessed and enjoyed without paying a fee. My site is one of those that anyone can stumble upon and read. While I’d love to make some money from my work, the purpose of this blog is more about honing my skills and building an audience than a way to monetize my writing. However, just because I don’t charge to read my work here, that doesn’t mean it isn’t protected by copyright. In fact, if you scroll to the bottom of the homepage, you’ll find a copyright notice letting you know that all of the content on this site is my intellectual property, meaning you can’t copy or modify it.
So, imagine my surprise when I found out that my site has been scraped tens of thousands of times by tech companies to train artificial intelligence data sets, the foundation for large language models like ChatGPT, Bard, and a host of others. The Washington Post recently published a fascinating analysis of the websites that are used by Google’s developers to train its A.I. To date, the tech giant has scraped content from more than 15 million websites and uses that enormous trove of text to teach its A.I. products how to read, write, and mimic human language. Included in this massive data set is content copied from 3.8 million personal blogs—sites like mine. The Washington Post allows you to search the database and see how many tokens (usually a word or phrase) have been gleaned from which sites.
Out of curiosity, I typed in jweberle.com to see what would happen. Spoiler alert: I’m on the list. Pretty far down the list, mind you, ranking at 239,649th out of those 15 million websites. It turns out Google has scraped 95,000 words from my site. Ninety-five thousand data points that are used to train its A.I. to write like me and parrot my words back to one of its users, who will then use my words, without attribution or compensation, in some new, derivative work.1 And this is just one data set used by one company. OpenAI, the maker of ChatGPT, doesn’t disclose where it gets the content it trains its products on, but I assume I’m on that list, too.
![](https://jweberle.com/wp-content/uploads/2023/10/screenshot-2023-10-25-at-11.18.34-am.png?w=711)
I don’t pretend to understand the legal landscape of how corporations can scoop anything and everything on the internet into their algorithms, but I do know that these companies are making money off of my work in violation of my copyright. That doesn’t feel right.
The simple fact is that writers deserve to be paid for their work. If Google were my client and I was charging middle-of-the-road freelance rates (about 30–50 cents per word2), the company would owe me roughly $38,000, assuming that each token corresponds to one word.
In his recent article on LinkedIn about the theft of intellectual property by the tech industry, media entrepreneur Joe Lazauskas argues that since the innovation in A.I. is being fueled by the creative output of writers and artists, they are entitled to compensation. Lazauskas quotes NYU Professor of Marketing Scott Galloway, who compares creatives to coal miners, “pulling ideas out of the earth that didn’t exist before and refining them”3 to fuel the insatiable appetite of the machine.
I think the analogy is apt. We’re the ones mining the coal. We’re the ones using our imagination and talent to create something from nothing—art, literature, music—that is in turn being funneled into these enormous data sets, ingested by A.I., and regurgitated as more art, literature, and music without the original creator’s knowledge or consent. Without us, there is no coal to burn and the machine stops.
Without us, there is no coal to burn.
OpenAI, the company behind ChatGPT, is projected to rake in $1 billion in profits this year by selling access to its A.I. tools, and Bloomberg estimates that artificial intelligence will be a $1.3 trillion industry by the year 2032. That is staggering, even more so because all that wealth is built on the labor of millions of unpaid artists.
I hate knowing that this very blog post is likely to become fodder for some large language model. It makes me hesitant to share my work at all, to be leery of posting my fiction where anyone can access it. I even worry about posting transcripts of my podcasts, which are crucial for accessibility, for fear that I’m inviting these companies to scrape even more of my hard work to improve their products. I’m sure other artists are also reconsidering how much they share online, stifling creativity and free expression on the internet.
That’s not the kind of world I want to live in. Something has to change. It’s time for the big tech companies to start paying the creatives who put the intelligence in artificial intelligence. And they need to start now.
— 30 —
Jonny Eberle is a writer, podcaster, and storyteller mining the depths of the human experience. He lives in Tacoma, WA with his family, a dog, and three adorable typewriters. His writing has been published in Creative Colloquy, Grit City Magazine, and All Worlds Wayfarer. You can listen to his audio drama, The Adventures of Captain Radio, and his writing podcast, Dispatches with Jonny Eberle, wherever you enjoy podcasts. If you want to hear more of his thoughts on the future of A.I., check out Dispatches Season 2, Episode 7, “Will Artificial Intelligence Replace Human Writers?”
If you liked this post, please consider subscribing to Jonny’s monthly email newsletter for news, curated reading recommendations, and more delivered straight to your inbox and away from the prying eyes of artificial intelligence companies. Thanks for reading!
Footnotes
- My other website, which I built for Obscure Studios, my film and podcast production company, obscurestudios.net, is also in Google’s C4 data set. It has been scraped 1,500 times. Many thanks to the journalists at The Washington Post for their reporting. ↩︎
- 2023 freelance writing rate data courtesy of Tim Stoddart at Copyblogger.com ↩︎
- This quote comes from an episode of Galloway’s podcast, Pivot, which you can listen to here. ↩︎
Copyright 2011–2024 Jonny Eberle. All rights reserved.