The great science fiction author Isaac Asimov wisely provided us with his three laws of robotics to ensure that robots act ethically and, by extension, that we humans act ethically in our use of robots:
- “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”
- “A robot must obey orders given it by human beings except where such orders would conflict with the First Law.”
- “A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.”
Yes, Asimov may have created these laws as an organizing principle to drive the plot of his stories, but that doesn’t mean that we as business leaders can’t learn from them. Take a look in your email inbox or your newsfeed, and you’ll see that privacy and the proper use of digital data are front and center. Dig a little deeper and you’ll notice that it’s the use of robots or, more specifically, web-scraping bots, that’s behind the data privacy concerns.
A web-scraping bot is software or code that automates a process for collecting data from the internet. This type of web scraping has existed for a long time, and it can take both “good” and “bad” forms. “Good” bots are mutually beneficial for both the scraper and the “scrape-e.” For example, Google uses bots to index web content for its search engine, which then drives users to those websites. On the flip side, “bad” bots have some sort of malicious intent for gathering the data, such as identity theft, online fraud or scams, and denial-of-service attacks.
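To make the “software that automates data collection” idea concrete, here is a minimal sketch of the core of such a bot, using only Python’s standard library: a parser that walks a page’s HTML and collects every hyperlink, which is roughly the first step an indexing bot performs. The HTML snippet in the usage example is purely illustrative.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """A minimal 'bot' component: walk an HTML page and collect every link."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag we encounter.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> list:
    """Return all hyperlinks found in an HTML document."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

A real bot would loop this over many pages, following the links it finds; everything else about scraping, good or bad, is a question of what you do with that loop.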
In the recent Facebook data privacy scandal with Cambridge Analytica, there are two primary ways that web-scraping bots are involved. First, about 270,000 people downloaded an app called “This Is Your Digital Life,” which used Facebook’s login feature for users to create accounts, thus opting users in to share their own personal profile data as well as some data on all of their friends. This gave the app access to use web-scraping bots to collect data from about 87 million Facebook users. In violation of Facebook’s terms of service, this data was shared with Cambridge Analytica. The second web-scraping issue was revealed in April, when CEO Mark Zuckerberg told reporters that public profile data for all of Facebook’s two billion users may have been scraped by malicious bots through the search and account recovery features using phone numbers and email addresses. Essentially, these bots can be used to match pieces of personal data from across multiple sources to create a more complete profile that may be used in all sorts of scams.
LinkedIn also is battling data privacy concerns. Back in 2015, LinkedIn settled a class action lawsuit by agreeing to pay $13 million to compensate users who signed up for its “add connections” feature. This service gave LinkedIn permission to scrape users’ email address books and send out emails on their behalf to request connections, but users felt LinkedIn went too far by sending out a barrage of emails that damaged individual reputations. LinkedIn currently is appealing a recent court decision that gave third parties the right to use web-scraping bots to collect public profile data on the site’s 500 million users. The judge ruled that LinkedIn did not have the right to block access to data that its own users had deemed to be public.
I’d imagine that Asimov would point us back to his three laws of robotics. First, the use of web-scraping bots should not harm the individual or company that they are collecting data on. This drives right at the dichotomy of “good” versus “bad” bots. As a business leader, using automated scraping bots to gather publicly available data on market trends, customer reviews, partner strategy and competitive intelligence doesn’t directly harm the individual or company. If a company is using web-scraping bots to overload a competitor’s server and shut down its website or gather personal information from behind a firewall, now that’s malicious intent.
Second, web-scraping bots must follow the orders given to them to ensure proper collection of the data. Here are the orders bots should follow:
- Most websites provide instructions to bots in a standard place: a “robots.txt” file that spells out which automated scraping behaviors the site does and doesn’t allow. This file is widely recognized and used to guide web scraping, so follow it. It’s easy to find: just add “/robots.txt” after the site’s domain.
- If a website offers an API to access its data, use it. Twitter is a perfect example.
- Gather data at a reasonable rate and time. In other words, don’t bombard a website with so many requests that it overloads the server, and try scraping during off-peak times.
- Be transparent. Identify your web-scraping bot and explain what you are doing.
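The rules above can be sketched in code. Below is a hedged example, again using only Python’s standard library: it parses a robots.txt file to check whether a URL may be fetched, identifies the bot with a descriptive User-Agent header, and waits between requests so it doesn’t hammer the server. The bot name and contact address are hypothetical placeholders.

```python
import time
import urllib.request
import urllib.robotparser

# Transparency rule: identify your bot and how to reach you (placeholder values).
USER_AGENT = "example-research-bot/1.0 (contact: you@example.com)"


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Parse a robots.txt body and report whether this bot may fetch the URL."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_fetch(urls, delay_seconds=2.0):
    """Fetch each URL at a reasonable rate, announcing who we are.

    Yields (url, body) pairs; sleeps between requests so the target
    server is never bombarded.
    """
    for url in urls:
        request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(request) as response:
            yield url, response.read()
        time.sleep(delay_seconds)  # rate limit: don't overload the server
```

In practice you would first fetch the site’s robots.txt, run each candidate URL through `allowed_by_robots`, and only pass the permitted ones to `polite_fetch` — and, per the second rule, skip all of this entirely when the site offers an API.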
Finally, a web-scraping bot should protect its own existence while adhering to the first two laws, of course. What exactly does that mean? A web-scraping bot is only as good as the code that makes it run. Since the digital environment in which these bots are operating continues to advance at an exponential rate, they run the risk of becoming obsolete very quickly. In order to protect their own existence, web-scraping bots must evolve through machine learning techniques as they are exposed to new data and technology.
Yes, there are good and bad applications of web-scraping bots in today’s world, but if we follow Asimov’s guidance, we can ensure proper and ethical use of publicly available digital data to help us run our businesses more effectively. Are you in?