Auction house in New York caught Dallas auction house bulk scraping their website
This article on web scraping just came to my attention:

http://www.nytimes.com/aponline/2016/12/....html?_r=0

Summary: An auction house in New York is suing an auction house in Dallas for copyright violations over web scraping.  The Dallas firm allegedly scraped the New York auction house's website listings, including their listing photos, and then SOLD those listings in an aggregate database for profit.

As I'm the author of one of the most powerful and flexible web scraping toolkits (the Ultimate Web Scraper Toolkit), I have to reiterate the messaging found on the main page:  Don't use the toolkit for illegal purposes.  If you are going to bulk scrape someone's website, you need to make sure you are legally in the clear to do so, and you need to respect reasonable rate limits and the like.  Reselling the data acquired with a scraping toolkit seems like an extremely questionable thing to do from a legal perspective.
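One concrete way to "respect reasonable rate limits and the like" is to check a site's robots.txt rules before fetching anything and to honor any declared crawl delay.  This is not the toolkit's own API, just a minimal Python sketch using the standard library's robots.txt parser; the user agent string and sample rules are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Hypothetical robots.txt a site might publish.
ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

print(allowed(ROBOTS, "MyScraper/1.0", "https://example.com/listings"))    # True
print(allowed(ROBOTS, "MyScraper/1.0", "https://example.com/private/x"))   # False
```

A polite scraper would also read the Crawl-delay value (here, 5 seconds) and sleep at least that long between requests instead of hammering the server.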

The problem with bulk web scraping is that it costs virtually nothing on the client end of things.  Most pages with custom data these days are served dynamically to some extent, so building each response takes non-trivial CPU time and RAM on the server side.  If one bad actor is excessively scraping a website, it drags down the performance of the website for everyone else.  And if a website operator starts to rate limit requests, they run into additional technical issues (e.g. a naive rate limiter will also block Googlebot and effectively delist the site from Google search results).
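The Googlebot problem above has a known mitigation: Google documents that a genuine Googlebot can be verified with a forward-confirmed reverse DNS lookup (the IP resolves to a googlebot.com or google.com hostname, and that hostname resolves back to the same IP), so a rate limiter can exempt verified crawlers.  A minimal Python sketch, with the resolver functions injectable so the logic can be exercised without live DNS:

```python
import socket

def is_verified_googlebot(ip: str,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname) -> bool:
    """Forward-confirmed reverse DNS check for Googlebot.

    1. Reverse-resolve the IP to a hostname.
    2. Require the hostname to end in googlebot.com or google.com.
    3. Forward-resolve that hostname and require it to match the IP.
    """
    try:
        host = reverse(ip)[0]
    except OSError:
        return False
    if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
        return False
    try:
        return forward(host) == ip
    except OSError:
        return False
```

A rate limiter would call this once per new client IP (caching the result) and skip throttling for IPs that pass, instead of trusting the easily-spoofed User-Agent header.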

The Ultimate Web Scraper Toolkit is capable of easily hiding requests in ways that mimic a real web browser or a bot like Googlebot.  It follows redirects, passes in HTTP Referer headers, transparently and correctly handles HTTP cookies, and can even extract forms and form fields from a page.  It even has an interactive mode that comes about as close to the real thing from the command line as a person can get in a few thousand lines of code.  I'm only aware of a few toolkits that even come close to the capabilities of the Ultimate Web Scraper Toolkit, so I won't be surprised if it comes to light that what I've built has been used for illegal purposes despite the very clear warnings.  With great power comes great responsibility, and it appears that some people just aren't capable of handling that.
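To make the form-extraction capability mentioned above concrete: the idea is to walk the HTML, collect each form's action, and record its named fields with their default values, so a scraper can fill in and submit the form programmatically.  This is not the toolkit's actual API (which is PHP), just a minimal Python standard-library sketch of the same idea with a made-up sample page:

```python
from html.parser import HTMLParser

class FormExtractor(HTMLParser):
    """Collect each <form>'s action URL plus its input fields and defaults."""
    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action", ""), "fields": {}}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            name = attrs.get("name")
            if name:
                # Record the field's default value (empty string if none).
                self._current["fields"][name] = attrs.get("value", "")

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None

# Hypothetical login page fragment.
PAGE = """<form action="/login" method="post">
<input type="text" name="user" value="">
<input type="hidden" name="csrf" value="abc123">
</form>"""

parser = FormExtractor()
parser.feed(PAGE)
print(parser.forms)
# [{'action': '/login', 'fields': {'user': '', 'csrf': 'abc123'}}]
```

Note how the hidden CSRF token comes along for free, which is exactly why form extraction matters for mimicking a real browser session.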

I've got more to say, but I'll continue the discussion in a blog post:

http://cubicspot.blogspot.com/2016/12/bu...-then.html
Author of Barebones CMS

If you found my reply to be helpful, be sure to donate!
All funding goes toward future product development.
© CubicleSoft