Barebones CMS

enabling cookies in the web scraper
#1
Hi,

I just came across your ultimate web scraper toolkit and would be happy to donate if I can figure out how to use it in my application. The documentation does not show how to enable cookies as the web page I am trying to scrape requires the browser to have them enabled.

Some sample code showing how to set a cookie jar, etc. would be much appreciated.

Additionally, beyond form submissions, it would be great to have an API for clicking on an image (the kind that sends X/Y coordinates).

Thanks!
#2
The WebBrowser class handles cookies and redirects transparently and automatically - there is nothing to enable; it just works. For cookies to work properly, just keep using the same WebBrowser instance for future requests. It will automatically determine which cookies are appropriate and send them to the server. I opted not to extract forms automatically because Simple HTML DOM isn't exactly the speediest piece of software out there.
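A minimal sketch of that advice - the library lines are commented out so the sketch stands alone, the `Process()` usage is based on the toolkit's documented examples, and the URLs are placeholders:

```php
<?php
// Keep ONE WebBrowser instance for the whole session; cookies set by the
// first response are then sent automatically on later requests.

// require_once "support/web_browser.php";
// $web = new WebBrowser();

$urls = array(
    "https://www.example.com/login",    // server sets a session cookie here
    "https://www.example.com/account"   // the cookie is sent automatically here
);

foreach ($urls as $url)
{
    // $result = $web->Process($url);  // same instance => same cookie jar
    echo "Requesting " . $url . "\n";
}

// The mistake to avoid:  a fresh "new WebBrowser()" per request starts each
// request with an empty cookie jar.
```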

X/Y coordinates are kind of difficult. I'm not actually displaying the DOM anywhere. You should be able to fake it easily enough if you need to pass that information but I don't know of any website that actually uses that because not all browsers will pass click location information along.
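For what it's worth, faking the click location by hand is straightforward: a standard HTML image button (`<input type="image" name="go">`) submits its click point as `go.x` and `go.y`. Only that field naming is standard HTML - the commented `Process()` call and its `postvars` option name are assumptions about the toolkit's API:

```php
<?php
// An <input type="image" name="go"> submits its click point as go.x / go.y.
// To "click" the image at (23, 11), add those two fields to the form
// variables before submitting.
$postvars = array(
    "user" => "someuser",
    "go.x" => "23",   // X coordinate of the fake click
    "go.y" => "11"    // Y coordinate of the fake click
);

// Hypothetical submission with the toolkit (option name is an assumption):
// $result = $web->Process($formurl, array("postvars" => $postvars));

echo $postvars["go.x"] . "," . $postvars["go.y"] . "\n";  // 23,11
```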

When in doubt about something, var_dump() the relevant variable. (Except Simple HTML DOM objects - var_dump()'ing those is a bad idea.)

If you need to save cookies for a later WebBrowser instance, use the $web->GetState() method. Serialize the output and store it. Then use $web->SetState() (or the object constructor) to restore the previous state.
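In practice that might look like the following sketch. `GetState()`/`SetState()` are from the toolkit (library lines commented out here); the `$state` array shown is an invented stand-in for the real `GetState()` output, just to demonstrate the serialize round trip:

```php
<?php
// Persisting WebBrowser state (cookies included) between runs. The state is
// plain serializable data, so a serialize()/unserialize() round trip
// preserves it exactly.

// require_once "support/web_browser.php";
// $web = new WebBrowser();
// ... perform requests, then:
// $state = $web->GetState();

// Stand-in for a real GetState() result (shape is an assumption):
$state = array("cookies" => array("session" => "abc123"), "referer" => "https://www.example.com/");

file_put_contents("state.dat", serialize($state));

// Later, in another script:
$restored = unserialize(file_get_contents("state.dat"));
// $web2 = new WebBrowser();
// $web2->SetState($restored);

var_dump($restored === $state);  // bool(true)
```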
Author of Barebones CMS

If you found my reply to be helpful, be sure to donate!
All funding goes toward future product development.
#3
Hmmm. That didn't seem to work on Amazon - they report back that I need to enable cookies. I was able to get past the Amazon problem with cURL by calling curl_setopt() with CURLOPT_COOKIEJAR, but I wasn't able to with the WebBrowser class.
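For reference, the cURL approach that worked was along these lines. These are the real cURL option names; the URL and filename are placeholders:

```php
<?php
// CURLOPT_COOKIEJAR writes received cookies to a file when the handle
// closes, and CURLOPT_COOKIEFILE reads them back in on later requests.
if (function_exists("curl_init"))
{
    $ch = curl_init("https://www.amazon.com/");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");   // save received cookies
    curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");  // send them back later
    $html = curl_exec($ch);
    curl_close($ch);
}
```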


(02-06-2015, 07:46 PM)thruska Wrote: The WebBrowser class handles cookies and redirects transparently and automatically - there is nothing to enable; it just works. [...]
#4
I'll need a specific example to diagnose the issue. I've never had any problems but that doesn't mean that there aren't any.

Also, are you using the latest GitHub repo?

https://github.com/cubiclesoft/ultimate-web-scraper
#5
Yes, I am using the latest from GitHub. Perhaps we can converse privately by email, as I will have to share account info to provide you an example.
#6
Sure, that will work fine. Any account information you share with me will be permanently deleted at the end of our conversation and used only to resolve the issue.
#7
Okay, let me spend a little time to write the smallest piece of code to reproduce the problem and then I will email you.

Thanks for your help.

(02-08-2015, 07:22 AM)thruska Wrote: Sure, that will work fine. Any account information you share with me will be permanently deleted at the end of our conversation and used only to resolve the issue.
#8
Hi,

Sorry it took a little longer than I anticipated to extract the smallest amount of code to reproduce the cookie problem. If you could provide me an email address where I can send you the zipped code, that would be great. I have sent you my email address by private message.

Thanks,

Phoenix
#9
Just wondering if you had a solution for this. I too am using the WebBrowser abstraction, but I'm hitting a warning to enable cookies instead of getting the form I'm expecting. The web server isn't able to determine that the scraper can handle cookies. Is there any way around this?
#10
My general recommendation is to use an Incognito or Private Browser window and visit the precise URL you are requesting and watch all network traffic to and from the server very carefully. Usually something important is overlooked in one of the requests to the server.
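A sketch of that approach: after watching the Incognito session's network traffic, copy any request headers the server seems to depend on onto the scraper's request. The "headers" option name here is an assumption about the toolkit's request options - check the toolkit documentation for the exact keys:

```php
<?php
// Build up the request headers a real browser sent, as observed in the
// network inspector. Values below are placeholders to copy over from the
// real browser session.
$options = array(
    "headers" => array(
        "User-Agent" => "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept-Language" => "en-US,en;q=0.9",
        "Referer" => "https://www.example.com/"
    )
);

// Hypothetical usage with the toolkit:
// require_once "support/web_browser.php";
// $web = new WebBrowser();
// $result = $web->Process($url, $options);

echo count($options["headers"]) . " header(s) to replicate\n";  // 3 header(s) to replicate
```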

