Problem HTTP error 403 in Python 3 Web Scraping
This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib sends something like Python-urllib/3.3, which is easily detected). Try setting a known browser user agent with:
from urllib.request import Request, urlopen
req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
This works for me.
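To see why the default client is so easy to spot, the headers can be inspected locally without sending anything. This is a minimal sketch, assuming https://example.com/ as a placeholder URL (it is not from the question):

```python
from urllib.request import Request, build_opener

# Default headers that urllib attaches when none are supplied.
default_headers = dict(build_opener().addheaders)
default_agent = default_headers["User-agent"]  # e.g. "Python-urllib/3.x"

# Placeholder URL (assumption); no request is actually sent here.
req = Request("https://example.com/", headers={"User-Agent": "Mozilla/5.0"})

# Request normalizes header names with str.capitalize(), hence "User-agent".
custom_agent = req.get_header("User-agent")

print(default_agent)
print(custom_agent)
```

The first line printed is the value a server-side filter would see by default. The second is what gets sent once the header is overridden.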
By the way, in your code you are missing the () after .read in the urlopen line, but I think that it's a typo.
TIP: since this is an exercise, choose a different, non-restrictive site. Maybe they are blocking urllib for some reason…
It's definitely blocking you because of the user agent that urllib sends. The same thing is happening to me with OfferUp. You can create a new class called AppURLopener which overrides the user agent with Mozilla.
import urllib.request

class AppURLopener(urllib.request.FancyURLopener):
    # FancyURLopener uses this attribute as its User-Agent string
    version = 'Mozilla/5.0'

opener = AppURLopener()
response = opener.open('http://httpbin.org/user-agent')
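Note that FancyURLopener has been deprecated since Python 3.3, so on newer interpreters the same idea may be easier to express with build_opener. The following is only a sketch of that alternative, not the method used above. fetch_text is a hypothetical helper name, and the network call is left commented out:

```python
import urllib.request

# Build an opener and replace its default "Python-urllib/x.y" User-Agent.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "Mozilla/5.0")]


def fetch_text(url, encoding="utf-8"):
    """Fetch url through the customized opener (hypothetical helper)."""
    with opener.open(url) as response:
        return response.read().decode(encoding)


# Usage (requires network access):
# print(fetch_text("http://httpbin.org/user-agent"))
```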
This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib sends something like Python-urllib/3.3, which is easily detected), as already mentioned by Stefano Sanfilippo.
from urllib.request import Request, urlopen
url = 'https://stackoverflow.com/search?q=html+error+403'
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')
web_byte is the bytes object returned by the server, and the content of most web pages is encoded as UTF-8.
Therefore you need to decode web_byte using the decode method.
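Since not every page is UTF-8, one option is to use the charset the server declares in its Content-Type header and fall back to UTF-8 only when none is given. This is a rough sketch under that assumption. decode_body and fetch_page are hypothetical helper names, and only the offline decoding step is actually run here:

```python
from urllib.request import Request, urlopen


def decode_body(raw, charset=None, fallback="utf-8"):
    """Decode raw bytes with the declared charset, else the fallback."""
    return raw.decode(charset or fallback, errors="replace")


def fetch_page(url):
    """Hypothetical helper: fetch url and decode it per its Content-Type."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req) as response:
        # get_content_charset() returns None if no charset is declared.
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)


# Offline check of the decoding step (no network needed):
sample = "café".encode("latin-1")
print(decode_body(sample, "latin-1"))
```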
This completely solved the problem I was having while trying to scrape a website using PyCharm.
P.S. I use Python 3.4.