A scraping problem: disguising a crawler as a browser
Posted 8 years ago · Author: victor112991 · 4936 views · From: Q&A

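I'm a beginner. I can scrape most ordinary sites by now, but I've run into a tricky one. After adding a user-agent header to my call with the request package, this is what comes back: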

<!DOCTYPE html>

<head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0" />
<meta http-equiv="cache-control" content="no-cache" />
<meta http-equiv="expires" content="0" />
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT" />
<meta http-equiv="pragma" content="no-cache" />
<meta http-equiv="refresh" content="10; url=/distil_r_captcha.html?Ref=/hublot-doorbuster-event.html&distil_RID=BC74834C-7123-11E6-890A-9172A4CF1AED&distil_TID=20160902154138" />
<script type="text/javascript">
(function(window){ try { if (typeof sessionStorage !== 'undefined'){ sessionStorage.setItem('distil_referrer', document.referrer); } } catch (e){} })(window);
</script>
<script type="text/javascript" src="/jmshpdstlyucqwezyadusvdvbuvwu.js" defer></script>
<style type="text/css">#d__fFH{position:absolute;top:-5000px;left:-5000px}#d__fF{font-family:serif;font-size:200px;visibility:hidden}#yduyseuuybvy{display:none!important}</style>
</head>
<body>
<div id="distil_ident_block"> </div>
</body>
</html>

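Thinking about it, a real browser sends quite a few parameters along with a GET request, not just user-agent. So I'd like to ask: is there a package or technique that does a better job of disguising a request as coming from a browser?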
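As a starting point, here is a minimal sketch of sending a fuller, browser-like set of headers. It is only an illustration: the helper names (buildBrowserHeaders, looksLikeDistilBlock, fetchLikeBrowser) are hypothetical, the header values are example values copied from what a typical desktop Chrome sends, and it assumes Node 18+ where fetch is available globally. The block-page check relies only on the two markers visible in the response above.

```javascript
// Build a header set resembling what a desktop browser sends on a page navigation.
// All values are illustrative examples, not something the target site is known to require.
function buildBrowserHeaders(targetUrl, referer) {
  const origin = new URL(targetUrl).origin;
  return {
    "User-Agent":
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Accept":
      "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    // Default the referer to the site's own front page (an assumption).
    "Referer": referer || origin + "/",
    "Upgrade-Insecure-Requests": "1",
  };
}

// Detect the interstitial shown in the question: it contains a redirect to
// distil_r_captcha.html and an element with id distil_ident_block.
function looksLikeDistilBlock(html) {
  return html.includes("distil_r_captcha") || html.includes("distil_ident_block");
}

// Fetch a page with the headers above and report whether it was blocked.
// Not called here, so loading this file makes no network request.
async function fetchLikeBrowser(targetUrl) {
  const response = await fetch(targetUrl, {
    headers: buildBrowserHeaders(targetUrl),
    redirect: "follow",
  });
  const html = await response.text();
  return { status: response.status, blocked: looksLikeDistilBlock(html), html };
}
```

With the request package, the same object can be passed as the headers option, together with gzip: true and jar: true so that compressed responses are decoded and cookies persist between calls. Headers alone may still not be enough here: the page above loads a script and redirects to a captcha after 10 seconds, which suggests the site also checks for JavaScript execution, in which case driving a real or headless browser is probably needed.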
