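While fetching the HTML source of a website, I found that some pages came back with incomplete content, even though they opened normally in a browser.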
1. So the problem is not on the site's server.
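2. In Fiddler the response also looked incomplete and garbled, but after setting the Transformer tab to No Compression it displayed correctly. So the data being read was complete, and the problem was in the processing that came after.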
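3. Debugging in C#, I found that the string that had been read was truncated. Copying it into Notepad++ showed \0\0 at the point where it was cut off, which explained it: \0 is the null character, and C-style code and many text views (including the debugger here) treat it as the end of a string, even though a .NET string itself can contain it.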
4. The automatic decompression setting used by the request code:
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip | DecompressionMethods.None;
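Steps 2 and 4 both come down to HTTP content encoding. A gzip- or deflate-compressed body is binary data, which is why it looks garbled when shown as text without being decompressed first. The following is a minimal sketch, not the framework's actual implementation, of roughly what AutomaticDecompression does: pick a decompressor based on the response's Content-Encoding header. The class ContentDecoding and the method WrapForContentEncoding are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not part of .NET): a rough picture of what
// HttpWebRequest.AutomaticDecompression does with a compressed response body.
public static class ContentDecoding
{
    public static Stream WrapForContentEncoding(Stream body, string contentEncoding)
    {
        string enc = (contentEncoding ?? "").Trim().ToLowerInvariant();
        if (enc == "gzip")
        {
            return new GZipStream(body, CompressionMode.Decompress);
        }
        if (enc == "deflate")
        {
            // Simplification: DeflateStream reads raw deflate data, while HTTP
            // "deflate" is nominally zlib-wrapped; real servers vary.
            return new DeflateStream(body, CompressionMode.Decompress);
        }
        return body; // no (or unrecognized) encoding: pass the bytes through unchanged
    }
}
```

A helper like this would only be needed if AutomaticDecompression were left unset while the request still advertised gzip/deflate in Accept-Encoding. In that case you would wrap response.GetResponseStream() with it before handing the stream to the StreamReader.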
The full request-handling code:
// Fragment from a larger program: strUrl, strMsg and encoder (the page's Encoding) are declared elsewhere.
// Needs: using System.IO; using System.Net; using System.Text;
try
{
    strUrl = "http://www.xxx.com";
    CookieContainer cc = new CookieContainer();
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(strUrl);
    request.Method = "GET"; // HTTP method names are case-sensitive
    request.CookieContainer = cc;
    request.KeepAlive = true;
    request.ContentType = "text/html";
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36";
    request.Headers.Add("x-requested-with:XMLHttpRequest");
    request.Headers.Add(HttpRequestHeader.AcceptLanguage, "zh-CN,zh;q=0.8,en;q=0.6,nl;q=0.4,zh-TW;q=0.2");
    request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
    // Let the framework decompress gzip/deflate bodies before they reach the StreamReader
    request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip | DecompressionMethods.None;
    request.Headers.Add("Accept-Encoding", "gzip, deflate");
    if (request.Method == "POST")
    {
        request.ContentType = "application/x-www-form-urlencoded";
    }
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    // Earlier variant: new StreamReader(response.GetResponseStream(), Encoding.GetEncoding("gb2312"))
    using (StreamReader reader = new StreamReader(response.GetResponseStream(), encoder))
    {
        strMsg = reader.ReadToEnd();
        // \0 is the null character; anything that treats it as a string terminator
        // (such as the debugger view) shows only the text before it, so strip it out
        strMsg = strMsg.Replace("\0", "");
    }
}
catch { } // note: errors are silently swallowed here
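The truncation described in step 3 can be reproduced without any network access. In the minimal sketch below, VisiblePart is a hypothetical helper that mimics a viewer which stops at the first \0. StripNulls applies the same Replace("\0", "") fix used in the request code. The point is that the .NET string still holds all of the text, and only consumers that treat \0 as a terminator cut it short.

```csharp
// Hypothetical demo class: contrasts what a '\0'-terminated view shows
// with what the .NET string actually contains.
public static class NullCharDemo
{
    // Mimics a viewer or C-style API that treats the first '\0' as the end
    // of the string and shows only what comes before it.
    public static string VisiblePart(string s)
    {
        int i = s.IndexOf('\0'); // char overload: ordinal search
        return i < 0 ? s : s.Substring(0, i);
    }

    // The fix used in this article: remove every null character.
    public static string StripNulls(string s)
    {
        return s.Replace("\0", "");
    }
}
```

Removing the nulls is the simplest fix. If the \0 characters come from decoding the bytes with the wrong encoding, choosing the correct Encoding for the StreamReader would address the cause rather than the symptom.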