Calling response = self._session.get(url) returns page content that differs from what the browser gets. #18

Open
6xian opened this issue May 1, 2017 · 2 comments


6xian commented May 1, 2017

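I want to define a class that fetches the list of questions asked by a given user.
For example, fetching the titles of every question asked by the user heikehuawuya.
url = https://www.zhihu.com/people/heikehuawuya/asks

My code is built on the zhihu-api framework; the only part I wrote myself is the asks_list() function.
When I call response = self._session.get(url), the page content I get back differs from what the browser gets.
I have been debugging for a long time and cannot find the cause. Any help would be appreciated!
The code is below.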

import re

from bs4 import BeautifulSoup

# Model, ZhihuError and URL are provided by the zhihu-api framework.


class Userasks(Model):
    def __init__(self, slug=None, url=None):
        slug = slug if slug is not None else self._extract_slug(url)
        if not slug:
            raise ZhihuError("No user slug or url was specified")
        self.slug = slug
        super(Userasks, self).__init__()

    @staticmethod
    def _extract_slug(url):
        """
        Extract the user slug from a profile url.
        :param url: profile url, e.g. https://www.zhihu.com/people/heikehuawuya/asks
        :return: the slug, or None if the url does not match
        """
        pattern = re.compile(r"https://www\.zhihu\.com/people/(\w+).*?/")
        match = pattern.search(url or "")
        return match.group(1) if match else None

    def asks_list(self, **kwargs):
        question_list = []
        url = URL.user_asks(self.slug)
        response = self._session.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        # find_all returns a list of tags, so search each item for its links
        for item in soup.find_all("div", "ContentItem"):
            for link in item.find_all("a"):
                question_list.append(link.get_text())

        return question_list

6xian (Author) commented May 1, 2017

I changed the User-Agent header to:
User-Agent:Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36

Now response.content matches what the browser gets.
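
For anyone hitting the same problem, here is a minimal sketch of sending a browser-like User-Agent with a request. It uses only the standard library so it runs without the framework; the helper names build_request and fetch are hypothetical and are not part of zhihu-api. Whether a site serves different HTML for a given User-Agent can change over time, so treat this as an illustration rather than a guaranteed fix.

```python
import urllib.request

# User-Agent string quoted in this thread.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request for url that carries a browser-like User-Agent."""
    # Hypothetical helper: without this header urllib identifies itself
    # as a Python client, which a server may answer with different content.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=BROWSER_UA, timeout=10):
    """Fetch url and return the raw response body as bytes."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

If self._session is a requests.Session (an assumption, based on the get() and response.content calls above), the equivalent would be self._session.headers.update({"User-Agent": BROWSER_UA}) before calling get().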

lzjun567 (Owner) commented May 2, 2017

ok
