@myles 2018-12-20T12:50:05.000000Z

BeautifulSoup4 Basic Usage


1. BeautifulSoup environment setup

I usually use PyCharm as my Python IDE, so this note records the steps needed to install the BeautifulSoup library there.

To use BeautifulSoup in PyCharm, first install the following 2 packages in PyCharm: bs4 (BeautifulSoup4) and lxml.

Once they are installed, import the library directly with: from bs4 import BeautifulSoup

To install them, open PyCharm's Project Interpreter settings, search for the bs4 and lxml packages, and install each one.

PyCharm menu path: File - Settings - Project: xxx - Project Interpreter
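If the import fails even after installing, PyCharm may be running a different interpreter than the one the packages went into. A minimal check, assuming only the standard library's importlib.util.find_spec, is to ask the current interpreter whether it can locate the import names bs4 and lxml; the helper name missing_packages is hypothetical.

```python
import importlib.util


def missing_packages(names=("bs4", "lxml")):
    """Return the top-level import names that the current interpreter cannot find.

    Hypothetical helper: find_spec() returns None for a top-level module
    that is not installed, without actually importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("bs4 and lxml are available")
```

Run it from PyCharm's Python console or as a script in the project so that it uses the same interpreter as your code.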

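from bs4 import BeautifulSoup

# assumes 'html_doc' is a UTF-8 encoded HTML file in the working directory
with open('html_doc', 'r', encoding='utf-8') as f:
    html_text = f.read()    # the whole file as an HTML string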
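Because 'html_doc' itself is not shown, here is a self-contained sketch that writes a hypothetical sample page to a temporary file and reads it back. It also shows that BeautifulSoup accepts an open file object as well as a string. The sample markup and the helper load_html are assumptions for illustration, and html.parser (Python's built-in parser) is used so the sketch runs even without lxml.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical sample page standing in for the 'html_doc' file above.
SAMPLE_HTML = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"


def load_html(path):
    """Read an HTML file as text; an explicit encoding avoids platform-dependent defaults."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "html_doc")
    with open(path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_HTML)

    # Option 1: read the text first, then parse the string.
    html_text = load_html(path)
    soup_from_text = BeautifulSoup(html_text, "html.parser")

    # Option 2: hand the open file object straight to BeautifulSoup.
    with open(path, "r", encoding="utf-8") as f:
        soup_from_file = BeautifulSoup(f, "html.parser")

print(soup_from_text.title.string)  # Demo
print(soup_from_file.title.string)  # Demo
```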

Pass the HTML string to BeautifulSoup() to build a parse tree of the document; this object makes it easy to locate tags and extract their information later.

soup = BeautifulSoup(html_text, 'lxml')   # the parser name is passed as a string
print(type(soup))   # <class 'bs4.BeautifulSoup'>
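The second argument of BeautifulSoup() names the parser. If lxml is not installed, asking for 'lxml' raises bs4.FeatureNotFound. A minimal sketch, assuming you would rather fall back to Python's built-in 'html.parser' than fail, could look like this; make_soup is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html_text):
    """Parse with lxml when it is installed, otherwise fall back to html.parser."""
    try:
        return BeautifulSoup(html_text, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(html_text, "html.parser")


soup = make_soup("<html><body><p class='intro'>Hi</p></body></html>")
print(type(soup))         # <class 'bs4.BeautifulSoup'>
print(soup.p["class"])    # ['intro']  (class is multi-valued, so it comes back as a list)
print(soup.p.get_text())  # Hi
```

lxml is usually faster and more lenient with broken markup, which is why it is installed in the setup step; html.parser needs no extra package.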

# CSS selector lookup: select() returns a list of all matching tags
tag_info = soup.select('body > div.main-content > ul > li:nth-of-type(1) > img')
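To see what the selector above actually matches, here is a sketch against hypothetical markup shaped to fit it. select() always returns a list (possibly empty), while select_one() returns only the first match or None. The sample HTML is an assumption, and html.parser is used so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped to match the selector used above.
HTML = """
<html><body>
  <div class="main-content">
    <ul>
      <li><img src="/img/first.png" alt="first"></li>
      <li><img src="/img/second.png" alt="second"></li>
    </ul>
  </div>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

selector = "body > div.main-content > ul > li:nth-of-type(1) > img"
imgs = soup.select(selector)       # list of Tag objects
first = soup.select_one(selector)  # first matching Tag, or None

print(len(imgs))     # 1
print(first["src"])  # /img/first.png

# Dropping the :nth-of-type(1) filter matches the <img> in every <li>.
print([img["alt"] for img in soup.select("div.main-content img")])  # ['first', 'second']
```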

# find_all(): every <a> tag whose class attribute includes "read"
tag_info = soup.find_all('a', {'class': 'read'})
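A dict passed as the second argument of find_all() is matched against tag attributes; for the class attribute, BeautifulSoup also offers the keyword form class_ (with a trailing underscore, because class is a Python keyword). The sketch below uses hypothetical links to show that both forms return the same tags, and that a tag with class="read more" also matches because class is treated as a multi-valued attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links carrying the "read" class and one without it.
HTML = """
<html><body>
  <a class="read" href="/post/1">Post 1</a>
  <a class="read more" href="/post/2">Post 2</a>
  <a class="nav" href="/">Home</a>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

by_attrs = soup.find_all("a", {"class": "read"})  # attribute dict form
by_kw = soup.find_all("a", class_="read")         # keyword form

print(len(by_attrs))                       # 2
print([a["href"] for a in by_kw])          # ['/post/1', '/post/2']
print([a.get_text() for a in by_attrs])    # ['Post 1', 'Post 2']
```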