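html5lib is an HTML parsing library that BeautifulSoup can use as its parser.
Python's built-in html.parser works too, but it is less tolerant of malformed HTML than html5lib.
html5lib is not part of the standard library, so it has to be installed before use.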
Installing the html5lib parser
pip3 install html5lib
If html5lib is not installed when your script asks for it, you will see an error like this:
!python3 a.py
Traceback (most recent call last):
  File "a.py", line 5, in <module>
    bsObj = BeautifulSoup(html.read(), "html5lib")
  File "/usr/local/python3/lib/python3.7/site-packages/bs4/__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
shell returned 1
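One way to avoid this error is to check whether html5lib is importable before asking BeautifulSoup for it. The following is only a minimal sketch: `choose_parser` is a hypothetical helper name, not part of bs4 or html5lib, and it assumes that falling back to html.parser is acceptable for your pages.

```python
from importlib.util import find_spec


def choose_parser(preferred="html5lib", fallback="html.parser"):
    """Return `preferred` if that module can be imported, otherwise `fallback`.

    Hypothetical helper for illustration; not provided by bs4.
    """
    # find_spec returns None when a top-level module cannot be found
    if find_spec(preferred) is not None:
        return preferred
    return fallback


parser_name = choose_parser()
print(parser_name)
```

The returned name can then be passed as the second argument to `BeautifulSoup(...)`. Another option is to catch `bs4.FeatureNotFound` around the constructor call and retry with "html.parser".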
Learn more: https://pypi.org/project/html5lib/
Installing beautifulsoup4
pip3 install beautifulsoup4
Test
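from urllib.request import urlopen
from bs4 import BeautifulSoup

# Fetch the page and parse it with the html5lib tree builder
html = urlopen("https://www.liuhaolin.com")
bsObj = BeautifulSoup(html.read(), "html5lib")
# Alternative: use the parser that ships with Python
# bsObj = BeautifulSoup(html.read(), "html.parser")
# Print the page's <title> element
print(bsObj.title)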
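To see the difference between the two parsers without a network request, you can feed both the same broken markup. This is a small sketch that assumes beautifulsoup4 and html5lib are both installed; the variable names and the sample string are made up for illustration.

```python
from bs4 import BeautifulSoup

# Deliberately invalid markup: a stray </p> inside an unclosed <a>
broken = "<a></p>"

lenient = BeautifulSoup(broken, "html5lib")
builtin = BeautifulSoup(broken, "html.parser")

# html5lib repairs the input into a complete document, as a browser would
print(lenient)
# html.parser keeps only the <a> element and ignores the stray end tag
print(builtin)
```

With html5lib the result includes html, head and body elements, while html.parser adds none of them, so code that relies on `soup.body` behaves differently depending on the parser.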