@songying
2018-08-27T17:50:48.000000Z
Python crawler
# Add the following to your .zshrc file:
export QB=root,10.127.25.14,22
alias sshjump="ssh songyingxin_sx@jumpbox.qiyi.domain -o SendEnv=QB"
Note: part of the code in UGCRobot/UGCRobot/__init__.py has been commented out.
- Failed URL: http://www.56.com/u91/v_NTE3MDkzMjA.html
- Log URL: http://vtc-log.qiyi.domain:6100/swiftlog?dc=jy&logtype=UGC-robot&date=20180815&logname=5b7332f2b302c34ca96bd046
- Failure reason: the video is private and cannot be played
- Solution: none
- Failure reason: the URL is no longer valid
- Solution: none
- Failed URL: http://www.tudou.com/programs/view/Af8JKPREuYs/
- Failure reason: the URL is no longer valid
- Solution: none
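- Failed URL: http://video.tudou.com/v/XMjAzNDUxMDYwOA==.html
- Failure reason: this is an old URL that has to redirect to the new URL, and the corresponding request parameters must be set
- Solution: rewrite tudou.py and work out the required parameters
- JSON endpoint: ups.youku.com
- Headers: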
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36
Referer: http://video.tudou.com/v/XMjAzNDUxMDYwOA==.html
ckey=110%23KLkkAUkfkrai3FGg2wgAMuy2kMUOcYlOmkm2hQ7%2F8DBLkDUt21J%2FjnsoufZ881GQaUI2hn0yJHbHgBZxktOWjqjk0zFL8MzQfwbrGTUt5OJf5JdByUpgRGIvJZ2%2BxWRIuJ3%2BsAkwsBQhGq88j9cwkP5ysLgwjT444yjOmHvmb2ckBi%2BtjAZiYbJhs9bws9T4kcjQsBsisR449LdAltC3PEomZTzXs9g7krSj9BjM2pUkDoEo9%2FWxN9%2FDWUjfG5Lx90KM7agUGg7vKFDdiywugn7snRQjwXT%2BbR04JEZNj09g%2FTvu2wFtNq1BWM7b7AiR%2B8ToLggsHi%2BrG6kchUFjqPrEMa9LSET%2BcM9Dgrc%2BnjD4nD8I2B%2BbEKLbxyREtmxHV1R56TXr5ylby9UqZ599hqH31w9YbiF%2B2%2FnXDOdW7LoB34gZiH4uL9OHbbJAZTF0XLzeBM2c&site=-1&wintype=interior&p=1&fu=0&vs=1.0&rst=mp4&dq=flv&os=win&osv=&d=0&bt=pc&aw=w
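The request described above can be sketched as follows. This is only a minimal sketch: the notes give the host `ups.youku.com` but not the endpoint path, so `api_url` is left to the caller, and `build_ups_request` is a hypothetical helper name rather than anything from tudou.py. Parameter values should be passed decoded, because `urlencode` percent-encodes them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# User-Agent copied from the notes above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36"
)


def build_ups_request(api_url, page_url, params):
    """Build (but do not send) a GET request carrying the two headers.

    api_url  -- full JSON endpoint URL (its path is the caller's assumption)
    page_url -- the video page URL, sent as the Referer header
    params   -- dict of query parameters such as ckey, site, wintype
    """
    query = urlencode(params)
    separator = "&" if "?" in api_url else "?"
    full_url = (api_url + separator + query) if query else api_url
    return Request(
        full_url,
        headers={"User-Agent": USER_AGENT, "Referer": page_url},
    )
```

The returned object can be handed to `urllib.request.urlopen` once the real endpoint path and a valid `ckey` are known.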
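- Failed URL: http://www.mgtv.com/v/7/157220/f/1724888.html
- Failure reason: the video has been taken offline
- Solution: none
- Failure reason: the JSON endpoint has changed to https://pstream.api.mgtv.com. The JSON it returns contains the m3u8 file URL, and the video addresses listed inside the m3u8 file must be prefixed with the prefix of the m3u8 URL: http://pcvideoyd.titan.mgtv.com/c1/2018/08/14_0/CBEF114C80186974547FFF7AC592E926_20180814_1_1_1935_mp4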
https://pstream.api.mgtv.com/player/getSource?video_id=4507688&
did=c477fa3e-84d5-4086-9bec-a29a61bce900&suuid=e2c487af-2387-4f68-8a14-f49be542407a
&collection_id=320516&pm2=nV~_UltXq93XzgAa30FIUGqnBpMtiDYTfpR0Jutxr3nlPAQ8q5f35WPo9ha4paNU3kmjLiaC_B29L0ahNYXhkqEi61_ERZ2gTBxC97wcXcooDFpNBN1ecFVn59aK7_dBHSQRIXRo3IiRV5tE9z8_V58DR5pyFAiEIDtaShWKJHsHNhSmwJxgJrrYSSHkIUxdA8AkXosKccH~JmCr8QeJQSbokzdhBhRdk~opAZHU8pST2Z1pKix17FPSALMDQLm4tgS2qUzyVvHoZ1VIX0T45kOHxVjre8_q9YOKAuRSe2aDyhzcYbOLhrVBpPntmpyCtoxYvqTYees-&tk2=wATOlNmYxYTY5ITYtMWZilTL2gDM00SNkRDOtU2MhZ2N3QzY9QWakxHMwATM98mbwxXMwADMuMjLw0jclZHfzQDNxADN0MTNx0Ddpx2Y&_support=10000000
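Rebuilding the segment addresses from the m3u8 file could look roughly like this. The sketch assumes the playlist lists segment paths relative to the m3u8 URL, as the note above describes; `resolve_segments` is a hypothetical helper, and the real MGTV playlist layout has not been checked here.

```python
from urllib.parse import urljoin


def resolve_segments(m3u8_url, m3u8_text):
    """Return absolute segment URLs for every media line in a playlist.

    Lines starting with '#' are m3u8 tags or comments and are skipped.
    Relative entries are resolved against the m3u8 URL itself, so they
    inherit its prefix; entries that are already absolute stay unchanged.
    """
    segments = []
    for raw_line in m3u8_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        segments.append(urljoin(m3u8_url, line))
    return segments
```

Using `urljoin` instead of plain string concatenation keeps the result correct when the m3u8 URL carries its own query string.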
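- Original URL: http://v.pptv.com/show/iaL6XFX3jU5H0cto.html?spm=tv_index_web.sb_2719337.0.1.0.1.0.1.0.0
- XML endpoint:
http://web-play.pptv.com/
- Issue: the XML request parameters have changed and must be reverse-engineered again; the video parameters must also be reverse-engineered again, and the video URL is missing some parameters
- Real video URL: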
http://221.194.64.114/0/0/1024/3a6e11dfe9d7e87e8cd4627d951e6431.mp4?fpp.ver=1.3.0.23&key=b3880596c36e415819069dde26b30987&k=281abac9137c4efb5a5400ee237ded09-4544-1534484423%26bppcataid%3D980&type=web.fpp&vvid=b87c8de8-e9a7-d231-9a66-458a6a44bdc1
- Spoofed URL:
http://221.194.64.115/4/0/1024/04ba724ff6f45ceb63a380d3b330bb8e.mp4?key=7a14854c5b5353781c200d6895708a77&fpp.ver=1.3.0.19&k=52f4a43d5bbc41be3200cb17b5907d4b-758f-1534483873%26bppcataid%3D980&type=web.fpp
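One way to see which parameters the second URL lacks is to compare the query parameter names of the two URLs above. This is a small sketch using only the standard library; `query_keys` and `missing_params` are hypothetical helper names, and the comparison covers parameter names only, not their values.

```python
from urllib.parse import parse_qs, urlsplit


def query_keys(url):
    """Return the set of query parameter names in a URL."""
    return set(parse_qs(urlsplit(url).query, keep_blank_values=True))


def missing_params(real_url, spoofed_url):
    """Names present in the real URL's query but absent from the spoofed one."""
    return query_keys(real_url) - query_keys(spoofed_url)
```

Applied to the two URLs above, the only name reported is `vvid`; the `key` and `k` values also differ between the two, which this check does not cover.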
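- Issue 1: the video's copyright license has expired, e.g. http://tv.sohu.com/20170809/n600096256.shtml
- Issue 2: no connection can be established, and the video cannot be opened in a browser either, e.g. http://tv.sohu.com/20121209/n359935788.shtml
- Issue 3: the URL is no longer valid, e.g. http://tv.sohu.com/20180102/n600329640.shtml
- Issue 1: the link is no longer valid
- Issue 2: some items are articles rather than videos, e.g. https://www.toutiao.com/i6583996057443631630/
- Issue 3: the video URL can be obtained locally, but the log shows an exception, e.g. https://www.toutiao.com/item/6586415010359017987/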
2018-08-08 16:36:58 - VideoMetaUtil.getVideoIntroduction.39 - ERROR : Traceback (most recent call last):
File "/opt/vtc/UGCRobot/resolution/VideoMetaUtil.py", line 33, in getVideoIntroduction
video_dict = parser.parse()
File "/opt/vtc/UGCRobot/meta/meta_resolution.py", line 219, in parse
title = match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
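The `AttributeError` above most likely means that the regular expression used at line 219 of meta_resolution.py found no match, so `match` was `None`. Below is a minimal sketch of guarding against that. The `<title>` pattern and the `extract_title` helper are hypothetical, since the real pattern is not shown in these notes.

```python
import re

# Hypothetical pattern; the real one in meta_resolution.py is not shown.
TITLE_PATTERN = re.compile(r"<title>(.*?)</title>", re.S)


def extract_title(html, default=None):
    """Return the first captured title, or `default` when nothing matches."""
    match = TITLE_PATTERN.search(html)
    if match is None:
        # re.search returns None when there is no match; calling .group()
        # on None raises the AttributeError seen in the log.
        return default
    return match.group(1).strip()
```

Returning a default lets the caller decide whether a missing title is fatal, instead of letting the parser crash on pages whose markup has changed.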
- Issue 1: the video is crawled without problems, but the log contains an exception. Log URL: http://vtc-log.qiyi.domain:6100/swiftlog?dc=bj&logtype=UGC-robot&date=20180808&logname=5b6aab70b302c34cab68b68b
2018-08-08 16:36:17 - VideoMetaUtil.getVideoIntroduction.39 - ERROR : Traceback (most recent call last):
File "/opt/vtc/UGCRobot/resolution/VideoMetaUtil.py", line 32, in getVideoIntroduction
parser = parser_cls(url, logger)
File "/opt/vtc/UGCRobot/meta/meta_resolution.py", line 23, in __init__
raise Exception("meta parser exception!")
Exception: meta parser exception!
- Issue 2: upload failed because today's upload limit was exceeded