IT瘾
Bookmarking, sharing, and discovering web pages about the internet and software development
tag:itindex.net,0000-00-00:default
itindex
2017-05-17T14:00:00Z
zh
itindex
Chen Aiyun: Building a Rock-Solid Search Architecture - 中生代技术 | 十条
IT瘾
https://itindex.net
tag:itindex.net,2017-05-17:default/1495029600000
2017-05-17T14:00:00Z
2017-05-17T14:00:00Z
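<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">For any online system, performance and stability are the two goals you never stop pursuing. In a distributed system, a performance shortfall can be covered by adding machines (not the best approach, of course, but performance tuning is not the focus of this article, so it is not covered here). Stability, however, cannot be bought by piling on machines, and a growing fleet tends to bring more stability problems of its own. Engineers who work on online systems feel Murphy's law keenly: if a module in the system can fail, it will fail. Perhaps one could try strapping a slice of cheese bread to a cat and building a perpetual-motion machine :)</span></p>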
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">What follows is how the iQiyi search architecture evolved toward high availability. iQiyi search started in 2011, six years ago, and over those six years both its user base and its QPS have grown exponentially, which puts heavy demands on stability. As noted above, no system can be made perfect and failures will happen. What we need to do when they happen is minimize the impact on users, ideally so that users never notice the fault. Building a stable system requires multi-site active-active deployment, degradation, rate limiting, and capacity expansion. The sections below describe how we handle each.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="小标题" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; max-width: 100%; word-wrap: break-word !important; font-size: 24px;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span class="小标题" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Multi-Site Active-Active</span></span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">People joke that <span style="margin: 0px; padding: 0px; max-width: 100%; color: rgb(255, 79, 121); box-sizing: border-box !important; word-wrap: break-word !important;">the excavator is the most terrifying weapon in the internet industry</span>: one scoop can sever a data center's fiber and power cables, and without a backup data center even the most perfectly built system can no longer serve traffic.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Backup comes in two forms. <span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span style="margin: 0px; padding: 0px; max-width: 100%; color: rgb(0, 0, 0); box-sizing: border-box !important; word-wrap: break-word !important;">In the first, the standby data center takes no production traffic in normal operation</span></span>, and traffic is switched to it only after the primary data center fails. This raises several concerns. First, the data and program versions in the standby data center must be kept consistent with the primary. Second, because the standby normally serves no production traffic, when a real incident occurs it is unclear whether anyone will dare to switch traffic over, and whether the standby can actually serve it. Third, the standby machines sit idle most of the time, which wastes resources to some degree. <span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;">In the second form, the backup data centers also serve production traffic in normal operation, and when one data center fails its traffic is directed to the others</span>. With three data centers, for example, each one only needs to keep 1/3 of its machines as headroom, and the loss of any single data center can be absorbed. iQiyi uses the second form: offline data is written to every data center, and online reads go to the local data center.</span></p>
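<p class="正文">The 1/3 figure above can be checked with a little arithmetic. The sketch below assumes traffic is spread evenly across data centers and that each site is sized to carry its share after exactly one peer fails. The function name <code>spare_fraction</code> is made up for this illustration and is not taken from the iQiyi system.</p>

```python
from fractions import Fraction


def spare_fraction(num_datacenters: int) -> Fraction:
    """Fraction of each site's capacity that sits idle in normal operation
    when every site is sized to survive the loss of one peer.

    Assumes traffic is split evenly across all live sites.
    """
    if num_datacenters < 2:
        raise ValueError("need at least two data centers to survive a failure")
    # Share of total traffic each site carries while all sites are healthy.
    normal_share = Fraction(1, num_datacenters)
    # Share each surviving site carries after one site is lost; this is
    # the capacity every site has to be provisioned for.
    share_after_failure = Fraction(1, num_datacenters - 1)
    return 1 - normal_share / share_after_failure


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "sites ->", spare_fraction(n), "of each site is headroom")
```

<p class="正文">Under these assumptions the headroom is 1/N of each site, so adding sites lowers the cost of redundancy, at the price of more places to keep data in sync.</p>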
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">The hard part of multi-site active-active is data consistency. The online search system does not need very strict consistency, so on the online side active-active deployment posed few difficulties.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="小标题" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; font-size: 24px; box-sizing: border-box !important; word-wrap: break-word !important;">Degradation</span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Degradation lowers system load by various means, dropping nice-to-have features in order to protect the basic quality of service. Basic degradation measures reduce load while users barely notice.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; max-width: 100%; word-wrap: break-word !important; font-size: 18px;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; word-wrap: break-word !important;">1. </span><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Extending the cache expiration time</span></span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">It is well known that adding a cache greatly reduces the load on backend modules, and that lengthening the cache expiration time raises the hit rate, which likewise cuts the traffic reaching the backend. But a fixed, long expiration time means backend updates do not show up promptly on the front end, while a fixed, short one yields a low hit rate and relieves the backend very little. The expiration time should therefore change dynamically with the pressure on the backend. How can that be achieved?</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">First, when a cache entry is written, the write timestamp is stored with it. Call this timestamp t1. When the entry is read back from the cache, t1 comes with it. Let the current time be t2 and t = t2 - t1. The system also maintains an expiration threshold and compares t against it: if t is less than the threshold, the entry is considered fresh enough, otherwise it is considered expired. The threshold is not fixed. It varies with backend pressure and the backend success rate.</span></p>
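<p class="正文">The freshness check described above can be sketched as follows. The article does not say how the threshold is adjusted, so <code>next_threshold</code> is only one possible policy: the 0.95 and 0.99 success-rate cut-offs, the doubling and halving, and the 60 s to 3600 s bounds are all assumptions chosen for illustration.</p>

```python
import time
from typing import Optional


def is_fresh(t1: float, threshold_s: float, now: Optional[float] = None) -> bool:
    """Return True if an entry written at t1 is younger than the threshold."""
    t2 = time.time() if now is None else now
    t = t2 - t1
    return t < threshold_s


def next_threshold(current_s: float, backend_success_rate: float,
                   min_s: float = 60.0, max_s: float = 3600.0) -> float:
    """Hypothetical policy: tolerate older entries while the backend struggles,
    and tighten the threshold again once it is healthy."""
    if backend_success_rate < 0.95:
        candidate = current_s * 2      # backend under pressure: cache longer
    elif backend_success_rate >= 0.99:
        candidate = current_s / 2      # backend healthy: favor fresh data
    else:
        candidate = current_s          # in between: leave it unchanged
    return max(min_s, min(max_s, candidate))
```

<p class="正文">Because the timestamp travels with the entry and the threshold lives in the reader, the threshold can be changed at any moment without rewriting anything already in the cache.</p>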
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">If the entry read from the cache has expired, the backend module is queried. If that request fails, the expired cached content can be returned instead, because for video search a slightly stale result is far better than no result.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><img data-src="http://mmbiz.qpic.cn/mmbiz_png/3xsFRgx4kHrCKwLgA5LgqgSH3ZE2U2x3RNbNuWNnLHrP61kEVOh6JXV0SveSBokvw8ibm1x7VMwa0ibeibqPjVicpg/0?wx_fmt=png" width="481.89426pt" data-type="png" data-ratio="0.47701149425287354" data-w="696" src="http://www.10tiao.com/img.do?url=http%3A//mmbiz.qpic.cn/mmbiz_png/3xsFRgx4kHrCKwLgA5LgqgSH3ZE2U2x3RNbNuWNnLHrP61kEVOh6JXV0SveSBokvw8ibm1x7VMwa0ibeibqPjVicpg/0%3Fwx_fmt%3Dpng" style="box-sizing: border-box !important; margin: 0px; padding: 0px; border: 0px; vertical-align: middle; height: auto !important; max-width: 100%; word-wrap: break-word !important; box-shadow: rgb(170, 170, 170) 0em 0em 1em 0px; border-radius: 20px;" alt="" /></p>
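<p class="正文">A minimal sketch of the read path, including the fallback to expired content when the backend call fails. The class <code>StaleTolerantCache</code>, its in-memory dict, and the <code>fetch</code> callback are illustrative stand-ins. A real deployment would presumably sit in front of an external cache service.</p>

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaleTolerantCache:
    """In-memory sketch: entries carry their write time and are never evicted,
    so an expired entry is still available as a fallback."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s                 # may be changed at runtime
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (t1, value)

    def get(self, key: str, fetch: Callable[[str], str],
            now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self._store.get(key)
        # Fresh hit: answer from the cache without touching the backend.
        if entry is not None and now - entry[0] < self.threshold_s:
            return entry[1]
        try:
            value = fetch(key)                         # miss or expired entry
        except Exception:
            # Backend failed: a slightly old result beats no result.
            return entry[1] if entry is not None else None
        self._store[key] = (now, value)
        return value
```

<p class="正文">Keeping expired entries around costs memory, so a production version would still need some upper bound on how old a fallback may be.</p>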
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; max-width: 100%; word-wrap: break-word !important; font-size: 18px;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; word-wrap: break-word !important;">2. </span><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Reducing computational complexity</span></span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Take school exams as an example. Most students can score 80 fairly easily, but getting to 100 means spending a great deal of effort on the last 20 points. Perhaps 20% of the effort gets you to 80, while the remaining 80% is needed for the final 20 points. Improving search quality works the same way.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">One way to reduce computation is to narrow the range of data covered by the index. Searching 10 billion docs and searching 1 million docs differ by orders of magnitude in time, so narrowing the index range reduces both the CPU and the time a search consumes. The range cannot be narrowed arbitrarily, though: key data must stay within the index, and signals such as a document's click count and its online date can be used to decide whether it counts as key data.</span></p>
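<p class="正文">As an illustration of picking key data by click count and online date, the sketch below keeps a document if it is either popular or recently published. The <code>Doc</code> fields, the 1000-click cut-off, and the 7-day window are assumed values for the example. The article does not give the actual criteria.</p>

```python
from dataclasses import dataclass
from typing import Iterable, List

DAY_S = 86400


@dataclass(frozen=True)
class Doc:
    doc_id: str
    clicks: int          # accumulated click count
    online_ts: float     # Unix time at which the doc went online


def select_key_docs(docs: Iterable[Doc], now: float,
                    min_clicks: int = 1000,
                    recent_window_s: float = 7 * DAY_S) -> List[Doc]:
    """Keep docs that are popular OR new, so the degraded index still covers
    what most queries are looking for."""
    return [
        d for d in docs
        if d.clicks >= min_clicks or now - d.online_ts <= recent_window_s
    ]
```

<p class="正文">Using "or" rather than "and" matters here: a newly released title has had no time to collect clicks, yet it is exactly what users are likely to search for.</p>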
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Another way to reduce computation is to skip some of the CPU-heavy steps. These steps correspond to the extra 80% of effort spent on the last 20 points mentioned earlier. For iQiyi search, this step is re-ranking: because a coarse ranking has already been applied, dropping re-ranking still yields basically satisfactory results while consuming far less CPU.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Both narrowing the data set and skipping re-ranking can be done manually or triggered automatically, for example by degrading automatically once the CPU consumed by the process reaches a certain threshold.</span></p>
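<p class="正文">One way to combine the manual switch with the automatic trigger is a small decision function like the one below. The level names, the order in which features are shed, and the 70% and 85% CPU thresholds are assumptions for this sketch, not values from the article.</p>

```python
from enum import IntEnum
from typing import Optional


class DegradeLevel(IntEnum):
    NONE = 0                          # full pipeline
    SKIP_RERANK = 1                   # coarse ranking only
    SKIP_RERANK_AND_SHRINK_INDEX = 2  # coarse ranking over key docs only


def choose_level(cpu_percent: float,
                 manual_override: Optional[DegradeLevel] = None,
                 rerank_off_at: float = 70.0,
                 shrink_index_at: float = 85.0) -> DegradeLevel:
    """Pick a degradation level from the process CPU usage.

    An operator-set override always wins over the automatic decision.
    """
    if manual_override is not None:
        return manual_override
    if cpu_percent >= shrink_index_at:
        return DegradeLevel.SKIP_RERANK_AND_SHRINK_INDEX
    if cpu_percent >= rerank_off_at:
        return DegradeLevel.SKIP_RERANK
    return DegradeLevel.NONE
```

<p class="正文">In practice the CPU reading would likely be smoothed over a window, and separate thresholds would be used for entering and leaving a level, so that the service does not flap between modes.</p>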
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; max-width: 100%; word-wrap: break-word !important; font-size: 18px;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; word-wrap: break-word !important;">3. </span><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: Helvetica; box-sizing: border-box !important; word-wrap: break-word !important;">LIFO</span></span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Many systems have a task queue: an IO thread pushes tasks onto it and worker threads pull tasks from it. By default the queue is FIFO. Now consider what happens when the workers cannot keep up. Tasks pile up in the queue, so every task waits a long time before it is processed. Because of that wait, by the time a task finishes, the client may already have treated the request as timed out, and the server has done useless work. In the extreme case the server does nothing but useless work.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><img data-src="http://mmbiz.qpic.cn/mmbiz_png/3xsFRgx4kHrCKwLgA5LgqgSH3ZE2U2x3a5CTAOPkVuGfDic8nISVibt32licl89PXkYPPwlushcRn9GfMVP54dvKQ/0?wx_fmt=png" width="481.89426pt" data-type="png" data-ratio="0.4568245125348189" data-w="718" src="http://www.10tiao.com/img.do?url=http%3A//mmbiz.qpic.cn/mmbiz_png/3xsFRgx4kHrCKwLgA5LgqgSH3ZE2U2x3a5CTAOPkVuGfDic8nISVibt32licl89PXkYPPwlushcRn9GfMVP54dvKQ/0%3Fwx_fmt%3Dpng" style="box-sizing: border-box !important; margin: 0px; padding: 0px; border: 0px; vertical-align: middle; height: auto !important; max-width: 100%; word-wrap: break-word !important; box-shadow: rgb(170, 170, 170) 0em 0em 1em 0px; border-radius: 8px;" alt="" /></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">So when degradation is needed, the queue can switch to taking tasks in LIFO order. This at least guarantees that new tasks get processed, while old tasks that have expired are discarded; you have to give something up to gain something. Because tasks are placed in the queue in time order, under LIFO once a task taken from the queue turns out to be expired, every task taken after it is older and therefore expired as well. The queue can then simply be cleared, instead of removing and discarding tasks one by one.</span><img data-src="http://mmbiz.qpic.cn/mmbiz_png/3xsFRgx4kHrCKwLgA5LgqgSH3ZE2U2x3QBzbLkBaKSsbCHsQVm8iaGvCVwiajKYicSbxhZrPxLaj6al7CkYBJSogA/0?wx_fmt=png" width="481.89426pt" data-type="png" data-ratio="0.45290858725761773" data-w="722" src="http://www.10tiao.com/img.do?url=http%3A//mmbiz.qpic.cn/mmbiz_png/3xsFRgx4kHrCKwLgA5LgqgSH3ZE2U2x3QBzbLkBaKSsbCHsQVm8iaGvCVwiajKYicSbxhZrPxLaj6al7CkYBJSogA/0%3Fwx_fmt%3Dpng" style="box-sizing: border-box !important; margin: 0px; padding: 0px; border: 0px; vertical-align: middle; height: auto !important; max-width: 100%; word-wrap: break-word !important; box-shadow: rgb(170, 170, 170) 0em 0em 1em 0px; border-radius: 12px;" alt="" /></p>
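<p>The LIFO-with-expiry idea can be sketched roughly as follows. This is a minimal illustration, not the article's actual implementation: the class name <code>ExpiringLifoQueue</code>, its methods, and the choice to pass timestamps in explicitly are all assumptions made for the example.</p>

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: a LIFO task queue that is cleared once its newest task has expired. */
final class ExpiringLifoQueue<T> {
    private static final class Entry<T> {
        final T task;
        final long enqueuedAtMillis;

        Entry(T task, long enqueuedAtMillis) {
            this.task = task;
            this.enqueuedAtMillis = enqueuedAtMillis;
        }
    }

    private final Deque<Entry<T>> stack = new ArrayDeque<>();
    private final long maxWaitMillis; // assumed expiry budget for time spent in the queue

    ExpiringLifoQueue(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Called by the IO thread; nowMillis is the enqueue timestamp. */
    synchronized void push(T task, long nowMillis) {
        stack.push(new Entry<>(task, nowMillis));
    }

    /** Called by workers. Returns the newest task, or null if the queue is empty or has expired. */
    synchronized T poll(long nowMillis) {
        Entry<T> newest = stack.poll();
        if (newest == null) {
            return null;
        }
        if (nowMillis - newest.enqueuedAtMillis > maxWaitMillis) {
            // Tasks were pushed in time order, so everything below is at least as old:
            // drop the rest in one step instead of popping them one by one.
            stack.clear();
            return null;
        }
        return newest.task;
    }

    synchronized int size() {
        return stack.size();
    }
}
```

<p>Passing the clock value in, rather than reading it inside the queue, keeps the sketch deterministic; a real server would more likely use a monotonic clock such as <code>System.nanoTime()</code>.</p>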
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; max-width: 100%; word-wrap: break-word !important; font-size: 18px;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; box-sizing: border-box !important; word-wrap: break-word !important;">4. </span><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Shortening the task queue expiry time</span></span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Why give queued tasks an expiry time? To avoid running a task that has already waited so long in the queue that it will have timed out by the time it finishes. For example, suppose a task taken from the queue has already waited 90ms, processing a task currently takes 20ms on average, and the client's timeout is 100ms. Then, with high probability, the client will already consider the task timed out by the time it completes (90ms + 20ms > 100ms), so there is no point spending more CPU on it. That is why the task queue needs an expiry time.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Why adjust the expiry time of queued tasks dynamically? As noted above, the expiry time depends on the current average processing time, and that average is not fixed; it changes with the state of the machine's resources. When the average processing time is short, tasks can be allowed to wait in the queue a little longer; when it is long, tasks can only be allowed to stay in the queue briefly.</span></p>
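<p>One way to express this relationship is to treat the allowed queue wait as the client timeout minus a moving average of processing time. The sketch below assumes an exponentially weighted average; the class name <code>AdaptiveExpiry</code>, its parameters, and the smoothing factor are illustrative assumptions, not details from the article.</p>

```java
/** Hypothetical sketch: derive the queue-wait budget from the client timeout and average processing time. */
final class AdaptiveExpiry {
    private final long clientTimeoutMillis;
    private final double smoothing;     // weight of the newest sample, assumed to be in (0, 1]
    private double avgProcessingMillis; // exponentially weighted moving average

    AdaptiveExpiry(long clientTimeoutMillis, double initialAvgMillis, double smoothing) {
        this.clientTimeoutMillis = clientTimeoutMillis;
        this.avgProcessingMillis = initialAvgMillis;
        this.smoothing = smoothing;
    }

    /** Workers report how long each finished task took. */
    synchronized void recordProcessingTime(long elapsedMillis) {
        avgProcessingMillis = smoothing * elapsedMillis + (1.0 - smoothing) * avgProcessingMillis;
    }

    /** Longest time a task may wait in the queue and still be expected to finish before the client times out. */
    synchronized long maxQueueWaitMillis() {
        return Math.max(0L, clientTimeoutMillis - Math.round(avgProcessingMillis));
    }

    /** True if a task that has already waited this long is not worth processing. */
    boolean shouldDrop(long waitedMillis) {
        return waitedMillis > maxQueueWaitMillis();
    }
}
```

<p>With the numbers from the example (100ms client timeout, 20ms average processing time), the budget is 80ms, so a task that has waited 90ms is dropped. If processing slows down and the average rises, the budget shrinks accordingly.</p>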
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="小标题" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; font-size: 24px; box-sizing: border-box !important; word-wrap: break-word !important;">Rate limiting</span></span></p>
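<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Compared with degradation, rate limiting means some users' requests are simply not processed, which hurts user experience to some degree: it is lossy. But if degradation alone cannot keep the system load within a safe range, rate limiting is needed. Rate limiting sacrifices a little to protect the whole: it rejects some requests to keep the system as a whole available, rather than trying to handle every request regardless of capacity, bringing the system down, and ending up unable to serve any request at all.</span></p>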
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Rate limiting comes in two forms: the server proactively drops some requests based on its own processing capacity, or the client reduces the number of requests it sends to the server based on the server's average response time and success rate.</span></p>
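<p>The client-side form could look something like the sketch below, in which the client lowers the share of requests it sends when the observed success rate falls or the average response time rises. The class <code>ClientThrottle</code>, its thresholds, and the use of cumulative rather than windowed statistics are all simplifying assumptions for illustration.</p>

```java
/** Hypothetical client-side throttle: send a smaller share of requests when the server looks unhealthy. */
final class ClientThrottle {
    private final double minSuccessRate;      // assumed success rate considered healthy
    private final double maxAvgLatencyMillis; // assumed acceptable average response time
    private final double floor;               // never go below this share, so recovery can be observed
    private long requests;
    private long successes;
    private long totalLatencyMillis;

    ClientThrottle(double minSuccessRate, double maxAvgLatencyMillis, double floor) {
        this.minSuccessRate = minSuccessRate;
        this.maxAvgLatencyMillis = maxAvgLatencyMillis;
        this.floor = floor;
    }

    /** Record the outcome of one request (cumulative counters; a real client would use a sliding window). */
    synchronized void record(boolean success, long latencyMillis) {
        requests++;
        if (success) {
            successes++;
        }
        totalLatencyMillis += latencyMillis;
    }

    /** Share of requests the client should still send, between floor and 1.0. */
    synchronized double sendRatio() {
        if (requests == 0) {
            return 1.0;
        }
        double successRate = (double) successes / requests;
        double avgLatency = (double) totalLatencyMillis / requests;
        double ratio = 1.0;
        if (successRate < minSuccessRate) {
            ratio = Math.min(ratio, successRate / minSuccessRate);
        }
        if (avgLatency > maxAvgLatencyMillis) {
            ratio = Math.min(ratio, maxAvgLatencyMillis / avgLatency);
        }
        return Math.max(floor, ratio);
    }
}
```

<p>A caller would send each request only when <code>ThreadLocalRandom.current().nextDouble() &lt; throttle.sendRatio()</code> and fail fast otherwise.</p>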
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Rate limiting can be applied along several dimensions. One is the program's own metrics: number of connections, total qps, and qps per category. A server usually serves several different client platforms, and a traffic surge from one platform must not affect traffic from the others. Another is machine-level metrics, such as network traffic, CPU, and memory.</span></p>
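<p>Per-category qps limiting on the server is often done with one token bucket per client platform, so that a surge from one platform only exhausts its own bucket. The following is a minimal sketch of that general technique; the article does not say which algorithm is used, and the class <code>PerClientRateLimiter</code> and its parameters are hypothetical.</p>

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one token bucket per client platform, all with the same rate and burst size. */
final class PerClientRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastRefillMillis;
    }

    private final double permitsPerSecond; // assumed steady-state qps allowed per platform
    private final double burst;            // assumed bucket capacity
    private final Map<String, Bucket> buckets = new HashMap<>();

    PerClientRateLimiter(double permitsPerSecond, double burst) {
        this.permitsPerSecond = permitsPerSecond;
        this.burst = burst;
    }

    /** Returns true if a request from clientType may proceed at time nowMillis, false if it should be rejected. */
    synchronized boolean tryAcquire(String clientType, long nowMillis) {
        Bucket bucket = buckets.get(clientType);
        if (bucket == null) {
            bucket = new Bucket();
            bucket.tokens = burst; // a new platform starts with a full bucket
            bucket.lastRefillMillis = nowMillis;
            buckets.put(clientType, bucket);
        }
        // Refill in proportion to the time elapsed since the last call, capped at the burst size.
        double refill = (nowMillis - bucket.lastRefillMillis) / 1000.0 * permitsPerSecond;
        bucket.tokens = Math.min(burst, bucket.tokens + refill);
        bucket.lastRefillMillis = nowMillis;
        if (bucket.tokens >= 1.0) {
            bucket.tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

<p>Machine-level limits (network traffic, CPU, memory) could be layered on top by rejecting requests before the bucket check whenever a sampled metric exceeds a threshold.</p>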
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="小标题" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span style="box-sizing: border-box !important; margin: 0px; padding: 0px; font-weight: 700; max-width: 100%; word-wrap: break-word !important;"><span style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; font-size: 24px; box-sizing: border-box !important; word-wrap: break-word !important;">Scaling out</span></span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">If system load still cannot be kept within a relatively safe range after degradation and rate limiting, the existing resources are no longer enough to meet users' enthusiastic demand, and it is time to scale out.</span></p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">Scaling can be done automatically with docker, with the decision driven by the average response time of requests, the request success rate, machine load, and so on. It can also be driven by the traffic over the past day, week, month, or year: forecast the upcoming traffic trend and scale out or in ahead of time.</span></p>
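<p>The two kinds of decision described here, reactive and predictive, can be sketched as plain functions. Everything in this example is an assumption for illustration: the class <code>ScalingPolicy</code>, the threshold values, and the headroom factor are hypothetical, and the code only computes a decision; it does not call docker.</p>

```java
/** Hypothetical autoscaling policy; every threshold below is an illustrative assumption. */
final class ScalingPolicy {
    enum Decision { SCALE_OUT, SCALE_IN, HOLD }

    static final double MAX_AVG_LATENCY_MILLIS = 200.0;
    static final double MIN_SUCCESS_RATE = 0.995;
    static final double HIGH_LOAD = 0.80;
    static final double LOW_LOAD = 0.30;

    /** Reactive rule: scale out if any signal is unhealthy; scale in only when all are comfortable. */
    static Decision decide(double avgLatencyMillis, double successRate, double load) {
        if (avgLatencyMillis > MAX_AVG_LATENCY_MILLIS
                || successRate < MIN_SUCCESS_RATE
                || load > HIGH_LOAD) {
            return Decision.SCALE_OUT;
        }
        if (load < LOW_LOAD && avgLatencyMillis < MAX_AVG_LATENCY_MILLIS / 2) {
            return Decision.SCALE_IN;
        }
        return Decision.HOLD;
    }

    /** Predictive rule: instances needed for a forecast peak qps, with spare headroom (0.5 means 50% extra). */
    static int instancesFor(double forecastPeakQps, double qpsPerInstance, double headroom) {
        return (int) Math.ceil(forecastPeakQps * (1.0 + headroom) / qpsPerInstance);
    }
}
```

<p>The forecast peak would come from historical traffic (past day, week, month, or year); how that forecast is produced is outside the scope of this sketch.</p>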
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"> </p>
<p class="正文" style="margin: 0px 1em 10px; padding: 0px; max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); font-family: "Helvetica Neue", Helvetica, "Hiragino Sans GB", "Microsoft YaHei", Arial, sans-serif; font-size: 16px; letter-spacing: 2px; box-sizing: border-box !important; word-wrap: break-word !important;"><span class="正文" style="margin: 0px; padding: 0px; max-width: 100%; font-family: "Arial Unicode MS"; box-sizing: border-box !important; word-wrap: break-word !important;">With the measures above (multi-site active-active deployment, degradation, rate limiting, and scaling out), the system can be kept free of major problems. But optimization is a long road, and only continuous optimization makes the system more and more stable. We hope our experience and lessons help, so that you no longer have to worry about all kinds of exceptions and sudden incidents. (End of article)</span></p><p><a href="http://www.10tiao.com/html/476/201705/2651818009/1.html">Read the full article…</a></p>
2017-05-17T14:00:00Z
Fix certificate problem in HTTPS - Real's Java How-to
IT瘾
https://itindex.net
tag:itindex.net,2016-12-11:default/1481443320000
2016-12-11T08:02:00Z
2016-12-11T08:02:00Z
<p><span style="color: rgba(0, 0, 0, 0.870588); font-family: Roboto, sans-serif; font-size: 15px;">Java has supported the HTTPS protocol since JDK 1.4 (AFAIK), so you have nothing special to do.</span></p>
<div class="howtocode" style="box-sizing: inherit; font-size: 10px; background: rgb(211, 211, 211); margin-left: 0.1cm; padding-left: 0.1cm; border-left: 2px solid rgb(78, 127, 217); overflow: auto; font-family: Roboto, sans-serif;">
<pre style="box-sizing: inherit; overflow: auto; font-family: "Lucida Console", "Courier New", Courier, monospace;">
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;

public class ConnectHttps {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://securewebsite.com");
        URLConnection con = url.openConnection();
        // Read the response one character at a time and echo it to stdout;
        // try-with-resources closes the reader when done.
        try (Reader reader = new InputStreamReader(con.getInputStream())) {
            int ch;
            while ((ch = reader.read()) != -1) {
                System.out.print((char) ch);
            }
        }
    }
}
</pre>
</div>
<p><span style="color: rgba(0, 0, 0, 0.870588); font-family: Roboto, sans-serif; font-size: 15px;">However, you can have a problem if the server certificate is self-signed, or signed by a test certification authority (CA) that is not among Java's trusted CAs on the client side. An exception like</span></p>
<div class="howtocode" style="box-sizing: inherit; font-size: 10px; background: rgb(211, 211, 211); margin-left: 0.1cm; padding-left: 0.1cm; border-left: 2px solid rgb(78, 127, 217); overflow: auto; font-family: Roboto, sans-serif;">
<pre style="box-sizing: inherit; overflow: auto; font-family: "Lucida Console", "Courier New", Courier, monospace;">
Exception in thread "main" javax.net.ssl.SSLHandshakeException:
sun.security.validator.ValidatorException:
PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException:
unable to find valid certification path to requested target
</pre>
</div>
<p><span style="color: rgba(0, 0, 0, 0.870588); font-family: Roboto, sans-serif; font-size: 15px;">is thrown. This is a common situation with a development server.</span></p>
<p style="box-sizing: inherit; color: rgba(0, 0, 0, 0.870588); font-family: Roboto, sans-serif; font-size: 15px;">The fix is to add the self-signed certificate to the trusted CAs on the client side. You do that by updating the cacerts file in your JRE_HOME/lib/security directory.</p>
<p style="box-sizing: inherit; color: rgba(0, 0, 0, 0.870588); font-family: Roboto, sans-serif; font-size: 15px;">Check this tutorial : <a href="http://www.java-samples.com/showtutorial.php?tutorialid=210" target="_new" style="box-sizing: inherit; background-color: transparent; color: rgb(100, 181, 246); text-decoration: none; -webkit-tap-highlight-color: transparent;">http://www.java-samples.com/showtutorial.php?tutorialid=210</a></p>
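<p>Trusting the certificate can also be done in code for a single application, without changing the JRE-wide trust store. The sketch below uses only standard JSSE classes; the class name <code>PinnedCertContext</code>, the method name, and the alias "dev-server" are made up for the example, and it assumes the server certificate is available as an X.509 file (PEM or DER).</p>

```java
import java.io.InputStream;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/** Hypothetical helper: build an SSLContext that trusts one specific certificate. */
final class PinnedCertContext {
    /** Reads an X.509 certificate from certStream and returns an SSLContext that trusts only it. */
    static SSLContext trusting(InputStream certStream) throws Exception {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        Certificate cert = factory.generateCertificate(certStream);

        // Start from an empty in-memory key store and add the certificate as a trusted entry.
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);
        trustStore.setCertificateEntry("dev-server", cert); // the alias is arbitrary

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }
}
```

<p>To use it, cast the connection to <code>HttpsURLConnection</code> and call <code>setSSLSocketFactory(context.getSocketFactory())</code> before reading from it. Hostname verification stays enabled, so the certificate must still match the host name.</p>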
<p style="box-sizing: inherit; color: rgba(0, 0, 0, 0.870588); font-family: Roboto, sans-serif; font-size: 15px;">Or you can override the check and accept an untrusted certificate (with the risk coming with it!).</p>
<div class="howtocode" style="box-sizing: inherit; font-size: 10px; background: rgb(211, 211, 211); margin-left: 0.1cm; padding-left: 0.1cm; border-left: 2px solid rgb(78, 127, 217); overflow: auto; font-family: Roboto, sans-serif;">
<pre style="box-sizing: inherit; overflow: auto; font-family: "Lucida Console", "Courier New", Courier, monospace;">
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.net.URLConnection;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSession;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class ConnectHttps {
    public static void main(String[] args) throws Exception {
        /*
         * fix for
         *   Exception in thread "main" javax.net.ssl.SSLHandshakeException:
         *     sun.security.validator.ValidatorException:
         *       PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException:
         *         unable to find valid certification path to requested target
         */
        // Trust manager that accepts every certificate chain.
        // Insecure: use only against development servers.
        TrustManager[] trustAllCerts = new TrustManager[] {
            new X509TrustManager() {
                public X509Certificate[] getAcceptedIssuers() {
                    // The interface contract asks for a non-null (possibly empty) array.
                    return new X509Certificate[0];
                }
                public void checkClientTrusted(X509Certificate[] certs, String authType) { }
                public void checkServerTrusted(X509Certificate[] certs, String authType) { }
            }
        };

        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, trustAllCerts, new SecureRandom());
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());

        // Create all-trusting host name verifier
        HostnameVerifier allHostsValid = new HostnameVerifier() {
            public boolean verify(String hostname, SSLSession session) {
                return true;
            }
        };
        // Install the all-trusting host verifier
        HttpsURLConnection.setDefaultHostnameVerifier(allHostsValid);
        /*
         * end of the fix
         */

        URL url = new URL("https://securewebsite.com");
        URLConnection con = url.openConnection();
        // Read the response one character at a time and echo it to stdout.
        try (Reader reader = new InputStreamReader(con.getInputStream())) {
            int ch;
            while ((ch = reader.read()) != -1) {
                System.out.print((char) ch);
            }
        }
    }
}
</pre>
<div> </div>
</div><p><a href="http://www.rgagnon.com/javadetails/java-fix-certificate-problem-in-HTTPS.html">Read the full article…</a></p>
2016-12-11T08:02:00Z
Crawling Baidu Netdisk user shares | Guodong
IT瘾
https://itindex.net
tag:itindex.net,2016-12-11:default/1481440380000
2016-12-11T07:13:00Z
2016-12-11T07:13:00Z
<ul style="list-style: none; color: rgb(85, 85, 85); font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; font-size: 14px; text-align: justify;">
<li style="list-style: circle;">Get a user's subscriptions (follow list):<br />
<a href="http://yun.baidu.com/pcloud/friend/getfollowlist?query_uk=%s&limit=24&start=%s&bdstoken=e6f1efec456b92778e70c55ba5d81c3d&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDA3NDg5NzU4NDAuMzQxNDQyMDY2MjA5NDA4NjU=" target="_blank" rel="external" style="background-color: transparent; color: rgb(85, 85, 85); text-decoration: none; border-bottom: 1px solid rgb(153, 153, 153); word-wrap: break-word;">http://yun.baidu.com/pcloud/friend/getfollowlist?query_uk=%s&limit=24&start=%s&bdstoken=e6f1efec456b92778e70c55ba5d81c3d&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDA3NDg5NzU4NDAuMzQxNDQyMDY2MjA5NDA4NjU=</a><br />
(query_uk, limit, and start are required parameters)</li>
<li style="list-style: circle;">Get a user's fans:<br />
<a href="http://pan.baidu.com/pcloud/friend/getfanslist?query_uk=%s&limit=24&start=%s&bdstoken=null&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDAzNjQwNzg3OTAuNzM1MzMxMDUyMDczMjYxNA==" target="_blank" rel="external" style="background-color: transparent; color: rgb(85, 85, 85); text-decoration: none; border-bottom: 1px solid rgb(153, 153, 153); word-wrap: break-word;">http://pan.baidu.com/pcloud/friend/getfanslist?query_uk=%s&limit=24&start=%s&bdstoken=null&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDAzNjQwNzg3OTAuNzM1MzMxMDUyMDczMjYxNA==</a><br />
(query_uk, limit, and start are required parameters)</li>
<li style="list-style: circle;">Get a user's shares:<br />
<a href="http://pan.baidu.com/pcloud/feed/getsharelist?t=1474202771918&category=0&auth_type=1&request_location=share_home&start=0&limit=60&query_uk=224534490&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDIwMjc3MTkxOTAuMzA1NjAzMzQ4MTM1MDc0MTc=&bdstoken=e6f1efec456b92778e70c55ba5d81c3d" target="_blank" rel="external" style="background-color: transparent; color: rgb(85, 85, 85); text-decoration: none; border-bottom: 1px solid rgb(153, 153, 153); word-wrap: break-word;">http://pan.baidu.com/pcloud/feed/getsharelist?t=1474202771918&category=0&auth_type=1&request_location=share_home&start=0&limit=60&query_uk=224534490&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDIwMjc3MTkxOTAuMzA1NjAzMzQ4MTM1MDc0MTc=&bdstoken=e6f1efec456b92778e70c55ba5d81c3d</a><br />
(query_uk, limit, start, and auth_type are required parameters)</li>
</ul>
<blockquote style="margin: 0px; padding: 0px 15px; color: rgb(102, 102, 102); border-left: 4px solid rgb(221, 221, 221); font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; font-size: 14px; text-align: justify;">
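<p style="margin: 0px 0px 25px;">All three requests above must carry the header <code style="font-family: "Input Mono", "PT Mono", Consolas, Monaco, Menlo, monospace; font-size: 1em; word-break: break-all; background: rgb(238, 238, 238); text-shadow: rgb(255, 255, 255) 0px 1px; padding: 0px 0.3em;">("Referer", "https://yun.baidu.com/share/home?uk=23432432#category/type=0")</code>; the uk value does not matter. Without it, no JSON data is returned.<br />
For the subscription and fan endpoints, sleeping 2s after each request allows unlimited requests, with no restriction on IP. The share endpoint is far nastier: one IP can make only 10 requests, and sleeping does not help.<br />
Since I don't have that many IPs, I looked into the mobile version of the share page. The mobile version allows 60 consecutive requests; after 60 requests, sleeping about 35s is enough to continue, with no need to switch IPs as on the PC version.<br />
However, the mobile version only returns the page's HTML source, which then has to be matched with regular expressions.</p>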
</blockquote>
<ul style="list-style: none; color: rgb(85, 85, 85); font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; font-size: 14px; text-align: justify;">
<li style="list-style: circle;">Mobile version of shares:<br />
<a href="http://pan.baidu.com/wap/share/home?uk=2889076181&start=%s&adapt=pc&fr=ftw" target="_blank" rel="external" style="background-color: transparent; color: rgb(85, 85, 85); text-decoration: none; border-bottom: 1px solid rgb(153, 153, 153); word-wrap: break-word;">http://pan.baidu.com/wap/share/home?uk=2889076181&start=%s&adapt=pc&fr=ftw</a> (uk: <strong>the unique identifier of each Baidu Netdisk user</strong>; start: a user may have hundreds or thousands of shares, so results must be paginated; start begins at 0 by default, and the mobile version returns 20 items per page by default)</li>
</ul>
<p style="margin: 0px 0px 25px; color: rgb(85, 85, 85); font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; font-size: 14px; text-align: justify;">Here is the source code:</p>
<p><figure class="highlight java" style="margin: 20px 0px; background: rgb(247, 247, 247); padding: 15px; overflow: auto; font-size: 13px; color: rgb(77, 77, 76); line-height: 1.6; font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; text-align: justify;">
<table style="border-collapse: collapse; border-spacing: 0px; margin: 0px; width: auto; border-width: initial; border-style: none; border-color: initial; font-size: 14px;">
<tbody>
<tr style="background-color: rgb(249, 249, 249);">
<td class="gutter" style="padding: 0px; vertical-align: middle; border-width: initial; border-style: none; border-color: initial;">
</td>
<td class="code" style="padding: 0px; vertical-align: middle; border-width: initial; border-style: none; border-color: initial;">
<pre style="overflow: auto; font-family: "Input Mono", "PT Mono", Consolas, Monaco, Menlo, monospace; font-size: 13px; background: rgb(247, 247, 247); margin-top: 0px; margin-bottom: 0px; padding: 1px; color: rgb(77, 77, 76); line-height: 1.6; border: none;"><span class="line" style="height: 20px;"><span class="keyword" style="color: rgb(137, 89, 168);">private</span> Logger log = LoggerFactory.getLogger(FollowStartIndex.class);</span><br /><span class="line" style="height: 20px;"><span class="function" style="color: rgb(66, 113, 174);"><span class="keyword" style="color: rgb(137, 89, 168);">public</span> <span class="keyword" style="color: rgb(137, 89, 168);">void</span> <span class="title" style="color: rgb(62, 153, 159);">startIndex</span><span class="params" style="color: rgb(245, 135, 31);">()</span> </span>{</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// loop forever</span></span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">while</span> (<span class="keyword" style="color: rgb(137, 89, 168);">true</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// fetch an uncrawled uk from the database; to seed the table, start from a user with many followers and store their followers' uks</span></span><br /><span class="line" style="height: 20px;"> Avaiuk avaiuk = Avaiuk.dao.findFirst(<span class="string" style="color: rgb(113, 140, 0);">"select * from avaiuk where flag=0 limit 1"</span>);</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// update the database to mark this uk as already crawled</span></span><br /><span class="line" style="height: 20px;"> avaiuk.set(<span class="string" style="color: rgb(113, 140, 0);">"flag"</span>, <span class="number" style="color: rgb(245, 135, 31);">1</span>).update();</span><br /><span class="line" style="height: 20px;"> getFllow(avaiuk.getLong(<span class="string" style="color: rgb(113, 140, 
0);">"uk"</span>), <span class="number" style="color: rgb(245, 135, 31);">0</span>, <span class="keyword" style="color: rgb(137, 89, 168);">true</span>);</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">static</span> String url = <span class="string" style="color: rgb(113, 140, 0);">"http://yun.baidu.com/pcloud/friend/getfollowlist?query_uk=%s&limit=24&start=%s&bdstoken=e6f1efec456b92778e70c55ba5d81c3d&channel=chunlei&clienttype=0&web=1&logid=MTQ3NDA3NDg5NzU4NDAuMzQxNDQyMDY2MjA5NDA4NjU="</span>;</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">static</span> Map map = <span class="keyword" style="color: rgb(137, 89, 168);">new</span> HashMap();</span><br /><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">static</span> {</span><br /><span class="line" style="height: 20px;"> map.put(<span class="string" style="color: rgb(113, 140, 0);">"User-Agent"</span>, <span class="string" style="color: rgb(113, 140, 0);">"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"</span>);</span><br /><span class="line" style="height: 20px;"> map.put(<span class="string" style="color: rgb(113, 140, 0);">"X-Requested-With"</span>, <span class="string" style="color: rgb(113, 140, 0);">"XMLHttpRequest"</span>);</span><br /><span class="line" style="height: 20px;"> map.put(<span class="string" style="color: rgb(113, 140, 0);">"Accept"</span>, <span class="string" style="color: rgb(113, 140, 0);">"application/json, text/javascript, */*; q=0.01"</span>);</span><br /><span class="line" style="height: 20px;"> map.put(<span class="string" style="color: rgb(113, 140, 0);">"Referer"</span>, <span class="string" style="color: rgb(113, 140, 
0);">"https://yun.baidu.com/share/home?uk=325913312#category/type=0"</span>);</span><br /><span class="line" style="height: 20px;"> map.put(<span class="string" style="color: rgb(113, 140, 0);">"Accept-Language"</span>, <span class="string" style="color: rgb(113, 140, 0);">"zh-CN"</span>);</span><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// fetch the users that this uk follows</span></span><br /><span class="line" style="height: 20px;"> <span class="function" style="color: rgb(66, 113, 174);"><span class="keyword" style="color: rgb(137, 89, 168);">public</span> <span class="keyword" style="color: rgb(137, 89, 168);">void</span> <span class="title" style="color: rgb(62, 153, 159);">getFllow</span><span class="params" style="color: rgb(245, 135, 31);">(<span class="keyword" style="color: rgb(137, 89, 168);">long</span> uk, <span class="keyword" style="color: rgb(137, 89, 168);">int</span> start, <span class="keyword" style="color: rgb(137, 89, 168);">boolean</span> index)</span> </span>{</span><br /><span class="line" style="height: 20px;"> log.info(<span class="string" style="color: rgb(113, 140, 0);">"entering getFllow, uk:{}, start:{}"</span>, uk, start);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">boolean</span> exitUK = <span class="keyword" style="color: rgb(137, 89, 168);">false</span>;</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">try</span> {</span><br /><span class="line" style="height: 20px;"> exitUK = Redis.use().exists(uk);</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">catch</span> (Exception e) {</span><br /><span class="line" style="height: 20px;"> exitUK = <span class="keyword" style="color: rgb(137, 89, 168);">true</span>;</span><br /><span class="line" style="height: 20px;"> 
}</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (!exitUK) {</span><br /><span class="line" style="height: 20px;"> Redis.use().set(uk, <span class="string" style="color: rgb(113, 140, 0);">""</span>);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (index) {</span><br /><span class="line" style="height: 20px;"> indexResource(uk);</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> recFollow(uk,start,<span class="keyword" style="color: rgb(137, 89, 168);">true</span>);</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">else</span> {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (start > <span class="number" style="color: rgb(245, 135, 31);">0</span>) {<span class="comment" style="color: rgb(142, 144, 140);">// later page of the follow list</span></span><br /><span class="line" style="height: 20px;"> recFollow(uk,start,<span class="keyword" style="color: rgb(137, 89, 168);">false</span>);</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">else</span> {</span><br /><span class="line" style="height: 20px;"> log.warn(<span class="string" style="color: rgb(113, 140, 0);">"uk is index:{}"</span>, uk);</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> </span><br /><span class="line" style="height: 20px;"> </span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="function" style="color: rgb(66, 113, 174);"><span class="keyword" style="color: rgb(137, 89, 168);">public</span> <span class="keyword" style="color: rgb(137, 89, 
168);">void</span> <span class="title" style="color: rgb(62, 153, 159);">recFollow</span><span class="params" style="color: rgb(245, 135, 31);">(<span class="keyword" style="color: rgb(137, 89, 168);">long</span> uk,<span class="keyword" style="color: rgb(137, 89, 168);">int</span> start,<span class="keyword" style="color: rgb(137, 89, 168);">boolean</span> goPage)</span></span>{</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">try</span> {</span><br /><span class="line" style="height: 20px;"> Thread.sleep(<span class="number" style="color: rgb(245, 135, 31);">4000</span>);</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">catch</span> (InterruptedException e) {</span><br /><span class="line" style="height: 20px;"> e.printStackTrace();</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> String real_url = String.format(url, uk, start);</span><br /><span class="line" style="height: 20px;"> ResponseBody body = OkhttpUtil.syncGet(real_url, Headers.of(map));</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (body != <span class="keyword" style="color: rgb(137, 89, 168);">null</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">try</span> {</span><br /><span class="line" style="height: 20px;"> Follow follow = JSON.parseObject(body.string(), Follow.class);</span><br /><span class="line" style="height: 20px;"> </span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (follow.getErrno() == <span class="number" style="color: rgb(245, 135, 31);">0</span>) {</span><br /><span class="line" style="height: 20px;"> List<Follow.FollowListBean> followListBeen = follow.getFollow_list();</span><br /><span 
class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (followListBeen != <span class="keyword" style="color: rgb(137, 89, 168);">null</span> && followListBeen.size() > <span class="number" style="color: rgb(245, 135, 31);">0</span>) {</span><br /><span class="line" style="height: 20px;"> log.info(<span class="string" style="color: rgb(113, 140, 0);">"not empty:{}"</span>, follow);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">for</span> (Follow.FollowListBean bean : followListBeen) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> follow_count = bean.getFollow_count();</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> shareCount=bean.getPubshare_count();</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (follow_count > <span class="number" style="color: rgb(245, 135, 31);">0</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (shareCount > <span class="number" style="color: rgb(245, 135, 31);">0</span>) {</span><br /><span class="line" style="height: 20px;"> getFllow(bean.getFollow_uk(), <span class="number" style="color: rgb(245, 135, 31);">0</span>, <span class="keyword" style="color: rgb(137, 89, 168);">true</span>);</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">else</span> {</span><br /><span class="line" style="height: 20px;"> getFllow(bean.getFollow_uk(), <span class="number" style="color: rgb(245, 135, 31);">0</span>, <span class="keyword" style="color: rgb(137, 89, 168);">false</span>);</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> 
}</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span>(goPage){</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> total_count = follow.getTotal_count();</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">//log.warn("page count:{}",total_count);</span></span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// pagination</span></span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> total_page = (total_count - <span class="number" style="color: rgb(245, 135, 31);">1</span>) / <span class="number" style="color: rgb(245, 135, 31);">24</span> + <span class="number" style="color: rgb(245, 135, 31);">1</span>;</span><br /><span class="line" style="height: 20px;"> </span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">for</span> (<span class="keyword" style="color: rgb(137, 89, 168);">int</span> i = <span class="number" style="color: rgb(245, 135, 31);">1</span>; i < total_page; i++) {</span><br /><span class="line" style="height: 20px;"> getFllow(uk, i * <span class="number" style="color: rgb(245, 135, 31);">24</span>,<span class="keyword" style="color: rgb(137, 89, 168);">false</span>);</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> </span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">else</span> {</span><br /><span class="line" style="height: 20px;"> log.info(<span class="string" style="color: rgb(113, 140, 0);">"empty:{}"</span>, follow);</span><br /><span class="line" style="height: 
20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> </span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">catch</span> (IOException e) {</span><br /><span class="line" style="height: 20px;"> e.printStackTrace();</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">long</span> uinfoId = <span class="number" style="color: rgb(245, 135, 31);">0</span>;</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">long</span> nullStart = System.currentTimeMillis();</span><br /><br /><span class="line" style="height: 20px;"> <span class="function" style="color: rgb(66, 113, 174);"><span class="keyword" style="color: rgb(137, 89, 168);">public</span> <span class="keyword" style="color: rgb(137, 89, 168);">void</span> <span class="title" style="color: rgb(62, 153, 159);">indexResource</span><span class="params" style="color: rgb(245, 135, 31);">(<span class="keyword" style="color: rgb(137, 89, 168);">long</span> uk)</span> </span>{</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">while</span> (<span class="keyword" style="color: rgb(137, 89, 168);">true</span>) {</span><br /><span class="line" style="height: 20px;"> String url = <span class="string" style="color: rgb(113, 140, 0);">"http://pan.baidu.com/wap/share/home?uk=%s&start=%s&adapt=pc&fr=ftw"</span>;</span><br /><span class="line" style="height: 20px;"> String real_url = String.format(url, uk, <span class="number" style="color: rgb(245, 135, 31);">0</span>);</span><br /><br /><span class="line" style="height: 20px;"> YunData yunData = 
DataUtil.getData(real_url);</span><br /><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (yunData != <span class="keyword" style="color: rgb(137, 89, 168);">null</span>) {</span><br /><span class="line" style="height: 20px;"> log.info(<span class="string" style="color: rgb(113, 140, 0);">"{}"</span>, yunData.toString());</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> share_count = yunData.getUinfo().getPubshare_count();</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (share_count > <span class="number" style="color: rgb(245, 135, 31);">0</span>) {</span><br /><span class="line" style="height: 20px;"> Uinfo uinfo = <span class="keyword" style="color: rgb(137, 89, 168);">new</span> Uinfo();</span><br /><span class="line" style="height: 20px;"> uinfo.set(<span class="string" style="color: rgb(113, 140, 0);">"uname"</span>, yunData.getUinfo().getUname()).set(<span class="string" style="color: rgb(113, 140, 0);">"avatar_url"</span>, yunData.getUinfo().getAvatar_url()).set(<span class="string" style="color: rgb(113, 140, 0);">"uk"</span>, uk).set(<span class="string" style="color: rgb(113, 140, 0);">"incache"</span>, <span class="number" style="color: rgb(245, 135, 31);">1</span>).save();</span><br /><span class="line" style="height: 20px;"> uinfoId = uinfo.getLong(<span class="string" style="color: rgb(113, 140, 0);">"id"</span>);</span><br /><span class="line" style="height: 20px;"> List<Records> recordses = yunData.getFeedata().getRecords();</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">for</span> (Records record : recordses) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">new</span> ShareData().set(<span class="string" style="color: 
rgb(113, 140, 0);">"title"</span>, record.getTitle()).set(<span class="string" style="color: rgb(113, 140, 0);">"shareid"</span>, record.getShareid()).set(<span class="string" style="color: rgb(113, 140, 0);">"uinfo_id"</span>, uinfoId).save();</span><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> totalPage = (share_count - <span class="number" style="color: rgb(245, 135, 31);">1</span>) / <span class="number" style="color: rgb(245, 135, 31);">20</span> + <span class="number" style="color: rgb(245, 135, 31);">1</span>;</span><br /><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">int</span> start = <span class="number" style="color: rgb(245, 135, 31);">0</span>;</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (totalPage > <span class="number" style="color: rgb(245, 135, 31);">1</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">for</span> (<span class="keyword" style="color: rgb(137, 89, 168);">int</span> i = <span class="number" style="color: rgb(245, 135, 31);">1</span>; i < totalPage; i++) {</span><br /><span class="line" style="height: 20px;"> start = i * <span class="number" style="color: rgb(245, 135, 31);">20</span>;</span><br /><span class="line" style="height: 20px;"> real_url = String.format(url, uk, start);</span><br /><span class="line" style="height: 20px;"> yunData = DataUtil.getData(real_url);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (yunData != <span class="keyword" style="color: rgb(137, 89, 168);">null</span>) {</span><br /><span class="line" style="height: 20px;"> log.info(<span class="string" 
style="color: rgb(113, 140, 0);">"{}"</span>, yunData.toString());</span><br /><span class="line" style="height: 20px;"> List<Records> recordses = yunData.getFeedata().getRecords();</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">for</span> (Records record : recordses) {</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// store the user's shared records in the database</span></span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">new</span> ShareData().set(<span class="string" style="color: rgb(113, 140, 0);">"title"</span>, record.getTitle()).set(<span class="string" style="color: rgb(113, 140, 0);">"shareid"</span>, record.getShareid()).set(<span class="string" style="color: rgb(113, 140, 0);">"uinfo_id"</span>, uinfoId).save();</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">else</span> {</span><br /><span class="line" style="height: 20px;"> i--;</span><br /><span class="line" style="height: 20px;"> log.warn(<span class="string" style="color: rgb(113, 140, 0);">"uk:{},msg:{}"</span>, uk, yunData);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">long</span> temp = nullStart;</span><br /><span class="line" style="height: 20px;"> nullStart = System.currentTimeMillis();</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> ((nullStart - temp) < <span class="number" style="color: rgb(245, 135, 31);">1500</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">try</span> {</span><br /><span class="line" style="height: 20px;"> Thread.sleep(<span class="number" style="color: rgb(245, 135, 31);">60000</span>);</span><br 
/><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">catch</span> (InterruptedException e) {</span><br /><span class="line" style="height: 20px;"> e.printStackTrace();</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> }</span><br /><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">break</span>;</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">else</span> {</span><br /><span class="line" style="height: 20px;"> log.warn(<span class="string" style="color: rgb(113, 140, 0);">"uk:{},msg:{}"</span>, uk, yunData);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">long</span> temp = nullStart;</span><br /><span class="line" style="height: 20px;"> nullStart = System.currentTimeMillis();</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">// two null responses within 1500 ms probably mean Baidu is throttling us; sleeping for a while restores access</span></span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> ((nullStart - temp) < <span class="number" style="color: rgb(245, 135, 31);">1500</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">try</span> {</span><br /><span class="line" style="height: 20px;"> Thread.sleep(<span class="number" style="color: rgb(245, 135, 31);">60000</span>);</span><br /><span class="line" style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">catch</span> (InterruptedException e) {</span><br /><span class="line" style="height: 20px;"> 
e.printStackTrace();</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> }</span><br /><br /><br /><span class="line" style="height: 20px;"> }</span><br /></pre>
</td>
</tr>
</tbody>
</table>
</figure><figure class="highlight java" style="margin: 20px 0px; background: rgb(247, 247, 247); padding: 15px; overflow: auto; font-size: 13px; color: rgb(77, 77, 76); line-height: 1.6; font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; text-align: justify;">
<table style="border-collapse: collapse; border-spacing: 0px; margin: 0px; width: auto; border-width: initial; border-style: none; border-color: initial; font-size: 14px;">
<tbody>
<tr style="background-color: rgb(249, 249, 249);">
<td class="gutter" style="padding: 0px; vertical-align: middle; border-width: initial; border-style: none; border-color: initial;">
</td>
<td class="code" style="padding: 0px; vertical-align: middle; border-width: initial; border-style: none; border-color: initial;">
<pre style="overflow: auto; font-family: "Input Mono", "PT Mono", Consolas, Monaco, Menlo, monospace; font-size: 13px; background: rgb(247, 247, 247); margin-top: 0px; margin-bottom: 0px; padding: 1px; color: rgb(77, 77, 76); line-height: 1.6; border: none;"><span class="line" style="height: 20px;"><span class="keyword" style="color: rgb(137, 89, 168);">public</span> <span class="class"><span class="keyword" style="color: rgb(137, 89, 168);">class</span> <span class="title" style="color: rgb(62, 153, 159);">DataUtil</span> </span>{</span><br /><span class="line" style="height: 20px;"> <span class="function" style="color: rgb(66, 113, 174);"><span class="keyword" style="color: rgb(137, 89, 168);">public</span> <span class="keyword" style="color: rgb(137, 89, 168);">static</span> YunData <span class="title" style="color: rgb(62, 153, 159);">getData</span><span class="params" style="color: rgb(245, 135, 31);">(String url)</span> </span>{</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">//自己对okhttp的封装 </span></span><br /><span class="line" style="height: 20px;"> ResponseBody body = OkhttpUtil.syncGet(url);</span><br /><span class="line" style="height: 20px;"> String html = <span class="keyword" style="color: rgb(137, 89, 168);">null</span>;</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (body == <span class="keyword" style="color: rgb(137, 89, 168);">null</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">return</span> <span class="keyword" style="color: rgb(137, 89, 168);">null</span>;</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">try</span> {</span><br /><span class="line" style="height: 20px;"> html = body.string();</span><br /><span class="line" 
style="height: 20px;"> } <span class="keyword" style="color: rgb(137, 89, 168);">catch</span> (IOException e) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">return</span> <span class="keyword" style="color: rgb(137, 89, 168);">null</span>;</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> Pattern pattern = Pattern.compile(<span class="string" style="color: rgb(113, 140, 0);">"window.yunData = (.*})"</span>);</span><br /><span class="line" style="height: 20px;"> Matcher matcher = pattern.matcher(html);</span><br /><span class="line" style="height: 20px;"> String json = <span class="keyword" style="color: rgb(137, 89, 168);">null</span>;</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">while</span> (matcher.find()) {</span><br /><span class="line" style="height: 20px;"> json = matcher.group(<span class="number" style="color: rgb(245, 135, 31);">1</span>);</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">if</span> (json == <span class="keyword" style="color: rgb(137, 89, 168);">null</span>) {</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">return</span> <span class="keyword" style="color: rgb(137, 89, 168);">null</span>;</span><br /><span class="line" style="height: 20px;"> }</span><br /><span class="line" style="height: 20px;"> <span class="comment" style="color: rgb(142, 144, 140);">//fastjson</span></span><br /><span class="line" style="height: 20px;"> YunData yunData = JSON.parseObject(json, YunData.class);</span><br /><span class="line" style="height: 20px;"> <span class="keyword" style="color: rgb(137, 89, 168);">return</span> yunData;</span><br /><span class="line" style="height: 20px;"> }</span><br 
/><span class="line" style="height: 20px;">}</span><br /></pre>
</td>
</tr>
</tbody>
</table>
</figure></p>
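<p>To see the extraction step on its own, here is a minimal sketch that uses only the JDK. The class name YunDataExtractDemo, the helper extractJson, and the sample page text (including its field names) are made up for illustration and are not from the original project; only the regular expression follows getData, with the dot and closing brace escaped.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YunDataExtractDemo {
    // Same idea as the expression in getData: capture everything from the
    // assignment up to the last closing brace on that line.
    private static final Pattern YUN_DATA = Pattern.compile("window\\.yunData = (.*\\})");

    /** Returns the JSON text assigned to window.yunData, or null if none is found. */
    public static String extractJson(String html) {
        Matcher matcher = YUN_DATA.matcher(html);
        String json = null;
        while (matcher.find()) {
            json = matcher.group(1); // keep the last match, as getData does
        }
        return json;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; the field names are invented for this example.
        String page = "var a = 1;\nwindow.yunData = {\"uk\":123,\"items\":[{\"id\":1}]};\nvar b = 2;";
        System.out.println(extractJson(page));
        System.out.println(extractJson("no data here"));
    }
}
```

<p>Because the dot in the expression does not match line breaks, this sketch only works when the whole object sits on a single line of the page, which is also what the original expression assumes.</p>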
<p style="margin: 0px 0px 25px; color: rgb(85, 85, 85); font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; font-size: 14px; text-align: justify;">The YunData class can be written directly from the JSON the page returns, so its code is omitted here.</p>
<p style="margin: 0px 0px 25px; color: rgb(85, 85, 85); font-family: Lato, "PingFang SC", "Microsoft YaHei", sans-serif; font-size: 14px; text-align: justify;">Crawling this way is fast: three servers collected more than one million entries in a single day.<br />
<a href="https://github.com/gudegg/yunSpider" target="_blank" rel="external" style="background-color: transparent; color: rgb(85, 85, 85); text-decoration: none; border-bottom: 1px solid rgb(153, 153, 153); word-wrap: break-word;">Golang version</a></p><p><a href="http://zhangguodong.me/2016/09/18/%E7%88%AC%E5%8F%96%E7%99%BE%E5%BA%A6%E7%BD%91%E7%9B%98%E7%94%A8%E6%88%B7%E5%88%86%E4%BA%AB/">Read the full article…</a></p>
2016-12-11T07:13:00Z