Host.HowPick posted on 2012-3-23 00:42:39

AWK routing failure. It has been several hours, and it is probably still not resolved.

This post was last edited by Host.HowPick on 2012-3-23 00:44

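As soon as the problem appeared, I notified their technicians to check. Only after a two-hour wait did they send a notice explaining the failure.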


Host.HowPick posted on 2012-3-23 00:43:15

We have dropped what was supposed to be the new core at our newest datacenter and rolled back to an old router in the 800 s hope datacenter. Cisco techs are helping us diagnose the core. We're uncertain what's happening, but we suspect a faulty distributed forwarding card on our 6704 10GE blade.

The current setup should keep you online. We apologize for the outages, but this is an extreme case of apparent hardware failure.

Host.HowPick posted on 2012-3-23 00:43:47

Three hours later, they posted an announcement.

Last night our new fancy upgraded core in our newest datacenter experienced a very unexpected hardware failure.

Initially, technicians thought we might be dealing with damaged fiber between buildings, as the network was experiencing intermittent lag and packet loss despite no DDoS events. Some IP ranges were fine while others were almost completely inaccessible. As techs cycled through the fiber pairs, it became clear that this was not the case.

Additional staff were paged to try to diagnose the problem. After trying literally everything under the sun, we began migrating core switching back to the 800 s hope datacenter to resolve the packet loss some customers were still experiencing. All traffic is now switching out of the old core as we go over the problem with Cisco techs. At this time it appears that a Distributed Forwarding module for the 10GE blade connecting the new core to the old datacenter is the culprit.

We apologize for any issues this may have caused you or your customers, but we do not expect any more network outages while techs replace the faulty device. If we encounter a similar problem in the future, techs know to force the other DCs into direct routing mode rather than experiencing lag.