Tags: ibm, construction begins, world | Published: 2011-08-28 10:54 | Author: QJJ 疯癫二楞子

Source: IBM Builds Biggest Data Drive Ever - Technology Review

IBM Builds Biggest Data Drive Ever


The system could enable detailed simulations of real-world phenomena—or store 24 billion MP3s.


Thursday, August 25, 2011 By Tom Simonite


A data repository almost 10 times bigger than any made before is being built by researchers at IBM's Almaden, California, research lab. The 120 petabyte "drive"—that's 120 million gigabytes—is made up of 200,000 conventional hard disk drives working together. The giant data container is expected to store around one trillion files and should provide the space needed to allow more powerful simulations of complex systems, like those used to model weather and climate.


A 120 petabyte drive could hold 24 billion typical five-megabyte MP3 files or comfortably swallow 60 copies of the biggest backup of the Web, the 150 billion pages that make up the Internet Archive's WayBack Machine.
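The capacity figures quoted so far can be cross-checked with a few lines of arithmetic. The sketch below assumes decimal storage units (1 PB = 10^15 bytes), which the article does not state explicitly, and the per-drive figure ignores any space spent on redundant copies, so it is only a rough lower bound on the size of each drive.

```python
# Cross-check of the article's capacity figures, assuming decimal units.
PB = 10**15
GB = 10**9
MB = 10**6

total_bytes = 120 * PB

# "120 million gigabytes"
total_gb = total_bytes // GB

# "24 billion typical five-megabyte MP3 files"
mp3_count = total_bytes // (5 * MB)

# Average capacity per drive if 200,000 drives share the space evenly
# (ignores replication overhead, which the article does not quantify).
per_drive_gb = total_bytes // 200_000 // GB

# Size of one Wayback Machine backup implied by "60 copies" fitting in 120 PB.
wayback_copy_pb = total_bytes // 60 // PB

print(total_gb, mp3_count, per_drive_gb, wayback_copy_pb)
```

Under these assumptions the numbers are mutually consistent: the 200,000 drives would each contribute about 600 GB of usable space, and one copy of the Web backup would be roughly 2 PB.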


The data storage group at IBM Almaden is developing the record-breaking storage system for an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena. However, the new technologies developed to build such a large repository could enable similar systems for more conventional commercial computing, says Bruce Hillsberg, director of storage research at IBM and leader of the project.


"This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it," Hillsberg says. Just keeping track of the names, types, and other attributes of the files stored in the system will consume around two petabytes of its capacity.


Steve Conway, a vice president of research with the analyst firm IDC who specializes in high-performance computing (HPC), says IBM's repository is significantly bigger than previous storage systems. "A 120-petabyte storage array would easily be the largest I've encountered," he says. The largest arrays available today are about 15 petabytes in size. Supercomputing problems that could benefit from more data storage include weather forecasts, seismic processing in the petroleum industry, and molecular studies of genomes or proteins, says Conway.


IBM's engineers developed a series of new hardware and software techniques to enable such a large hike in data-storage capacity. Finding a way to efficiently combine the thousands of hard drives that the system is built from was one challenge. As in most data centers, the drives sit in horizontal drawers stacked inside tall racks. Yet IBM's researchers had to make those significantly wider than usual to fit more disks into a smaller area. The disks must be cooled with circulating water rather than standard fans.


The inevitable failures that occur regularly in such a large collection of disks present another major challenge, says Hillsberg. IBM uses the standard tactic of storing multiple copies of data on different disks, but it employs new refinements that allow a supercomputer to keep working at almost full speed even when a drive breaks down.


When a lone disk dies, the system pulls data from other drives and writes it to the disk's replacement slowly, so the supercomputer can continue working. If more failures occur among nearby drives, the rebuilding process speeds up to avoid the possibility that yet another failure occurs and wipes out some data permanently. Hillsberg says that the result is a system that should not lose any data for a million years without making any compromises on performance.
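The rebuild behavior described here amounts to a policy that trades rebuild speed against foreground performance. The article gives no thresholds or bandwidth figures, so the following is only a minimal sketch: the function names, the 10% and 90% bandwidth shares, and the drive size and throughput used in the example are all hypothetical, not IBM's actual values.

```python
def rebuild_share(failed_nearby: int,
                  slow_share: float = 0.1,
                  fast_share: float = 0.9) -> float:
    """Fraction of disk bandwidth devoted to rebuilding (hypothetical policy).

    One failure: rebuild slowly so the supercomputer keeps most of the bandwidth.
    More failures among nearby drives: rebuild quickly, because another loss
    could destroy data permanently.
    """
    if failed_nearby <= 0:
        return 0.0
    if failed_nearby == 1:
        return slow_share
    return fast_share


def hours_to_rebuild(drive_gb: float, bandwidth_mb_s: float, share: float) -> float:
    """Time to rewrite one drive's contents at the given bandwidth share."""
    if share <= 0:
        raise ValueError("no bandwidth allotted to rebuilding")
    seconds = drive_gb * 1000 / (bandwidth_mb_s * share)
    return seconds / 3600


# Illustrative numbers only: a 600 GB drive and 100 MB/s of available bandwidth.
slow = hours_to_rebuild(600, 100, rebuild_share(1))
fast = hours_to_rebuild(600, 100, rebuild_share(2))
print(f"single failure: {slow:.1f} h, multiple failures: {fast:.1f} h")
```

The point of the sketch is the shape of the trade-off: a slow rebuild leaves most bandwidth for the running job, while a fast one shortens the window in which a further failure could lose data.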


The new system also benefits from a file system known as GPFS that was developed at IBM Almaden to give supercomputers faster data access. It spreads individual files across multiple disks so that many parts of a file can be read or written at the same time. GPFS also enables a large system to keep track of its many files without laboriously scanning through every one. Last month a team from IBM used GPFS to index 10 billion files in 43 minutes, effortlessly breaking the previous record of one billion files scanned in three hours.
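Spreading a file across disks is generally called striping. The sketch below shows plain round-robin striping as an illustration of the idea; it is not GPFS's actual on-disk layout, and the function names and block size are assumptions made for the example.

```python
from typing import List


def stripe(data: bytes, n_disks: int, block_size: int) -> List[List[bytes]]:
    """Split data into fixed-size blocks and deal them out round-robin."""
    disks: List[List[bytes]] = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % n_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks: List[List[bytes]]) -> bytes:
    """Reassemble the original data by reading one block per disk in turn."""
    blocks = []
    longest = max((len(d) for d in disks), default=0)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


data = bytes(range(10))
disks = stripe(data, n_disks=3, block_size=2)
assert unstripe(disks) == data
```

Because consecutive blocks land on different disks, a reader can request several blocks at once and each disk serves only its own share, which is where the parallel speedup comes from.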


Software improvements like those being developed for GPFS and disk recovery are crucial to enabling such giant data drives, says Hillsberg, because in order to be practical, they must become not only bigger, but also faster. Hard disks are not becoming faster or more reliable in proportion to the demands for more storage, so software must make up the difference.


IDC's Conway agrees that faster access to larger data storage systems is becoming crucial to supercomputing—even though supercomputers are most often publicly compared on their processor speeds, as is the case with the global TOP500 list used to determine international bragging rights. Big drives are becoming important because simulations are getting larger and many problems are tackled using so-called iterative methods, where a simulation is run thousands of times and the results compared, says Conway. "Checkpointing," a technique in which a supercomputer saves snapshots of its work in case the job doesn't complete successfully, is also common. "These trends have produced a data explosion in the HPC community," says Conway.
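Checkpointing as Conway describes it can be illustrated with a toy loop that periodically writes its state to disk and resumes from the last snapshot after a crash. This is a minimal sketch under assumed names: `run_simulation`, the JSON state format, and the checkpoint interval are hypothetical, and real HPC codes write far larger snapshots, which is what drives the storage demand.

```python
import json
import os
import tempfile


def run_simulation(total_steps, checkpoint_path, every=100, fail_at=None):
    """Run a toy iterative job, saving a snapshot every `every` steps."""
    if os.path.exists(checkpoint_path):
        # Resume from the last snapshot instead of starting over.
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"step": 0, "value": 0.0}

    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated crash")
        state["value"] += 1.0  # stand-in for one step of real work
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temporary file, then rename, so a crash during
            # the write cannot corrupt the previous snapshot.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
    return state


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "checkpoint.json")
    try:
        run_simulation(1000, path, every=100, fail_at=250)
    except RuntimeError:
        pass  # the job died at step 250; the last snapshot is from step 200
    final_state = run_simulation(1000, path, every=100)
```

In this example only the 50 steps after the last snapshot are repeated on restart; the price is the storage and write bandwidth spent on every snapshot.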






