IBM Builds the World's Largest Data Repository

Tags: IBM, builds, world | Posted: 2011-08-28 10:54 | By: QJJ 疯癫二楞子
Source: http://www.yeeyan.org

Original author: Tom Simonite
Source: IBM Builds Biggest Data Drive Ever - Technology Review
Translator: QJJ

IBM Builds Biggest Data Drive Ever

The system could enable detailed simulations of real-world phenomena—or store 24 billion MP3s.

Thursday, August 25, 2011 By Tom Simonite

A data repository almost 10 times bigger than any made before is being built by researchers at IBM's Almaden, California, research lab. The 120 petabyte "drive"—that's 120 million gigabytes—is made up of 200,000 conventional hard disk drives working together. The giant data container is expected to store around one trillion files and should provide the space needed to allow more powerful simulations of complex systems, like those used to model weather and climate.

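The headline figures are easy to sanity-check. The short sketch below is illustrative only (it assumes decimal units, so 1 PB = 10^15 bytes, and a nominal 5 MB per MP3); it works out the capacity each of the 200,000 drives must contribute, the average file size if a trillion files filled the system evenly, and the MP3 count quoted in the next paragraph.

```python
# Back-of-the-envelope check of the quoted figures (decimal units assumed:
# 1 PB = 10**15 bytes, 1 GB = 10**9 bytes); illustrative, not IBM's specification.
TOTAL_BYTES = 120 * 10**15    # 120 petabytes
NUM_DRIVES = 200_000          # conventional hard disk drives
NUM_FILES = 10**12            # roughly one trillion files
MP3_BYTES = 5 * 10**6         # a "typical" five-megabyte MP3

print(f"Capacity per drive:  ~{TOTAL_BYTES / NUM_DRIVES / 10**9:.0f} GB")      # ~600 GB
print(f"Average file size:   ~{TOTAL_BYTES / NUM_FILES / 10**3:.0f} KB")       # ~120 KB
print(f"Five-MB MP3s held:   ~{TOTAL_BYTES / MP3_BYTES / 10**9:.0f} billion")  # ~24 billion
```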

A 120 petabyte drive could hold 24 billion typical five-megabyte MP3 files or comfortably swallow 60 copies of the biggest backup of the Web, the 150 billion pages that make up the Internet Archive's WayBack Machine.

The data storage group at IBM Almaden is developing the record-breaking storage system for an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena. However, the new technologies developed to build such a large repository could enable similar systems for more conventional commercial computing, says Bruce Hillsberg, director of storage research at IBM and leader of the project.

"This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it," Hillsberg says. Just keeping track of the names, types, and other attributes of the files stored in the system will consume around two petabytes of its capacity.

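Taken together with the trillion-file figure above, that two-petabyte bookkeeping overhead implies a per-file metadata budget on the order of a couple of kilobytes. A rough check (decimal units, illustrative only):

```python
# Rough metadata budget implied by the figures above (decimal units; illustrative).
METADATA_BYTES = 2 * 10**15   # ~2 PB reserved for names, types and other attributes
TOTAL_BYTES = 120 * 10**15
NUM_FILES = 10**12            # roughly one trillion files

print(f"~{METADATA_BYTES / NUM_FILES / 10**3:.0f} KB of metadata per file")  # ~2 KB
print(f"~{100 * METADATA_BYTES / TOTAL_BYTES:.1f}% of total capacity")       # ~1.7%
```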

Steve Conway, a vice president of research with the analyst firm IDC who specializes in high-performance computing (HPC), says IBM's repository is significantly bigger than previous storage systems. "A 120-petabyte storage array would easily be the largest I've encountered," he says. The largest arrays available today are about 15 petabytes in size. Supercomputing problems that could benefit from more data storage include weather forecasts, seismic processing in the petroleum industry, and molecular studies of genomes or proteins, says Conway.

IBM's engineers developed a series of new hardware and software techniques to enable such a large hike in data-storage capacity. Finding a way to efficiently combine the thousands of hard drives that the system is built from was one challenge. As in most data centers, the drives sit in horizontal drawers stacked inside tall racks. Yet IBM's researchers had to make those significantly wider than usual to fit more disks into a smaller area. The disks must be cooled with circulating water rather than standard fans.

The inevitable failures that occur regularly in such a large collection of disks present another major challenge, says Hillsberg. IBM uses the standard tactic of storing multiple copies of data on different disks, but it employs new refinements that allow a supercomputer to keep working at almost full speed even when a drive breaks down.

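The article only says that multiple copies of each piece of data live on different disks; the details of IBM's refinements are not given. A minimal sketch of that standard tactic, with hypothetical names, might look like this:

```python
import random

def place_replicas(block_id: int, disks: list[str], copies: int = 3) -> list[str]:
    """Choose `copies` distinct disks to hold redundant copies of one data block.

    Illustrative only: real placement policies also balance load, racks and
    failure domains, none of which the article describes.
    """
    if copies > len(disks):
        raise ValueError("not enough disks for the requested redundancy")
    rng = random.Random(block_id)      # deterministic per block, just for the demo
    return rng.sample(disks, copies)   # distinct disks, so one failure loses nothing

disks = [f"disk-{i:06d}" for i in range(200_000)]
print(place_replicas(block_id=42, disks=disks))
```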

When a lone disk dies, the system pulls data from other drives and writes it to the disk's replacement slowly, so the supercomputer can continue working. If more failures occur among nearby drives, the rebuilding process speeds up to avoid the possibility that yet another failure occurs and wipes out some data permanently. Hillsberg says that the result is a system that should not lose any data for a million years without making any compromises on performance.

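In other words, the rebuild rate adapts to how much redundancy is left. A toy throttle along those lines (the rates and policy here are invented for illustration, not IBM's actual scheme):

```python
def rebuild_rate_mb_s(replicas_remaining: int, slow: float = 50.0, fast: float = 500.0) -> float:
    """Pick how fast to copy data onto a replacement disk.

    Mirrors the behaviour described above: after a lone failure, rebuild slowly so
    the supercomputer keeps nearly full I/O bandwidth; once further failures leave
    only one surviving copy, rebuild as fast as possible to avoid permanent loss.
    """
    return slow if replicas_remaining >= 2 else fast

for remaining in (2, 1):   # copies of a block still readable after failures
    print(f"{remaining} surviving copies -> rebuild at {rebuild_rate_mb_s(remaining):.0f} MB/s")
```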

The new system also benefits from a file system known as GPFS that was developed at IBM Almaden to enable supercomputers faster data access. It spreads individual files across multiple disks so that many parts of a file can be read or written at the same time. GPFS also enables a large system to keep track of its many files without laboriously scanning through every one. Last month a team from IBM used GPFS to index 10 billion files in 43 minutes, effortlessly breaking the previous record of one billion files scanned in three hours.

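The core idea of striping is simple, even though GPFS's real on-disk layout and block size are not described here. A minimal sketch (hypothetical stripe size and disk count) of spreading a file round-robin across disks so its chunks can be read in parallel:

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE = 1 << 20  # 1 MiB stripe unit -- an assumption, not GPFS's actual block size

def stripe_layout(file_size: int, num_disks: int) -> list[tuple[int, int, int]]:
    """Map a file onto disks round-robin as (chunk_index, disk_index, file_offset)."""
    chunks = (file_size + STRIPE - 1) // STRIPE
    return [(i, i % num_disks, i * STRIPE) for i in range(chunks)]

def read_chunk(disk: int, offset: int) -> bytes:
    # Stand-in for a real disk read; because each chunk lives on a different
    # spindle, many of these can proceed at the same time.
    return b"\x00" * STRIPE

layout = stripe_layout(file_size=8 * STRIPE, num_disks=4)
with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = list(pool.map(lambda c: read_chunk(c[1], c[2]), layout))
print(f"read {len(chunks)} chunks spread over {len({d for _, d, _ in layout})} disks")
```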

Software improvements like those being developed for GPFS and disk recovery are crucial to enabling such giant data drives, says Hillsberg, because in order to be practical, they must become not only bigger, but also faster. Hard disks are not becoming faster or more reliable in proportion to the demands for more storage, so software must make up the difference.

IDC's Conway agrees that faster access to larger data storage systems is becoming crucial to supercomputing—even though supercomputers are most often publicly compared on their processor speeds, as is the case with the global TOP500 list used to determine international bragging rights. Big drives are becoming important because simulations are getting larger and many problems are tackled using so-called iterative methods, where a simulation is run thousands of times and the results compared, says Conway. "Checkpointing," a technique in which a supercomputer saves snapshots of its work in case the job doesn't complete successfully, is also common. "These trends have produced a data explosion in the HPC community," says Conway.

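Checkpointing itself is straightforward to illustrate. The toy loop below (file name and state are hypothetical) periodically snapshots its progress so a crashed job can pick up from the last saved step rather than starting over:

```python
import os
import pickle

CHECKPOINT = "simulation.ckpt"   # hypothetical path, for illustration only

def run_simulation(total_steps: int = 10_000, every: int = 1_000) -> float:
    """Toy iterative solver that snapshots (step, state) so a failed job can resume."""
    step, state = 0, 0.0
    if os.path.exists(CHECKPOINT):                 # resume from the last snapshot
        with open(CHECKPOINT, "rb") as f:
            step, state = pickle.load(f)
    while step < total_steps:
        state += 1.0 / (step + 1)                  # stand-in for one simulation step
        step += 1
        if step % every == 0:                      # periodic checkpoint
            with open(CHECKPOINT, "wb") as f:
                pickle.dump((step, state), f)
    return state

print("final state:", run_simulation())
```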
