Sector/Sphere: 2-4 times faster than Hadoop
Sector/Sphere
High Performance Distributed File System and Parallel Data Processing Engine
Sector/Sphere supports distributed data storage, distribution, and processing over large clusters of commodity computers, either within a single data center or across multiple data centers. Sector is a high performance, scalable, and secure distributed file system. Sphere is a high performance parallel data processing engine that processes Sector data files directly on the storage nodes through very simple programming interfaces.
Why Sector/Sphere?
High Performance. Sector and Sphere are highly optimized for data-intensive applications. Sphere supports massively parallel in-storage data processing, backed by Sector's unique application-aware data placement mechanism. In our benchmarks, Sphere consistently runs 2-4 times faster than Hadoop MapReduce (see benchmark).
WAN Support. Sector is one of the few file systems that can effectively support multiple data centers across wide area networks. Sector uses UDT for high speed data transfer, and its data placement strategy lets Sector work effectively as a content distribution network over the WAN.
Software Level Fault Tolerance. Sector does not require hardware RAID for reliability; instead, data is automatically replicated within Sector for high reliability and availability. Both Sector slaves and masters can be removed and added at run time. Sector also supports multiple active masters for high performance and availability.
Rule-based Data Management. For each file, users can control its replication factor, replication distance, and, when necessary, replication locations. These rules can be changed at run time.
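To make the idea of per-file replication rules concrete, here is a minimal Python sketch. It is not Sector's configuration format or API: the names `ReplicationRule` and `RuleTable`, the path-prefix matching, and the default values are all assumptions made for illustration. It only shows how a factor, a distance, and an optional list of locations could be attached to files and changed while the system is running.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReplicationRule:
    """Hypothetical rule; field names are illustrative, not Sector's."""
    factor: int = 2          # number of copies to keep
    distance: int = 0        # replication distance; its exact meaning is left to the storage layer
    locations: List[str] = field(default_factory=list)  # empty list means "no restriction"


class RuleTable:
    """Hypothetical in-memory rule table keyed by path prefix."""

    def __init__(self, default: ReplicationRule) -> None:
        self._default = default
        self._rules: Dict[str, ReplicationRule] = {}

    def set_rule(self, path_prefix: str, rule: ReplicationRule) -> None:
        # Rules can be added or overwritten at any time (run-time change).
        self._rules[path_prefix] = rule

    def lookup(self, path: str) -> ReplicationRule:
        # The longest matching prefix wins; otherwise fall back to the default.
        best = None
        for prefix in self._rules:
            if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
                best = prefix
        return self._rules[best] if best is not None else self._default


# Usage: files under /logs/ get three copies restricted to two (made-up) sites.
table = RuleTable(ReplicationRule(factor=2))
table.set_rule("/logs/", ReplicationRule(factor=3, distance=1, locations=["dc-a", "dc-b"]))
rule = table.lookup("/logs/2010/a.dat")
```

Matching on the longest prefix is one simple way to let a specific directory override a broader rule; a real system might instead match on wildcards or exact file names.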