Distribution-Aware Online Sampling for Efficient and Accurate Approximation on Sub-Datasets



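Event location: Room 1001, Computer Building, East Area, Main Campus

Event time: 2019-08-23 09:00:00

Speaker: Jun Wang, University of Central Florida, USA

Talk time: August 23 (Friday), 9:00-10:00

Talk location: Room 1001, Computer Building, East Area, Main Campus

Host: Associate Professor Li Weimin (李衛民)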

Abstract: In this talk, we aim to enable both efficient and accurate approximations on arbitrary sub-datasets of a large dataset. Because caching offline samples for every sub-dataset incurs prohibitive storage overhead, existing offline-sample-based systems provide high-accuracy results for only a limited number of sub-datasets, such as the popular ones. On the other hand, current online-sample-based approximation systems, which generate samples at runtime, do not take into account the uneven storage distribution of a sub-dataset. They work well when a sub-dataset is uniformly distributed but suffer from low sampling efficiency and poor estimation accuracy on unevenly distributed sub-datasets. To address this problem, we develop a distribution-aware method called Sapprox. Our idea is to collect the occurrences of a sub-dataset at each logical partition of a dataset (its storage distribution) in the distributed system, and to use this information to facilitate online sampling. There are three thrusts in Sapprox. First, we develop a probabilistic map to reduce the exponential number of recorded sub-datasets to a linear one. Second, we apply the theory of cluster sampling with unequal probability to implement a distribution-aware sampling method for efficient online sub-dataset sampling. Third, we quantitatively derive the optimal sampling unit size in a distributed file system by associating it with approximation costs and accuracy. We have implemented Sapprox in the Hadoop ecosystem as an example system and open-sourced it on GitHub. Our comprehensive experimental results show that Sapprox can achieve a speedup of up to a factor of 20 over precise execution.
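As a rough illustration of the second thrust, the sketch below shows textbook cluster sampling with unequal probability, using the Hansen-Hurwitz estimator. It is not Sapprox's implementation: the function name `hansen_hurwitz_total`, the toy partitions, and the assumption that exact per-partition occurrence counts are available are all hypothetical choices made for this example.

```python
import random


def hansen_hurwitz_total(partitions, occurrences, key, n_draws, rng):
    """Estimate the sum of values for records matching `key`.

    partitions:  list of partitions, each a list of (key, value) records
    occurrences: assumed-known count of matching records per partition
                 (the "storage distribution" of the sub-dataset)
    """
    total_occ = sum(occurrences)
    if total_occ == 0:
        return 0.0  # the sub-dataset does not occur anywhere

    # Selection probability of each partition is proportional to how many
    # sub-dataset records it holds, so dense partitions are drawn more often.
    probs = [count / total_occ for count in occurrences]

    # Draw partitions with replacement according to those probabilities.
    drawn = rng.choices(range(len(partitions)), weights=probs, k=n_draws)

    per_draw = []
    for i in drawn:
        # Scan only the sampled partition and total the matching values.
        cluster_total = sum(v for k, v in partitions[i] if k == key)
        # Dividing by the selection probability keeps the estimate unbiased.
        per_draw.append(cluster_total / probs[i])

    return sum(per_draw) / n_draws


# Toy data: sub-dataset "a" is unevenly spread over three partitions.
partitions = [
    [("a", 2.0), ("a", 2.0), ("b", 5.0)],
    [("b", 1.0), ("b", 3.0)],
    [("a", 2.0), ("b", 4.0), ("a", 2.0), ("a", 2.0)],
]
occurrences = [2, 0, 3]  # matching records per partition

estimate = hansen_hurwitz_total(partitions, occurrences, "a", 4, random.Random(0))
print(estimate)
```

In this toy case every matching record has the same value, so each draw recovers the true total of 10 up to floating-point rounding; with varying values the estimate fluctuates around the true total, and the partition that holds no matching records is never scanned.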

Speaker biography:

Prof. Jun Wang is the Director of the Computer Architecture and Storage Systems (CASS) Laboratory at the University of Central Florida, Orlando, FL, USA. He has authored over 120 publications in premier journals such as IEEE Transactions on Computers and IEEE Transactions on Parallel and Distributed Systems, and in leading HPC and systems conferences such as VLDB, HPDC, EuroSys, IPDPS, ICS, Middleware, and FAST. He has conducted extensive research in the areas of computer systems and high-performance computing. His specific research interests include massive storage and file systems in local, distributed, and parallel system environments. His group has secured multimillion-dollar federal research funding in the last five years. At present, his group is working on three US National Science Foundation projects, one DARPA project, and one NASA project. He has graduated 13 Ph.D. students, who were employed by major US IT corporations upon graduation. In 2019, he won the IEEE Transactions on Cloud Computing Editorial Excellence and Eminence (EEE) award. He has been serving on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and IEEE Transactions on Cloud Computing. He served as a general executive chair for IEEE DASC/DataCom/PIcom/CyberSciTech 2017 and has co-chaired technical programs for numerous computer systems conferences, including the 2018 IEEE International Conference on High Performance Computing and Communications (HPCC18).

Organizer: School of Computer Engineering and Science, Shanghai University

