Leading by Game-changing Cloud Innovations

Tony Shan

Top Stories by Tony Shan

NoHadoop is not only Hadoop. Why? According to the 2014 Big Data & Advanced Analytics Survey conducted by the market research firm Evans Data, only 16% of over 400 developers surveyed worldwide indicated that Hadoop batch processing was satisfactory in all use cases. 71% of developers also expressed a need for real-time complex event processing more than half the time in their applications, and 27% said they use it all the time. Hadoop has evolved from MapReduce and HDFS in the very beginning to a set of technologies, including Hive, HBase, Sqoop, Flume, Pig, Mahout, etc. Hadoop, however, was originally designed for batch processing. It is based on the map and reduce programming model, and using it for real-time transactions is an overstretch. Various efforts have been made to enhance Hadoop. For example, YARN was designed to decouple the resource management in the underlying... (more)


The adoption of Hadoop has been increasing. More and more organizations are using Hadoop for various solutions. Some companies are replacing their existing data stores with Hive and HBase. Some firms use Mahout for machine learning. Others are building new applications on the Hadoop platform from the ground up. The critical question now is where Hadoop can and should be used. Hadooplicability is the measure of Hadoop's applicability. It helps users find the areas where Hadoop can be most effectively leveraged. While Hadoopability, as defined before in another post of this blog,... (more)

Big Data Capability Model

A capability model is a structure that represents the core abilities and competencies of an entity (department, organization, person, system, or technology) to achieve its objectives, especially in relation to its overall mission and functions. The Big Data Capability Model (BDCM) is defined as the set of key functionalities for dealing with Big Data problems and challenges. It describes the major features, behaviors, practices and processes in an organization that can reliably and sustainably produce the required outcomes for Big Data demands. BDCM consists of the following elements: Coll... (more)


MapReduce is a programming model for processing massive amounts of data in parallel, and the model was first implemented by Google. MapReduce is typically used to run distributed jobs on clusters of computers. This model inherits the concepts of the map and reduce functions commonly used in functional programming, even though their purpose in the MapReduce framework is significantly different from their original forms. A well-known implementation is the open source Apache Hadoop. Amazon offers a cloud-based solution called Elastic MapReduce (EMR), which utilizes a hosted Hadoop fra... (more)
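
To make the map and reduce idea concrete, here is a minimal single-process sketch in Python of the classic word-count job. It only illustrates the programming model and is not how Hadoop or EMR implement it: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute each phase across a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence (no parallelism here)."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog the end"]))
```

Because each call to `map_phase` depends only on its own document and each call to `reduce_phase` depends only on its own key, both steps could in principle run on separate machines, which is the property that lets the model scale out.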

Technology Outlook for 2014

As 2013 draws to a close, we can look back in the rearview mirror at a year of significant growth in big data. Looking forward, what is the forecast for the upcoming year? Where are technologies headed in 2014? There are many interesting movements and activities on the technology frontier. It is exciting that more open source innovations are steadily changing the paradigm. The following items are predicted to be the Top 5 hot areas next year: Connection, Hybrid, Analytics, SDX, and Mobile (CHASM). Connection: The Internet of everything enables people, devices and e... (more)