James Kobielus: Big Data, Cognitive Computing and Future of Product

Editor’s Note: As IBM’s Big Data Evangelist, James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics Solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in Big Data, Hadoop, Enterprise Data Warehousing, Advanced Analytics, Business Intelligence, Data Management, and Next Best Action Technologies. He works with IBM’s product management and marketing teams in Big Data. He has spoken at such leading industry events as IBM Information On Demand, IBM Big Data Integration and Governance, Hadoop Summit, Strata, and Forrester Business Process Forum. He has published several business technology books and is a very popular provider of original commentary on blogs and many social media channels.

eTalk’s Niaz Uddin recently interviewed James Kobielus to gain insights into his ideas, research and work in the field of Big Data. The interview is given below.

Niaz: Dear James, thank you so much for joining us in the midst of your busy schedule. We are very thrilled and honored to have you at eTalks.

James: And I’m thrilled and honored that you asked me.

Niaz: You are a leading expert on Big Data, as well as on such enabling technologies as enterprise data warehousing, advanced analytics, Hadoop, cloud services, database management systems, business process management, business intelligence, and complex-event processing. At the beginning of our interview can you please tell us about Big Data? How does Big Data make sense of the new world?

James: Big Data refers to approaches for extracting deep value from advanced analytics and trustworthy data at all scales. At the heart of advanced analytics is data mining, which is all about using statistical analysis to find non-obvious patterns (segmentations, correlations, trends, propensities, etc.) within historical data sets.

Some might refer to advanced analytics as tools for “making sense” of this data in ways that are beyond the scope of traditional reporting and visualization. As we aggregate and mine a wider variety of data sources, we can find far more “sense”–also known as “insights”–that previously lay under the surface. Likewise, as we accumulate a larger volume of historical data from these sources and incorporate a wider variety of variables from them into our models, we can build more powerful predictive models of what might happen under various future circumstances. And if we can refresh this data rapidly with high-velocity high-quality feeds, while iterating and refining our models more rapidly, we can ensure that our insights reflect the latest, greatest data and analytics available.

That’s the power of Big Data: achieve more data-driven insights (aka “making sense”) by enabling our decision support tools to leverage the “3 Vs”: a growing Volume of stored data, higher Velocity of data feeds, and broader Variety of data sources.

Niaz: As you know, Big Data has already started to redefine search, media, computing, social media, products, services and so on. The availability of data is helping us analyze trends and do interesting things more accurately and efficiently than before. What are some of the most interesting uses of big data out there today?

James: Where do I start? There are interesting uses of Big Data in most industries and in most business functions.

I think cognitive computing applications of Big Data are among the most transformative tools in modern business.

Cognitive computing is a term that probably goes over the head of most of the general public. IBM defines it as the ability of automated systems to learn and interact naturally with people to extend what either man or machine could do on their own, thereby helping human experts drill through big data rapidly to make better decisions.

One way I like to describe cognitive computing is as the engine behind “conversational optimization.” In this context, the “cognition” that drives the “conversation” is powered by big data, advanced analytics, machine learning and agile systems of engagement. Rather than rely on programs that predetermine every answer or action needed to perform a function or set of tasks, cognitive computing leverages artificial intelligence and machine learning algorithms that sense, predict, infer and, if they drive machine-to-human dialogues, converse.

Cognitive computing performance improves over time as systems build knowledge and learn a domain’s language and terminology, its processes and its preferred methods of interacting. This is why it’s such a powerful conversation optimizer. The best conversations are deep in give and take, questioning and answering, tackling topics of keenest interest to the conversants. When one or more parties has deep knowledge and can retrieve it instantaneously within the stream of the moment, the conversation quickly blossoms into a more perfect melding of minds. That’s why it has been deployed into applications in healthcare, banking, education and retail that build domain expertise and require human-friendly interaction models.

IBM Watson is one of the most famous exemplars of the power of cognitive computing driving agile human-machine conversations.  In its famous “Jeopardy!” appearance, Watson illustrated how its Deep Question and Answer technology—which is cognitive computing to the core—can revolutionize the sort of highly patterned “conversation” characteristic of a TV quiz show. By having its Deep Q&A results rendered (for the sake of that broadcast) in a synthesized human voice, Watson demonstrated how it could pass (and surpass) any Turing test that tried to tell whether it was a computer rather than, say, Ken Jennings. After all, the Turing test is conversational at its very core.

What’s powering Watson’s Deep Q&A technology is an architecture that supports an intelligent system of engagement. Such an architecture is able to mimic real human conversation, in which the dialogue spans a broad, open domain of subject matter; uses natural human language; is able to process complex language with a high degree of accuracy, precision and nuance; and operates with speed-of-thought fluidity.

Where the “Jeopardy!” conversational test was concerned (and where the other participants were humans literally at the top of that game), Watson was super-optimized. In the real world of natural human conversation, the notion of “conversation optimization” might seem, at first glance, like a pointy-headed pipedream par excellence. However, you don’t have to be an academic sociologist to realize that society, cultures and situational contexts impose many expectations, constraints and other rules to which our conversations and actions must conform (or face disapproval, ostracism, or worse). Optimizing our conversations is critical to surviving and thriving in human society.

Wouldn’t it be great to have a Watson-like Deep Q&A adviser to help us understand the devastating faux pas to avoid and the right bon mot to drop into any conversation while we’re in the thick of it? That’s my personal dream and I’ll bet that before long, with mobile and social coming into everything, it will be quite feasible (no, this is not a product announcement—just the dream of one IBMer). But what excites me even more (and is definitely not a personal pipedream), is IBM Watson Engagement Advisor, which we unveiled earlier this year. It is a cognitive-computing assistant that revolutionizes what’s possible in multichannel B2C conversations. The  solution’s “Ask Watson” feature uses Deep Q&A to greet customers, conduct contextual conversations on diverse topics, and ensure that the overall engagement is rich with answers, guidance and assistance.

Cognitive/conversational computing is also applicable to “next best action,” which is one of today’s hottest new focus areas in intelligent systems. At its heart, next best action refers to an intelligent infrastructure that optimizes agile engagements across many customer-facing channels, including portal, call center, point of sales, e-mail and social. With cognitive-computing infrastructure the silent assistant, customers engage in a never-ending whirligig of conversations with humans and, increasingly, with automated bots, recommendation engines and other non-human components that, to varying degrees, mimic real-human conversation.

Niaz: So do you think machine learning is the right way to analyze Big Data?

James: Machine learning is an important approach for extracting fresh insights from unstructured data in an automated fashion, but it’s not the only approach. For example, machine learning doesn’t eliminate the need for data scientists to build segmentation, regression, propensity, and other models for data mining and predictive analytics.

Fundamentally, machine learning is a productivity tool for data scientists, helping them to get smarter, just as machine learning algorithms can’t get smarter without some ongoing training by data scientists. Machine learning allows data scientists to train a model on an example data set, and then leverage algorithms that automatically generalize and learn both from that example and from fresh feeds of data. To varying degrees, you’ll see the terms “unsupervised learning,” “deep learning,” “computational learning,” “cognitive computing,” “machine perception,” “pattern recognition,” and “artificial intelligence” used in this same general context.
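
Editor’s Note: The train-then-keep-learning pattern described above can be sketched in a few lines. This is a minimal illustration only, using a simple perceptron as a stand-in; the interview does not name any particular algorithm, and the function names (`predict`, `learn_one`, `train`) and the learning rate are assumptions made for this example.

```python
def predict(weights, bias, features):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def learn_one(weights, bias, features, label, rate=0.1):
    """Perceptron update: the model changes only when it misclassifies."""
    error = label - predict(weights, bias, features)
    new_weights = [w + rate * error * x for w, x in zip(weights, features)]
    return new_weights, bias + rate * error


def train(examples, n_features, epochs=10):
    """Fit on a labeled example set by making repeated passes over it."""
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for features, label in examples:
            weights, bias = learn_one(weights, bias, features, label)
    return weights, bias


# Step 1: train on a small, made-up example data set.
weights, bias = train([([2.0], 1), ([-2.0], 0)], n_features=1)

# Step 2: keep learning from a fresh, made-up record as it arrives.
weights, bias = learn_one(weights, bias, [0.2], 0)
```

The same `learn_one` function serves both the initial training pass and the later feed, which is the point of the sketch: the model is never finished, it simply keeps adjusting as labeled data arrives.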

Machine learning doesn’t mean that the resultant learning is always superior to what human analysts might have achieved through more manual knowledge-discovery techniques. But you don’t need to believe that machines can think better than or as well as humans to see the value of machine learning. We gladly offload many cognitive processes to automated systems where there just aren’t enough flesh-and-blood humans to exercise their highly evolved brains on various analytics tasks.

Niaz: What technologies are available today that really help in analyzing data? Can you please briefly tell us about Big Data technologies and their important uses?

James: Once again, it’s a matter of “where do I start?” The range of Big Data analytics technologies is wide and growing rapidly. We live in the golden age of database and analytics innovation. Their uses are everywhere: in every industry, every business function, and every business process, both back-office and customer-facing.

For starters, Big Data is much more than Hadoop. Another big data “H”—hybrid—is becoming dominant, and Hadoop is an important (but not all-encompassing) component of it. In the larger evolutionary perspective, big data is evolving into a hybridized paradigm under which Hadoop, massively parallel processing enterprise data warehouses, in-memory columnar, stream computing, NoSQL, document databases, and other approaches support extreme analytics in the cloud.

Hybrid architectures address the heterogeneous reality of big data environments and respond to the need to incorporate both established and new analytic database approaches into a common architecture. The fundamental principle of hybrid architectures is that each constituent big data platform is fit-for-purpose to the role for which it’s best suited. These big data deployment roles may include any or all of the following: data acquisition, collection, transformation, movement, cleansing, staging, sandboxing, modeling, governance, access, delivery, archiving, and interactive exploration. In any role, a fit-for-purpose big data platform often supports specific data sources, workloads, applications, and users.

Hybrid is the future of big data because users increasingly realize that no single type of analytic platform is always best for all requirements. Also, platform churn—plus the heterogeneity it usually produces—will make hybrid architectures more common in big data deployments.

Hybrid deployments are already widespread in many real-world big data deployments. The most typical are the three-tier—also called “hub-and-spoke”—architectures. These environments may have, for example, Hadoop (e.g., IBM InfoSphere BigInsights) in the data acquisition, collection, staging, preprocessing, and transformation layer; relational-based MPP EDWs (e.g., IBM PureData System for Analytics) in the hub/governance layer; and in-memory databases (e.g., IBM Cognos TM1) in the access and interaction layer.

The complexity of hybrid architectures depends on the range of sources, workloads, and applications you’re trying to support. In the back-end staging tier, you might need different preprocessing clusters for each of the disparate sources: structured, semi-structured, and unstructured.

In the hub tier, you may need disparate clusters configured with different underlying data platforms (RDBMS, stream computing, HDFS, HBase, Cassandra, NoSQL, and so on) and corresponding metadata, governance, and in-database execution components.

And in the front-end access tier, you might require various combinations of in-memory, columnar, OLAP, dimensionless, and other database technologies to deliver the requisite performance on diverse analytic applications, ranging from operational BI to advanced analytics and complex event processing.
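
Editor’s Note: One way to picture the three-tier arrangement described above is as a small lookup structure that records which tier, and which kind of platform, is fit for each role. The sketch below is a hypothetical illustration: the tier names, roles, and platform types are taken from the description above, while the `Tier` class and the `tier_for_role` helper are invented for this example and do not correspond to any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    roles: tuple
    platforms: tuple


# Tiers, roles, and platform types follow the interview's description;
# the data structure itself is only an illustration.
HYBRID_ARCHITECTURE = (
    Tier(
        name="staging",
        roles=("acquisition", "collection", "staging",
               "preprocessing", "transformation"),
        platforms=("Hadoop",),
    ),
    Tier(
        name="hub",
        roles=("governance",),
        platforms=("MPP EDW",),
    ),
    Tier(
        name="access",
        roles=("access", "interaction"),
        platforms=("in-memory database",),
    ),
)


def tier_for_role(role):
    """Return the first tier whose roles include the given role."""
    for tier in HYBRID_ARCHITECTURE:
        if role in tier.roles:
            return tier
    raise KeyError(f"no tier is assigned the role {role!r}")
```

Calling `tier_for_role("governance")` returns the hub tier. A more complex deployment would simply list more platforms per tier, for example separate preprocessing clusters for structured, semi-structured, and unstructured sources in the staging tier.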

Niaz: That’s really amazing. How do you connect these two dots: Big Data Analytics and Cognitive Computing? How does this connection make sense?

James: The relationship between Cognitive computing and Big Data is simple. Cognitive computing is an advanced analytic approach that helps humans drill through the unstructured data within Big Data repositories more rapidly in order to see correlations, patterns, and insights more rapidly.

Think of cognitive computing as a “speed-of-thought accelerator.” Speed of thought is something we like to imagine operates at a single high-velocity setting. But that’s just not the case. Some modes of cognition are painfully slow, such as pondering the bewildering panoply of investment options available under your company’s retirement plan. But some other modes are instantaneous, such as speaking your native language, recognizing an old friend, or sensing when your life may be in danger.

None of this is news to anybody who studies cognitive psychology or has followed advances in artificial intelligence, aka AI, over the past several decades. Different modes of cognition have different styles, speeds, and spheres of application.

When we speak of “cognitive computing,” we’re generally referring to the ability of automated systems to handle the conscious, critical, logical, attentive, reasoning mode of thought that humans engage in when they, say, play “Jeopardy!” or try to master some rigorous academic discipline. This is the “slow” cognition that Nobel-winning psychologist/economist Daniel Kahneman discussed in a recent IBM Colloquium speech.

As anybody who has ever watched an expert at work will attest, this “slow” thinking can move at lightning speed when the master is in his or her element. When a subject-domain specialist is expounding on their field of study, they often move rapidly from one brilliant thought to the next. It’s almost as if these thought-gems automatically flash into their mind without conscious effort.

This is the cognitive agility that Kahneman examined in his speech. He described the ability of humans to build skills, which involves mastering “System 2” cognition (slow, conscious, reasoning-driven) so that it becomes “System 1” (fast, unconscious, action-driven). Not just that, but an expert is able to switch between both modes of thought within the moment when it becomes necessary to rationally ponder some new circumstance that doesn’t match the automated mental template they’ve developed. Kahneman describes System 2 “slow thinking” as well-suited for probability-savvy correlation thinking, whereas System 1 “fast thinking” is geared to deterministic causal thinking.

Kahneman’s “System 2” cognition–slow, rule-centric, and attention-dependent–is well-suited for acceleration and automation on big data platforms such as IBM Watson. After all, a machine can process a huge knowledge corpus, myriad fixed rules, and complex statistical models far faster than any mortal. Just as important, a big-data platform doesn’t have the limited attention span of a human; consequently, it can handle many tasks concurrently without losing its train of thought.

Also, Kahneman’s “System 1” cognition–fast, unconscious, action-driven–is not necessarily something we need to hand to computers alone. We can accelerate it by facilitating data-driven interactive visualization by human beings, at any level of expertise. When a big-data platform drives a self-service business intelligence application such as IBM Cognos, it can help users to accelerate their own “System 1” thinking by enabling them to visualize meaningful patterns in a flash without having to build statistical models, do fancy programming, or indulge in any other “System 2” thought.

And finally, based on those two insights, it’s clear to me that cognitive computing is not simply limited to the Watsons and other big-data platforms of the world. Any well-architected big data, advanced analytics, or business intelligence platform is essentially a cognitive-computing platform. To the extent it uses machines to accelerate the slow “System 2” cognition and/or provides self-service visualization tools to help people speed up their wetware’s “System 1” thinking, it’s a cognitive-computing platform.

Now I will expand upon the official IBM definition of “cognitive computing” to put it in a larger frame of reference. As far as I’m concerned, the core criterion of cognitive computing is whether the system, however architected, has the net effect of speeding up any form of cognition, executing on hardware and/or wetware.

Niaz: How is Big Data Analytics changing the nature of building great products? What do you think about the future of products?

James: That’s a great question that I haven’t explored in much depth. My sense is that more “products” are in fact “services”–such as online media, entertainment, and gaming–that, as an integral capability, feed on the Big Data generated by their users. Companies tune the designs, interaction models, and user experiences of these productized services through Big Data analytics. To the extent that users respond or don’t respond to particular features of these services, that will be revealed in the data and will trigger continuous adjustments in product/service design. New features might be added on a probationary basis, to see how users respond, and just as quickly withdrawn or ramped up in importance.

This new product development/refinement loop is often referred to as “real-world experiments.” The process of continuous, iterative, incremental experimentation both generates and depends on a steady feed of Big Data. It also requires data scientists to play a key role in the product-refinement cycle, in partnership with traditional product designers and engineers.  Leading-edge organizations have begun to emphasize real-world experiments as a fundamental best practice within their data-science, next-best-action, and process-optimization initiatives.

Essentially, real-world experiments put the data-science “laboratory” at the heart of the big data economy.  Under this approach, fine-tuning of everything–business model, processes, products, and experiences–becomes a never-ending series of practical experiments. Data scientists evolve into an operational function, running their experiments–often known as “A/B tests”–24×7 with the full support and encouragement of senior business executives.

The beauty of real-world experiments is that you can continuously and surreptitiously test diverse product models inline to your running business. Your data scientists can compare results across differentially controlled scenarios in a systematic, scientific manner. They can use the results of these in-production experiments (such as improvements in response, acceptance, satisfaction, and defect rates on existing products/services) to determine which work best with various customers under various circumstances.
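
Editor’s Note: To make the idea of comparing results across controlled scenarios concrete, here is a minimal sketch of how the response rates of two variants in an A/B test might be compared. It uses a two-proportion z-test, which is one common choice and not something the interview specifies; the function name and the counts in the example call are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare response rates of variants A and B.

    Returns (z, p_value), where z is positive when B's rate is higher
    and p_value is the two-sided p-value under a normal approximation.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 200 of 5,000 users responded to A, 260 of 5,000 to B.
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
```

A small p-value suggests the difference between variants is unlikely to be chance alone. In a continuous experimentation program, a check like this would typically be one input among several, alongside the satisfaction and defect measures mentioned above.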

Niaz: What is a big data product? How can someone make beautiful stuff with data?

James: What is a Big Data product? It’s any product or service that helps people to extract deep value from advanced analytics and trustworthy data at all scales, but especially at the extreme scales of volume (petabytes and beyond), velocity (continuous, streaming, real-time, low-latency), and/or variety (structured, semi-structured, unstructured, streaming, etc.). That definition encompasses products that provide the underlying data storage, database management, algorithms, metadata, modeling, visualization, integration, governance, security, management, and other necessary features to address these use cases. If you track back to my earlier answer about “hybrid” architectures, you’ll see a discussion of some of the core technologies.

Making “beautiful stuff with data”? That suggests advanced visualization to call out the key insights in the data. The best data visualizations provide functional beauty: they make the process of sifting through data easier, more pleasant, and more productive for end users, business analysts, and data scientists.

Niaz: Can you please tell us about building a data-driven culture that fosters data-driven innovation to build the next big product?

James: A key element of any data-driven culture is establishing a data science center of excellence. Data scientists are the core developers in this new era of Big Data, advanced analytics, and cognitive computing.

Game-changing analytics applications don’t spring spontaneously from bare earth. You must plant the seeds through continuing investments in applied data science and, of course, in the big data analytics platforms and tools that bring it all to fruition. But you’ll be tilling infertile soil if you don’t invest in sustaining a data science center of excellence within your company. Applied data science is all about putting the people who drill the data in constant touch with those who understand the applications. In spite of the mythology surrounding geniuses who produce brilliance in splendid isolation, smart people really do need each other. Mutual stimulation and support are critical to the creative process, and science, in any form, is a restlessly creative exercise.

In establishing a center of excellence, you may go the formal or informal route. The formal approach is to institute an ongoing process for data-science collaboration, education, and information sharing. As such, the core function of your center of excellence might be to bridge heretofore siloed data-science disciplines that need to engage more effectively. The informal path is to encourage data scientists to engage with each other using whatever established collaboration tools, communities, and confabs your enterprise already has in place. This is the model under which centers of excellence coalesce organically from ongoing conversations.

Creeping polarization, like general apathy, will kill your data science center of excellence if you don’t watch out. Don’t let the center of excellence, formal or informal, degenerate into warring camps of analytics professionals trying to hardsell their pet approaches as the one true religion. Centers of excellence must serve as a bridge, not a barrier, for communication, collegiality, and productivity in applied data science.

Niaz: As you know, leaders and managers have always been challenged to get the right information to make good decisions. Now, with the digital revolution and technological advancement, they have opportunities to access huge amounts of data. How will this trend change management practice? What do you think about the future of decision making, strategy and running organizations?

James: Business agility is paramount in a turbulent world. Big Data is changing the way that management responds to, and gets ahead of, changes in their markets, competitive landscape, and operational conditions.

Increasingly, I prefer to think of big data in the broader context of business agility. What’s most important is that your data platform has the agility to operate cost-effectively at any scale, speed, and scope of business that your circumstances demand.

In terms of scale of business, organizations operate at every scale from breathtakingly global to intensely personal. You should be able to acquire a low-volume data platform and modularly scale it out to any storage, processing, memory and I/O capacity you may need in the future. Your platform should elastically scale up and down as requirements oscillate. Your end-to-end infrastructure should also be able to incorporate platforms of diverse scales—petabyte, terabyte, gigabyte, etc.—with those platforms specialized to particular functions and all of them interoperating in a common fabric.

Where speed is concerned, businesses often have to keep pace with stop-and-start rhythms that oscillate between lightning fast and painfully slow. You should be able to acquire a low-velocity data platform and modularly accelerate it through incorporation of faster software, faster processors, faster disks, faster cache and more DRAM as your need for speed grows. You should be able to integrate your data platform with a stream computing platform for true real-time ingest, processing and delivery. And your platform should also support concurrent processing of diverse latencies, from batch to streaming, within a common fabric.

And on the matter of scope, businesses manage almost every type of human need, interaction and institution. You should be able to acquire a low-variety data platform—perhaps a RDBMS dedicated to marketing—and be able to evolve it as needs emerge into a multifunctional system of record supporting all business functions. Your data platform should have the agility to enable speedy inclusion of a growing variety of data types from diverse sources. It should have the flexibility to handle structured and unstructured data, as well as events, images, video, audio and streaming media with equal agility. It should be able to process the full range of data management, analytics and content management workloads. It should serve the full scope of users, devices and downstream applications.

Agile Big Data platforms can serve as the common foundation for all of your data requirements. Because, after all, you shouldn’t have to go big, fast, or all-embracing in your data platforms until you’re good and ready.

Niaz: In your opinion, given the current available Big Data technologies, what is the most difficult challenge in filtering big data to find useful information?

James: The most difficult challenge is in figuring out which data to ignore, and which data is trustworthy enough to serve as a basis for downstream decision-support and advanced analytics.

Most important, don’t treat the “customer sentiment” that your social-media listening tools report as if it were gospel. Yes, you care deeply about how your customers regard your company, your products, and your quality of service. You may be listening to social media to track how your customers, collectively and individually, are voicing their feelings. But do you bother to save and scrutinize every last tweet, Facebook status update, and other social utterance from each of your customers? And if you are somehow storing and analyzing that data (which is highly unlikely), are you linking the relevant bits of stored sentiment data to each customer’s official record in your databases?

If you are, you may be the only organization on the face of the earth that makes the effort. Many organizations implement tight governance only on those official systems of record on which business operations critically depend, such as customers, finances, employees, products, and so forth. For those data domains, data management organizations that are optimally run have stewards with operational responsibility for data quality, master data management, and information lifecycle management.

However, for many big data sources that have emerged recently, such stewardship is neither standard practice nor should it be routine for many new subject-matter data domains. These new domains consist mainly of unstructured data, such as social, event, sensor, clickstream, and geospatial data, that you may be processing in your Hadoop clusters, stream-computing environments, and other big data platforms.

The key difference from system-of-record data is that many of the new domains are disposable to varying degrees and are not regarded as a single version of the truth about some real-world entity. Instead, data scientists and machine learning algorithms typically distill the unstructured feeds for patterns and subsequently discard the acquired source data, which quickly become too voluminous to retain cost-effectively anyway. Consequently, you probably won’t need to apply much, if any, governance and security to many of the recent sources.

Where social data is concerned, there are several reasons for going easy on data quality and governance. First of all, data quality requirements stem from the need for an officially sanctioned single version of the truth. But any individual social media message constituting the truth of how any specific customer or prospect feels about you is highly implausible. After all, people prevaricate, mislead, and exaggerate in every possible social context, and not surprisingly they convey the same equivocation in their tweets and other social media remarks. If you imagine that the social streams you’re filtering are rich founts of only honest sentiment, you’re unfortunately mistaken.

Second, social sentiment data rarely has the definitive, authoritative quality of an attribute—name, address, phone number—that you would include in or link to a customer record. In other words, few customers declare their feelings about brands and products in the form of tweets or Facebook updates that represent their semiofficial opinion on the topic. Even when people are bluntly voicing their opinions, the clarity of their statements is often hedged by the limitations of most natural human language. Every one of us, no matter how well educated, speaks in sentences that are full of ambiguity, vagueness, situational context, sarcasm, elliptical speech, and other linguistic complexities that may obscure the full truth of what we’re trying to say. Even highly powerful computational linguistic algorithms are challenged when wrestling these and other peculiarities down to crisp semantics.

Third, even if every tweet were the gospel truth about how a customer is feeling and all customers were amazingly articulate on all occasions, the quality of social sentiment usually emerges from the aggregate. In other words, the quality of social data lies in the usefulness of the correlations, trends, and other patterns you derive from it. Although individual data points can be of marginal value in isolation, they can be quite useful when pieced into a larger puzzle.

Consequently, there is little incremental business value from scrutinizing, retaining, and otherwise managing every single piece of social media data that you acquire. Typically, data scientists drill into it to distill key patterns, trends, and root causes, and you would probably purge most of it once it has served its core tactical purpose. This process generally takes a fair amount of mining, slicing, and dicing. Many social-listening tools, including the IBM® Cognos® Consumer Insight application, are geared to assessing and visualizing the trends, outliers, and other patterns in social sentiment. You don’t need to retain every single thing that your customers put on social media to extract the core intelligence that you seek, as in the following questions: Do they like us? How intensely? Is their positive sentiment improving over time? In fact, doing so might be regarded as encroaching on privacy, so purging most of that data once you’ve gleaned the broader patterns is advised.

Fourth, even outright customer lies propagated through social media can be valuable intelligence if we vet and analyze each effectively. After all, it’s useful knowing whether people’s words (“we love your product”) match their intentions (“we have absolutely no plans to ever buy your product”) as revealed through their eventual behavior, for example, buying your competitor’s product instead.

If we stay hip to this quirk of human nature, we can apply the appropriate predictive weights to behavioral models that rely heavily on verbal evidence, such as tweets, logs of interactions with call-center agents, and responses to satisfaction surveys. I like to think of these weights as a truthiness metric, courtesy of Stephen Colbert.

What we can learn from social sentiment data of dubious quality is the situational contexts in which some customer segments are likely to be telling the truth about their deep intentions. We can also identify the channels in which they prefer to reveal those truths. This process helps determine which sources of customer sentiment data to prioritize and which to ignore in various application contexts.
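One way to picture such a truthiness weight is as the share of past statements in a channel that matched what customers later did. The Python sketch below is only an illustration under that assumption; the channel names, the record layout, and both function names are hypothetical.

```python
from collections import defaultdict


def truthiness_by_channel(history):
    """Estimate a weight per channel from past behavior.

    history: list of (channel, stated_positive, bought) tuples of booleans.
    The weight is the fraction of records where the statement matched
    the eventual behavior.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for channel, stated_positive, bought in history:
        totals[channel] += 1
        matches[channel] += int(stated_positive == bought)
    return {channel: matches[channel] / totals[channel] for channel in totals}


def weighted_positive_share(statements, weights):
    """Share of positive statements, discounting low-truthiness channels.

    statements: list of (channel, stated_positive) tuples.
    Channels with no known weight contribute nothing.
    """
    total_weight = sum(weights.get(channel, 0.0) for channel, _ in statements)
    if total_weight == 0:
        return 0.0
    positive_weight = sum(
        weights.get(channel, 0.0) for channel, positive in statements if positive
    )
    return positive_weight / total_weight


# Made-up history: tweets matched behavior half the time, surveys always.
history = [
    ("tweet", True, False), ("tweet", True, True),
    ("survey", True, True), ("survey", False, False),
]
weights = truthiness_by_channel(history)

# One enthusiastic tweet and one negative survey response.
score = weighted_positive_share([("tweet", True), ("survey", False)], weights)
```

With these made-up numbers the enthusiastic tweet counts for half as much as the negative survey response, so the weighted positive share comes out at one third rather than one half.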

Last but not least, apply strong governance only to data that has a material impact on how you engage with customers, remembering that social data rarely meets that criterion. Customer records contain the key that determines how you target pitches to them, how you bill them, where you ship their purchases, and so forth. For these purposes, the accuracy, currency, and completeness of customers’ names, addresses, billing information, and other profile data are far more important than what they tweeted about the salesclerk in your Poughkeepsie branch last Tuesday. If you screw up the customer records, the adverse consequences for all concerned are far worse than if you misconstrue their sentiment about your new product as slightly positive, when in fact it’s deeply negative.

However, if you greatly misinterpret an aggregated pattern of customer sentiment, the business risks can be considerable. Customers’ aggregate social data helps you compile a comprehensive portrait of the behavioral tendencies and predispositions of various population segments. This compilation is essential market research that helps gauge whether many high-stakes business initiatives are likely to succeed. For example, you don’t want to invest in an expensive promotional campaign if your target demographic isn’t likely to back up their half-hearted statement that your new product is “interesting” by whipping out their wallets at the point of sale.

The extent to which you can speak about the quality of social sentiment data all comes down to relevance. Sentiment data is good only if it is relevant to some business initiative, such as marketing campaign planning or brand monitoring. It is also useful only if it gives you an acceptable picture of how customers are feeling and how they might behave under various future scenarios. Relevance means having sufficient customer sentiment intelligence, in spite of underlying data quality issues, to support whatever business challenge confronts you.

Niaz: How do you see data science evolving in the near future?

James: In the near future, many business analysts will enroll in data science training curricula to beef up their statistical analysis and modeling skills in order to stay relevant in this new age.

However, they will confront a formidable learning curve. To be an effective, well-rounded data scientist, you will need a degree, or something substantially like it, to prove you’re committed to this career. You will need to submit yourself to a structured curriculum to certify you’ve spent the time, money and midnight oil necessary for mastering this demanding discipline.

Sure, there are run-of-the-mill degrees in data-science-related fields, and then there are uppercase, boldface, bragging-rights “DEGREES.” To some extent, it matters whether you get that old data-science sheepskin from a traditional university vs. an online school vs. a vendor-sponsored learning program. And it matters whether you only logged a year in the classroom vs. sacrificed a considerable portion of your life reaching for the golden ring of a Ph.D. And it certainly matters whether you simply skimmed the surface of old-school data science vs. pursued a deep specialization in a leading-edge advanced analytic discipline.

But what matters most to modern business isn’t that every data scientist has a big honking doctorate. What matters most is that a substantial body of personnel has a common grounding in a core curriculum of skills, tools and approaches. Ideally, you want to build a team where diverse specialists with a shared foundation can collaborate productively.

Big data initiatives thrive if all data scientists have been trained and certified on a curriculum with the following foundation: paradigms and practices, algorithms and modeling, tools and platforms, and applications and outcomes.

Classroom instruction is important, but a data-science curriculum that is 100 percent devoted to reading books, taking tests and sitting through lectures is insufficient. Hands-on laboratory work is paramount for a truly well-rounded data scientist. Make sure that your data scientists acquire certifications and degrees that reflect them actually developing statistical models that use real data and address substantive business issues.

A business-oriented data-science curriculum should produce expert developers of statistical and predictive models. It should not degenerate into a program that produces analytics geeks with heads stuffed with theory but whose diplomas are only fit for hanging on the wall.

Niaz: We have already seen the huge implications and remarkable results of Big Data from tech giants. Do you think Big Data can also have a great role in solving social problems? Can we measure and connect all of our big and important social problems and design sustainable solutions with the help of Big Data?

James: Of course. Big Data is already being used worldwide to address the most pressing problems confronting humanity on this planet. In terms of “measuring and connecting all our big and important social problems and designing sustainable solutions,” that’s a matter for collective human ingenuity. Big Data is a tool, not a panacea.

Niaz: Can you please tell us about ‘Open Source Analytics’ for Big Data? What open source initiatives have IBM’s Big Data group and other groups (startups) undertaken or planned?

James: The principal open-source communities in the big data analytics industry are Apache Hadoop and R. IBM is an avid participant in both communities, and has incorporated these technologies into our solution portfolio.

Niaz: What are some of the concerns (privacy, security, regulation) that you think can dampen the promise of Big Data?

James: You’ve named three of them. Overall, businesses should embrace the concept of “privacy by design” – a systematic approach that takes privacy into account from the start – instead of trying to add protection after the fact. In addition, the sheer complexity of the technology and the learning curve of the technologies are a barrier to realizing their full promise. All of these factors introduce time, cost, and risk into the Big Data ROI equation.

Niaz: What are the new technologies you are most passionate about? What are going to be the next big things?

James: Where to start? I prefer that your readers follow my IBM Big Data Hub blog to see the latest things I’m passionate about.

Niaz: Last but not least, what is your advice for Big Data startups and for people who are working with Big Data?

James: Find your niche in the Big Data analytics industry ecosystem, go deep, and deliver innovation. It’s a big, growing, exciting industry. Brace yourself for constant change. Be prepared to learn (and unlearn) something new every day.

Niaz: Dear James, thank you very much for your invaluable time and for sharing your incredible ideas, insights, knowledge and experiences with us. We wish you the very best of luck in all of your upcoming endeavors.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. danah boyd on Future of Technology and Social Media

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

8. James Allworth on Disruptive Innovation

Irving Wladawsky-Berger: Evolution of Technology and Innovation

Editor’s Note: Dr. Irving Wladawsky-Berger retired from IBM on May 31, 2007 after 37 years with the company. As Chairman Emeritus, IBM Academy of Technology, he continues to participate in a number of IBM’s technical strategy and innovation initiatives. He is also Visiting Professor of Engineering Systems at MIT, where he is involved in multi-disciplinary research and teaching activities focused on how information technologies are helping transform business organizations and the institutions of society. You can read his full bio here.

eTalk’s Niaz Uddin recently interviewed Irving Wladawsky-Berger to gain insights about the evolution of technology and innovation. The interview is given below.

Niaz: Dear Irving, thank you so much for joining us. We are thrilled and honored to have you at eTalks.

Irving Wladawsky-Berger: Niaz, thank you for having me.

Niaz: You began your career at IBM as a researcher in 1970. You retired from IBM on May 31, 2007 as Vice President of Technical Strategy and Innovation. From the dawn of supercomputing to the rise of Linux and open source, the Internet, Cloud Computing, Disruptive Innovation, Big Data and Smarter Planet, you have been involved with it all. You worked for 37 years to bring sustainable technological innovations to IBM. Can you please give us a brief overview of the evolution of technology and innovation? What do you think about the technological trends that have been changing since you joined IBM?

Irving Wladawsky-Berger: Well, it has changed radically from the time I started in 1970 until now, let’s say, after 30 years. In 1970, there were no personal computers and, needless to say, there was no internet. Computers were expensive, and people used them in a time-sharing mode. Usually you needed a contract to be able to operate a computer, and it was relatively expensive at that time. So most of the innovation and research had to be done in a kind of big-science lab environment, whether at a university like MIT or an R&D lab at IBM. Now, all that began to change when personal computers emerged in the 1980s, and especially in the next decade, the 1990s, because personal computers became much more powerful and much less expensive. And then we had the internet. Remember, the internet only really opened up to the world in the mid-90s. All of a sudden, it was much easier for lots of people to have access to the proper technologies and to start doing all kinds of entrepreneurial innovation. Before that it was very expensive; with the internet, they were able to distribute their offerings online directly to their customers. Previously, they needed distribution channels, which cost a lot of money. That has changed even more in just the last few years because of the advent of cloud computing. People starting an entrepreneurial business don’t even need to buy computer equipment anymore. They have a laptop or a smartphone that they use to get access to the cloud. As a result, the cost of operating a business is getting lower. This is particularly important for emerging economies like India, Africa or Latin America, because they don’t have as much access to capital as we do here in the United States. So the availability of the internet, cloud computing, mobile devices and so on is going to have a huge impact on entrepreneurship, especially in emerging economies.

Niaz: So what has surprised you most about the rise and spread of the internet over the past 15 years?

Irving Wladawsky-Berger: Well, you know, when I started, before the mid-90s, I was very involved with the internet, but as part of supercomputing. Before then, the internet was primarily used in research labs and universities. It all started to change with the advent of the World Wide Web and the web browser. It made everything much more accessible. It was so much easier to use. Before browsers, it was primarily interfaces that engineers had to learn to use. It wasn’t really available to the majority of people. The internet was probably like other disruptive technologies: we knew it was exciting, we knew some good things could happen, but most of us couldn’t anticipate how transformative it would become. One example is the fact that it would so greatly transform the media industry: the music industry, newspapers, video streaming and so on. On the other side, there were things people were predicting about the internet in the near term, like ‘it will totally transform the economy; you don’t need revenue and cash anymore’. That was wrong, because if you are running a business you need revenue, cash and profit. Some of the predictions have been taking a lot longer than people thought in the early days, because you needed broadband and things like that. And then other changes happened faster than any of us anticipated. It is just an interesting experience to watch how unpredictable disruptive technologies are.

Niaz: Now what do you think about the future of the internet? What significant changes are going to occur in the near future?

Irving Wladawsky-Berger: First of all, I think broadband will keep advancing, and that is one of the most important changes. When I started using the internet in the mid-90s, it was 16kb over a dial-up modem. Then a few years later, it only went to 64kb over a dial-up modem, and then broadband came in. And it is getting better and better. Now in some countries, as you know, like South Korea, it is extremely fast. I think in the US we don’t have broadband that good yet, but it is good to see it continue to get better. Broadband wireless has come along, and that is very nice. I think mobile devices like smartphones have, in the last few years, become the most important way of accessing the internet. And it has been an absolute phenomenon. When the internet first showed up in the mid-90s, we were very worried, as the internet was growing, that you needed to have a PC, and in those days PCs were not that inexpensive. You needed an internet service provider, and that was not inexpensive either. So there was a strong digital divide, even within an advanced economy like the USA. I remember having a number of important meetings on the digital divide while I was working in Washington in those days. All that has disappeared; as you know, mobile devices are so inexpensive that just about everybody can afford one now. But not all mobile devices are smartphones capable of accessing the internet yet. I believe that within a few years, just about everybody in the world will be able to access information, resources and applications. That is going to be gigantic. Finally, the internet, broadband, cloud computing and disruptive innovations are going to bring the most important changes over the next few decades.

Niaz: As you know, Big Data has become a hot topic in the tech industry. What do you think about Big Data?

Irving Wladawsky-Berger: Big Data is very interesting. What it means is that we now have access to huge amounts of real-time data that can be analyzed and interpreted to give deep insight. I am now involved with a new initiative at New York University called the Center for Urban Science and Progress. A lot of the promise is to gather lots of information about transportation, energy use, health and other real-time information in the city, and to use it effectively to better manage the city and make it more efficient. So now we have access to large amounts of data. But to manage that data, to run experiments and to make sense of the data, you need a model. You need a hypothesis that you embed in a model. Then you test your model against your data to see whether your model is true or not. If your model is true, then the predictions you are making are correct. And if your model is not true, the predictions you are making are incorrect. For example, you can get lots of health care data. But to find the meaning and use that data efficiently, you have to have a good model. So in my mind big data is very important, but more important is what I call data science. Data science is the ability to write models that use the data, get insight from what the data is telling you, and then put it into practice. Data science is very new; even big data itself is very new. I think that it shows tremendous promise, but we now have to build the next layers of data science, and that will be done discipline by discipline.
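The idea of embedding a hypothesis in a model and testing it against data can be sketched in a few lines of Python. This is only an illustration: the observations are made up, the function names are hypothetical, and "true" is simplified here to "predicts better than always guessing the average".

```python
from statistics import mean


def mean_abs_error(predict, observations):
    """Average absolute gap between predicted and observed values."""
    return mean(abs(predict(x) - y) for x, y in observations)


def beats_baseline(predict, observations):
    """True if the model predicts better than always guessing the mean."""
    average = mean(y for _, y in observations)
    baseline_error = mean_abs_error(lambda _x: average, observations)
    return mean_abs_error(predict, observations) < baseline_error


# Made-up observations of (input, measured outcome).
observations = [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)]


def doubling_hypothesis(x):
    """Hypothesis: the outcome is twice the input."""
    return 2.0 * x


def constant_hypothesis(_x):
    """Hypothesis: the outcome is always 10, whatever the input."""
    return 10.0


good = beats_baseline(doubling_hypothesis, observations)
bad = beats_baseline(constant_hypothesis, observations)
```

On this data the doubling hypothesis passes the check and the constant one fails it, so only the first would be worth using for predictions. In practice the test would be run on data the model had not seen while it was being built.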

Niaz: Over the past twenty years you have been involved in a number of initiatives dealing with disruptive innovations. What do you think about disruptive innovation?

Irving Wladawsky-Berger: I think that the work of Clayton Christensen has been really excellent. People knew that there were disruptive technologies that may change things, but it was not well understood until Clay wrote his book ‘The Innovator’s Dilemma’, and I think his next book, ‘The Innovator’s Solution’, is even better. I use these books in the graduate course at MIT. They are two excellent books on innovation. People didn’t understand, for example, why it is so tough to manage disruptive innovation. How is it different from regular sustaining or incremental innovation? What should companies do with sustaining or incremental innovation vs. disruptive innovation? He framed it in an excellent way to show the differences and to give companies guidelines on what they should do and what they should watch out for. I think he wrote ‘The Innovator’s Dilemma’ in the 1990s. Even today, the reality is that many companies don’t appreciate how difficult it is to truly embrace disruptive innovation. If you go and ask companies about disruptive innovation, they will say they are doing disruptive innovation. But in reality they are just working on incremental innovation. To really embrace disruptive innovation is still culturally very difficult for many companies.

Niaz: What is cloud computing? What are the ideas behind cloud computing?

Irving Wladawsky-Berger: There are many definitions of cloud computing. There is no one definition. I think the reason is that cloud computing is not any one thing. I think that it’s really a new model of computing where the internet is the platform for that computing model. If you look at the history of computing, in the first phase we had the central computing model, and the mainframes in the data center were the main platform of that model. That model lasted from the beginning of the computing industry until, let’s say, the mid-80s. Then the client-server model came, and in the client-server model the PCs were the central platform. Now cloud computing is a model that is totally organized around the internet, and it is organized to make it possible to access hardware resources, storage resources, middleware resources, application resources and services over the internet. So with cloud computing, when you think about it, the actual computer is totally distributed over the internet, in the cloud. Finally, cloud computing is the most interesting model of computing, built totally around the internet.

Niaz: How much disruption does cloud computing represent when compared with the Internet?

Irving Wladawsky-Berger: I think cloud is the evolution of the internet. I think cloud computing is a massive disruption. It is a very big, disruptive part of the internet, because it’s totally changing the way people can get access to applications and to information. Instead of having them on your PC or on the computers in your firm, you can now easily get whatever you want from the cloud, and you can get it in a much more standardized way. So the cloud makes it much easier and much less expensive for everybody, whether you are a big company, a small or medium-sized company, or an individual, to get access to very sophisticated applications. And you don’t have to know everything. Remember, in the PC days, if you bought an application, you got a disk, you had to load it, then there were new versions, and you had to manage those versions by yourself. It was such an advance over the previous world that everybody was happy, but it was very difficult to use. With the cloud, as you know, there is the whole world of apps. If you need apps, you can go to an app store, and an app store is basically a cloud store. So you can easily get whatever you need from the app store. When an app has a new release, it will tell you. You don’t have to know everything. You don’t have to do anything. It is all being engineered for you, and that is making IT capabilities available to many more companies and people. So it’s very disruptive.

Niaz: What do you think about the future of startups which are competing with giants like IBM, Google, Amazon, Facebook?

Irving Wladawsky-Berger: That’s the history of the industry. You know, in the 80s, people asked how anybody could compete with IBM, as IBM was such a big and powerful company. A few years later, IBM almost died, because client-server computing came in and all these companies like Sun Microsystems, Microsoft and Compaq almost killed IBM. Luckily for me, who was there, it didn’t die. Then in the 90s, you could say, how can anybody compete with Microsoft? After Windows came out, it was so powerful, it was everything. Google was nothing at the beginning. And here we are now. Every few years we ask this question: here is the most powerful company in the world, and what can possibly happen to it? And you know, sometimes nothing happens to them, and they continue to become more powerful. Sometimes, as in the case of IBM, they reinvent themselves and stay very relevant. They are just no longer the most advanced company in the world; they are an important company. In the 70s and 80s IBM was the leader of the computing industry. I think many people wouldn’t say that about IBM now. To compete and survive in any industry, you have to have a very good business model. And for entrepreneurial innovation, coming up with a great business model is the hardest and most central challenge.

Niaz: Can you please tell us something about the ways of asking BIG questions to challenge the tradition and come up with disruptive innovation?

Irving Wladawsky-Berger: Niaz, you are asking a very good question, because asking big questions and coming up with a new business idea or business model is very difficult. I would say that in the old days, if I talk about the IT industry, a lot of the ideas came from the laboratory. Today, the core of innovation is in the marketplace. How can you come up with a great new application or a great new solution that will find a market, that will find customers who want it? You have to be very focused. You have to have some good ideas. You have to study the market. You have to understand who is likely to be your customer. You have to know who your competitors are going to be. If those competitors are going to be big, like Google, Microsoft or Facebook, and you are starting a new company, you have to know what you have that is unique compared with those companies. But I think that in general the inspiration for new ideas is a combination of creativity and the marketplace. You have to look at the marketplace and be inspired by it. Then you take the great ideas you have and bring them to light. I don’t think I am able to give a good answer. You are asking, ‘Where do great business ideas come from?’ It’s like asking movie directors or composers where they get their creativity. It’s a similar question. There is no good answer to that.

Niaz: Thank you, Irving. I wish you good health and good luck with all your future projects.

Irving Wladawsky-Berger: You are welcome. It was very nice talking to you. And good luck to you Niaz.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. danah boyd on Future of Technology and Social Media

6. James Allworth on Disruptive Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing