James Kobielus: Big Data, Cognitive Computing and Future of Product

Editor’s Note: As IBM’s Big Data Evangelist, James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics Solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in Big Data, Hadoop, Enterprise Data Warehousing, Advanced Analytics, Business Intelligence, Data Management, and Next Best Action Technologies. He works with IBM’s product management and marketing teams in Big Data. He has spoken at such leading industry events as IBM Information On Demand, IBM Big Data Integration and Governance, Hadoop Summit, Strata, and Forrester Business Process Forum. He has published several business technology books and is a very popular provider of original commentary on blogs and many social media channels.

eTalk’s Niaz Uddin recently interviewed James Kobielus to gain insights into his ideas, research, and work in the field of Big Data. The interview is given below.

Niaz: Dear James, thank you so much for joining us in the midst of your busy schedule. We are very thrilled and honored to have you at eTalks.

James: And I’m thrilled and honored that you asked me.

Niaz: You are a leading expert on Big Data, as well as on such enabling technologies as enterprise data warehousing, advanced analytics, Hadoop, cloud services, database management systems, business process management, business intelligence, and complex-event processing. At the beginning of our interview can you please tell us about Big Data? How does Big Data make sense of the new world?

James: Big Data refers to approaches for extracting deep value from advanced analytics and trustworthy data at all scales. At the heart of advanced analytics is data mining, which is all about using statistical analysis to find non-obvious patterns (segmentations, correlations, trends, propensities, etc.) within historical data sets.

Some might refer to advanced analytics as tools for “making sense” of this data in ways that are beyond the scope of traditional reporting and visualization. As we aggregate and mine a wider variety of data sources, we can find far more “sense”–also known as “insights”–that previously lay under the surface. Likewise, as we accumulate a larger volume of historical data from these sources and incorporate a wider variety of variables from them into our models, we can build more powerful predictive models of what might happen under various future circumstances. And if we can refresh this data rapidly with high-velocity high-quality feeds, while iterating and refining our models more rapidly, we can ensure that our insights reflect the latest, greatest data and analytics available.

That’s the power of Big Data: achieve more data-driven insights (aka “making sense”) by enabling our decision support tools to leverage the “3 Vs”: a growing Volume of stored data, higher Velocity of data feeds, and broader Variety of data sources.

Niaz: As you know, Big Data has already started to redefine search, media, computing, social media, products, services and so on. The availability of data is helping us analyze trends and do interesting things in more accurate and efficient ways than before. What are some of the most interesting uses of big data out there today?

James: Where do I start? There are interesting uses of Big Data in most industries and in most business functions.

I think cognitive computing applications of Big Data are among the most transformative tools in modern business.

Cognitive computing is a term that probably goes over the head of most of the general public. IBM defines it as the ability of automated systems to learn and interact naturally with people to extend what either man or machine could do on their own, thereby helping human experts drill through big data rapidly to make better decisions.

One way I like to describe cognitive computing is as the engine behind “conversational optimization.” In this context, the “cognition” that drives the “conversation” is powered by big data, advanced analytics, machine learning and agile systems of engagement. Rather than rely on programs that predetermine every answer or action needed to perform a function or set of tasks, cognitive computing leverages artificial intelligence and machine learning algorithms that sense, predict, infer and, if they drive machine-to-human dialogues, converse.

Cognitive computing performance improves over time as systems build knowledge and learn a domain’s language and terminology, its processes and its preferred methods of interacting. This is why it’s such a powerful conversation optimizer. The best conversations are deep in give and take, questioning and answering, tackling topics of keenest interest to the conversants. When one or more parties has deep knowledge and can retrieve it instantaneously within the stream of the moment, the conversation quickly blossoms into a more perfect melding of minds. That’s why it has been deployed into applications in healthcare, banking, education and retail that build domain expertise and require human-friendly interaction models.

IBM Watson is one of the most famous exemplars of the power of cognitive computing driving agile human-machine conversations.  In its famous “Jeopardy!” appearance, Watson illustrated how its Deep Question and Answer technology—which is cognitive computing to the core—can revolutionize the sort of highly patterned “conversation” characteristic of a TV quiz show. By having its Deep Q&A results rendered (for the sake of that broadcast) in a synthesized human voice, Watson demonstrated how it could pass (and surpass) any Turing test that tried to tell whether it was a computer rather than, say, Ken Jennings. After all, the Turing test is conversational at its very core.

What’s powering Watson’s Deep Q&A technology is an architecture that supports an intelligent system of engagement. Such an architecture is able to mimic real human conversation, in which the dialogue spans a broad, open domain of subject matter; uses natural human language; is able to process complex language with a high degree of accuracy, precision and nuance; and operates with speed-of-thought fluidity.

Where the “Jeopardy!” conversational test was concerned (and where the other participants were humans literally at the top of that game), Watson was super-optimized. In the real world of natural human conversation, however, the notion of “conversation optimization” might seem, at first glance, like a pointy-headed pipedream par excellence. Yet you don’t have to be an academic sociologist to realize that society, cultures and situational contexts impose many expectations, constraints and other rules to which our conversations and actions must conform (or face disapproval, ostracism, or worse). Optimizing our conversations is critical to surviving and thriving in human society.

Wouldn’t it be great to have a Watson-like Deep Q&A adviser to help us understand the devastating faux pas to avoid and the right bon mot to drop into any conversation while we’re in the thick of it? That’s my personal dream and I’ll bet that before long, with mobile and social coming into everything, it will be quite feasible (no, this is not a product announcement, just the dream of one IBMer). But what excites me even more (and is definitely not a personal pipedream) is IBM Watson Engagement Advisor, which we unveiled earlier this year. It is a cognitive-computing assistant that revolutionizes what’s possible in multichannel B2C conversations. The solution’s “Ask Watson” feature uses Deep Q&A to greet customers, conduct contextual conversations on diverse topics, and ensure that the overall engagement is rich with answers, guidance and assistance.

Cognitive/conversational computing is also applicable to “next best action,” which is one of today’s hottest new focus areas in intelligent systems. At its heart, next best action refers to an intelligent infrastructure that optimizes agile engagements across many customer-facing channels, including portal, call center, point of sale, e-mail and social. With cognitive-computing infrastructure as the silent assistant, customers engage in a never-ending whirligig of conversations with humans and, increasingly, with automated bots, recommendation engines and other non-human components that, to varying degrees, mimic real-human conversation.

Niaz: So do you think machine learning is the right way to analyze Big Data?

James: Machine learning is an important approach for extracting fresh insights from unstructured data in an automated fashion, but it’s not the only approach. For example, machine learning doesn’t eliminate the need for data scientists to build segmentation, regression, propensity, and other models for data mining and predictive analytics.

Fundamentally, machine learning is a productivity tool for data scientists, helping them to get smarter, just as machine learning algorithms can’t get smarter without some ongoing training by data scientists. Machine learning allows data scientists to train a model on an example data set, and then leverage algorithms that automatically generalize and learn both from that example and from fresh feeds of data. To varying degrees, you’ll see the terms “unsupervised learning,” “deep learning,” “computational learning,” “cognitive computing,” “machine perception,” “pattern recognition,” and “artificial intelligence” used in this same general context.

Machine learning doesn’t mean that the resultant learning is always superior to what human analysts might have achieved through more manual knowledge-discovery techniques. But you don’t need to believe that machines can think better than or as well as humans to see the value of machine learning. We gladly offload many cognitive processes to automated systems where there just aren’t enough flesh-and-blood humans to exercise their highly evolved brains on various analytics tasks.

Niaz: What technologies are available out there that help analyze data in depth? Can you please briefly tell us about Big Data technologies and their important uses?

James: Once again, it’s a matter of “where do I start?” The range of Big Data analytics technologies is wide and growing rapidly. We live in the golden age of database and analytics innovation. Their uses are everywhere: in every industry, every business function, and every business process, both back-office and customer-facing.

For starters, Big Data is much more than Hadoop. Another big data “H”—hybrid—is becoming dominant, and Hadoop is an important (but not all-encompassing) component of it. In the larger evolutionary perspective, big data is evolving into a hybridized paradigm under which Hadoop, massively parallel processing enterprise data warehouses, in-memory columnar, stream computing, NoSQL, document databases, and other approaches support extreme analytics in the cloud.

Hybrid architectures address the heterogeneous reality of big data environments and respond to the need to incorporate both established and new analytic database approaches into a common architecture. The fundamental principle of hybrid architectures is that each constituent big data platform is fit-for-purpose to the role for which it’s best suited. These big data deployment roles may include any or all of the following: data acquisition, collection, transformation, movement, cleansing, staging, sandboxing, modeling, governance, access, delivery, archiving, and interactive exploration. In any role, a fit-for-purpose big data platform often supports specific data sources, workloads, applications, and users.

Hybrid is the future of big data because users increasingly realize that no single type of analytic platform is always best for all requirements. Also, platform churn—plus the heterogeneity it usually produces—will make hybrid architectures more common in big data deployments.

Hybrid deployments are already widespread in many real-world big data deployments. The most typical are the three-tier—also called “hub-and-spoke”—architectures. These environments may have, for example, Hadoop (e.g., IBM InfoSphere BigInsights) in the data acquisition, collection, staging, preprocessing, and transformation layer; relational-based MPP EDWs (e.g., IBM PureData System for Analytics) in the hub/governance layer; and in-memory databases (e.g., IBM Cognos TM1) in the access and interaction layer.

The complexity of hybrid architectures depends on the range of sources, workloads, and applications you’re trying to support. In the back-end staging tier, you might need different preprocessing clusters for each of the disparate sources: structured, semi-structured, and unstructured.

In the hub tier, you may need disparate clusters configured with different underlying data platforms (RDBMS, stream computing, HDFS, HBase, Cassandra, NoSQL, and so on) and corresponding metadata, governance, and in-database execution components.

And in the front-end access tier, you might require various combinations of in-memory, columnar, OLAP, dimensionless, and other database technologies to deliver the requisite performance on diverse analytic applications, ranging from operational BI to advanced analytics and complex event processing.

Niaz: That’s really amazing. How do you connect these two dots: Big Data Analytics and Cognitive Computing? How does this connection make sense?

James: The relationship between cognitive computing and Big Data is simple. Cognitive computing is an advanced analytic approach that helps humans drill through the unstructured data within Big Data repositories in order to see correlations, patterns, and insights more rapidly.

Think of cognitive computing as a “speed-of-thought accelerator.” Speed of thought is something we like to imagine operates at a single high-velocity setting. But that’s just not the case. Some modes of cognition are painfully slow, such as pondering the bewildering panoply of investment options available under your company’s retirement plan. But some other modes are instantaneous, such as speaking your native language, recognizing an old friend, or sensing when your life may be in danger.

None of this is news to anybody who studies cognitive psychology or has followed advances in artificial intelligence, aka AI, over the past several decades. Different modes of cognition have different styles, speeds, and spheres of application.

When we speak of “cognitive computing,” we’re generally referring to the ability of automated systems to handle the conscious, critical, logical, attentive, reasoning mode of thought that humans engage in when they, say, play “Jeopardy!” or try to master some rigorous academic discipline. This is the “slow” cognition that Nobel-winning psychologist/economist Daniel Kahneman discussed in a recent IBM Colloquium speech.

As anybody who has ever watched an expert at work will attest, this “slow” thinking can move at lightning speed when the master is in his or her element. When a subject-domain specialist is expounding on their field of study, they often move rapidly from one brilliant thought to the next. It’s almost as if these thought-gems automatically flash into their mind without conscious effort.

This is the cognitive agility that Kahneman examined in his speech. He described the ability of humans to build skills, which involves mastering “System 2” cognition (slow, conscious, reasoning-driven) so that it becomes “System 1” (fast, unconscious, action-driven). Not just that, but an expert is able to switch between both modes of thought within the moment when it becomes necessary to rationally ponder some new circumstance that doesn’t match the automated mental template they’ve developed. Kahneman describes System 2 “slow thinking” as well-suited for probability-savvy correlation thinking, whereas System 1 “fast thinking” is geared to deterministic causal thinking.

Kahneman’s “System 2” cognition–slow, rule-centric, and attention-dependent–is well-suited for acceleration and automation on big data platforms such as IBM Watson. After all, a machine can process a huge knowledge corpus, myriad fixed rules, and complex statistical models far faster than any mortal. Just as important, a big-data platform doesn’t have the limited attention span of a human; consequently, it can handle many tasks concurrently without losing its train of thought.

Also, Kahneman’s “System 1” cognition–fast, unconscious, action-driven–is not necessarily something we need to hand to computers alone. We can accelerate it by facilitating data-driven interactive visualization by human beings, at any level of expertise. When a big-data platform drives a self-service business intelligence application such as IBM Cognos, it can help users to accelerate their own “System 1” thinking by enabling them to visualize meaningful patterns in a flash without having to build statistical models, do fancy programming, or indulge in any other “System 2” thought.

And finally, based on those two insights, it’s clear to me that cognitive computing is not simply limited to the Watsons and other big-data platforms of the world. Any well-architected big data, advanced analytics, or business intelligence platform is essentially a cognitive-computing platform. To the extent it uses machines to accelerate the slow “System 2” cognition and/or provides self-service visualization tools to help people speed up their wetware’s “System 1” thinking, it’s a cognitive-computing platform.

Now I will expand upon the official IBM definition of “cognitive computing” to put it in a larger frame of reference. As far as I’m concerned, the core criterion of cognitive computing is whether the system, however architected, has the net effect of speeding up any form of cognition, executing on hardware and/or wetware.

Niaz: How is Big Data Analytics changing the nature of building great products? What do you think about the future of products?

James: That’s a great question that I haven’t explored to much extent. My sense is that more “products” are in fact “services”–such as online media, entertainment, and gaming–that, as an integral capability, feed on the Big Data generated by their users. Companies tune the designs, interaction models, and user experiences of these productized services through Big Data analytics. To the extent that users respond or don’t respond to particular features of these services, that will be revealed in the data and will trigger continuous adjustments in product/service design. New features might be added on a probationary basis, to see how users respond, and just as quickly withdrawn or ramped up in importance.

This new product development/refinement loop is often referred to as “real-world experiments.” The process of continuous, iterative, incremental experimentation both generates and depends on a steady feed of Big Data. It also requires data scientists to play a key role in the product-refinement cycle, in partnership with traditional product designers and engineers.  Leading-edge organizations have begun to emphasize real-world experiments as a fundamental best practice within their data-science, next-best-action, and process-optimization initiatives.

Essentially, real-world experiments put the data-science “laboratory” at the heart of the big data economy.  Under this approach, fine-tuning of everything–business model, processes, products, and experiences–becomes a never-ending series of practical experiments. Data scientists evolve into an operational function, running their experiments–often known as “A/B tests”–24×7 with the full support and encouragement of senior business executives.

The beauty of real-world experiments is that you can continuously and surreptitiously test diverse product models inline to your running business. Your data scientists can compare results across differentially controlled scenarios in a systematic, scientific manner. They can use the results of these in-production experiments (such as improvements in response, acceptance, satisfaction, and defect rates on existing products/services) to determine which work best with various customers under various circumstances.
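
To make the idea of comparing results across controlled scenarios concrete, here is a minimal Python sketch of how the outcome of a single A/B test might be checked. It is an illustration under stated assumptions, not a method described in the interview: the function name two_proportion_z_test, the visitor and conversion counts, and the choice of a pooled two-proportion z-test are all hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a and conv_b / n_b are the conversions and visitors seen by
    variants A and B. Uses the pooled standard error, which assumes both
    variants share one underlying rate under the null hypothesis.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: the existing feature (A) versus a probationary one (B).
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the variants is unlikely to be noise alone. A team running experiments around the clock would also need to account for repeated looks at the data and for multiple simultaneous tests, which this sketch ignores.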

Niaz: What is a big data product? How can someone make beautiful stuff with data?

James: What is a Big Data product? It’s any product or service that helps people to extract deep value from advanced analytics and trustworthy data at all scales, but especially at the extreme scales of volume (petabytes and beyond), velocity (continuous, streaming, real-time, low-latency), and/or variety (structured, semi-structured, unstructured, streaming, etc.). That definition encompasses products that provide the underlying data storage, database management, algorithms, metadata, modeling, visualization, integration, governance, security, management, and other necessary features to address these use cases. If you go back to my answer above on “hybrid” architectures, you’ll see a discussion of some of the core technologies.

Making “beautiful stuff with data”? That suggests advanced visualization to call out the key insights in the data. The best data visualizations provide functional beauty: they make the process of sifting through data easier, more pleasant, and more productive for end users, business analysts, and data scientists.

Niaz: Can you please tell us about building a data-driven culture that fosters data-driven innovation to build the next big product?

James: A key element of any data-driven culture is establishing a data science center of excellence. Data scientists are the core developers in this new era of Big Data, advanced analytics, and cognitive computing.

Game-changing analytics applications don’t spring spontaneously from bare earth. You must plant the seeds through continuing investments in applied data science and, of course, in the big data analytics platforms and tools that bring it all to fruition. But you’ll be tilling infertile soil if you don’t invest in sustaining a data science center of excellence within your company. Applied data science is all about putting the people who drill the data in constant touch with those who understand the applications. In spite of the mythology surrounding geniuses who produce brilliance in splendid isolation, smart people really do need each other. Mutual stimulation and support are critical to the creative process, and science, in any form, is a restlessly creative exercise.

In establishing a center of excellence, you may go the formal or informal route. The formal approach is to institute an ongoing process for data-science collaboration, education, and information sharing. As such, the core function of your center of excellence might be to bridge heretofore siloed data-science disciplines that need to engage more effectively. The informal path is to encourage data scientists to engage with each other using whatever established collaboration tools, communities, and confabs your enterprise already has in place. This is the model under which centers of excellence coalesce organically from ongoing conversations.

Creeping polarization, like general apathy, will kill your data science center of excellence if you don’t watch out. Don’t let the center of excellence, formal or informal, degenerate into warring camps of analytics professionals trying to hardsell their pet approaches as the one true religion. Centers of excellence must serve as a bridge, not a barrier, for communication, collegiality, and productivity in applied data science.

Niaz: As you know, leaders and managers have always been challenged to get the right information to make good decisions. Now, with the digital revolution and technological advancement, they have opportunities to access huge amounts of data. How will this trend change management practice? What do you think about the future of decision making, strategy and running organizations?

James: Business agility is paramount in a turbulent world. Big Data is changing the way that management responds to–and gets ahead of–changes in their markets, competitive landscape, and operational conditions.

Increasingly, I prefer to think of big data in the broader context of business agility. What’s most important is that your data platform has the agility to operate cost-effectively at any scale, speed, and scope of business that your circumstances demand.

In terms of scale of business, organizations operate at every scale from breathtakingly global to intensely personal. You should be able to acquire a low-volume data platform and modularly scale it out to any storage, processing, memory and I/O capacity you may need in the future. Your platform should elastically scale up and down as requirements oscillate. Your end-to-end infrastructure should also be able to incorporate platforms of diverse scales—petabyte, terabyte, gigabyte, etc.—with those platforms specialized to particular functions and all of them interoperating in a common fabric.

Where speed is concerned, businesses often have to keep pace with stop-and-start rhythms that oscillate between lightning fast and painfully slow. You should be able to acquire a low-velocity data platform and modularly accelerate it through incorporation of faster software, faster processors, faster disks, faster cache and more DRAM as your need for speed grows. You should be able to integrate your data platform with a stream computing platform for true real-time ingest, processing and delivery. And your platform should also support concurrent processing of diverse latencies, from batch to streaming, within a common fabric.

And on the matter of scope, businesses manage almost every type of human need, interaction and institution. You should be able to acquire a low-variety data platform (perhaps an RDBMS dedicated to marketing) and be able to evolve it as needs emerge into a multifunctional system of record supporting all business functions. Your data platform should have the agility to enable speedy inclusion of a growing variety of data types from diverse sources. It should have the flexibility to handle structured and unstructured data, as well as events, images, video, audio and streaming media with equal agility. It should be able to process the full range of data management, analytics and content management workloads. It should serve the full scope of users, devices and downstream applications.

Agile Big Data platforms can serve as the common foundation for all of your data requirements. Because, after all, you shouldn’t have to go big, fast, or all-embracing in your data platforms until you’re good and ready.

Niaz: In your opinion, given the current available Big Data technologies, what is the most difficult challenge in filtering big data to find useful information?

James: The most difficult challenge is in figuring out which data to ignore, and which data is trustworthy enough to serve as a basis for downstream decision-support and advanced analytics.

Most important, don’t always trust the “customer sentiment” that your social-media listening tools report as if it were gospel. Yes, you care deeply about how your customers regard your company, your products, and your quality of service. You may be listening to social media to track how your customers, collectively and individually, are voicing their feelings. But do you bother to save and scrutinize every last tweet, Facebook status update, and other social utterance from each of your customers? And if you are somehow storing and analyzing that data (which is highly unlikely), are you linking the relevant bits of stored sentiment data to each customer’s official record in your databases?

If you are, you may be the only organization on the face of the earth that makes the effort. Many organizations implement tight governance only on those official systems of record on which business operations critically depend, such as customers, finances, employees, products, and so forth. For those data domains, data management organizations that are optimally run have stewards with operational responsibility for data quality, master data management, and information lifecycle management.

However, for many big data sources that have emerged recently, such stewardship is neither standard practice nor should it be routine for many new subject-matter data domains. These new domains refer to mainly unstructured data that you may be processing in your Hadoop clusters, stream-computing environments, and other big data platforms, such as social, event, sensor, clickstream, geospatial, and so on.

The key difference from system-of-record data is that many of the new domains are disposable to varying degrees and are not regarded as a single version of the truth about some real-world entity. Instead, data scientists and machine learning algorithms typically distill the unstructured feeds for patterns and subsequently discard the acquired source data, which quickly become too voluminous to retain cost-effectively anyway. Consequently, you probably won’t need to apply much, if any, governance and security to many of the recent sources.

Where social data is concerned, there are several reasons for going easy on data quality and governance. First of all, data quality requirements stem from the need for an officially sanctioned single version of the truth. But any individual social media message constituting the truth of how any specific customer or prospect feels about you is highly implausible. After all, people prevaricate, mislead, and exaggerate in every possible social context, and not surprisingly they convey the same equivocation in their tweets and other social media remarks. If you imagine that the social streams you’re filtering are rich founts of only honest sentiment, you’re unfortunately mistaken.

Second, social sentiment data rarely has the definitive, authoritative quality of an attribute—name, address, phone number—that you would include in or link to a customer record. In other words, few customers declare their feelings about brands and products in the form of tweets or Facebook updates that represent their semiofficial opinion on the topic. Even when people are bluntly voicing their opinions, the clarity of their statements is often hedged by the limitations of most natural human language. Every one of us, no matter how well educated, speaks in sentences that are full of ambiguity, vagueness, situational context, sarcasm, elliptical speech, and other linguistic complexities that may obscure the full truth of what we’re trying to say. Even highly powerful computational linguistic algorithms are challenged when wrestling these and other peculiarities down to crisp semantics.

Third, even if every tweet was the gospel truth about how a customer is feeling and all customers were amazingly articulate on all occasions, the quality of social sentiment usually emerges from the aggregate. In other words, the quality of social data lies in the usefulness of the correlations, trends, and other patterns you derive from it. Although individual data points can be of marginal value in isolation, they can be quite useful when pieced into a larger puzzle.
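
As a rough illustration of quality emerging from the aggregate, the following Python sketch averages noisy per-message sentiment scores by week. Everything in it is hypothetical: the function name weekly_sentiment, the scoring scale (-1 negative, 0 neutral, +1 positive), and the sample messages are assumptions made for the example, and a real pipeline would obtain scores from a sentiment model rather than hard-coded values.

```python
from collections import defaultdict
from statistics import mean


def weekly_sentiment(messages):
    """Average per-message sentiment scores by week.

    messages: iterable of (week_number, score) pairs.
    Returns a dict mapping each week to its mean score, in week order.
    """
    by_week = defaultdict(list)
    for week, score in messages:
        by_week[week].append(score)
    return {week: mean(scores) for week, scores in sorted(by_week.items())}


# Hypothetical scored messages: individually contradictory, but the
# weekly averages still show sentiment drifting upward.
messages = [
    (1, -1), (1, 1), (1, -1), (1, 0),
    (2, 1), (2, -1), (2, 0), (2, 0),
    (3, 1), (3, 1), (3, -1), (3, 1),
]
trend = weekly_sentiment(messages)
for week, avg in trend.items():
    print(f"week {week}: mean sentiment {avg:+.2f}")
```

No single message here is a reliable reading of customer feeling, yet the per-week means form a usable pattern, and only those means need to be kept once the raw messages have been processed.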

Consequently, there is little incremental business value from scrutinizing, retaining, and otherwise managing every single piece of social media data that you acquire. Typically, data scientists drill into it to distill key patterns, trends, and root causes, and you would probably purge most of it once it has served its core tactical purpose. This process generally takes a fair amount of mining, slicing, and dicing. Many social-listening tools, including the IBM® Cognos® Consumer Insight application, are geared to assessing and visualizing the trends, outliers, and other patterns in social sentiment. You don’t need to retain every single thing that your customers put on social media to extract the core intelligence that you seek, as in the following questions: Do they like us? How intensely? Is their positive sentiment improving over time? In fact, doing so might be regarded as encroaching on privacy, so purging most of that data once you’ve gleaned the broader patterns is advised.

Fourth, even outright customer lies propagated through social media can be valuable intelligence if we vet and analyze each effectively. After all, it’s useful knowing whether people’s words (“we love your product”) match their intentions (“we have absolutely no plans to ever buy your product”) as revealed through their eventual behavior, for example, buying your competitor’s product instead.

If we stay hip to this quirk of human nature, we can apply the appropriate predictive weights to behavioral models that rely heavily on verbal evidence, such as tweets, logs of interactions with call-center agents, and responses to satisfaction surveys. I like to think of these weights as a truthiness metric, courtesy of Stephen Colbert.

What we can learn from social sentiment data of dubious quality is the situational contexts in which some customer segments are likely to be telling the truth about their deep intentions. We can also identify the channels in which they prefer to reveal those truths. This process helps determine which sources of customer sentiment data to prioritize and which to ignore in various application contexts.
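One way to picture such weights is the following minimal Python sketch. It assumes you have, per channel, a history of whether customers' stated sentiment was positive and whether they later bought; the channel names, data, and the simple agreement-rate formula are all hypothetical choices made for illustration, not a method described in this interview.

```python
def truthiness_weights(history):
    """Weight each channel by how often words matched later behavior.

    history: channel -> list of (said_positive, bought) boolean pairs.
    Channels with no history are skipped.
    """
    weights = {}
    for channel, pairs in history.items():
        if not pairs:
            continue
        matches = sum(said == bought for said, bought in pairs)
        weights[channel] = matches / len(pairs)
    return weights

def weighted_sentiment(signals, weights):
    """Combine per-channel sentiment (in [-1, 1]) using channel weights."""
    total = sum(weights.get(c, 0.0) for c in signals)
    if total == 0:
        return 0.0
    return sum(s * weights.get(c, 0.0) for c, s in signals.items()) / total

# Illustrative data only: tweets matched behavior half the time,
# call-center logs matched every time.
history = {
    "tweets": [(True, False), (True, True), (True, False), (False, False)],
    "call_center": [(True, True), (False, False), (True, True), (False, False)],
}
weights = truthiness_weights(history)
score = weighted_sentiment({"tweets": 1.0, "call_center": -1.0}, weights)
```

In this toy case the glowing tweets count for half as much as the negative call-center logs, so the combined score comes out negative.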

Last but not least, apply strong governance only to data that has a material impact on how you engage with customers, remembering that social data rarely meets that criterion. Customer records contain the key that determines how you target pitches to them, how you bill them, where you ship their purchases, and so forth. For these purposes, the accuracy, currency, and completeness of customers’ names, addresses, billing information, and other profile data are far more important than what they tweeted about the salesclerk in your Poughkeepsie branch last Tuesday. If you screw up the customer records, the adverse consequences for all concerned are far worse than if you misconstrue their sentiment about your new product as slightly positive, when in fact it’s deeply negative.

However, if you greatly misinterpret an aggregated pattern of customer sentiment, the business risks can be considerable. Customers’ aggregate social data helps you compile a comprehensive portrait of the behavioral tendencies and predispositions of various population segments. This compilation is essential market research that helps gauge whether many high-stakes business initiatives are likely to succeed. For example, you don’t want to invest in an expensive promotional campaign if your target demographic isn’t likely to back up their half-hearted statement that your new product is “interesting” by whipping out their wallets at the point of sale.

The extent to which you can speak about the quality of social sentiment data all comes down to relevance. Sentiment data is good only if it is relevant to some business initiative, such as marketing campaign planning or brand monitoring. It is also useful only if it gives you an acceptable picture of how customers are feeling and how they might behave under various future scenarios. Relevance means having sufficient customer sentiment intelligence, in spite of underlying data quality issues, to support whatever business challenge confronts you.

Niaz: How do you see data science evolving in the near future?

James: In the near future, many business analysts will enroll in data science training curricula to beef up their statistical analysis and modeling skills in order to stay relevant in this new age.

However, they will confront a formidable learning curve. To be an effective, well-rounded data scientist, you will need a degree, or something substantially like it, to prove you’re committed to this career. You will need to submit yourself to a structured curriculum to certify you’ve spent the time, money and midnight oil necessary for mastering this demanding discipline.

Sure, there are run-of-the-mill degrees in data-science-related fields, and then there are uppercase, boldface, bragging-rights “DEGREES.” To some extent, it matters whether you get that old data-science sheepskin from a traditional university vs. an online school vs. a vendor-sponsored learning program. And it matters whether you only logged a year in the classroom vs. sacrificed a considerable portion of your life reaching for the golden ring of a Ph.D. And it certainly matters whether you simply skimmed the surface of old-school data science vs. pursued a deep specialization in a leading-edge advanced analytic discipline.

But what matters most to modern business isn’t that every data scientist has a big honking doctorate. What matters most is that a substantial body of personnel has a common grounding in a core curriculum of skills, tools and approaches. Ideally, you want to build a team where diverse specialists with a shared foundation can collaborate productively.

Big data initiatives thrive if all data scientists have been trained and certified on a curriculum with the following foundation: paradigms and practices, algorithms and modeling, tools and platforms, and applications and outcomes.

Classroom instruction is important, but a data-science curriculum that is 100 percent devoted to reading books, taking tests and sitting through lectures is insufficient. Hands-on laboratory work is paramount for a truly well-rounded data scientist. Make sure that your data scientists acquire certifications and degrees that reflect them actually developing statistical models that use real data and address substantive business issues.

A business-oriented data-science curriculum should produce expert developers of statistical and predictive models. It should not degenerate into a program that produces analytics geeks with heads stuffed with theory but whose diplomas are only fit for hanging on the wall.

Niaz: We have already seen the huge implications and remarkable results of Big Data from tech giants. Do you think Big Data can also have a great role in solving social problems? Can we measure and connect all of our big and important social problems and design sustainable solutions with the help of Big Data?

James: Of course. Big Data is already being used worldwide to address the most pressing problems confronting humanity on this planet. In terms of “measuring and connecting all our big and important social problems and designing sustainable solutions,” that’s a matter for collective human ingenuity. Big Data is a tool, not a panacea.

Niaz: Can you please tell us about ‘Open Source Analytics’ for Big Data? What are the initiatives regarding open source that IBM’s Big Data group and other groups (startups) have done or are planning?

James: The principal open-source communities in the big data analytics industry are Apache Hadoop and R. IBM is an avid participant in both communities, and has incorporated these technologies into our solution portfolio.

Niaz: What are some of the concerns (privacy, security, regulation) that you think can dampen the promise of Big Data?

James: You’ve named three of them. Overall, businesses should embrace the concept of “privacy by design” – a systematic approach that takes privacy into account from the start – instead of trying to add protection after the fact. In addition, the sheer complexity of the technology and the learning curve of the technologies are a barrier to realizing their full promise. All of these factors introduce time, cost, and risk into the Big Data ROI equation.

Niaz: What are the new technologies you are most passionate about? What are going to be the next big things?

James: Where to start? I prefer that your readers follow my IBM Big Data Hub blog to see the latest things I’m passionate about.

Niaz: Last but not least, what is your advice for Big Data startups and for people who are working with Big Data?

James: Find your niche in the Big Data analytics industry ecosystem, go deep, and deliver innovation. It’s a big, growing, exciting industry. Brace yourself for constant change. Be prepared to learn (and unlearn) something new every day.

Niaz: Dear James, thank you very much for your invaluable time and also for sharing with us your incredible ideas, insights, knowledge and experiences. We wish you the very best of luck in all of your upcoming endeavors.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. danah boyd on Future of Technology and Social Media

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

8. James Allworth on Disruptive Innovation

Naeem Zafar: Entrepreneurship for the Better World

Editor’s Note: Naeem Zafar is the president and CEO of Bitzer Mobile, a company that simplifies enterprise mobility. On November 15, 2013, Oracle announced that it had acquired Bitzer Mobile. As a member of the faculty of the Haas Business School at the University of California Berkeley, he teaches Entrepreneurship and Innovation in the MBA program. He is the founder of Startup-Advisor, which focuses on educating and advising entrepreneurs on all aspects of starting and running a company. His entrepreneurial experience includes working directly with six startups, and he has extensive experience in mentoring and coaching founders and CEOs.

Mr. Zafar holds a Bachelor of Science degree in electrical engineering from Brown University (magna cum laude), Rhode Island, and a master’s degree in electrical engineering from the University of Minnesota. He is a charter member of TiE. He is also a charter member of OPEN, where he serves as a board member.

You can read his full bio from here, here and here.

eTalk’s Niaz Uddin has interviewed Naeem Zafar recently to gain his ideas and insights about startups, social business and entrepreneurship for a better world, which is given below.

Q: You’re a successful entrepreneur. As a member of faculty of the Haas Business School at UC Berkeley, you teach entrepreneurship and innovation in the MBA program. At the beginning of our interview can you please tell us what exactly is entrepreneurship?

A: Entrepreneurship is a state of mind. It is a way to look at a situation and see how you could make a profitable venture out of it. It is very innate. People, educated or not, in urban or rural settings, are just as likely to spot an opportunity and drive it to commercialization. The likelihood is there just as it is for a Silicon Valley hotshot startup guy. So it transcends all boundaries of education, race and gender. It is a state of mind.

Q: You believe that entrepreneurship can be a powerful tool to alleviate poverty and extremism of the world and social businesses can fill the gap where public institutions often fall short. Can you please tell us more about that?

A: If you think about the definition of a business, its objective is to maximize shareholder return. So the shareholder who invests in the company has an expectation that the management should do whatever it can to maximize return; that is perfectly fine. We have seen tremendous companies and innovation come out of that model. But if there were a concept of setting up a company with the sole purpose not of maximizing shareholder return but of addressing a social ill, that can work for alleviating poverty.

It can be something as simple as the city doing a lousy job of collecting garbage. Let’s say the garbage is not being collected on time, which is very unpleasant, as we know. We can set up a company so that there is speedy pickup and disposal of garbage. The purpose of that company is to address this social ill. It is not to maximize shareholder profit. Imagine setting up the company with that objective and shareholders putting in money. This company’s objective is for this social ill to be addressed and not to maximize profit. Now, it is still a for-profit company. It still pays market wages and hires the best people to address the issue, but it is not trying to maximize profit.

This model can be very rewarding for the shareholders, as it is a new way of looking at solving many of the problems which governments are not well suited to solve. That’s called social business. I think the concept is a powerful one. It’s put forward by the Nobel laureate, Muhammad Yunus, in his third book, and I think it is a tremendous way for communities to organize and address issues which plague them without having to wait for government to show up.

Q: How do you connect these three dots: social entrepreneurship, alleviating poverty and making a better world?

A: If you look at my previous answer, I just connected the three dots for you. Making a better world is about alleviating poverty and giving people a chance to participate in economic growth and well-being. Social businesses and entrepreneurship are a way for them to have that opportunity.

In the country that I grew up in, you look to the government to give you a good job. However, the government is not well equipped to provide a job for everybody. On the other hand, the private sector is well positioned. As we have seen in the US, the private sector produces even submarines, bombs and fighter jets. This was quite shocking to me when I came to this country.

The government’s job is not to produce goods. Its job is to set policies and systems so that companies and entrepreneurs can thrive.

Q: How did you find the idea for Bitzer Mobile? Can you please briefly tell us about Bitzer Mobile?

A: Bitzer Mobile’s technical founder, Ali Ahmed, was working as a software architect for large companies in insurance and oil verticals for many years. He continued to recognize that people were struggling to allow employees mobile access to data.

Ali was having to solve the problem for every company in a unique way. So the idea was: why not come up with a way for employees to easily and securely access corporate data and be productive from wherever they happen to be? And that gave birth to Bitzer.

Q: I believe that to change the world, we need to find the complex, interesting and big problems of the world and then build great organizations that will last over the long run, to keep solving those problems and keep contributing to the betterment of Mother Earth. Can you please tell us how we can find interesting, complex and big problems in this world?

A: First of all, I don’t agree with your definition. It is not about solving big problems. It is about solving problems. Problems of all sizes. Sometimes all you have to do is look around you. There are problems in your community, where you live, where you work. Solve those problems. Big ideas come from people trying to solve small problems which turn into great movements. So looking for the great problems to solve is not the only way and may not be the most efficient way to do it either.

Q: What are your suggestions on finding interesting ideas and bringing the ideas to life to solve?

A: Interesting ideas to solve come from deep domain knowledge. It’s very difficult for entrepreneurs when they are young to come up with ideas, as they can be lightweight. The average age of an entrepreneur in America is 37. This means that many people are older than 37 when they start their company. Only if you have worked in an industry for 5-10 years do you really understand what the issues are, what the problems are, and then you can see how you can solve them. So my advice is: look around you, work in some industry, learn the hard skills. Then you will see the problems and you will be well equipped to solve them. This is how you address this issue.

Q: What are your takes on finding the right business model and identifying early customers?

A: To find the right business model and early customers is simple. You should be able to answer these two fundamental questions: what problem are you solving, and who has this problem? If you cannot concisely answer these two questions, you don’t have clarity in your head. I insist that people should talk to 5 to 10 actual users and buyers of whatever product they’re planning to build and try to understand what their pain is. If you cannot clearly articulate what pain your customers have, do not start the company. Then discuss with customers what you are planning to do and whether they would be interested in it. If you cannot generate this early customer interest, do not start the company.

And stop worrying about confidentiality. People have other problems to solve in their lives. They are not running to copy your idea. It is the execution of your idea that is the hard part. By bouncing these ideas off suitable customers and users and consistently getting positive feedback, you may be in a position to start the company and then they likely will buy it. Everything else will clarify itself during the course of this process.

Q: Can you please tell us about the legal process of starting a company?

A: The legal process depends on which country you are starting the company in and what the local regulations are. My book, which is a legal guide for entrepreneurs, goes into a fair amount of detail: what the process is and what options you have in the United States. So read the book. It’s available at naeemzafar.com.

Q: As you saw during the Internet bubble, many companies were founded with a commitment to change the world. But over time, around 90% of them became obsolete, and we ended up with some great companies. Now there are also many startups working on cloud computing, big data, wearable technologies, space, robotics and so on. The data shows that most of them will also become obsolete, as the success rate of startups is very low. But there are always some common characteristics, values, philosophies and ideas that keep some startups alive and help them last over the long run. You have profound experience of seeing all these trends, as you have been advising companies and working with great entrepreneurs in Silicon Valley. What are your suggestions on building the next big organization?

A: One aspect of building the next big organization is about solving a big problem. It is easy to spot what the problems are that need to be solved. All the trends you mention have tremendous potential.

Big data and business analytics can pinpoint precisely what your monthly sales will be if you put a restaurant on the corner of this street and that street. So the way businesses make decisions could be based not on intuition but on actual data.

If you read the book or watch the movie called Moneyball, it is about applying statistics to baseball. It is about how a mediocre team became the number one team by using big data. And that is applicable to every single business. So look for a big idea around you and build a great team with high caliber people. If you can put together the right market with the right team, you can build a lasting company too.

Q: What do you think about hiring remarkable people and giving them the scope to work toward a vision that will change our world for good?

A: I think it’s a good idea to hire remarkable people. You should do that. It’s not easy to do that. Remember, good people will follow somebody whom they can respect and whose vision they share. If you don’t have the passion and vision yourself, why would A players, the best players, follow you? The best players want to follow someone that they believe in. If you have that, you shall attract the right team. And yes, you will be able to do great things. So step up to the stage and the stage could be yours.

Q: Whenever we talk about changing the world, the thing that always comes first is changing ourselves. After changing our own life, we can go and change our family, then our society and then our country, and then we can have a mission of changing the world to make it a better place to live in. But changing the world is hard, complex, challenging and painful. You have come a long way and have already left a body of work to make this world a bit more special. Can you please tell us about what your life has taught you in this amazing journey?

A: What my life has taught me is that it’s not a sprint. It is a marathon. So you have to create your own brand. You have to be genuine and honest and people will follow you. If you have a vision that attracts people, you will have an easy time attracting them.

So my advice to myself and other people around me is that if you’re a genuine person and a truthful person and you have a strong vision and can articulate it, you will have people willing to follow you. Once you have people willing to follow you, then there is no challenge you cannot tackle, no matter how big it is.

You will be able to overcome it over time and there are plenty of problems to follow around the world. But be true to yourself and always look for the team who is willing to follow you.

Q:  Last but not least, can you please give some advice to entrepreneurs who are on the mission of changing the world?

A: Changing the world is important and changing the world sometimes happens. But that is not the goal to start with. It is too big a goal. It is too audacious and maybe even too arrogant to have this goal.

Martin Luther King did not have the goal of changing the world. He was just trying to change some laws so that black people could have equal rights. When Steve Jobs was starting Apple he wanted to do a music iPod. He was not trying to change the world. So I’m a little bit suspicious of your question because changing the world has come up multiple times. Forget about changing the world. Do something meaningful for the people around you and your community. If you’re lucky enough it will have a big impact. So think more practically and try to make local change. Stop worrying about changing the world – that will come later if you’re so lucky.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Peter Klein on Entrepreneurship, Economics and Education

2. Derek Sivers on Entrepreneurship, CD Baby and Wood Egg

3. F. M. Scherer on Industrial Economy, Digital Economy and Innovation

4. Diego Comin on Entrepreneurship, Technology and Global Economic Development

5. Stephen Walt on Global Development

6. Juliana Rotich on Social Entrepreneurial Innovation

Trond A. Undheim: Entrepreneurship and Social Change

Editor’s Note: Trond A. Undheim, Ph.D., has over fifteen years of multi-sector experience in strategy, policy, communications, academia, and entrepreneurship. Currently, he is a Senior Lecturer at MIT Sloan School of Management. Formerly, he was a Director of Standards Strategy and Policy at Oracle Corporation, with wide responsibilities in long-term business development, strategy, public policy and standardization globally and in Europe. Trond is an executive, speaker, entrepreneur, author, traveler and blogger. You can read his full bio from here.

eTalk’s Niaz Uddin has interviewed Trond A. Undheim recently to gain insights about Entrepreneurship and Social Change which is given below.

Niaz: Dear Trond, thank you so much for your time in the midst of your busy schedule. We are honored to have you at eTalks. You teach Global Economics and Management as a Senior Lecturer at MIT Sloan School of Management. You are a leading expert on strategy, technology policy, entrepreneurship and the role of technology in society. At the beginning of our interview can you please tell us about entrepreneurship?

Trond: Entrepreneurship is to see, seize and share an opportunity to change something for the better in a lasting, institutional way, by creating a company, entity, program or initiative which provides services, generates products or makes concepts that can be traded or enjoyed by many. That was a mouthful, I guess: entrepreneurship is about embracing risk, change, and convincing people—this is sometimes hard.

Niaz: What is the significance of entrepreneurship in the global economy?

Trond: As the trading of physical commodities gradually shrinks, entrepreneurship is about to become the only valuable commodity in the global economy. The reason is—it is all about flexibility. All sources of comparative advantage are temporary. The time window for innovation is arguably getting somewhat shorter every minute. This being said, entrepreneurship takes many forms. It is not just about startups, and the culture of entrepreneurship is different in each country. In my work with Global Entrepreneurship Lab (G-Lab), at MIT Sloan School of Management, I have found that even as emerging markets are at different stages of development and each have their own culture, the desire to innovate is the same among young entrepreneurs everywhere. All they want and need is to see good examples in front of them. Our student teams help out with getting quicker through the process, escalating change throughout society. But it starts one-on-one. It must build up. So, as significant as entrepreneurship might be, it is a slow force.

Niaz: How are technology, innovation and entrepreneurship integrated with each other? How can this integration be a help for the global economy?

Trond: There is entrepreneurship without technology but it is less effective. There is technology without entrepreneurship but it is futile and short lived. There is innovation wherever there are people connecting the dots between entrepreneurship and technology.  Without integrating the three, there will be no global economy, only elite pockets of internationalization.

Niaz: Do you think technology, innovation and entrepreneurship could be the solution to Poverty? How?

Trond: Despite new solar cooking devices, peer lending schemes, or cell phone empowered social movements, there is no single solution to poverty. For too long, technology has been thought of as a panacea that solves all problems, but we are far from it. Technology opens certain opportunities and forecloses others. Moreover, even though it initially may seem technology transforms opportunities for everyone, it usually, in the end, favors the established elite or those who have resources to take the most advantage of it. This is the reason there are still problems everywhere we look around us, despite what many call ‘technological progress’, ‘information age’ or ‘globalization’.

We have increased the differences between people, and hence the opportunity both to succeed and to fail, spectacularly. Herein lies the challenge of integration; the global economy theoretically connects things, but someone needs to establish those connections and re-establish connections when broken. Innovative initiatives that mobilize people, share information, gather knowledge, discuss best practices, or create marketplaces of ideas, products and services across boundaries of time, place, resources, and ability, will definitely contribute to addressing the poverty issue in various ways. However, the issue is too complex for one strain of innovation to transform it all. Change needs to trickle down. Change needs to spread out. Change needs to bubble up. Poverty is clearly a multi-faceted problem that will fascinate, frustrate and motivate smart people, organizations and institutions to act for decades to come.

Niaz: Throughout history, high tech industries have mostly belonged to developed countries. As a result, underdeveloped and developing countries alike have lagged behind. Can you please suggest some ways to help those countries come up with proper strategies to get involved with high tech industry and contribute to the global economy?

Trond: High tech industries are fostered by individual initiative, investors who are willing to take risks, and by a willingness to go to or even create markets where there yet are none. However, as small ecosystems of high tech entrepreneurship start forming even in countries that are not yet on the radar as emerging economies, each time, it gets easier. The challenge is to get enough launch momentum. Typically, what we see is that entrepreneurs, given such challenges, either are funded from outside the country by particularly risk prone or long perspective persons or institutions, or are a result of family money. Only in a few cases will angel investors emerge on their own, since they typically are former high tech entrepreneurs themselves. One strategy is for government incentives to stabilize and attract expats back to contribute. Another is to focus attention on particular locations around a strong university. A third is to build the products at home but use the born global concept to immediately try to act on the global market, or more realistically, one selected foreign market.

Niaz: You worked at Oracle Corporation as the director of standards strategy and policy, where you led global business development, drove standardization, and influenced government policy in the EU. What do you think are the core challenges that entrepreneurs in third world countries face in coming up with great ideas to build global technology businesses and contribute to the global economy?

Trond: The core challenge is to acquire the right set of skills and grasp the attention of funders and potential customers early enough, and before your money (and motivation) run out. Moreover, another tough challenge is to convince the establishment that ideas matter, which means people around the entrepreneur (the first clients and investors) must not just nod to existing power structures. They may need to be prepared to accept causing a bit of a stir. Entrepreneurship is a dangerous force to those not prepared to change or to those with vested interests to defend, such as established ways of doing things, monopoly markets, successful products, or healthy revenue streams that may be threatened by a new entrant, however small.

In terms of standardization, entrepreneurs should keep in mind that one thing is to have a novel idea, but a whole other thing is to be able to enact infrastructure change across a whole new market. To do that, you need to think in terms of standards, following standards, shaping standards, creating new standards that people will go along with. It is a negotiation game. You either join or try to create an ecosystem and then try to make it surround you and your customers. You cannot go it alone. Even Oracle learned that, early on, as that company was a startup facing the giant IBM. Oracle picked up the importance of having a database standard and built a great product around it. Look at where it is today. Larry Ellison can create a Japanese lake in California, own luxurious boats, and buy a Hawaiian island. Not a bad life to some. But, frankly, I think entrepreneurship is about much more than the money you create. It is about the relationships you build and the pride you get out of creating something new and at the same time something lasting.

Niaz: How can entrepreneurs overcome those challenges?

Trond: I think the best way to overcome such challenges is to enlist team members who have experience from abroad. That way, you can bring change along with you. The other thing is to align with the forces for change within the country. You cannot turn everyone, but you actually only need to turn one-by-one. Every entrepreneur has heard this, and everyone knows what it means: be prepared not to take no for an answer. Beyond that, you need to find something that is actually doable. There are many good ideas out there but not all are doable. Doable for you, that is, in your situation. Make sure you have a good story. Storytelling can overcome most challenges. Even dictators, monopolists, and old money love a good story.

Niaz: You have also served as the national expert of e-government in the European Commission, where you created ePractice.eu, the world’s most successful best practice initiative in e-government, e-health, and e-inclusion. Can you please give us a brief explanation of these terms: e-government, e-health, and e-inclusion?

Trond: E-government is when public services are reorganized and ideally improved or made cheaper or more convenient using ICT, although that is a tall order. E-health applies ICT to citizen/patient interaction, health-service providers, institution-to-institution transmission of data, or all of the above. E-inclusion aims at reducing gaps in ICT usage in order to improve economic performance, employment opportunities, quality of life, social participation, and cohesion.

Niaz: What has been the response to the ePractice.eu initiative? What significant changes have occurred because of ePractice.eu?

Trond: ePractice.eu blends online and offline interaction on good practices in using ICT for services of public interest. It brings together a varied set of around 100,000 stakeholders: government policy makers, consultants, the ICT industry, NGOs, etc. So far, it contains 1,626 self-submitted cases from 35 countries around the world. For the EU, it has radically improved information and knowledge sharing. It has achieved significant momentum. Joining the community has tangible value: people attend workshops, contribute views, share, and learn. It is a true knowledge community, virtual and physical.

Niaz: What steps could policy makers in third world countries take to get the maximum benefit from e-government, e-health, and e-inclusion?

Trond: As the UN e-government survey reveals each year, there are indeed gaps between nations’ internet readiness. This is unfortunate but something we all need to take into account. The issue is not just access to the internet, but what content is accessible once you are on the internet and which skills you have to make sure you can benefit and contribute. The challenge is multifaceted: education, training, specific skills, infrastructure, and content. Even the countries who have invested a lot of resources occasionally, some would say too often, get it wrong. This stuff is not simple. You need awareness across the supply and delivery chains.

Niaz: You have published your book ‘Leadership From Below’. Can you please give us a brief overview of ‘Leadership From Below’?

Trond: Leadership From Below, for me, is two things. A perspective on leadership: No need for a position in a hierarchy to have influence. A perspective on life: lead when you need.  There are many books out there right now tapping into the fact that the web seemingly has lowered barriers to lead. However, what I am saying is not that. There are still barriers. Technology is not really the point here, although it can help (and hurt). The point is to reconfigure the notion of what it actually means to lead. It simply has nothing to do with somebody giving you power from above (despite what those who elect the pope might think). True power can only emerge from below, from trusted relationships. Even God Almighty in Christendom was of the opinion that it was wiser to send his son Jesus to earth to convince people of the state of things than to simply tell them with a roar from above.  Even smart CEOs realize this. They know they are accountable to the Board, to shareholders, and to society at large (well, at least some CEOs think this way).

Leaders at all levels need to reflect upon what it takes to achieve real, lasting influence. Using force always has a cost. In fact, getting your way always has a cost, especially if it is recognized that you benefit from it. Instead, leaders need to embrace the somewhat slower, but surer process of involving peers in small-scale change efforts that have ripple effects across teams, organizations, and societies.

So, leadership from below is not simply a message to a new generation of leaders, or to small-scale leaders. It is the essence of true leadership. Leadership from below is not just a trend. In fact, it is a stable feature of any society, but it has recently become trendy. Oh, and one more thing: I did not write the book to say we should not accept any authority. My view is not anti-hierarchy, but a-hierarchical, or beyond hierarchies. I say: Follow when you can. Lead when you need. Finally, since I wrote the book back in 2002, I have reflected a bit more and taken in some criticism, too. As it turns out, hierarchy remains a systemic part of society. One reason is complexity. Things are getting complicated out there. The other is delegation. People love to delegate. Once you delegate, you give up power.

Niaz: What is the set of advice you would like to leave behind for technology geeks, innovators and entrepreneurs?

Trond: I wanted to leave a little piece of advice from my research on strategy failures in high tech entrepreneurship. First of all, it seems too few of us are willing to take a serious look at negative outcomes. This is unfortunate because there is a lot of learning to be had. But since those stories are often buried (although I am about to uncover some), every time you hear of a success story, try to find out what challenges have been overcome to get there. You will soon find that it is often those who have overcome the greatest challenges who succeed in the long term. Why? Well, because they have also learned resilience.

If you want to learn more about this, follow my research on strategic outcomes in Cleantech firms. Essentially, we know that a lot of cleantech companies have failed over the last decade. There are many reasons why, but for the benefit of humanity, we need to ensure that some succeed and clean up our planet before it is too late. This is my agenda. It turns out that governments, multinationals, VCs, and entrepreneurs alike are interested in my work. We should indeed learn more from failure and we should talk about it. There is no shame in failing as long as you can reflect on how to do things differently next time, or tell others about the perils of the unforeseeable unforeseen.

Niaz: Thank you so much for sharing your ideas with us. I wish you good luck in all of your endeavors.

Trond: You are very welcome. It was a pleasure to speak with you, Niaz, and best of luck in your exciting entrepreneurial endeavor, eTalks. What a great concept: asking a set of great questions to people and change agents across the globe over email and letting them answer these questions on their own time without the pressure of a word limit or timeline. This is perhaps one of the keys to the future of communication: letting people speak. Sounds simple but it rarely happens.

_  _  _  _  ___  _  _  _  _

Further Reading:

01. Philip Kotler on Marketing for Better World

02. Hugh MacLeod on Creativity and Art

03. Daniel Pink on To Sell is Human

04. Naeem Zafar on Entrepreneurship for the Better World

05. Derek Sivers on Entrepreneurship, CD Baby and Wood Egg

06. Jeff Haden on Pursuing Excellence

07. Rita McGrath on Strategy in Volatile and Uncertain Environments

08. Gautam Mukunda on Leadership

09. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

Brian Keegan: Big Data

Editor’s Note: Brian Keegan is a post-doctoral research fellow in Computational Social Science with David Lazer at Northeastern University. He defended his Ph.D. in the Media, Technology, and Society program at Northwestern University’s School of Communication. He also attended the Massachusetts Institute of Technology and received bachelor’s degrees in Mechanical Engineering and Science, Technology, and Society in 2006.

His research employs a variety of large-scale behavioral data sets such as Wikipedia article revision histories, massively-multiplayer online game behavioral logs, and user interactions in a crowd-sourced T-shirt design community. He uses methods in network analysis, multilevel statistics, simulation, and content analysis. To learn more about him, please visit his official website Brianckeegan.com.

eTalk’s Niaz Uddin recently interviewed Brian Keegan to gain his ideas and insights about Big Data, Data Science and Analytics. The interview is given below.

Niaz: Brian, we are really excited to have you to talk about Big Data. Let’s start from the beginning. How do you define Big Data?

Brian: Thank you, Niaz, for having me. Well, a common joke in the community is that “big data” is anything that makes Excel crash. That’s fair neither to Microsoft, because the dirty secret of data science is that you can get pretty far using Excel, nor to researchers whose data could hypothetically fit in Excel but are so complicated that it would make no sense to try in the first place.

Big data is distinct from traditional industry and academic approaches to data analysis because of what are called the three Vs: volume, variety, velocity.

      • Volume is what we think of immediately – server farms full of terabytes of user data waiting to be analyzed. This data doesn’t fit into a single machine’s memory, hard drive, or even a traditional database. The size of the data makes analyzing it with traditional tools really hard, which is why new tools are being created.
      • Second, there’s variety, which reflects the fact that data aren’t just lists of numbers, but include complex social relationships, collections of text documents, and sensors. The scope of the data means that all these different kinds of data have different structures, granularity, and errors which need to be cleaned and integrated before you can start to look for relationships among them. Cleaning data is fundamentally unsexy and grueling work, but if you put garbage into a model, all you get is garbage back out. Making sure all these diverse kinds of data are playing well with each other and the models you run on them is crucial.
      • Finally, there’s velocity, which reflects the fact that data are not only being created in real time, but people want to act on the incoming information in real time as well. This means the analysis also has to happen in real time, which is quite different than the old days where a bunch of scientists could sit around for weeks testing different kinds of models on data collected months or years ago before writing a paper or report that takes still more months before it’s published. APIs, dashboards, and alerts are part of big data because they make data available fast.
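
To make the "garbage in, garbage out" point about variety concrete, here is a minimal Python sketch of the kind of cleaning step described above. Everything in it is an illustrative assumption rather than anything from the interview: the field names (`user`, `amount`, `ts`), the two timestamp formats, and the sample records are all hypothetical.

```python
from datetime import datetime, timezone


def clean_record(raw):
    """Normalize one raw record, or return None if it is unusable."""
    # Hypothetical field names; real sources will differ.
    user = str(raw.get("user", "")).strip().lower()
    if not user:
        return None  # no way to attribute the record

    try:
        amount = float(raw.get("amount"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric amount

    ts = raw.get("ts")
    try:
        if isinstance(ts, (int, float)):
            # One assumed source reports epoch seconds.
            when = datetime.fromtimestamp(ts, tz=timezone.utc)
        else:
            # Another assumed source reports ISO-style dates.
            when = datetime.strptime(str(ts), "%Y-%m-%d").replace(tzinfo=timezone.utc)
    except (ValueError, OverflowError, OSError):
        return None  # unparseable timestamp

    return {"user": user, "amount": amount, "day": when.date().isoformat()}


# Made-up records showing differing structure and errors across sources.
raw_records = [
    {"user": " Alice ", "amount": "12.50", "ts": "2013-05-01"},
    {"user": "BOB", "amount": 7, "ts": 1367366400},          # epoch seconds
    {"user": "", "amount": "3.00", "ts": "2013-05-01"},      # missing user
    {"user": "carol", "amount": "n/a", "ts": "2013-05-02"},  # bad amount
]

cleaned = [r for r in (clean_record(x) for x in raw_records) if r is not None]
for record in cleaned:
    print(record)
```

Only the records that survive every check reach the model, and the two sources end up sharing one structure, which is the integration step the bullet on variety refers to.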

Niaz: Can you please provide us with some examples?

Brian: Data that is big is definitely not new. The US Census two centuries ago still required collecting and analyzing millions of data points collected by hand. Librarians and archivists have always struggled with how to organize, find, and share information on millions of physical documents like books and journals. Physicists have been grappling with big data for decades where the data is literally astronomical. Biologists sequencing the genome needed ways to manipulate and compare data involving billions of base pairs.

While “data that was big” existed before computers, the availability of cheap computation has accelerated and expanded our ability to collect, process, and analyze data that is big. So while we now think of things like tweets or financial transactions as “big data” because these industries have rushed to adopt or are completely dependent upon computation, it’s important to keep in mind that lots of big data exist outside of social media, finance, and e-commerce and that’s where a lot of opportunities and challenges still exist.

Niaz: What are some of the possible use cases for big data analytics? What are the major companies producing gigantic amounts of data?

Brian: Most people think of internet companies like Google, Facebook, Twitter, LinkedIn, FourSquare, Netflix, Amazon, Yelp, Wikipedia, and OkCupid when they think of big data. These companies are definitely the pioneers of coming up with the algorithms, platforms, and other tools like PageRank, map-reduce, user-generated content, recommender systems that require combining millions of data points to provide fast and relevant content.

    • Companies like Crimson Hexagon mine Twitter and other social media streams for their clients to detect patterns of novel phrases or changes in the sentiment associated with keywords and products. This can let their clients know if people are having problems with a product or if a new show is generating a lot of buzz despite mediocre ratings.
    • The financial industry uses big data not only for high-frequency trading based on combining signals from across the market, but also evaluating credit risks of customers by combining various data sets. Retailers like Target and WalMart have large analytics teams that examine consumer transactions for behavioral patterns so they know what products to feature. Telecommunications companies like AT&T or Verizon collect call data records produced by every cell phone on their networks that lets them know your location over time so they can improve coverage. Industrial companies like GE and Boeing put more and more sensors into their products so that they can monitor performance and anticipate maintenance.
    • Finally, one of the largest producers and consumers of big data is the government. Law enforcement agencies publish data about crime and intelligence agencies monitor communication data from suspects. The Bureau of Labor Statistics, Federal Reserve, and World Bank collect and publish extremely rich and useful economic time series data. Meteorologists collect and analyze large amounts of data to make weather forecasts.

Niaz: Why has big data become so important now?

Brian: Whether it was business, politics, or military, decisions were (and continue to be) made under uncertainty about history or context because getting timely and relevant data was basically impossible. Directors didn’t know what customers were saying about their product, politicians didn’t know the issues constituents were talking about, and officers faced a fog of war. Ways of getting data were often slow and/or suspect: for example, broadcast stations used to price advertising time by paying a few dozen people in a city to keep journals of what stations they remembered hearing every day. Looking back now, this seems like an insane way not only to collect data but also to make decisions based on obviously unreliable data, but it’s how things were done for decades because there was no better way of measuring what people were doing. The behavioral traces we leave in tweets and receipts are not only much finer-grained and more reliable, but also encompass a much larger and more representative sample of people and their behaviors.

Data lets decision makers know and respond to what the world really looks like instead of going on their gut. More data usually gives a more accurate view, but too much data can also overwhelm and wash out the signal with noise. The job of data scientists is less like trying to find a single needle in a haystack and more like collecting as much hay as possible to be sure there are a few needles in there before sorting through the much bigger haystack. In other words, data might be collected for one goal, but it can also be repurposed for other goals and follow-on questions that come along to provide new insights. More powerful computers, algorithms, and platforms make assembling and sorting through these big haystacks much easier than before.

Niaz: Recently I have seen IBM has started to work with Big Data. What roles do companies like IBM play in this area?

Brian: IBM is just one of many companies that are racing “upstream” to analyze data on larger and more complex systems like an entire city by aggregating tweets, traffic, surveillance cameras, electricity consumption, and emergency services, which all feed into each other. IBM is an example of an organization that has shifted from providing value by transforming raw materials into products like computers to transforming raw data into unexpected insights about how a system works, or doesn’t. The secret sauce is collecting existing data, building new data collection systems, and developing statistical models and platforms that are able to work in the big data domain of volume, variety, and velocity, which traditional academic training doesn’t equip people for.

Niaz: What are the benefits of Big Data to business? How is it influencing innovation and business?

Brian: Consider the market capitalization of three major tech companies on a per capita basis: Microsoft makes software and hardware as well as running web services like Bing based on big data and is worth about $2.5 million per employee, Google mostly makes software and runs web services and is worth about $4.6 million per employee, and Facebook effectively just runs a web service of its social network site and is worth about $19 million per employee. These numbers may be outliers or unreliable for a variety of reasons, but the trend suggests that organizations like Facebook focused solely on data produce more value per employee.

This obviously isn’t a prescription for every company: ExxonMobil, WalMart, GE, and Berkshire produce value in fundamentally different ways. But Facebook did find a way to capture and analyze data about the world (our social relationships and preferences) that was previously hidden. There are other processes happening beyond the world of social media that currently go uncaptured, but with the advent of new sensors and opportunities for collecting data, they will become ripe for the picking. Mobile phones in developing countries will reveal patterns of human mobility that could transform finance, transportation, and health care. RFIDs on groceries and other products could reveal patterns of transportation and consumption that could reduce wasted food while opening new markets. Smart meters and grids could turn the tide against global climate change while lowering energy costs. Politicians could be made more accountable and responsive through crowd-sourced fundraising and analysis of regulatory disclosures. The list of data out there waiting to be collected and analyzed boggles the mind.
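
The per-employee comparison above can be made explicit with a few lines of Python. This is only a sketch: it uses the three approximate figures quoted in the answer and nothing else, and the choice of Microsoft as the baseline and the variable names are illustrative assumptions.

```python
# Approximate value per employee in millions of US dollars, as quoted above.
value_per_employee_musd = {
    "Microsoft": 2.5,
    "Google": 4.6,
    "Facebook": 19.0,
}

# Express each figure as a multiple of the Microsoft figure (an arbitrary baseline).
baseline = value_per_employee_musd["Microsoft"]
relative = {
    name: round(value / baseline, 2)
    for name, value in value_per_employee_musd.items()
}

for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x Microsoft's value per employee")
```

On these figures Google comes out at roughly 1.8 times and Facebook at roughly 7.6 times the Microsoft number, which is the trend the answer points to, with the same caveat that the inputs may be outliers or unreliable.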

Niaz: How do you define a Data Scientist? What suggestions do you have for those who want to become a data scientist?

Brian: A data scientist needs familiarity with a wide set of skills, so much so that it’s impossible for them to be expert in all of them.

      • First, data scientists need the computational skills that come from learning a programming language like Python or Java so that they can acquire, clean up, and manipulate data from databases and APIs, hack together different programs developed by people who are far more expert in network analysis or natural language processing, and use difficult tools like MySQL and Hadoop. There’s no point-and-click program out there with polished tutorials that does everything you’ll need from end to end. Data scientists spend a lot of time writing code, working at the command line, and reading technical documentation, but there are tons of great resources like StackOverflow, GitHub, free online classes, and active and friendly developer communities where people are happy to share code and solutions.
      • Second, data scientists need statistical skills at both a theoretical and methodological level. This is the hardest part and favors people who have backgrounds in math and statistics, computer and information sciences, physical sciences and engineering, or quantitative social sciences. Theoretically, they need to know why some kinds of analyses should be run on some kinds but not other kinds of data and what the limitations of one kind of model are compared to others. Methodologically, data scientists need to actually be able to run these analyses using statistical software like R, interpret the output of the analyses, and do the statistical diagnostics to make sure all the assumptions that are baked into a model are actually behaving properly.
      • Third, data scientists need some information visualization and design skills so they can communicate their findings in an effective way with charts or interactive web pages for exploration. This means learning to use packages like ggplot in R or matplotlib in Python for statistical distributions, d3 in Javascript for interactive web visualizations, or Gephi for network visualizations.

All of the packages I mentioned are open-source which also reflects the culture in the data science community; expensive licenses for software or services are very suspect because others should be able to easily replicate and build upon your analysis and findings.
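
As a rough illustration of how the first two skills fit together (acquiring and cleaning data, then running a basic statistical check on it), here is a small Python sketch that uses only the standard library. The CSV text, the column names, and the `load_sessions` helper are hypothetical stand-ins for a real database or API export.

```python
import csv
import io
import statistics

# Hypothetical data standing in for an export from a database or API.
RAW_CSV = """user,sessions
alice,3
bob,5
carol,
dave,4
erin,40
"""


def load_sessions(text):
    """Parse the CSV text and keep only rows with a valid integer count."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        cell = (row["sessions"] or "").strip()
        if cell.isdigit():  # skip blank or malformed cells
            values.append(int(cell))
    return values


sessions = load_sessions(RAW_CSV)
summary = {
    "n": len(sessions),
    "mean": statistics.mean(sessions),
    "median": statistics.median(sessions),
}
print(summary)

# A crude diagnostic: a mean far above the median hints at an outlier
# that could distort any model that assumes roughly symmetric data.
if summary["mean"] > 2 * summary["median"]:
    print("Warning: distribution looks skewed; inspect outliers before modeling.")
```

In practice the same steps would usually be done with the open-source packages mentioned above, but the shape of the work is the same: load, clean, summarize, and check assumptions before trusting a model.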

Niaz: Finally, what do you think about the impact of Big Data in our everyday life?

Brian: Big Data is a dual-use technology that can satisfy multiple goals, some of which may be valuable and others which may be unsavory. On one hand, it can help entrepreneurs be more nimble and open new markets or help researchers gain new insights about how the world works; on the other hand, the Arab Spring suggested it can also reinforce the power of repressive regimes to monitor dissidents or of unsavory organizations to do invasive personalized marketing.

Danah Boyd and Kate Crawford have argued persuasively about how the various possibilities of big data to address societal ills or undermine social structure obscure the very real but subtle changes that are happening right now that replace existing theory and knowledge, cloak subjectivity with quantitative objectivity, confuse bigger data with better data, separate data from context and meaning, raise real ethical questions, and create or reinforce inequalities.

Big data also raises complicated questions about who has access to data. On one hand, privacy is a paramount concern as organizations shouldn’t be collecting or sharing data about individuals without their consent. On the other hand, there’s also the expectation that data should be shared with other researchers so they can validate findings. Furthermore, data should be preserved and archived so that it is not lost to future researchers who want to compare or study changes over time.

Niaz: Brian, thank you so much for giving me time in the midst of your busy schedule. It is really great to learn the details of Big Data from you. I wish you good luck with your study, research, projects and works.

Brian: You are welcome. Good luck to you too.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. James Kobielus on Big Data, Cognitive Computing and Future of Product

5. danah boyd on Future of Technology and Social Media

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

8. James Allworth on Disruptive Innovation