Big Data

Aubrey de Grey: Aging and Overcoming Death

Editor’s Note: Dr. Aubrey de Grey is a true maverick. He challenges the most basic assumption underlying the human condition – that aging is inevitable. He argues instead that aging is a disease – one that can be cured if it’s approached as “an engineering problem.”

He is a biomedical gerontologist based in Cambridge, UK, and is the Chief Science Officer of SENS Foundation, a non-profit charity dedicated to combating the aging process. He is also Editor-in-Chief of Rejuvenation Research, the world’s only peer-reviewed journal focused on intervention in aging. His research interests encompass the causes of all cellular side-effects of metabolism (“damage”) that constitute mammalian aging and the design of interventions to repair and/or obviate that damage. You can read his full bio here.

eTalk’s Niaz Uddin recently interviewed Aubrey de Grey to gain insights into his ideas, research and work in the field of aging. The interview is given below.

Niaz: Dear Aubrey, I know you are a very busy man and I really appreciate you taking time out of your schedule to join me. We are very thrilled and honored to have you at eTalks.

Aubrey: My pleasure.

Niaz: At the beginning of our interview, could you please say a few words about your background and the positions that you hold today?

Aubrey: I was initially trained as a computer scientist, but I switched to the biology of aging at around 30 when I discovered, to my astonishment, that very few researchers were really working on doing anything about aging. Currently I’m the Chief Science Officer of SENS Research Foundation, a California-based biomedical research charity focused on developing the strategy for defeating aging that I proposed back in 2000.

Niaz: That’s really interesting. What first attracted you to the idea of physical immortality?

Aubrey: First, let’s be totally clear that I don’t work on “immortality”, or any variations on that theme. I work on health: I want to let people stay fully healthy, i.e. functioning both physically and mentally as well as a young adult, at any age. Once this is achieved, it is very likely that there will be a dramatic side-benefit in terms of how long people live – but that’s what it is, a side-benefit. I do not work on longevity for longevity’s sake. So, to answer what your question should have been: what attracted me to the crusade to bring aging under medical control was simply that it was obviously humanity’s worst problem but hardly anyone was working on it.

Niaz: What’s so wrong with getting old? Is getting old the biggest health crisis facing the world?

Aubrey: The way you phrase the question incorporates most of the answer. Most people have a totally distorted idea of what aging is: they think of it as distinct from the diseases of old age, and as something natural and inevitable, like the passage of time. So “getting old” is used pretty much interchangeably to mean either getting chronologically old or getting frail. WTF?! We don’t ask what’s so wrong with getting Alzheimer’s, so it makes no sense to ask what’s so wrong with going downhill in all ways.

Niaz: You’re a true maverick and you challenge the most basic assumption underlying the human condition — that aging is inevitable. You argue instead that aging is a disease – one that can be cured if it’s approached as “an engineering problem.” Before we focus on your efforts to understand the aging process, perhaps we should first say a few words about aging itself. Why do organisms age, and die?

Aubrey: Aging is far less mysterious than most people assume. In its essence, aging of a living organism is no different than aging of a simple, man-made machine – which should be no surprise, since after all the body is a machine (whatever one’s view may be as regards any non-physical elements that combine with the body to form the human being). Thus, it’s totally reasonable – I would say obvious, but apparently it isn’t obvious to everyone – to look at how we already succeed in extending the healthy longevity of cars or aeroplanes waaay beyond how long they were designed to last, and apply the same principles to human aging. And those principles come down, in a nutshell, to just one idea: preventative maintenance, i.e. repairing pre-symptomatic damage before symptoms emerge.

Niaz: So does the process of aging serve some evolutionary purpose — and if it does, will we run into trouble if we attempt to counteract it?

Aubrey: It does not. From the 1880s or so until the 1950s it was thought that aging helped species to be more nimble in responding evolutionarily to changing environments, but then Medawar pointed out that mortality from causes that aren’t related to age is so high in the wild that there are too few frail individuals to drive natural selection for aging even if in principle it would be a good thing for the species. Medawar’s observation was somewhat oversimplified, but today almost all gerontologists agree that his basic idea was correct and that there are no “genes for aging”.

Niaz: You are the Chief Science Officer of the SENS Foundation. What does that acronym stand for, and what does the organization do?

Aubrey: Strategies for Engineered Negligible Senescence, but I know that’s a bit of a mouthful. We do biomedical research to develop regenerative medicine against aging, i.e. rejuvenation biotechnology that will restore people’s physical and mental function (and appearance, yes!) to that of a young adult.

Niaz: What’s been the most striking piece of data to support your hypotheses?

Aubrey: That’s not really the right question: I don’t have a “hypothesis”. What I have is a technological plan – a proposal for how to manipulate an aspect of nature – whereas hypotheses are conjectures about how nature works in the absence of manipulation. The reason I need to make this ostensibly nit-picking distinction is that pioneering technology does not proceed by the accumulation of data: rather, it consists of a leap of faith that putting established technologies together will deliver more than the sum of the parts. So we (and others) have certainly been making great progress in developing the component technologies that will in due course combine to defeat aging, but calling those advances “support for a hypothesis” is a misuse of terms.

Niaz: As you know, now we are living in an exciting era of bioinformatics and big data. What do you think about the role of bioinformatics and big data in this field?

Aubrey: The relevance of big data to biomedical gerontology is pretty much the same as throughout biology. It speeds up a huge variety of bench experiments, but it doesn’t derive many big ideas itself.

Niaz: Some people regard aging research, and efforts to extend lifespan, with suspicion. Why do you think that is? What is your response to those concerns?

Aubrey: It’s embodied in your question: people who recoil at this work do so because they regard “aging research” and “efforts to extend lifespan” as synonymous, when in fact “aging research” and “efforts to eliminate age-related disease and disability” are synonymous. The tragedy is that this misconception is so entrenched: gerontologists have been correcting this error since decades before I came along, but no one wants to hear it, probably mainly because they don’t want to get their hopes up. I think this is finally changing now, but I’m not slowing down my advocacy efforts.

Niaz: You regard cancer as the greatest potential threat to your longevity program, but couldn’t mutant viruses represent an even greater threat?

Aubrey: Viruses are a huge issue, but they are small (they don’t have many genes), whereas cancer has the entire human genome at its mutational disposal. Pandemics are a problem mainly because we aren’t putting enough money into vaccine development: if we can just get our priorities right, the chances of any pandemic really taking off are infinitesimal.

Niaz: What are the other key problems in aging research?

Aubrey: Well, basically most non-SENS research revolves around identifying simple interventions (drugs, genes, diet) that can in some harmonious unitary way slow aging down. I support such research, because it may in many cases make a dent in aging far sooner than SENS will – but its impact will be far less than what SENS will do once it exists. As such, the way to save the most lives and alleviate the most suffering is to pursue both approaches.

Niaz: One of the important consequences of successful SENS research is that we will no longer lose creative, inventive individuals and their priceless gifts to humanity. That will be really exciting. You have assigned $13 million of the $16.5 million that you inherited from your mom to SENS. In addition, you have dedicated your life, all your time and money, to this mission. Do you think you are going to succeed and find ways to overcome death? What is the timeframe?

Aubrey: As a researcher, I intrinsically accept that I don’t know whether my work will succeed, but I am sufficiently motivated by the knowledge that it MAY succeed. I don’t think of myself as a betting man, but in that sense I suppose I am. As for timeframes, I think there is a 50% (at least) chance that this research will get us to what I’ve called “longevity escape velocity” within 20-25 years.

Niaz: WOW! That’s going to be incredible. Can the planet cope with people living so long?

Aubrey: People are incredibly bad at understanding the influences on the trajectory of global population and how it would be altered by the defeat of aging, which is why we are funding a very prestigious group in Denver to analyse it authoritatively. The short answer is yes, we believe that the planet can certainly cope, partly because the currently-observed falling fertility rates and rising age at childbirth will continue, but also because new technologies such as renewable energy and nuclear fusion will greatly increase the planet’s carrying capacity.

Niaz: Google’s CEO, Larry Page, said: “Illness and aging affect all our families. With some longer term, moonshot thinking around healthcare and biotechnology, I believe we can improve millions of lives.” And very recently Google has announced a new company called Calico that will focus on health and well-being, in particular the challenge of aging and associated diseases.  What do you think about this move by Google?

Aubrey: It’s the single best piece of news in all the time I’ve been working in this field. Even though Calico is taking its time to determine its research priorities, I’m very confident that it will make huge contributions to hastening the defeat of aging.

Niaz: Now, as the editor of the journal Rejuvenation Research, obviously you have a lot of information coming across your desk all the time. I was wondering: is there any particular research that excites you at the moment?

Aubrey: I really don’t want to single anything out. SENS is a divide-and-conquer strategy, and all its strands are moving forward very promisingly.

Niaz: You are exceptionally well connected with other scientists. I saw you at the TEDMED 2012 conference. About how many scientific conferences do you attend each year? What is your main means of becoming acquainted with other scientists?

Aubrey: I give about 50 talks a year, at conferences, universities and elsewhere. I meet scientists there, of course, but also by contacts based on reading publications. In that regard I’m no different than any other scientist.

Niaz: What are your goals for the next decade?

Aubrey: To become obsolete. My goal is that by 2020 or so there will be people involved in this mission who are much better than me at all the tasks I’m good at and that currently the mission relies on me to perform.

Niaz: Is there anything else you would like for readers of eTalks to know about your work?

Aubrey: The main thing I want to communicate is that shortage of funding is delaying the defeat of aging by many years. My current estimate is that we could be going about three times faster if funding were not limiting – and the tragedy is that even a ten-fold increase, to something like $100m per year (way under 1% of the NIH’s budget), would pretty much eliminate that slowdown. We have a solid plan, and we have the world’s best researchers waiting and eager to get on and implement it. All we need is the resources to let them get on with it.

Niaz: Dear Aubrey, thanks a lot for giving us your time and sharing your invaluable ideas with us. We wish you the very best of luck and tremendous success. Please take very good care of yourself.

Aubrey: My pleasure. Many thanks for the invitation.

Ending Note: It has been more than a decade since Dr. Aubrey de Grey established the principles behind SENS. You can visit sens.org for a summary of those principles. And, of course, Dr. de Grey recommends his book “Ending Aging,” which covers the strategy in great detail.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. danah boyd on Future of Technology and Social Media

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

8. James Allworth on Disruptive Innovation

9. James Kobielus on Big Data, Cognitive Computing and Future of Product

James Kobielus: Big Data, Cognitive Computing and Future of Product

Editor’s Note: As IBM’s Big Data Evangelist, James Kobielus is IBM Senior Program Director, Product Marketing, Big Data Analytics Solutions. He is an industry veteran, a popular speaker and social media participant, and a thought leader in Big Data, Hadoop, Enterprise Data Warehousing, Advanced Analytics, Business Intelligence, Data Management, and Next Best Action Technologies. He works with IBM’s product management and marketing teams in Big Data. He has spoken at such leading industry events as IBM Information On Demand, IBM Big Data Integration and Governance, Hadoop Summit, Strata, and Forrester Business Process Forum. He has published several business technology books and is a very popular provider of original commentary on blogs and many social media channels.

To learn more about his research, work, ideas and commentary, please check out his writing online.

eTalk’s Niaz Uddin recently interviewed James Kobielus to gain insights into his ideas, research and work in the field of Big Data. The interview is given below.

Niaz: Dear James, thank you so much for joining us in the midst of your busy schedule. We are very thrilled and honored to have you at eTalks.

James: And I’m thrilled and honored that you asked me.

Niaz: You are a leading expert on Big Data, as well as on such enabling technologies as enterprise data warehousing, advanced analytics, Hadoop, cloud services, database management systems, business process management, business intelligence, and complex-event processing. At the beginning of our interview can you please tell us about Big Data? How does Big Data make sense of the new world?

James: Big Data refers to approaches for extracting deep value from advanced analytics and trustworthy data at all scales. At the heart of advanced analytics is data mining, which is all about using statistical analysis to find non-obvious patterns (segmentations, correlations, trends, propensities, etc.) within historical data sets.

Some might refer to advanced analytics as tools for “making sense” of this data in ways that are beyond the scope of traditional reporting and visualization. As we aggregate and mine a wider variety of data sources, we can find far more “sense”–also known as “insights”–that previously lay under the surface. Likewise, as we accumulate a larger volume of historical data from these sources and incorporate a wider variety of variables from them into our models, we can build more powerful predictive models of what might happen under various future circumstances. And if we can refresh this data rapidly with high-velocity high-quality feeds, while iterating and refining our models more rapidly, we can ensure that our insights reflect the latest, greatest data and analytics available.

That’s the power of Big Data: achieve more data-driven insights (aka “making sense”) by enabling our decision support tools to leverage the “3 Vs”: a growing Volume of stored data, higher Velocity of data feeds, and broader Variety of data sources.
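To make that concrete, here is a minimal sketch (not an IBM product example) of the data-mining loop described above: fit a propensity model on a historical data set, validate it on a holdout, and then re-score fresher feeds as they arrive. The file names and columns are hypothetical.

```python
# A minimal sketch of mining historical data for a predictive (propensity)
# model and re-scoring a fresh feed. File and column names are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Historical data: one row per customer, behavioral variables plus a known outcome.
history = pd.read_csv("customer_history.csv")
features = ["tenure_months", "monthly_spend", "support_tickets", "logins_last_30d"]
X, y = history[features], history["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Higher velocity simply means re-scoring (and periodically re-fitting) as new data lands.
fresh = pd.read_csv("latest_feed.csv")
fresh["churn_propensity"] = model.predict_proba(fresh[features])[:, 1]
```

Growing volume, velocity and variety change the plumbing around this loop far more than the loop itself.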

Niaz: As you know, Big Data has already started to redefine search, media, computing, social media, products, services and so on. The availability of data is helping us analyze trends and do interesting things in more accurate and efficient ways than before. What are some of the most interesting uses of Big Data out there today?

James: Where do I start? There are interesting uses of Big Data in most industries and in most business functions.

I think cognitive computing applications of Big Data are among the most transformative tools in modern business.

Cognitive computing is a term that probably goes over the head of most of the general public. IBM defines it as the ability of automated systems to learn and interact naturally with people to extend what either man or machine could do on their own, thereby helping human experts drill through big data rapidly to make better decisions.

One way I like to describe cognitive computing is as the engine behind “conversational optimization.” In this context, the “cognition” that drives the “conversation” is powered by big data, advanced analytics, machine learning and agile systems of engagement. Rather than rely on programs that predetermine every answer or action needed to perform a function or set of tasks, cognitive computing leverages artificial intelligence and machine learning algorithms that sense, predict, infer and, if they drive machine-to-human dialogues, converse.

Cognitive computing performance improves over time as systems build knowledge and learn a domain’s language and terminology, its processes and its preferred methods of interacting. This is why it’s such a powerful conversation optimizer. The best conversations are deep in give and take, questioning and answering, tackling topics of keenest interest to the conversants. When one or more parties has deep knowledge and can retrieve it instantaneously within the stream of the moment, the conversation quickly blossoms into a more perfect melding of minds. That’s why it has been deployed into applications in healthcare, banking, education and retail that build domain expertise and require human-friendly interaction models.

IBM Watson is one of the most famous exemplars of the power of cognitive computing driving agile human-machine conversations.  In its famous “Jeopardy!” appearance, Watson illustrated how its Deep Question and Answer technology—which is cognitive computing to the core—can revolutionize the sort of highly patterned “conversation” characteristic of a TV quiz show. By having its Deep Q&A results rendered (for the sake of that broadcast) in a synthesized human voice, Watson demonstrated how it could pass (and surpass) any Turing test that tried to tell whether it was a computer rather than, say, Ken Jennings. After all, the Turing test is conversational at its very core.

What’s powering Watson’s Deep Q&A technology is an architecture that supports an intelligent system of engagement. Such an architecture is able to mimic real human conversation, in which the dialogue spans a broad, open domain of subject matter; uses natural human language; is able to process complex language with a high degree of accuracy, precision and nuance; and operates with speed-of-thought fluidity.

Where the “Jeopardy!” conversational test was concerned (and where the other participants were humans literally at the top of that game), Watson was super-optimized. However, in the real-world of natural human conversation, the notion of “conversation optimization” might seem, at first glance, like a pointy-headed pipedream par excellence. However, you don’t have to be an academic sociologist to realize that society, cultures and situational contexts impose many expectations, constraints and other rules to which our conversations and actions must conform (or face disapproval, ostracism, or worse). Optimizing our conversations is critical to surviving and thriving in human society.

Wouldn’t it be great to have a Watson-like Deep Q&A adviser to help us understand the devastating faux pas to avoid and the right bon mot to drop into any conversation while we’re in the thick of it? That’s my personal dream and I’ll bet that before long, with mobile and social coming into everything, it will be quite feasible (no, this is not a product announcement—just the dream of one IBMer). But what excites me even more (and is definitely not a personal pipedream), is IBM Watson Engagement Advisor, which we unveiled earlier this year. It is a cognitive-computing assistant that revolutionizes what’s possible in multichannel B2C conversations. The  solution’s “Ask Watson” feature uses Deep Q&A to greet customers, conduct contextual conversations on diverse topics, and ensure that the overall engagement is rich with answers, guidance and assistance.

Cognitive/conversational computing is also applicable to “next best action,” which is one of today’s hottest new focus areas in intelligent systems. At its heart, next best action refers to an intelligent infrastructure that optimizes agile engagements across many customer-facing channels, including portal, call center, point of sales, e-mail and social. With cognitive-computing infrastructure the silent assistant, customers engage in a never-ending whirligig of conversations with humans and, increasingly, with automated bots, recommendation engines and other non-human components that, to varying degrees, mimic real-human conversation.

Niaz: So do you think machine learning is the right way to analyze Big Data?

James: Machine learning is an important approach for extracting fresh insights from unstructured data in an automated fashion, but it’s not the only approach. For example, machine learning doesn’t eliminate the need for data scientists to build segmentation, regression, propensity, and other models for data mining and predictive analytics.

Fundamentally, machine learning is a productivity tool for data scientists, helping them to get smarter, just as machine learning algorithms can’t get smarter without some ongoing training by data scientists. Machine learning allows data scientists to train a model on an example data set, and then leverage algorithms that automatically generalize and learn both from that example and from fresh feeds of data. To varying degrees, you’ll see the terms “unsupervised learning,” “deep learning,” “computational learning,” “cognitive computing,” “machine perception,” “pattern recognition,” and “artificial intelligence” used in this same general context.
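As a small illustration of that automated generalization, the sketch below clusters a few placeholder text snippets with no hand-written rules; the same pattern, scaled up, is how unstructured feeds get distilled into segments. It is a generic scikit-learn example, not a description of any specific IBM workflow.

```python
# A minimal sketch of unsupervised pattern discovery in unstructured text:
# vectorize the documents, then let k-means group them without labeled examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "battery drains too fast after the update",
    "love the new camera, photos look great",
    "phone gets hot and the battery dies quickly",
    "camera quality is amazing in low light",
]

vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, doc in zip(labels, documents):
    print(label, doc)  # ideally, battery complaints and camera praise land in different clusters
```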

Machine learning doesn’t mean that the resultant learning is always superior to what human analysts might have achieved through more manual knowledge-discovery techniques. But you don’t need to believe that machines can think better than or as well as humans to see the value of machine learning. We gladly offload many cognitive processes to automated systems where there just aren’t enough flesh-and-blood humans to exercise their highly evolved brains on various analytics tasks.

Niaz: What are the available technologies out there that help profoundly in analyzing data? Can you please briefly tell us about Big Data technologies and their important uses?

James: Once again, it’s a matter of “where do I start?” The range of Big Data analytics technologies is wide and growing rapidly. We live in the golden age of database and analytics innovation. Their uses are everywhere: in every industry, every business function, and every business process, both back-office and customer-facing.

For starters, Big Data is much more than Hadoop. Another big data “H”—hybrid—is becoming dominant, and Hadoop is an important (but not all-encompassing) component of it. In the larger evolutionary perspective, big data is evolving into a hybridized paradigm under which Hadoop, massively parallel processing enterprise data warehouses, in-memory columnar, stream computing, NoSQL, document databases, and other approaches support extreme analytics in the cloud.

Hybrid architectures address the heterogeneous reality of big data environments and respond to the need to incorporate both established and new analytic database approaches into a common architecture. The fundamental principle of hybrid architectures is that each constituent big data platform is fit-for-purpose to the role for which it’s best suited. These big data deployment roles may include any or all of the following: data acquisition, collection, transformation, movement, cleansing, staging, sandboxing, modeling, governance, access, delivery, archiving, and interactive exploration. In any role, a fit-for-purpose big data platform often supports specific data sources, workloads, applications, and users.

Hybrid is the future of big data because users increasingly realize that no single type of analytic platform is always best for all requirements. Also, platform churn—plus the heterogeneity it usually produces—will make hybrid architectures more common in big data deployments.

Hybrid deployments are already widespread in many real-world big data deployments. The most typical are the three-tier—also called “hub-and-spoke”—architectures. These environments may have, for example, Hadoop (e.g., IBM InfoSphere BigInsights) in the data acquisition, collection, staging, preprocessing, and transformation layer; relational-based MPP EDWs (e.g., IBM PureData System for Analytics) in the hub/governance layer; and in-memory databases (e.g., IBM Cognos TM1) in the access and interaction layer.

The complexity of hybrid architectures depends on the range of sources, workloads, and applications you’re trying to support. In the back-end staging tier, you might need different preprocessing clusters for each of the disparate sources: structured, semi-structured, and unstructured.

In the hub tier, you may need disparate clusters configured with different underlying data platforms—RDBMS, stream computing, HDFS, HBase, Cassandra, NoSQL, and so on—and corresponding metadata, governance, and in-database execution components.

And in the front-end access tier, you might require various combinations of in-memory, columnar, OLAP, dimensionless, and other database technologies to deliver the requisite performance on diverse analytic applications, ranging from operational BI to advanced analytics and complex event processing.
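A hedged, vendor-neutral sketch of the staging-to-hub hand-off in such a three-tier architecture might look like the following: PySpark cleanses semi-structured events parked in HDFS and loads a conformed aggregate into a relational hub over JDBC. The paths, table name and JDBC endpoint are hypothetical stand-ins rather than references to any particular product.

```python
# A minimal sketch of moving data from a Hadoop/HDFS staging tier to a
# relational hub tier. Paths, table names and credentials are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("staging_to_hub").getOrCreate()

# Staging tier: semi-structured clickstream events landed in HDFS.
raw = spark.read.json("hdfs:///staging/clickstream/2013-10-*")

# Transformation: cleanse and roll up to the grain the hub expects.
daily = (
    raw.filter(F.col("user_id").isNotNull())
       .withColumn("event_date", F.to_date("event_time"))
       .groupBy("event_date", "page")
       .agg(F.count("*").alias("views"),
            F.countDistinct("user_id").alias("visitors"))
)

# Hub tier: append the conformed aggregate to the warehouse over JDBC.
daily.write.jdbc(
    url="jdbc:postgresql://warehouse:5432/analytics",  # stand-in for any MPP EDW
    table="web_traffic_daily",
    mode="append",
    properties={"user": "etl", "password": "***"},
)
```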

Niaz: That’s really amazing. How do you connect these two dots: Big Data Analytics and Cognitive Computing? How does this connection make sense?

James: The relationship between cognitive computing and Big Data is simple. Cognitive computing is an advanced analytic approach that helps humans drill through the unstructured data within Big Data repositories more rapidly in order to see correlations, patterns, and insights sooner.

Think of cognitive computing as a “speed-of-thought accelerator.” Speed of thought is something we like to imagine operates at a single high-velocity setting. But that’s just not the case. Some modes of cognition are painfully slow, such as pondering the bewildering panoply of investment options available under your company’s retirement plan. But some other modes are instantaneous, such as speaking your native language, recognizing an old friend, or sensing when your life may be in danger.

None of this is news to anybody who studies cognitive psychology or has followed advances in artificial intelligence, aka AI, over the past several decades. Different modes of cognition have different styles, speeds, and spheres of application.

When we speak of “cognitive computing,” we’re generally referring to the ability of automated systems to handle the conscious, critical, logical, attentive, reasoning mode of thought that humans engage in when they, say, play “Jeopardy!” or try to master some rigorous academic discipline. This is the “slow” cognition that Nobel-winning psychologist/economist Daniel Kahneman discussed in a recent IBM Colloquium speech.

As anybody who has ever watched an expert at work will attest, this “slow” thinking can move at lightning speed when the master is in his or her element. When a subject-domain specialist is expounding on their field of study, they often move rapidly from one brilliant thought to the next. It’s almost as if these thought-gems automatically flash into their mind without conscious effort.

This is the cognitive agility that Kahneman examined in his speech. He described the ability of humans to build skills, which involves mastering “System 2” cognition (slow, conscious, reasoning-driven) so that it becomes “System 1” (fast, unconscious, action-driven). Not just that, but an expert is able to switch between both modes of thought within the moment when it becomes necessary to rationally ponder some new circumstance that doesn’t match the automated mental template they’ve developed. Kahneman describes System 2 “slow thinking” as well-suited for probability-savvy correlation thinking, whereas System 1 “fast thinking” is geared to deterministic causal thinking.

Kahneman’s “System 2” cognition–slow, rule-centric, and attention-dependent–is well-suited for acceleration and automation on big data platforms such as IBM Watson. After all, a machine can process a huge knowledge corpus, myriad fixed rules, and complex statistical models far faster than any mortal. Just as important, a big-data platform doesn’t have the limited attention span of a human; consequently, it can handle many tasks concurrently without losing its train of thought.

Also, Kahneman’s “System 1” cognition–fast, unconscious, action-driven–is not necessarily something we need to hand to computers alone. We can accelerate it by facilitating data-driven interactive visualization by human beings, at any level of expertise. When a big-data platform drives a self-service business intelligence application such as IBM Cognos, it can help users to accelerate their own “System 1” thinking by enabling them to visualize meaningful patterns in a flash without having to build statistical models, do fancy programming, or indulge in any other “System 2” thought.

And finally, based on those two insights, it’s clear to me that cognitive computing is not simply limited to the Watsons and other big-data platforms of the world. Any well-architected big data, advanced analytics, or business intelligence platform is essentially a cognitive-computing platform. To the extent it uses machines to accelerate the slow “System 2” cognition and/or provides self-service visualization tools to help people speed up their wetware’s “System 1” thinking, it’s a cognitive-computing platform.

Now I will expand upon the official IBM definition of “cognitive computing” to put it in a larger frame of reference. As far as I’m concerned, the core criterion of cognitive computing is whether the system, however architected, has the net effect of speeding up any form of cognition, executing on hardware and/or wetware.

Niaz: How is Big Data Analytics changing the nature of building great products? What do you think about the future of products?

James: That’s a great question, and one I haven’t explored to much extent. My sense is that more “products” are in fact “services”–such as online media, entertainment, and gaming–that, as an integral capability, feed on the Big Data generated by their users. Companies tune the designs, interaction models, and user experiences of these productized services through Big Data analytics. To the extent that users respond or don’t respond to particular features of these services, that will be revealed in the data and will trigger continuous adjustments in product/service design. New features might be added on a probationary basis, to see how users respond, and just as quickly withdrawn or ramped up in importance.

This new product development/refinement loop is often referred to as “real-world experiments.” The process of continuous, iterative, incremental experimentation both generates and depends on a steady feed of Big Data. It also requires data scientists to play a key role in the product-refinement cycle, in partnership with traditional product designers and engineers.  Leading-edge organizations have begun to emphasize real-world experiments as a fundamental best practice within their data-science, next-best-action, and process-optimization initiatives.

Essentially, real-world experiments put the data-science “laboratory” at the heart of the big data economy.  Under this approach, fine-tuning of everything–business model, processes, products, and experiences–becomes a never-ending series of practical experiments. Data scientists evolve into an operational function, running their experiments–often known as “A/B tests”–24×7 with the full support and encouragement of senior business executives.

The beauty of real-world experiments is that you can continuously and surreptitiously test diverse product models inline to your running business. Your data scientists can compare results across differentially controlled scenarios in a systematic, scientific manner. They can use the results of these in-production experiments – such as improvements in response, acceptance, satisfaction, and defect rates on existing products/services–to determine which work best with various customers under various circumstances.
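For readers who want to see what one of these in-production comparisons boils down to numerically, here is a minimal two-proportion z-test on invented conversion counts for two variants; real A/B pipelines add guardrails such as sample-size planning and corrections for repeated peeking.

```python
# A minimal A/B comparison: two variants, observed conversion counts, and a
# two-sided two-proportion z-test. The counts are invented.
from math import sqrt
from scipy.stats import norm

conversions = {"A": 480, "B": 530}   # A = control, B = new feature enabled
visitors = {"A": 10000, "B": 10000}

p_a = conversions["A"] / visitors["A"]
p_b = conversions["B"] / visitors["B"]
p_pool = (conversions["A"] + conversions["B"]) / (visitors["A"] + visitors["B"])

se = sqrt(p_pool * (1 - p_pool) * (1 / visitors["A"] + 1 / visitors["B"]))
z = (p_b - p_a) / se
p_value = 2 * (1 - norm.cdf(abs(z)))

print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}  p = {p_value:.3f}")
# A small p-value suggests the lift is probably not noise; otherwise keep
# collecting data, or withdraw the feature.
```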

Niaz: What is a big data product? How can someone make beautiful stuff with data?

James: What is a Big Data product? It’s any product or service that helps people to extract deep value from advanced analytics and trustworthy data at all scales, but especially at the extreme scales of volume (petabytes and beyond), velocity (continuous, streaming, real-time, low-latency), and/or variety (structured, semi-structured, unstructured, streaming, etc.). That definition encompasses products that provide the underlying data storage, database management, algorithms, metadata, modeling, visualization, integration, governance, security, management, and other necessary features to address these use cases. If you track back to my answer above relevant to “hybrid” architectures you’ll see a discussion of some of the core technologies.

Making “beautiful stuff with data”? That suggests advanced visualization to call out the key insights in the data. The best data visualizations provide functional beauty: they make the process of sifting through data easier, more pleasant, and more productive for end users, business analysts, and data scientists.
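As a small, hypothetical example of that functional beauty, the sketch below plots nothing fancier than a weekly sentiment share with one labeled annotation, which is often all a decision-maker needs; the figures are invented.

```python
# A minimal "functional beauty" chart: a clearly labeled trend line with one
# annotated event. All numbers are invented.
import matplotlib.pyplot as plt

weeks = ["W1", "W2", "W3", "W4", "W5", "W6"]
positive_share = [0.42, 0.45, 0.44, 0.51, 0.58, 0.61]  # share of positive mentions

fig, ax = plt.subplots(figsize=(7, 3.5))
ax.plot(weeks, positive_share, marker="o")
ax.axvline(3, linestyle="--", color="gray")  # position 3 corresponds to W4
ax.annotate("product update shipped", xy=(3, 0.51), xytext=(3.2, 0.44))
ax.set_ylabel("Positive sentiment share")
ax.set_ylim(0, 1)
ax.set_title("Customer sentiment, weekly")
plt.tight_layout()
plt.show()
```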

Niaz: Can you please tell us about building a data-driven culture that fosters data-driven innovation to build the next big product?

James: A key element of any data-driven culture is establishing a data science center of excellence. Data scientists are the core developers in this new era of Big Data, advanced analytics, and cognitive computing.

Game-changing analytics applications don’t spring spontaneously from bare earth. You must plant the seeds through continuing investments in applied data science and, of course, in the big data analytics platforms and tools that bring it all to fruition. But you’ll be tilling infertile soil if you don’t invest in sustaining a data science center of excellence within your company. Applied data science is all about putting the people who drill the data in constant touch with those who understand the applications. In spite of the mythology surrounding geniuses who produce brilliance in splendid isolation, smart people really do need each other. Mutual stimulation and support are critical to the creative process, and science, in any form, is a restlessly creative exercise.

In establishing a center of excellence, you may go the formal or informal route. The formal approach is to institute an ongoing process for data-science collaboration, education, and information sharing. As such, the core function of your center of excellence might be to bridge heretofore siloed data-science disciplines that need to engage more effectively. The informal path is to encourage data scientists to engage with each other using whatever established collaboration tools, communities, and confabs your enterprise already has in place. This is the model under which centers of excellence coalesce organically from ongoing conversations.

Creeping polarization, like general apathy, will kill your data science center of excellence if you don’t watch out. Don’t let the center of excellence, formal or informal, degenerate into warring camps of analytics professionals trying to hard-sell their pet approaches as the one true religion. Centers of excellence must serve as a bridge, not a barrier, for communication, collegiality, and productivity in applied data science.

Niaz: As you know, leaders and managers have always been challenged to get the right information to make good decisions. Now, with the digital revolution and technological advancement, they have opportunities to access huge amounts of data. How will this trend change management practice? What do you think about the future of decision making, strategy and running organizations?

James: Business agility is paramount in a turbulent world. Big Data is changing the way that management responds to–and gets ahead of–changes in their markets, competitive landscape, and operational conditions.

Increasingly, I prefer to think of big data in the broader context of business agility. What’s most important is that your data platform has the agility to operate cost-effectively at any scale, speed, and scope of business that your circumstances demand.

In terms of scale of business, organizations operate at every scale from breathtakingly global to intensely personal. You should be able to acquire a low-volume data platform and modularly scale it out to any storage, processing, memory and I/O capacity you may need in the future. Your platform should elastically scale up and down as requirements oscillate. Your end-to-end infrastructure should also be able to incorporate platforms of diverse scales—petabyte, terabyte, gigabyte, etc.—with those platforms specialized to particular functions and all of them interoperating in a common fabric.

Where speed is concerned, businesses often have to keep pace with stop-and-start rhythms that oscillate between lightning fast and painfully slow. You should be able to acquire a low-velocity data platform and modularly accelerate it through incorporation of faster software, faster processors, faster disks, faster cache and more DRAM as your need for speed grows. You should be able to integrate your data platform with a stream computing platform for true real-time ingest, processing and delivery. And your platform should also support concurrent processing of diverse latencies, from batch to streaming, within a common fabric.

And on the matter of scope, businesses manage almost every type of human need, interaction and institution. You should be able to acquire a low-variety data platform—perhaps a RDBMS dedicated to marketing—and be able to evolve it as needs emerge into a multifunctional system of record supporting all business functions. Your data platform should have the agility to enable speedy inclusion of a growing variety of data types from diverse sources. It should have the flexibility to handle structured and unstructured data, as well as events, images, video, audio and streaming media with equal agility. It should be able to process the full range of data management, analytics and content management workloads. It should serve the full scope of users, devices and downstream applications.

Agile Big Data platforms can serve as the common foundation for all of your data requirements. Because, after all, you shouldn’t have to go big, fast, or all-embracing in your data platforms until you’re good and ready.

Niaz: In your opinion, given the current available Big Data technologies, what is the most difficult challenge in filtering big data to find useful information?

James: The most difficult challenge is in figuring out which data to ignore, and which data is trustworthy enough to serve as a basis for downstream decision-support and advanced analytics.

Most important, don’t always trust the “customer sentiment” that your social-media listening tools report as if it were gospel. Yes, you care deeply about how your customers regard your company, your products, and your quality of service. You may be listening to social media to track how your customers—collectively and individually—are voicing their feelings. But do you bother to save and scrutinize every last tweet, Facebook status update, and other social utterance from each of your customers? And if you are somehow storing and analyzing that data—which is highly unlikely—are you linking the relevant bits of stored sentiment data to each customer’s official record in your databases?

If you are, you may be the only organization on the face of the earth that makes the effort. Many organizations implement tight governance only on those official systems of record on which business operations critically depend, such as customers, finances, employees, products, and so forth. For those data domains, data management organizations that are optimally run have stewards with operational responsibility for data quality, master data management, and information lifecycle management.

However, for many big data sources that have emerged recently, such stewardship is neither standard practice nor should it be routine for many new subject-matter data domains. These new domains refer to mainly unstructured data that you may be processing in your Hadoop clusters, stream-computing environments, and other big data platforms, such as social, event, sensor, clickstream, geospatial, and so on.

The key difference from system-of-record data is that many of the new domains are disposable to varying degrees and are not regarded as a single version of the truth about some real-world entity. Instead, data scientists and machine learning algorithms typically distill the unstructured feeds for patterns and subsequently discard the acquired source data, which quickly become too voluminous to retain cost-effectively anyway. Consequently, you probably won’t need to apply much, if any, governance and security to many of the recent sources.

Where social data is concerned, there are several reasons for going easy on data quality and governance. First of all, data quality requirements stem from the need for an officially sanctioned single version of the truth. But any individual social media message constituting the truth of how any specific customer or prospect feels about you is highly implausible. After all, people prevaricate, mislead, and exaggerate in every possible social context, and not surprisingly they convey the same equivocation in their tweets and other social media remarks. If you imagine that the social streams you’re filtering are rich founts of only honest sentiment, you’re unfortunately mistaken.

Second, social sentiment data rarely has the definitive, authoritative quality of an attribute—name, address, phone number—that you would include in or link to a customer record. In other words, few customers declare their feelings about brands and products in the form of tweets or Facebook updates that represent their semiofficial opinion on the topic. Even when people are bluntly voicing their opinions, the clarity of their statements is often hedged by the limitations of most natural human language. Every one of us, no matter how well educated, speaks in sentences that are full of ambiguity, vagueness, situational context, sarcasm, elliptical speech, and other linguistic complexities that may obscure the full truth of what we’re trying to say. Even highly powerful computational linguistic algorithms are challenged when wrestling these and other peculiarities down to crisp semantics.

Third, even if every tweet was the gospel truth about how a customer is feeling and all customers were amazingly articulate on all occasions, the quality of social sentiment usually emerges from the aggregate. In other words, the quality of social data lies in the usefulness of the correlations, trends, and other patterns you derive from it. Although individual data points can be of marginal value in isolation, they can be quite useful when pieced into a larger puzzle.

Consequently, there is little incremental business value from scrutinizing, retaining, and otherwise managing every single piece of social media data that you acquire. Typically, data scientists drill into it to distill key patterns, trends, and root causes, and you would probably purge most of it once it has served its core tactical purpose. This process generally takes a fair amount of mining, slicing, and dicing. Many social-listening tools, including the IBM® Cognos® Consumer Insight application, are geared to assessing and visualizing the trends, outliers, and other patterns in social sentiment. You don’t need to retain every single thing that your customers put on social media to extract the core intelligence that you seek, as in the following questions: Do they like us? How intensely? Is their positive sentiment improving over time? In fact, doing so might be regarded as encroaching on privacy, so purging most of that data once you’ve gleaned the broader patterns is advised.
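A minimal sketch of that aggregate-then-purge pattern follows: roll raw mentions up into a weekly trend that answers the questions above, then discard the underlying messages on a short retention schedule. The file name and schema are hypothetical.

```python
# A minimal roll-up of individual social mentions into a weekly sentiment
# trend. The file name and columns are hypothetical.
import pandas as pd

# Expected columns: posted_at (timestamp), channel, sentiment_score in [-1, 1]
messages = pd.read_csv("social_mentions.csv", parse_dates=["posted_at"])

weekly = (
    messages.set_index("posted_at")
            .resample("W")["sentiment_score"]
            .agg(["mean", "count"])
            .rename(columns={"mean": "avg_sentiment", "count": "mentions"})
)
print(weekly.tail())

# Once the roll-up is persisted, the raw messages can be purged on a short
# retention schedule, which also limits privacy exposure.
```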

Fourth, even outright customer lies propagated through social media can be valuable intelligence if we vet and analyze each effectively. After all, it’s useful knowing whether people’s words—”we love your product”—match their intentions—”we have absolutely no plans to ever buy your product”—as revealed through their eventual behavior—for example, buying your competitor’s product instead.

If we stay hip to this quirk of human nature, we can apply the appropriate predictive weights to behavioral models that rely heavily on verbal evidence, such as tweets, logs of interactions with call-center agents, and responses to satisfaction surveys. I like to think of these weights as a truthiness metric, courtesy of Stephen Colbert.

What we can learn from social sentiment data of dubious quality is the situational contexts in which some customer segments are likely to be telling the truth about their deep intentions. We can also identify the channels in which they prefer to reveal those truths. This process helps determine which sources of customer sentiment data to prioritize and which to ignore in various application contexts.

Last but not least, apply only strong governance to data that has a material impact on how you engage with customers, remembering that social data rarely meets that criterion. Customer records contain the key that determines how you target pitches to them, how you bill them, where you ship their purchases, and so forth. For these purposes, the accuracy, currency, and completeness of customers’ names, addresses, billing information, and other profile data are far more important than what they tweeted about the salesclerk in your Poughkeepsie branch last Tuesday. If you screw up the customer records, the adverse consequences for all concerned are far worse than if you misconstrue their sentiment about your new product as slightly positive, when in fact it’s deeply negative.

However, if you greatly misinterpret an aggregated pattern of customer sentiment, the business risks can be considerable. Customers’ aggregate social data helps you compile a comprehensive portrait of the behavioral tendencies and predispositions of various population segments. This compilation is essential market research that helps gauge whether many high-stakes business initiatives are likely to succeed. For example, you don’t want to invest in an expensive promotional campaign if your target demographic isn’t likely to back up their half-hearted statement that your new product is “interesting” by whipping out their wallets at the point of sale.

The extent to which you can speak about the quality of social sentiment data all comes down to relevance. Sentiment data is good only if it is relevant to some business initiative, such as marketing campaign planning or brand monitoring. It is also useful only if it gives you an acceptable picture of how customers are feeling and how they might behave under various future scenarios. Relevance means having sufficient customer sentiment intelligence, in spite of underlying data quality issues, to support whatever business challenge confronts you.

Niaz: How do you see data science evolving in the near future?

James: In the near future, many business analysts will enroll in data science training curricula to beef up their statistical analysis and modeling skills in order to stay relevant in this new age.

However, they will confront a formidable learning curve. To be an effective, well-rounded data scientist, you will need a degree, or something substantially like it, to prove you’re committed to this career. You will need to submit yourself to a structured curriculum to certify you’ve spent the time, money and midnight oil necessary for mastering this demanding discipline.

Sure, there are run-of-the-mill degrees in data-science-related fields, and then there are uppercase, boldface, bragging-rights “DEGREES.” To some extent, it matters whether you get that old data-science sheepskin from a traditional university vs. an online school vs. a vendor-sponsored learning program. And it matters whether you only logged a year in the classroom vs. sacrificed a considerable portion of your life reaching for the golden ring of a Ph.D. And it certainly matters whether you simply skimmed the surface of old-school data science vs. pursued a deep specialization in a leading-edge advanced analytic discipline.

But what matters most to modern business isn’t that every data scientist has a big honking doctorate. What matters most is that a substantial body of personnel has a common grounding in core curriculum of skills, tools and approaches. Ideally, you want to build a team where diverse specialists with a shared foundation can collaborate productively.

Big data initiatives thrive if all data scientists have been trained and certified on a curriculum with the following foundation: paradigms and practices, algorithms and modeling, tools and platforms, and applications and outcomes.

Classroom instruction is important, but a data-science curriculum that is 100 percent devoted to reading books, taking tests and sitting through lectures is insufficient. Hands-on laboratory work is paramount for a truly well-rounded data scientist. Make sure that your data scientists acquire certifications and degrees that reflect them actually developing statistical models that use real data and address substantive business issues.

A business-oriented data-science curriculum should produce expert developers of statistical and predictive models. It should not degenerate into a program that produces analytics geeks with heads stuffed with theory but whose diplomas are only fit for hanging on the wall.

Niaz: We have already seen the huge implications and remarkable results of Big Data from tech giants. Do you think Big Data can also have a great role in solving social problems? Can we measure and connect all of our big and important social problems and design sustainable solutions with the help of Big Data?

James: Of course. Big Data is already being used worldwide to address the most pressing problems confronting humanity on this planet. In terms of “measuring and connecting all our big and important social problems and designing sustainable solutions,” that’s a matter for collective human ingenuity. Big Data is a tool, not a panacea.

Niaz: Can you please tell us about ‘Open Source Analytics’ for Big Data? What are the initiatives regarding open source that IBM’s Big Data group and other groups (startups) have undertaken or are planning?

James: The principal open-source communities in the big data analytics industry are Apache Hadoop and R. IBM is an avid participant in both communities, and has incorporated these technologies into our solution portfolio.

Niaz: What are some of the concerns (privacy, security, regulation) that you think can dampen the promise of Big Data?

James: You’ve named three of them. Overall, businesses should embrace the concept of “privacy by design” – a systematic approach that takes privacy into account from the start – instead of trying to add protection after the fact. In addition, the sheer complexity of the technology and the learning curve of the technologies are a barrier to realizing their full promise. All of these factors introduce time, cost, and risk into the Big Data ROI equation.

Niaz: What are the new technologies you are mostly passionate about? What are going to be the next big things?

James: Where to start? I prefer that your readers follow my IBM Big Data Hub blog to see the latest things I’m passionate about.

Niaz: Last but not least, what is your advice for Big Data startups and for the people who are working with Big Data?

James: Find your niche in the Big Data analytics industry ecosystem, go deep, and deliver innovation. It’s a big, growing, exciting industry. Brace yourself for constant change. Be prepared to learn (and unlearn) something new every day.

Niaz: Dear James, thank you very much for your invaluable time and also for sharing your incredible ideas, insights, knowledge and experiences with us. We wish you very good luck in all of your upcoming endeavors.

_  _  _  _  ___  _  _  _  _

Ely Kahn: Big Data, Startup and Entrepreneurship

Editor’s Note: Ely Kahn is the Co-founder and VP of Business Development for Sqrrl, a Big Data startup. Previously, Ely served in a variety of positions in the Federal Government, including Director of Cybersecurity at the National Security Staff in the White House, Deputy Chief of Staff at the National Protection Programs Directorate in the Department of Homeland Security, and Director of Risk Management and Strategic Innovation in the Transportation Security Administration. Before his service in the Federal Government, Ely was a management consultant with Booz Allen Hamilton. Ely has a BA from Harvard University and an MBA from the Wharton School at the University of Pennsylvania.

You can find him on Twitter and LinkedIn. Learn more about his Big Data startup Sqrrl here.

eTalk’s Niaz Uddin recently interviewed Ely Kahn to gain insights about Big Data, startups and entrepreneurship. The interview is given below.

Niaz: Dear Ely, thank you so much for joining us in the midst of your busy schedule. We are very thrilled to have you at eTalks.

Ely: My pleasure. Thank you for having me.

Niaz: You’re a former management consultant and senior government official who turned Big Data Entrepreneur. At the beginning of our interview, can you please tell us something about entrepreneurship? What is entrepreneurship? Why are you an entrepreneur?

Ely: While in government, I viewed myself as an “intrapreneur”, and I focused on developing new public sector programs that could disrupt traditional ways of doing business.  Moving to private sector entrepreneurship was a natural evolution for me.  Entrepreneurship takes all different forms, but the type of entrepreneurship that is most interesting to me is modeled around Clayton Christensen’s theory of “Disruptive Innovation.”

Niaz: You have a BA from Harvard University and an MBA from the Wharton School at the University of Pennsylvania. You have served in a variety of positions in the Federal Government, and before your service in the Federal Government you were a management consultant with Booz Allen Hamilton. How have you transformed your career into entrepreneurship, and why? What’s the most exciting thing about entrepreneurship to you?

Ely: Innovation has been a key theme in all my jobs so far and cuts across consulting, government, and startups.  However, business school was actually an incredibly valuable tool for making the transition from government to a technology startup.  More than anything, it was two years that allowed me to explore different startup ideas in a very low risk environment.

The most exciting thing about entrepreneurship for me is the continuous learning environment.  Every week it seems I am picking up something new across a wide variety of functional areas, including sales, marketing, business development, product management, and finance.

Niaz: You’re the Co-founder and VP of Business Development of Sqrrl, a Big Data company. How did the idea for Sqrrl come up, and how did you get started?

Ely: Sqrrl’s technology has its roots in the National Security Agency (NSA) and that technology is called Accumulo.  Accumulo powers many of NSA’s analytic programs.  I was introduced to the NSA engineers that helped create Accumulo while I was in business school, and from there I started to put together the business plan and investor pitch to commercialize Accumulo.

Niaz: At this point, can you please tell us a bit about funding? Who are the core investors in Sqrrl?

Ely: We have two world-class investors:  Atlas Venture and Matrix Partners.  We closed a $2M seed round with them in August 2012.

Niaz: So everything you are doing at Sqrrl is all about Big Data and Big Data products. Can you please tell us what Big Data is?

Ely: Big Data generally refers to data that cannot be processed using traditional database technologies because of its volume, velocity, and variety. Big Data typically includes tera- and petabytes of structured, semi-structured, and unstructured data; examples are sensor data, social media, clickstreams, and log files.

Niaz: Why do you think Big Data is the next big opportunity for all of us?

Ely: Big Data technologies like Hadoop and Accumulo enable companies to analyze datasets that were previously too expensive or burdensome to process.  This analysis can become new forms of competitive advantage or can open up completely new lines of business.

Niaz: How do you define Big Data Product? Can you please give us some examples of Big Data products?

Ely: Big Data products span a wide range of technologies, including storage, databases, analytical tools, and visualization platforms.  Two classes of Big Data technologies that are of particular importance are Hadoop vendors and NoSQL database vendors.  Hadoop + NoSQL enable organizations to process petabytes of multi-structured data in real-time.
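To make that concrete, here is a toy sketch in plain Python of the map/shuffle/reduce pattern that Hadoop popularized – counting page views per URL in raw log lines. It illustrates the programming model only; it is not Hadoop’s actual API and not anything Sqrrl ships, and the log format is invented.

```python
from collections import defaultdict

log_lines = [
    "2013-06-01T10:00:01 GET /home 200",
    "2013-06-01T10:00:02 GET /pricing 200",
    "2013-06-01T10:00:03 GET /home 404",
]

def map_phase(line):
    # Emit (key, value) pairs: one count per URL seen in a log line.
    _, _, url, _ = line.split()
    yield url, 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Combine all values for a key into a single result.
    return key, sum(values)

mapped = (pair for line in log_lines for pair in map_phase(line))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'/home': 2, '/pricing': 1}
```

In a real Hadoop or NoSQL deployment the map and reduce steps run in parallel across many machines; the toy version only shows why the model scales so naturally.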

Niaz: How will Big Data products change the perception of building products?

Ely: Many Big Data products are still “crossing the chasm” from early adopters to mainstream users.  However, these products have the potential to bring the power of massive parallel computing to many companies.  Historically, these types of capabilities have been the domain of massive web companies like Google and Facebook or large government agencies like the NSA.

Niaz: Now can you please briefly tell us about Sqrrl?

Ely: Sqrrl is the provider of a Big Data platform that powers secure, real-time applications.  Our technology leverages both Apache Hadoop and Apache Accumulo, which are open source software technologies.

Niaz: What are your core products and who are the main customers of Sqrrl?

Ely: Our technology offering is called Sqrrl Enterprise and it enables organizations to securely bring their data together on a single platform and easily build real-time applications that leverage this data.  Some of the use cases for Sqrrl Enterprise include serving as the platform for applications that detect insider threats in financial services companies or serving as the platform for predictive medicine in healthcare companies.

Niaz: You started in August 2012. How’s the company doing now?

Ely: The company is doing great.  We now have about 20 employees and a number of customers in a variety of industries.

Niaz: What is your vision at Sqrrl?

Ely: Our vision is to enable organizations to “securely analyze everything.” Our Big Data platform helps organizations perform analytics on massive amounts of data, and oftentimes this data has very strict privacy or security requirements attached to it.

Niaz: How big is the Big Data industry?

Ely: According to the analyst firm Wikibon “the Big Data market is projected to reach $18.1 billion in 2013… [and] on pace to exceed $47 billion by 2017.”

Niaz: What do you think about the other Big Data startups? How’s the Big Data community doing?

Ely: There is an amazing ecosystem of Big Data startups that are doing some incredibly innovative things. I am paying particularly close attention to startups focused on machine learning and data visualization, as these are complementary areas to our product.

Niaz: Well, we all know that starting a company is not an easy task. So, can you please paint a picture of the difficulties we may face when starting a company?

Ely: The thing that is fascinating about doing a startup is that there is a never ending series of challenges:  raising funding, hiring, finding product-market fit, customer acquisition and retention, and the list goes on.  The key is to be continuously prioritizing where to spend your time.

Niaz: What have you learned by starting a company?

Ely: I have learned many things, but the lesson that I am continuously learning is to be resilient.  Startups are inevitably filled with small failures, but the key is to quickly learn from them to avoid any large failures.

Niaz: What are the mistakes an entrepreneur can make in the early stage?

Ely: I think the biggest mistake that an entrepreneur can make is being afraid to make mistakes.  Early stage entrepreneurs need to be continuously running experiments to find product-market fit.

Niaz: Can you please share some of your life lessons for our readers?

Ely: Stay humble.  Entrepreneurship requires both luck and skill, and I think people sometimes mistake luck for skill.

Niaz: Thank you so much for joining us and sharing your invaluable ideas, insights and knowledge. We wish you very good luck and continued success for Sqrrl.

Ely: Many thanks.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. James Allworth on Disruptive Innovation

2. Viktor Mayer-Schönberger on Big Data Revolution

3. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

4. Brian Keegan on Big Data

5. Irving Wladawsky-Berger on Evolution of Technology and Innovation

Gerd Leonhard: Big Data and the Future of Media

Editor’s Note: Gerd Leonhard is a well-known Futurist and the author of five books, a highly influential Keynote Speaker, Think-Tank Leader & Adviser, and – since late 2011 – the Founder of GreenFuturists.com. The Wall Street Journal called him ‘one of the leading Media Futurists in the World’. He is well known as the co-author of the influential book ‘The Future of Music’ (Berklee Press, 2005), and as the author of ‘The End of Control’ (2007), ‘Music 2.0’ (2008), ‘Friction is Fiction’ (2009, Lulu Publishing), and ‘The Future of Content’ (Kindle-only, 2011). His new book is “From Ego to Eco”.

Gerd is a fellow of the Royal Society for the Arts (London), a member of the World Future Society, and a visiting professor at the Fundação Dom Cabral in Brazil. A native German, he now resides in Basel, Switzerland.

You can read his bio and learn more about his works from here, here, here, here and here.

eTalk’s Niaz Uddin has interviewed Gerd Leonhard recently to gain his ideas and insights about  Big Data and the Future of Media, Marketing and Technology which is given below.

Niaz: Dear Gerd, thanks for joining us in the midst of your busy schedule. We are honored and thrilled to have you at eTalks.

Gerd: My pleasure.

Niaz: At the beginning of our interview, can you please tell us about Big Data? How is Big Data revolutionizing our life and work?

Gerd: I define Big Data as the result of exponentially increasing velocity, variety, volume, virality and value.  All of us are generating increasingly large amounts of data, whether it’s by using Google, sharing a location, rating a site, tweeting, facebooking, uploading photos, etc. – and this is primarily driven by the Social-Local-Mobile (SoLoMo) revolution on the Internet. Once this data can be harnessed – and safeguarded, i.e. refined and permitted – it will allow for substantial cost savings (such as with networked public cars, logistics, p2p energy, etc.) as well as for pretty dramatic new value, such as prediction and anticipation offerings like Google Now and Siri.

Niaz: Why does Big Data excite you most?

Gerd: The possibility of generating real user value from the raw data, i.e. faster understanding of complex issues, realtime, customized news and content feeds, and overall dramatically improved digital content experiences.  The downside – as we have just recently discovered (see my post on this) – is that we may all become permanently naked and subject to whatever data obsessions governments may pursue.

Niaz: What do you think about Big Data Products?

Gerd: They are more like services, platforms and experiences than they are products – but we are still very nascent with this; kind of like the beginning of Search, 10 or so years ago.  Every major technology company, every internet portal and every media company is now diving into Big Data as the next big thing – and this connects to Social Media of course, and to the Internet of Things (IoT).

Niaz: How is the future of products (Big Data products) going to change?

Gerd:  Big Data, unlike Big Oil, will be all about ecosystems, about creating win-win-win solutions, about interdependence and mutual respect i.e. permission and trust. Unless we have that worked out, it will fail.

Niaz:  As you know, Big Data has started revolutionizing almost everything. Marketing is changing significantly. Can you please tell us about the impact of Big Data in Marketing?

Gerd: Basically, IF users allow marketers to track them, i.e. if there is a ‘like’ relationship, then big data feeds are a goldmine for marketers – everything will be 99% trackable, customized and personalized.  Again, IF the value is there for the users, this is a dream come true for marketers. The main focus will be on securing and maintaining TRUST – which is why the PRISM debacle is such an issue.

Niaz: What do you think about the future of Marketing?

Gerd: We won’t need Marketing as we know it. It will all be about sense-making, curation, experiences, added values, timeliness and conversations (see my HBR piece).

Niaz: If you go to buy food, soda or any household goods, you will have many good alternatives. Looking more broadly, if you go to buy a smartphone, a computer or even a car, you will have many good alternatives too. But living in such an exciting era, we have only one good search engine, only one good microblogging site, only one good social network and only one good professional network. Are we going through a crisis?

Gerd: I think we have a multitude of platforms and services – innovation is moving much too fast!

Niaz: How can we recover from this crisis? What is the future of this trend?

Gerd: The only way forward is to create some kind of ‘sustainable capitalism’ based on hyper-collaboration and new, interdependent ecosystems of money, media, energy, food and data (see ‘From Ego to Eco’).

Niaz: As you know, as long as we are watching ads on social media or the web, we are no longer human beings; we become products. Social media companies are making billions of dollars, but they are not making their consumers wealthy and not even enriching the lives of consumers. What are the core problems of our social media?

Gerd: As the saying goes: if you don’t pay, you ARE the product. This is not per se a problem – unless we lose control of our bargains. Too much, too fast, too deep can become a real problem for the human brain, so… de-teching will increase as well.

Niaz: How do you see the world of social media evolving over the next 10 years?

Gerd: All of the web is social, mobile, local – there will be no difference between online and offline in less than 7 years, and the same goes for ‘social’.

Niaz: By this time, Google has become gigantic. It controls almost all of the information available on the Internet. It shows us the information it wants to. Many people have already started to believe that Google is going to control the whole world. We have also seen Google use its monopoly power. Google’s search algorithms “decide” what is relevant and valuable. What do you think about Google’s monopoly? What could be better for the whole world?

Gerd: If Google behaves like a monolith and stops earning our trust, it will die very quickly, as we will feel betrayed – this is why WE control these big web companies, in the end.

Niaz: Can you please tell us about real-time approach?

Gerd: Everything is going real-time because of mobile internet, cameras, social media, big data — in many ways a torrent of noise, in other ways a treasure trove. We will need better filters and curators.

Niaz: What are the impacts of real-time approach in everything we do now?

Gerd: Basically, if it’s not realtime, we won’t care.

Niaz: Dear Gerd, we are really grateful to you for giving us your time and sharing priceless ideas, insights and experience with the eTalks community. We wish you very good luck with all of your upcoming endeavors.

Gerd: Thank you Niaz.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. James Allworth on Disruptive Innovation

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. danah boyd on Future of Technology and Social Media

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

danah boyd: Future of Technology and Social Media

Editor’s Note: danah boyd is a Senior Researcher at Microsoft Research, a Research Assistant Professor in Media, Culture, and Communication at New York University, a Visiting Researcher at Harvard Law School, a Fellow at Harvard’s Berkman Center, and an Adjunct Associate Professor at the University of New South Wales.

To read her full bio, please click here, here and here.

eTalk’s Niaz Uddin has interviewed danah boyd recently to gain her ideas and insights on Future of Technology and Social Media which is given below.

Niaz: Dear Danah, thank you so much for giving us some time in the midst of your busy schedule.

Danah: You’re welcome Niaz.

Niaz: As you know, more than a decade has passed since the Internet bubble burst. By this time, we have got Google, Amazon, Facebook, LinkedIn, Apple and some other great companies. At the same time, our economy is transforming into a digital economy. What revolutionary changes are going to occur in the upcoming decades?

Danah: Decades? I think that the most interesting technological transformations are going to come from bioinformatics and the health sector.  I think that we’re at the earliest stage of this process, but I’m looking forward to seeing where it goes.

Niaz: What do you think about the future of Internet and social media?

Danah: In terms of social media, I think we’re in a lull of innovation.  This always happens when too many people are focused on a particular arena.  The focus is on perfecting, consolidating, and small iterations. I don’t think it’s possible to say what’s coming around the corner that’s a true breakthrough.  If I knew, I’d be helping build it. <grin>

Niaz: How do you define ‘Big Data’? What does excite you most about ‘Big Data’?

Danah: If you haven’t read it, you should read ‘Critical Questions for Big Data’.

Kate and I define “Big Data” as a cultural, technological, and scholarly phenomenon that rests on the interplay of technology, analysis, and mythology.  The latter is the most important here.  As a phenomenon, “Big Data” has nothing to do with bigness, but everything to do with the belief that lots of data and math can solve all of the world’s problems.

I’m excited to see more people engaging with math and data, but I think it’s critical that folks never forget that interpretation requires more than math.  It’s in the interpretation that knowledge – and biases – lie.

Niaz: Thanks again for joining us. We hope to get you again for a detailed interview.

Danah: You are welcome. Sure, we will sit down another time.

Ending Note: danah boyd is currently very busy with her ongoing projects and research work, and could spare only a little time to talk to us.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. Aubrey de Grey on Aging and Overcoming Death

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

8. James Allworth on Disruptive Innovation

9. James Kobielus on Big Data, Cognitive Computing and Future of Product

Irving Wladawsky-Berger: Evolution of Technology and Innovation

Editor’s Note: Dr. Irving Wladawsky-Berger retired from IBM on May 31, 2007 after 37 years with the company. As Chairman Emeritus, IBM Academy of Technology, he continues to participate in a number of IBM’s technical strategy and innovation initiatives. He is also Visiting Professor of Engineering Systems at MIT, where he is involved in multi-disciplinary research and teaching activities focused on how information technologies are helping transform business organizations and the institutions of society. You can read his full bio from here.

eTalk’s Niaz Uddin has interviewed Irving Wladawsky-Berger recently to gain insights about the evolution of Technology and Innovation which is given below.

Niaz: Dear Irving, thank you so much for joining us.  We are thrilled and honored to have you at eTalks.

Irving Wladawsky-Berger: Niaz, thank you for having me.

Niaz: You began your career at IBM as a researcher in 1970, and you retired from IBM on May 31, 2007 as Vice President of Technical Strategy and Innovation. From the dawn of supercomputing to the rise of Linux and open source, the Internet, cloud computing, disruptive innovation, Big Data and Smarter Planet – you have been involved with it all.  You worked for 37 years to bring sustainable technological innovation to IBM. Can you please give us a brief overview of the evolution of technology and innovation? What do you think about the technological trends that have changed since you joined IBM?

Irving Wladawsky-Berger: Well, it has changed radically from the time I started in 1970 until now. Back in 1970 there were no personal computers and, needless to say, there was no internet. Computers were expensive and people used them in a time-sharing mode. Usually you needed a contract to be able to use a computer, and it was relatively expensive at that time. So most of the innovation and research had to be done in a kind of big science-lab environment, whether at a university like MIT or an R&D lab at IBM. Now, all that began to change when personal computers emerged in the 1980s, and especially in the next decade, the 1990s, because personal computers became much more powerful and much less expensive. And then we had the internet. Remember, the internet only really opened up to the world in the mid-90s. All of a sudden it was much easier for lots of people to have access to the proper technologies and to start doing all kinds of entrepreneurial innovation. Before that it was very expensive, and then with the internet people were able to distribute their offerings online directly to their customers. Previously they needed distribution channels, and that cost a lot of money. That has changed even more in just the last few years because of the advent of cloud computing. People have started to do entrepreneurial business without even needing to buy computer equipment anymore. They have a laptop or a smartphone that they use to get access to the cloud. As a result, the cost of operating a business is getting lower. This is particularly important for emerging economies like India, Africa or Latin America, because they don’t have as much access to capital as we do here in the United States. So the availability of the internet, cloud computing, mobile devices and so on is going to have a huge impact on entrepreneurship, especially in emerging economies.

Niaz: So what has surprised you most about the rise and spread of the internet over the past 15 years?

Irving Wladawsky-Berger: Well, you know, when I started, before the mid-90s, I was very involved with the Internet, but as part of supercomputing; before then the internet was primarily used in research labs and universities. It all started to change with the advent of the World Wide Web and the web browser. That made everything much more accessible and so much easier to use. Before browsers, there were primarily interfaces that engineers had to learn to use; it wasn’t really available to the majority of people. The internet was probably like other disruptive technologies: we knew it was exciting, we knew some good things could happen, but most of us couldn’t anticipate how transformative it would become – for example, the fact that it would so thoroughly transform the media industry, the music industry, newspapers, video streaming and so on. On the other side, some people were making near-term predictions about the internet like ‘it will totally transform the economy; you don’t need revenue and cash anymore.’ That was wrong, because if you are running a business you need revenue, cash and profit. Some of the predictions have taken a lot longer than people thought in the early days, because you needed broadband and things like that. And then other changes happened faster than any of us anticipated. It has just been an interesting experience to watch how unpredictable disruptive technologies are.

Niaz: Now what do you think about the future of internet? What significant changes are going to occur in near future?

Irving Wladawsky-Berger: First of all, I think broadband will keep advancing, and that has been one of the most important changes. When I started using the internet in the mid-90s, it was 16kb over a dial-up modem. Then a few years later it only went to 64kb over a dial-up modem, and then broadband came in, and it keeps getting better and better. Now in some countries, as you know, like South Korea, it is extremely fast. I think in the US we don’t have broadband that good yet, but it is good to see that it continues to improve. Broadband wireless has come along, and that is very nice. I think the rise of mobile devices like smartphones in the last few years has become one of the most important ways of accessing the internet, and it has been an absolute phenomenon. When the internet first showed up in the mid-90s, we were very worried because, for the internet to grow, you needed to have a PC, and in those days PCs were not that inexpensive. You needed an internet service provider, and that was not inexpensive either. So there was a strong digital divide, even within an advanced economy like the USA. I remember having a number of important meetings on the digital divide while I was working in Washington in those days. All that has disappeared because, as you know, mobile devices are so inexpensive that just about everybody can afford one now. Not all mobile devices are yet smartphones capable of accessing the internet, but I believe that within a few years just about everybody in the world will be able to access information, resources and applications. That is going to be gigantic. Finally, the internet, broadband, cloud computing and disruptive innovations are going to bring the most important changes over the next few decades.

Niaz: As you know, Big Data has become a hot topic of tech industry. What do you think about Big Data?

Irving Wladawsky-Berger: Big Data is very interesting. What it means is that we now have access to huge amounts of real-time data that can be analyzed and interpreted to give deep insight. I am now involved with a new initiative of New York University called the Center for Urban Science and Progress. A lot of the promise there is to gather information about transportation, energy use, health and lots of other real-time information in the city, and to be able to use it effectively to better manage the city and make it more efficient. So we now have access to huge amounts of data. But to be able to manage that data, to run experiments and to make sense of it, you need a model. You need a hypothesis that you embed in a model. Then you test your model against your data to see whether the model is true or not. If your model is true, the predictions you are making are correct; if your model is not true, the predictions you are making are incorrect. For example, you can get lots of health care data, but to find the meaning and use that data efficiently, you have to have a good model. So in my mind Big Data is very important, but even more important is what I call Data Science. Data Science is the ability to build models that use the data, get insight from what the data is telling you, and then put it into practice. And data science is very new; even Big Data itself is very new. I think it shows tremendous promise, but we now have to build the next layers of data science as a discipline, and that will be done discipline by discipline.
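Here is a minimal sketch in Python of the model-then-test loop Irving describes. The energy-versus-temperature data is synthetic and scikit-learn is simply one convenient open-source toolkit, not anything Irving or NYU’s center is said to use; the point is only that the hypothesis lives in the model and the held-out data judges it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic data: pretend energy use rises roughly linearly with temperature.
rng = np.random.default_rng(0)
temperature = rng.uniform(10, 35, size=500).reshape(-1, 1)       # degrees C
energy_use = 2.0 * temperature.ravel() + rng.normal(0, 3, 500)   # kWh

# The hypothesis is embedded in the model: a linear relationship.
X_train, X_test, y_train, y_test = train_test_split(
    temperature, energy_use, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_train, y_train)

# Test the model against held-out data: if the predictions are far off,
# the hypothesis (or the data) needs rethinking before anyone acts on it.
predictions = model.predict(X_test)
print("mean absolute error (kWh):", mean_absolute_error(y_test, predictions))
```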

Niaz: Over the past twenty years you have been involved in a number of initiatives dealing with disruptive innovations. What do you think about disruptive innovation?

Irving Wladawsky-Berger: I think that the work of Clayton Christensen has been really excellent. People knew that there were disruptive technologies that could change things, but it wasn’t well framed until Clay wrote his book ‘The Innovator’s Dilemma’ – and I think his next book, ‘The Innovator’s Solution’, is even better. I use these books in the graduate course at MIT; they are two excellent books on innovation. People didn’t understand, for example, why it is so tough to manage disruptive innovation. How is it different from regular sustaining or incremental innovation? What should companies do with sustaining or incremental innovation versus disruptive innovation? He framed it in an excellent way to show the differences and to provide guidelines for what companies should do and what they should watch out for. I think he wrote ‘The Innovator’s Dilemma’ in the late 1990s. Yet even today, the reality is that many companies don’t appreciate how difficult it is to truly embrace disruptive innovation. If you go and ask companies about disruptive innovation, they will say they are doing it, but in reality they are just working on incremental innovation. Truly embracing disruption is still culturally very difficult for many companies.

Niaz: What is cloud computing? What are the ideas behind cloud computing?

Irving Wladawsky-Berger: There are many definitions of cloud computing; there is no one definition. I think the reason is that cloud computing is not any one thing. It’s really a new model of computing where the internet is the platform for that computing model. If you look at the history of computing, in the first phase we had the central computing model, and the mainframes in the data center were the main platform of that model. That model lasted from the beginning of the computing industry until, let’s say, the mid-80s. Then the client-server model came, and in the client-server model the PCs were the central platform. Now cloud computing is a model that is totally organized around the internet, to make it possible to access hardware resources, storage resources, middleware resources, application resources and services over the internet. So with cloud computing, when you think about it, the actual computer is totally distributed over the internet, in the cloud. In short, cloud computing is the most interesting model of computing built totally around the internet.

Niaz: How much disruption does cloud computing represent when compared with the Internet?

Irving Wladawsky-Berger: I think the cloud is the evolution of the internet, and I think cloud computing is a massive disruption. It is a very big disruptive part of the internet, because it’s totally changing the way people can get access to applications and to information. Instead of having them on your PC or on the computers in your firm, you can now easily get whatever you want from the cloud, and you can get it in much more standardized ways. So the cloud makes it much easier and much less expensive for everybody – whether you are a big company, a small or medium-sized company, or an individual – to get access to very sophisticated applications. And you don’t have to know everything. Remember, in the PC days, if you bought an application you got a disk, you had to load it, then there were new versions and you had to manage those versions by yourself. It was such an advance over the previous world, and everybody was happy, but it was very difficult to use. With the cloud, as you know, there is the whole world of apps. If you need an app, you can go to an app store, and an app store is basically a cloud store. So you can easily get whatever you need from the app store, and when an app has a new release it will tell you. You don’t have to know everything or do anything; it has all been engineered, and that is making IT capabilities available to many more companies and people. So it’s very disruptive.

Niaz: What do you think about the future of startups which are competing with giants like IBM, Google, Amazon, Facebook?

Irving Wladawsky-Berger: That’s the history of the industry. You know, in the 80s, people asked how anybody could compete with IBM, since IBM was such a big and powerful company. And a few years later IBM almost died, because client-server computing came in and all these companies like Sun Microsystems, Microsoft and Compaq almost killed IBM. Luckily for me, who was there, it didn’t die. Then in the 90s you could ask how anybody could compete with Microsoft: after Windows came out it was so powerful, it was everything. Google was nothing at the beginning. And here we are now. Every few years we ask this question: here is the most powerful company in the world, and what could possibly happen to them? Sometimes nothing happens to them and they continue to become more powerful. Sometimes, as in the case of IBM, they reinvent themselves and stay very relevant. They are just no longer the most advanced company in the world; they are an important company. In the 70s and 80s IBM was the leader in the computing industry; I think many people wouldn’t say that about IBM now. To compete and survive in any industry you have to have a very good business model, and for entrepreneurial innovation, coming up with a great business model is the hardest and most central challenge.

Niaz: Can you please tell us something about the ways of asking big questions to challenge tradition and come up with disruptive innovations?

Irving Wladawsky-Berger: Niaz, you are asking a very good question, because asking big questions and coming up with a new business idea or business model is very difficult. I would say that in the old days, a lot of the ideas came from the laboratory, if I talk about the IT industry. Today, the core of innovation is in the marketplace. How can you come up with a great new application or a great new solution that will find a market, that will find customers who want it? You have to be very focused. You have to have some good ideas. You have to study the market. You have to understand who your customers are likely to be. You have to know who your competitors are going to be. If those competitors are going to be big, like Google, Microsoft or Facebook, then if you are starting a new company you have to know what you have that is unique compared with those companies. But I think that, in general, the inspiration for new ideas is a combination of creativity and the marketplace. You have to look at the marketplace, be inspired by it, and then bring the great ideas you have to light. I’m not sure I can give a good answer. You are asking, in effect, ‘Where do great business ideas come from?’ It’s like asking movie directors or composers where they get their creativity. It’s a similar question, and there is no good answer to that.

Niaz: Thank you, Irving. I wish you good health and very good luck with all of your future projects.

Irving Wladawsky-Berger: You are welcome. It was very nice talking to you. And good luck to you Niaz.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. Brian Keegan on Big Data

5. danah boyd on Future of Technology and Social Media

6. James Allworth on Disruptive Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

Brian Keegan: Big Data

Editor’s Note: Brian Keegan is a post-doctoral research fellow in Computational Social Science with David Lazer at Northeastern University. He defended his Ph.D. in the Media, Technology, and Society program at Northwestern University’s School of Communication.  He also attended the Massachusetts Institute of Technology, where he received bachelor’s degrees in Mechanical Engineering and in Science, Technology, and Society in 2006.

His research employs a variety of large-scale behavioral data sets such as Wikipedia article revision histories, massively-multiplayer online game behavioral logs, and user interactions in a crowd-sourced T-shirt design community. He uses methods in network analysis, multilevel statistics, simulation, and content analysis. To learn more about him, please visit his official website Brianckeegan.com.

eTalk’s Niaz Uddin has interviewed Brian Keegan recently to gain his ideas and insights about Big Data, Data Science and Analytics which is given below.

Niaz: Brian, we are really excited to have you here to talk about Big Data. Let’s start from the beginning. How do you define Big Data?

Brian: Thank you, Niaz, for having me. Well, a common joke in the community is that “big data” is anything that makes Excel crash. That’s neither fair to Microsoft (because the dirty secret of data science is that you can get pretty far using Excel) nor fair to researchers whose data could hypothetically fit in Excel but is so complicated that it would make no sense to try in the first place.

Big data is distinct from traditional industry and academic approaches to data analysis because of what are called the three Vs: volume, variety, velocity.

      • Volume is what we think of immediately – server farms full of terabytes of user data waiting to be analyzed. This data doesn’t fit into a single machine’s memory, hard drive, or even a traditional database. The size of the data makes analyzing it with traditional tools really hard, which is why new tools are being created.
      • Second, there’s variety, which reflects the fact that data aren’t just lists of numbers, but include complex social relationships, collections of text documents, and sensor readings. The scope of the data means that all these different kinds of data have different structures, granularity, and errors, which need to be cleaned and integrated before you can start to look for relationships among them (a minimal cleaning sketch follows this list). Cleaning data is fundamentally unsexy and grueling work, but if you put garbage into a model, all you get back out is garbage. Making sure all these diverse kinds of data are playing well with each other and the models you run on them is crucial.
      • Finally, there’s velocity, which reflects the fact that data are not only being created in real time, but people want to act on the incoming information in real time as well. This means the analysis also has to happen in real time, which is quite different from the old days, when a bunch of scientists could sit around for weeks testing different kinds of models on data collected months or years earlier before writing a paper or report that takes still more months to be published. APIs, dashboards, and alerts are part of big data because they make data available fast.
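As a rough illustration of the “variety” point, here is a minimal cleaning sketch in Python. The scenario and field names are invented: purchase events arrive both as CSV clickstream rows and as JSON API records, and both have to be normalized into one structure before any model can look across them.

```python
import json
from datetime import datetime, timezone

def from_csv(line):
    # A "user_id,unix_timestamp,amount" clickstream row.
    user_id, ts, amount = line.strip().split(",")
    return {
        "user_id": user_id,
        "time": datetime.fromtimestamp(int(ts), tz=timezone.utc),
        "amount_usd": float(amount),
    }

def from_json(blob):
    # The same kind of event from an API, with different names and units.
    record = json.loads(blob)
    return {
        "user_id": str(record["user"]["id"]),
        "time": datetime.fromisoformat(record["purchased_at"]),
        "amount_usd": record["cents"] / 100.0,  # this source stores cents
    }

events = [
    from_csv("42,1370000000,19.99"),
    from_json('{"user": {"id": 7}, "purchased_at": "2013-06-01T09:30:00+00:00", "cents": 1999}'),
]
print(events)
```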

Niaz: Can you please provide us some examples?

Brian: Data that is big is definitely not new. The US Census two centuries ago still required collecting and analyzing millions of data points collected by hand. Librarians and archivists have always struggled with how to organize, find, and share information on millions of physical documents like books and journals. Physicists have been grappling with big data for decades where the data is literally astronomical. Biologists sequencing the genome needed ways to manipulate and compare data involving billions of base pairs.

While “data that was big” existed before computers, the availability of cheap computation has accelerated and expanded our ability to collect, process, and analyze data that is big. So while we now think of things like tweets or financial transactions as “big data” because these industries have rushed to adopt or are completely dependent upon computation, it’s important to keep in mind that lots of big data exist outside of social media, finance, and e-commerce and that’s where a lot of opportunities and challenges still exist.

Niaz: What are some of the possible use cases for big data analytics? Which major companies are producing gigantic amounts of data?

Brian: Most people think of internet companies like Google, Facebook, Twitter, LinkedIn, FourSquare, Netflix, Amazon, Yelp, Wikipedia, and OkCupid when they think of big data. These companies are definitely the pioneers in coming up with the algorithms, platforms, and other tools – like PageRank, map-reduce, user-generated content, and recommender systems – that require combining millions of data points to provide fast and relevant content (a toy sketch of PageRank follows the examples below).

    • Companies like Crimson Hexagon mine Twitter and other social media streams for their clients to detect patterns of novel phrases or changes in the sentiment associated with keywords and products. This can let their clients know if people are having problems with a product or if a new show is generating a lot of buzz despite mediocre ratings.
    • The financial industry uses big data not only for high-frequency trading based on combining signals from across the market, but also for evaluating the credit risk of customers by combining various data sets. Retailers like Target and WalMart have large analytics teams that examine consumer transactions for behavioral patterns so they know what products to feature. Telecommunications companies like AT&T or Verizon collect call data records produced by every cell phone on their networks, which let them know your location over time so they can improve coverage. Industrial companies like GE and Boeing put more and more sensors into their products so that they can monitor performance and anticipate maintenance.
    • Finally, one of the largest producers and consumers of big data is the government. Law enforcement agencies publish data about crime and intelligence agencies monitor communication data from suspects. The Bureau of Labor Statistics, Federal Reserve, and World Bank collect and publish extremely rich and useful economic time series data. Meteorologists collect and analyze large amounts of data to make weather forecasts.
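Since PageRank comes up above, here is a toy Python version of the idea – a sketch of the power-iteration method on a three-page web, not Google’s production algorithm – showing how a ranking emerges from repeatedly redistributing scores along links.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to a list of the pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # Dangling page: spread its score evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
            else:
                for target in targets:
                    new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(pagerank(toy_web))  # "c" gets the highest score in this tiny graph
```

On a real web graph the same iteration runs over billions of pages, which is exactly the kind of volume that calls for the distributed tools discussed earlier.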

Niaz: Why has big data become so important now?

Brian: Whether it was business, politics, or the military, decisions were (and continue to be) made under uncertainty about history or context, because getting timely and relevant data was basically impossible. Directors didn’t know what customers were saying about their product, politicians didn’t know the issues constituents were talking about, and officers faced a fog of war. Ways of getting data were often slow and/or suspect: for example, broadcast stations used to price advertising time by paying a few dozen people in a city to keep journals of what stations they remembered hearing every day. Looking back now, this seems like an insane way not only to collect data but also to make decisions, given how obviously unreliable the data was, but it’s how things were done for decades because there was no better way of measuring what people were doing. The behavioral traces we leave in tweets and receipts are not only much finer-grained and more reliable, but also encompass a much larger and more representative sample of people and their behaviors.

Data lets decision makers know and respond to what the world really looks like instead of going on their gut. More data usually gives a more accurate view, but too much data can also overwhelm and wash out the signal with noise. The job of data scientists is less about trying to find a single needle in a haystack and more like collecting as much hay as possible, to be sure there are a few needles in there, before sorting through the much bigger haystack. In other words, data might be collected for one goal, but it can also be repurposed for other goals and follow-on questions that come along, to provide new insights. More powerful computers, algorithms, and platforms make assembling and sorting through these big haystacks much easier than before.

Niaz: Recently I have seen that IBM has started to work with Big Data. What role do companies like IBM play in this area?

Brian: IBM is just one of many companies that are racing “upstream” to analyze data on larger and more complex systems, like an entire city, by aggregating tweets, traffic, surveillance cameras, electricity consumption, and emergency services data, which all feed into each other. IBM is an example of an organization that has shifted from providing value by transforming raw materials into products like computers to transforming raw data into unexpected insights about how a system works – or doesn’t. The secret sauce is collecting existing data, building new data collection systems, and developing statistical models and platforms that are able to work in the big data domain of volume, variety, and velocity that traditional academic training doesn’t equip people for.

Niaz: What are the benefits of Big Data to business? How is it influencing innovation and business?

Brian: Consider the market capitalization of three major tech companies on a per-employee basis: Microsoft makes software and hardware as well as running web services like Bing based on big data and is worth about $2.5 million per employee; Google mostly makes software and runs web services and is worth about $4.6 million per employee; and Facebook effectively just runs a web service, its social network site, and is worth about $19 million per employee. These numbers may be outliers or unreliable for a variety of reasons, but the trend suggests that organizations like Facebook that are focused solely on data produce more value per employee.

This obviously isn’t a prescription for every company – ExxonMobil, WalMart, GE, and Berkshire produce value in fundamentally different ways. But Facebook did find a way to capture and analyze data about the world – our social relationships and preferences – that was previously hidden. There are other processes happening beyond the world of social media that currently go uncaptured, but the advent of new sensors and other opportunities for collecting data will make them ripe for the picking. Mobile phones in developing countries will reveal patterns of human mobility that could transform finance, transportation, and health care. RFIDs on groceries and other products could reveal patterns of transportation and consumption that could reduce wasted food while opening new markets. Smart meters and grids could turn the tide against global climate change while lowering energy costs. Politicians could be made more accountable and responsive through crowdsourced fundraising and analysis of regulatory disclosures. The list of data out there waiting to be collected and analyzed boggles the mind.

Niaz: How do you define a Data Scientist? What suggestions do you have for those who want to become data scientists?

Brian: A data scientist needs familiarity with a wide set of skills, so much so that it’s impossible for them to be expert in all of them.

      • First, data scientists need the computational skills from learning a programming language like Python or Java so that they can acquire, cleanup, and manipulate data from databases and APIs, hack together different programs developed by people who are far more expert in network analysis or natural language processing, and use difficult tools like MySQL and Hadoop. There’s no point-and-click program out there with polished tutorials that does everything you’ll need from end-to-end. Data scientists spend a lot of time writing code, working at the command line, and reading technical documentation but there are tons of great resources like StackOverflow, GitHub, free online classes, and active and friendly developer communities where people are happy to share code and solutions.
      • Second, data scientists need statistical skills at both a theoretical and methodological level. This is the hardest part and favors people who have backgrounds in math and statistics, computer and information sciences, physical sciences and engineering, or quantitative social sciences. Theoretically, they need to know why some kinds of analyses should be run on some kinds but not other kinds of data and what the limitations of one kind of model are compared to others. Methodologically, data scientists need to actually be able to run these analyses using statistical software like R, interpret the output of the analyses, and do the statistical diagnostics to make sure all the assumptions that are baked into a model are actually behaving properly.
      • Third, data scientists need some information visualization and design skills so they can communicate their findings in an effective way with charts or interactive web pages for exploration. This means learning to use packages like ggplot in R or matplotlib in Python for statistical distributions, d3 in Javascript for interactive web visualizations, or Gephi for network visualizations.

All of the packages I mentioned are open-source, which also reflects the culture of the data science community: expensive licenses for software or services are very suspect, because others should be able to easily replicate and build upon your analysis and findings.
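As a minimal, end-to-end illustration of the workflow those skills combine into – load, clean, summarize, visualize – here is a short Python sketch. The signups.csv file and its “date” and “signups” columns are hypothetical; pandas and matplotlib simply stand in for the open-source tooling Brian describes.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load: a hypothetical CSV with "date" and "signups" columns.
df = pd.read_csv("signups.csv", parse_dates=["date"])

# Clean: drop missing rows and obviously bad values.
df = df.dropna()
df = df[df["signups"] >= 0]

# Analyze: weekly totals and a quick summary.
weekly = df.set_index("date")["signups"].resample("W").sum()
print(weekly.describe())

# Visualize: communicate the trend, not just the table.
weekly.plot(title="Weekly signups")
plt.xlabel("Week")
plt.ylabel("Signups")
plt.tight_layout()
plt.savefig("weekly_signups.png")
```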

Niaz: Finally, what do you think about the impact of Big Data on our everyday lives?

Brian: Big Data is a dual-use technology that can satisfy multiple goals, some of which may be valuable and others of which may be unsavory. On one hand it can help entrepreneurs be more nimble and open new markets, or help researchers make new insights about how the world works; on the other hand, as the Arab Spring suggested, it can also reinforce the power of repressive regimes to monitor dissidents or let unsavory organizations do invasive personalized marketing.

Danah Boyd and Kate Crawford have argued persuasively about how the various possibilities of big data to address societal ills or undermine social structure obscure the very real but subtle changes that are happening right now that replace existing theory and knowledge, cloak subjectivity with quantitative objectivity, confuse bigger data with better data, separate data from context and meaning, raise real ethical questions, and create or reinforce inequalities.

Big data also raises complicated questions about who has access to data. On one hand, privacy is a paramount concern as organizations shouldn’t be collecting or sharing data about individuals without their consent. On the other hand, there’s also the expectation that data should be shared with other researchers so they can validate findings. Furthermore, data should be preserved and archived so that it is not lost to future researchers who want to compare or study changes over time.

Niaz: Brian, thank you so much for giving me time in the midst of your busy schedule. It is really great to learn the details of Big Data from you. I wish you good luck with your studies, research, projects and work.

Brian: You are welcome. Good luck to you too.

_  _  _  _  ___  _  _  _  _

Further Reading:

1. Viktor Mayer-Schönberger on Big Data Revolution

2. Gerd Leonhard on Big Data and the Future of Media, Marketing and Technology

3. Ely Kahn on Big Data, Startup and Entrepreneurship

4. James Kobielus on Big Data, Cognitive Computing and Future of Product

5. danah boyd on Future of Technology and Social Media

6. Irving Wladawsky-Berger on Evolution of Technology and Innovation

7. Horace Dediu on Asymco, Apple and Future of Computing

8. James Allworth on Disruptive Innovation