Breaking Down Barriers
Three decades ago, Joseph M. Juran coined the term "big Q" to capture a fundamental change that was occurring in the field of quality management. He labeled the narrow focus on product quality "little q" and called the new practice "big Q" because it focused on improvement of all organizational processes.
Big Q has drastically broadened the quality umbrella and fundamentally improved the effectiveness of quality management practices and philosophies.1, 2
Today, the world is experiencing another transformation from small to big: from small data to big data. Big data have both positive and negative impacts on society, and quality professionals must learn how to survive and succeed in the big data world.
What are big data?
Big data are a collection of data sets that are too big and too complex to be processed using traditional database and data processing tools.3 The characteristics of these giants can be summarized by three Vs:4, 5
Volume. The amount of data stored in the world is growing exponentially. It reached 1.2 zettabytes (one zettabyte equals 10^21 bytes) in 2013 and will be at an estimated volume of eight zettabytes by 2015. Yes, that is the number eight followed by 21 zeros. At the same time, the cost of data storage is dropping in the same pattern, from about $1 million per gigabyte (one gigabyte equals 10^9 bytes) in the 1980s to 10 cents per gigabyte by 2010.
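To get a feel for these magnitudes, the unit arithmetic can be sketched in a few lines of Python. The byte counts and prices are the ones quoted above; the function and variable names are illustrative assumptions.

```python
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes

# Prices quoted in the text, in cents per gigabyte
# (whole cents keep the arithmetic exact).
PRICE_1980S_CENTS = 1_000_000 * 100  # about $1 million per gigabyte
PRICE_2010_CENTS = 10                # 10 cents per gigabyte


def gigabytes_in(zettabytes):
    """Convert a whole number of zettabytes to gigabytes."""
    return zettabytes * ZETTABYTE // GIGABYTE


# How many times cheaper a gigabyte of storage became.
price_drop_factor = PRICE_1980S_CENTS // PRICE_2010_CENTS

print(gigabytes_in(8))     # 8000000000000, i.e. 8 followed by 12 zeros
print(price_drop_factor)   # 10000000
```

In other words, eight zettabytes is eight trillion gigabytes, and the quoted price per gigabyte fell by a factor of 10 million.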
Velocity. The increasing rate at which data travel has followed a trend similar to that of volume growth. Data are generated, collected, stored and processed with increasing speed to meet the demand for data.
Variety. The sources of big data are everywhere. Databases, documents, emails, phone records, meters, sensors, images, audio and video files, and financial transactions are examples of sources. Increasingly, the actions you take, the words you speak or type, the websites you visit, the locations where you stay and the people you meet are recorded and stored somewhere.6
The power of big data
Even at the dawn of their era, big data have already flexed their huge muscles on every aspect of our lives, accelerating economic growth, scientific progress and international development.
In the big data world, far more data can be processed, and in some cases, all the data relating to a particular phenomenon can be analyzed. Furthermore, big data allow a look into subcategories in a way traditional sampling analysis can never achieve.
At the center of big data is prediction—applying math to huge data sets to infer probabilities. Big data analysis is playing an ever-increasing role in decision making in private and public sectors. Table 1 lists some application examples of big data analysis, just to show the tip of the iceberg.7, 8
Big data now are raw business materials that create a new form of economic value. In fact, big data have become a fountain of value because data are reused again and again for different purposes (the secondary use) after the first use (the primary use).9
The dark side of big data
Big data, if not managed properly, may pose a great threat to privacy, human volition and democracy.
The recent debate on the National Security Agency’s surveillance program is not going to end anytime soon, and it fuels a broader discussion on how to protect the privacy of ordinary citizens in the big data era.10 Not only do big data increase the risks to privacy, but they also change the character of the risks. Because big data are available for secondary use, the traditional ways of protecting privacy (individual notice and consent, opting out and anonymity) have largely lost their effectiveness.11
A growing number of parole boards in the United States are using the predictions from big data analysis as factors in making decisions on granting inmates parole. More cities are using predictive analytics to select locations and individuals that should be subject to extra scrutiny.
It would be dangerous if decisions on punishment or scrutiny were based mainly on the probability of crimes that have not been committed. The fundamental problems of such a system could reach far beyond law enforcement: Employers could fire employees, banks could deny mortgage applications or a wife could divorce her husband, all because of a high probability of a bad act that has yet to happen.12
If the output of big data analysis is allowed to rule by placing unmerited faith in predictive analytics, democracy as we know it could be in danger.
Change your mindset
So, what do big data mean to quality professionals? Are you ready to meet the challenges of this transformation that is fundamentally changing the way you live and work?
You must change the way you think about sample size, exactitude and causality. The first mindset change is to move from some to all. In the small data world, you’re forced to settle for small data sets because it is impossible to collect and analyze all data. To make the sample statistics represent the whole population, you collect sample data as randomly as possible.
Achieving total randomness, however, is difficult, if not impossible, and random sampling does not scale easily to break results into subgroups. Random sampling also risks missing some important information that does not appear frequently. Bias may slip into the sampling process, for example, in answering questionnaires.
In the big data world, you can collect much more data or even all data (N = all). Using all data makes it possible to find connections and to explore details and subgroups. Big data give you freedom to check many hypotheses and to examine data closely from different angles. Big data also reduce bias associated with sampling because data are collected quietly when people are conducting their day-to-day activities.13
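The point about rare information can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the population of 100,000 records, the five "rare" records and the sample size of 1,000 are all made up for illustration.

```python
import random


def expected_hits(population_size, subgroup_size, sample_size):
    """Expected number of subgroup members in a simple random sample."""
    return sample_size * subgroup_size / population_size


def count_subgroup(records):
    """Count how many records belong to the rare subgroup."""
    return sum(1 for record in records if record == "rare")


# Hypothetical population: 5 rare records hidden among 100,000.
population = ["rare"] * 5 + ["common"] * 99_995

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = rng.sample(population, 1_000)

print(count_subgroup(population))        # all five rare records are visible
print(count_subgroup(sample))            # usually zero in a sample this small
print(expected_hits(100_000, 5, 1_000))  # 0.05 rare records expected per sample
```

With N = all, every rare record is available for analysis. A 1 percent random sample is expected to contain only 0.05 of them, so most samples would miss the subgroup entirely.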
The second mindset change is to relax exactitude for probability. In the small data world, clarity and certainty are often demanded as much as possible. Entering the world of big data, you needn’t worry about individual data points biasing the overall analysis because you can rely on a large amount of data to make predictions. In fact, by relaxing exactitude, you are able to collect more data that will improve the accuracy of the prediction outcomes and help get a more complete sense of reality.
Yes, big data are messy as long as the tools used to collect and analyze information are imperfect, but obsession with exactitude in the big data era is often counterproductive because it wastes resources and hinders the effort of collecting and analyzing more data. Instead of treating inexactitude as a problem, consider it a part of the reality.14
The third mindset change is to loosen up causality for correlation. In the small data world, choices of proxies for correlation analysis are often based on some hypotheses, and their accuracy is examined by correlation analysis. This is a slow and expensive process that is often clouded with prejudice and intuition.
Investigating real causality based on small data sets is often not practical and frequently serves as a shortcut to confirm existing knowledge and beliefs. In the big data age, vast data are available, and powerful computers can quickly identify the optimal proxies. Correlation analysis based on big data provides probability, not certainty, and tells us what, not why. In many cases, this fast and cheap noncausal analysis is good enough, and it aids causality study as it provides likely causes up front.15
Get more data
Data are becoming the new oil that fuels economic engines. One day, the value of data will appear on corporate balance sheets as a new asset class. Those on the big data value chain (the data holders, data specialists and institutions with a big data mindset) will benefit from big data, but ultimately, most value lies in the data themselves. This is because data always speak: There’s always something to learn. Value can always be extracted from data’s primary and secondary uses.
In a world that "data-fies" everything, organizations that master big data have a chance to outperform their competitors and widen their leads. Small but nimble players can enjoy scale without mass, while mid-sized data holders will be under great pressure to fight for their survival.16
Quality professionals must help their organizations get more data and aid decision making with big data analysis, as demonstrated by the examples in Table 1.
You can gain more internal data by automating and "data-fying" business processes. Big data should be used not just to generate reports, but also to distill patterns for predictive analytics.
Organizations can also mine public databases owned by governments. The outcomes of the resulting predictive analytics can be used not only for governmental purposes, but also for a nongovernmental organization’s benefit.
Organizations can buy data from private data holders, and share and merge data with other data holders through intermediaries.17 Under some business arrangements, all contributors to the merged data sets can extract value from them.
Finally, organizations can exploit more value from data sets by reusing them for different purposes, by combining different data sets into a new one, by designing extensibility into big data sets for multiple uses and by harvesting data exhaust—the data shed as a byproduct of people’s actions and movements.18 Data can, if necessary, be sold to data buyers, and an organization can charge fees to those who use its data.
Upgrade skills
In the small data world, people base decisions on a combination of facts, reflections and educated guesses. The latent knowledge accumulated throughout life experiences plays a critical role in the traditional decision-making process. This is going to change in the big data world, where decisions are made, or at least confirmed, by big data analysis.
Quality professionals must learn the skills associated with big data, such as statistics, rudimentary predictive modeling and basic computer programming, to leverage big data in the decision-making process.
For statistics, tools for correlation and regression are most important to grasp.19-21 The Pearson product-moment correlation coefficient is the most common measure of the linear relationship between two variables, while Spearman’s rank correlation coefficient is a nonparametric, rank-based measure of statistical dependence between two variables. In the regression area, least squares is a commonly used method, and nonlinear, orthogonal and logistic regressions are other regression procedures.
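The difference between the two correlation coefficients, and the least squares fit, can be sketched in plain Python. This is a minimal illustration, not a replacement for a statistics package; the function names and the sample data are assumptions chosen for the example.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson product-moment correlation: strength of the linear relationship."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least squares line y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Illustrative data: y = x**3 rises steadily but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

print(pearson(x, y))             # below 1: the relationship is not linear
print(spearman(x, y))            # 1.0: the ranks agree perfectly
print(least_squares_line(x, y))  # slope and intercept of the best-fit line
```

Because y always rises when x rises, the rank-based Spearman coefficient is exactly 1, while the Pearson coefficient falls short of 1 because the points do not lie on a straight line.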
In terms of predictive modeling, a decision tree is a method you may want to learn first because it is most popular among big data scientists due to its balance of simplicity with effectiveness. It is not nearly as complex as you might think.
Figure 1 is the decision tree chart used by Chase Bank to predict the risk of prepayment by individual mortgage holders.22, 23 It is basically a flowchart-like structure in which each rectangle represents a test on an attribute and each branch coming out of that test represents an outcome of the test. Each branch leads to a new test until it reaches a decision (in this case, the decision is the risk of prepayment).
To predict the risk of prepayment of a mortgage holder, each case tumbles down the tree from top to bottom, going through a series of tests until it reaches its destination. For example, the model predicts that Mary Bowser, a mortgage holder, has a 25.6% propensity to prepay her mortgage. The prediction is based on her data: an interest rate of 8.8%, a mortgage of $100,000 and a loan/value ratio of 80%. The path of the predictive analytics on her case is highlighted in blue in Figure 1.
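The way a case tumbles down a tree can be sketched in code. The three attributes and the input values come from the example above, but every threshold and leaf value in this toy tree is invented for illustration. It is not the Chase model from Figure 1, and it does not reproduce the 25.6% prediction.

```python
# Hypothetical tree: each inner node tests one attribute against a threshold,
# and each leaf holds a predicted prepayment risk. All numbers are made up.
TREE = {
    "attribute": "interest_rate",
    "threshold": 7.5,
    "below": {"risk": 0.05},
    "at_or_above": {
        "attribute": "mortgage",
        "threshold": 150_000,
        "below": {
            "attribute": "loan_to_value",
            "threshold": 85.0,
            "below": {"risk": 0.20},
            "at_or_above": {"risk": 0.10},
        },
        "at_or_above": {"risk": 0.35},
    },
}


def predict(tree, case):
    """Walk from the root to a leaf; return the leaf's risk and the path taken."""
    node = tree
    path = []
    while "risk" not in node:
        attribute = node["attribute"]
        branch = "below" if case[attribute] < node["threshold"] else "at_or_above"
        path.append((attribute, branch))
        node = node[branch]
    return node["risk"], path


# Inputs from the example in the text.
case = {"interest_rate": 8.8, "mortgage": 100_000, "loan_to_value": 80.0}
risk, path = predict(TREE, case)

print(path)  # the sequence of tests the case passed through
print(risk)  # 0.2 in this toy tree
```

Each pass through the loop is one rectangle in the chart, and the recorded path corresponds to the highlighted route through the figure.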
Other predictive modeling tools that can be used include artificial neural networks, loglinear regression, support vector machines and TreeNet.24 If you want to learn predictive modeling, read some online information and books or take a free online course.25-28
Quality professionals must learn how to work cohesively with big data scientists—the specialists in data analysis, artificial intelligence and statistics. Their skill sets are complementary to those of quality professionals, and they bring something new to the game: They usually have developed a professional habit of letting the data speak without prejudgment and prejudice.
You also can keep current on government policies and regulations aimed at protecting society from the risks of big data and at preventing the creation of data barons. Viktor Mayer-Schönberger and Kenneth Cukier, authors of Big Data: A Revolution That Will Transform How We Live, Work and Think, envision the following developments in government policies and regulations on big data:
Hold data users accountable for what they do.
Implement "differential privacy" to blur some data sets so a data query will get only approximate results.
Guarantee human agency so significant decisions made by government and organizations that affect people will be based on people’s behaviors and actions, not simply on the predictions from big data analysis.
Build a new profession called algorithmist (or another suitable name) to take on the tasks of monitoring and auditing accountability, traceability and fair competition.29
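The idea of blurring query results can be made concrete with a short sketch. The authors do not specify a mechanism. The code below assumes the Laplace mechanism, a standard differential privacy technique, applied to a simple counting query. The function name, the epsilon value and the list of ages are hypothetical.

```python
import random


def noisy_count(records, predicate, epsilon, rng):
    """Count matching records, then blur the count with Laplace noise.

    A count changes by at most 1 when one person is added or removed,
    so the noise scale is 1 / epsilon. Smaller epsilon means more blur.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    # The difference of two exponential draws with rate epsilon
    # follows a Laplace distribution with scale 1 / epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


rng = random.Random(42)  # fixed seed so the run is repeatable
ages = [23, 35, 41, 52, 67, 71, 29, 64]  # made-up data

answer = noisy_count(ages, lambda age: age >= 60, epsilon=0.5, rng=rng)
print(answer)  # the true count of 3 plus random noise
```

A single query returns only an approximate answer, which limits what it reveals about any one person, while the noise averages out to zero so aggregate analysis remains useful.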
Explore hinterland
In the big data world, the experience of quality professionals will lose some of its value, and many of the traditional job functions of a quality guard will be performed by computers. Quality professionals must venture into the spaces that big data can’t predict: to dream, to think outside of the box, to adventure and to invent. They must become the right-brainers who are capable of supplementing their technical skills with abilities that are high concept (design, story and symphony) and high touch (empathy, play and meaning).30
Design is about discovering patterns and opportunities and creating beauty. A person with story ability knows how to craft convincing narratives. Symphony is an analogy referring to combining different ideas into something new. Empathy is about identifying with and understanding other people’s emotions and feelings. Play means that you find joy in your life and elicit delight in others. Meaning refers to a pursuit of purpose and meaning.31
Sailing into uncharted waters with a new mindset, powerful data sets and upgraded skills, quality professionals will meet the challenges of big data with confidence and flourish in this new age. This is an important aspect of big quality in the 21st century.
References and notes
1. B. Solomon, "It’s All About ‘Big Q,’" The Big Q Blog, Juran Institute, Jan. 21, 2011, www.juran.com/blog/?p=188.
2. Russell T. Westcott, The Certified Manager of Quality/Organizational Excellence Handbook, third edition, ASQ Quality Press, 2005.
3. Wikipedia, "Big Data," http://en.wikipedia.org/wiki/big_data.
4. Gartner, "Gartner Says Solving ‘Big Data’ Challenge Involves More Than Just Managing Volumes of Data," Jan. 27, 2011, www.gartner.com/newsroom/id/1731916.
5. Big Data Now: 2012 Edition, O’Reilly Media Inc., 2012.
6. Ibid.
7. Eric Siegel, Predictive Analytics: The Power to Predict Who Will Click, Buy or Die, Wiley, 2013.
8. Viktor Mayer-Schönberger and Kenneth Cukier, Big Data: A Revolution That Will Transform How We Live, Work and Think, Eamon Dolan/Houghton Mifflin Harcourt, 2013.
9. Ibid.
10. For details about the U.S. National Security Agency’s (NSA) surveillance program, see Bryan Walsh, "The NSA’s Big Data Problem," Time, June 2013, pp. 23-25.
11. Mayer-Schönberger, Big Data, see reference 8.
12. Ibid.
13. Ibid.
14. Ibid.
15. Ibid.
16. Ibid.
17. Ibid.
18. Ibid.
19. T.M. Kubiak, The Certified Six Sigma Master Black Belt, ASQ Quality Press, 2012.
20. Robert S. Witte and John S. Witte, Statistics, ninth edition, Wiley, 2009.
21. Minitab 16 software, regression and correlation sections.
22. Siegel, Predictive Analytics, see reference 7.
23. Ali Moazami and Shaolin Li, "Mortgage Business Transformation Program Using CART-based Joint Risk Modeling," Salford Systems Data Mining Conference, 2005, http://docs.salford-systems.com/MoazamiLi.pdcase sensitive).
24. Siegel, Predictive Analytics, see reference 7.
25. Robert Nisbet, John Elder and Gary Miner, Handbook of Statistical Analysis and Data Mining Applications, Academic Press, 2009.
26. Tom M. Mitchell, Machine Learning: Science/Engineering/Math, McGraw-Hill, 1997.
27. Saed Sayad, "An Introduction to Data Mining," http://www.saedsayad.com.
28. Stanford University offers a free online course at www.coursera.org/course/ml.
29. Mayer-Schönberger, Big Data, see reference 8.
30. Daniel H. Pink, A Whole New Mind: Why Right-Brainers Will Rule the Future, Riverhead Trade, 2006.
31. Ibid.
Article Credits: QP