----- Big Data Analytics: A Practical Guide for Managers
Introduction So What Is Big Data? Growing Interest in Decision Making What This Book Addresses The Conversation about Big Data Technological Change as a Driver of Big Data The Central Question: So What? Our Goals as Authors References
The Mother of Invention's Triplets: Moore's Law, the Proliferation of Data, and Data Storage Technology Moore's Law Parallel Computing, Between and Within Machines Quantum Computing Recap of Growth in Computing Power Storage, Storage Everywhere Grist for the Mill: Data Used and Unused Agriculture Automotive Marketing in the Physical World Online Marketing Asset Reliability and Efficiency Process Tracking and Automation Toward a Definition of Big Data Putting Big Data in Context Key Concepts of Big Data and Their Consequences Summary References
Hadoop Power through Distribution Cost Effectiveness of Hadoop Not Every Problem Is a Nail Some Technical Aspects Troubleshooting Hadoop Running Hadoop Hadoop File System MapReduce Pig and Hive Installation Current Hadoop Ecosystem Hadoop Vendors Cloudera Amazon Web Services (AWS) Hortonworks IBM Intel MapR Microsoft To Run Pig Latin Using PowerShell Pivotal References
HBase and Other Big Data Databases Evolution from Flat File to the Three V's Flat File Hierarchical Database Network Database Relational Database Object-Oriented Databases Relational-Object Databases Transition to Big Data Databases What Is Different about HBase? What Is Bigtable? What Is MapReduce? What Are the Various Modalities for Big Data Databases? Graph Databases How Does a Graph Database Work? What Is the Performance of a Graph Database? Document Databases Key-Value Databases Column-Oriented Databases HBase Apache Accumulo References
Machine Learning Machine Learning Basics Classifying with Nearest Neighbors Naive Bayes Support Vector Machines Improving Classification with Adaptive Boosting Regression Logistic Regression Tree-Based Regression K-Means Clustering Apriori Algorithm Frequent Pattern-Growth Principal Component Analysis (PCA) Singular Value Decomposition Neural Networks Big Data and MapReduce Data Exploration Spam Filtering Ranking Predictive Regression Text Regression Multidimensional Scaling Social Graphing References
Statistics Statistics, Statistics Everywhere Digging into the Data Standard Deviation: The Standard Measure of Dispersion The Power of Shapes: Distributions Distributions: Gaussian Curve Distributions: Why Be Normal? Distributions: The Long Arm of the Power Law The Upshot? Statistics Are Not Bloodless Fooling Ourselves: Seeing What We Want to See in the Data We Can Learn Much from an Octopus Hypothesis Testing: Seeking a Verdict Two-Tailed Testing Hypothesis Testing: A Broad Field Moving on to Specific Hypothesis Tests Regression and Correlation p Value in Hypothesis Testing: A Successful Gatekeeper? Specious Correlations and Overfitting the Data A Sample of Common Statistical Software Packages Minitab SPSS R SAS Big Data Analytics Hadoop Integration Angoss Statistica Capabilities Summary References
Google Big Data Giants Google Go Android Google Product Offerings Google Analytics Advertising and Campaign Performance Analysis and Testing Facebook Ning Non-United States Social Media Tencent Line Sina Weibo Odnoklassniki Vkontakte Nimbuzz Ranking Network Sites Negative Issues with Social Networks Amazon Some Final Words References
Geographic Information Systems (GIS) GIS Implementations A GIS Example GIS Tools GIS Databases References
Discovery Faceted Search versus Strict Taxonomy First Key Ability: Breaking Down Barriers Second Key Ability: Flexible Search and Navigation Underlying Technology The Upshot Summary References
Data Quality Know Thy Data and Thyself Structured, Unstructured, and Semistructured Data Data Inconsistency: An Example from This Book The Black Swan and Incomplete Data How Data Can Fool Us Ambiguous Data Aging of Data or Variables Missing Variables May Change the Meaning Inconsistent Use of Units and Terminology Biases Sampling Bias Publication Bias Survivorship Bias Data as a Video, Not a Snapshot: Different Viewpoints as a Noise Filter What Is My Toolkit for Improving My Data? Ishikawa Diagram Interrelationship Digraph Force Field Analysis Data-Centric Methods Troubleshooting Queries from Source Data Troubleshooting Data Quality beyond the Source System Using Our Hidden Resources Summary References
Benefits Data Serendipity Converting Data Dreck to Usefulness Sales Returned Merchandise Security Medical Travel Lodging Vehicle Meals Geographical Information Systems New York City Chicago CLEARMAP Baltimore San Francisco Los Angeles Tucson, Arizona, University of Arizona, and COPLINK Social Networking Education General Educational Data Legacy Data Grades and other Indicators Testing Results Addresses, Phone Numbers, and More Concluding Comments References
Concerns Part Two: Basic Principles of National Application Collection Limitation Principle Data Quality Principle Purpose Specification Principle Use Limitation Principle Security Safeguards Principle Openness Principle Individual Participation Principle Accountability Principle Logical Fallacies Affirming the Consequent Denying the Antecedent Ludic Fallacy Cognitive Biases Confirmation Bias Notational Bias Selection/Sample Bias Halo Effect Consistency and Hindsight Biases Congruence Bias Von Restorff Effect Data Serendipity Converting Data Dreck to Usefulness Sales Merchandise Returns Security CompStat Medical Travel Lodging Vehicle Meals Social Networking Education Making Yourself Harder to Track Misinformation Disinformation Reducing/Eliminating Profiles Social Media Self Redefinition Identity Theft Facebook Concluding Comments References
Epilogue Michael Porter's Five Forces Model Bargaining Power of Customers Bargaining Power of Suppliers Threat of New Entrants Others The OODA Loop Implementing Big Data Nonlinear, Qualitative Thinking Closing References