Practical Statistics for Data Scientists

50 Essential Concepts

Statistical methods are a key part of data science, yet very few data scientists have any formal statistics training. Courses and books on basic statistics rarely cover the topic from a data science perspective. This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R programming language and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format. With this book, you’ll learn:
* Why exploratory data analysis is a key preliminary step in data science
* How random sampling can reduce bias and yield a higher-quality dataset, even with big data
* How the principles of experimental design yield definitive answers to questions
* How to use regression to estimate outcomes and detect anomalies
* Key classification techniques for predicting which categories a record belongs to
* Statistical machine learning methods that “learn” from data
* Unsupervised learning methods for extracting meaning from unlabeled data

Practical Statistics for Data Scientists

50+ Essential Concepts Using R and Python

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not. Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format. With this book, you’ll learn:
* Why exploratory data analysis is a key preliminary step in data science
* How random sampling can reduce bias and yield a higher-quality dataset, even with big data
* How the principles of experimental design yield definitive answers to questions
* How to use regression to estimate outcomes and detect anomalies
* Key classification techniques for predicting which categories a record belongs to
* Statistical machine learning methods that "learn" from data
* Unsupervised learning methods for extracting meaning from unlabeled data

Data Mining for Business Analytics

Concepts, Techniques and Applications in Python

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python presents an applied approach to data mining concepts and methods, using Python software for illustration. Readers will learn how to implement a variety of popular data mining algorithms in Python (free and open-source software) to tackle business problems and opportunities. This is the sixth version of this successful text, and the first using Python. It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining, and network analysis. It also includes:
* A new co-author, Peter Gedeck, who brings both experience teaching business analytics courses using Python and expertise in the application of machine learning methods to the drug-discovery process
* A new section on ethical issues in data mining
* Updates and new material based on feedback from instructors teaching MBA, undergraduate, diploma, and executive courses, and from their students
* More than a dozen case studies demonstrating applications for the data mining techniques described
* End-of-chapter exercises that help readers gauge and expand their comprehension of, and competency with, the material presented
* A companion website with more than two dozen data sets, and instructor materials including exercise solutions, PowerPoint slides, and case solutions
Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python is an ideal textbook for graduate and upper-undergraduate level courses in data mining, predictive analytics, and business analytics. This new edition is also an excellent reference for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.
“This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business specific procedures such as social network analysis and text mining. If not the bible, it is at the least a definitive manual on the subject.” —Gareth M. James, University of Southern California and co-author (with Witten, Hastie and Tibshirani) of the best-selling book An Introduction to Statistical Learning, with Applications in R

Essential Statistics for the Pharmaceutical Sciences

"... this text takes a novel approach... The style... is not as dry as other statistics texts, and so should not be intimidating even to a relative newcomer to the subject... The layout is easy to navigate, there are chapter aims, summaries and “key point boxes” throughout." -The Pharmaceutical Journal, 2008 This text is a clear, accessible introduction to the key statistical techniques employed for the analysis of data within this subject area. Written in a concise and logical manner, the book explains why statistics are necessary and discusses the issues that experimentalists need to consider. The reader is carefully taken through the whole process, from planning an experiment to interpreting the results, avoiding unnecessary calculation methodology. The most commonly used statistical methods are described in terms of their purpose, when they should be used and what they mean once they have been performed. Numerous examples are provided throughout the text, all within a pharmaceutical context, with key points highlighted in summary boxes to aid student understanding. Essential Statistics for the Pharmaceutical Sciences takes a new and innovative approach to statistics with an informal style that will appeal to the reader who finds statistics a challenge! This book is an invaluable introduction to statistics for any science student. It is an essential text for students taking biomedical or pharmaceutical-based science degrees and also a useful guide for researchers.

Resampling Methods

A Practical Guide to Data Analysis. Second Edition

"Most introductory statistics books ignore or give little attention to resampling methods, and thus another generation learns the less than optimal methods of statistical analysis. Good attempts to remedy this situation by writing an introductory text that focuses on resampling methods, and he does it well."- Ron C. Fryxell, Albion College"...The wealth of the bibliography covers a wide range of disciplines."---Dr. Dimitris Karlis, Athens University of EconomicsThis thoroughly revised second edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests. It is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology.Only requiring minimal mathematics beyond algebra, it provides a table-free introduction to data analysis utilizing numerous exercises, practical data sets, and freely available statistical shareware.Topics and Features:* Offers more practical examples plus an additional chapter dedicated to regression and data mining techniques and their limitations* Uses resampling approach to introduction statistics* A practical presentation that covers all three sampling methods: bootstrap, density-estimation, and permutations* Includes systematic guide to help one select the correct procedure for a particular application* Detailed coverage of all three statistical methodologies: classification, estimation, and hypothesis testing* Suitable for classroom use and individual, self-study purposes* Numerous practical examples using popular computer programs such as SAS(r), Stata(r), and StatXact(r)* Useful appendixes with computer programs and code to develop individualized methods* Downloadable freeware from author's website: http://users.oco.net/drphilgood/resamp.htmWith its accessible style and intuitive topic development, the book is an excellent basic resource for the power, simplicity, and versatility of the bootstrap, cross-validation, 
and permutation tests. Students, professionals, and researchers will find it a prarticularly useful handbook for modern resampling methods and their applications.

Geostatistics Explained

An Introductory Guide for Earth Scientists

This reader-friendly introduction to geostatistics demystifies complex concepts and makes formulas and statistical tests easy to apply. With wide-ranging examples from topics across the Earth and environmental sciences, and worked examples at the end of each chapter, this book can be used for undergraduate courses or for self-study and reference.

Influenza Models

Prospects for Development and Use

Kilbourne (1973) described the student of influenza as "continually looking back over his shoulder and asking 'what happened?', in the hope that understanding of past events will alert him to the catastrophes of the future". Experience suggests the futility of such a hope, since the most predictable feature of influenza is its unpredictability. Nonetheless, the stubborn viability of this hope is strongly affirmed by the many attempts, described and discussed in this volume, to develop a useful and practical representation of influenza virus behavior. I hasten to add, however, that the desired model has yet to be perfected. The existence and usefulness of animal models of infectious diseases of man are well documented. Reproduction of disease by infecting an experimental animal satisfies the third of Koch's four postulates to establish proof of disease causation by a specific bacterium. Animal models also have been extremely useful in studies of the pathogenesis, immunoprophylaxis, and specific therapy of several important diseases, including (with only modest success) influenza. Development of such a model is simple, at least in concept, and can be achieved by one or only a few scientists.

Sensory Evaluation of Food: Principles and Practices

The book is designed as a text for undergraduate and graduate courses in sensory evaluation and as a reference for industrial practitioners. It covers all the basic techniques of sensory testing, from simple discrimination tests to home use placements for consumers. It provides a practical guide to how tests are conducted and, for the reader who wishes a deeper understanding, provides the fundamental psychological and statistical theories that form the basis and rationale for sensory test design. Statistics used in sensory evaluation are demonstrated as integrated applications in the context of appropriate sensory methods and are also presented as stand-alone material in appendixes. Statistical applications are tailored to common and relevance are obvious, and space is not wasted on designs or analyses that are not suitable for data collection from human observers. The text presents divergent philosophies in a balanced manner. Chapters are constructed so that beginning students who want only practical aspects of conducting sensory tests will find clear instructions on how tests should be conducted. Advanced students and practitioners will profit from the detailed section on rationale and sensory evaluation issues.
"It covers the entire spectrum of sensory analysis. I have read many books on this intriguing subject, but this is the Rolls-Royce." - Aubrey Parsons, governing council member, International Union for Food Science and Technology