talk-data.com talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516

tagged

Activity Trend

68 peak/qtr
2020-Q1 2026-Q1

Activities

1516 activities · Newest first

Data Scientists at Work

Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); environmental big data (Andre´ Karpis?ts?enkoEach of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion. , Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). The book features a stimulating foreword by Google's Director of Research, Peter Norvig. Data Scientists at Work parts the curtain on the interviewees’ earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.

Can algorithms help you find love? Many happy couples successfully brought together via online dating websites show us that data science can help you find love. I'm joined this week by Thomas Levi, Senior Data Scientist at Plenty of Fish, to discuss some of his work which helps people find one another as efficiently as possible. Matchmaking is a truly non-trivial problem, and one that's dynamically changing all the time as new users join and leave the "pool of fish". This episode explores the aspects of what makes this a tough problem and some of the ways POF has been successfully using data science to solve it, and continues to try to innovate with new techniques like interest matching. For his benevolent references, Thomas suggests readers check out All of Statistics as well as the caret library for R. And for a self serving recommendation, follow him on twitter (@tslevi) or connect with Thomas Levi on Linkedin.

This week's episode explores the possibilities of extracting novel insights from the many great social web APIs available. Matthew Russell's Mining the Social Web is a fantastic exploration of the tools and methods, and we explore a few related topics. One helpful feature of the book is it's use of a Vagrant virtual machine. Using it, readers can easily reproduce the examples from the book, and there's a short video available that will walk you through setting up the Mining the Social Web virtual machine. The book also has an accompanying github repository which can be found here. A quote from Matthew that particularly reasonates for me was "The first commandment of Data Science is to 'Know thy data'." Take a listen for a little more context around this sage advice. In addition to the book, we also discuss some of the work done by Digital Reasoning where Matthew serves as CTO. One of their products we spend some time discussing is Synthesys, a service that processes unstructured data and delivers knowledge and insight extracted from the data. Some listeners might already be familiar with Digital Reasoning from recent coverage in Fortune Magazine on their cognitive computing efforts. For his benevolent recommendation, Matthew recommends the Hardcore History Podcast, and for his self-serving recommendation, Matthew mentioned that they are currently hiring for Data Science job opportunities at Digital Reasoning if any listeners are looking for new opportunities.

Jeff Stanton joins me in this episode to discuss his book An Introduction to Data Science, and some of the unique challenges and issues faced by someone doing applied data science. A challenge to any data scientist is making sure they have a good input data set and apply any necessary data munging steps before their analysis. We cover some good advise for how to approach such problems.

The Data Skeptic Podcast is launching a contest- not one of chance, but one of skill. Listeners are encouraged to put their data science skills to good use, or if all else fails, guess! The contest works as follows. Below is some data about the cumulative number of downloads the podcast has achieved on a few given dates. Your job is to predict the date and time at which the podcast will recieve download number 27,182. Why this arbitrary number? It's as good as any other arbitrary number! Use whatever means you want to formulate a prediction. Once you have it, wait until that time and then post a review of the Data Skeptic Podcast on iTunes. You don't even have to leave a good review! The review which is posted closest to the actual time at which this download occurs will win a free copy of Matthew Russell's "Mining the Social Web" courtesy of the Data Skeptic Podcast. "Price is Right" rules are in play - the winner is the person that posts their review closest to the actual time without going over. More information at dataskeptic.com

Data Science at the Command Line

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line. Obtain data from websites, APIs, databases, and spreadsheets Perform scrub operations on plain text, CSV, HTML/XML, and JSON Explore data, compute descriptive statistics, and create visualizations Manage your data science workflow using Drake Create reusable tools from one-liners and existing Python or R code Parallelize and distribute data-intensive pipelines using GNU Parallel Model data with dimensionality reduction, clustering, regression, and classification algorithms

Modeling Techniques in Predictive Analytics: Business Problems and Solutions with R, Revised and Expanded Edition

To succeed with predictive analytics, you must understand it on three levels: Strategy and management Methods and models Technology and code This up-to-the-minute reference thoroughly covers all three categories. Now fully updated, this uniquely accessible book will help you use predictive analytics to solve real business problems and drive real competitive advantage. If you’re new to the discipline, it will give you the strong foundation you need to get accurate, actionable results. If you’re already a modeler, programmer, or manager, it will teach you crucial skills you don’t yet have. Unlike competitive books, this guide illuminates the discipline through realistic vignettes and intuitive data visualizations– not complex math. Thomas W. Miller, leader of Northwestern University’s pioneering program in predictive analytics, guides you through defining problems, identifying data, crafting and optimizing models, writing effective R code, interpreting results, and more. Every chapter focuses on one of today’s key applications for predictive analytics, delivering skills and knowledge to put models to work–and maximize their value. Reflecting extensive student and instructor feedback, this edition adds five classroom-tested case studies, updates all code for new versions of R, explains code behavior more clearly and completely, and covers modern data science methods even more effectively. All data sets, extensive R code, and additional examples available for download at http://www.ftpress.com/miller If you want to make the most of predictive analytics, data science, and big data, this is the book for you. Thomas W. Miller’s unique balanced approach combines business context and quantitative tools, appealing to managers, analysts, programmers, and students alike. Miller addresses multiple business cases and challenges, including segmentation, brand positioning, product choice modeling, pricing research, finance, sports, text analytics, sentiment analysis, and social network analysis. He illuminates the use of cross-sectional data, time series, spatial, and spatio-temporal data. You’ll learn why each problem matters, what data are relevant, and how to explore the data you’ve identified. Miller guides you through conceptually modeling each data set with words and figures; and then modeling it again with realistic R programs that deliver actionable insights. You’ll walk through model construction, explanatory variable subset selection, and validation, mastering best practices for improving out-of-sample predictive performance. Throughout, Miller employs data visualization and statistical graphics to help you explore data, present models, and evaluate performance. This edition adds five new case studies, updates all code for the newest versions of R, adds more commenting to clarify how the code works, and offers a more detailed and up-to-date primer on data science methods. Gain powerful, actionable, profitable insights about: Advertising and promotion Consumer preference and choice Market baskets and related purchases Economic forecasting Operations management Unstructured text and language Customer sentiment Brand and price Sports team performance And much more

Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science

Master predictive analytics, from start to finish Start with strategy and management Master methods and build models Transform your models into highly-effective code—in both Python and R This one-of-a-kind book will help you use predictive analytics, Python, and R to solve real business problems and drive real competitive advantage. You’ll master predictive analytics through realistic case studies, intuitive data visualizations, and up-to-date code for both Python and R—not complex math. Step by step, you’ll walk through defining problems, identifying data, crafting and optimizing models, writing effective Python and R code, interpreting results, and more. Each chapter focuses on one of today’s key applications for predictive analytics, delivering skills and knowledge to put models to work—and maximize their value. Thomas W. Miller, leader of Northwestern University’s pioneering program in predictive analytics, addresses everything you need to succeed: strategy and management, methods and models, and technology and code. If you’re new to predictive analytics, you’ll gain a strong foundation for achieving accurate, actionable results. If you’re already working in the field, you’ll master powerful new skills. If you’re familiar with either Python or R, you’ll discover how these languages complement each other, enabling you to do even more. All data sets, extensive Python and R code, and additional examples available for download at http://www.ftpress.com/miller/ Python and R offer immense power in predictive analytics, data science, and big data. This book will help you leverage that power to solve real business problems, and drive real competitive advantage. Thomas W. Miller’s unique balanced approach combines business context and quantitative tools, illuminating each technique with carefully explained code for the latest versions of Python and R. If you’re new to predictive analytics, Miller gives you a strong foundation for achieving accurate, actionable results. If you’re already a modeler, programmer, or manager, you’ll learn crucial skills you don’t already have. Using Python and R, Miller addresses multiple business challenges, including segmentation, brand positioning, product choice modeling, pricing research, finance, sports, text analytics, sentiment analysis, and social network analysis. He illuminates the use of cross-sectional data, time series, spatial, and spatio-temporal data. You’ll learn why each problem matters, what data are relevant, and how to explore the data you’ve identified. Miller guides you through conceptually modeling each data set with words and figures; and then modeling it again with realistic code that delivers actionable insights. You’ll walk through model construction, explanatory variable subset selection, and validation, mastering best practices for improving out-of-sample predictive performance. Miller employs data visualization and statistical graphics to help you explore data, present models, and evaluate performance. Appendices include five complete case studies, and a detailed primer on modern data science methods. Use Python and R to gain powerful, actionable, profitable insights about: Advertising and promotion Consumer preference and choice Market baskets and related purchases Economic forecasting Operations management Unstructured text and language Customer sentiment Brand and price Sports team performance And much more

Guerrilla Analytics

Doing data science is difficult. Projects are typically very dynamic with requirements that change as data understanding grows. The data itself arrives piecemeal, is added to, replaced, contains undiscovered flaws and comes from a variety of sources. Teams also have mixed skill sets and tooling is often limited. Despite these disruptions, a data science team must get off the ground fast and begin demonstrating value with traceable, tested work products. This is when you need Guerrilla Analytics. In this book, you will learn about: The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting. Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny. Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research. Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions. Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects

Hands-On Programming with R

Learn how to program by diving into the R language, and then use your newfound skills to solve practical data science problems. With this book, you’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools. RStudio Master Instructor Garrett Grolemund not only teaches you how to program, but also shows you how to get more from R than just visualizing and modeling data. You’ll gain valuable programming skills and support your work as a data scientist at the same time. Work hands-on with three practical data analysis projects based on casino games Store, retrieve, and change data values in your computer’s memory Write programs and simulations that outperform those written by typical R users Use R programming tools such as if else statements, for loops, and S3 classes Learn how to write lightning-fast vectorized R code Take advantage of R’s package system and debugging tools Practice and apply R programming concepts as you learn them

This episode features a discussion with statistics PhD student Zach Seeskin about a project he was involved in as part of the Eric and Wendy Schmidt Data Science for Social Good Summer Fellowship.  The project involved exploring the relationship (if any) between streetlight outages and crime in the City of Chicago.  We discuss how the data was accessed via the City of Chicago data portal, how the analysis was done, and what correlations were discovered in the data.  Won't you listen and hear what was found? 

Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition

The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before. This book provides the tools needed to thrive in today's big data world. The author demonstrates how to leverage a company's existing databases to increase profits and market share, and carefully explains the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining. The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis. Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization Offers extensive coverage of the R statistical programming language Contains 280 end-of-chapter exercises Includes a companion website with further resources for all readers, and Powerpoint slides, a solutions manual, and suggested projects for instructors who adopt the book

The Data Skeptic Podcast features conversations with topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches. This first episode is a short discussion about what this podcast is all about.

Analytics in a Big Data World: The Essential Guide to Data Science and its Applications

The guide to targeting and leveraging business opportunities using big data & analytics By leveraging big data & analytics, businesses create the potential to better understand, manage, and strategically exploiting the complex dynamics of customer behavior. Analytics in a Big Data World reveals how to tap into the powerful tool of data analytics to create a strategic advantage and identify new business opportunities. Designed to be an accessible resource, this essential book does not include exhaustive coverage of all analytical techniques, instead focusing on analytics techniques that really provide added value in business environments. The book draws on author Bart Baesens' expertise on the topics of big data, analytics and its applications in e.g. credit risk, marketing, and fraud to provide a clear roadmap for organizations that want to use data analytics to their advantage, but need a good starting point. Baesens has conducted extensive research on big data, analytics, customer relationship management, web analytics, fraud detection, and credit risk management, and uses this experience to bring clarity to a complex topic. Includes numerous case studies on risk management, fraud detection, customer relationship management, and web analytics Offers the results of research and the author's personal experience in banking, retail, and government Contains an overview of the visionary ideas and current developments on the strategic use of analytics for business Covers the topic of data analytics in easy-to-understand terms without an undo emphasis on mathematics and the minutiae of statistical analysis For organizations looking to enhance their capabilities via data analytics, this resource is the go-to reference for leveraging data to enhance business capabilities.

Hadoop For Dummies

Let Hadoop For Dummies help harness the power of your data and rein in the information overload Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets with becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters. Explains the origins of Hadoop, its economic benefits, and its functionality and practical applications Helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily Details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving Shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster From programmers challenged with building and maintaining affordable, scaleable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.

Developing Analytic Talent: Becoming a Data Scientist

Learn what it takes to succeed in the the most in-demand tech job Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. With over 15 years of big data, predictive modeling, and business analytics experience, author Vincent Granville is no stranger to data science. In this one-of-a-kind guide, he provides insight into the essential data science skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code. The applications are endless and varied: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one. Explains the finer points of data science, the required skills, and how to acquire them, including analytical recipes, standard rules, source code, and a dictionary of terms Shows what companies are looking for and how the growing importance of big data has increased the demand for data scientists Features job interview questions, sample resumes, salary surveys, and examples of job ads Case studies explore how data science is used on Wall Street, in botnet detection, for online advertising, and in many other business-critical situations Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.

Practical Data Science with R

NEWER EDITION AVAILABLE IN MEAP Practical Data Science with R, Second Edition is now available in the Manning Early Access Program. An eBook of this older edition is included at no additional cost when you buy the revised edition! You may still purchase Practical Data Science with R (First Edition) using the Buy options on this page. Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support. About the Technology Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics. About the Book Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels. What's Inside Data science for the business professional Statistical analysis using the R language Project lifecycle, from planning to delivery Numerous instantly familiar use cases Keys to effective data presentations About the Reader This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed. About the Authors Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com. Quotes A unique and important addition to any data scientist’s library. - From the Foreword by Jim Porzak, Cofounder Bay Area R Users Group Covers the process end-to-end, from data exploration to modeling to delivering the results. - Nezih Yigitbasi, Intel Full of useful gems for both aspiring and experienced data scientists. - Fred Rahmanian, Siemens Healthcare Hands-on data analysis with real-world examples. Highly recommended. - Dr. Kostas Passadis, IPTO

2013 Data Science Salary Survey

What tools do successful data scientists and analysts use, and how much money do they make? We surveyed hundreds of attendees at the O'Reilly Strata Conferences in Santa Clara, California and New York to understand. Findings from the survey include: Average number of tools and median income for all respondents Distribution of responses by age, location, industry, and position Detailed analysis of tools used by respondents and correlation to their salaries - including by tool clusters (Hadoop, SQL/Excel, and other) Correlation of specialized big data tools usage and salary What tools should you be learning and using? Read this valuable report to gain insight from these potentially career-changing findings.

Data Smart: Using Data Science to Transform Information into Insight

Data Science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope. Data science is little more than using straight-forward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data. Each chapter will cover a different technique in a spreadsheet so you can follow along: Mathematical optimization, including non-linear programming and genetic algorithms Clustering via k-means, spherical k-means, and graph modularity Data mining in graphs, such as outlier detection Supervised AI through logistic regression, ensemble models, and bag-of-words models Forecasting, seasonal adjustments, and prediction intervals through monte carlo simulation Moving from spreadsheets into the R programming language You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.