talk-data.com

Topic: Data Science
Tags: machine_learning, statistics, analytics
1516 activities tagged

[Activity Trend chart: peak of 68 activities per quarter, 2020-Q1 through 2026-Q1]

Activities

1516 activities · Newest first

Mastering RStudio: Develop, Communicate, and Collaborate with R

"Mastering RStudio: Develop, Communicate, and Collaborate with R" is your guide to unlocking the potential of RStudio. You'll learn to use RStudio effectively in your data science projects, covering everything from creating R packages to interactive web apps with Shiny. By the end, you'll fully understand how to use RStudio's tools to manage projects and share results effectively.

What this Book will help me do:
- Gain a comprehensive understanding of the RStudio interface and workflow optimizations.
- Effectively communicate data insights with R Markdown, including static and interactive documents.
- Create impactful data visualizations using R's diverse graphical systems and tools.
- Develop Shiny web applications to showcase and share analytical results.
- Collaborate on projects using Git and GitHub, and learn R package development workflows.

Author(s): Julian Hillebrand and Maximilian H. Nierhoff are experienced R developers with years of practical expertise in data science and software development. They have a passion for teaching how to utilize RStudio effectively, and their writing combines practical examples with thorough explanations, ensuring readers can readily apply concepts to real-world scenarios.

Who is it for? This book is ideal for R programmers and analysts seeking to enhance their workflows using RStudio. Whether you're looking to create professional data visualizations, develop R packages, or implement Shiny web applications, this book provides the tools you need. It is suitable for those already familiar with basic R programming and fundamental concepts.

Data Munging with Hadoop

The Example-Rich, Hands-On Guide to Data Munging with Apache Hadoop™

Data scientists spend much of their time “munging” data: handling day-to-day tasks such as data cleansing, normalization, aggregation, sampling, and transformation. These tasks are both critical and surprisingly interesting. Most important, they deepen your understanding of your data’s structure and limitations: crucial insight for improving accuracy and mitigating risk in any analytical project.

Now, two leading Hortonworks data scientists, Ofer Mendelevitch and Casey Stella, bring together powerful, practical insights for effective Hadoop-based munging of large datasets. Drawing on extensive experience with advanced analytics, the authors offer realistic examples that address the common issues you’re most likely to face. They describe each task in detail, presenting example code based on widely used tools such as Pig, Hive, and Spark. This concise, hands-on eBook is valuable for every data scientist, data engineer, and architect who wants to master data munging: not just in theory, but in practice with the field’s #1 platform, Hadoop.

Coverage includes:
- A framework for understanding the various types of data quality checks, including cell-based rules, distribution validation, and outlier analysis
- Assessing tradeoffs in common approaches to imputing missing values
- Implementing quality checks with Pig or Hive UDFs
- Transforming raw data into “feature matrix” format for machine learning algorithms
- Choosing features and instances
- Implementing text features via “bag-of-words” and NLP techniques
- Handling time-series data via frequency- or time-domain methods
- Manipulating feature values to prepare for modeling

Data Munging with Hadoop is part of a larger, forthcoming work entitled Data Science Using Hadoop. To be notified when the larger work is available, register your purchase of Data Munging with Hadoop at informit.com/register and check the box “I would like to hear from InformIT and its family of brands about products and special offers.”
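
The feature-matrix and bag-of-words ideas listed above can be illustrated in a few lines of Python. This is a stdlib-only sketch of the technique, not the book's Pig/Hive/Spark code; the `bag_of_words` helper and the sample documents are hypothetical:

```python
from collections import Counter

def bag_of_words(docs):
    """Turn raw documents into a "feature matrix": one row per document,
    one column per vocabulary word, each cell holding that word's count."""
    tokenized = [doc.lower().split() for doc in docs]
    # Sorted vocabulary fixes the column order of the matrix.
    vocab = sorted(set(w for toks in tokenized for w in toks))
    matrix = [[Counter(toks)[w] for w in vocab] for toks in tokenized]
    return vocab, matrix

vocab, X = bag_of_words(["big data big insight", "data quality checks"])
```

In a real Hadoop pipeline the same counting would be distributed across mappers and reducers, but the resulting matrix has the same shape.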

Sports Analytics and Data Science: Winning the Game with Methods and Models

TO BUILD WINNING TEAMS AND SUCCESSFUL SPORTS BUSINESSES, GUIDE YOUR DECISIONS WITH DATA

This up-to-the-minute reference will help you master all three facets of sports analytics – and use it to win! Sports Analytics and Data Science is the most accessible and practical guide to sports analytics for everyone who cares about winning and everyone who is interested in data science. You’ll discover how successful sports analytics blends business and sports savvy, modern information technology, and sophisticated modeling techniques. You’ll master the discipline through realistic sports vignettes and intuitive data visualizations—not complex math.

Thomas W. Miller, faculty director of Northwestern University’s pioneering program in predictive analytics, guides you through defining problems, identifying data, crafting and optimizing models, writing effective R and Python code, interpreting your results, and more. Every chapter focuses on one key sports analytics application. Miller guides you through assessing players and teams, predicting scores and making game-day decisions, crafting brands and marketing messages, increasing revenue and profitability, and much more. Step by step, you’ll learn how analysts transform raw data and analytical models into wins: both on the field and in any sports business. Whether you’re a team executive, coach, fan, fantasy player, or data scientist, this guide will be a powerful source of competitive advantage… in any sport, by any measure. All data sets, extensive R and Python code, and additional examples are available for download at http://www.ftpress.com/miller/

This exceptionally complete and practical guide to sports data science and modeling teaches through realistic examples from sports industry economics, marketing, management, performance measurement, and competitive analysis. Miller shows how to use advanced measures of individual and team performance to judge the competitive position of both individual athletes and teams, and to make more accurate predictions about their future performance. His modeling techniques draw on methods from economics, accounting, finance, classical and Bayesian statistics, machine learning, simulation, and mathematical programming, illustrated through realistic case studies with fully worked examples in both R and Python.

Sports Analytics and Data Science will be an invaluable resource for everyone who wants to seriously investigate and more accurately predict player, team, and sports business performance, including students, teachers, sports analysts, sports fans, trainers, coaches, and team and sports business managers. It will also be valuable to all students of analytics and data science who want to build their skills through familiar and accessible sports applications.

Gain powerful, actionable insights for:
- Understanding sports markets
- Assessing players
- Ranking teams
- Predicting scores
- Making game-day decisions
- Crafting marketing messages
- Promoting brands and products
- Growing revenues
- Managing finances
- Playing what-if games
And much more
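
As a taste of the kind of team-assessment model such books cover, here is the classic Pythagorean expectation (Bill James's formula, widely used in sports analytics, though not necessarily Miller's exact method) sketched in Python. The team names and season totals are invented:

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.0):
    """Estimate a team's expected winning fraction from points scored
    and allowed, using the Pythagorean expectation: pf^k / (pf^k + pa^k)."""
    pf, pa = points_for ** exponent, points_against ** exponent
    return pf / (pf + pa)

# Hypothetical season totals: (team, points for, points against).
teams = [("Sharks", 820, 700), ("Bears", 760, 760), ("Owls", 680, 790)]

# Rank teams by expected winning percentage, best first.
ranking = sorted(teams, key=lambda t: pythagorean_win_pct(t[1], t[2]),
                 reverse=True)
```

The exponent is sport-specific in practice (e.g., values near 2 for basketball-like scoring, different values for baseball or football), which is exactly the kind of calibration a modeling text walks through.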

... or should this have been called data science from a neuroscientist's perspective? Either way, I'm sure you'll enjoy this discussion with Laurie Skelly. Laurie earned a PhD in Integrative Neuroscience from the Department of Psychology at the University of Chicago. In her life as a social neuroscientist, using fMRI to study the neural processes behind empathy and psychopathy, she learned the ropes of zooming in and out between the macroscopic and the microscopic -- how millions of data points come together to tell us something meaningful about human nature. She's currently at Metis Data Science, an organization that helps people learn the skills of data science to transition into industry. In this episode, we discuss fMRI technology, Laurie's research studying empathy and psychopathy, and the skills and tools that neuroscientists and data scientists have in common. For listeners interested in more on this subject, Laurie recommended the blogs Neuroskeptic, Neurocritic, and Neuroecology. We conclude the episode with a mention of the upcoming Metis Data Science San Francisco cohort, which Laurie will be teaching. If anyone is interested in applying to participate, they can do so here.

Elasticsearch in Action

Elasticsearch in Action teaches you how to build scalable search applications using Elasticsearch. You'll ramp up fast, with an informative overview and an engaging introductory example. Within the first few chapters, you'll pick up the core concepts you need to implement basic searches and efficient indexing. With the fundamentals well in hand, you'll go on to gain an organized view of how to optimize your design. Perfect for developers and administrators building and managing search-oriented applications.

About the Technology: Modern search seems like magic: you type a few words and the search engine appears to know what you want. With the Elasticsearch real-time search and analytics engine, you can give your users this magical experience without having to do complex low-level programming or understand advanced data science algorithms. You just install it, tweak it, and get on with your work.

About the Book: Elasticsearch in Action teaches you how to write applications that deliver professional-quality search. As you read, you'll learn to add basic search features to any application, enhance search results with predictive analysis and relevancy ranking, and use saved data from prior searches to give users a custom experience. This practical book focuses on Elasticsearch's REST API via HTTP. Code snippets are written mostly in bash using cURL, so they're easily translatable to other languages.

What's Inside:
- What is a great search application?
- Building scalable search solutions
- Using Elasticsearch with any language
- Configuration and tuning

About the Reader: This book is for developers and administrators building and managing search-oriented applications.

About the Authors: Radu Gheorghe is a search consultant and software engineer. Matthew Lee Hinman develops highly available, cloud-based systems. Roy Russo is a specialist in predictive analytics.

Quotes:
"To understand how a modern search infrastructure works is a daunting task. Radu, Matt, and Roy make it an engaging, hands-on experience." - Sen Xu, Twitter Inc.
"An indispensable guide to the challenges of search of semi-structured data." - Artur Nowak, Evidence Prime
"The best resource for a complex topic. Highly recommended." - Daniel Beck, juris GmbH
"Took me from confused to confident in a week." - Alan McCann, Givsum.com

Mastering SciPy

Dive into 'Mastering SciPy' to unlock the full potential of the SciPy ecosystem for scientific computation and data analysis. This book thoughtfully combines mathematical concepts with Python programming to tackle real-world computational challenges.

What this Book will help me do:
- Effectively implement algorithms for data interpolation, approximation, and function optimization.
- Develop strategies for managing large datasets and performing linear algebra computations.
- Create and solve differential equations for scientific modeling and simulations.
- Apply advanced data analysis, statistical methods, and machine learning algorithms.
- Utilize computational geometry techniques for applications in engineering and data science.

Author(s): Francisco J. Blanco-Silva is a practitioner and educator in scientific computing and Python programming. He brings a wealth of experience in using SciPy to solve practical scientific challenges, and his clear and engaging approach makes these complex topics accessible and applicable.

Who is it for? This book is tailored for professionals and researchers who use Python and are familiar with numerical methods. If you are looking to deepen your understanding of SciPy's capabilities to solve scientific and engineering problems, this book is ideal for you. Readers with a background in IPython and computational mathematics will benefit the most. Beginners in scientific Python can also learn by following the hands-on examples and clear explanations.
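
As a flavor of what "function optimization" means under the hood, here is a minimal Newton's-method minimizer in plain Python. This is an illustrative sketch of the idea behind smooth optimizers such as those in scipy.optimize, not code from the book; the example function is invented:

```python
def minimize_newton(f_prime, f_double_prime, x0, tol=1e-10, max_iter=100):
    """Find a local minimum of a smooth function by applying Newton's
    method to f'(x) = 0: repeatedly step x by -f'(x)/f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = f_prime(x) / f_double_prime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Minimize f(x) = x^2 - 4x + 7, whose minimum is at x = 2.
x_min = minimize_newton(lambda x: 2 * x - 4, lambda x: 2.0, x0=10.0)
```

For a quadratic the method lands on the minimum in a single step; for general smooth functions it converges quickly near the solution, which is why libraries build on this idea.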

Access 2016 For Dummies

Your all-access guide to all things Access 2016. If you don't know a relational database from an isolationist table—but still need to figure out how to organize and analyze your data—Access 2016 For Dummies is for you. Written in a friendly and accessible manner, it assumes no prior Access or database-building knowledge and walks you through the basics of creating tables to store your data, building forms that ease data entry, writing queries that pull real information from your data, and creating reports that back up your analysis. Add in a dash of humor and fun, and Access 2016 For Dummies is the only resource you'll need to go from data rookie to data pro!

This expanded and updated edition of Access For Dummies covers all of the latest information and features to help data newcomers better understand Access' role in the world of data analysis and data science. Inside, you'll get a crash course on how databases work—and how to build one from the ground up. Plus, you'll find step-by-step guidance on how to structure data to make it useful; manipulate, edit, and import data into your database; write and execute queries to gain insight from your data; and report data in elegant ways.

- Speak the lingo of database builders and create databases that suit your needs
- Organize your data into tables and build forms that ease data entry
- Query your data to get answers right
- Create reports that tell the story of your data findings

If you have little to no experience with creating and managing a database of any sort, Access 2016 For Dummies is the perfect starting point for learning the basics of building databases, simplifying data entry and reporting, and improving your overall data skills.
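
The store-then-query workflow the book teaches in Access maps directly onto SQL. As a language-neutral illustration (not Access itself), here is a hypothetical sketch using Python's built-in sqlite3 module, with an invented "orders" table:

```python
import sqlite3

# Build a tiny in-memory database: a table to store data, then a query
# that pulls real information back out of it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("Ada", 120.0), ("Ada", 90.0), ("Grace", 200.0)])

# Total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC").fetchall()
```

Access's query designer generates the same kind of SELECT/GROUP BY logic behind its graphical interface, which is why SQL literacy transfers well.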

Learning Bayesian Models with R

Dive into the world of Bayesian Machine Learning with "Learning Bayesian Models with R." This comprehensive guide introduces the foundations of probability theory and Bayesian inference, teaches you how to implement these concepts with the R programming language, and progresses to practical techniques for supervised and unsupervised problems in data science.

What this Book will help me do:
- Understand and set up an R environment for Bayesian modeling
- Build Bayesian models including linear regression and classification for predictive analysis
- Learn to apply Bayesian inference to real-world machine learning problems
- Work with big data and high-performance computation frameworks like Hadoop and Spark
- Master advanced Bayesian techniques and apply them to deep learning and AI challenges

Author(s): Hari Manassery Koduvely is a proficient data scientist with extensive experience in leveraging Bayesian frameworks for real-world applications. His passion for Bayesian Machine Learning is evident in his approachable and detailed teaching methodology, aimed at making these complex topics accessible for practitioners.

Who is it for? This book is best suited for data scientists, analysts, and statisticians familiar with R and basic probability theory who aim to enhance their expertise in Bayesian approaches. It's ideal for professionals tackling machine learning challenges in applied data contexts. If you're looking to incorporate advanced probabilistic methods into your projects, this guide will show you how.
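
The book works in R; as a language-neutral illustration of the Bayesian inference it teaches, here is the simplest conjugate update (beta-binomial) sketched in Python. The prior and data are invented:

```python
def beta_binomial_update(prior_a, prior_b, successes, failures):
    """Conjugate Bayesian update: a Beta(a, b) prior combined with
    binomial data yields a Beta(a + successes, b + failures) posterior."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Posterior mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Start from a uniform Beta(1, 1) prior, then observe 7 successes
# and 3 failures in 10 trials.
post_a, post_b = beta_binomial_update(1, 1, successes=7, failures=3)
estimate = beta_mean(post_a, post_b)
```

Conjugacy is the special case where the posterior has a closed form; the book's later chapters cover the sampling methods used when it does not.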

Learning to Love Data Science

Until recently, many people thought big data was a passing fad. "Data science" was an enigmatic term. Today, big data is taken seriously, and data science is considered downright sexy. With this anthology of reports from award-winning journalist Mike Barlow, you’ll appreciate how data science is fundamentally altering our world, for better and for worse. Barlow paints a picture of the emerging data space in broad strokes. From new techniques and tools to the use of data for social good, you’ll find out how far data science reaches.

With this anthology, you’ll learn how:
- Analysts can now get results from their data queries in near real time
- Indie manufacturers are blurring the lines between hardware and software
- Companies try to balance their desire for rapid innovation with the need to tighten data security
- Advanced analytics and low-cost sensors are transforming equipment maintenance from a cost center to a profit center
- CIOs have gradually evolved from order takers to business innovators
- New analytics tools let businesses go beyond data analysis and straight to decision-making

Mike Barlow is an award-winning journalist, author, and communications strategy consultant. Since launching his own firm, Cumulus Partners, he has represented major organizations in a number of industries.

Today's guest is Cameron Davidson-Pilon. Cameron has a master's degree in quantitative finance from the University of Waterloo. Think of it as statistics on stock markets. For the last two years he's been the team lead of data science at Shopify. He's the founder of dataorigami.net, which produces screencasts teaching methods and techniques of applied data science. He's also the author of the just-released-in-print book Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference, which you can also get in digital form. This episode focuses on the topic of Bayesian A/B testing, which spans just one chapter of the book. Related to today's discussion is the Data Origami post "The class imbalance problem in A/B testing." Lastly, Data Skeptic will be giving away a copy of the print version of the book to one lucky listener who has a US-based delivery address. To participate, you'll need to write a review of any site, book, course, or podcast of your choice on datasciguide.com. After it goes live, tweet a link to it with the hashtag #WinDSBook to be given an entry in the contest. This contest will end November 20th, 2015, at which time I'll draw a single random winner and contact them for delivery details via direct message on Twitter.
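
For readers curious what Bayesian A/B testing looks like in practice, here is a stdlib-only Monte Carlo sketch, not code from Cameron's book: place Beta(1, 1) priors on each variant's conversion rate, draw from both posteriors, and count how often B beats A. The conversion numbers are invented:

```python
import random

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Estimate P(rate_B > rate_A) by sampling from each variant's
    Beta posterior (Beta(1 + conversions, 1 + non-conversions))."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        pa = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        pb = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += pb > pa
    return wins / draws

# Hypothetical experiment: A converts 120/1000, B converts 150/1000.
p_b_better = prob_b_beats_a(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
```

Unlike a p-value, the output is a direct probability statement ("B is better with probability p"), which is much easier to act on in a business setting.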

Analyzing and Visualizing Data with F#

In this report, F# contributor Tomas Petricek explains many of the key features of the F# language that make it a great tool for data science and machine learning. Real-world examples take you through the entire data science workflow with F#, from data access and analysis to presenting the results.

You'll learn about:
- How F# and its unique features—such as type providers—ease the chore of data access
- The process of data analysis and visualization, using the Deedle library, the R type provider, and the XPlot charting library
- Implementations for a clustering algorithm using the standard F# library, and how F# type inference helps you understand your code

The report also includes a list of resources to help you learn more about using F# for data science.
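
The report implements its clustering algorithm in F#; as a language-neutral illustration of the same idea, here is a minimal k-means on 1-D data in Python (the sample points are invented):

```python
def kmeans_1d(points, k, iters=20):
    """Minimal k-means on 1-D data: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster."""
    # Seed centroids with evenly spaced values from the sorted data.
    centroids = sorted(points)[::max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious clumps around 1.0 and 10.0.
centers = kmeans_1d([1.0, 1.2, 0.8, 9.8, 10.0, 10.2], k=2)
```

The F# version in the report generalizes this to multi-dimensional points, with the type system keeping the vector arithmetic honest.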

Hadoop with Python

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you’ll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework.

Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.

- Use the Python library Snakebite to access HDFS programmatically from within Python applications
- Write MapReduce jobs in Python with mrjob, the Python MapReduce library
- Extend Pig Latin with user-defined functions (UDFs) in Python
- Use the Spark Python API (PySpark) to write Spark programs with Python
- Learn how to use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts

Zachary Radtka, a platform engineer at Miner & Kasch, has extensive experience creating custom analytics that run on petabyte-scale data sets.
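
Before reaching for mrjob or PySpark, it helps to see the MapReduce model itself. Here is a stdlib-only Python sketch of map, shuffle, and reduce for word counting; this illustrates the programming model, not mrjob's actual API:

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in a line of input."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducer: combine each key's values; for word count, sum them."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data big deal", "data munging"]
pairs = [kv for line in lines for kv in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
```

In a real cluster the mappers and reducers run on different machines and the shuffle moves data over the network; frameworks like mrjob let you write just the mapper and reducer functions and handle the rest.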

Mastering Data Analysis with R

Unlock the full potential of the R programming language with 'Mastering Data Analysis with R'. This book takes you from basic data manipulation to advanced visualization and modeling techniques, providing hands-on guidance to solve real-world data science challenges.

What this Book will help me do:
- Efficiently manipulate and clean large datasets using R techniques.
- Build and evaluate statistical models and machine learning algorithms.
- Visualize data insights through compelling graphics and visualizations.
- Analyze social networks and graph data within R's environment.
- Perform geospatial data analysis with specialized R packages.

Author(s): Gergely Daróczi is a seasoned data scientist and R developer with extensive industry and academic experience. He specializes in employing R for sophisticated data analysis tasks and visualization. His approachable writing style, combined with in-depth technical expertise, ensures learners of varying levels can connect with and benefit from his materials.

Who is it for? This book is ideal for data scientists, statisticians, and analysts who are familiar with the basics of R and want to deepen their expertise. If you are looking to learn practical applications of advanced R capabilities for data wrangling, modeling, and visualization, this is for you. It suits professionals aiming to implement data-driven solutions and empowers them to make informed decisions with R's tools.

2015 Data Science Salary Survey

For the third consecutive year, O’Reilly Media conducted an anonymous survey to expose the tools that successful data scientists and engineers use, and how those tool choices might relate to their salary. For the 2015 version of the Data Science Salary Survey, we heard from over 600 respondents who work in and around the data space, across a variety of industries, 47 countries, and 38 U.S. states. The research was based on data collected through an online 32-question survey, including demographic information, time spent on various data-related tasks, and the use or non-use of 116 software tools.

Findings include:
- Average number of tools and median income for all respondents
- Distribution of responses by a variety of factors, including age, gender, location, industry, role, and cloud computing
- Detailed analysis of tool use, including tool clusters
- Correlation of tool usage and salary

Download this free in-depth report to gain insight from these potentially career-changing findings, and plug your own variables into one of the linear models to predict your own salary. The survey is now open for the 2016 report, and it takes just 5 to 10 minutes to complete: http://www.oreilly.com/go/ds-salary-survey-2016.
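
"Plug your own variables into one of the linear models" amounts to ordinary least squares. Here is a one-variable sketch in Python with invented (experience, salary) data; this illustrates the technique, not the survey's actual model or coefficients:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x: slope from the
    covariance/variance ratio, intercept from the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical (years of experience, salary in $k) pairs.
years = [1, 3, 5, 7, 9]
salary = [70, 85, 100, 115, 130]
intercept, slope = fit_line(years, salary)
predicted = intercept + slope * 4  # predicted salary at 4 years
```

The report's models use many predictors at once (tools, role, region, and so on), but each one enters the prediction as an additive term in exactly this way.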

Getting Data Right

Over the last 20 years, companies have invested roughly $3-4 trillion in enterprise software. These investments have been primarily focused on the development and deployment of single systems, applications, functions, and geographies targeted at the automation and optimization of key business processes. Companies are now investing heavily in big data analytics ($44 billion in 2014 alone) in an effort to begin analyzing all of the data being generated from their process automation systems. But companies are quickly realizing that one of their key bottlenecks is Data Variety—the siloed nature of the data that is a natural result of internal and external source proliferation.

The problem of big data variety has crept up from the bottom—and the cost of variety is only appreciated when companies attempt to ask simple questions across many business silos (divisions, geographies, functions, etc.). Current top-down, deterministic data unification approaches (such as ETL, ELT, and MDM) were simply not designed to scale to the variety of hundreds, thousands, or even tens of thousands of data silos.

Download this free eBook to learn about the fundamental challenges that Data Variety poses to enterprises looking to maximize the value of their existing investments—and how new approaches promise to help organizations embrace and leverage the fundamental diversity of data. Readers will also find:
- Best practices for designing bottom-up and probabilistic methods for finding and managing data
- Principles for doing data science at scale in the big data era
- Ways of preparing and unifying data that complement existing systems
- Approaches to optimizing data warehousing
- How to use “data ops” to automate large-scale integration
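
The "bottom-up and probabilistic methods" the eBook describes boil down to scoring record similarity rather than demanding exact key matches across silos. A toy Python sketch using stdlib difflib; the records and the 0.65 threshold are invented for illustration and are not the book's actual method:

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Similarity in [0, 1] between two record names: a toy stand-in
    for the probabilistic matching used to unify data across silos."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(source, target, threshold):
    """Pair each source record with target records scoring above threshold."""
    return {s: [t for t in target if match_score(s, t) >= threshold]
            for s in source}

# Two silos naming the same companies differently.
crm = ["Acme Corp.", "Globex Inc"]
erp = ["ACME Corp", "Initech LLC", "Globex Incorporated"]
matches = find_matches(crm, erp, threshold=0.65)
```

Production systems combine many such signals (names, addresses, identifiers) and learn the weights from human feedback, but the core move is the same: replace exact joins with scored, probabilistic ones.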

There's an old adage which says you cannot fit a model which has more parameters than you have data. While this is often the case, it's not a universal truth. Today's guest Jake VanderPlas explains this topic in detail and provides some excellent examples of when it holds and doesn't. Some excellent visuals articulating the points can be found on Jake's blog Pythonic Perambulations, specifically on his post The Model Complexity Myth. We also touch on Jake's work as an astronomer, his noteworthy open source contributions, and forthcoming book (currently available in an Early Edition) Python Data Science Handbook.
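
Jake's point can be made concrete: with an L2 penalty (regularization), a model with more parameters than data points still has a unique best fit. A plain-Python sketch, with invented example values rather than anything from Jake's post:

```python
def ridge_fit(xs, ys, lam=0.1, lr=0.01, steps=20000):
    """Fit y = w0 + w1*x by gradient descent on squared error plus an
    L2 penalty lam*(w0^2 + w1^2). With one observation and two
    parameters the unpenalized problem has infinitely many exact
    solutions; the penalty singles out one."""
    w0 = w1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            r = w0 + w1 * x - y          # residual for this point
            g0 += 2 * r
            g1 += 2 * r * x
        g0 += 2 * lam * w0               # gradient of the penalty term
        g1 += 2 * lam * w1
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1

# One data point (x=2, y=6), two parameters: "more parameters than data".
w0, w1 = ridge_fit([2.0], [6.0])
```

Setting the gradient to zero by hand gives w1 = 2*w0 and w0 = 6/(5 + lam), so the descent should settle near w0 ≈ 1.176, w1 ≈ 2.353: a single well-defined answer despite the underdetermined data.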

The Art and Science of Analyzing Software Data

The Art and Science of Analyzing Software Data provides valuable information on analysis techniques often used to derive insight from software data. This book shares best practices in the field generated by leading data scientists, collected from their experience training software engineering students and practitioners to master data science. The book covers topics such as the analysis of security data, code reviews, app stores, log files, and user telemetry, among others. It covers a wide variety of techniques such as co-change analysis, text analysis, topic analysis, and concept analysis, as well as advanced topics such as release planning and generation of source code comments. It includes stories from the trenches from expert data scientists illustrating how to apply data analysis in industry and open source, present results to stakeholders, and drive decisions.

- Presents best practices, hints, and tips to analyze data and apply tools in data science projects
- Presents research methods and case studies that have emerged over the past few years to further understanding of software data
- Shares stories from the trenches of successful data science initiatives in industry
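
Co-change analysis, one of the techniques the book covers, starts from a simple count: how often do two files change in the same commit? A hypothetical Python sketch (the commit history is invented):

```python
from collections import Counter
from itertools import combinations

def co_change_counts(commits):
    """Count how often each pair of files is modified in the same
    commit: the raw signal behind co-change analysis of version
    control history. Pairs are keyed in sorted order."""
    pairs = Counter()
    for files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical history: each entry lists the files one commit touched.
history = [
    ["parser.c", "parser.h"],
    ["parser.c", "parser.h", "main.c"],
    ["main.c", "util.c"],
]
pairs = co_change_counts(history)
```

Files that repeatedly change together but have no declared dependency often signal hidden coupling, which is exactly the kind of insight such analyses surface for maintainers.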

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection

Detect fraud earlier to mitigate loss and prevent cascading damage. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution. Early detection is a key factor in mitigating fraud damage, but it involves more specialized techniques than detecting fraud at the more advanced stages. This invaluable guide details both the theory and technical aspects of these techniques, and provides expert insight into streamlining implementation. Coverage includes data gathering, preprocessing, model building, and post-implementation, with comprehensive guidance on various learning techniques and the data types utilized by each. These techniques are effective for fraud detection across industry boundaries, including applications in insurance fraud, credit card fraud, anti-money laundering, healthcare fraud, telecommunications fraud, click fraud, tax evasion, and more, giving you a highly practical framework for fraud prevention.

It is estimated that a typical organization loses about 5% of its revenue to fraud every year. More effective fraud detection is possible, and this book describes the various analytical techniques your organization must implement to put a stop to the revenue leak.

- Examine fraud patterns in historical data
- Utilize labeled, unlabeled, and networked data
- Detect fraud before the damage cascades
- Reduce losses, increase recovery, and tighten security

The longer fraud is allowed to go on, the more harm it causes. It expands exponentially, sending ripples of damage throughout the organization, and becomes more and more complex to track, stop, and reverse. Fraud prevention relies on early and effective fraud detection, enabled by the techniques discussed here. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques helps you stop fraud in its tracks and eliminates the opportunities for future occurrence.
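
A minimal example of the descriptive, outlier-style check such books describe: flag transactions far from the mean in standard-deviation terms. A stdlib Python sketch with invented claim amounts; real systems use more robust statistics and many features at once:

```python
import statistics

def zscore_outliers(amounts, threshold):
    """Flag values more than `threshold` sample standard deviations
    from the mean: a simple unsupervised anomaly check."""
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)
    return [x for x in amounts if abs(x - mean) / stdev > threshold]

# Hypothetical claim amounts: one is wildly out of line.
claims = [120, 135, 110, 150, 125, 140, 130, 5000]
flagged = zscore_outliers(claims, threshold=2.0)
```

Note the weakness this illustrates: the extreme value inflates both the mean and the standard deviation, which is why practitioners often switch to median-based scores or model-based methods for production fraud screening.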

Data and Social Good

Data may indeed be the "new oil"—a seemingly inexhaustible source of fuel for spectacular economic growth—but it's also a valuable resource for humanitarian groups looking to improve and protect the lives of less fortunate people. In this O'Reilly report, you'll learn how statisticians and data scientists are volunteering their time to help a variety of nonprofit organizations around the world. Mike Barlow cites several examples of how data and the work of data scientists have made a measurable impact on organizations such as DataKind, a group that connects socially minded data scientists with organizations working to address critical humanitarian issues. There's certainly no lack of demand for data science services among nonprofits today, because these organizations, too, realize the potential of data for changing people's fortunes.

Yusan Lin shares her research on using data science to explore the fashion industry in this episode. She has applied techniques from data mining, natural language processing, and social network analysis to explore who the innovators in the fashion world are and how their influence affects other designers. If you found this episode interesting and would like to read more, Yusan's papers "Text-Generated Fashion Influence Model: An Empirical Study on Style.com" and "The Hidden Influence Network in the Fashion Industry" are worth reading.