talk-data.com

Topic: Data Collection (146 tagged activities)

[Activity trend chart: peak of 17 activities per quarter, 2020-Q1 to 2026-Q1]

Activities

146 activities · Newest first

Summary

Data is often messy or incomplete, requiring human intervention to make sense of it before being usable as input to machine learning projects. This is problematic when the volume scales beyond a handful of records. In this episode Dr. Cheryl Martin, Chief Data Scientist for Alegion, discusses the importance of properly labeled information for machine learning and artificial intelligence projects, the systems that they have built to scale the process of incorporating human intelligence in the data preparation process, and the challenges inherent to such an endeavor.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Cheryl Martin, chief data scientist at Alegion, about data labeling at scale.

Interview

Introduction

How did you get involved in the area of data management?

To start, can you explain the problem space that Alegion is targeting and how you operate?

When is it necessary to include human intelligence as part of the data lifecycle for ML/AI projects?

What are some of the biggest challenges associated with managing human input to data sets intended for machine usage?

For someone who is acting as a human-intelligence provider as part of the workforce, what does their workflow look like?

What tools and processes do you have in place to ensure the accuracy of their inputs?

How do you prevent bad actors from contributing data that would compromise the trained model?

What are the limitations of crowd-sourced data labels?

When is it beneficial to incorporate domain experts in the process?

When doing data collection from various sources, how do you ensure that intellectual property rights are respected?

How do you determine the taxonomies to be used for structuring data sets that are collected, labeled, or enriched for your customers?

What kinds of metadata do you track and how is that recorded/transmitted?

Do you think that human intelligence will be a necessary piece of ML/AI forever?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Alegion · University of Texas at Austin · Cognitive Science · Labeled Data · Mechanical Turk · Computer Vision · Sentiment Analysis · Speech Recognition · Taxonomy · Feature Engineering

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA · Support Data Engineering Podcast

Summary

Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers it might be best to use a hosted service to handle the plumbing so that you can focus on the parts that actually matter for your business. In this episode CTO and co-founder of Alooma, Yair Weinberger, explains how the platform addresses the common needs of data collection, manipulation, and storage while allowing for flexible processing. He describes the motivation for starting the company, how their infrastructure is architected, and the challenges of supporting multi-tenancy and a wide variety of integrations.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking, and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Yair Weinberger about Alooma, a company providing data pipelines as a service.

Interview

Introduction

How did you get involved in the area of data management?

What is Alooma and what is the origin story?

How is the Alooma platform architected?

I want to go into stream vs. batch here.

What are the most challenging components to scale?

How do you manage the underlying infrastructure to support your SLA of 5 nines?

What are some of the complexities introduced by processing data from multiple customers with various compliance requirements?

How do you sandbox users’ processing code to avoid security exploits?

What are some of the potential pitfalls for automatic schema management in the target database?

Given the large number of integrations, how do you maintain the

What are some challenges when creating integrations? Isn’t it simply a matter of conforming to an external API?

For someone getting started with Alooma, what does the workflow look like?

What are some of the most challenging aspects of building and maintaining Alooma?

What are your plans for the future of Alooma?

Contact Info

LinkedIn · @yairwein on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Alooma · Convert Media · Data Integration · ESB (Enterprise Service Bus) · Tibco · Mulesoft · ETL (Extract, Transform, Load) · Informatica · Microsoft SSIS · OLAP Cube · S3 · Azure Cloud Storage · Snowflake DB · Redshift · BigQuery · Salesforce · Hubspot · Zendesk · Spark · The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps · RDBMS (Relational Database Management System) · SaaS (Software as a Service) · Change Data Capture · Kafka · Storm · Google Cloud PubSub · Amazon Kinesis · Alooma Code Engine · Zookeeper · Idempotence · Kafka Streams · Kubernetes · SOC2 · Jython · Docker · Python · JavaScript · Ruby · Scala · PII (Personally Identifiable Information) · GDPR (General Data Protection Regulation) · Amazon EMR (Elastic Map Reduce) · Sequoia Capital · Lightspeed Investors · Redis · Aerospike · Cassandra · MongoDB

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA · Support Data Engineering Podcast

Visual Data Storytelling with Tableau, First edition

Tell Insightful, Actionable Business Stories with Tableau, the World’s Leading Data Visualization Tool!

Visual Data Storytelling with Tableau brings together knowledge, context, and hands-on skills for telling powerful, actionable data stories with Tableau. This full-color guide shows how to organize data and structure analysis with storytelling in mind, embrace exploration and visual discovery, and articulate findings with rich data, carefully curated visualizations, and skillfully crafted narrative. You don’t need any visualization experience. Each chapter illuminates key aspects of design practice and data visualization, and guides you step-by-step through applying them in Tableau. Through realistic examples and classroom-tested exercises, Professor Lindy Ryan helps you use Tableau to analyze data, visualize it, and help people connect more intuitively and emotionally with it. Whether you’re an analyst, executive, student, instructor, or journalist, you won’t just master the tools: you’ll learn to craft data stories that make an immediate impact--and inspire action.

Learn how to:
Craft more powerful stories by blending data science, genre, and visual design
Ask the right questions upfront to plan data collection and analysis
Build storyboards and choose charts based on your message and audience
Direct audience attention to the points that matter most
Showcase your data stories in high-impact presentations
Integrate Tableau storytelling throughout your business communication
Explore case studies that show what to do--and what not to do
Discover visualization best practices, tricks, and hacks you can use with any tool

Includes coverage up through Tableau 10.

Summary

One of the sources of data that often gets overlooked is the systems that we use to run our businesses. This data is not used to directly provide value to customers or understand the functioning of the business, but it is still a critical component of a successful system. Sam Stokes is an engineer at Honeycomb where he helps to build a platform that is able to capture all of the events and context that occur in our production environments and use them to answer all of your questions about what is happening in your system right now. In this episode he discusses the challenges inherent in capturing and analyzing event data, the tools that his team is using to make it possible, and how this type of knowledge can be used to improve your critical infrastructure.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. A few announcements:

There is still time to register for the O’Reilly Strata Conference in San Jose, CA, March 5th-8th. Use the link dataengineeringpodcast.com/strata-san-jose to register and save 20%.

The O’Reilly AI Conference is also coming up. Happening April 29th to the 30th in New York, it will give you a solid understanding of the latest breakthroughs and best practices in AI for business. Go to dataengineeringpodcast.com/aicon-new-york to register and save 20%.

If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world, then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data-driven businesses to get together and learn how to be more effective. To save 60% off your tickets go to dataengineeringpodcast.com/odsc-east-2018 and register.

Your host is Tobias Macey and today I’m interviewing Sam Stokes about his work at Honeycomb, a modern platform for observability of software systems

Interview

Introduction

How did you get involved in the area of data management?

What is Honeycomb and how did you get started at the company?

Can you start by giving an overview of your data infrastructure and the path that an event takes from ingest to graph?

What are the characteristics of the event data that you are dealing with and what challenges does it pose in terms of processing it at scale?

In addition to the complexities of ingesting and storing data with a high degree of cardinality, being able to quickly analyze it for customer reporting poses a number of difficulties. Can you explain how you have built your systems to facilitate highly interactive usage patterns?

A high degree of visibility into a running system is desirable for developers and systems administrators, but they are not always willing or able to invest the effort to fully instrument the code or servers that they want to track. What have you found to be the most difficult aspects of data collection, and do you have any tooling to simplify the implementation for users?

How does Honeycomb compare to other systems that are available off the shelf or as a service, and when is it not the right tool?

What have been some of the most challenging aspects of building, scaling, and marketing Honeycomb?

Contact Info

@samstokes on Twitter · Blog · samstokes on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Websites come in many types, styles, and budgets. Each website, be it a news site, an online store, or a personal blog, has its own performance indicators. These may include the number of unique visitors, average order value, conversion rate, etc. Web analytics systems aim to help track those indicators, analyze them, and make decisions. But how can you trust them if the data collection process is broken and the data is scattered? During this talk you’ll get a checklist for your project that covers the main mistakes in project settings and analysis. Save your time and learn from the mistakes other teams have already made.

Summary

The majority of the conversation around machine learning and big data pertains to well-structured and cleaned data sets. Unfortunately, that is just a small percentage of the information that is available, so the rest of the sources of knowledge in a company are housed in so-called “Dark Data” sets. In this episode Alex Ratner explains how the work that he and his fellow researchers are doing on Snorkel can be used to extract value by leveraging labeling functions written by domain experts to generate training sets for machine learning models. He also explains how this approach can be used to democratize machine learning by making it feasible for organizations with smaller data sets than those required by most tooling.
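To make the idea of a labeling function concrete, here is a minimal sketch using Snorkel’s Python API (v0.9-era); the spam/ham task, the keyword heuristics, and the two-row dataset are invented for illustration and are not taken from the episode:

    # Minimal Snorkel sketch: heuristic labeling functions voting on unlabeled
    # text, denoised by the generative label model. Task and rules are invented.
    import pandas as pd
    from snorkel.labeling import labeling_function, PandasLFApplier
    from snorkel.labeling.model import LabelModel

    ABSTAIN, HAM, SPAM = -1, 0, 1

    @labeling_function()
    def lf_contains_offer(x):
        # Domain-expert heuristic: promotional wording suggests spam.
        return SPAM if "limited offer" in x.text.lower() else ABSTAIN

    @labeling_function()
    def lf_short_message(x):
        # Very short messages tend to be legitimate in this toy task.
        return HAM if len(x.text.split()) < 5 else ABSTAIN

    df_train = pd.DataFrame({"text": ["Limited offer, click now!", "See you at 3pm"]})

    # Apply every labeling function to every row, producing a label matrix
    # with one (possibly abstaining) vote per function per example.
    L_train = PandasLFApplier(lfs=[lf_contains_offer, lf_short_message]).apply(df_train)

    # The label model reconciles overlapping, conflicting votes into
    # probabilistic labels that can train a downstream model.
    label_model = LabelModel(cardinality=2, verbose=False)
    label_model.fit(L_train, n_epochs=100)
    print(label_model.predict_proba(L_train))

In practice the payoff is that experts write a few dozen such functions instead of hand-labeling every record, and the label model’s output becomes the training set for a conventional discriminative model.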

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey and today I’m interviewing Alex Ratner about Snorkel and Dark Data.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by sharing your definition of dark data and how Snorkel helps to extract value from it?

What are some of the most challenging aspects of building labelling functions and what tools or techniques are available to verify their validity and effectiveness in producing accurate outcomes?

Can you provide some examples of how Snorkel can be used to build useful models in production contexts for companies or problem domains where data collection is difficult to do at large scale?

For someone who wants to use Snorkel, what are the steps involved in processing the source data and what tooling or systems are necessary to analyse the outputs for generating usable insights?

How is Snorkel architected and how has the design evolved over its lifetime?

What are some situations where Snorkel would be poorly suited for use?

What are some of the most interesting applications of Snorkel that you are aware of?

What are some of the other projects that you and your group are working on that interact with Snorkel?

What are some of the features or improvements that you have planned for future releases of Snorkel?

Contact Info

Website · ajratner on GitHub · @ajratner on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Stanford DAWN · HazyResearch · Snorkel · Christopher Ré · Dark Data · DARPA Memex · Training Data · FDA · ImageNet · National Library of Medicine · Empirical Studies of Conflict · Data Augmentation · PyTorch · TensorFlow · Generative Model · Discriminative Model · Weak Supervision

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA · Support Data Engineering Podcast

by Ryan Cabeen (Laboratory of Neuroimaging (LONI), USC), Farshid Sepherband (Laboratory of Neuroimaging (LONI), USC), Kyle Polich, Dr. Meng Law (Laboratory of Neuroimaging (LONI), USC), and Dr. Arthur Toga (Laboratory of Neuroimaging (LONI), USC)

Last year, Kyle had a chance to visit the Laboratory of Neuroimaging, or LONI, at USC, and learn about how some researchers are using data science to study the function of the brain. We're going to be covering some of their work in two episodes on Data Skeptic. In this first part of our two-part episode, we'll talk about data collection, brain imaging, and the LONI pipeline. We'll then continue our coverage in the second episode, where we'll talk more about how researchers can gain insights about the human brain and their current challenges. Next week, we'll also talk more about what all that has to do with data science, machine learning, and artificial intelligence. Joining us in this week's episode are members of the LONI lab, which include principal investigators Dr. Arthur Toga and Dr. Meng Law, and researchers Farshid Sepherband, PhD, and Ryan Cabeen, PhD.

Business Research Reporting

Business Research Reporting addresses the essential activities of locating, collecting, evaluating, analyzing, interpreting, and reporting business data. It highlights the value of primary and secondary research to making business decisions and solving business problems. It aims to help business managers, MBA candidates, and upper-level college students boost their research skills and report research with confidence. This book discusses primary data collection, sampling concepts, and the use of measurement and scales in preparing instruments. Also, this book explores statistical and non-statistical analysis of qualitative and quantitative data and data interpretation (findings, conclusions, and recommendations). The author shows how to locate, evaluate, and extract secondary data found on the web and in brick-and-mortar libraries, including optimized searching, evaluating, and recording. Plus, the book demonstrates how to avoid copyright infringement and plagiarism, use online citation software, and cite sources when writing and presenting. Two glossaries—one each for primary and secondary research—round out the content. Business Research Reporting can be your go-to guidebook for years to come. Reading through it in a couple of hours, you can pick up ample information to apply instantly. Then keep it handy and refer to it in your ongoing research activities.

Python Web Scraping - Second Edition

"Python Web Scraping" is a practical guide to extracting and processing online data using the Python programming language. With this book, you'll learn step-by-step how to build web scrapers and crawlers that can handle a range of data sources and structures. After reading this, you will be equipped to tackle real-world web scraping challenges effectively. What this Book will help me do Learn how to extract structured data from standard webpages using Python. Gain proficiency with libraries such as Selenium and PyQt for handling dynamic and JavaScript-dependent content. Build concurrent scrapers to efficiently process large volumes of web pages in parallel. Understand and implement form interaction automation for data extraction from complex websites. Develop advanced scrapers using Scrapy to handle sophisticated web crawling tasks. Author(s) None Jarmul is an experienced data scientist and programmer with extensive knowledge in Python. They bring practical expertise from working on real-world web scraping projects. In their work, they focus on creating content that empowers readers by demystifying complex technical topics. Who is it for? This book is perfect for software developers eager to dive into web scraping using Python, even if they're new to the subject. If you have basic to intermediate Python skills and want to automate data collection and processing, this is the book for you. The techniques here are valuable for tackling diverse data extraction scenarios.

Research Methods in Human-Computer Interaction, 2nd Edition

Research Methods in Human-Computer Interaction is a comprehensive guide to performing research and is essential reading for both quantitative and qualitative methods. Since the first edition was published in 2009, the book has been adopted for use at leading universities around the world, including Harvard University, Carnegie-Mellon University, the University of Washington, the University of Toronto, HiOA (Norway), KTH (Sweden), Tel Aviv University (Israel), and many others. Chapters cover a broad range of topics relevant to the collection and analysis of HCI data, going beyond experimental design and surveys, to cover ethnography, diaries, physiological measurements, case studies, crowdsourcing, and other essential elements in the well-informed HCI researcher's toolkit. Continual technological evolution has led to an explosion of new techniques and a need for this updated 2nd edition, to reflect the most recent research in the field and newer trends in research methodology. This Research Methods in HCI revision contains updates throughout, including more detail on statistical tests, coding qualitative data, and data collection via mobile devices and sensors. Other new material covers performing research with children, older adults, and people with cognitive impairments.

Comprehensive and updated guide to the latest research methodologies and approaches, and now available in EPUB3 format (choose any of the ePub or Mobi formats after purchase of the eBook)
Expanded discussions of online datasets, crowdsourcing, statistical tests, coding qualitative data, laws and regulations relating to the use of human participants, and data collection via mobile devices and sensors
New material on performing research with children, older adults, and people with cognitive impairments, two new case studies from Google and Yahoo!, and techniques for expanding the influence of your research to reach non-researcher audiences, including software developers and policymakers

Translating Statistics to Make Decisions: A Guide for the Non-Statistician

Examine and solve the common misconceptions and fallacies that non-statisticians bring to their interpretation of statistical results. Explore the many pitfalls that non-statisticians—and also statisticians who present statistical reports to non-statisticians—must avoid if statistical results are to be correctly used for evidence-based business decision making. Victoria Cox, senior statistician at the United Kingdom's Defence Science and Technology Laboratory (Dstl), distills the lessons of her long experience presenting the actionable results of complex statistical studies to users of widely varying statistical sophistication across many disciplines: from scientists, engineers, analysts, and information technologists to executives, military personnel, project managers, and officials across UK government departments, industry, academia, and international partners. The author shows how faulty statistical reasoning often undermines the utility of statistical results even among those with advanced technical training. Translating Statistics teaches statistically naive readers enough about statistical questions, methods, models, assumptions, and statements that they will be able to extract the practical message from statistical reports and better constrain what conclusions cannot be made from the results. To non-statisticians with some statistical training, this book offers brush-ups, reminders, and tips for the proper use of statistics and solutions to common errors. To fellow statisticians, the author demonstrates how to present statistical output to non-statisticians to ensure that the statistical results are correctly understood and properly applied to real-world tasks and decisions. The book avoids algebra and proofs, but it does supply code written in R for those readers who are motivated to work out examples. Pointing along the way to instructive examples of statistics gone awry, Translating Statistics walks readers through the typical course of a statistical study, progressing from the experimental design stage through the data collection process, exploratory data analysis, descriptive statistics, uncertainty, hypothesis testing, statistical modelling and multivariate methods, to graphs suitable for final presentation. The steady focus throughout the book is on how to turn the mathematical artefacts and specialist jargon that are second nature to statisticians into plain English for corporate customers and stakeholders. The final chapter neatly summarizes the book's lessons and insights for accurately communicating statistical reports to the non-statisticians who commission and act on them.

What You'll Learn

Recognize and avoid common errors and misconceptions that cause statistical studies to be misinterpreted and misused by non-statisticians in organizational settings
Gain a practical understanding of the methods, processes, capabilities, and caveats of statistical studies to improve the application of statistical data to business decisions
See how to code statistical solutions in R

Who This Book Is For

Non-statisticians—including both those with and without an introductory statistics course under their belts—who consume statistical reports in organizational settings, and statisticians who seek guidance for reporting statistical studies to non-statisticians in ways that will be accurately understood and will inform sound business and technical decisions
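The book's own worked examples are written in R; purely as an analogous illustration in Python, the sketch below runs a two-sample comparison on invented data and then states the result in plain English, which is the "translation" step the book emphasizes:

    # Hypothetical example: did a process change move the average measurement?
    # Data are simulated; the point is the plain-English statement at the end.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(seed=1)
    before = rng.normal(loc=100.0, scale=10.0, size=40)  # baseline measurements
    after = rng.normal(loc=106.0, scale=10.0, size=40)   # post-change measurements

    t_stat, p_value = stats.ttest_ind(after, before, equal_var=False)  # Welch's t-test

    diff = after.mean() - before.mean()
    if p_value < 0.05:
        print(f"The observed shift of {diff:.1f} units is unlikely to be chance alone (p = {p_value:.3f}).")
    else:
        print(f"The data are consistent with no real shift (p = {p_value:.3f}).")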

Total Survey Error in Practice

Featuring a timely presentation of total survey error (TSE), this edited volume introduces valuable tools for understanding and improving survey data quality in the context of evolving large-scale data sets. This book provides an overview of the TSE framework and current TSE research as related to survey design, data collection, estimation, and analysis. It recognizes that survey data affects many public policy and business decisions and thus focuses on the framework for understanding and improving survey data quality. The book also addresses issues with data quality in official statistics and in social, opinion, and market research as these fields continue to evolve, leading to larger and messier data sets. This perspective challenges survey organizations to find ways to collect and process data more efficiently without sacrificing quality. The volume consists of the most up-to-date research and reporting from over 70 contributors representing the best academics and researchers from a range of fields. The chapters are broken out into five main sections: The Concept of TSE and the TSE Paradigm, Implications for Survey Design, Data Collection and Data Processing Applications, Evaluation and Improvement, and Estimation and Analysis. Each chapter introduces and examines multiple error sources, such as sampling error, measurement error, and nonresponse error, which often offer the greatest risks to data quality, while also encouraging readers not to lose sight of the less commonly studied error sources, such as coverage error, processing error, and specification error. The book also notes the relationships between errors and the ways in which efforts to reduce one type can increase another, resulting in an estimate with larger total error.

This book:
• Features various error sources, and the complex relationships between them, in 25 high-quality chapters on the most up-to-date research in the field of TSE
• Provides comprehensive reviews of the literature on error sources as well as data collection approaches and estimation methods to reduce their effects
• Presents examples of recent international events that demonstrate the effects of data error, the importance of survey data quality, and the real-world issues that arise from these errors
• Spans the four pillars of the total survey error paradigm (design, data collection, evaluation, and analysis) to address key data quality issues in official statistics and survey research

Total Survey Error in Practice is a reference for survey researchers and data scientists in research areas that include social science, public opinion, public policy, and business. It can also be used as a textbook or supplementary material for a graduate-level course in survey research methods.

Paul P. Biemer, PhD, is distinguished fellow at RTI International and associate director of Survey Research and Development at the Odum Institute, University of North Carolina, USA. Edith de Leeuw, PhD, is professor of survey methodology in the Department of Methodology and Statistics at Utrecht University, the Netherlands. Stephanie Eckman, PhD, is fellow at RTI International, USA. Brad Edwards is vice president, director of Field Services, and deputy area director at Westat, USA. Frauke Kreuter, PhD, is professor and director of the Joint Program in Survey Methodology, University of Maryland, USA; professor of statistics and methodology at the University of Mannheim, Germany; and head of the Statistical Methods Research Department at the Institute for Employment Research, Germany. Lars E. Lyberg, PhD, is senior advisor at Inizio, Sweden. N. Clyde Tucker, PhD, is principal survey methodologist at the American Institutes for Research, USA. Brady T. West, PhD, is research associate professor in the Survey Research Center at the University of Michigan, USA.

Miroslav will present Analytics from a completely different perspective. Some of the things he will reveal about the life and job of an Analytics/data expert will be painful, honest, interesting, and, most of all, funny. After his stand-up, you will look at attribution, data collection, and Analytics dimensions/metrics from a new point of view. Please don't expect everything to be 100% accurate; after all, it's a stand-up.

Perspectives on Data Science for Software Engineering

Perspectives on Data Science for Software Engineering presents the best practices of seasoned data miners in software engineering. The idea for this book was conceived during the 2014 conference at Dagstuhl, an invitation-only gathering of leading computer scientists who meet to identify and discuss cutting-edge informatics topics. At the 2014 conference, the concept of how to transfer the knowledge of experts from seasoned software engineers and data scientists to newcomers in the field highlighted many discussions. While there are many books covering data mining and software engineering basics, they present only the fundamentals and lack the perspective that comes from real-world experience. This book offers unique insights into the wisdom of the community’s leaders gathered to share hard-won lessons from the trenches. Ideas are presented in digestible chapters designed to be applicable across many domains. Topics covered include data collection, data sharing, data mining, and how to utilize these techniques in successful software projects. Newcomers to software engineering data science will learn the tips and tricks of the trade, while more experienced data scientists will benefit from war stories that show what traps to avoid. Presents the wisdom of community experts, derived from a summit on software analytics. Provides contributed chapters that share discrete ideas and techniques from the trenches. Covers top areas of concern, including mining security and social data, data visualization, and cloud-based data. Presented in clear chapters designed to be applicable across many domains.

Manufacturing Performance Management using SAP OEE: Implementing and Configuring Overall Equipment Effectiveness

Learn how to configure, implement, enhance, and customize SAP OEE to address manufacturing performance management. Manufacturing Performance Management using SAP OEE will show you how to connect your business processes with your plant systems and how to integrate SAP OEE with ERP through standard workflows and shop floor systems for automated data collection.

Manufacturing Performance Management using SAP OEE is a must-have comprehensive guide to implementing SAP OEE. It will ensure that SAP consultants and users understand how SAP OEE can offer solutions for manufacturing performance management in process industries. With this book in hand, managing shop floor execution effectively will become easier than ever. Authors Dipankar Saha and Mahalakshmi Symsunder, both SAP manufacturing solution experts, and Sumanta Chakraborty, product owner of SAP OEE, will explain execution and processing related concepts, manual and automatic data collection through the OEE Worker UI, and how to enhance and customize interfaces and dashboards for your specific purposes. You'll learn how to capture and categorize production and loss data and use it effectively for root-cause analysis. In addition, this book will show you:

Various down-time handling scenarios
How to monitor, calculate, and define standard as well as industry-specific KPIs
How to carry out standard operational analytics for continuous improvement on the shop floor, at local plant level using MII and SAP Lumira, and also global consolidated analytics at corporation level using SAP HANA
Steps to benchmark manufacturing performance to compare similar manufacturing plants' performance, leading to a more efficient and effective shop floor

Manufacturing Performance Management using SAP OEE will provide you with in-depth coverage of SAP OEE and how to effectively leverage its features. This will allow you to efficiently manage the manufacturing process and to enhance the shop floor's overall performance, making you the sought-after SAP OEE expert in the organization.

What You Will Learn

Configure your ERP OEE add-on to build your plant and global hierarchy and relevant master data and KPIs
Use the SAP OEE standard integration (SAP OEEINT) to integrate your ECC and OEE system to establish bi-directional integration between the enterprise and the shop floor
Enable your shop floor operator on the OEE Worker UI to handle shop floor production execution
Use SAP OEE as a tool for measuring manufacturing performance
Enhance and customize SAP OEE to suit your specific requirements
Create local plant-based reporting using SAP Lumira and MII
Use standard SAP OEE HANA analytics

Who This Book Is For

SAP MII, ME, and OEE consultants and users who will implement and use the solution.
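For readers new to the metric itself, the arithmetic behind OEE is simple. The sketch below shows the standard availability × performance × quality calculation in Python, with made-up shift numbers; it illustrates the generic formula, not SAP OEE's internal implementation:

    # Standard OEE calculation on invented shift data.
    planned_time_min = 480        # one 8-hour shift
    downtime_min = 60             # recorded loss events
    ideal_cycle_time_min = 1.0    # standard time to produce one unit
    total_count = 360             # units produced
    good_count = 342              # units passing quality inspection

    run_time_min = planned_time_min - downtime_min
    availability = run_time_min / planned_time_min                      # 420/480 = 0.875
    performance = (ideal_cycle_time_min * total_count) / run_time_min   # 360/420 ≈ 0.857
    quality = good_count / total_count                                  # 342/360 = 0.95

    oee = availability * performance * quality
    print(f"OEE = {oee:.1%}")  # about 71%

Categorizing the underlying losses (downtime, speed loss, defects) against these three factors is what makes the root-cause analysis described above possible.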

Going Pro in Data Science

Digging for answers to your pressing business questions probably won’t resemble those tidy case studies that lead you step-by-step from data collection to cool insights. Data science is not so clear-cut in the real world. Instead of high-quality data with the right velocity, variety, and volume, many data scientists have to work with missing or sketchy information extracted from people in the organization. In this O’Reilly report, Jerry Overton—Distinguished Engineer at global IT leader DXC—introduces practices for making good decisions in a messy and complicated world. What he simply calls “data science that works” is a trial-and-error process of creating and testing hypotheses, gathering evidence, and drawing conclusions. These skills are far more useful for practicing data scientists than, say, mastering the details of a machine-learning algorithm. Adapted and expanded from a series of articles Overton published on O’Reilly Radar and on the CSC Blog, each chapter is ideal for current and aspiring data scientists who want to go pro, as well as IT execs and managers looking to hire in this field. The report covers: Using the scientific method to gain a competitive advantage The skill set you need to look for when choosing a data scientist Why practical induction is a key part of thinking like a data scientist Best practices for writing solid code in your data science gig How agile experimentation lets you find answers (or dead ends) much faster Advice for surviving (and even thriving) as a data scientist in your organization

Fundamentals of Big Data Network Analysis for Research and Industry

Fundamentals of Big Data Network Analysis for Research and Industry
By Hyunjoung Lee, Institute of Green Technology, Yonsei University, Republic of Korea, and Il Sohn, Material Science and Engineering, Yonsei University, Republic of Korea

Presents the methodology of big data analysis using examples from research and industry. There are large amounts of data everywhere, and the ability to pick out crucial information is increasingly important. Contrary to popular belief, not all information is useful; big data network analysis assumes that data is not only large, but also meaningful, and this book focuses on the fundamental techniques required to extract essential information from vast datasets. Featuring case studies drawn largely from the iron and steel industries, this book offers practical guidance which will enable readers to easily understand big data network analysis. Particular attention is paid to the methodology of network analysis, offering information on the method of data collection, on research design and analysis, and on the interpretation of results. A variety of programs for network analysis, including UCINET, NetMiner, R, NodeXL, and Gephi, are covered in detail. Fundamentals of Big Data Network Analysis for Research and Industry looks at big data from a fresh perspective and provides a new approach to data analysis. This book:

Explains the basic concepts in understanding big data and filtering meaningful data
Presents big data analysis within the networking perspective
Features methodology applicable to research and industry
Describes in detail the social relationship between big data and its implications
Provides insight into identifying patterns and relationships between seemingly unrelated big data

Fundamentals of Big Data Network Analysis for Research and Industry will prove a valuable resource for analysts, research engineers, industrial engineers, marketing professionals, and any individuals dealing with accumulated large data whose interest is to analyze and identify potential relationships among data sets.
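The book's tools of choice are UCINET, NetMiner, R, NodeXL, and Gephi; purely as an illustration of the basic workflow it describes, here is a minimal Python sketch using the networkx library on an invented supplier network (the firm names and relationships are made up):

    # Toy network analysis: build a graph and find its structurally central actors.
    import networkx as nx

    # Edges represent invented trading relationships between firms.
    edges = [("SteelCo", "MillA"), ("SteelCo", "MillB"),
             ("MillA", "TraderX"), ("MillB", "TraderX"), ("TraderX", "PortY")]
    G = nx.Graph(edges)

    # Centrality measures identify the actors that hold the network together.
    degree = nx.degree_centrality(G)
    betweenness = nx.betweenness_centrality(G)

    for node in G.nodes:
        print(f"{node}: degree={degree[node]:.2f}, betweenness={betweenness[node]:.2f}")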

Big Data MBA

Integrate big data into business to drive competitive advantage and sustainable success

Big Data MBA brings insight and expertise to leveraging big data in business so you can harness the power of analytics and gain a true business advantage. Based on a practical framework with supporting methodology and hands-on exercises, this book helps identify where and how big data can help you transform your business. You'll learn how to exploit new sources of customer, product, and operational data, coupled with advanced analytics and data science, to optimize key processes, uncover monetization opportunities, and create new sources of competitive differentiation. The discussion includes guidelines for operationalizing analytics, optimal organizational structure, and using analytic insights throughout your organization's user experience to customers and front-end employees alike. You'll learn to “think like a data scientist” as you build upon the decisions your business is trying to make, the hypotheses you need to test, and the predictions you need to produce. Business stakeholders no longer need to relinquish control of data and analytics to IT. In fact, they must champion the organization's data collection and analysis efforts. This book is a primer on the business approach to analytics, providing the practical understanding you need to convert data into opportunity.

Understand where and how to leverage big data
Integrate analytics into everyday operations
Structure your organization to drive analytic insights
Optimize processes, uncover opportunities, and stand out from the rest
Help business stakeholders to “think like a data scientist”
Understand appropriate business application of different analytic techniques

If you want data to transform your business, you need to know how to put it to use. Big Data MBA shows you how to implement big data and analytics to make better decisions.

Business Statistics Made Easy in SAS

Learn or refresh core statistical methods for business with SAS® and tackle real business analytics issues and techniques with a practical approach that avoids complex mathematics and instead employs easy-to-follow explanations.

Business Statistics Made Easy in SAS® is designed as a user-friendly, practice-oriented, introductory text to teach businesspeople, students, and others core statistical concepts and applications. It begins with absolute core principles and takes you through an overview of statistics, data and data collection, an introduction to SAS®, and basic statistics (descriptive statistics and basic associational statistics). The book also provides an overview of statistical modeling, effect size, statistical significance and power testing, basics of linear regression, introduction to comparison of means, basics of chi-square tests for categories, extrapolating statistics to business outcomes, and some topical issues in statistics, such as big data, simulation, machine learning, and data warehousing.

The book steers away from complex mathematical-based explanations, and it also avoids basing explanations on the traditional build-up of distributions, probability theory and the like, which tend to lose the practice-oriented reader. Instead, it teaches the core ideas of statistics through methods such as careful, intuitive written explanations, easy-to-follow diagrams, step-by-step technique implementation, and interesting metaphors.

With no previous SAS experience necessary, Business Statistics Made Easy in SAS® is an ideal introduction for beginners. It is suitable for introductory undergraduate classes, postgraduate courses such as MBA refresher classes, and for the business practitioner. It is compatible with SAS® University Edition.
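The book's examples are written in SAS; purely as an analogous illustration of one of the core techniques it covers, here is a Python sketch of a chi-square test for categories, using invented counts:

    # Hypothetical contingency table: purchase outcome by region (counts made up).
    from scipy.stats import chi2_contingency

    observed = [[120, 90, 40],   # purchased
                [ 80, 70, 60]]   # did not purchase

    chi2, p_value, dof, expected = chi2_contingency(observed)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
    # A small p-value suggests purchase behaviour is not independent of region.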

Microsoft Mapping: Geospatial Development in Windows 10 with Bing Maps and C#, Second Edition

This revised edition of Microsoft Mapping includes the latest details about SQL Server 2014 and the new 3D and Streetside-capable map control for Windows 10 applications. It contains updated chapters on Microsoft Azure and Power Map for Excel plus a new chapter on Bing Maps for Universal Windows. The book tells a story, from beginning to end, of planning and deploying a single geospatial application built using Microsoft technologies from end-to-end. Readers are expected to have basic familiarity with the fundamentals of developing for Microsoft platforms (some understanding of basic SQL, C#, .NET, and WCF); as readers work through the book they will build on their existing skills so that they will be able to deploy geospatial applications for social networking, data collection, enterprise management, or other purposes.