talk-data.com

Topic

Data Analytics

Tags: data_analysis, statistics, insights


Activity Trend

[Quarterly activity chart, 2020-Q1 to 2026-Q1, peaking at 38 activities per quarter]

Activities

760 activities · Newest first

podcast_episode
by Mico Yuk (Data Storytelling Academy), Jordan Morrow (Brainstorm, Inc.)

Data literacy is one of the most sought-after cultural transformations of 2020. In episode #47 we discussed why we should stop saying "data literacy." Today's guest, Jordan Morrow, shared that episode on LinkedIn, sparking some very interesting and even defensive responses from the data literacy social media mafia. Today, Jordan joins the podcast to share his unique point of view on the topic. Known as the pioneer of data literacy, Jordan explains how to start a data literacy practice, who the best audience for data literacy skills is, and which data literacy skills matter most. Knowledge bombs galore!

[16:43] - Key Quote: "Not everyone needs to be a data scientist, but everyone should develop skills in data analytics in today's day and age." – Jordan Morrow
[22:53] - The order in which data literacy skills should be taught
[29:25] - Other phrases that could be used instead of data literacy

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/66

Enjoyed the Show? Please leave us a review on iTunes. Check out our sponsor! Are you a BI/analytics leader who is tired of creating useless reports or dashboards? Are you struggling to get users over to your BI portal? Ever thought about embedding your analytics? If so, then you have to check out Logi Composer, the first-ever out-of-the-box development experience for teams who want to get up and running fast! Logi Analytics is offering AoF listeners a special 14-day trial. Just visit logianalytics.com/aof

Analytics Stories

Inform your own analyses by seeing how one of the best data analysts in the world approaches analytics problems. Analytics Stories: How to Make Good Things Happen is a thoughtful, incisive, and entertaining exploration of the application of analytics to real-world problems and situations. Covering fields as diverse as sports, finance, politics, healthcare, and business, Analytics Stories bridges the gap between the oft-inscrutable world of data analytics and the concrete problems it solves. Distinguished professor and author Wayne L. Winston answers questions like:

• Was Liverpool over Barcelona the greatest upset in sports history?
• Was Derek Jeter a great infielder?
• What's wrong with the NFL QB rating?
• How did Madoff keep his fund going?
• Does a mutual fund's past performance predict future performance?
• What caused the Crash of 2008?
• Can we predict where crimes are likely to occur?
• Is the lot of the American worker improving?
• How can analytics save the US Republic?
• The birth of evidence-based medicine: How did James Lind know citrus fruits cured scurvy?
• How can I objectively compare hospitals?
• How can we predict heart attacks in real time?
• How does a retail store know if you're pregnant?
• How can I use A/B testing to improve sales from my website?
• How can analytics help me write a hit song?

Perfect for anyone with the word "analyst" in their job title, Analytics Stories illuminates the process of applying analytic principles to practical problems and highlights the potential pitfalls that await careless analysts.

Adam Weinstein is CEO and co-founder of Cursor. He previously worked at LinkedIn as a Senior Manager of Business Development and founded enGreet, a print-on-demand greeting card company that merged crowd-sourcing with social expressions. In this episode, he describes his data analytics company and provides insight into creating a successful startup.


Shownotes

00:00 - Check us out on YouTube and SoundCloud!   

00:10 - Connect with Producer Steve Moore on LinkedIn & Twitter   

00:15 - Connect with Producer Liam Seston on LinkedIn & Twitter.   

00:20 - Connect with Producer Rachit Sharma on LinkedIn.

00:25 - Connect with Host Al Martin on LinkedIn & Twitter.   

00:55 - Connect with Adam Weinstein on LinkedIn.

03:55 - Find out more about Cursor.

06:45 - Learn more about Cursor's Co-Founder and CEO Adam Weinstein.

13:10 - Learn more about Big Data Analytics.

19:20 - What is Python/Jupyter Notebooks?

26:35 - Learn more about Data Fluency.

35:30 - What is a startup?

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

On this episode, we chat with Emily White, a music industry veteran who started her career as a world-class tour manager before retiring at 23 to pursue artist management, entrepreneurial endeavors, and academia. Emily has worked with everyone from Dinosaur Jr. to Zac Brown Band; she's founded and run multiple entertainment companies, released a number of books, and now, when she's not teaching at NYU's Tisch School of the Arts, Emily is using music data analytics to help activate voters for the upcoming presidential election in November. The #iVoted initiative, which Emily founded with Madison House co-founder Mike Luba and Wilco's Pat Sansone, is gearing up to be one of the biggest digital music festivals ever, with dozens of artists performing via webcast nationwide. The cost of admission for fans? A selfie from home with their mail-in ballot or a photo from outside their polling place, though we strongly encourage the former. For a full list of artists performing on Nov. 3, check out iVotedConcerts.com. Full disclaimer: Chartmetric is a proud data partner of the #iVoted initiative.

Connect With Emily
https://twitter.com/emwizzle
https://twitter.com/iVotedConcerts
https://twitter.com/collectiveent_
https://www.instagram.com/collectiveentinc/

Connect With Us
http://podcast.chartmetric.com/
http://chartmetric.com/
https://blog.chartmetric.com
https://smarturl.it/chartmetric_social

Learning Spark, 2nd Edition

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:

• Learn Python, SQL, Scala, or Java high-level Structured APIs
• Understand Spark operations and the SQL engine
• Inspect, tune, and debug Spark operations with Spark configurations and the Spark UI
• Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
• Perform analytics on batch and streaming data using Structured Streaming
• Build reliable data pipelines with open source Delta Lake and Spark
• Develop machine learning pipelines with MLlib and productionize models using MLflow
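To give a flavor of the Structured APIs the book teaches, here is a minimal PySpark sketch; it is not taken from the book, and the data and column names are invented for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start a local Spark session (cluster deployment differs in practice).
spark = SparkSession.builder.appName("structured-api-sketch").getOrCreate()

# A tiny in-memory DataFrame standing in for a real source (JSON, Parquet, Kafka, ...).
df = spark.createDataFrame(
    [("alice", "books", 12.0), ("bob", "books", 7.5), ("alice", "music", 3.0)],
    ["user", "category", "amount"],
)

# Declarative transformations: Spark plans and optimizes these before executing them.
per_category = (
    df.groupBy("category")
      .agg(F.sum("amount").alias("total"), F.count("*").alias("n"))
      .orderBy(F.desc("total"))
)

per_category.show()
spark.stop()
```

The same query could be written in SQL, Scala, or Java against the same optimizer, which is the unification the book emphasizes.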

A pioneer for the past 25 years, SONY PlayStation has played a vital role in the interactive gaming industry. With over 100 million monthly active users and 100+ million PS4 console sales, along with thousands of game development partners across the globe, big data problems are inevitable. This presentation describes how we scaled Airflow horizontally, which has helped us build a stable, scalable, and optimal data processing infrastructure powered by Apache Spark, AWS ECS, EC2, and Docker. Driven by the demand for processing large volumes of data and by the organization's growing data analytics and usage needs, the data team at PlayStation took the initiative to build an open source big data processing infrastructure with Apache Spark in Python as the core ETL engine. Apache Airflow is the core workflow management tool for the entire ecosystem. We started with an Airflow application running on a single AWS EC2 instance to support parallelism of 16 with 1 scheduler and 1 worker, and eventually scaled it to a bigger scheduler along with 4 workers to support a parallelism of 96, DAG concurrency of 96, and a worker task concurrency of 24. Containerizing all the services on AWS ECS gave us the ability to scale Airflow horizontally.
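For reference, the figures quoted above map roughly onto the Airflow 1.10-era airflow.cfg keys with the CeleryExecutor; this is a sketch derived from those numbers, not the team's actual configuration:

```
[core]
executor = CeleryExecutor
# Max task instances running concurrently across the whole installation.
parallelism = 96
# Max task instances allowed to run per DAG.
dag_concurrency = 96

[celery]
# Max tasks a single worker picks up at once; with 4 workers this
# yields up to 96 concurrent task slots (4 x 24).
worker_concurrency = 24
```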

Practical R 4: Applying R to Data Manipulation, Processing and Integration

Get started with an accelerated introduction to the R ecosystem, programming language, and tools, including R script and RStudio. Utilizing many examples and projects, this book teaches you how to get data into R and how to work with that data using R. Once grounded in the fundamentals, the rest of Practical R 4 dives into specific projects and examples, starting with running and analyzing a survey using R and LimeSurvey. Next, you'll carry out advanced statistical analysis using R and MouselabWeb. Then, you'll see how R can work for you without statistics, including how R can be used to automate data formatting, manipulation, reporting, and custom functions. The final part of this book discusses using R on a server; you'll build a script with R that can run on an RStudio Server and monitor a report source for changes to alert the user when something has changed. This project includes both regular email alerting and push notification. And, finally, you'll use R to create a customized daily rundown report of a person's most important information, such as a weather report, daily calendar, to-dos, and more. This demonstrates how to automate such a process so that every morning, the user navigates to the same web page and gets the updated report.

What You Will Learn
• Set up and run an R script, including installation on a new machine and downloading and configuring R
• Turn any machine into a powerful data analytics platform accessible from anywhere with RStudio Server
• Write basic R scripts and modify existing scripts to suit your own needs
• Create basic HTML reports in R, inserting information as needed
• Build a basic R package and distribute it

Who This Book Is For
Some prior exposure to statistics, programming, and maybe SAS is recommended but not required.

Beginning Apache Spark Using Azure Databricks: Unleashing Large Cluster Analytics in the Cloud

Analyze vast amounts of data in record time using Apache Spark with Databricks in the cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics solutions cost, while at the same time getting the results you need, incrementally faster. This book explains how the confluence of these pivotal technologies gives you enormous power, and cheaply, when it comes to huge datasets. You will begin by learning how cloud infrastructure makes it possible to scale your code to large amounts of processing units, without having to pay for the machinery in advance. From there you will learn how Apache Spark, an open source framework, can enable all those CPUs for data analytics use. Finally, you will see how services such as Databricks provide the power of Apache Spark, without you having to know anything about configuring hardware or software. By removing the need for expensive experts and hardware, your resources can instead be allocated to actually finding business value in the data. This book guides you through some advanced topics such as analytics in the cloud, data lakes, data ingestion, architecture, machine learning, and tools, including Apache Spark, Apache Hadoop, Apache Hive, Python, and SQL. Valuable exercises help reinforce what you have learned.

What You Will Learn
• Discover the value of big data analytics that leverage the power of the cloud
• Get started with Databricks using SQL and Python in either Microsoft Azure or AWS
• Understand the underlying technology, and how the cloud and Apache Spark fit into the bigger picture
• See how these tools are used in the real world
• Run basic analytics, including machine learning, on billions of rows at a fraction of the cost, or free

Who This Book Is For
Data engineers, data scientists, and cloud architects who want or need to run advanced analytics in the cloud. It is assumed that the reader has data experience, but perhaps minimal exposure to Apache Spark and Azure Databricks. The book is also recommended for people who want to get started in the analytics field, as it provides a strong foundation.

SQL Server Big Data Clusters: Data Virtualization, Data Lake, and AI Platform

Use this guide to one of SQL Server 2019's most impactful features: Big Data Clusters. You will learn about data virtualization and data lakes for this complete artificial intelligence (AI) and machine learning (ML) platform within the SQL Server database engine. You will know how to use Big Data Clusters to combine large volumes of streaming data for analysis along with data stored in a traditional database. For example, you can stream large volumes of data from Apache Spark in real time while executing Transact-SQL queries to bring in relevant additional data from your corporate SQL Server database. Filled with clear examples and use cases, this book provides everything necessary to get started working with Big Data Clusters in SQL Server 2019. You will learn about the architectural foundations that are made up of Kubernetes, Spark, HDFS, and SQL Server on Linux. You then are shown how to configure and deploy Big Data Clusters in on-premises environments or in the cloud. Next, you are taught about querying. You will learn to write queries in Transact-SQL, taking advantage of skills you have honed for years, and with those queries you will be able to examine and analyze data from a wide variety of sources such as Apache Spark. Through the theoretical foundation provided in this book and easy-to-follow example scripts and notebooks, you will be ready to use and unveil the full potential of SQL Server 2019: combining different types of data spread across widely disparate sources into a single view that is useful for business intelligence and machine learning analysis.

What You Will Learn
• Install, manage, and troubleshoot Big Data Clusters in cloud or on-premises environments
• Analyze large volumes of data directly from SQL Server and/or Apache Spark
• Manage data stored in HDFS from SQL Server as if it were relational data
• Implement advanced analytics solutions through machine learning and AI
• Expose different data sources as a single logical source using data virtualization

Who This Book Is For
Data engineers, data scientists, data architects, and database administrators who want to employ data virtualization and big data analytics in their environments

Forensic Analytics, 2nd Edition

Become the forensic analytics expert in your organization using effective and efficient data analysis tests to find anomalies, biases, and potential fraud, with this updated new edition. Forensic Analytics reviews the methods and techniques that forensic accountants can use to detect intentional and unintentional errors, fraud, and biases. This updated second edition shows accountants and auditors how analyzing their corporate or public sector data can highlight transactions, balances, or subsets of transactions or balances in need of attention. These tests are made up of a set of initial high-level overview tests followed by a series of more focused tests. These focused tests use a variety of quantitative methods including Benford's Law, outlier detection, the detection of duplicates, a comparison to benchmarks, time-series methods, risk-scoring, and sometimes simply statistical logic. The tests in the new edition include the newly developed vector variation score that quantifies the change in an array of data from one period to the next. The goal of the tests is to produce either a small sample of suspicious transactions, a small set of transaction groups, or a risk score related to individual transactions or a group of items. The new edition includes over two hundred figures. Each chapter, where applicable, includes one or more cases showing how the tests under discussion could have detected the fraud or anomalies. The new edition also includes two chapters each describing multi-million-dollar fraud schemes and the insights that can be learned from those examples. These interesting real-world examples help to make the text accessible and understandable for accounting professionals and accounting students without rigorous backgrounds in mathematics and statistics. Emphasizing practical applications, the new edition shows how to use either Excel or Access to run these analytics tests. The book also has some coverage of using Minitab, IDEA, R, and Tableau to run forensic-focused tests. The use of SAS and Power BI rounds out the software coverage. The software screenshots use the latest versions of the software available at the time of writing.

This authoritative book:
• Describes the use of statistically based techniques including Benford's Law, descriptive statistics, and the vector variation score to detect errors and anomalies
• Shows how to run most of the tests in Access and Excel, and other data analysis software packages for a small sample of the tests
• Applies the tests under review in each chapter to the same purchasing card data from a government entity
• Includes interesting case studies throughout that are linked to the tests being reviewed
• Includes two comprehensive case studies where data analytics could have detected the frauds before they reached multi-million-dollar levels
• Includes a continually updated companion website with the data sets used in the chapters, the queries used in the chapters, extra coverage of some topics or cases, end-of-chapter questions, and end-of-chapter cases

Written by a prominent educator and researcher in forensic accounting and auditing, the new edition of Forensic Analytics: Methods and Techniques for Forensic Accounting Investigations is an essential resource for forensic accountants, auditors, comptrollers, fraud investigators, and graduate students.
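To make the Benford's Law test concrete, here is a small Python sketch; it is illustrative only, since the book itself works in Excel, Access, and the other packages listed. It compares a data set's leading-digit frequencies to the Benford expectation log10(1 + 1/d):

```python
import math
from collections import Counter

def benford_expected(d: int) -> float:
    """Expected frequency of leading digit d under Benford's Law: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)

def leading_digit_frequencies(amounts):
    """Observed frequency of each leading digit 1-9 in a list of nonzero amounts."""
    digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts if a != 0]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

# Hypothetical transaction amounts; a real test would use thousands of records.
amounts = [123.45, 18.20, 1450.00, 92.10, 310.75, 27.99, 1.15, 880.00, 14.60, 199.99]
observed = leading_digit_frequencies(amounts)

for d in range(1, 10):
    print(f"digit {d}: observed {observed[d]:.3f} vs Benford {benford_expected(d):.3f}")
```

Large, systematic gaps between the observed and expected columns are what flag a data set for the more focused follow-up tests the book describes.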

Summary

Data is a critical element of every role in an organization, which is also what makes managing it so challenging. With so many different opinions about which pieces of information are most important, how they need to be accessed, and what to do with them, many data projects are doomed to failure. In this episode Chris Bergh explains how taking an agile approach to delivering value can drive down the complexity that grows out of the varied needs of the business. Building a DataOps workflow that incorporates fast delivery of well-defined projects, continuous testing, and open lines of communication is a proven path to success.
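As one concrete flavor of the continuous-testing idea, a pipeline can run lightweight assertions on every batch before publishing it. The sketch below is a generic Python illustration with an invented table and rules, not DataKitchen's API:

```python
def check_orders(rows):
    """Return a list of human-readable failures; an empty list means the batch passes."""
    failures = []
    if not rows:
        failures.append("orders batch is empty")
        return failures
    ids = [r["order_id"] for r in rows]
    if len(ids) != len(set(ids)):
        failures.append("duplicate order_id values")
    if any(r["amount"] < 0 for r in rows):
        failures.append("negative amount")
    if any(r["customer_id"] is None for r in rows):
        failures.append("missing customer_id")
    return failures

# Hypothetical batch of the kind a pipeline would validate on every run.
batch = [
    {"order_id": 1, "customer_id": 7, "amount": 19.99},
    {"order_id": 2, "customer_id": 3, "amount": 5.00},
]

problems = check_orders(batch)
if problems:
    raise ValueError("; ".join(problems))  # fail fast instead of shipping bad data
print("batch passed all checks")
```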

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you've got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they've got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show!
• If DataOps sounds like the perfect antidote to your pipeline woes, DataKitchen is here to help. DataKitchen's DataOps Platform automates and coordinates all the people, tools, and environments in your entire data analytics organization – everything from orchestration, testing and monitoring to development and deployment. In no time, you'll reclaim control of your data pipelines so you can start delivering business value instantly, without errors. Go to dataengineeringpodcast.com/datakitchen today to learn more and thank them for supporting the show!
• Your host is Tobias Macey and today I'm welcoming back Chris Bergh to talk about ways that DataOps principles can help to reduce organizational complexity.

Interview

Introduction
• How did you get involved in the area of data management?
• How are typical data and analytics teams organized? What are their roles and structure?
• Can you start by giving an outline of the ways that complexity can manifest in a data organization?
• What are some of the contributing factors that generate this complexity?
• How does the size or scale of an organization and their data needs impact the segmentation of responsibilities and roles?
• How does this organizational complexity play out within a single team? For example, between data engineers, data scientists, and production/operations?
• How do you approach the definition of useful interfaces between different roles or groups within an organization?
• What are your thoughts on the relationship between the multivariate complexities of data and analytics workflows and the software trend toward microservices as a means of addressing the challenges of organizational communication patterns in the software lifecycle?
• How does this organizational complexity play out between multiple teams? For example, between a centralized data team and line-of-business self-service teams?
• Isn't organizational complexity just 'the way it is'? Is there any hope of getting out of meetings and inter-team conflict?
• What are some of the technical elements that are most impactful in reducing the time to delivery for different roles?
• What are some strategies that you have found to be useful for maintaining a connection to the business need throughout the different stages of the data lifecycle?
• What are some of the signs or symptoms of problematic complexity that individuals and organizations should keep an eye out for?
• What role can automated testing play in improving this process?
• How do the current set of tools contribute to the fragmentation of data wor

ML Ops: Operationalizing Data Science

More than half of the analytics and machine learning (ML) models created by organizations today never make it into production. Instead, many of these ML models do nothing more than provide static insights in a slideshow. If they aren't truly operational, these models can't possibly do what you've trained them to do. This report introduces practical concepts to help data scientists and application engineers operationalize ML models to drive real business change. Through lessons based on numerous projects around the world, six experts in data analytics provide an applied four-step approach (Build, Manage, Deploy and Integrate, and Monitor) for creating ML-infused applications within your organization. You'll learn how to:
• Fulfill data science value by reducing friction throughout ML pipelines and workflows
• Constantly refine ML models through retraining, periodic tuning, and even complete remodeling to ensure long-term accuracy
• Design the ML Ops lifecycle to ensure that people-facing models are unbiased, fair, and explainable
• Operationalize ML models not only for pipeline deployment but also for external business systems that are more complex and less standardized
• Put the four-step Build, Manage, Deploy and Integrate, and Monitor approach into action

Strategic Analytics: The Insights You Need from Harvard Business Review

Is your company ready for the next wave of analytics? Data analytics offer the opportunity to predict the future, use advanced technologies, and gain valuable insights about your business. But unless you're staying on top of the latest developments, your company is wasting that potential--and your competitors will be gaining speed while you fall behind. Strategic Analytics: The Insights You Need from Harvard Business Review will provide you with today's essential thinking about what data analytics are capable of, what critical talents your company needs to reap their benefits, and how to adopt analytics throughout your organization--before it's too late. Business is changing. Will you adapt or be left behind? Get up to speed and deepen your understanding of the topics that are shaping your company's future with the Insights You Need from Harvard Business Review series. Featuring HBR's smartest thinking on fast-moving issues--blockchain, cybersecurity, AI, and more--each book provides the foundational introduction and practical case studies your organization needs to compete today and collects the best research, interviews, and analysis to get it ready for tomorrow. You can't afford to ignore how these issues will transform the landscape of business and society. The Insights You Need series will help you grasp these critical ideas--and prepare you and your company for the future.

Summary

Knowledge graphs are a data resource that can answer questions beyond the scope of traditional data analytics. By organizing and storing data to emphasize the relationship between entities, we can discover the complex connections between multiple sources of information. In this episode John Maiden talks about how Cherre builds knowledge graphs that provide powerful insights for their customers and the engineering challenges of building a scalable graph. If you're wondering how to extract additional business value from existing data, this episode will provide a way to expand your data resources.
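To illustrate the idea with a toy example (this sketch uses networkx purely for illustration; the episode does not identify Cherre's actual graph stack, and all entities are invented), a knowledge graph stores entities as nodes and typed relationships as edges, so connection questions become graph traversals:

```python
import networkx as nx

# Toy commercial real estate knowledge graph; entities and relations are invented.
g = nx.MultiDiGraph()
g.add_edge("Acme Holdings LLC", "123 Main St", key="owns")
g.add_edge("Jane Doe", "Acme Holdings LLC", key="officer_of")
g.add_edge("Jane Doe", "Beta Properties LP", key="officer_of")
g.add_edge("Beta Properties LP", "456 Oak Ave", key="owns")

# "Which properties are connected to Jane Doe through any chain of relationships?"
reachable = nx.descendants(g, "Jane Doe")
properties = [n for n in reachable if g.out_degree(n) == 0]  # leaf nodes are properties here
print(properties)  # ['123 Main St', '456 Oak Ave'] in some order
```

The same question asked of two separate ownership and officer tables would require explicit joins for every hop; the graph representation makes multi-hop connections a first-class query.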

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you've got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they've got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of this show!
• You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don't want to miss out on great conferences. We have partnered with organizations such as ODSC and Data Council. Upcoming events include ODSC East, which has gone virtual starting April 16th. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
• Your host is Tobias Macey and today I'm interviewing John Maiden about how Cherre is building and using a knowledge graph of commercial real estate information.

Interview

Introduction
• How did you get involved in the area of data management?
• Can you start by describing what Cherre is and the role that data plays in the business?
• What are the benefits of a knowledge graph for making real estate investment decisions?
• What are the main ways that you and your customers are using the knowledge graph?
• What are some of the challenges that you face in providing a usable interface for end-users to query the graph?
• What technology are you using for storing and processing the graph?
• What challenges do you face in scaling the complexity and analysis of the graph?
• What are the main sources of data for the knowledge graph?
• What are some of the ways that messiness manifests in the data that you are using to populate the graph?
• How are you managing cleaning of the data, and how do you identify and process records that can't be coerced into the desired structure?
• How do you handle missing attributes or extra attributes in a given record?
• How did you approach the process of determining an effective taxonomy for records in the graph?
• What is involved in performing entity extraction on your data?
• What are some of the most interesting or unexpected questions that you have been able to ask and answer with the graph?
• What are some of the most interesting/unexpected/challenging lessons that you have learned in the process of working with this data?
• What are some of the near and medium term improvements that you have planned for your knowledge graph?
• What advice do you have for anyone who is interested in building a knowledge graph of their own?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening

End-to-end Data Analytics for Product Development

An interactive guide to the statistical tools used to solve problems during product and process innovation. End to End Data Analytics for Product Development is an accessible guide designed for practitioners in the industrial field. It offers an introduction to data analytics and the design of experiments (DoE) whilst covering the basic statistical concepts useful to an understanding of DoE. The text supports product innovation and development across a range of consumer goods and pharmaceutical organizations in order to improve the quality and speed of implementation through data analytics, statistical design, and data prediction. The book reviews information on feasibility screening, formulation and packaging development, sensory tests, and more. The authors – noted experts in the field – explore relevant techniques for data analytics and present the guidelines for data interpretation. In addition, the book contains information on process development and product validation that can be optimized through data understanding, analysis, and validation. The authors present an accessible, hands-on approach that uses MINITAB and JMP software. The book:
• Presents a guide to innovation feasibility and formulation and process development
• Contains the statistical tools used to solve challenges faced during product innovation and feasibility
• Offers information on stability studies, which are common especially in chemical or pharmaceutical fields
• Includes a companion website which contains videos summarizing main concepts

Written for undergraduate students and practitioners in industry, End to End Data Analytics for Product Development offers resources for the planning, conducting, analyzing, and interpreting of controlled tests in order to develop effective products and processes.
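For a sense of what a designed experiment looks like in practice, here is a generic Python sketch, not from the book (which works in MINITAB and JMP), enumerating a 2^3 full factorial design with hypothetical factors:

```python
from itertools import product

# Three two-level factors for a hypothetical formulation experiment.
factors = {
    "temperature": [40, 60],      # degrees C
    "mixing_time": [5, 15],       # minutes
    "concentration": [0.1, 0.3],  # proportion of active ingredient
}

# 2^3 = 8 runs: every combination of factor levels, each a condition to test.
names = list(factors)
runs = [dict(zip(names, levels)) for levels in product(*factors.values())]

for i, run in enumerate(runs, 1):
    print(f"run {i}: {run}")
```

Running all eight combinations, rather than varying one factor at a time, is what lets DoE estimate interaction effects between factors.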

Modern Big Data Architectures

Provides an up-to-date analysis of big data and multi-agent systems.

The term Big Data refers to cases where data sets are too large or too complex for traditional data-processing software. With the spread of new concepts such as Edge Computing or the Internet of Things, production, processing, and consumption of this data become more and more distributed. As a result, applications increasingly require multiple agents that can work together. A multi-agent system (MAS) is a self-organized computer system that comprises multiple intelligent agents interacting to solve problems that are beyond the capacities of individual agents. Modern Big Data Architectures examines modern concepts and architecture for Big Data processing and analytics. This unique, up-to-date volume provides a joint analysis of big data and multi-agent systems, with emphasis on distributed, intelligent processing of very large data sets. Each chapter contains practical examples and detailed solutions suitable for a wide variety of applications. The author, an internationally recognized expert in Big Data and distributed Artificial Intelligence, demonstrates how base concepts such as agent, actor, and micro-service have reached a point of convergence, enabling next-generation systems to be built by incorporating the best aspects of the field. This book:
• Illustrates how data sets are produced and how they can be utilized in various areas of industry and science
• Explains how to apply common computational models and state-of-the-art architectures to process Big Data tasks
• Discusses current and emerging Big Data applications of Artificial Intelligence

Modern Big Data Architectures: A Multi-Agent Systems Perspective is a timely and important resource for data science professionals and students involved in Big Data analytics, machine learning, and artificial intelligence.

Open Source Data Pipelines for Intelligent Applications

For decades, businesses have used information about their customers to make critical decisions on what to stock in inventory, which items to recommend to customers, and when to run promotions. But the advent of big data early in this century changed the game considerably. The key to achieving a competitive advantage today is the ability to process and store ever-increasing amounts of information that affect those decisions. In this report, solutions specialists from Red Hat provide an architectural guide to help you navigate the modern data analytics ecosystem. You'll learn how the industry has evolved and examine current approaches to storage. That includes a deep dive into the anatomy of a portable data platform architecture, along with several aspects of running data pipelines and intelligent applications with Kubernetes.
• Explore the history of open source data processing and the evolution of container scheduling
• Get a concise overview of intelligent applications
• Learn how to use storage with Kubernetes to produce effective intelligent applications
• Understand how to structure applications on Kubernetes in your platform architecture
• Delve into example pipeline architectures for deploying intelligent applications on Kubernetes

Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics

Analyze company data quickly and easily using Microsoft's powerful data tools. Learn to build scalable and robust data models, clean and combine different data sources effectively, and create compelling and professional visuals. Beginning Power BI is a hands-on, activity-based guide that takes you through the process of analyzing your data using the tools that encompass the core of Microsoft's self-service BI offering. Starting with Power Query, you will learn how to get data from a variety of sources, and see just how easy it is to clean and shape the data prior to importing it into a data model. Using Power BI tabular and the Data Analysis Expressions (DAX) language, you will learn to create robust, scalable data models which will serve as the foundation of your data analysis. From there you will enter the world of compelling interactive visualizations to analyze and gain insight into your data. You will wrap up your Power BI journey by learning how to package and share your reports and dashboards with your colleagues. Author Dan Clark takes you through each topic using step-by-step activities and plenty of screenshots to help familiarize you with the tools. This third edition covers the new and evolving features in the Power BI platform and adds new chapters on data flows and composite models. This book is your hands-on guide to quick, reliable, and valuable data insight.

What You Will Learn
• Simplify data discovery, association, and cleansing
• Build solid analytical data models
• Create robust interactive data presentations
• Combine analytical and geographic data in map-based visualizations
• Publish and share dashboards and reports

Who This Book Is For
Business analysts, database administrators, developers, and other professionals looking to better understand and communicate with data

Principles of Managerial Statistics and Data Science

Introduces readers to the principles of managerial statistics and data science, with an emphasis on the statistical literacy of business students.

Through a statistical perspective, this book introduces readers to the topic of data science, including Big Data, data analytics, and data wrangling. Chapters include multiple examples showing the application of the theoretical aspects presented. It features practice problems designed to ensure that readers understand the concepts and can apply them using real data. Over 100 open data sets used for examples and problems come from regions throughout the world, allowing the instructor to adapt the application to local data with which students can identify. Applications with these data sets include:
• Assessing if searches during a police stop in San Diego are dependent on driver's race
• Visualizing the association between fat percentage and moisture percentage in Canadian cheese
• Modeling taxi fares in Chicago using data from millions of rides
• Analyzing mean sales per unit of legal marijuana products in Washington state

Topics covered in Principles of Managerial Statistics and Data Science include: data visualization; descriptive measures; probability; probability distributions; mathematical expectation; confidence intervals; and hypothesis testing. Analysis of variance, simple linear regression, and multiple linear regression are also included. In addition, the book offers contingency tables, Chi-square tests, non-parametric methods, and time series methods. The textbook:
• Includes academic material usually covered in introductory Statistics courses, but with a data science twist and less emphasis on the theory
• Relies on Minitab to present how to perform tasks with a computer
• Presents and motivates the use of data that comes from open portals
• Focuses on developing an intuition for how the procedures work
• Exposes readers to the potential in Big Data and current failures of its use

Supplementary material includes: a companion website that houses PowerPoint slides; an Instructor's Manual with tips, a syllabus model, and project ideas; R code to reproduce examples and case studies; and information about the open portal data. An appendix provides solutions to some practice problems. Principles of Managerial Statistics and Data Science is a textbook for undergraduate and graduate students taking managerial Statistics courses, and a reference book for working business professionals.
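As a quick illustration of one topic on that list, here is a generic Python/scipy sketch, rather than the book's Minitab workflow, computing a 95% t-based confidence interval for a mean from a hypothetical sample:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of daily sales; any small numeric sample works the same way.
sample = np.array([102.0, 98.5, 110.2, 95.4, 104.1, 99.9, 107.3, 101.6])

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean: s / sqrt(n)

# 95% t-interval: mean +/- t_(0.975, n-1) * SEM
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```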

Tableau Desktop Certified Associate: Exam Guide

Tableau Desktop Certified Associate: Exam Guide is your companion for mastering Tableau and preparing for the certification exam with confidence. Through this book, you will gain a comprehensive understanding of Tableau Desktop's features and learn to implement them in practical scenarios to solve analytics challenges.

What this Book will help me do
• Understand and apply Tableau best practices for analyzing and visualizing data effectively
• Visualize geographic data using vector maps and gain insights into spatial distributions
• Leverage advanced analytics techniques such as forecasting to predict key metrics
• Build effective dashboards that convey information clearly and efficiently
• Gain confidence in tackling Tableau Desktop Certified Associate exam questions with expert tips and mock exams

Author(s)
The authors, Dmitry Anoshin, JC Gillet, Peri Biyani, and others, are experienced professionals in data analytics and business intelligence. With significant expertise in teaching and applying Tableau, they bring a wealth of knowledge to this guide, offering clear instructions and practical insights. Their dedication to empowering learners fosters a supportive and assured journey through this book.

Who is it for?
This book is ideal for business analysts, BI professionals, and data analysts aiming to become certified Tableau Desktop Associates. If you have a foundational understanding of Tableau Desktop and are looking to deepen your expertise while preparing for certification, this book is tailored to help you achieve that goal.