As the COVID-19 pandemic continues, the public (or at least those with Twitter accounts) are sharing their personal opinions about mask-wearing via Twitter. What does this data tell us about public opinion? How does it vary by demographic? What, if anything, can make people change their minds?
Today we speak to Neil Yeung and Jonathan Lai, undergraduate students in the Department of Computer Science at the University of Rochester, and Professor of Computer Science Jiebo Luo, to discuss their recent paper, Face Off: Polarized Public Opinions on Personal Face Mask Usage during the COVID-19 Pandemic.
Works Mentioned: https://arxiv.org/abs/2011.00336
Emails: Neil Yeung [email protected], Jonathan Lai [email protected], Jiebo Luo [email protected]
Thanks to our sponsors! Springboard School of Data offers a comprehensive career program encompassing data science, analytics, engineering, and machine learning. All courses are online and tailored to fit the lifestyle of working professionals. Up to 20 Data Skeptic listeners will receive $500 scholarships. Apply today at springboard.com/dataskeptic.
Check out Brilliant's group theory course to learn about object-oriented design! Brilliant is great for learning something new or to get an easy-to-look-at review of something you already know. Check them out at Brilliant.org/dataskeptic to get 20% off of a year of Brilliant Premium!
Welcome to another episode of the Data Hackers podcast! This time we talk about the past, present, and future of BI, and how XP Inc. has been using this technology in its day-to-day decision making. You will learn how they have structured their teams to answer business questions faster and more efficiently.
A revolution is occurring in data management regarding how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects can focus on the technical aspects of building a data lake for their organization. Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship. You'll explore:
The ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs: volume, variety, and velocity
Building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics
State-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud
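As a loose illustration of the ingestion building block described above (a generic sketch, not an example from the report; the file paths and column names are hypothetical), a common pattern is to land raw operational data in the lake as partitioned, columnar files so downstream engines can prune by partition:

```python
import pandas as pd

# Hypothetical raw CSV export from an operational system.
raw = pd.read_csv("raw/orders.csv", parse_dates=["order_ts"])

# Land it in the lake as date-partitioned Parquet: columnar storage keeps
# scans cheap (volume), and one directory per day lets engines prune by date.
raw["order_date"] = raw["order_ts"].dt.date
raw.to_parquet(
    "lake/orders/",
    engine="pyarrow",               # requires the pyarrow package
    partition_cols=["order_date"],  # writes lake/orders/order_date=.../part-*.parquet
)
```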
For organizations charting their way forward in today's digital economy, the clear imperative is to find better ways of extracting more value from data. By gleaning insight from data regarding customer preferences and business operations, organizations can respond to demand more effectively and better deliver the experiences that today's customers want. To this end, many organizations running SAP solutions seek to make the move to the SAP HANA database. SAP HANA offers the speed of in-memory data processing and the ability to combine transactions and analytics on a single platform for insight in real time. However, considerations at the level of IT infrastructure can make or break the success of an SAP HANA implementation. What the database runs on, in other words, matters significantly.
This IBM® Redguide publication explores the value of deploying SAP HANA on SUSE Linux Enterprise Server for SAP Applications and the IBM Power platform with IBM POWER9™ processors. Both offerings are optimized to help your organization reap the rewards of SAP HANA while also transforming IT service delivery more generally. Designed for enterprise-grade operations, SUSE Linux Enterprise Server for SAP Applications offers an open-source software-defined infrastructure (SDI) that is optimized for SAP workloads. Reliable, fast, and secure, it also supports the automation that is needed to substantially free up IT staff from service deployment and management duties.
Power Systems servers support SAP HANA implementations according to the SAP Tailored Data Center Integration (TDI) 5.0 specification. Optimized for scale-up and scale-out scenarios and built to support virtual persistent memory, Power Systems servers help you provision faster, scale affordably, and maximize uptime by persisting memory across virtual machines (VMs) and multiple SAP HANA instances.
Both SUSE and IBM have partnered with SAP for decades to fine-tune these offerings. Together, SUSE and IBM solutions offer a proven way forward for deploying, optimizing, and running SAP HANA implementations. This publication looks at various aspects of this combined offering in greater detail.
Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no surprise. With the growth in projects, platforms, and services that aim to help you establish and maintain control of the health and reliability of your data pipelines, it can be overwhelming to stay up to date with how they all compare. In this episode Egor Gryaznov, CTO of Bigeye, joins the show to explore the landscape of data quality companies, the general strategies that they are using, and what problems they solve. He also shares how his own product is designed and the challenges that are involved in building a system to help data engineers manage the complexity of a data platform. If you are wondering how to get better control of your own pipelines and the traps to avoid, then this episode is definitely worth a listen.
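To make the category concrete, here is a minimal sketch of the kind of automated check such tools run continuously (a generic illustration, not Bigeye's API; the 24-hour freshness window and 1% null-rate threshold are arbitrary assumptions):

```python
import pandas as pd

def data_quality_issues(df: pd.DataFrame, ts_col: str, max_null_rate: float = 0.01):
    """Return human-readable violations for two common checks: freshness and completeness."""
    issues = []
    # Freshness: the newest record should be recent. Assumes ts_col holds
    # timezone-aware UTC timestamps.
    lag = pd.Timestamp.now(tz="UTC") - df[ts_col].max()
    if lag > pd.Timedelta(hours=24):
        issues.append(f"stale table: newest row is {lag} old")
    # Completeness: the null rate of each column should stay under a threshold.
    for col in df.columns:
        rate = df[col].isna().mean()
        if rate > max_null_rate:
            issues.append(f"{col}: null rate {rate:.2%} exceeds {max_null_rate:.2%}")
    return issues
```

A production system would track these metrics over time and alert on anomalies rather than fixed thresholds, which is where much of the engineering complexity discussed in the episode lives.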
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Its comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and its focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how Immuta streamlines and accelerates manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Your host is Tobias Macey and today I’m interviewing Egor Gryaznov about the state of the industry for data quality management and what he is building at Bigeye.
Lead your organization to become evidence-driven. Data: it’s the benchmark that informs corporate projections, decision-making, and analysis. But why do many organizations that see themselves as data-driven fail to thrive? In Leading with AI and Analytics, two renowned experts from the Kellogg School of Management show business leaders how to transform their organization to become evidence-driven, which leads to real, measurable changes that can help propel their companies to the top of their industries. The availability of unprecedented technology-enabled tools has made AI (artificial intelligence) an essential component of business analytics. But what’s often lacking are the leadership skills to integrate these technologies to achieve maximum value. Here, the authors provide a comprehensive game plan for developing that all-important human factor to get at the heart of data science: the ability to apply analytical thinking to real-world problems. Each of these tools and techniques comes to life through a wealth of case studies and real-world success stories. Inside, you’ll find the essential tools to help you:
Develop a strong data science intuition quotient
Lead and scale AI and analytics throughout your organization
Move from “best-guess” decision making to evidence-based decisions
Craft strategies and tactics to create real impact
Written for anyone in a leadership or management role, from C-level/unit team managers to rising talent, this powerful, hands-on guide meets today’s growing need for real-world tools to lead and succeed with data.
Send us a text. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.
Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts.
This week on Making Data Simple, we have Nancy Hensley. Nancy is currently the Chief Marketing and Product Officer for Stats Perform and was previously the Chief Digital Officer at IBM.
Show Notes
1:37 – Nancy’s bio
3:10 – Are we talking Moneyball?
5:52 – On-base percentage
7:08 – Analysis examples
10:02 – Do you control the data?
11:24 – Out-there statistics
14:12 – Can analytics go too far?
17:35 – Real-time analysis
18:45 – COVID and sports
21:15 – Your role in sports betting
22:50 – What’s the most fascinating thing you’ve learned?
25:23 – What’s the future?
Website – Stats Perform
Moneyball
Stats Perform – Twitter
Bill James – Baseball Abstract
The Analyst
Connect with the Team
Producer Kate Brown – LinkedIn
Producer Steve Templeton – LinkedIn
Host Al Martin – LinkedIn and Twitter
Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.
Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form of business intelligence dashboards, machine learning models, or APIs on top of a cleaned and curated data set. Despite the rapid progression of impressive tools and products built to fulfill this mission, it is still an uphill battle to tie everything together into a cohesive and reliable platform. At Isima they decided to reimagine the entire ecosystem from the ground up and built a single unified platform to allow end-to-end self service workflows from data ingestion through to analysis. In this episode CEO and co-founder of Isima Darshan Rawal explains how the biOS platform is architected to enable ease of use, the challenges that were involved in building an entirely new system from scratch, and how it can integrate with the rest of your data platform to allow for incremental adoption. This was an interesting and contrarian take on the current state of the data management industry and is worth a listen to gain some additional perspective.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to go.datafold.com/dataengineeringpodcast to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Its comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and its focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how Immuta streamlines and accelerates manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Algorithmic trading, once the exclusive domain of institutional players, is now open to small organizations and individual traders using online platforms. The tool of choice for many traders today is Python and its ecosystem of powerful packages. In this practical book, author Yves Hilpisch shows students, academics, and practitioners how to use Python in the fascinating field of algorithmic trading. You'll learn several ways to apply Python to different aspects of algorithmic trading, such as backtesting trading strategies and interacting with online trading platforms. Some of the biggest buy- and sell-side institutions make heavy use of Python. By exploring options for systematically building and deploying automated algorithmic trading strategies, this book will help you level the playing field.
Set up a proper Python environment for algorithmic trading
Learn how to retrieve financial data from public and proprietary data sources
Explore vectorization for financial analytics with NumPy and pandas
Master vectorized backtesting of different algorithmic trading strategies
Generate market predictions by using machine learning and deep learning
Tackle real-time processing of streaming data with socket programming tools
Implement automated algorithmic trading strategies with the OANDA and FXCM trading platforms
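In the spirit of the book's vectorized backtesting chapters, here is a compact sketch of a moving-average crossover backtest (a hedged example, not code from the book; the CSV file, the 'Close' column, and the 42/252-day windows are assumptions):

```python
import numpy as np
import pandas as pd

# Assumed input: a CSV of daily prices with a 'Close' column, indexed by date.
data = pd.read_csv("eurusd.csv", index_col=0, parse_dates=True)

# Signal: go long (+1) when the fast SMA is above the slow SMA, short (-1) otherwise.
data["sma_fast"] = data["Close"].rolling(42).mean()
data["sma_slow"] = data["Close"].rolling(252).mean()
data["position"] = np.where(data["sma_fast"] > data["sma_slow"], 1, -1)

# Vectorized P&L: log returns times yesterday's position (shift avoids look-ahead bias).
data["returns"] = np.log(data["Close"] / data["Close"].shift(1))
data["strategy"] = data["position"].shift(1) * data["returns"]

# Cumulative gross performance of buy-and-hold vs. the strategy.
print(data[["returns", "strategy"]].dropna().cumsum().apply(np.exp).tail(1))
```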
If you haven’t heard the news, we’ve recently become the first third-party music analytics company to host Pandora data publicly, which includes stream counts, monthly listeners, and station adds for hundreds of thousands of artists. So, we’re especially excited about our guests today: Dan Wissinger and Jay Troop. Dan is currently a Senior Product Manager at Pandora, where he spearheads the Next Big Sound and AMP product teams, and Jay is a Senior Analyst for Next Big Sound and AMP. He’s also responsible for Artist & Industry Insights at Pandora writ large. On this episode, we introduce you to Dan, Jay, and Pandora; explain why Pandora matters to the music industry and to artists’ careers; and give you some strategies for making sense of your Pandora data. Speaking of which, if you’re not familiar with Next Big Sound, it’s the OG in music analytics, so we highly suggest checking it out as soon as you can, and the same goes for AMP, which is an artist’s best friend on Pandora.
Connect With Us (@chartmetric):
http://chartmetric.com/
https://blog.chartmetric.com
https://smarturl.it/chartmetric_social
Summary A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate, or expensive to use at scale. In this episode Grant Seward explains how he built Tree Schema to be an easy to use and cost effective option for organizations to build their data catalogs. He also shares the internal architecture, how he approached the design to make it accessible and easy to use, and how it autodiscovers the schemas and metadata for your source systems.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to go.datafold.com/dataengineeringpodcast to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Its comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and its focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how Immuta streamlines and accelerates manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Your host is Tobias Macey and today I’m interviewing Grant Seward about Tree Schema, a human-friendly data catalog.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what you have built at Tree Schema?
What was your motivation for creating it?
At what stage of maturity should a team or organization
Learn to build an analytics community in your organization from scratch. How to Build a Data Community shows readers how to create analytics and data communities within their organizations. Celebrated author Eva Murray relies on intuitive and practical advice structured as step-by-step guidance to demonstrate the creation of new data communities. How to Build a Data Community uses concrete insights gleaned from real-world case studies to describe, in full detail, all the critical components of a data community. Readers will discover:
What analytics communities are and what they look like
Why data-driven organizations need analytics communities
How selected businesses and nonprofits have applied these concepts successfully and what their journey to a data-driven culture looked like
How they can establish their own communities and what they can do to ensure their community grows and flourishes
Perfect for analytics professionals who are responsible for making policy-level decisions about data in their firms, the book is also a must-have for data practitioners and consultants who wish to make positive changes in the organizations with which they work.
Data analytics is core to business and decision making. The rapid increase in data volume, velocity, and variety offers both opportunities and challenges. While open source solutions to store big data, like Hadoop, offer platforms for exploring value and insight from big data, they were not originally developed with data security and governance in mind. Big Data Management discusses numerous policies, strategies, and recipes for managing big data. It addresses data security, privacy, controls, and life cycle management, offering modern principles and open source architectures for successful governance of big data. The author has collected best practices from the world’s leading organizations that have successfully implemented big data platforms. The topics discussed cover the entire data management life cycle, including data quality, data stewardship, regulatory considerations, and data councils; architectural and operational models are presented for successful management of big data. The book is a must-read for data scientists, data engineers, and corporate leaders who are implementing big data platforms in their organizations.
IoT Based Data Analytics for the Healthcare Industry: Techniques and Applications explores recent advances in the analysis of healthcare industry data through IoT data analytics. The book covers the analysis of ubiquitous data generated by the healthcare industry, from a wide range of sources, including patients, doctors, hospitals, and health insurance companies. The book provides AI solutions and support for healthcare industry end-users who need to analyze and manipulate this vast amount of data. These solutions feature deep learning and a wide range of intelligent methods, including simulated annealing, tabu search, genetic algorithms, ant colony optimization, and particle swarm optimization. The book also explores challenges, opportunities, and future research directions, and discusses the data collection and pre-processing stages, challenges and issues in data collection, data handling, and data collection set-up. Healthcare industry data, or streaming data generated by ubiquitous sensors cocooned into the IoT, requires advanced analytics to transform data into information. With advances in computing power, communications, and techniques for data acquisition, the demand for advanced data analytics is high.
Provides state-of-the-art methods and current trends in data analytics for the healthcare industry
Addresses the top concerns in the healthcare industry using IoT and data analytics, and machine learning and deep learning techniques
Discusses several potential AI techniques developed using IoT for the healthcare industry
Explores challenges, opportunities, and future research directions, and discusses the data collection and pre-processing stages
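As a generic illustration of one of the metaheuristics named above, here is a textbook-style simulated annealing sketch (not code from the book; the toy objective, step size, and cooling schedule are arbitrary choices):

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.995, iters=10_000):
    """Minimize a one-dimensional function f starting from x0."""
    x, fx, t = x0, f(x0), t0
    best_x, best_fx = x, fx
    for _ in range(iters):
        cand = x + random.uniform(-step, step)  # propose a random neighbor
        fc = f(cand)
        # Always accept improvements; accept worse moves with probability
        # exp(-delta / t), which shrinks as the temperature t cools.
        if fc < fx or random.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_fx:
                best_x, best_fx = x, fx
        t *= cooling
    return best_x, best_fx

# Toy objective with many local minima, e.g. tuning a single model parameter.
print(simulated_annealing(lambda x: x**2 + 10 * math.sin(x), x0=5.0))
```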
Summary Data lakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines. In order to address these challenges the team at Treeverse created LakeFS to introduce version control capabilities to your storage layer. In this episode Einat Orr and Oz Katz explain how they implemented branching and merging capabilities for object storage, best practices for how to use versioning primitives to introduce changes to your data lake, how LakeFS is architected, and how you can start using it for your own data platform.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Its comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and its focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how Immuta streamlines and accelerates manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
Your host is Tobias Macey and today I’m interviewing Einat Orr and Oz Katz about their work at Treeverse on the LakeFS system for versioning your data lakes the same way you version your code.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what LakeFS is and why you built it?
There are a number of tools and platforms that support data virtualization and data versioning. How does LakeFS compare to the available options? (e.g. Alluxio, Denodo, Pachyderm, DVC, etc.)
What are the primary use cases that LakeFS enables?
For someone who wants to use LakeFS, what is involved in getting it set up?
How is LakeFS implemented?
How has the design of the system changed or evolved since you began working on it?
What assumptions did you have going into it which have since been invalidated or modified?
How does the workflow for an engineer or analyst change from working directly against S3 to running against the LakeFS interface? (A sketch of the contrast follows below.)
How do you handle merge conflicts and resolution?
What
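To make the workflow question above concrete: LakeFS exposes an S3-compatible endpoint, so existing S3 clients largely keep working, with the repository standing in for the bucket and the branch name prefixed to the object key. A minimal sketch with boto3 (the endpoint URL, credentials, repository, and branch names are placeholders):

```python
import boto3

# Point an ordinary S3 client at the LakeFS server instead of AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="https://lakefs.example.com",  # placeholder LakeFS endpoint
    aws_access_key_id="<lakefs-access-key>",    # placeholder credentials
    aws_secret_access_key="<lakefs-secret-key>",
)

# Plain S3:  Bucket="my-bucket", Key="events/2020/11/data.parquet"
# LakeFS:    the repository is the "bucket" and a branch prefixes the key,
# so writes land on an isolated branch instead of production data.
s3.put_object(
    Bucket="my-repo",
    Key="experiment/events/2020/11/data.parquet",
    Body=b"...",
)
# Committing the branch and merging it back to main happen through the
# LakeFS API or CLI once the change is validated.
```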
Continuous Intelligence (CI) integrates historical and real-time analytics to automatically monitor and update various types of systems, including supply chains, telecommunications networks and e-commerce sites. CI encompasses data ingestion, transformation and analytics, as well as operational “triggers” that recommend or initiate specific real-time actions.
CI casts a wider net than traditional analytics because it includes contextual data, for example related to market behavior, weather patterns, or social media trends, that helps enterprises operate their core systems more intelligently.
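A minimal sketch of the "operational trigger" idea: maintain a rolling window over a stream of events and fire an action when an aggregate crosses a threshold (a generic illustration, not Swim.ai's API; the window size, threshold, and simulated stream are assumptions):

```python
import random
from collections import deque

class RollingTrigger:
    """Fire an action when the rolling mean of a metric exceeds a threshold."""

    def __init__(self, window, threshold, action):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.action = action

    def observe(self, value):
        self.values.append(value)
        # Only evaluate once the window is full, to avoid noisy early firing.
        if len(self.values) == self.values.maxlen:
            mean = sum(self.values) / len(self.values)
            if mean > self.threshold:
                self.action(mean)

# Hypothetical use: flag degraded checkout latency on an e-commerce site.
trigger = RollingTrigger(window=100, threshold=2.0,
                         action=lambda m: print(f"latency degraded: {m:.2f}s"))
for _ in range(1000):
    trigger.observe(random.uniform(1.5, 2.6))  # stand-in for a real event stream
```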
In this episode, our VP of Research Kevin Petrie interviews Simon Crosby, CTO at Swim.ai, a continuous intelligence software vendor that focuses on edge-based learning for fast data. Crosby co-founded the security vendor Bromium in 2010, which was later sold to HP Inc. in 2019.
Today's episode features Tiankai Feng, Global Director and Voice of Consumer Analytics at Adidas. Tiankai is going to help you decide if you need an analytics Quarantune. A Quarantune is music, or a song, created for quarantine. The best part about this one is that it is all about data and analytics. Trust me: it's not only fun, but it literally speaks to our daily lives in this field! In this super fun episode, my amazing co-host Lillian Pierson and I discuss Tiankai's unique talent for combining analytics and music and how it plays into his day-to-day role at Adidas. He also shares his Digital Analytics Anthem while dropping a number of knowledge bombs right in the song. Tune in to find out if you could use an analytics Quarantune. Knowledge bombs galore!
[26:03] – Tiankai on working with analysts: Analysts are great collaborators, not just service functions that have to follow your guidance; we have to work together to make an impact.
[34:54] – Tiankai on the pandemic: You can predict and plan for most things, but when really big unexpected things happen, like a pandemic or a huge societal issue, none of those plans actually work as you made them. Having data and timely insights is even more important now so you can act quickly and in the right way.
[41:01] – Tiankai on the perfect job: It is always easy to complain, but it is so much harder to solve the issues. If you are not happy with the situation, then try to change it.
For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/69
Enjoyed the Show? Please leave us a review on iTunes.
"Microsoft Power BI Quick Start Guide" is your essential companion to mastering data visualization and analysis using Microsoft Power BI. This book offers step-by-step guidance on exploring data sources, creating effective dashboards, and leveraging advanced features like dataflows and AI insights to derive actionable intelligence quickly and effectively. What this Book will help me do Connect and import data from various sources using Power BI tools. Transform and cleanse data using the Power BI Query Editor and other techniques. Design optimized data models with relationships and DAX calculations. Create dynamic and visually compelling reports and dashboards. Implement row-level security and manage Power BI deployments within an organization. Author(s) Devin Knight, Erin Ostrowsky, and Mitchell Pearson are seasoned Power BI experts with extensive experience in business intelligence and data analytics. They bring a hands-on approach to teaching, focusing on practical skills and real-world applications. Their joint experience ensures a thorough and clear learning experience. Who is it for? This book is tailored for aspiring business intelligence professionals who wish to harness the power of Microsoft Power BI. If you have foundational knowledge of business intelligence concepts and are eager to apply them practically, this guide is for you. It's also ideal for individuals looking to upgrade their BI skill set and adopt modern data analysis tools. Whether a beginner or looking to enhance your current skills, you'll find tremendous value here.
Introduces professionals and scientists to statistics and machine learning using the programming language R. Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background who is looking to quickly learn several practical technologies to enter or transition to the growing field of data science.
The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data. In Part 6 we learn to build models, Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications, and finally Part 9 introduces the reader to big data and performance computing. It also includes some helpful appendices.
Provides a practical guide for non-experts with a focus on business users
Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting
Uses a practical tone and integrates multiple topics in a coherent framework
Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R
Shows readers how to visualize results in static and interactive reports
Supplementary materials include PDF slides based on the book’s content, as well as all the extracted R code, and are available to everyone on a Wiley Book Companion Site
The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.
Summary One of the most challenging aspects of building a data platform has nothing to do with pipelines and transformations. If you are putting your workflows into production, then you need to consider how you are going to implement data security, including access controls and auditing. Different databases and storage systems all have their own method of restricting access, and they are not all compatible with each other. In order to simplify the process of securing your data in the cloud, Manav Mital created Cyral to provide a way of enforcing security as code. In this episode he explains how the system is architected, how it can help you enforce compliance, and what is involved in getting it integrated with your existing systems. This was a good conversation about an aspect of data management that is too often left as an afterthought.
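As a toy illustration of the "security as code" idea, policies can live as version-controlled data and be evaluated in one place rather than configured separately in each database (a generic sketch, not Cyral's policy language; the roles, tables, and columns are made up):

```python
# Policies as plain, reviewable data: who may read which columns of which table.
POLICIES = [
    {"role": "analyst", "table": "orders", "columns": ["id", "amount", "order_date"]},
    {"role": "admin",   "table": "orders", "columns": ["*"]},
]

def allowed_columns(role, table):
    """Columns a role may read from a table; an empty list means no access."""
    for policy in POLICIES:
        if policy["role"] == role and policy["table"] == table:
            return policy["columns"]
    return []

assert allowed_columns("analyst", "orders") == ["id", "amount", "order_date"]
assert allowed_columns("intern", "orders") == []  # unknown roles get nothing
```

Because the policy set is code, changes go through review and audit like any other commit, which is the compliance benefit discussed in the episode.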
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Its comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and its focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how Immuta streamlines and accelerates manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!