talk-data.com

Topic

Analytics

data_analysis insights metrics

4552 tagged

Activity Trend

Peak of 398 activities per quarter, 2020-Q1 through 2026-Q1

Activities

4552 activities · Newest first

Insightful Data Visualization with SAS Viya

Elevate your storytelling with SAS Visual Analytics. Data visualization is the gateway to artificial intelligence (AI) and big data. Insightful Data Visualization with SAS Viya shows how the latest SAS Viya tools can be used to create data visualizations in an easier, smarter, and more engaging way than ever before. SAS Visual Analytics combined with human creativity can produce endless possibilities. In this book, you will learn tips and techniques for getting the most from your SAS Visual Analytics investment. From beginners to advanced SAS users, this book has something for everyone. Use AI wizards to create data visualizations automatically, learn to use advanced analytics in your dashboards to surface smarter insights, and extend SAS Visual Analytics with advanced integrations and options. Topics covered in this book include: SAS Visual Analytics; data visualization with SAS; reports and dashboards; SAS code examples; self-service analytics; SAS data access; and extending SAS beyond drag and drop.

Data Engineering with Python

Discover the inner workings of data pipelines with 'Data Engineering with Python', a practical guide to mastering the art of data engineering. Through hands-on examples, you'll explore the process of designing data models, implementing data pipelines, and automating data flows, all within the context of Python. What this Book will help me do Understand the fundamentals of designing data architectures and capturing data requirements. Extract, clean, and transform data from various sources, refining it for specific applications. Implement end-to-end data pipelines, including staging, validation, and production deployment. Leverage Python to connect with databases, perform data manipulations, and build analytics workflows. Monitor and log data pipelines to ensure smooth, real-time operations and high quality. Author(s) Paul Crickard is a seasoned expert in data engineering and analytics, bringing years of practical experience to this technical guide. His unique ability to make complex technical concepts accessible makes this book invaluable for learners and professionals alike. A lifelong technologist, Paul focuses on actionable skills and building confidence to work with data pipelines and models. Who is it for? This book is ideal for aspiring data engineers, data analysts aiming to elevate their technical skillsets, or IT professionals transitioning into data-driven roles. Whether you're just stepping into the field or enhancing your Python-based data capabilities, this book is tailored to provide solid grounding and practical expertise. Beginners in data engineering will find it accessible and easy to get started, while those refreshing their knowledge will benefit from its focused projects.
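
To make the pipeline pattern concrete, here is a minimal extract-clean-load sketch in Python. It is illustrative only: the CSV source, the column names, and the SQLite staging table are assumptions, not examples drawn from the book.

```python
# A minimal extract-clean-load sketch. File name, column names, and the
# SQLite staging target are illustrative assumptions, not from the book.
import sqlite3
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    """Read raw records from a CSV source (path is a placeholder)."""
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the frame: normalize column names, drop duplicates and bad rows."""
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df.dropna(subset=["id"])  # assumes an 'id' column identifies records

def load(df: pd.DataFrame, db: str, table: str) -> None:
    """Write the cleaned frame to a staging table for validation."""
    with sqlite3.connect(db) as conn:
        df.to_sql(table, conn, if_exists="replace", index=False)

if __name__ == "__main__":
    load(transform(extract("raw_events.csv")), "staging.db", "events")
```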

Hands-On SQL Server 2019 Analysis Services

"Hands-On SQL Server 2019 Analysis Services" is a comprehensive guide to mastering data analysis using SQL Server Analysis Services (SSAS). This book provides you with step-by-step directions on creating and deploying tabular and multi-dimensional models, as well as using tools like MDX and DAX to query and analyze data. By the end, you'll be confident in designing effective data models for business analytics. What this Book will help me do Understand how to create and optimize both tabular and multi-dimensional models with SQL Server Analysis Services. Learn to use MDX and DAX to query and manipulate your data for enhanced insights. Integrate SSAS models with visualization tools like Excel and Power BI for effective decision-making. Implement robust security measures to safeguard data within your SSAS deployments. Master scaling and optimizing best practices to ensure high-performance analytical models. Author(s) Steven Hughes is a data analytics expert with extensive experience in business intelligence and SQL Server technologies. With years of practical experience in using SSAS and teaching data professionals, Steven has a knack for breaking down complex concepts into actionable knowledge. His approach to writing involves combining clear explanations with real-world examples. Who is it for? This book is intended for BI professionals, data analysts, and database developers who want to gain hands-on expertise with SQL Server 2019 Analysis Services. Ideal readers should have familiarity with database querying and a basic understanding of business intelligence tools like Power BI and Excel. It's perfect for those aiming to refine their skills in modeling and deploying robust analytics solutions.

IBM Db2 Analytics Accelerator V7 High Availability and Disaster Recovery

IBM® Db2® Analytics Accelerator is a workload optimized appliance add-on to IBM DB2® for IBM z/OS® that enables the integration of analytic insights into operational processes to drive business critical analytics and exceptional business value. Together, the Db2 Analytics Accelerator and DB2 for z/OS form an integrated hybrid environment that can run transaction processing, complex analytical, and reporting workloads concurrently and efficiently. With IBM DB2 Analytics Accelerator for z/OS V7, the following flexible deployment options are introduced: Accelerator on IBM Integrated Analytics System (IIAS): Deployment on pre-configured hardware and software Accelerator on IBM Z®: Deployment within an IBM Secure Service Container LPAR With the accelerator being used in business-critical environments, the need arose to integrate it into High Availability (HA) architectures and Disaster Recovery (DR) processes. This IBM Redpaper™ publication focuses on different integration aspects of both deployment options of the IBM Db2 Analytics Accelerator into HA and DR environments. It also shares best practices for meeting the desired Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). HA systems often are a requirement in business-critical environments and can be implemented by redundant, independent components. A failure of one of these components is detected automatically and its tasks are taken over by another component. Depending on business requirements, a system can be implemented in a way that users do not notice outages (continuous availability), or so that, in a major disaster, users notice an outage and systems resume service after a defined period, potentially with loss of data from previous work. IBM Z has been strong in HA and DR for decades. By design, storage and operating systems are implemented to support enhanced availability requirements. IBM Parallel Sysplex® and IBM Geographically Dispersed Parallel Sysplex (IBM GDPS®) offer a unique architecture to support various degrees of automated failover and availability concepts. This IBM Redpaper publication shows how IBM Db2 Analytics Accelerator V7 can easily integrate into or complement existing IBM Z topologies for HA and DR. If you are using IBM Db2 Analytics Accelerator V5.1 or lower, see IBM Db2 Analytics Accelerator: High Availability and Disaster Recovery, REDP-5104.

Summary In order for analytics and machine learning projects to be useful, they require a high degree of data quality. To ensure that your pipelines are healthy you need a way to make them observable. In this episode Barr Moses and Lior Gavish, co-founders of Monte Carlo, share the leading causes of what they refer to as data downtime and how it manifests. They also discuss methods for gaining visibility into the flow of data through your infrastructure, how to diagnose and prevent potential problems, and what they are building at Monte Carlo to help you maintain your data’s uptime.
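
As a concrete illustration of the kind of check that surfaces data downtime, here is a toy freshness-and-volume monitor in Python. It is not Monte Carlo's product or API; the table layout, the loaded_at column, and the thresholds are all assumptions.

```python
# Illustrative only: a toy data-downtime check of the kind discussed in the
# episode. Not Monte Carlo's API; table schema and thresholds are assumptions.
import sqlite3
from datetime import datetime, timedelta

FRESHNESS_SLA = timedelta(hours=6)   # newest data must be at most 6 hours old
MIN_ROWS_PER_DAY = 1000              # expected daily volume floor

def check_table(conn: sqlite3.Connection, table: str) -> list:
    """Return a list of detected problems for one table."""
    problems = []
    # `table` is assumed to come from trusted configuration, not user input.
    last_loaded, rows_today = conn.execute(
        f"SELECT MAX(loaded_at), COUNT(*) FROM {table} "
        f"WHERE loaded_at >= datetime('now', '-1 day')").fetchone()
    if last_loaded is None:
        problems.append(f"{table}: no rows loaded in the last day")
    elif datetime.utcnow() - datetime.fromisoformat(last_loaded) > FRESHNESS_SLA:
        problems.append(f"{table}: stale, last load at {last_loaded}")
    if rows_today < MIN_ROWS_PER_DAY:
        problems.append(f"{table}: low volume, {rows_today} rows in the last day")
    return problems
```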

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey and today I’m interviewing Barr Moses and Lior Gavish about observability for your data pipelines and how they are addressing it at Monte Carlo.

Interview

Introduction How did you get involved in the area of data management?

Making Data Smarter with IBM Spectrum Discover: Practical AI Solutions

More than 80% of all data that is collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, and so on. Many organizations face significant challenges to manage this deluge of unstructured data, such as the following examples: Pinpointing and activating relevant data for large-scale analytics Lacking the fine-grained visibility that is needed to map data to business priorities Removing redundant, obsolete, and trivial (ROT) data Identifying and classifying sensitive data IBM® Spectrum Discover is modern metadata management software that provides data insight for petabyte-scale file and object storage, both on-premises and in the cloud. This software enables organizations to make better business decisions and gain and maintain a competitive advantage. IBM Spectrum® Discover provides a rich metadata layer that enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of unstructured data. It improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research. This IBM Redbooks® publication presents several use cases that are focused on artificial intelligence (AI) solutions with IBM Spectrum Discover. This book helps storage administrators and technical specialists plan and implement AI solutions by using IBM Spectrum Discover and several other IBM Storage products.

Security and Privacy Issues in IoT Devices and Sensor Networks

Security and Privacy Issues in IoT Devices and Sensor Networks investigates security breach issues in IoT and sensor networks, exploring various solutions. The book follows a two-fold approach, first focusing on the fundamentals and theory surrounding sensor networks and IoT security. It then explores practical solutions that can be implemented to develop security for these elements, providing case studies to enhance understanding. Machine learning techniques are covered, as well as other security paradigms, such as cloud security and cryptocurrency technologies. The book highlights how these techniques can be applied to identify attacks and vulnerabilities, preserve privacy, and enhance data security. This in-depth reference is ideal for industry professionals dealing with WSN and IoT systems who want to enhance the security of these systems. Additionally, researchers, material developers and technology specialists dealing with the multifarious aspects of data privacy and security enhancement will benefit from the book's comprehensive information. Provides insights into the latest research trends and theory in the field of sensor networks and IoT security Presents machine learning-based solutions for data security enhancement Discusses the challenges to implement various security techniques Informs on how analytics can be used in security and privacy

Advanced Analytics in Power BI with R and Python: Ingesting, Transforming, Visualizing

This easy-to-follow guide provides R and Python recipes to help you learn and apply the top languages in the field of data analytics to your work in Microsoft Power BI. Data analytics expert and author Ryan Wade shows you how to use R and Python to perform tasks that are extremely hard, if not impossible, to do using native Power BI tools. For example, you will learn to score Power BI data using custom data science models and powerful models from Microsoft Cognitive Services. The R and Python languages are powerful complements to Power BI. They enable advanced data transformation techniques that are difficult to perform in Power BI in its default configuration but become easier by leveraging the capabilities of R and Python. If you are a business analyst, data analyst, or a data scientist who wants to push Power BI and transform it from being just a business intelligence tool into an advanced data analytics tool, then this is the book to help you do that. What You Will Learn Create advanced data visualizations via R using the ggplot2 package Ingest data using R and Python to overcome some limitations of Power Query Apply machine learning models to your data using R and Python without the need of Power BI premium capacity Incorporate advanced AI in Power BI without the need of Power BI premium capacity via Microsoft Cognitive Services, IBM Watson Natural Language Understanding, and pre-trained models in SQL Server Machine Learning Services Perform advanced string manipulations not otherwise possible in Power BI using R and Python Who This Book Is For Power users, data analysts, and data scientists who want to go beyond Power BI’s built-in functionality to create advanced visualizations, transform data in ways not otherwise supported, and automate data ingestion from sources such as SQL Server and Excel in a more concise way
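
As a sketch of the kind of transformation the book covers, here is a Power Query "Run Python script" step. Power BI exposes the current table to such a script as a pandas DataFrame named dataset (that name is Power BI's convention); the raw_address column and the parsing rules below are assumptions for illustration.

```python
# Sketch of a Power Query "Run Python script" step. Power BI supplies the
# current table as a pandas DataFrame named `dataset`; the 'raw_address'
# column and the parsing pattern are illustrative assumptions.
import re
import pandas as pd

def split_address(value):
    """Parse 'City, ST 12345' into separate city/state/zip fields."""
    m = re.match(r"^\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})",
                 str(value))
    return pd.Series(m.groupdict() if m else
                     {"city": None, "state": None, "zip": None})

# Append the parsed columns to the original table.
dataset = pd.concat([dataset, dataset["raw_address"].apply(split_address)], axis=1)
# Power BI picks up the `dataset` DataFrame as the output of this step.
```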

Summary Business intelligence efforts are only as useful as the outcomes that they inform. Power BI aims to reduce the time and effort required to go from information to action by providing an interface that encourages rapid iteration. In this episode Rob Collie shares his enthusiasm for the Power BI platform and how it stands out from other options. He explains how he helped to build the platform during his time at Microsoft, and how he continues to support users through his work at Power Pivot Pro. Rob shares some useful insights gained through his consulting work, and why he considers Power BI to be the best option on the market today for business analytics.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Equalum’s end to end data ingestion platform is relied upon by enterprises across industries to seamlessly stream data to operational, real-time analytics and machine learning environments. Equalum combines streaming Change Data Capture, replication, complex transformations, batch processing and full data management using a no-code UI. Equalum also leverages open source data frameworks by orchestrating Apache Spark, Kafka and others under the hood. Tool consolidation and linear scalability without the legacy platform price tag. Go to dataengineeringpodcast.com/equalum today to start a free 2 week test run of their platform, and don’t forget to tell them that we sent you.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey and today I’m interviewing Rob Collie about Microsoft’s Power BI platform and his work at Power Pivot Pro.

Data Lake Analytics on Microsoft Azure: A Practitioner's Guide to Big Data Engineering

Get a 360-degree view of how the journey of data analytics solutions has evolved from monolithic data stores and enterprise data warehouses to data lakes and modern data warehouses. This book includes comprehensive coverage of: how to architect data lake analytics solutions by choosing suitable technologies available on Microsoft Azure; how the advent of microservices applications covering e-commerce and modern solutions built on IoT, together with real-time streaming data, has completely disrupted this ecosystem; and how these data analytics solutions have been transformed from solely understanding trends in historical data to building predictions by infusing machine learning technologies into the solutions. Data platform professionals who have been working on relational data stores, non-relational data stores, and big data technologies will find the content in this book useful. The book also can help you start your journey into the data engineering world, as it provides an overview of advanced data analytics and touches on data science concepts and various artificial intelligence and machine learning technologies available on Microsoft Azure. What Will You Learn You will understand: the concepts of data lake analytics, the modern data warehouse, and advanced data analytics; the architecture patterns of the modern data warehouse and advanced data analytics solutions; the phases—such as Data Ingestion, Store, Prep and Train, and Model and Serve—of data analytics solutions and the technology choices available on Azure under each phase; in-depth coverage of real-time and batch mode data analytics solutions architecture; and the various managed services available on Azure, such as Azure Synapse Analytics, Event Hubs, Stream Analytics, Cosmos DB, and managed big data services such as Databricks and HDInsight. Who This Book Is For Data platform professionals, database architects, engineers, and solution architects
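
As a small illustration of the Data Ingestion phase described above, here is a hedged Python sketch that lands a raw file in Azure Blob Storage (the storage layer beneath a data lake) using the azure-storage-blob SDK. The connection string, container name, and path layout are placeholders, not values from the book.

```python
# Landing a raw file in a date-partitioned "raw zone" of a data lake.
# Connection string, container name, and path layout are placeholders.
from datetime import date
from azure.storage.blob import BlobServiceClient

def land_raw_file(conn_str: str, local_path: str) -> None:
    """Upload one raw file into a date-partitioned path in the raw zone."""
    service = BlobServiceClient.from_connection_string(conn_str)
    container = service.get_container_client("raw-zone")  # assumed container name
    # Date partitioning is a common lake layout convention, not a requirement.
    blob_name = f"events/{date.today():%Y/%m/%d}/{local_path}"
    with open(local_path, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)
```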

You don't want to miss Vincent sharing the three steps he took after he left Lillian's class that led to his success in 90 days. He discussed how he chose the right problem to solve, the challenges he faced while setting expectations, and, most importantly, how he measures success! Tune in and be sure to share with others who struggle with their data strategy. Knowledge bombs galore!

  [17:03] Vincent's takeaways from Lillian's course: "What I learned from the workshop really inspired me and was really insightful. We have a lot of data in our company. We have to ask: 'What can we do with that data? How can we monetize it better? How we can drive revenue from the data?'"  [23:06] Vincent on understanding the data you have: "Once you understand your data you can move on to implement the right models and generate the right insights."   [41:43] Vincent on what is most important: "What is most important is actually transforming your data into relevant insights. Don't jump quickly into the solution. Sit down, narrow down your problem, find the real cost, and seek out different solutions. Only then you can find the most effective solution." For full show notes, and the links mentioned visit: https://bibrainz.com/podcast/68

Enjoyed the Show? Please leave us a review on iTunes. Free Data Storytelling Training Register before it sells out again! New dates for our BI Data Storytelling Mastery Accelerator 3-Day Live Workshop are finally available. Many BI teams are still struggling to deliver consistent, highly engaging analytics their users love. At the end of the workshop, you'll leave with a clear BI delivery action plan. Register today!

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus, OH), Moe Kiss (Canva), Michael Helbling (Search Discovery), Claire Carroll (Hex)

Do you long for the days when your mother could ask you, "Now, what do you actually do for your job?" and "all" you had to do was explain websites and digital analytics? The "analyst" is now a role that can be defined an infinite number of ways in its breadth and depth. Is the analyst who is starting to do data transformations to create clean views still an analyst? Or is she a data engineer? A data scientist? On this episode, we explore the idea of an "analytics engineer" with Claire Carroll from Fishtown Analytics who, while she did not coin the term, can certainly be credited with its growth as a concept. And there is a brief but intense spat about the role of "analytics translator," which Claire sat out, but observed with bemusement. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

IBM Storage Solutions for SAS Analytics using IBM Spectrum Scale and IBM Elastic Storage System 3000 Version 1 Release 1

This IBM® Redpaper® publication is a blueprint for configuration, testing results, and tuning guidelines for running SAS workloads on Red Hat Enterprise Linux that use IBM Spectrum® Scale and IBM Elastic Storage® System (ESS) 3000. IBM lab validation was conducted with Red Hat Linux nodes running SAS simulator scripts, connected to IBM Spectrum Scale and the IBM ESS 3000. Simultaneous workloads were simulated across multiple x86 nodes running Red Hat Linux to determine scalability against the IBM Spectrum Scale clustered file system and the ESS 3000 array. This paper outlines the architecture, configuration details, and performance tuning needed to maximize SAS application performance with IBM Spectrum Scale 5.0.4.3 and the IBM ESS 3000. This document is intended to facilitate the deployment and configuration of SAS applications that use IBM Spectrum Scale and the IBM Elastic Storage System (ESS) 3000. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM ESS 3000 are supported and entitled and where the issues are specific to a blueprint implementation.

Summary Analytical workloads require a well engineered and well maintained data integration process to ensure that your information is reliable and up to date. Building a real-time pipeline for your data lakes and data warehouses is a non-trivial effort, requiring a substantial investment of time and energy. Meroxa is a new platform that aims to automate the heavy lifting of change data capture, monitoring, and data loading. In this episode founders DeVaris Brown and Ali Hamidi explain how their tenure at Heroku informed their approach to making data integration self service, how the platform is architected, and how they have designed their system to adapt to the continued evolution of the data ecosystem.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey and today I’m interviewing DeVaris Brown and Ali Hamidi about Meroxa, a new platform as a service for data integration.

Creating Good Data: A Guide to Dataset Structure and Data Representation

Create good data from the start, rather than fixing it after it is collected. By following the guidelines in this book, you will be able to conduct more effective analyses and produce timely presentations of research data. Data analysts are often presented with datasets for exploration and study that are poorly designed, leading to difficulties in interpretation and to delays in producing meaningful results. Much data analytics training focuses on how to clean and transform datasets before serious analyses can even be started. Inappropriate or confusing representations, unit of measurement choices, coding errors, missing values, outliers, etc., can be avoided by using good dataset design and by understanding how data types determine the kinds of analyses which can be performed. This book discusses the principles and best practices of dataset creation, and covers basic data types and their related appropriate statistics and visualizations. A key focus of the book is why certain data types are chosen for representing concepts and measurements, in contrast to the typical discussions of how to analyze a specific data type once it has been selected. What You Will Learn Be aware of the principles of creating and collecting data Know the basic data types and representations Select data types, anticipating analysis goals Understand dataset structures and practices for analyzing and sharing Be guided by examples and use cases (good and bad) Use cleaning tools and methods to create good data Who This Book Is For Researchers who design studies and collect data and subsequently conduct and report the results of their analyses can use the best practices in this book to produce better descriptions and interpretations of their work. In addition, data analysts who explore and explain data of other researchers will be able to create better datasets.
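
As a brief illustration of the book's create-good-data-from-the-start principle, here is a pandas sketch that declares column types up front rather than repairing them later. The column names, category codes, and values are invented for illustration.

```python
# Declaring types at dataset creation time. All names and values are invented.
import pandas as pd

df = pd.DataFrame({
    "subject_id": [1, 2, 3],                      # identifier, never arithmetic
    "treatment": ["control", "drug_a", "drug_b"],
    "weight_kg": [71.2, 64.9, 80.1],              # unit named in the column
    "visit_date": ["2021-01-04", "2021-01-05", "2021-01-05"],
})

# A closed category set rejects typos that a free-text column would silently
# accept, and real datetimes enable correct sorting and date arithmetic.
df["treatment"] = df["treatment"].astype(
    pd.CategoricalDtype(["control", "drug_a", "drug_b"]))
df["visit_date"] = pd.to_datetime(df["visit_date"])

print(df.dtypes)
```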

Learn Data Science Using SAS Studio: A Quick-Start Guide

Do you want to create data analysis reports without writing a line of code? This book introduces SAS Studio, a free, web browser-based data science product for educational and non-commercial purposes. The power of SAS Studio comes from its visual point-and-click user interface that generates SAS code. It is easier to learn SAS Studio than to learn R and Python to accomplish data cleaning, statistics, and visualization tasks. The book includes a case study about analyzing the data required for predicting the results of presidential elections in the state of Maine for 2016 and 2020. In addition to the presidential elections, the book provides real-life examples including analyzing stocks, oil and gold prices, crime, marketing, and healthcare. You will see data science in action and how easy it is to perform complicated tasks and visualizations in SAS Studio. You will learn, step by step, how to do visualizations, including maps. In most cases, you will not need a line of code as you work with the SAS Studio graphical user interface. The book includes explanations of the code that SAS Studio generates automatically. You will learn how to edit this code to perform more complicated advanced tasks. The book introduces you to multiple SAS products such as SAS Viya, SAS Analytics, and SAS Visual Statistics. What You Will Learn Become familiar with the SAS Studio IDE Understand essential visualizations Know the fundamental statistical analysis required in most data science and analytics reports Clean the most common data set problems Use linear regression for data prediction Write programs in SAS Get introduced to SAS Viya, which is more powerful than SAS Studio Who This Book Is For A general audience of people who are new to data science, students, and data analysts and scientists who are experienced but new to SAS. No programming or in-depth statistics knowledge is needed.
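
The book performs tasks like this point-and-click in SAS Studio, which then generates SAS code behind the scenes. Purely as a language-neutral illustration of one such task, here is the "fit a linear regression, then predict" idea sketched in Python rather than SAS, with made-up data.

```python
# Language-neutral sketch of linear regression prediction (not SAS code).
# The data points are invented for illustration.
import numpy as np

# Fit y = a*x + b by least squares, then predict at a new point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a, b = np.polyfit(x, y, deg=1)

print(f"slope={a:.2f}, intercept={b:.2f}")
print(f"prediction at x=6: {a * 6 + b:.2f}")
```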

Product Analytics: Applied Data Science Techniques for Actionable Consumer Insights

This guide shows how to combine data science with social science to gain unprecedented insight into customer behavior, so you can change it. Joanne Rodrigues-Craig bridges the gap between predictive data science and statistical techniques that reveal why important things happen -- why customers buy more, or why they immediately leave your site -- so you can get more of the behaviors you want and fewer of the ones you don't. Drawing on extensive enterprise experience and deep knowledge of demographics and sociology, Rodrigues-Craig shows how to create better theories and metrics, so you can accelerate the process of gaining insight, altering behavior, and earning business value. You’ll learn how to: Develop complex, testable theories for understanding individual and social behavior in web products Think like a social scientist and contextualize individual behavior in today’s social environments Build more effective metrics and KPIs for any web product or system Conduct more informative and actionable A/B tests Explore causal effects, reflecting a deeper understanding of the differences between correlation and causation Alter user behavior in a complex web product Understand how relevant human behaviors develop, and the prerequisites for changing them Choose the right statistical techniques for common tasks such as multistate and uplift modeling Use advanced statistical techniques to model multidimensional systems Do all of this in R (with sample code available in a separate code manual)
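
The book's examples are in R; purely as a sketch of the kind of A/B test it discusses, here is a two-sided, two-proportion z-test in Python. The conversion counts are invented for illustration.

```python
# Two-proportion z-test for an A/B test: is variant B's conversion rate
# different from variant A's? The counts below are invented.
from math import sqrt
from statistics import NormalDist

conv_a, n_a = 130, 2400     # conversions and visitors, variant A
conv_b, n_b = 171, 2500     # conversions and visitors, variant B

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided test

print(f"lift={p_b - p_a:.4f}  z={z:.2f}  p={p_value:.4f}")
```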

podcast_episode
by Mico Yuk (Data Storytelling Academy), Raquel Seville (BI Brainz Caribbean)
BI

Data isn't just relevant in the US; it's important all over the world. And currently, the Caribbean is experiencing something of a movement in relation to data. In today's episode, you'll hear from Raquel Seville, the CEO of BI Brainz Caribbean. Learn what's happening in the Caribbean on the data scene, how data innovation is taking center stage, and how you can be a part of it.

[13:49] - What's happening with data in the Caribbean right now [21:07] - What the rest of the world can learn from the data work happening in the Caribbean For full show notes, and the links mentioned visit: https://bibrainz.com/podcast/67

Enjoyed the Show? Please leave us a review on iTunes. Free Data Storytelling Training Register before it sells out again! New dates for our BI Data Storytelling Mastery Accelerator 3-Day Live Workshop are finally available. Many BI teams are still struggling to deliver consistent, highly engaging analytics their users love. At the end of the workshop, you'll leave with a clear BI delivery action plan. Register today!

Metabase Up and Running

Metabase Up and Running is your go-to guide for mastering Metabase, the open-source business intelligence tool. You'll progress from the basics of installation and setup to connecting data sources and creating insightful visualizations and dashboards. By the end, you'll be confident in implementing Metabase in your organization for impactful decision-making. What this Book will help me do Understand how to securely deploy and configure Metabase on Amazon Web Services. Master the creation of dashboards, reports, and visualizations using Metabase's tools. Gain expertise in user and permissions management within Metabase. Learn to use Metabase's SQL console for advanced database interactions. Acquire skills to embed Metabase within applications and automate reports via email or Slack. Author(s) Abraham, an experienced tool specialist, is passionate about teaching others how to leverage data tools effectively. With a background in business analytics, Abraham has guided companies of all sizes. Their approachable writing style ensures a learning journey that is both informative and engaging. Who is it for? This book is ideal for business analysts and data professionals looking to amplify their business intelligence capabilities using Metabase. Readers should have some understanding of data analytics principles. Whether you're starting in analytics or seeking advanced automation, this book offers valuable guidance to meet your goals.
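
As an illustration of the embedding capability mentioned above, here is a sketch of Metabase's signed-embedding flow using PyJWT. The site URL, secret, and dashboard id are placeholders, and the payload shape should be verified against your Metabase version's documentation.

```python
# Sketch of Metabase signed embedding: build a short-lived signed URL for an
# iframe. Site URL, secret, and dashboard id are placeholders.
import time
import jwt  # PyJWT

METABASE_SITE_URL = "https://metabase.example.com"   # placeholder
METABASE_SECRET_KEY = "embedding-secret-from-admin"  # placeholder

payload = {
    "resource": {"dashboard": 1},        # id of the dashboard to embed
    "params": {},                        # locked parameters, if any
    "exp": round(time.time()) + 600,     # token valid for 10 minutes
}
token = jwt.encode(payload, METABASE_SECRET_KEY, algorithm="HS256")

iframe_url = f"{METABASE_SITE_URL}/embed/dashboard/{token}#bordered=true&titled=true"
print(iframe_url)
```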

Summary Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine. In this episode he explains how they engineered a drop-in replacement for Kafka, replicating the numerous APIs, that can scale more easily and deliver consistently low latencies with a much lower hardware footprint. He also shares some of the areas of innovation that they have found to help foster the next wave of streaming applications while working within the constraints of the existing Kafka interfaces. This was a fascinating conversation with an energetic and enthusiastic engineer and founder about the challenges and opportunities in the realm of streaming data.
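
To illustrate the drop-in claim, here is a sketch using the kafka-python client unchanged: because Redpanda speaks the Kafka protocol, only the broker address (a placeholder here) points at a Redpanda node instead of a Kafka cluster. The topic name and messages are invented.

```python
# An ordinary Kafka client working against a Redpanda broker, unchanged.
# Broker address, topic, and messages are illustrative placeholders.
from kafka import KafkaProducer, KafkaConsumer  # kafka-python package

producer = KafkaProducer(bootstrap_servers="localhost:9092")  # Redpanda broker
producer.send("pageviews", key=b"user-42", value=b'{"path": "/pricing"}')
producer.flush()

consumer = KafkaConsumer(
    "pageviews",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for message in consumer:
    print(message.key, message.value)
```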

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

If you’re looking for a way to optimize your data engineering pipeline – with instant query performance – look no further than Qubz. Qubz is next-generation OLAP technology built for the scale of Big Data from UST Global, a renowned digital services provider. Qubz lets users and enterprises analyze data on the cloud and on-premise, with blazing speed, while eliminating the complex engineering required to operationalize analytics at scale. With an emphasis on visual data engineering, and connectors for all major BI tools and data sources, Qubz allows users to query OLAP cubes with sub-second response times on hundreds of billions of rows. To learn more, and sign up for a free demo, visit dataengineeringpodcast.com/qubz.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!