talk-data.com

Topic: Analytics

Tags: data_analysis · insights · metrics

4552 tagged

Activity Trend: peak of 398 activities per quarter, 2020-Q1 to 2026-Q1

Activities

4552 activities · Newest first

Practice, practice, practice! Storytelling takes practice, especially Storytelling With Data (SWD). What are the fundamental principles for communicating effectively with data and using storytelling concepts to communicate that data?

Today's guest is Cole Knaflic, author of Storytelling With Data and Storytelling With Data: Let's Practice. She is super practical and realistic when teaching people who are struggling to tell stories. Are you ready to join Cole's SWD community? In this episode, you'll learn: [09:30] Storytelling With Data and Let's Practice: read Cole's books, then do the work. [09:58] Why write the second book? To provide small and simple ways to practice and integrate SWD that add up to big changes and major impact over time. [11:35] Key quote: "Having kids and a growing business simultaneously, it's been hard, but it's actually been fantastic." - Cole Knaflic

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/42

Sponsor: This exciting season of AoF is sponsored by our BI Data Storytelling Mastery Accelerator 3-Day Live workshop. Our second one is coming up on Jan 28-30 and registration is open! Join us and consider upgrading to VIP (we have tons of bonuses planned). Many BI teams are still struggling to deliver the consistent, highly engaging analytics their users love. At the end of three days, you'll leave with the tools, techniques, and resources you need to engage your users. Register today!

Enjoyed the show? Please leave us a review on iTunes.

Summary: Building a reliable data platform is a never-ending task. Even if you have a process that works for you and your business, there can be unexpected events that require a change in your platform architecture. In this episode, the head of data for Mayvenn shares their experience migrating an existing set of streaming workflows onto the Ascend platform after their previous vendor was acquired and changed its offering. This is an interesting discussion about the ongoing maintenance and decision making required to keep your business data up to date and accurate.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Sheel Choksi and Sean Knapp about Mayvenn’s experience migrating their dataflows onto the Ascend platform.

Interview

Introduction
How did you get involved in the area of data management?
Can you start off by describing what Mayvenn is and give a sense of how you are using data?
What are the sources of data that you are working with?
What are the biggest challenges you are facing in collecting, processing, and analyzing your data?
Before adopting Ascend, what did your overall platform for data management look like?
What were the pain points that you were facing which led you to seek a new solution?

What were the selection criteria that you set forth for addressing your needs at the time?
What were the aspects of Ascend which were most appealing?

What are some of the edge cases that you have dealt with in the Ascend platform?
Now that you have been using Ascend for a while, what components of your previous architecture have you been able to retire?
Can you talk through the migration process of incorporating Ascend into your platform and any validation that you used to ensure that your data operations remained accurate and consistent?
How has the migration to Ascend impacted your overall capacity for processing data or integrating new sources into your analytics?
What are your future plans for how to use data across your organization?

Contact Info

Sheel

LinkedIn · sheelc on GitHub

Sean

LinkedIn · @seanknapp on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used.

podcast_episode
by Mico Yuk (Data Storytelling Academy)
BI

Welcome to 2020! It's time to kick the new year and new decade off by recapping awesome things that happened in 2019 for Analytics on Fire (AoF). Plus, what 10 things can amazing AoF listeners expect and look forward to in 2020? Let's begin the countdown.

In this episode, you'll learn: [03:52] 10 things to expect from AoF in 2020 [06:02] #3: More solo teaching episodes from yours truly [07:34] #4: Introducing the #askmico hashtag [08:23] #5: Introducing a new co-host

For full show notes, the book giveaway, and the links mentioned, visit: https://bibrainz.com/podcast/41

Sponsor: This exciting season of AoF is sponsored by our BI Data Storytelling Mastery Accelerator 3-Day Live workshop, on January 28-30, 2020 in Atlanta, GA. Join us for your chance to be in the hot seat and get your use case critiqued live. At the end of three days, you'll leave with a clear BI delivery action plan. Register today!

Interpretability

Machine learning has expanded rapidly into every sector and industry. With increasing reliance on models and increasing stakes for the decisions of models, questions of how models actually work are becoming increasingly important to ask. Welcome to Data Skeptic Interpretability. In this episode, Kyle interviews Christoph Molnar about his book Interpretable Machine Learning. Thanks to our sponsor, the Gartner Data & Analytics Summit, taking place in Grapevine, TX on March 23-26, 2020. Use discount code: dataskeptic. Music: Our new theme song is #5 by Big D and the Kids Table. Incidental music by Tanuki Suit Riot.
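For a concrete taste of the episode's subject, here is a minimal sketch of permutation feature importance, one of the model-agnostic methods covered in Molnar's book, using scikit-learn. The dataset and model are illustrative assumptions, not taken from the episode.

# A minimal sketch of permutation feature importance, a model-agnostic
# interpretability method covered in Interpretable Machine Learning.
# The dataset and model are illustrative assumptions, not from the episode.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Shuffle each feature in turn and measure how much test accuracy drops;
# a large drop means the model relies heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, score in top:
    print(f"{name}: {score:.3f}")

Because the method only needs predictions and a scoring metric, the same few lines work for any fitted model, which is exactly what makes it model-agnostic.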

Data Analysis with Microsoft Power BI

Publisher's Note: Products purchased from third-party sellers are not guaranteed by the publisher for quality, authenticity, or access to any online entitlements included with the product.

Explore, create, and manage highly interactive data visualizations using Microsoft Power BI. Extract meaningful business insights from your disparate enterprise data using the detailed information contained in this practical guide. Written by a recognized BI expert and bestselling author, Data Analysis with Microsoft Power BI teaches you the skills you need to interact with, author, and maintain robust visualizations and custom data models. Hands-on exercises based on real-life business scenarios clearly demonstrate each technique. Publishing your results to the Power BI Service (PowerBI.com) and Power BI Report Server is also fully covered. Inside, you will discover how to:

• Understand business intelligence and self-service analytics
• Explore the tools and features of Microsoft Power BI
• Create and format effective data visualizations
• Incorporate advanced interactivity and custom graphics
• Build and populate accurate data models
• Transform data using the Power BI Query Editor
• Work with measures, calculated columns, and tabular models
• Write powerful DAX language scripts
• Share content on the Power BI Service (PowerBI.com)
• Store your visualizations on the Power BI Report Server

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Josh Crowhurst, Moe Kiss (Canva), Michael Helbling (Search Discovery)

It's the end of the year, and we know it, and we feel fiiiiine. Or, maybe we have a little anxiety. But, for the fifth year in a row, we're wrapping up the year with a reflective episode: reflecting on changes in the analytics industry, the evolution of the podcast, and the interpersonal dynamics between Tim and Michael. From the state of diversity in the industry (and on the show), to the trends in analytics staffing and careers, to the growing impact of ethical and privacy considerations on the role of the analyst, it's an episode chock full of agreement, acrimony, and angst. And, it's an episode with a special "guest": it's the first time that producer Josh Crowhurst is on mic doing something besides simply keeping our advertisers happy! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

The Rise of Operational Analytics

Fast access to data has become a critical game changer. Today, a new breed of company understands that the faster they can build, access, and share well-defined datasets, the more competitive they’ll be in our data-driven world. In this practical report, Scott Haines from Twilio introduces you to operational analytics, a new approach for making sense of all the data flooding into business systems. Data architects and data scientists will see how Apache Kafka and other tools and processes laid the groundwork for fast analytics on a mix of historical and near-real-time data. You’ll learn how operational analytics feeds minute-by-minute customer interactions, and how NewSQL databases have entered the scene to drive machine learning algorithms, AI programs, and ongoing decision-making within an organization.

• Understand the key advantages that data-driven companies have over traditional businesses
• Explore the rise of operational analytics, and how this method relates to current tech trends
• Examine the impact of "can't wait" business decisions and "won't wait" customer experiences
• Discover how NewSQL databases support cloud native architecture and set the stage for operational databases
• Learn how to choose the right database to support operational analytics in your organization
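To make the near-real-time side of this pattern concrete, here is a minimal sketch of consuming a stream of customer events from Apache Kafka with the kafka-python client and keeping a continuously updated aggregate. The topic name and event schema are illustrative assumptions, not taken from the report.

# Minimal sketch: consuming customer-interaction events from Kafka for
# operational analytics. Topic name and event schema are illustrative
# assumptions, not taken from the report.
import json
from collections import Counter

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "customer-events",                      # hypothetical topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

# Maintain a running count of events per type -- the kind of always-fresh
# aggregate that minute-by-minute operational analytics feeds on.
counts = Counter()
for message in consumer:
    event = message.value
    counts[event.get("type", "unknown")] += 1
    print(dict(counts))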

Tableau Desktop Certified Associate: Exam Guide

Tableau Desktop Certified Associate: Exam Guide is your companion for mastering Tableau and preparing for the certification exam with confidence. Through this book, you will gain a comprehensive understanding of Tableau Desktop's features and learn to implement them in practical scenarios to solve analytics challenges.

What this book will help me do:
• Understand and apply Tableau best practices for analyzing and visualizing data effectively.
• Visualize geographic data using vector maps and gain insights into spatial distributions.
• Leverage advanced analytics techniques such as forecasting to predict key metrics.
• Build effective dashboards that convey information clearly and efficiently.
• Gain confidence in tackling Tableau Desktop Certified Associate exam questions with expert tips and mock exams.

Author(s): The authors, Dmitry Anoshin, JC Gillet, Peri Biyani, and others, are experienced professionals in data analytics and business intelligence. With significant expertise in teaching and applying Tableau, they bring a wealth of knowledge to this guide, offering clear instructions and practical insights. Their dedication to empowering learners fosters a supportive and assured journey through this book.

Who is it for? This book is ideal for business analysts, BI professionals, and data analysts aiming to become certified Tableau Desktop Associates. If you have a foundational understanding of Tableau Desktop and are looking to deepen your expertise while preparing for certification, this book is tailored to help you achieve that goal.

Summary: Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of records. Data warehouses are optimized for batched writes and complex analytical queries. Between those use cases there are varying levels of support for fast reads on quickly changing data. To address that need more completely the team at Materialize has created an engine that allows for building queryable views of your data as it is continually updated from the stream of changes being generated by your applications. In this episode Frank McSherry, chief scientist of Materialize, explains why it was created, what use cases it enables, and how it works to provide fast queries on continually updated data.
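Materialize speaks the PostgreSQL wire protocol, so a standard Postgres driver can define and query its incrementally maintained views. Here is a minimal sketch with psycopg2, assuming a source of page-view events has already been defined; the source and column names are hypothetical.

# Minimal sketch of defining and querying an incrementally maintained view
# in Materialize over the PostgreSQL wire protocol. Assumes a source named
# "pageviews" (with a user_id column) already exists; names are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=6875,            # Materialize's default port
    user="materialize", dbname="materialize",
)
conn.autocommit = True
cur = conn.cursor()

# The view is kept up to date as new events arrive, so the SELECT below
# reads precomputed results instead of rescanning the input stream.
cur.execute("""
    CREATE MATERIALIZED VIEW pageview_counts AS
    SELECT user_id, count(*) AS views
    FROM pageviews
    GROUP BY user_id
""")

cur.execute("SELECT user_id, views FROM pageview_counts ORDER BY views DESC LIMIT 10")
for user_id, views in cur.fetchall():
    print(user_id, views)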

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Frank McSherry about Materialize, an engine for maintaining materialized views on incrementally updated data from change data captures.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what Materialize is and the problems that you are aiming to solve with it?

What was your motivation for creating it?

What use cases does Materialize enable?

What are some of the existing tools or systems that you have seen employed to address those needs which can be replaced by Materialize?
How does it fit into the broader ecosystem of data tools and platforms?

What are some of the use cases that Materialize is uniquely able to support?
How is Materialize architected and how has the design evolved since you first began working on it?
Materialize is based on your timely-dataflow project, which itself is based on the work you did on Naiad. What was your reasoning for using Rust as the implementation target and what benefits has it provided?

What are some of the components or primitives that were missing in the Rust ecosystem as compared to what is available in Java or C/C++, which have been the dominant languages for distributed data systems?

In the list of features, you highlight full support for ANSI SQL 92. What were some of the edge cases that you faced in complying with that standard given the distributed execution context for Materialize?

A majority of SQL oriented platforms define custom extensions or built-in functions that are specific to their problem domain. What are some of the existing or

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

Explore the modern market of data analytics platforms and the benefits of using Snowflake computing, the data warehouse built for the cloud. With the rise of cloud technologies, organizations prefer to deploy their analytics using cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Cloud vendors are offering modern data platforms for building cloud analytics solutions to collect data and consolidate it into single storage solutions that provide insights for business users. The core of any analytics framework is the data warehouse, and previously customers did not have many choices of platform to use. Snowflake was built specifically for the cloud and it is a true game changer for the analytics market. This book will help onboard you to Snowflake and present best practices to deploy and use the Snowflake data warehouse. In addition, it covers modern analytics architecture and use cases. It provides use cases of integration with leading analytics software such as Matillion ETL, Tableau, and Databricks. Finally, it covers migration scenarios for on-premise legacy data warehouses.

What You Will Learn
• Know the key functionalities of Snowflake
• Set up security and access with cluster
• Bulk load data into Snowflake using the COPY command
• Migrate from a legacy data warehouse to Snowflake
• Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools

Who This Book Is For
Those working with data warehouse and business intelligence (BI) technologies, and existing and potential Snowflake users
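To make the bulk-loading step concrete, here is a minimal sketch of Snowflake's COPY command driven from the Python connector. The account, file, and table names are placeholder assumptions, not examples from the book.

# Minimal sketch of bulk loading with Snowflake's COPY command using the
# Python connector (pip install snowflake-connector-python). Account, file,
# and table names are placeholder assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # hypothetical credentials
    user="my_user",
    password="...",
    warehouse="COMPUTE_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
cur = conn.cursor()

# Upload a local CSV to the table's internal stage, then COPY it in.
cur.execute("PUT file:///tmp/sales.csv @%sales")   # @%sales = table stage
cur.execute("""
    COPY INTO sales
    FROM @%sales
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")
cur.close()
conn.close()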

IBM Power System L922 Technical Overview and Introduction

This IBM® Redpaper™ publication is a comprehensive guide covering the IBM Power System L922 (9008-22L) server, which was designed for data-intensive workloads such as databases and analytics in the Linux operating system. The objective of this paper is to introduce the major innovative Power L922 offering and its relevant functions:

• The new IBM POWER9™ processor, available at frequencies of 2.7 - 3.8 GHz, 2.9 - 3.8 GHz, and 3.4 - 3.9 GHz
• Significantly strengthened cores and larger caches
• Two integrated memory controllers that allow double the memory footprint of IBM POWER8® processor-based servers
• An integrated I/O subsystem and hot-pluggable Peripheral Component Interconnect Express (PCIe) Gen4 and Gen3 I/O slots
• I/O drawer expansion options that offer greater flexibility
• Support for Coherent Accelerator Processor Interface (CAPI) 2.0
• New IBM EnergyScale™ technology with variable processor frequency modes that deliver a significant performance boost beyond the static nominal frequency

This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products. The intended audience includes the following roles:

• Clients
• Sales and marketing professionals
• Technical support professionals
• IBM Business Partners
• Independent software vendors (ISVs)

This paper expands the current set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power L922 system. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

Send us a text. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract: This week on Making Data Simple, we have a joint finale for the series Stories from the Field. Hosts Al Martin and Wennie Allen have a discussion with Gordon Johnson, Global Head of Optimization for DHL. We get an insider's perspective on data within the shipping and logistics world, helping optimize shipping methods to get medical supplies where they are needed most.

Connect with Gordon - LinkedIn. Connect with Wennie - LinkedIn. Big Data Hub

Show Notes
02:20 - Learn more here about how big data analytics is making an impact at DHL.
09:43 - Check out this article on how AI changes the logistics industry.
17:43 - Find out more about how machine learning is changing supply chain management here.
20:33 - Discover what incubators are all about here.

Connect with the Team
Producer Liam Seston - LinkedIn.
Producer Lana Cosic - LinkedIn.
Producer Meighann Helene - LinkedIn.
Producer Mark Simmonds - LinkedIn.
Host Al Martin - LinkedIn and Twitter.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Summary: Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools. At WeWork they needed a system that would provide visibility into their Airflow pipelines and the outputs produced. In this episode Julien Le Dem and Willy Lulciuc explain how they built Marquez to serve that need, how it is architected, and how it compares to other options that you might be considering. Even if you already have a metadata repository this is worth a listen to learn more about the value that visibility of your data can bring to your organization.
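Marquez exposes its catalog through an HTTP API, so browsing the metadata it collects takes only a few lines of Python. A minimal sketch with the requests library follows; the port and endpoint paths are assumptions based on a default local deployment, so check the Marquez docs for your version.

# Minimal sketch of querying a local Marquez instance's metadata API.
# The port and endpoint paths are assumptions based on a default local
# deployment; check the Marquez docs for your version.
import requests

BASE = "http://localhost:5000/api/v1"   # assumed default API address

# List the namespaces Marquez knows about, then each one's datasets.
namespaces = requests.get(f"{BASE}/namespaces").json()["namespaces"]
for ns in namespaces:
    name = ns["name"]
    datasets = requests.get(f"{BASE}/namespaces/{name}/datasets").json()["datasets"]
    print(name, [d["name"] for d in datasets])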

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You work hard to make sure that your data is clean, reliable, and reproducible throughout the ingestion pipeline, but what happens when it gets to the data warehouse? Dataform picks up where your ETL jobs leave off, turning raw data into reliable analytics. Their web-based transformation tool with built-in collaboration features lets your analysts own the full lifecycle of data in your warehouse. Featuring built-in version control integration, real-time error checking for their SQL code, data quality tests, scheduling, and a data catalog with annotation capabilities, it’s everything you need to keep your data warehouse in order. Sign up for a free trial today at dataengineeringpodcast.com/dataform and email [email protected] with the subject "Data Engineering Podcast" to get a hands-on demo from one of their data experts.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference, the Strata Data conference, and PyCon US. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Willy Lulciuc and Julien Le Dem about Marquez, an open source platform to collect, aggregate, and visualize a data ecosystem’s metadata.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what Marquez is?

What was missing in existing metadata management platforms that necessitated the creation of Marquez?

How do the capabilities of Marquez compare with tools and services that bill themselves as data catalogs?

How does it compare to the Amundsen platform that Lyft recently released?

What are some of the tools or platforms that are currently integrated with Marquez and what additional integrations would you like to see?
What are some of the capabilities that are unique to Marquez and how are you using them at WeWork?
What are the primary resource types that you support in Marquez?

What are some of the lowest common denominator attributes that are necessary and useful to track in a metadata repository?

Can you explain how Marquez is architected and how the design has evolved since you first began working on it?

Many metadata management systems are simply a service layer on top of a separate data storage engine. What are the benefits of using PostgreSQL as the system of record for Marquez?

What are some of the complexities that arise from relying on a relational engine as opposed to a document store or graph database?

How is the metadata itself stored and managed in Marquez?

How much up-front data modeling is necessary and what types of schema representations are supported?

Can you talk through the overall workflow of someone using Marquez in their environment?

What is involved in registering and updating datasets?
How do you define and track the health of a given dataset?
What are some of the interesting questions that can be answered from the information stored in Marquez?

What were your assumptions going into this project and how have they been challenged or updated as you began using it for production use cases?
For someone who is interested in using Marquez what is involved in deploying and maintaining an installation of it?
What have you found to be the most challenging or unanticipated aspects of building and maintaining a metadata repository and data discovery platform?
When is Marquez the wrong choice for a metadata repository?
What do you have planned for the future of Marquez?

Contact Info

Julien Le Dem

@J_ on Twitter · Email · julienledem on GitHub

Willy

LinkedIn · @wslulciuc on Twitter · wslulciuc on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Marquez

DataEngConf Presentation

WeWork
Canary
Yahoo
Dremio
Hadoop
Pig
Parquet

Podcast Episode

Airflow
Apache Atlas
Amundsen

Podcast Episode

Uber DataBook
LinkedIn DataHub
Iceberg Table Format

Podcast Episode

Delta Lake

Podcast Episode

Great Expectations data pipeline unit testing framework

Podcast.init Episode

Redshift
SnowflakeDB

Podcast Episode

Apache Kafka
Schema Registry

Podcast Episode

Open Tracing
Jaeger
Zipkin
DropWizard Java framework
Marquez UI
Cayley Graph Database
Kubernetes
Marquez Helm Chart
Marquez Docker Container
Dagster

Podcast Episode

Luigi
DBT

Podcast Episode

Thrift
Protocol Buffers

The intro and outro music is from The Hug by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug)…

Data leaders who launch self-service analytics programs without knowing their business users risk unleashing chaos. They need to canvass the organization and understand who produces what information for whom, and where.

Originally published at https://www.eckerson.com/articles/succeeding-with-self-service-analytics-know-thy-customer

Big Data Analytics Methods

Big Data Analytics Methods unveils the secrets of advanced analytics techniques ranging from machine learning, random forest classifiers, predictive modeling, cluster analysis, and natural language processing (NLP) to Kalman filtering and ensembles of models for optimal accuracy of analysis and prediction. More than 100 analytics techniques and methods give big data professionals, business intelligence professionals, and citizen data scientists insight into how to overcome challenges and avoid common pitfalls and traps in data analytics. The book offers solutions and tips on handling missing data, noisy and dirty data, error reduction, and boosting signal to reduce noise. It discusses data visualization, prediction, optimization, artificial intelligence, regression analysis, the Cox hazard model, and many more analytics methods, using case examples with applications in the healthcare, transportation, retail, telecommunication, consulting, manufacturing, energy, and financial services industries. This book's state-of-the-art treatment of advanced data analytics methods and important best practices will help readers succeed in data analytics.
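As a small taste of two techniques the book surveys, here is a sketch that combines simple missing-data imputation with a random forest classifier in scikit-learn. The synthetic dataset is a stand-in, not an example from the book.

# Illustrative sketch of two techniques the book surveys: handling missing
# data by imputation, then fitting a random forest classifier. The synthetic
# dataset is a stand-in, not an example from the book.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan      # knock out ~10% of values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Impute missing values with column medians before the forest sees them.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

Bundling the imputer and the classifier into one pipeline keeps the imputation statistics learned on the training split only, avoiding leakage from the test set.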

Prepare Your Data for Tableau: A Practical Guide to the Tableau Data Prep Tool

Focus on the most important and most often overlooked factor in a successful Tableau project: data. Without a reliable data source, you will not achieve the results you hope for in Tableau. This book does more than teach the mechanics of data preparation. It teaches you how to look at data in a new way, how to recognize the most common issues that hinder analytics, and how to mitigate those factors one by one. Tableau can change the course of business, but the old adage of "garbage in, garbage out" is the hard truth that hides behind every Tableau sales pitch. That amazing sales demo does not work as well with bad data. The unfortunate reality is that almost all data starts out in a less-than-perfect state. Data prep is hard. Traditionally, we were forced into the world of the database, where complex ETL (Extract, Transform, Load) operations created by the data team did all the heavy lifting for us. Fortunately, we have moved past those days. With the introduction of the Tableau Data Prep tool you can now handle most of the common data prep and cleanup tasks on your own, at your desk, and without the help of the data team.

This essential book will guide you through:
• The layout and important parts of the Tableau Data Prep tool
• Connecting to data
• Data quality and consistency
• The shape of the data: Is the data oriented in columns or rows? How to decide? Why does it matter?
• The level of detail in the source data, and why it is important
• Combining source data to bring in more fields and rows
• Saving the data flow and the results of your data prep work
• Common cleanup and setup tasks in Tableau Desktop

What You Will Learn
• Recognize data sources that are good candidates for analytics in Tableau
• Connect to local, server, and cloud-based data sources
• Profile data to better understand its content and structure
• Rename fields, adjust data types, group data points, and aggregate numeric data
• Pivot data
• Join data from local, server, and cloud-based sources for unified analytics
• Review the steps and results of each phase of the data prep process
• Output new data sources that can be reviewed in Tableau or any other analytics tool

Who This Book Is For
Tableau Desktop users who want to connect to data, profile the data to identify common issues, clean up those issues, join to additional data sources, and save the newly cleaned, joined data so that it can be used more effectively in Tableau
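Tableau Prep itself is a visual tool, but for readers who think in code, the same profile-pivot-join workflow the book walks through can be sketched as a pandas analogy. The file names and columns below are invented for illustration; this is not the book's own example.

# A pandas analogy of the prep workflow the book teaches in Tableau Prep's
# visual interface: profile, pivot, join, save. File names and columns are
# invented for illustration.
import pandas as pd

# Hypothetical sources: monthly totals stored one column per month.
sales = pd.read_csv("sales.csv")        # columns: store_id, jan, feb, mar
regions = pd.read_csv("regions.csv")    # columns: store_id, region

# Profile the data first: types and missing values.
print(sales.dtypes)
print(sales.isna().sum())

# Pivot: month columns become rows, one measure per row.
tidy = sales.melt(id_vars=["store_id"], value_vars=["jan", "feb", "mar"],
                  var_name="month", value_name="total")

# Join in the region lookup, then save the prepared output for analysis.
prepared = tidy.merge(regions, on="store_id", how="left")
prepared.to_csv("sales_prepared.csv", index=False)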

podcast_episode
by Mico Yuk (Data Storytelling Academy), Mustafa Mustafa (Ferrara Candy Company)

BI tools change by the minute, so have you ever considered outsourcing your data visualization needs in the future? Maybe you should, especially if you don't have proper in-house skill sets. Don't risk your reputation because users can't unsee a bad data visualization.

Today's guest is long-term BI Brainz customer Mustafa Mustafa, senior director of IT at Ferrara Candy Company. Mustafa transformed Ferrara Candy into a forward-thinking and innovative company. He discusses the pros and cons of outsourcing data visualization and how to choose the right partners. In this episode, you'll learn: [04:13] Key quote: "Users cannot unsee a bad data visualization." - Mico Yuk [11:20] Mustafa's background in learning Mico's BI framework and dashboard strategies. [20:45] Should data visualization be outsourced? Consider customer cases and challenges, such as communication, common sense, and strategy for sharing information.

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/40

Sponsor: This exciting season of AoF is sponsored by our BI Data Storytelling Mastery Accelerator 3-Day Live workshop. Our second one is coming up on Jan 28-30 and registration is open! Join us and consider upgrading to VIP (we have tons of bonuses planned). Many BI teams are still struggling to deliver the consistent, highly engaging analytics their users love. At the end of three days, you'll leave with the tools, techniques, and resources you need to engage your users. Register today!

Enjoyed the show? Please leave us a review on iTunes.

The rise of machine learning has placed a premium on finding new sources of data to fuel predictive models. But acquiring external data is often expensive, and many data sets are rife with errors and difficult to combine with internal data. That's going to change in 2020.

To help us understand the scale, scope, and dimensions of emerging data marketplaces is Justin Langseth, one of the visionaries in our space. Justin is a VP at Snowflake responsible for the Snowflake Data Exchange. Prior to Snowflake, Justin was the technical founder and CEO/CTO of five data technology startups: Claraview (sold to Teradata), Zoomdata (sold to Logi Analytics), Clarabridge, Strategy.com, and Augaroo. He has 25 years of experience in business intelligence, natural language processing, big data, and AI.

Practical DataOps: Delivering Agile Data Science at Scale

Gain a practical introduction to DataOps, a new discipline for delivering data science at scale inspired by practices at companies such as Facebook, Uber, LinkedIn, Twitter, and eBay. Organizations need more than the latest AI algorithms, hottest tools, and best people to turn data into insight-driven action and useful analytical data products. Processes and thinking employed to manage and use data in the 20th century are a bottleneck for working effectively with the variety of data and advanced analytical use cases that organizations have today. This book provides the approach and methods to ensure continuous rapid use of data to create analytical data products and steer decision making. Practical DataOps shows you how to optimize the data supply chain from diverse raw data sources to the final data product, whether the goal is a machine learning model or other data-orientated output. The book provides an approach to eliminate wasted effort and improve collaboration between data producers, data consumers, and the rest of the organization through the adoption of lean thinking and agile software development principles. This book helps you to improve the speed and accuracy of analytical application development through data management and DevOps practices that securely expand data access, and rapidly increase the number of reproducible data products through automation, testing, and integration. The book also shows how to collect feedback and monitor performance to manage and continuously improve your processes and output. What You Will Learn Develop a data strategy for your organization to help it reach its long-term goals Recognize and eliminate barriers to delivering data to users at scale Work on the right things for the right stakeholders through agile collaboration Create trust in data via rigorous testing and effective data management Build a culture of learning and continuous improvement through monitoring deployments and measuring outcomes Create cross-functional self-organizing teams focused on goals not reporting lines Build robust, trustworthy, data pipelines in support of AI, machine learning, and other analytical data products Who This Book Is For Data science and advanced analytics experts, CIOs, CDOs (chief data officers), chief analytics officers, business analysts, business team leaders, and IT professionals (data engineers, developers, architects, and DBAs) supporting data teams who want to dramatically increase the value their organization derives from data. The book is ideal for data professionals who want to overcome challenges of long delivery time, poor data quality, high maintenance costs, and scaling difficulties in getting data science output and machine learning into customer-facing production.