talk-data.com

Topic: Analytics

Tags: data_analysis, insights, metrics

4552 tagged activities

Activity Trend: 398 peak/qtr (2020-Q1 to 2026-Q1)

Activities

4552 activities · Newest first

Data Accelerator for AI and Analytics

This IBM® Redpaper publication focuses on data orchestration in enterprise data pipelines. It explains data orchestration and how to address the typical challenges customers face when dealing with large and ever-growing amounts of data for analytics. While data volumes increase steadily, artificial intelligence (AI) workloads must speed up to deliver insights and business value in a timely manner. This paper presents a solution that addresses these needs, Data Accelerator for AI and Analytics (DAAA), and describes a proof of concept (PoC) in detail. The solution simplifies the daily work of data scientists and system administrators, and helps increase the efficiency of storage systems and data processing to obtain results faster while eliminating unnecessary data copies and the associated data management.

Summary The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a platform that synchronizes information about your customers out to third party systems for use by marketing and sales teams. In this episode Tejas Manohar explains the benefits of sourcing customer data from one location for all of your organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and the workflow for enabling self-service access to your customer data by your marketing teams. This is an interesting conversation about the importance of the data warehouse and how it can be used beyond just internal analytics.
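As a rough illustration of the sync pattern described above, here is a minimal Python sketch of a reverse-ETL loop that reads customer rows from a warehouse and upserts them into a CRM over HTTP. The table name, endpoint, and field names are hypothetical stand-ins, not Hightouch's actual implementation.

```python
# Hedged sketch of a reverse-ETL sync: warehouse rows -> CRM API.
# All names (table, endpoint, fields) are illustrative assumptions.
import sqlite3

import requests

CRM_UPSERT_URL = "https://api.example-crm.com/v1/contacts/upsert"  # hypothetical

def sync_customers(warehouse_path: str, api_key: str) -> None:
    conn = sqlite3.connect(warehouse_path)  # stand-in for a real warehouse driver
    rows = conn.execute(
        "SELECT email, first_name, lifetime_value FROM dim_customers"
    ).fetchall()
    for email, first_name, ltv in rows:
        # Upsert keyed on email so re-running the sync stays idempotent.
        resp = requests.post(
            CRM_UPSERT_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            json={"email": email, "first_name": first_name, "lifetime_value": ltv},
            timeout=10,
        )
        resp.raise_for_status()
    conn.close()
```

A real sync engine would batch requests, track per-row state, and adapt to each destination's rate limits; those are exactly the "varying APIs" challenges the episode digs into.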

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

This episode of the Data Engineering Podcast is sponsored by Datadog, a unified monitoring and analytics platform built for developers, IT operations teams, and businesses in the cloud age. Datadog provides customizable dashboards, log management, and machine-learning-based alerts in one fully-integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context. Monitor all your databases, cloud services, containers, and serverless functions in one place with Datadog’s 400+ vendor-backed integrations. If an outage occurs, Datadog provides seamless navigation between your logs, infrastructure metrics, and application traces in just a few clicks to minimize downtime. Try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent. Go to dataengineeringpodcast.com/datadog today to see how you can enhance visibility into your stack with Datadog.

Your host is Tobias Macey and today I’m interviewing Tejas Manohar about Hightouch, a data platform that helps you sync your customer data from your data warehouse to your CRM, marketing, and support tools.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what you are building at Hightouch and your motivation for creating it?
What are the main points of friction for teams who are trying to make use of customer data?
Where is Hightouch positioned in the ecosystem of customer data tools such as Segment, Mixpanel

Google bought Urchin in 2005 and, virtually overnight, made digital analytics available to all companies, no matter how large or how small. Optimizely was founded in January 2010 and had a similar (but lesser) impact on the world of A/B testing. What can we learn from ruminating on the past, the present, and the future (server-side testing! sample ratio mismatch checking! Bayesian approaches!) of experimentation? Quite a bit, if we pull in an industry veteran and pragmatic thinker like Ton Wesseling from Online Dialogue! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.
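Since sample ratio mismatch checking comes up in the episode, here is a minimal sketch of how such a check is commonly implemented: a chi-square goodness-of-fit test comparing observed assignment counts against the expected 50/50 split. The counts below are made-up example values, not figures from the show.

```python
# Sample ratio mismatch (SRM) check for a 50/50 experiment split.
from scipy.stats import chisquare

def srm_check(n_control: int, n_treatment: int, alpha: float = 0.001) -> bool:
    """Return True if the observed split deviates suspiciously from 50/50."""
    total = n_control + n_treatment
    expected = [total / 2, total / 2]
    _, p_value = chisquare([n_control, n_treatment], f_exp=expected)
    return p_value < alpha  # a tiny p-value suggests an assignment bug

# Example: a 10,000 vs 10,800 split is flagged at alpha = 0.001.
print(srm_check(10_000, 10_800))  # True
```

A flagged SRM usually means the experiment's traffic allocation is broken, so its results should not be trusted until the assignment bug is found.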

In part 2 of this two-part series, host Al Martin welcomes back Rob Thomas, GM of IBM Data and AI. The conversation ranges across the latest trends in enterprise analytics, how to tell the truth from the hype, and Rob’s own strategies for staying current, connected, and engaged in this fast-changing space.


Show Notes

00:00 - Missed part 1? Catch up with the conversation here.

00:05 - Check us out on YouTube and SoundCloud.
00:10 - Connect with Producer Steve Moore on LinkedIn and Twitter.
00:15 - Connect with Producer Liam Seston on LinkedIn and Twitter.
00:20 - Connect with Producer Rachit Sharma on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn and Twitter.
00:52 - Connect with Rob Thomas on LinkedIn and Twitter.
01:02 - Check out Rob's article on Watson Anywhere here.
02:03 - There is now proof that quantum computing can boost machine learning! Read more here.
05:17 - Your team needs a compass, not a map.
10:00 - Warren Buffett and Mark Cuban agree: reading is like compound interest.
14:24 - Check out Wooden on Leadership here.
14:35 - Check out Mindset here.

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Summary As more organizations gain experience with data management and incorporate analytics into their decision making, their next move is to adopt machine learning. To make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner. As a result, the feature store is becoming a required piece of the data platform. To fill that need, Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelangelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for a well-engineered feature store.
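For readers new to the concept, the toy sketch below shows the core idea of a feature store: feature definitions are registered once and then served per entity, so training and inference share the same logic. It is a generic illustration under assumed names, not Tecton's actual API.

```python
# Toy in-memory feature store: register named features, serve them per entity.
from typing import Any, Callable, Dict, List

class FeatureStore:
    def __init__(self) -> None:
        self._features: Dict[str, Callable[[dict], Any]] = {}

    def register(self, name: str, fn: Callable[[dict], Any]) -> None:
        """Register a named feature computed from a raw entity record."""
        self._features[name] = fn

    def get_features(self, record: dict, names: List[str]) -> dict:
        """Compute the requested features for one entity record."""
        return {name: self._features[name](record) for name in names}

store = FeatureStore()
# An example feature from the ride-sharing domain discussed in the episode.
store.register(
    "trip_count_7d",
    lambda r: sum(1 for t in r["trips"] if t["age_days"] <= 7),
)

rider = {"trips": [{"age_days": 2}, {"age_days": 5}, {"age_days": 30}]}
print(store.get_features(rider, ["trip_count_7d"]))  # {'trip_count_7d': 2}
```

A production feature store adds what this sketch omits, such as backfills, point-in-time correctness, and a low-latency online serving path, which is much of what the interview covers.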

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show.

You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat!

Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what you are building at Tecton and your motivation for starting the business?
For anyone who isn’t familiar with the concept, what is an example of a feature?
How do you define what a feature store is?
What role does a feature store play in the overall lifecycle of a machine learning p

Quickstart your analytics with Fivetran dbt packages

Have you ever looked at your data and not known where to start? In this video, you'll learn how Firefly Health leveraged Fivetran's dbt packages to quickly transform their raw Salesforce data into analytics-ready models. A process that would typically take weeks was cut down to just minutes with the power of dbt packages.

Speakers:
Dom Colyer, Senior Sales Engineer, Fivetran
Jacob Mulligan, Head of Analytics, Firefly Health

Predictive Analytics: Data Mining, Machine Learning and Data Science for Practitioners, 2nd Edition

Use Predictive Analytics to Uncover Hidden Patterns and Correlations and Improve Decision-Making

Using predictive analytics techniques, decision-makers can uncover hidden patterns and correlations in their data and leverage these insights to improve many key business decisions. In this thoroughly updated guide, Dr. Dursun Delen illuminates state-of-the-art best practices for predictive analytics for both business professionals and students. Delen provides a holistic approach covering key data mining processes and methods, relevant data management techniques, tools and metrics, advanced text and web mining, big data integration, and much more. Balancing theory and practice, Delen presents intuitive conceptual illustrations, realistic example problems, and real-world case studies, including lessons from failed projects. It is all designed to help you gain a practical understanding you can apply for profit.

* Leverage knowledge extracted via data mining to make smarter decisions
* Use standardized processes and workflows to make more trustworthy predictions
* Predict discrete outcomes (via classification), numeric values (via regression), and changes over time (via time-series forecasting)
* Understand predictive algorithms drawn from traditional statistics and advanced machine learning
* Discover cutting-edge techniques, and explore advanced applications ranging from sentiment analysis to fraud detection
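To make the book's classification/regression distinction concrete, here is a tiny Python sketch using scikit-learn as a familiar stand-in (the book itself is tool-agnostic). The dataset is synthetic and purely illustrative.

```python
# Classification (discrete outcome) vs. regression (numeric value).
from sklearn.linear_model import LinearRegression, LogisticRegression

# Features: [tenure_months, monthly_spend] for five hypothetical customers.
X = [[1, 20], [3, 35], [12, 50], [24, 80], [36, 120]]
churned = [1, 1, 0, 0, 0]                  # discrete outcome -> classification
next_month_spend = [18, 40, 55, 85, 130]   # numeric value -> regression

clf = LogisticRegression().fit(X, churned)
reg = LinearRegression().fit(X, next_month_spend)

new_customer = [[6, 45]]
print(clf.predict(new_customer))  # e.g. [0]: predicted not to churn
print(reg.predict(new_customer))  # e.g. a spend estimate in the 40-60 range
```

Time-series forecasting, the third task the book lists, follows the same pattern but must respect temporal order when splitting training and test data.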

Balancing creativity and proficiency as a data team

Your data team has to produce solid data. The pipelines have to run, the logic in your transformations has to be sound, and the report has to show accurate revenue. But if that’s all you’re doing, your team is probably bored and your organization definitely isn’t getting as much value as it could out of its data.

Open-ended creative work is a huge part of the appeal of working in this field: identifying opportunities to improve processes, appeal to new customers, or build better products not only adds value for the organization but is also incredibly satisfying. One of the fundamental challenges of managing a data team is balancing the need for rigor and reliability with the team’s desire to spend most of their time creating new knowledge. In this video, Caitlin Moorman, Head of Analytics with Trove Recommerce, discusses how we can manage those sometimes conflicting priorities and create tools and processes that make the balance easier.

It's the holiday season and, despite Tim's 27-slide deck making a case for why we should do an Airing of Grievances-themed show, we went in another direction. On this episode, we explore a delightful tale that exists at the intersection of "Giving Back to the Community" and "Growing the Analytics Talent Pool." Rob Jackson joined the gang to be peppered with questions about the what, why, and how of his digital marketing social enterprise: WYK Digital. It's an inspiring story of breaking down some of the barriers to digital-focused jobs for underserved youth. And doing so in the middle of a pandemic, no less! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Perfect complements: Using dbt with Looker for effective data governance

Learn how a rapidly growing software development firm transformed its legacy data analytics approach by embracing analytics engineering with dbt and Looker. In this video, Johnathan Brooks of 4 Mile Analytics outlines the complementary benefits of these tools and discusses design patterns and analytics engineering principles that enable strong data governance, increase agility and scalability, and decrease maintenance overhead.

Data dream teams: TripActions
video
by Rob Winters (TripActions), Bart Sandbergen (TripActions), Virginia López-Gil Pérez (TripActions), Teodora Vrabcheva (TripActions)

Join us for a fireside chat with members of the TripActions data team to get an inside look at how their team gets work done. We'll learn how their data team is structured, some projects they've recently worked on, and what's coming up for the team!

Speakers:
Rob Winters, Director of Data with TripActions
Bart Sandbergen, Data Analyst with TripActions
Virginia López-Gil Pérez, Data Engineer with TripActions
Teodora Vrabcheva, Senior Data Scientist with TripActions
Simon Ouderkirk (Moderator), Senior Product Manager with Fishtown Analytics

Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At YipitData they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode, Andrew Gross, Bobby Muldoon, and Anup Segu describe the self-service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their output. They share the journey that they went through to build a scalable and maintainable system for web scraping, how to make it reliable and resilient to errors, and the lessons that they learned in the process. This was a great conversation about real-world experiences in building a successful data-oriented business.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes, and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.

Your host is Tobias Macey and today I’m interviewing Andrew Gross, Bobby Muldoon, and Anup Segu about how they are building pipelines at YipitData.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what YipitData does?
What kinds of data sources and data assets are you working with?
What is the composition of your data teams and how are they structured?
Given the use of your data products in the financial sector how do you handle monitoring and alerting around data qualit

Beyond the Modern Data Stack: dbt Cloud Metadata in Mode

In this video, Benn Stancil, President and Founder of Mode, discusses new ways to align application boundaries in the modern data stack, providing a set of guidelines for determining how and where to draw the lines between your many tools. He also gives a concrete example of these boundaries by demonstrating how metadata surfaced in an analytics tool like Mode can increase overall data confidence.

The post-modern data stack

dbt is an essential part of the modern data stack. Over the past four years, the most innovative and forward-thinking data teams have implemented a best-of-breed approach to analytics. This approach has solved many problems, but it has also created new ones. In this video, Drew Banin, Chief Product Officer and co-founder of Fishtown Analytics, will share his vision for the data stack of the future.

Implementing dbt at large enterprises
video
by Ryan Goltz (Chesapeake Energy), Ben Singleton (JetBlue), Amy Chen (Fishtown Analytics)

What does it look like to implement dbt at an organization where the number of employees is in the thousands? In this video we'll learn from the people who have answered exactly this question at organizations like JetBlue and Chesapeake Energy.

Speakers:
Chris Holliday (Moderator), Senior VP, Client Management with Visual BI
Amy Chen, Solutions Architect with Fishtown Analytics
Ryan Goltz, Lead Data Strategist with Chesapeake Energy
Ben Singleton, Director of Data Science & Analytics with JetBlue

Data dream teams: Netlify

Join us for a fireside chat with members of the Netlify data team to get an inside look at how their team gets work done. We'll learn how their data team is structured, some projects they've recently worked on, and what's coming up for the team!

Featured speakers:
Emilie Schario, Senior Engineering Manager, Data and Business Intelligence with Netlify
Laurie Voss, Senior Data Analyst with Netlify
Francisco Lozano, Senior Analytics Engineer with Netlify
Brian de la Motte, Senior Data Engineer with Netlify

How to start your analytics engineering team

At many organizations, dbt and the competency of analytics engineering are introduced well after the establishment of an analytics team. It's easy to agree in principle with all the benefits and value added by this new tool and analytics practice, but getting there can be a challenge. As with most tool implementations or team restructurings, there is often a long, painful transition from whatever was done previously to the new way of working.

In this presentation, we'll learn from Andres Recalde's experience implementing analytics engineering practices in a greenfield situation (La Colombe) and his current successes (and failures!) implementing analytics engineering at an already established organization (goPuff).

Orchestrating dbt with Dagster

dbt defined an entirely new subspecialty of software engineering: analytics engineering. But it is one discipline among many: analytics engineers must collaborate with data scientists, data engineers, and data platform engineers to deliver a cohesive data platform. In this video, Nick Schrock of Elementl talks about how orchestrating dbt with Dagster allows you to place dbt in context, de-silo your operational systems, improve monitoring, and enable self-service operations.
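As a minimal sketch of the orchestration idea (not the richer, asset-aware dagster-dbt integration the talk likely covers), the job below runs dbt as one step among others using only Dagster's core op/job API and the dbt CLI. The schema name and dbt variable are illustrative assumptions.

```python
# A Dagster job that places a dbt run in the context of an upstream step.
import subprocess

from dagster import job, op

@op
def extract_raw_data() -> str:
    # Placeholder for an upstream ingestion step (e.g. an API extract).
    return "raw_schema"  # hypothetical schema name handed to dbt as a variable

@op
def run_dbt_models(source_schema: str) -> None:
    # Shell out to the dbt CLI; a non-zero exit fails the op and the job.
    subprocess.run(
        ["dbt", "run", "--vars", f"{{source_schema: {source_schema}}}"],
        check=True,
    )

@job
def nightly_pipeline():
    run_dbt_models(extract_raw_data())

if __name__ == "__main__":
    nightly_pipeline.execute_in_process()
```

Putting dbt inside the orchestrator's dependency graph is what gives you the monitoring and "dbt in context" benefits the description mentions: the dbt step only runs once its upstream data has actually landed.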

Hiring a diverse data team
video
by Colleen Tartow (Starburst Data), Meghan Colón (Fishtown Analytics), Ilse Ackerman (Brooklyn Data Co.), Alexis Johnson-Gresham (Brooklyn Data Co.)

Meghan Colón, Head of People Operations with Fishtown Analytics, moderates this panel discussion on how to build equitable and inclusive data teams. She is joined by Ilse Ackerman, Director of Data & Analytics with Brooklyn Data Co.; Alexis Johnson-Gresham, Engagement Manager, also with Brooklyn Data Co.; and Colleen Tartow, PhD, Director of Engineering with Starburst Data.

Human in the loop data processing

What do you do when data is too messy to be useful, but too large for manual cleaning? In this video, Bladey from Civis Analytics shares tips for implementing 'human in the loop' data processing: focusing manual effort on the messiest data. When their team implemented this approach, a data cleaning task that used to take two months was reduced to two weeks.
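As a hedged sketch of that routing idea, the Python below auto-accepts records a cleaning heuristic is confident about and queues the rest for human review. The threshold, field names, and heuristic are illustrative assumptions, not Civis Analytics' actual pipeline.

```python
# 'Human in the loop' routing: auto-clean confident records, queue the rest.
from typing import List, Tuple

def clean_with_confidence(record: dict) -> Tuple[dict, float]:
    """Attempt automated cleaning; return the result and a confidence score."""
    name = record.get("company_name", "").strip().lower()
    # Toy heuristic: short, purely alphabetic names are considered 'clean'.
    confidence = 0.95 if name.isalpha() and len(name) < 30 else 0.40
    return {"company_name": name}, confidence

def route(records: List[dict], threshold: float = 0.8):
    auto_cleaned, needs_review = [], []
    for record in records:
        cleaned, confidence = clean_with_confidence(record)
        (auto_cleaned if confidence >= threshold else needs_review).append(cleaned)
    return auto_cleaned, needs_review

auto, manual = route(
    [{"company_name": "Acme"}, {"company_name": "ACME Corp. (fka A&B)"}]
)
print(len(auto), "auto-cleaned;", len(manual), "queued for human review")
```

The payoff is the one described above: human effort concentrates only on the records the automation genuinely cannot handle.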