talk-data.com

Topic

CRM

Customer Relationship Management (CRM)

sales marketing customer_service

67

tagged

Activity Trend

12 peak/qtr
2020-Q1 2026-Q1

Activities

67 activities · Newest first

Next-Gen Sales Forecasting: AI-Powered Pipeline Management | The Data Apps Conference

Sales pipeline forecasting is essential for revenue planning, but traditional approaches rely on either unstructured spreadsheets or rigid SaaS applications like Clari—creating data silos, limiting customization, and forcing teams to switch between multiple tools for complete pipeline visibility.

In this session, Oscar Bashaw (Solution Architect) will demonstrate how to:

Create a unified sales forecasting app with role-specific views for both reps and managers
Implement structured data capture with input tables for consistent deal-level forecasting
Consolidate multiple data sources (CRM, call recordings, product usage) into a single tool
Leverage AI models from your data warehouse to provide intelligent deal insights without leaving the workflow
Build dynamic visualizations with real-time pipeline coverage and attainment tracking
Use AI to surface risk signals by analyzing call sentiment, deal history, and activity trends from connected data sources

With Sigma, sales teams can move beyond disconnected spreadsheets and inflexible SaaS tools to create a dynamic, AI-powered forecasting solution that scales with your business. Join this session for a complete walkthrough of the app's architecture and learn how to build similar capabilities for your organization—reducing costs while improving forecast accuracy and sales team productivity.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial


Patrick Thompson, co-founder of Clarify and former co-founder of Iteratively (acquired by Amplitude), joined Yuliia and Dumky to discuss the evolution from data quality to decision quality. Patrick shares his experience building data contract solutions at Atlassian and later developing analytics tracking tools. Patrick challenges the assumption that AI will eliminate the need for structured data. He argues that while LLMs excel at understanding unstructured data, businesses still need deterministic systems for automation and decision-making. Patrick shares insights on why enforcing data quality at the source remains critical, even in an AI-first world, and explains his shift from analytics to CRM while maintaining focus on customer data unification and business impact over technical perfectionism. Tune in!

A clean energy platform that connects residents, businesses, and other consumers with clean energy facilities that lower their electric costs, centralized data from CRM and billing systems in BigQuery for churn analysis. Historically, account managers manually reviewed cancellation cases, parsing emails and call transcripts to categorize reasons using a complex 65-category system, later condensed into 16 actionable insights. By leveraging Google Gemini, we automated this process, training the LLM to analyze customer interactions and assign accurate categories, streamlining operations and enhancing retention strategies.
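The categorization step described above can be illustrated in miniature. The real pipeline prompts Google Gemini with full emails and call transcripts against a 16-category taxonomy; the rule-based stand-in below, with invented category names and keywords, only sketches the shape of the classify-and-tag loop.

```python
# Toy stand-in for the LLM categorization step: map raw cancellation text to
# one of a small set of reason categories. The real pipeline sends the full
# email/transcript to Gemini with a 16-category taxonomy; these category
# names and keywords are invented for illustration.

CATEGORY_KEYWORDS = {
    "price_sensitivity": ["too expensive", "cheaper", "rate increase"],
    "moved_away": ["moving", "relocating", "sold the house"],
    "service_issue": ["billing error", "no response", "bad support"],
}

def categorize(transcript: str, default: str = "other") -> str:
    """Assign the first category whose keywords appear in the transcript."""
    text = transcript.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in text for kw in keywords):
            return category
    return default

print(categorize("We found a cheaper supplier after the last rate increase."))
# price_sensitivity
```

The LLM version replaces the keyword table with a prompt, but the surrounding loop, read each interaction, assign one category, write it back for retention analysis, stays the same.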

Summary In this episode of the Data Engineering Podcast Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the challenges of working with CSVs, including inconsistent type representation, lack of schema information, and technical complexities, and explains how OneSchema addresses these issues using multiple CSV parsers and AI for data type inference and validation. Andrew highlights the business case for OneSchema, emphasizing efficiency gains for companies dealing with large volumes of CSV data, and shares plans to expand support for other data formats and integrate AI-driven transformation packs for specific industries.
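One of the CSV headaches Andrew describes, inferring column types when the format carries no schema, can be sketched in a few lines. This is a toy version of the inference step, not OneSchema's actual parser logic:

```python
import csv
import io

def _is_int(v):
    try:
        int(v)
        return True
    except ValueError:
        return False

def _is_float(v):
    try:
        float(v)
        return True
    except ValueError:
        return False

def infer_column_types(csv_text, sample_rows=100):
    """Guess a type per column from a sample, narrowest first (int < float < str).

    CSVs carry no schema, so importers must sniff types from the data itself.
    Empty cells are skipped rather than forcing a column to "str".
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in row.items():
            if value not in ("", None):
                columns[name].append(value)
    types = {}
    for name, values in columns.items():
        if values and all(_is_int(v) for v in values):
            types[name] = "int"
        elif values and all(_is_float(v) for v in values):
            types[name] = "float"
        else:
            types[name] = "str"
    return types

sample = "id,amount,note\n1,9.99,ok\n2,12.50,\n3,3,late payment\n"
print(infer_column_types(sample))
# {'id': 'int', 'amount': 'float', 'note': 'str'}
```

Real importers layer dates, booleans, currency formats, and locale quirks on top of this, which is where the "multiple parsers plus AI" approach mentioned in the episode comes in.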

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Andrew Luo about how OneSchema addresses the headaches of dealing with CSV data for your business.

Interview

Introduction
How did you get involved in the area of data management?
Despite the years of evolution and improvement in data storage and interchange formats, CSVs are just as prevalent as ever. What are your opinions/theories on why they are so ubiquitous?
What are some of the major sources of CSV data for teams that rely on them for business and analytical processes?
The most obvious challenge with CSVs is their lack of type information, but they are notorious for having numerous other problems. What are some of the other major challenges involved with using CSVs for data interchange/ingestion?
Can you describe what you are building at OneSchema and the story behind it?
What are the core problems that you are solving, and for whom?
Can you describe how you have architected your platform to be able to manage the variety, volume, and multi-tenancy of data that you process?
How have the design and goals of the product changed since you first started working on it?
What are some of the major performance issues that you have encountered while dealing with CSV data at scale?
What are some of the most surprising things that you have learned about CSVs in the process of building OneSchema?
What are the most interesting, innovative, or unexpected ways that you have seen OneSchema used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on OneSchema?
When is OneSchema the wrong choice?
What do you have planned for the future of OneSchema?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

OneSchema
EDI == Electronic Data Interchange
UTF-8 BOM (Byte Order Mark) Characters
SOAP
CSV RFC
Iceberg
SSIS == SQL Server Integration Services
MS Access
Datafusion
JSON Schema
SFTP == Secure File Transfer Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Natural Language and Search

When you look at operational analytics and business data analysis activities—such as log analytics, real-time application monitoring, website search, observability, and more—effective search functionality is key to identifying issues, improving customer experience, and increasing operational effectiveness. How can you support your business needs by leveraging ML-driven advancements in search relevance? In this report, authors Jon Handler, Milind Shyani, and Karen Kilroy help executives and data scientists explore how ML can enable ecommerce firms to generate more pertinent search results to drive better sales. You'll learn how personalized search helps you quickly find relevant data within applications, websites, and data lake catalogs. You'll also discover how to locate the content available in CRM systems and document stores. This report helps you:

Address the challenges of traditional document search, including data preparation and ingestion
Leverage ML techniques to improve search outcomes and the relevance of documents you retrieve
Discover what makes a good search solution that's reliable, scalable, and can drive your business forward
Learn how to choose a search solution to improve your decision-making process

With advancements in ML-driven search, businesses can realize even more benefits and improvements in their data and document search capabilities to better support their own business needs and the needs of their customers. About the authors: Jon Handler is a senior principal solutions architect at Amazon Web Services. Milind Shyani is an applied scientist at Amazon Web Services working on large language models, information retrieval, and machine learning algorithms. Karen Kilroy, CEO of Kilroy Blockchain, is a lifelong technologist, full stack software engineer, speaker, and author living in Northwest Arkansas.
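As a baseline for the ML-driven relevance techniques the report covers, classic lexical search scores documents by term overlap. A minimal bag-of-words cosine ranker (the toy corpus and document IDs are invented) looks like this:

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Invented mini-corpus standing in for CRM articles / document-store content.
docs = {
    "kb-101": "reset your password from the account settings page",
    "kb-202": "invoice and billing history in the crm account view",
}
query = tf_vector("billing invoice history")
ranked = sorted(docs, key=lambda d: cosine(query, tf_vector(docs[d])), reverse=True)
print(ranked)  # ['kb-202', 'kb-101']
```

ML-driven search (semantic embeddings, learned rankers, personalization) improves on exactly this kind of exact-term matching, which fails whenever the user's words differ from the document's.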

podcast_episode
by Michael Albert (UVA's Darden School), Eric Siegel (Machine Learning Week; Columbia University), Marc Ruggiano (University of Virginia's Collaboratory for Applied Data Science in Business)

In his new book, The AI Playbook: Mastering the Rare Art of Machine Learning Deployment, Eric Siegel offers a detailed playbook for how business professionals can launch machine learning projects, providing both success stories where private industry got it right as well as cautionary tales others can learn from.

Siegel laid out the key findings of his book in our latest episode during a wide-ranging conversation with Marc Ruggiano, director of the University of Virginia’s Collaboratory for Applied Data Science in Business, and Michael Albert, an assistant professor of business administration at UVA's Darden School. The discussion, featuring three experts in business analytics, takes an in-depth look at the intersection of artificial intelligence, machine learning, business, and leadership.

http://www.bizML.com

https://www.darden.virginia.edu/faculty-research/centers-initiatives/data-analytics/bodily-professor

https://pubsonline.informs.org/do/10.1287/LYTX.2023.03.10/full/

https://www.kdnuggets.com/survey-machine-learning-projects-still-routinely-fail-to-deploy

CRISP-DM: https://en.wikipedia.org/wiki/Cross-industry_standard_process_for_data_mining

CRM: https://en.wikipedia.org/wiki/Customer_relationship_management

Send us a text Welcome back to the second installment of our captivating podcast series, where we continue our deep dive into the realm of digital transformation. Join us in another engaging conversation with Bob McDonald, the driving force behind digital transformation and CRM Experience at IBM. In this episode, we delve even further into Bob's invaluable insights on reinventing enterprises through digital transformation. Buckle up as we explore a plethora of intriguing topics that hold the key to successfully navigating this transformative journey. From deciphering intricate workflows to establishing effective management systems, Bob shares his practical wisdom gained through years of experience. Discover the pivotal role of design principles in shaping effective digital transformations and learn how to avoid common pitfalls like pocket vetoes that can hinder progress. Defining success in the context of digital transformation is a crucial aspect, and Bob elaborates on his unique perspective in just a few minutes. Tune in as he imparts his lessons learned, offering invaluable advice that can guide both beginners and veterans in the field. Stay with us as Bob takes us on an intriguing journey, touching upon OMG moments, the importance of role models, and a book recommendation that has shaped his outlook on digital transformation.

00:32 Workflows
02:25 Management System
05:13 Design Principles
06:50 Avoiding pocket vetoes
08:20 Defining success
11:58 Lessons learned
13:47 OMG
15:05 Role models
16:18 Book recommendation

Connect with Bob McDonald on LinkedIn, and if you're eager to be a guest on the Making Data Simple Podcast, reach out to us at [email protected] and tell us why you should be next. Join our host Al Martin, WW VP Technical Sales, IBM, as we continue to unravel the world of trending technologies, business innovation, and leadership - all while making it both informative and enjoyable.

Send us a text

Welcome to Part 1 of an insightful podcast series where we delve into the dynamic world of digital transformation. Join us as we engage in a thought-provoking conversation with Bob McDonald, CRM Experience at IBM, who doesn't just discuss digital transformation – he's living it. In this episode, we unveil Bob's unique perspective on reinventing the enterprise. Starting with the core concept, we explore what digital transformation truly means. With Bob's wealth of experience, we journey through the challenges of changing organizational culture and the crucial role data plays in the process. Join us as we dig deep into Bob's philosophy, learn from his practical insights, and explore the transformative power of digital transformation. If you're intrigued by the evolving landscape of business innovation and technology, this episode is a must-listen.

03:12 Putting it out there
05:44 Bob McDonald intro
10:22 What IS digital transformation
22:00 Changing culture
29:23 Solving the data problem

Connect with Bob McDonald on LinkedIn, and if you're interested in being a guest on Making Data Simple, reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast, hosted by Al Martin, WW VP Technical Sales, IBM, is your destination for navigating trending technologies, business innovation, and leadership - all while keeping things simple and fun.

Building Apps on the Lakehouse with Databricks SQL

BI applications are undoubtedly one of the major consumers of a data warehouse. Nevertheless, the prospect of accessing data using standard SQL is appealing to many more stakeholders than just the data analysts. We’ve heard from customers that they experience an increasing demand to provide access to data in their lakehouse platforms from external applications beyond BI, such as e-commerce platforms, CRM systems, SaaS applications, or custom data applications developed in-house. These applications require an “always on” experience, which makes Databricks SQL Serverless a great fit.

In this session, we give an overview of the approaches available to application developers to connect to Databricks SQL and create modern data applications tailored to the needs of users across an entire organization. We discuss when to choose one of the Databricks native client libraries for languages such as Python, Go, or Node.js and when to use the SQL Statement Execution API, the newest addition to the toolset. We also explain when ODBC and JDBC might not be the best for the task and when they are your best friends. Live demos are included.

Talk by: Adriana Ispas and Chris Stevens

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Transitioning to Microsoft Power Platform: An Excel User Guide to Building Integrated Cloud Applications in Power BI, Power Apps, and Power Automate

Welcome to this step-by-step guide for Excel users, data analysts, and finance specialists. It is designed to take you through practical report and development scenarios, including both the approach and the technical challenges. This book will equip you with an understanding of the overall Power Platform use case for addressing common business challenges. While Power BI continues to be an excellent tool of choice in the BI space, Power Platform is the real game changer. Using an integrated architecture, a small team of citizen developers can build solutions for all kinds of business problems. For small businesses, Power Platform can be used to build bespoke CRM, Finance, and Warehouse management tools. For large businesses, it can be used to build an integration point for existing systems to simplify reporting, operation, and approval processes. The author has drawn on his 15 years of hands-on analytics experience to help you pivot from the traditional Excel-based reporting environment. By using different business scenarios, this book provides you with clear reasons why a skill is important before you start to dive into the scenarios. You will use a fast prototyping approach to continue to build exciting reporting, automation, and application solutions and improve them while you acquire new skill sets. The book helps you get started quickly with Power BI. It covers data visualization, collaboration, and governance practices. You will learn about the most practical SQL challenges. And you will learn how to build applications in Power Apps and Power Automate. The book ends with an integrated solution framework that can be adapted to solve a wide range of complex business problems.
What You Will Learn

Develop reporting solutions and business applications
Understand the Power Platform licensing and development environment
Apply data ETL and modeling in Power BI
Use data storytelling and dashboard design to better visualize data
Carry out data operations with SQL and SharePoint lists
Develop useful applications using Power Apps
Develop automated workflows using Power Automate
Integrate solutions with Power BI, Power Apps, and Power Automate to build enterprise solutions

Who This Book Is For

Next-generation data specialists, including Excel-based users who want to learn Power BI and build internal apps; finance specialists who want to take a different approach to traditional accounting reports; and anyone who wants to enhance their skill set for the future job market.

Why you should not do lead scoring in your marketing automation tools

As your business and number of product lines grow, the out-of-the-box lead scoring in CRM tools starts becoming difficult to work with, and lead scoring becomes all the more important for sales teams. Join Ben Lewinsky as he shows how Culture Amp approaches multi-product lead scoring in their data warehouse using dbt.
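The warehouse-side approach can be sketched as a small scoring function. The signals, weights, and product names below are illustrative assumptions, not Culture Amp's actual model (which lives in dbt SQL):

```python
# Toy version of multi-product lead scoring computed outside the CRM.
# In practice this logic is a dbt SQL model over CRM and product-usage
# tables; the products, signals, and weights here are invented.

SIGNAL_WEIGHTS = {
    "engage": {"demo_requested": 40, "pricing_page_views": 5, "active_seats": 2},
    "perform": {"demo_requested": 40, "review_cycles_run": 10},
}

def score_lead(signals, product):
    """Weighted sum of a lead's signals for one product line."""
    weights = SIGNAL_WEIGHTS[product]
    return sum(weights.get(name, 0) * value for name, value in signals.items())

lead = {"demo_requested": 1, "pricing_page_views": 3, "active_seats": 10}
print(score_lead(lead, "engage"))   # 75
print(score_lead(lead, "perform"))  # 40
```

The point of moving this into the warehouse is that one lead gets a separate, transparent score per product line, something flat CRM lead scoring struggles with as product lines multiply.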

Check the slides here: https://docs.google.com/presentation/d/1NOyZLs1QUf6HQqF6jusx32OjUb-Gi-PTnmiDQ8EFKM8/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Implementing Order to Cash Process in SAP

Immerse yourself in the pivotal Order to Cash (OTC) process in SAP with this comprehensive guide! By leveraging the functionalities of SAP CRM, SAP APO, SAP TMS, and SAP LES, integrated with SAP ECC, this book provides a detailed walkthrough to enhance your business operations and system understanding.

What this Book will help me do

Understand master data management across different SAP modules to ensure integrated operations.
Explore and implement the key functions of sales processes and customer relationship management in SAP CRM.
Master the concepts of order fulfillment, including ATP checks, leveraging SAP APO.
Dive deep into transportation planning and freight management processes using SAP TMS.
Gain insights into logistics execution and customer invoicing using SAP ECC.

Author(s)

Agarwal is an experienced SAP consultant specializing in enterprise integration and process optimization. With an extensive background in SAP modules such as CRM, APO, TMS, and LES, Agarwal brings real-world experience into this work. Passionate about helping others leverage SAP software to its fullest, Agarwal writes accessible and actionable guides.

Who is it for?

This book is tailored for SAP consultants, solution architects, and managers tasked with process optimization in SAP environments. If you're seeking to integrate SAP CRM, TMS, or APO modules effectively into your operations, this book has been designed for you. Readers are expected to have a foundational understanding of SAP ECC and its core principles. Ideal for individuals aiming to enhance their enterprise's OTC processes.

Summary The data warehouse has become the central component of the modern data stack. Building on this pattern, the team at Hightouch have created a platform that synchronizes information about your customers out to third party systems for use by marketing and sales teams. In this episode Tejas Manohar explains the benefits of sourcing customer data from one location for all of your organization to use, the technical challenges of synchronizing the data to external systems with varying APIs, and the workflow for enabling self-service access to your customer data by your marketing teams. This is an interesting conversation about the importance of the data warehouse and how it can be used beyond just internal analytics.
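The core synchronization step Tejas describes, pushing only what changed in the warehouse out to external tools, can be sketched as a snapshot diff. Field names and the batch size are illustrative, not Hightouch's implementation:

```python
# Sketch of the core reverse-ETL step: diff the warehouse's current customer
# snapshot against what was last pushed, then batch only the changed records
# for the downstream CRM API. Field names and batch size are invented.

def diff_snapshot(current, last_synced, key="email"):
    """Return records that are new or changed since the previous sync."""
    previous = {row[key]: row for row in last_synced}
    return [row for row in current if previous.get(row[key]) != row]

def batches(rows, size=100):
    """Chunk changed rows so pushes respect downstream API rate limits."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

current = [
    {"email": "a@x.com", "plan": "pro"},
    {"email": "b@x.com", "plan": "free"},
    {"email": "c@x.com", "plan": "pro"},
]
last = [
    {"email": "a@x.com", "plan": "free"},
    {"email": "b@x.com", "plan": "free"},
]
changed = diff_snapshot(current, last)
print(changed)
# [{'email': 'a@x.com', 'plan': 'pro'}, {'email': 'c@x.com', 'plan': 'pro'}]
```

The hard part in production, per the episode, is everything around this diff: each destination (CRM, marketing, support) has its own API shapes, rate limits, and failure modes.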

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. This episode of Data Engineering Podcast is sponsored by Datadog, a unified monitoring and analytics platform built for developers, IT operations teams, and businesses in the cloud age. 
Datadog provides customizable dashboards, log management, and machine-learning-based alerts in one fully-integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context. Monitor all your databases, cloud services, containers, and serverless functions in one place with Datadog’s 400+ vendor-backed integrations. If an outage occurs, Datadog provides seamless navigation between your logs, infrastructure metrics, and application traces in just a few clicks to minimize downtime. Try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent. Go to dataengineeringpodcast.com/datadog today to see how you can enhance visibility into your stack with Datadog. Your host is Tobias Macey and today I’m interviewing Tejas Manohar about Hightouch, a data platform that helps you sync your customer data from your data warehouse to your CRM, marketing, and support tools

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what you are building at Hightouch and your motivation for creating it?
What are the main points of friction for teams who are trying to make use of customer data?
Where is Hightouch positioned in the ecosystem of customer data tools such as Segment, Mixpanel

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Moe Kiss (Canva), Michael Helbling (Search Discovery), David Raab (CDP Institute)

It sometimes seems like there must be a Moore's Law of marketing technology (or "martech," as the cool kids call it, and our site is on a .io domain, so we're definitely the cool kids) whereby the number of platforms available doubles every 6 to 8 weeks. And, every couple of months, it seems, a whole new category emerges. From CMS to DAM to CRM to TMS to DMP to DSP to CDP, it's an alphabet soup of TLAs that no one can make sense of PDQ! On this episode, Michael, Moe, and Tim sat down with the man who coined the name for one of those categories back in 2013: David Raab, the founder of the CDP Institute! It was a lively chat about the messy world of vendor overload and how to frame, assess, and successfully manage martech stacks. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Summary Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that simplify the separation of compute and storage. In this episode Kent Graziano, chief technical evangelist for SnowflakeDB, explains how it is differentiated from other managed platforms and traditional data warehouse engines, the features that allow you to scale your usage dynamically, and how it allows for a shift in your workflow from ETL to ELT. If you are evaluating your options for building or migrating a data platform, then this is definitely worth a listen.
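The ETL-to-ELT shift mentioned here means loading raw data first and then running transformations as SQL inside the warehouse. A minimal sketch, with SQLite standing in for Snowflake and invented table names:

```python
# ELT in miniature: extract and load raw rows first ("E" + "L"), then run the
# transform as SQL inside the warehouse ("T"). SQLite stands in for Snowflake
# here, and the table/column names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "emea", 120.0), (2, "amer", 80.0), (3, "emea", 40.0)],
)

# The transformation is pushed down to the SQL engine instead of being done
# in a separate ETL tool before loading.
conn.execute("""
    CREATE TABLE region_revenue AS
    SELECT region, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY region
""")
rows = conn.execute(
    "SELECT region, revenue FROM region_revenue ORDER BY region"
).fetchall()
print(rows)  # [('amer', 80.0), ('emea', 160.0)]
```

The appeal in Snowflake's case, as the episode discusses, is that separated, elastically scaled compute makes it cheap to run those transformations after loading rather than in a pipeline bottleneck before it.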

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media and the Python Software Foundation. Upcoming events include the Software Architecture Conference in NYC and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey and today I’m interviewing Kent Graziano about SnowflakeDB, the cloud-native data warehouse

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what SnowflakeDB is for anyone who isn’t familiar with it?

How does it compare to the other available platforms for data warehousing? How does it differ from traditional data warehouses?

How does the performance and flexibility affect the data modeling requirements?

Snowflake is one of the data stores that is enabling the shift from an ETL to an ELT workflow. What are the features that allow for that approach and what are some of the challenges that it introduces? Can you describe how the platform is architected and some of the ways that it has evolved as it has grown in popularity?

What are some of the current limitations that you are struggling with?

For someone getting started with Snowflake what is involved with loading data into the platform?

What is their workflow for allocating and scaling compute capacity and running analyses?

One of the interesting features enabled by your architecture is data sharing. What are some of the most interesting or unexpected uses of that capability that you have seen?
What are some other features or use cases for Snowflake that are not as well known or publicized which you think users should know about?
When is SnowflakeDB the wrong choice?
What are some of the plans for the future of SnowflakeDB?

Contact Info

LinkedIn
Website
@KentGraziano on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

SnowflakeDB

Free Trial
Stack Overflow

Data Warehouse
Oracle DB
MPP == Massively Parallel Processing
Shared Nothing Architecture
Multi-Cluster Shared Data Architecture
Google BigQuery
AWS Redshift
AWS Redshift Spectrum
Presto

Podcast Episode

SnowflakeDB Semi-Structured Data Types
Hive
ACID == Atomicity, Consistency, Isolation, Durability
3rd Normal Form
Data Vault Modeling
Dimensional Modeling
JSON
AVRO
Parquet
SnowflakeDB Virtual Warehouses
CRM == Customer Relationship Management
Master Data Management

Podcast Episode

FoundationDB

Podcast Episode

Apache Spark

Podcast Episode

SSIS == SQL Server Integration Services
Talend
Informatica
Fivetran

Podcast Episode

Matillion
Apache Kafka
Snowpipe
Snowflake Data Exchange
OLTP == Online Transaction Processing
GeoJSON
Snowflake Documentation
SnowAlert
Splunk
Data Catalog

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary

Every business with a website needs some way to keep track of how much traffic they are getting, where it is coming from, and which actions are being taken. The default in most cases is Google Analytics, but this can be limiting when you wish to perform detailed analysis of the captured data. To address this problem, Alex Dean co-founded Snowplow Analytics to build an open source platform that gives you total control of your website traffic data. In this episode he explains how the project and company got started, how the platform is architected, and how you can start using it today to get a clearer view of how your customers are interacting with your web and mobile applications.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
You work hard to make sure that your data is reliable and accurate, but can you say the same about the deployment of your machine learning models? The Skafos platform from Metis Machine was built to give your data scientists the end-to-end support that they need throughout the machine learning lifecycle. Skafos maximizes interoperability with your existing tools and platforms, and offers real-time insights and the ability to be up and running with cloud-based production scale infrastructure instantaneously. Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
This is your host Tobias Macey and today I’m interviewing Alexander Dean about Snowplow Analytics

Interview

Introductions
How did you get involved in the area of data engineering and data management?
What is Snowplow Analytics and what problem were you trying to solve when you started the company?
What is unique about customer event data from an ingestion and processing perspective?
Challenges with properly matching up data between sources
Data collection is one of the more difficult aspects of an analytics pipeline because of the potential for inconsistency or incorrect information. How is the collection portion of the Snowplow stack designed and how do you validate the correctness of the data?

Cleanliness/accuracy

What kinds of metrics should be tracked in an ingestion pipeline and how do you monitor them to ensure that everything is operating properly?
Can you describe the overall architecture of the ingest pipeline that Snowplow provides?

How has that architecture evolved from when you first started?
What would you do differently if you were to start over today?

Ensuring appropriate use of enrichment sources
What have been some of the biggest challenges encountered while building and evolving Snowplow?
What are some of the most interesting uses of your platform that you are aware of?

Keep In Touch

Alex

@alexcrdean on Twitter
LinkedIn

Snowplow

@snowplowdata on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Snowplow

GitHub

Deloitte Consulting
OpenX
Hadoop
AWS EMR (Elastic Map-Reduce)
Business Intelligence
Data Warehousing
Google Analytics
CRM (Customer Relationship Management)
S3
GDPR (General Data Protection Regulation)
Kinesis
Kafka
Google Cloud Pub-Sub
JSON-Schema
Iglu
IAB Bots And Spiders List
Heap Analytics

Podcast Interview

Redshift
SnowflakeDB
Snowplow Insights
Googl

Power BI Data Analysis and Visualization

Power BI Data Analysis and Visualization provides a roadmap to vendor choices and highlights why Microsoft’s Power BI is a very viable, cost-effective option for data visualization. The book covers the fundamentals and most commonly used features of Power BI, but also includes an in-depth discussion of advanced Power BI features such as natural language queries, embedded Power BI dashboards, and live streaming data. It presents real solutions for extracting data from the ERP application Microsoft Dynamics CRM, and also shows how to host a Power BI dashboard as an Azure application while pulling data from popular sources like Microsoft SQL Server and the open-source PostgreSQL. Authored by Microsoft experts, this book uses real-world coding samples and screenshots to show how to create reports, embed them in a webpage, view them across multiple platforms, and more. Business owners, IT professionals, data scientists, and analysts will benefit from this thorough presentation of Power BI and its functions.