SQL

Datapreneurs - How Today's Business Leaders Are Using Data To Define The Future

2023-07-17 · Data Engineering Podcast Listen

podcast_episode

by Bob Muglia (Snowflake; Microsoft) , Tobias Macey

AI/ML Data Engineering Data Management Databricks Fivetran Looker Modern Data Stack Microsoft Fabric Pinecone Python Redshift +3 more

Summary

Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Bob Muglia about his recent book about the idea of "Datapreneurs" and the role of data in the modern economy.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what your concept of a "Datapreneur" is?

How is this distinct from the common idea of an entrepreneur?

What do you see as the key inflection points in data technologies and their impacts on business capabilities over the past ~30 years?

In your role as the CEO of Snowflake you had a front-row seat for the rise of the "modern data stack". What do you see as the main positive and negative impacts of that paradigm?

What are the key issues that are yet to be solved in that ecosystem?

For technologists who are thinking about launching new ventures, what are the key pieces of advice that you would like to share?

What do you see as the short/medium/long-term impact of AI on the technical, business, and societal arenas?

What are the most interesting, innovative, or unexpected ways that you have seen business leaders use data to drive their vision?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on the Datapreneurs book?

What are your key predictions for the future impact of data on the technical/economic/business landscapes?

Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Datapreneurs Book
SQL Server
Snowflake
Z80 Processor
Navigational Database
System R
Redshift
Microsoft Fabric
Databricks
Looker
Fivetran (Podcast Episode)
Databricks Unity Catalog
RelationalAI
6th Normal Form
Pinecone Vector DB (Podcast Episode)
Perplexity AI

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: RudderStack

Reduce Friction In Your Business Analytics Through Entity Centric Data Modeling

2023-07-09 · Data Engineering Podcast Listen

podcast_episode

by Maxime Beauchemin (Preset) , Tobias Macey

Activity Schema AI/ML Airflow Analytics Data Engineering Data Management Data Modelling dbt ETL/ELT GitHub Informatica dimensional modeling +3 more

Summary

For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin of Airflow and Superset fame shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Max Beauchemin about the concept of entity-centric data modeling for analytical use cases.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what entity-centric modeling (ECM) is and the story behind it?

How does it compare to dimensional modeling strategies?

What are some of the other competing methods?

Comparison to activity schema

What impact does this have on ML teams? (e.g. feature engineering)

What role does the tooling of a team have in the ways that they end up thinking about modeling? (e.g. dbt vs. informatica vs. ETL scripts, etc.)

What is the impact of the underlying compute engine on the modeling strategies used?

What are some examples of data sources or problem domains for which this approach is well suited?

What are some cases where entity-centric modeling techniques might be counterproductive?

What are the ways that the benefits of ECM manifest in use cases that are down-stream from the warehouse?

What are some concrete tactical steps that teams should be thinking about to implement a workable domain model using entity-centric principles?

How does this work across business domains within a given organization (especially at "enterprise" scale)?

What are the most interesting, innovative, or unexpected ways that you have seen ECM used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on ECM?

When is ECM the wrong choice?

What are your predictions for the future direction/adoption of ECM or other modeling techniques?

Contact Info

mistercrunch on GitHub
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Entity Centric Modeling Blog Post

Max's Previous Appearances

Defining Data Engineering with Maxime Beauchemin
Self Service Data Exploration And Dashboarding With Superset
Exploring The Evolving Role Of Data Engineers
Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

Apache Airflow
Apache Superset
Preset
Ubisoft
Ralph Kimball
The Rise Of The Data Engineer
The Downfall Of The Data Engineer
The Rise Of The Data Scientist
Dimensional Data Modeling
Star Schema
Databas

How Data Engineering Teams Power Machine Learning With Feature Platforms

2023-07-03 · Data Engineering Podcast Listen

podcast_episode

by Razi Raziuddin , Tobias Macey

AI/ML Data Engineering Data Management Python SaaS

Summary

Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Razi Raziuddin about how data engineers can empower data scientists to develop and deploy better ML models through feature engineering.

Interview

Introduction

How did you get involved in the area of data management?

What is feature engineering and why/to whom does it matter?

A topic that commonly comes up in relation to feature engineering is the importance of a feature store. What are the tradeoffs for that to be a separate infrastructure/architecture component?

What is the overall lifecycle of a feature, from definition to deployment and maintenance?

How is this distinct from other forms of data pipeline development and delivery?

Who are the participants in that workflow?

What are the sharp edges/roadblocks that typically manifest in that lifecycle?

What are the interfaces that are needed for data scientists/ML engineers to be able to self-serve their feature management?

What is the role of the data engineer in supporting those interfaces?

What are the communication/collaboration channels that are necessary to make the overall process a success?

From an implementation/architecture perspective, what are the patterns that you have seen teams build around for feature development/serving?

What are the most interesting, innovative, or unexpected ways that you have seen feature platforms used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature engineering?

What are the resources that you find most helpful in understanding and designing feature platforms?

Contact Info

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

FeatureByte DataRobot Feature Store Feast Feature Store Feathr Kaggle Yann LeCun

A Single Pane of Glass on Airflow using Astro Python SDK, Snowflake, dbt, and Cosmos

2023-07-01 · Airflow Summit 2023

session

by Luan Moreno Medeiros Maciel (Pythian)

Airflow Astronomer Cosmos dbt DWH ETL/ELT Python Snowflake

ETL data pipelines are the bread and butter of data teams that must design, develop, and author DAGs to accommodate the various business requirements. dbt is becoming one of the most used tools to perform SQL transformations on the Data Warehouse, allowing teams to harness the power of queries at scale. Airflow users are constantly finding new ways to integrate dbt with the Airflow ecosystem and build a single pane of glass where Data Engineers can manage and administer their pipelines. Astronomer Cosmos, an open-source product, has been introduced to integrate Airflow with dbt Core seamlessly. Now you can easily see your dbt pipelines fully integrated on Airflow. You will learn the following:

- How to integrate dbt Core with Airflow
- How to use Cosmos
- How to build data pipelines at scale

Using Dynamic Task Mapping to Orchestrate dbt

2023-07-01 · Airflow Summit 2023

session

by Pádraic Slattery (Xebia Data)

Airflow Analytics dbt

Airflow, traditionally used by Data Engineers, is now popular among Analytics Engineers who aim to provide analysts with high-quality tooling while adhering to software engineering best practices. dbt, an open-source project that uses SQL to create data transformation pipelines, is one such tool. One approach to orchestrating dbt using Airflow is using dynamic task mapping to automatically create a task for each sub-directory inside dbt’s staging, intermediate, and marts directories. This enables analysts to write SQL code that is automatically added as a dedicated task in Airflow at runtime. Combining this new Airflow feature with dbt best practices offers several benefits, such as analysts not needing to make Airflow changes and engineers being able to re-run subsets of dbt models should errors occur. In this talk, I would like to share some lessons I have learned while successfully implementing this approach for several clients.
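The directory-to-task idea described in this abstract can be sketched in plain Python. This is not the speaker's code: it is a minimal illustration that assumes a dbt project whose `models/` folder contains the `staging`, `intermediate`, and `marts` layers named in the talk, and the sub-directory names used in the demo (`stripe`, `shopify`, `finance`) are hypothetical. The function only builds one `dbt run` command per sub-directory, using dbt's `path:` selector. In an Airflow 2.3+ DAG, a list like this could then be handed to a mapped operator, for example `BashOperator.partial(task_id="dbt_run").expand(bash_command=commands)`, so that each sub-directory becomes its own task at runtime.

```python
import tempfile
from pathlib import Path
from typing import List

# Layer names follow the staging / intermediate / marts layout named in the talk.
DBT_LAYERS = ("staging", "intermediate", "marts")


def dbt_commands(models_dir: Path) -> List[str]:
    """Return one `dbt run` command per sub-directory of each dbt layer."""
    commands = []
    for layer in DBT_LAYERS:
        layer_dir = models_dir / layer
        if not layer_dir.is_dir():
            continue  # a project may not use every layer
        for sub in sorted(p for p in layer_dir.iterdir() if p.is_dir()):
            # `path:` selects all models that live under the given directory.
            commands.append(f"dbt run --select path:models/{layer}/{sub.name}")
    return commands


if __name__ == "__main__":
    # Build a throwaway project layout (hypothetical names) and list the commands.
    with tempfile.TemporaryDirectory() as tmp:
        models = Path(tmp) / "models"
        for rel in ("staging/stripe", "staging/shopify", "marts/finance"):
            (models / rel).mkdir(parents=True)
        for command in dbt_commands(models):
            print(command)
```

Because the list is computed from the file system, an analyst who adds a new sub-directory gets a new task without touching the DAG definition, and a failed subset can be re-run on its own.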

Data Engineering with dbt

2023-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Roberto Zagni

Analytics Cloud Computing Data Engineering dbt ETL/ELT Snowflake data data-engineering

Data Engineering with dbt provides a comprehensive guide to building modern, reliable data platforms using dbt and SQL. You'll gain hands-on experience building automated ELT pipelines, using dbt Cloud with Snowflake, and embracing patterns for scalable and maintainable data solutions.

What this Book will help me do

- Set up and manage a dbt Cloud environment and create reliable ELT pipelines.
- Integrate Snowflake with dbt to implement robust data engineering workflows.
- Transform raw data into analytics-ready data using dbt's features and SQL.
- Apply advanced dbt functionality such as macros and Jinja for efficient coding.
- Ensure data accuracy and platform reliability with built-in testing and monitoring.

Author(s)

Roberto Zagni is a seasoned data engineering professional with a wealth of experience in designing scalable data platforms. Through practical insights and real-world applications, Zagni demystifies complex data engineering practices. Their approachable teaching style makes technical concepts accessible and actionable.

Who is it for?

This book is perfect for data engineers, analysts, and analytics engineers looking to leverage dbt for data platform development. If you're a manager or decision maker interested in fostering efficient data workflows or a professional with basic SQL knowledge aiming to deepen your expertise, this resource will be invaluable.

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

2023-06-25 · Data Engineering Podcast Listen

podcast_episode

by Toby Mao (SQLMesh) , Tobias Macey

AI/ML Airflow CDP Data Engineering Data Lake Data Management DataOps dbt GitHub ORC Pandas Python +5 more

Summary

Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects. In this episode Toby Mao explains how it works, the importance of automatic column-level lineage tracking, and how you can start using it today.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Toby Mao about SQLMesh, an open source DataOps framework designed to scale data transformations with ease of collaboration and validation built in.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what SQLMesh is and the story behind it?

DataOps is a term that has been co-opted and overloaded. What are the concepts that you are trying to convey with that term in the context of SQLMesh?

What are the rough edges in existing toolchains/workflows that you are trying to address with SQLMesh?

How do those rough edges impact the productivity and effectiveness of teams using those toolchains?

Can you describe how SQLMesh is implemented?

How have the design and goals evolved since you first started working on it?

What are the lessons that you have learned from dbt which have informed the design and functionality of SQLMesh?

For teams who have already invested in dbt, what is the migration path from or integration with dbt?

You have some built-in integration with/awareness of orchestrators (currently Airflow). What are the benefits of making the transformation tool aware of the orchestrator?

What do you see as the potential benefits of integration with e.g. data-diff?

What are the second-order benefits of using a tool such as SQLMesh that addresses the more mechanical aspects of managing transformation workflows and the associated dependency chains?

What are the most interesting, innovative, or unexpected ways that you have seen SQLMesh used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on SQLMesh?

When is SQLMesh the wrong choice?

What do you have planned for the future of SQLMesh?

Contact Info

tobymao on GitHub
@captaintobs on Twitter
Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

SQLMesh
Tobiko Data
SAS
AirBnB Minerva
SQLGlot
Cron
AST == Abstract Syntax Tree
Pandas
Terraform
dbt (Podcast Episode)
SQLFluff (Podcast.init Episode)

From MLOps to DataOps - Santona Tuli

2023-06-23 · DataTalks.Club Listen

podcast_episode

by Santona Tuli (Upsolver)

AI/ML Data Lakehouse DataOps dbt GitHub HTML Modern Data Stack MLOps

We talked about:

- Santona's background
- Focusing on data workflows
- Upsolver vs DBT
- ML pipelines vs Data pipelines
- MLOps vs DataOps
- Tools used for data pipelines and ML pipelines
- The “modern data stack” and today's data ecosystem
- Staging the data and the concept of a “lakehouse”
- Transforming the data after staging
- What happens after the modeling phase
- Human-centric vs Machine-centric pipeline
- Applying skills learned in academia to ML engineering
- Crafting user personas based on real stories
- A framework of curiosity
- Santona's book and resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/santona-tuli/

Upsolver website: upsolver.com

Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Hands-on Workshop and Demo

2023-06-22 · ML Meetup (In-person): Scaling Real-time Data Processing

workshop

ai-assisted sql chatgpt tidb tidb cloud

Bring your laptop if you are eager to take part in the hands-on workshop. In this workshop session, you will learn:

- Introduction to the TiDB managed Cloud platform
- Deploy a serverless TiDB Cluster in the Cloud
- Explore TiDB Cloud console functions & features
- Use the SQL Smart Optimizer to deliver fast query results
- Leverage AI to generate SQL queries with ChatGPT

Oracle PL/SQL by Example, 6th Edition

2023-06-09 · O'Reilly SQL Books O'Reilly Amazon

book

by Benjamin Rosenzweig , Elena Rakhimov

Cloud Computing Oracle pl/sql

Using PL/SQL for Oracle Database 21c, you can build solutions that deliver unprecedented performance and efficiency in any environment, including the cloud. Oracle PL/SQL by Example, Sixth Edition, teaches all the PL/SQL skills you'll need, through real-world labs and extensive examples. Now fully updated for the newest version of PL/SQL 21c, it covers everything from basic syntax and program control through the latest optimization and tuning enhancements. Step by step, you'll walk through every key task, mastering today's most valuable Oracle 21c PL/SQL programming techniques on your own. Start by downloading the supporting schema and exercises from informit.com/title/9780138062835. Once you've done an exercise, the author doesn't just present the answer: She offers an in-depth discussion introducing deeper insights and modern best practices. This book's approach fully reflects the author's award-winning experience teaching PL/SQL to professionals at Columbia University in New York City. New database developers and DBAs can use it to get productive fast; experienced PL/SQL programmers will find it to be a superb Oracle Database 21c solutions reference.

New in This Edition

- Updated code examples throughout
- New iteration controls for the FOR LOOP statement, such as stepped range, multiple iterations, collection, and cursor iterations
- Enhancements for PL/SQL qualified expressions
- Performance enhancements for PL/SQL functions, such as SQL macro, and better control of the result cache

Other Topics Covered

- Mastering basic PL/SQL concepts and language fundamentals, and understanding SQL's role in PL/SQL
- Using conditional and iterative program controls
- Efficiently handling errors and exceptions
- Working with cursors and triggers, including compound triggers
- Using stored procedures, functions, and packages to write modular code that other programs can run
- Working with collections, object-relational features, native dynamic SQL, bulk SQL, and other advanced features
- ...

Getting Started with SQL and Databases: Managing and Manipulating Data with SQL

2023-06-07 · O'Reilly SQL Books O'Reilly Amazon

book

by Mark Simon

MariaDB Microsoft MySQL Oracle SQL Server postgresql

Learn the basics of writing SQL scripts. Using Standard SQL as the starting point, this book teaches writing SQL in various popular dialects, including PostgreSQL, MySQL/MariaDB, Microsoft SQL Server, Oracle, and SQLite. The book starts with a general introduction to writing SQL and covers the basic concepts. Author Mark Simon then covers database principles, and how database tables are designed. He teaches you how to filter data using the WHERE clause, and you will work with NULL, numbers, dates, and strings. You will also understand sorting results using the ORDER BY clause, sorting by calculated columns, and limiting the number of results. By the end of the book, you will know how to insert and update data, and summarize data with aggregate functions and groups. Three appendices cover differences between SQL dialects, working with tables, and a crash course in PDO.

What You Will Learn

- Filter, sort, and calculate data
- Summarize data with aggregate functions
- Modify data with insert, update, and delete statements
- Study design principles in developing a database

Who This Book Is For

Developers and analysts working with SQL, as well as web developers who want a stronger understanding of working with databases
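To make the topics in this description concrete, here is a small sketch of filtering with NULL, sorting by a calculated column, limiting results, and grouping. It is not taken from the book: it assumes SQLite (one of the dialects the book lists) driven from Python's standard `sqlite3` module, and the `orders` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        unit_price REAL NOT NULL,
        shipped_on TEXT            -- NULL means the order has not shipped yet
    );
    INSERT INTO orders (customer, quantity, unit_price, shipped_on) VALUES
        ('Ada',   2, 10.0, '2023-06-01'),
        ('Ada',   1, 25.0, NULL),
        ('Grace', 5,  4.0, '2023-06-03'),
        ('Linus', 3, 12.0, NULL);
""")

# WHERE with NULL: use IS NULL, because a comparison such as `= NULL` is never true.
unshipped = conn.execute(
    "SELECT customer FROM orders WHERE shipped_on IS NULL ORDER BY customer"
).fetchall()

# ORDER BY a calculated column, then LIMIT the number of rows returned.
top_two = conn.execute(
    """
    SELECT customer, quantity * unit_price AS total
    FROM orders
    ORDER BY total DESC
    LIMIT 2
    """
).fetchall()

# Summarize with aggregate functions and GROUP BY.
per_customer = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(quantity * unit_price) AS spend
    FROM orders
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()

print(unshipped)
print(top_two)
print(per_customer)
```

Other dialects differ in the details. Microsoft SQL Server, for instance, uses `TOP` or `OFFSET ... FETCH` rather than `LIMIT`, which is the kind of difference the book's dialect appendix is described as covering.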

Data Modeling with Snowflake

2023-05-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Serge Gershkovich (SQL DBM)

Agile/Scrum Cloud Computing Data Management Data Modelling Data Vault dimensional modeling Snowflake data data-engineering

This comprehensive guide, "Data Modeling with Snowflake", is your go-to resource for mastering the art of efficient data modeling tailored to the capabilities of the Snowflake Data Cloud. In this book, you will learn how to design agile and scalable data solutions by effectively leveraging Snowflake's unique architecture and advanced features.

What this Book will help me do

- Understand the core principles of data modeling and how they apply to Snowflake's cloud-native environment.
- Learn to use Snowflake's features, such as time travel and zero-copy cloning, to create efficient data solutions.
- Gain hands-on experience with SQL recipes that outline practical approaches to transforming and managing Snowflake data.
- Discover techniques for modeling structured and semi-structured data for real-world business needs.
- Learn to integrate universal modeling frameworks like Star Schema and Data Vault into Snowflake implementations for scalability and maintainability.

Author(s)

The author, Serge Gershkovich, is a seasoned expert in database design and Snowflake architecture. With years of experience in the data management field, Serge has dedicated himself to making complex technical subjects approachable to professionals at all levels. His insights in this book are informed by practical applications and real-world experience.

Who is it for?

This book is targeted at data professionals, ranging from newcomers to database design to seasoned SQL developers seeking to specialize in Snowflake. If you are looking to understand and apply data modeling practices effectively within Snowflake's architecture, this book is for you. Whether you're refining your modeling skills or getting started with Snowflake, it provides the practical knowledge you need to succeed.

Geo at the time of AI | Javier de la Torre | Founder & CSO of CARTO

2023-05-24 · Spatial Data Science Conference 2023 Watch

video

by Javier de la Torre (CARTO)

AI/ML Data Science GIS

Javier de la Torre, Founder and CSO of CARTO, kicks off the Spatial Data Science Conference 2023 by highlighting the nuances of geospatial work in the current era of artificial intelligence. He demonstrates several uses, such as using GPT-4 to generate OpenStreetMap SQL queries to grab data and perform analysis, creating GIS systems based on prompts, and more.

For more information, check out our website: https://carto.com/

MySQL Crash Course

2023-05-23 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rick Silva

Java MySQL Python data data-engineering relational-databases

MySQL Crash Course is a fast-paced, no-nonsense introduction to relational database development. It’s filled with practical examples and expert advice that will have you up and running quickly. You’ll learn the basics of SQL, how to create a database, craft SQL queries to extract data, and work with events, procedures, and functions. You’ll see how to add constraints to tables to enforce rules about permitted data and use indexes to accelerate data retrieval. You’ll even explore how to call MySQL from PHP, Python, and Java. Three final projects will show you how to build a weather database from scratch, use triggers to prevent errors in an election database, and use views to protect sensitive data in a salary database. You’ll also learn how to:

• Query database tables for specific information, order the results, comment SQL code, and deal with null values
• Define table columns to hold strings, integers, and dates, and determine what data types to use
• Join multiple database tables as well as use temporary tables, common table expressions, derived tables, and subqueries
• Add, change, and remove data from tables, create views based on specific queries, write reusable stored routines, and automate and schedule events

The perfect quick-start resource for database developers, MySQL Crash Course will arm you with the tools you need to build and manage fast, powerful, and secure MySQL-based data storage systems.
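The two project ideas mentioned above, a view that hides sensitive columns and a trigger that blocks bad rows, can be sketched as follows. This is not the book's code. It uses SQLite through Python's standard `sqlite3` module as a stand-in for MySQL so that it runs without a server, and the `employee` and `ballot` tables, their rows, and the duplicate-ballot rule are all hypothetical. The trigger syntax is SQLite's: it aborts with `RAISE(ABORT, ...)`, whereas a MySQL trigger would typically use `SIGNAL` instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee (name, department, salary) VALUES
        ('Ana', 'Engineering', 120000),
        ('Ben', 'Support', 65000);

    -- View that exposes everything except the sensitive salary column.
    CREATE VIEW employee_public AS
        SELECT id, name, department FROM employee;

    CREATE TABLE ballot (
        id INTEGER PRIMARY KEY,
        voter_id INTEGER NOT NULL,
        race TEXT NOT NULL
    );

    -- Trigger that rejects a second ballot by the same voter in the same race.
    CREATE TRIGGER ballot_no_duplicates
    BEFORE INSERT ON ballot
    WHEN EXISTS (
        SELECT 1 FROM ballot WHERE voter_id = NEW.voter_id AND race = NEW.race
    )
    BEGIN
        SELECT RAISE(ABORT, 'voter has already cast a ballot in this race');
    END;
""")

# Column names visible through the view; salary is not among them.
cursor = conn.execute("SELECT * FROM employee_public")
public_columns = [column[0] for column in cursor.description]

# The first ballot is accepted; the duplicate is rejected by the trigger.
conn.execute("INSERT INTO ballot (voter_id, race) VALUES (1, 'mayor')")
try:
    conn.execute("INSERT INTO ballot (voter_id, race) VALUES (1, 'mayor')")
    duplicate_rejected = False
except sqlite3.DatabaseError:
    duplicate_rejected = True

print(public_columns, duplicate_rejected)
```

For this particular rule a `UNIQUE (voter_id, race)` constraint would usually be the simpler choice; the trigger form is shown only because triggers are the technique the description names.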

Hot or Not: Latest Trends & Buzzwords in Data | Panel: dbt labs, Hex, West Marin Data

2023-05-15 · Data Council 2023 Watch

video

by Barry McCardel (Hex) , Drew Banin (Fishtown Analytics) , Pedram Navid (West Marin Data) , Julia Schottenstein (dbt labs)

AI/ML Analytics Data Engineering dbt Marketing Data Streaming

ABOUT THE TALK: What are the latest trends and buzzwords in Data?

Barry McCardel welcomes panelists from Hex, dbt Labs and West Marin Data to discuss their thoughts on the latest trends and buzzwords in Data.

Learn about the latest in the world of streaming, data teams doing more with less, data meshes, innovations in different kinds of SQL plus more!

ABOUT THE SPEAKERS: Julia Schottenstein is the Product Manager at dbt labs. Prior to this, she worked in Venture Capital as a Principal at NEA.

Drew Banin is the co-founder of dbt labs. He has built event collection systems that scaled to billions of events per month, implemented Markov-based marketing attribution models on millions of dollars of marketing spend, and dreams in NetworkX graphs.

Barry McCardel is the CEO and co-founder of Hex. He previously led operations at TrialSpark and worked at Palantir Technologies, where he led teams at the intersection of product development and real-world impact.

Pedram Navid is the Founder of West Marin Data. In his role he helps startups implement their data stack. He also supports them with product, marketing and community-building.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil

How to End the Long tail of Most Data Requests | Narrator

2023-05-11 · Data Council 2023 Watch

video

by Ahmed Elsamadisi (Narrator)

AI/ML Analytics Data Engineering

ABOUT THE TALK: Modern data stacks focus on the most common use cases and dashboards, but what about all the ad-hoc requests that come in? The current tool set does not let data analysts iterate easily with stakeholders. In this talk, we will discuss how, without an ad-hoc layer, data analysts are left to answer questions with hacky live SQL or to send every request through resource-intensive and expensive production processes and workflows.

An ad-hoc layer solves this by allowing data analysts to answer data questions, change their minds, and deliver data dumps or simple analyses quickly and reliably. They can then prioritize putting work into production only if it needs to be reused.

ABOUT THE SPEAKER: Ahmed Elsamadisi is the founder and CEO of Narrator. Narrator enables companies to make better decisions by providing them with the ability to answer any question in under 10 minutes. Ahmed started his career building algorithms for self-driving cars and human-robot interaction. He then joined Raytheon to develop AI algorithms for missile defense, focusing on tracking and discrimination. In 2015, Ahmed joined WeWork as the first hire on their data team. He built their data engineering infrastructure and grew the team of data engineers and analysts.

Cubing and Metrics in SQL, Oh My!

2023-05-11 · Data Council 2023 Watch

video

by Julian Hyde (Google)

AI/ML Analytics BigQuery Data Engineering Data Management Looker

ABOUT THE TALK: Apache Calcite has extended SQL to support metrics (which we call ‘measures’), filter context, and analytic expressions. With these concepts you can define data models (which we call Analytic Views) that contain metrics, use them in queries, and define new metrics in queries.

This talk, hosted by the original developer of Apache Calcite, describes the SQL syntax extensions for metrics and how to use them for cross-dimensional calculations such as period-over-period, percent-of-total, non-additive and semi-additive measures. It details how we got around fundamental limitations in SQL semantics, and approaches for optimizing queries that use metrics.
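To make the percent-of-total case concrete, here is a minimal sketch in plain SQL, run through Python's built-in sqlite3 module. It does not use Apache Calcite or its measure syntax; the `sales` table, its columns, and the figures are made up for illustration. It only shows the kind of query where the aggregate has to be repeated by hand, which is presumably the sort of repetition a reusable metric definition is meant to avoid.

```python
import sqlite3

# Hypothetical data: the table, columns, and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 50.0)],
)

# Percent of total in plain SQL: the grand total is recomputed in a
# scalar subquery because the grouped query cannot see ungrouped rows.
rows = conn.execute(
    """
    SELECT region,
           SUM(amount) AS revenue,
           100.0 * SUM(amount) / (SELECT SUM(amount) FROM sales) AS pct_of_total
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, revenue, pct in rows:
    print(f"{region}: revenue={revenue:.1f} pct_of_total={pct:.1f}")
```

The scalar subquery is the part that tends to get copied from query to query; any filter added to the outer query has to be mirrored there by hand.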

ABOUT THE SPEAKER: Julian Hyde is the original developer of Apache Calcite, an open source framework for building data management systems, and Morel, a functional query language. Previously he created Mondrian, an analytics engine, and SQLstream, an engine for continuous queries. He is a staff engineer at Google, where he works on Looker and BigQuery.

How to Build a Streaming Database in Three Challenging Steps | Materialize

2023-05-11 · Data Council 2023 Watch

video

by Frank McSherry (Materialize)

AI/ML Analytics Computer Science Data Engineering Dataflow Rust Data Streaming

ABOUT THE TALK: A streaming database is a potentially intimidating product to build. Frank McSherry, Chief Scientist at Materialize, breaks down the manageable parts, through three foundational choices that fit together well. Frank also talks about the trade-offs, and how their simplifications lead to a much more manageable streaming database.

ABOUT THE SPEAKER: Frank McSherry is Chief Scientist at Materialize, where he (and others) convert SQL into scale-out, streaming, and interactive dataflows. Before this, he developed the timely and differential dataflow Rust libraries (with colleagues at ETHZ), and led the Naiad research project and co-invented differential privacy while at MSR Silicon Valley. He has a PhD in computer science from the University of Washington.

CDC Stream Processing with Apache Flink

2023-05-11 · Data Council 2023 Watch

video

by Timo Walther (Data Artisans, Ververica, Immerok)

AI/ML Analytics Flink Data Engineering Kafka

ABOUT THE TALK: In this talk, we highlight what it means for Apache Flink to be a general data processor that acts as a data integration hub. Looking under the hood, we demonstrate Flink's SQL engine as a changelog processor that ships with an ecosystem tailored to processing CDC data and maintaining materialized views. We will discuss the semantics of different data sources and how to perform joins or stream enrichment between them. This talk illustrates how Flink can be used with systems such as Kafka (for upsert logging), Debezium, JDBC, and others.
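The idea of treating a table as a stream of changes can be sketched without Flink at all. The following is a toy illustration in plain Python, not Flink's API: the `apply_changelog` function and the sample events are hypothetical, and only the four change kinds (`+I`, `-U`, `+U`, `-D`) follow the short names Flink uses for its row kinds. It folds a changelog keyed by order ID into the current state of a table, which is the basic step behind maintaining a materialized view from CDC data.

```python
from typing import Dict, Iterable, Tuple

# (kind, key, value); kind is one of "+I", "-U", "+U", "-D".
Change = Tuple[str, str, int]


def apply_changelog(changes: Iterable[Change]) -> Dict[str, int]:
    """Fold a changelog into the latest value per key (a toy materialized view)."""
    view: Dict[str, int] = {}
    for kind, key, value in changes:
        if kind in ("+I", "+U"):
            # An insert, or the new image of an update: upsert the row.
            view[key] = value
        elif kind in ("-U", "-D"):
            # The old image of an update, or a delete: retract the row.
            view.pop(key, None)
        else:
            raise ValueError(f"unknown change kind: {kind}")
    return view


# Hypothetical change events for two orders.
events = [
    ("+I", "order-1", 10),
    ("+I", "order-2", 20),
    ("-U", "order-1", 10),
    ("+U", "order-1", 15),
    ("-D", "order-2", 20),
]
view = apply_changelog(events)
print(view)
```

In an upsert-style log the `-U` record is typically omitted and `+U` alone overwrites the row by key; the sketch handles both shapes because `+U` is applied as an upsert.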

ABOUT THE SPEAKER: Timo Walther is a long-term member of the management committee and among the top committers in the Apache Flink project. Timo worked as a software engineer at Data Artisans and as lead of the SQL team at Ververica. He was a co-founder of Immerok, which was acquired by Confluent in 2023. In Flink, he is working on various topics in the Table & SQL ecosystem to make stream processing accessible for everyone.

Making Moves with Arrow Data: Introducing Arrow Database Connectivity (ADBC) | Voltron Data

2023-05-11 · Data Council 2023 Watch

video

by Matthew Topol (Voltron Data)

AI/ML Analytics API Arrow Data Engineering Go Parquet postgresql

ABOUT THE TALK: In this talk, we'll dive into one of the newest Apache Arrow subprojects, Arrow Database Connectivity (ADBC), an API specification for Arrow-based database access.

Over the course of this session, you’ll get a crash course in ADBC and learn how it communicates with different data APIs (like Arrow Flight SQL and Postgres) using Arrow-native in-memory data. By the end, you’ll understand the use cases it can conquer and know where to access the resources you need to get started.

This talk will cover goals, use cases, and examples of using ADBC to communicate with different data APIs (such as Flight SQL or Postgres) with Arrow-native in-memory data.

ABOUT THE SPEAKER: Matthew Topol is a committer for the Apache Arrow project, where he frequently enhances the Golang Arrow and Parquet libraries, among other contributions, and helps grow the Arrow community. Matt recently joined Voltron Data to work on the Apache Arrow libraries full time and grow the Arrow Golang community. In June 2022, Matt's first book, "In-Memory Analytics with Apache Arrow", was published; it is the first (and currently only) book on Apache Arrow.
