talk-data.com

Topic: Data Engineering

Tags: etl, data_pipelines, big_data

Activity trend: peak of 127 activities per quarter, 2020-Q1 to 2026-Q1

1127 tagged activities, newest first

Many data engineers already use large language models to assist with data ingestion, transformation, DataOps, and orchestration. This blog post opens a series that explores the emergence of ChatGPT, Bard, and LLM tools from data pipeline vendors, and their implications for the discipline of data engineering. Published at: https://www.eckerson.com/articles/should-ai-bots-build-your-data-pipelines-examining-the-role-of-chatgpt-and-large-language-models-in-data-engineering

Summary

Batch vs. streaming is a long-running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batch workloads, but the reverse isn't true. Batch has been the default for years because of the complexity of running a reliable streaming system at scale. To remove that barrier, the team at Estuary built the Gazette and Flow systems from the ground up to resolve the pain points of other streaming engines, while providing an intuitive interface for data and application engineers to build their streaming workflows. In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology, and how you can start using it today to build a real-time data lake without the headaches.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack.

Your host is Tobias Macey and today I'm interviewing David Yaffe and Johnny Graettinger about using streaming data to build a real-time data lake and how Estuary gives you a single path to integrating and transforming your various sources.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Estuary is and the story behind it?

Stream processing technologies have been around for about a decade. How would you characterize the current state of the ecosystem?

What was missing in the ecosystem of streaming engines that motivated you to create a new one from scratch?

With the growth in tools that are focused on batch-oriented data integration and transformation, what are the reasons that an organization should still invest in streaming?

What is the comparative level of difficulty and support for these disparate paradigms?

What is the impact of continuous data flows on DAGs/orchestration of transforms?

What role do modern table formats play in the viability of real-time data lakes?

Can you describe the architecture of your Flow platform?

What are the core capabilities that you are optimizing for in its design?

What is involved in getting Flow/Estuary deployed and integrated with an organization's data systems? What does the workflow look like for a team using Estuary?

How does it impact the overall system architecture for a data platform as compared to other prevalent paradigms?

How do you manage the translation of poll vs. push availability and best practices for API and other non-CDC sources?

What are the most interesting, innovative, or unexpected ways that you have seen Estuary used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Estuary?

When is Estuary the wrong choice?

What do you have planned for the future of Estuary?

Contact Info

Dave Y

Johnny G

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcas

In this episode, I go through the reasons why I'm putting a stop to most meetings going forward. I urge you to figure out how you can better control your time, since it's the one thing you never get back.

Also, I've got a new weekend newsletter dropping tomorrow! Sign up at joereis.substack.com


If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Subscribe to my Substack: https://joereis.substack.com/

We talked about:

Katharine's background

Katharine's ML privacy startup

GDPR, CCPA, and the “opt-in as the default” approach

What is data privacy?

Finding Katharine's book – Practical Data Privacy

The various definitions of data privacy and “user profiles”

Privacy engineering and privacy-enhancing technologies

Why data privacy is important

What is differential privacy?

The importance of keeping privacy in mind when designing systems

Data privacy, using ChatGPT as an example

Katharine's resource suggestions for learning about data privacy

Links:

LinkedIn: https://www.linkedin.com/in/katharinejarmul/

Twitter: https://twitter.com/kjam

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Real time Schema Discovery | Streamdal

ABOUT THE TALK: In this talk, Dan Selans shows how Streamdal developed a schema discovery process that automatically evolves schemas in a complex distributed system processing upwards of 100,000 messages per second.

He dives deep into the details of schema versioning, detecting schema conflicts and schema drift, determining compatibility, and normalization, all without any batch processing.
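The talk does not publish Streamdal's algorithm, so the following is only a minimal sketch, assuming JSON messages and a schema represented as a flat map of dotted field paths to coarse type names. It infers a schema per message and compares it with the current version: new fields and an integer-to-number widening are treated as compatible changes that evolve the schema, while any other type change is reported as a conflict. The names `infer_schema`, `check_drift`, and `WIDENINGS` are hypothetical.

```python
import json
from typing import Any

# Hypothetical compatibility rule: only integer -> number counts as a safe widening.
WIDENINGS = {("integer", "number")}


def infer_type(value: Any) -> str:
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"


def infer_schema(message: dict) -> dict[str, str]:
    """Flatten nested objects into dotted paths, e.g. user -> name becomes 'user.name'."""
    schema: dict[str, str] = {}

    def walk(obj: dict, prefix: str) -> None:
        for key, value in obj.items():
            if isinstance(value, dict):
                walk(value, f"{prefix}{key}.")
            else:
                schema[f"{prefix}{key}"] = infer_type(value)

    walk(message, "")
    return schema


def check_drift(current: dict[str, str], observed: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return (verdict, schema); verdict is 'unchanged', 'compatible', or 'incompatible'."""
    evolved = dict(current)
    verdict = "unchanged"
    for path, new_type in observed.items():
        old_type = current.get(path)
        if old_type is None:
            evolved[path] = new_type  # additive change: a new field appeared
            verdict = "compatible"
        elif new_type in (old_type, "null"):
            continue  # same type, or a tolerated null; absent fields count as optional
        elif (old_type, new_type) in WIDENINGS:
            evolved[path] = new_type  # safe widening
            verdict = "compatible"
        else:
            return "incompatible", current  # conflicting type: keep the old schema
    return verdict, evolved


v1 = infer_schema(json.loads('{"id": 1, "user": {"name": "ada"}}'))
print(check_drift(v1, infer_schema(json.loads('{"id": 2.5, "ts": "2023-01-01"}'))))
print(check_drift(v1, infer_schema(json.loads('{"id": "abc"}'))))
```

Because every message is checked as it arrives, this style of check needs no batch job; the trade-off is that the compatibility rules must be cheap enough to run per message.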

ABOUT THE SPEAKER: Daniel Selans is the co-founder and CTO of Streamdal.com, a streaming data performance monitoring company. Dan previously wrote software at companies such as InVisionApp, New Relic and DigitalOcean and before that, spent over 10 years doing integration and R&D work at data centers.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil

podcast_episode
by Jon Cooke (Dataception), Joe Reis (DeepLearning.AI)

Data products are a very popular topic these days. The challenge is we need new thinking and approaches that differ from how we've worked with data in the past. Jon Cooke and I chat about the mindset shift needed to make data products successful.

Jon Cooke's LinkedIn: https://www.linkedin.com/in/jon-cooke-096bb0/


Hot or Not: Latest Trends & Buzzwords in Data | Panel: dbt labs, Hex, West Marin Data

ABOUT THE TALK: What are the latest trends and buzzwords in Data?

Barry McCardel welcomes panelists from Hex, dbt Labs, and West Marin Data to discuss their thoughts on the latest trends and buzzwords in data.

Learn about the latest in the world of streaming, data teams doing more with less, data meshes, innovations in different kinds of SQL, and more!

ABOUT THE SPEAKERS: Julia Schottenstein is the Product Manager at dbt Labs. Prior to this, she worked in venture capital as a Principal at NEA.

Drew Banin is the co-founder of dbt Labs. He has built event collection systems that scaled to billions of events per month, implemented Markov-based marketing attribution models on millions of dollars of marketing spend, and dreams in NetworkX graphs.

Barry McCardel is the CEO and co-founder of Hex. He previously worked at TrialSpark leading operations and at Palantir Technologies, where he led teams at the intersection of product development and real-world impact.

Pedram Navid is the Founder of West Marin Data. In his role he helps startups implement their data stack. He also supports them with product, marketing and community-building.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Summary

All of the advancements in our technology are based on the principle of abstraction. Abstractions are valuable until they break down, which is inevitable. In this episode the host, Tobias Macey, shares his reflections on recent experiences where the abstractions leaked, and some observations on how to deal with that situation in a data platform architecture.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack.

Your host is Tobias Macey and today I'm sharing some thoughts and observations about abstractions and impedance mismatches from my experience building a data lakehouse with an ELT workflow.

Interview

Introduction

Impact of community tech debt

Hive Metastore: new work being done but not widely adopted

Tensions between automation and correctness

Data type mapping: integer types, complex types

Naming things (keys/column names from APIs to databases)

Disaggregated databases: pros and cons

Flexibility and cost control, but not as much tooling invested vs. Snowflake/BigQuery/Redshift

Data modeling

Dimensional modeling vs. answering today's questions

What are the most interesting, unexpected, or challenging lessons that you have learned while working on your data platform?

When is ELT the wrong choice?

What do you have planned for the future of your data platform?
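One of the outline topics is naming things: turning keys from API payloads into database column names. A minimal sketch, assuming snake_case columns and PostgreSQL's default 63-byte identifier limit, could normalize keys and disambiguate collisions as follows. The function names and the `_2`, `_3` suffixing rule are hypothetical, not something described in the episode.

```python
import re

MAX_IDENTIFIER_LEN = 63  # assumption: PostgreSQL's default identifier length limit


def to_column_name(key: str) -> str:
    """Normalize an API key such as 'userID' or 'order-total ($)' to snake_case."""
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", key)  # camelCase boundary: userID -> user_ID
    s = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", s)  # acronym boundary: HTTPResponse -> HTTP_Response
    s = re.sub(r"[^0-9a-zA-Z]+", "_", s).strip("_").lower()  # punctuation and spaces -> '_'
    if not s:
        s = "col"  # the key had no usable characters
    if s[0].isdigit():
        s = "_" + s  # unquoted SQL identifiers cannot start with a digit
    return s[:MAX_IDENTIFIER_LEN]


def to_column_names(keys: list[str]) -> dict[str, str]:
    """Map each key to a unique column name, suffixing _2, _3, ... on collisions."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for key in keys:
        base = to_column_name(key)
        name, n = base, 1
        while name in used:
            n += 1
            # Trim the base so the suffixed name still fits the identifier limit.
            name = f"{base[:MAX_IDENTIFIER_LEN - len(str(n)) - 1]}_{n}"
        used.add(name)
        mapping[key] = name
    return mapping


print(to_column_names(["userID", "user_id", "HTTPResponseCode", "2fa enabled"]))
```

Returning an explicit key-to-column mapping, rather than renaming in place, keeps the lossy step auditable: two different API keys that collapse to the same column stay visible in the mapping.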

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

dbt

Airbyte

Podcast Episode

Dagster

Podcast Episode

Trino

Podcast Episode

ELT

Data Lakehouse

Snowflake

BigQuery

Redshift

Technical Debt

Hive Metastore

AWS Glue

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: RudderStack

RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.

RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.

Visit dataengineeringpodcast.com/rudderstack to sign up for free today, and snag a free T-shirt just for being a Data Engineering Podcast listener.

Support Data Engineering Podcast

I recap the Joe Reis + dbt roadshow in Denver (thanks to everyone who showed up) and discuss the divide between IT and "The Business."


We talked about:

Arseny's background

Working on machine learning in startups

What is Machine Learning System Design?

Constraints and requirements

Known unknowns vs. unknown unknowns (design stage)

Writing a design document

Technical problems vs. product-oriented problems

The solution part of the design document

What motivated Arseny to write a book on ML System Design

Examples of a design document in the book

The types of readers for ML System Design

Working with the co-author

Reacting to constraints and feedback when writing a book

Arseny's favorite chapter of the book

Other resources where you can learn about ML System Design

Twitter giveaway

Links:

Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter Discount: poddatatalks21 (35% off)

The Road to Exceptional Data Correctness

ABOUT THE TALK: In this lightning talk, Emma Tang shares learnings from Stripe’s early efforts to tackle data correctness. For a financial technology company like Stripe, data correctness is paramount to its operations, and this low tolerance for inaccurate data places unique constraints on how infrastructure is designed. Emma shares strategies, as well as the trade-offs made, to achieve this high level of correctness.

ABOUT THE SPEAKER: Emma Tang led Big Data Infrastructure at Stripe helping the company build and scale data infrastructure systems to support the 14x revenue growth and 6x headcount growth during her time there.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

How to End the Long Tail of Most Data Requests | Narrator

ABOUT THE TALK: Modern data stacks focus on the most common use cases and dashboards, but what about all the ad-hoc requests that come in? The current tool set makes it hard for data analysts to iterate with stakeholders. This talk argues that without an ad-hoc layer, data analysts are left either to answer questions with hacky live SQL or to push every request through resource-intensive, expensive production processes and workflows.

An ad-hoc layer solves this by letting data analysts answer data questions, change their minds, and deliver data dumps or simple analyses quickly and reliably, putting an analysis into production only if it needs to be reused.

ABOUT THE SPEAKER: Ahmed Elsamadisi is the founder and CEO of Narrator. Narrator enables companies to make better decisions by providing them with the ability to answer any question in under 10 minutes. Ahmed started his career building algorithms for self-driving cars and human-robot interaction. He then joined Raytheon to develop AI algorithms for missile defense, focusing on tracking and discrimination. In 2015, Ahmed joined WeWork as the first hire on their data team. He built their data engineering infrastructure and grew the team of data engineers and analysts.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

What I Don't Want to Exist in the Data World in 5 Years | Seattle Data Guy

ABOUT THE TALK: Whether you are consulting or working as an employee, there are certain tools, patterns, and practices many of us would like to see disappear in the next few years. Many of them delay projects and frustrate data engineers, yet we continue to rely on them. Whether it's transferring data via SFTP or joining teams without coding standards, some companies, even those considered cutting edge, still follow these patterns.

In this talk Ben Rogojan explores some of these tools, patterns and practices as well as why he hopes he doesn’t see them around in a few years.

ABOUT THE SPEAKER: Ben Rogojan has spent his career focused on helping companies develop end-to-end data solutions that are simple and maintainable. He has worked in various industries such as healthcare, finance, and e-commerce. In addition, he has worked for companies including Facebook as a data engineer. Using his broad experiences he has helped companies develop, improve, modernize, and migrate their data infrastructure.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Data Contracts in the Modern Data Stack | Whatnot

ABOUT THE TALK: After two years, three rounds of funding, and hundreds of new employees, Whatnot’s modern data stack has gone from not existing to processing tens of millions of events across hundreds of different event types each day.

How does their small (but mighty!) team keep up? This talk explores data contracts: using an interface definition language (Protobuf) as the source of truth for event definitions, to govern event construction in production, and to automatically generate dbt models in the data warehouse.
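The talk's pipeline starts from Protobuf definitions; the sketch below skips Protobuf parsing and assumes an event definition has already been reduced to (field name, Protobuf scalar type) pairs. It renders a dbt model that extracts those fields from a hypothetical `raw_events` source with a JSON `payload` column, using Snowflake's `payload:field::type` syntax. The source name, column names, and type mapping are all assumptions, not Whatnot's implementation.

```python
# Assumed mapping from Protobuf scalar types to Snowflake column types.
PROTO_TO_SQL = {
    "string": "varchar",
    "int32": "integer",
    "int64": "bigint",
    "bool": "boolean",
    "double": "double",
}


def render_staging_model(event_name: str, fields: list[tuple[str, str]]) -> str:
    """Render dbt model SQL for one event type from its (field, proto_type) contract."""
    columns = []
    for name, proto_type in fields:
        if proto_type not in PROTO_TO_SQL:
            raise ValueError(f"{event_name}.{name}: no SQL mapping for proto type {proto_type!r}")
        columns.append(f"    payload:{name}::{PROTO_TO_SQL[proto_type]} as {name}")
    return (
        "select\n"
        + ",\n".join(columns)
        + "\nfrom {{ source('events', 'raw_events') }}\n"  # hypothetical dbt source
        + f"where event_type = '{event_name}'\n"
    )


# Fields as they might appear in a hypothetical `message OrderPlaced { ... }` definition.
print(render_staging_model("OrderPlaced", [("order_id", "string"), ("amount_cents", "int64"), ("is_gift", "bool")]))
```

Failing on an unmapped type is the point of the design: a contract change surfaces when models are generated, rather than as a silently wrong column in the warehouse.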

ABOUT THE SPEAKER: Zack Klein is a software engineer at Whatnot, where he thoroughly enjoys building data products and narrowly avoiding breaking production each day. Previously, he worked on big data platforms at Blackstone and HBO.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Incident Management for Data People | Bigeye

ABOUT THE TALK: Incident management is a key practice used by DevOps and SRE teams to keep software reliable—but it's still uncommon among data teams! Datadog says incident management can "streamline their response procedures, reducing mean time to repair (MTTR) and minimizing any impact on end users."

In this talk, Kyle Kirwan, co-founder of data observability company Bigeye, will explain the basics of incident management and how data teams can use it to reduce disruptions to analytics and machine learning applications.
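MTTR, as quoted above, is total repair time divided by the number of incidents. A minimal sketch, assuming each data incident is recorded as a (detected, resolved) timestamp pair; teams differ on whether the clock starts at detection or when bad data first landed, so the start point here is an assumption, and the sample incidents are made up.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) over all incidents."""
    if not incidents:
        raise ValueError("MTTR is undefined with no incidents")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Made-up incidents for illustration.
incidents = [
    (datetime(2023, 4, 3, 9, 0), datetime(2023, 4, 3, 11, 0)),  # stale dashboard: 2 h
    (datetime(2023, 4, 10, 14, 0), datetime(2023, 4, 10, 18, 0)),  # broken join: 4 h
]
print(mean_time_to_repair(incidents))  # 3:00:00
```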

ABOUT THE SPEAKER: Kyle Kirwan is the co-founder and CEO of Bigeye. He began his career as a data scientist, went on to lead the development of Uber's internal data catalog/lineage/quality tools, and now helps data teams use data observability to improve pipeline reliability and data quality.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Automatically Fix Data Issues & Label Errors in Most ML Datasets | Cleanlab

ABOUT THE TALK: In this talk, we discuss cleanlab open-source (github.com/cleanlab/cleanlab) and Cleanlab Studio (https://cleanlab.ai/studio). Cleanlab open-source is a fast-growing Python framework for data-centric AI that automatically detects issues in ML datasets. Cleanlab Studio is a no-code web interface used by universities and Fortune 500 companies to detect and fix dataset issues. Cleanlab algorithms have theoretical support for improved accuracy on real-world, messy data.
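The cleanlab library has its own API for this; what follows is not that API but a simplified, dependency-free sketch of the confident-learning idea behind it. Each class gets a threshold equal to the mean predicted probability of that class among examples carrying that label, and an example is flagged when the most probable class clearing its threshold differs from the given label. It omits the library's calibration, pruning, and ranking steps, and assumes the predicted probabilities are out-of-sample (for example, from cross-validation).

```python
def find_label_issues(labels: list[int], pred_probs: list[list[float]]) -> list[int]:
    """Return indices whose given label disagrees with a confidently predicted class."""
    n_classes = len(pred_probs[0])
    # Per-class threshold: average self-confidence of examples carrying that label.
    thresholds = []
    for c in range(n_classes):
        self_conf = [p[c] for y, p in zip(labels, pred_probs) if y == c]
        thresholds.append(sum(self_conf) / len(self_conf) if self_conf else 1.0)

    issues = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        confident = [c for c in range(n_classes) if p[c] >= thresholds[c]]
        if not confident:
            continue  # not confident about any class: no evidence either way
        best = max(confident, key=lambda c: p[c])
        if best != y:
            issues.append(i)  # off-diagonal entry of the confident joint
    return issues


labels = [0, 0, 0, 1, 1, 1]
pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]
print(find_label_issues(labels, pred_probs))  # [2]
```

Using per-class thresholds rather than one global cutoff matters when a model is systematically less confident on some classes; a single cutoff would over-flag those classes.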

ABOUT THE SPEAKER: Curtis Northcutt is an American computer scientist and entrepreneur focusing on machine learning and AI to empower people. He is the CEO and co-founder of Cleanlab, an AI software company that improves machine learning model performance by automatically fixing data and label issues in real-world, messy datasets. Curtis completed his PhD at MIT where he invented Cleanlab’s algorithms for automatically finding and fixing label issues in any dataset.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Continuous Data Pipeline for Real time Benchmarking & Data Set Augmentation | Teleskope

ABOUT THE TALK: Building and curating representative datasets is crucial for accurate ML systems, and monitoring metrics after deployment helps improve the model. Language models for unstructured text may face data shifts, leading to unpredictable inferences. Open-source APIs and annotation tools streamline annotation and reduce analyst workload.

This talk discusses generating datasets and real-time precision/recall splits to detect data shifts, prioritize data collection, and retrain models.
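One way to read "real-time precision/recall splits" is a sliding window of recent predictions per entity type, scored against analyst annotations as they arrive. The sketch below assumes that design; the split names, window size, and drift tolerance are hypothetical, and a real system would also need enough annotated samples per window before trusting the numbers.

```python
import math
from collections import defaultdict, deque


class SplitMonitor:
    """Sliding-window precision/recall per split (e.g. per PII entity type)."""

    def __init__(self, window: int) -> None:
        # Each split keeps only its most recent `window` (predicted, actual) pairs.
        self.windows: dict[str, deque] = defaultdict(lambda: deque(maxlen=window))

    def record(self, split: str, predicted: bool, actual: bool) -> None:
        self.windows[split].append((predicted, actual))

    def metrics(self, split: str) -> tuple[float, float]:
        pairs = self.windows[split]
        tp = sum(1 for p, a in pairs if p and a)
        fp = sum(1 for p, a in pairs if p and not a)
        fn = sum(1 for p, a in pairs if not p and a)
        precision = tp / (tp + fp) if tp + fp else math.nan
        recall = tp / (tp + fn) if tp + fn else math.nan
        return precision, recall


def has_shifted(baseline: tuple[float, float], current: tuple[float, float], tolerance: float = 0.1) -> bool:
    """Flag a shift when precision or recall moves more than `tolerance` from its baseline."""
    # NaN (no positives in the window) compares False, so it never triggers an alert.
    return any(abs(b - c) > tolerance for b, c in zip(baseline, current))


monitor = SplitMonitor(window=4)
for predicted, actual in [(True, True), (True, False), (False, True), (True, True)]:
    monitor.record("EMAIL", predicted, actual)
print(monitor.metrics("EMAIL"), has_shifted((0.9, 0.9), monitor.metrics("EMAIL")))
```

A split whose metrics drift past the tolerance is a natural candidate for prioritized data collection and retraining, which is the loop the talk describes.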

ABOUT THE SPEAKER: Ivan Aguilar is a data scientist at Teleskope focused on building scalable models for detecting PII/PHI/Secrets and other compliance-related entities within customers' clouds. Prior to joining Teleskope, Ivan was an ML Engineer at Forge.AI, a Boston-based shop working on information extraction, content extraction, and other NLP-related tasks.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Data Product Success: Aligning with Data's Core Purpose | Entera

ABOUT THE TALK: During this talk, we'll make the argument that by aligning your product with data's core purpose, you increase adoption of your product and accelerate growth.

We'll propose a framework for Data Product Management that ensures this vital alignment is consistently held while catalyzing development and shortening time-to-outcome.

Along the way, we will show how to best structure your company's data org based on your current stage of growth in pursuit of improving the delivery of data products and enhancing outcomes for customers/end-users.

ABOUT THE SPEAKER: Ricky Saporta is passionate about how people learn to make great decisions. A builder of data teams, Ricky currently serves as SVP of Data at Entera. He spent the prior four years at The Farmer's Dog as Head of Data Strategy.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

ML in Production: What Does Production Even Mean | Dagshub

ABOUT THE TALK: While giving a talk to a group of up-and-coming data scientists, a question that surprised Dean Pleban was: "When you say “production”, what exactly do you mean?"

In this talk, Dean defines what production actually means. He presents a first-principles, step-by-step approach to thinking about deploying a model to production, discusses challenges you might face at each step, and provides further reading if you want to dive deeper into each one.

ABOUT THE SPEAKER: Dean Pleban has a background combining physics and computer science. He’s worked on quantum optics and communication, computer vision, software development and design. He’s currently CEO at DagsHub, where he builds products that enable data scientists to work together and get their models to production, using popular open source tools. He’s also the host of the MLOps Podcast, where he speaks with industry experts about ML in production.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Data Products Aren't Just for Data Teams! | Lightdash

ABOUT THE TALK: Building data tools requires us to not only think about the data team, but also about the people that the data team is serving: business users, or "non-data team people".

This talk explains why it's important to consider both personas when building data tools, and why doing so can be complicated. It walks through a few principles for building data products that are great for everyone (not just the data team!).

ABOUT THE SPEAKER: As a product manager with a background in data science, Katie Hindson loves building data products. Currently, she's working at Lightdash, an open-source BI tool that instantly turns your dbt project into a full-stack BI platform. Katie is really interested in the interaction between data teams, their tools, and the rest of the company - because the best data teams are the ones that can help everyone at the company make better decisions, faster.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.
