Data Collection

#259 Getting the Data For Your Data-Driven Decisions with Jonathan Bloch & Scott Voigt

2024-11-07 · DataFramed Listen

podcast_episode

by Scott Voigt (Fullstory) , Jonathan Bloch (Exchange Data International (EDI)) , Richie (DataCamp)

AI/ML Data Management IBM Marketing

We’re improving DataFramed, and we need your help! We want to hear what you have to say about the show, and how we can make it more enjoyable for you—find out more here. Understanding where the data you use comes from, how to use it responsibly, and how to maximize its value has become essential. But as data sources multiply, so do the complexities around data privacy, customization, and ownership. How can companies capture and leverage the right data to create meaningful customer experiences while respecting privacy? And as data drives more personalized interactions, what steps can businesses take to protect sensitive information and navigate the increasingly complex regulatory picture? Jonathan Bloch is CEO at Exchange Data International (EDI) and a seasoned businessman with 40 years experience in information provision. He started work in the newsletter industry and ran the US subsidiary of a UK public company before joining its main board as head of its publishing division. He has been a director and/or chair of several companies and is currently a non executive director of an FCA registered investment bank. In 1994 he founded Exchange Data International (EDI) a London based financial data provider. EDI now has over 450 clients across three continents and is based in the UK, USA, India and Morocco employing 500 people. Scott Voigt is CEO and co-founder at Fullstory. Scott has enjoyed helping early-stage software businesses grow since the mid 90s, when he helped launch and take public nFront—one of the world's first Internet banking service providers. Prior to co-founding Fullstory, Voigt led marketing at Silverpop before the company was acquired by IBM. Previously, he worked at Noro-Moseley Partners, the Southeast's largest Venture firm, and also served as COO at Innuvo, which was acquired by Google. Scott teamed up with two former Innuvo colleagues, and the group developed the earliest iterations of Fullstory to understand how an existing product was performing. It was quickly apparent that this new platform provided the greatest value—and the rest is history. In the episode, Richie, Jonathan and Scott explore first-party vs third-party data, protecting corporate data, behavioral data, personalization, data sourcing strategies, platforms for storage and sourcing, data privacy, synthetic data, regulations and compliance, the future of data collection and storage, and much more. Links Mentioned in the Show: FullstoryExchange Data InternationalConnect with Jonathan and ScottCourse: Understanding GDPRRelated Episode: How Data and AI are Changing Data Management with Jamie Lerner, CEO, President, and Chairman at QuantumSign up to RADAR: Forward Edition New to DataCamp? Learn on the go using the DataCamp mobile...

Using Data to Create Liveable Cities - Rachel Lim

2024-11-01 · DataTalks.Club Listen

podcast_episode

by Rachel Lim

AI/ML Data Engineering Data Science GenAI HTML SQL

We talked about:

00:00 DataTalks.Club intro 01:56 Using data to create livable cities 02:52 Rachel's career journey: from geography to urban data science 04:20 What does a transport scientist do? 05:34 Short-term and long-term transportation planning 06:14 Data sources for transportation planning in Singapore 08:38 Rachel's motivation for combining geography and data science 10:19 Urban design and its connection to geography 13:12 Defining a livable city 15:30 Livability of Singapore and urban planning 18:24 Role of data science in urban and transportation planning 20:31 Predicting travel patterns for future transportation needs 22:02 Data collection and processing in transportation systems 24:02 Use of real-time data for traffic management 27:06 Incorporating generative AI into data engineering 30:09 Data analysis for transportation policies 33:19 Technologies used in text-to-SQL projects 36:12 Handling large datasets and transportation data in Singapore 42:17 Generative AI applications beyond text-to-SQL 45:26 Publishing public data and maintaining privacy 45:52 Recommended datasets and projects for data engineering beginners 49:16 Recommended resources for learning urban data science

About the speaker:

Rachel is an urban data scientist dedicated to creating liveable cities through the innovative use of data. With a background in geography, and a masters in urban data science, she blends qualitative and quantitative analysis to tackle urban challenges. Her aim is to integrate data driven techniques with urban design to foster sustainable and equitable urban environments.

Links: - https://datamall.lta.gov.sg/content/datamall/en/dynamic-data.html

00:00 DataTalks.Club intro 01:56 Using data to create livable cities 02:52 Rachel's career journey: from geography to urban data science 04:20 What does a transport scientist do? 05:34 Short-term and long-term transportation planning 06:14 Data sources for transportation planning in Singapore 08:38 Rachel's motivation for combining geography and data science 10:19 Urban design and its connection to geography 13:12 Defining a livable city 15:30 Livability of Singapore and urban planning 18:24 Role of data science in urban and transportation planning 20:31 Predicting travel patterns for future transportation needs 22:02 Data collection and processing in transportation systems 24:02 Use of real-time data for traffic management 27:06 Incorporating generative AI into data engineering 30:09 Data analysis for transportation policies 33:19 Technologies used in text-to-SQL projects 36:12 Handling large datasets and transportation data in Singapore 42:17 Generative AI applications beyond text-to-SQL 45:26 Publishing public data and maintaining privacy 45:52 Recommended datasets and projects for data engineering beginners 49:16 Recommended resources for learning urban data science

Join our slack: https: //datatalks.club/slack.html

Book Review: The Privacy Engineer’s Manifesto

2024-10-05 · Deep Dive into Data Privacy Listen

podcast_episode

Ever feel like you're clicking "agree" online without knowing what's happening behind the scenes? We break down the Privacy Engineer's Manifesto and how it aims to build real-world data protection into the systems we use everyday. From minimizing data collection to respecting user control, we explore what's needed to make privacy a reality, not just a promise.

Data Equality: Building an Inclusive Future

2024-09-19 · Big Data LDN 2024

Face To Face

by Reema Vadoliya

AI/ML

Join Reema Vadoliya as she explores the transformative potential of inclusive data practices in shaping a more equitable future. Reema delves into the challenges faced across the data industry and society, drawing from her personal experiences and insights. Through practical examples and case studies, she demonstrates how challenging bias in AI begins with fostering inclusivity and representation in data collection. By envisioning a future where data is crafted with inclusivity in mind, Reema inspires participants to embark on a journey towards building a more ethical and inclusive AI ecosystem. Key Takeaways: - Empowering Data Practices: Reema highlights the transformative potential of inclusive data practices, empowering organisations to challenge bias in AI through prioritising inclusivity and consent in data collection - Insightful Data Insights: Reema demonstrates how inclusive data practices lead to impactful insights, showing attendees how embracing diversity in data collection results in higher response rates and deeper audience understanding. - Vision for Ethical AI: Reema inspires attendees to envision a future where data is crafted with inclusivity, fostering fairness and transparency in data-driven decision-making to drive towards an ethical and equitable AI ecosystem.

Is Your Data Office Ready To Mitigate Data Risk?

2024-09-19 · Big Data LDN 2024

Face To Face

by Anusha Adige (EY)

Analytics LLM

In the last decade data has served as a guide to learn from the past, make decisions in the present and the drive insights for the future. The Art of possible that ChatGPT demonstrated in 2023 Channeled investments towards improving data capabilities. Peer competition, emergence of challenger organisations, advance analytics has increased customer expectation and exerted increased pressure on data analysis and exploration .

These increased expectations has translated into new way of working with data and has demanded teams to be more data driven. This has resulted in emergence of data risk. No matter the expectation there is always a boundary on what data can deliver and cannot deliver. This boundary is directly related to the original intent of data collection and organisational data policies, risk policies and risk appetite. As all part of the organisation touch data and it has become increasingly challenging to mitigate data risks. Acknowledging this major Banks have elevated data risk to Principle risk. This has allowed data office to have more control on how data is being used and accessed within an organisation and most importantly embed business accountability for data as required by most regulations such as BCBS239, GDRP expect.

In this 30 minutes we will explore

What is Data Risk?
How to identify Data Risk and design Data Risk Taxonomy?
Who are the key stakeholders within an organisation responsible to mitigating data risk?
How to design risk appetite for Data Risk?
Explore how key data risk controls should look like?

Unlock the Secrets of Data-Driven Strategies That Drive Profit Growth

2024-09-18 · Big Data LDN 2024

Face To Face

by Barry Panayi (John Lewis Partnership) , Dr. Simon Jury , Emma Duckworth , Claire Williams (Capgemini) , Steph Bell

AI/ML Analytics Big Data

Join us as we unlock the secrets of data-driven strategies that drive profit, loyalty, and hyper-personalised experiences, with Capgemini and a Women in Data leadership panel.

At this year’s Big Data London, Women in Data & Capgemini are back with another must-see panel, featuring a diverse and engaging group of female data leaders and their allies from across the Retail & CPG worlds. Last year’s session was one of the most oversubscribed events of the day, with standing room only, thanks to its thought-provoking and honest discussions. This year’s panel promises the same dynamic as they tackle the conundrum of balancing margin focus with rewarding customer loyalty and how data plays a key role.

The panellists, as well as sharing their own career journeys and experience, will explore how they’ve approached bold strategies that move beyond immediate profits to emphasise the long-term value of customer data and loyalty. They’ll explore how data, analytics & ai can uncover deep insights into customer behaviours and preferences, enabling brands to create personalised experiences and loyalty programs that boost engagement and build lasting trust.

The discussion will highlight the importance of seeing customer data as a strategic asset. By investing in data collection and analysis, companies can identify trends, predict future behaviours, and tailor their offerings to meet evolving customer needs. This approach can drive repeat business and increase customer lifetime value, ultimately leading to higher margins over time.

This year’s panel will explore the how data and boldness are key for a balanced strategy that blends margin management with a robust focus on customer loyalty. Using data smartly is key to achieving sustainable profit growth and strengthening brand loyalty. Don’t miss out on what promises to be an inspiring and insightful discussion!

Mastering CSRD Compliance: CIO Strategies for Effective ESG Data Management and Double Materiality Assessment

2024-09-18 · Big Data LDN 2024

Face To Face

by Errol Rodericks

Data Governance Data Management

In the wake of the Corporate Sustainability Reporting Directive (CSRD), organisations are tasked with enhancing their ESG data management and reporting frameworks. This presentation will guide data leaders through the complexities of CSRD compliance, focusing on the pivotal role of effective ESG data governance. We will explore the three core challenges:

1. Double Materiality Assessments: Evaluating both financial and impact materiality to align with regulatory expectations.

2. Comprehensive ESG Data Collection: Sourcing data across the entire business ecosystem to ensure comprehensive reporting.

3. External Assurance: Ensuring third-party certification of ESG reports, highlighting the importance of data accuracy and governance.

Attendees will gain practical insights into implementing a logical data management approach that addresses these challenges, while also aligning ESG data practices with broader enterprise data strategies.

#253: Adopting a Just In Time, Just Enough Data Mindset with Matt Gershoff

2024-09-03 · The Analytics Power Hour Listen

podcast_episode

by Val Kroll , Julie Hoyer , Tim Wilson (Analytics Power Hour - Columbus (OH) , Matt Gershoff (Conductrics, New York - USA) , Moe Kiss (Canva) , Michael Helbling (Search Discovery)

While we don't often call it out explicitly, the driving force behind much of what and how much data we collect is driven by a "just in case" mentality: we don't know exactly HOW that next piece of data will be put to use, but we better collect it to minimize the potential for future regret about NOT collecting it. Data collection is an optionality play—we strive to capture "all the data" so that we have as many potential options as possible for how it gets crunched somewhere down the road. On this episode, we explored the many ways this deeply ingrained and longstanding mindset is problematic, and we were joined by the inimitable Matt Gershoff from Conductrics for the discussion! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Taming AI, or how we build the alignment pipeline

2024-07-25 · MLOps.community Berlin - Summer Edition ☀️☀️

talk

AI/ML LLM

In this presentation, we will explore into the key aspects of aligning Large Language Models (LLMs) and explore how to set up the necessary infrastructure to maintain a versatile alignment pipeline. Specifically, we will cover: Incorporating LLMs into the data collection for supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to maximize efficiency. Techniques for instilling desired behaviors in LLMs with the use of prompt tuning. A cutting-edge workflow management approach, and how it facilitates rapid prototyping of highly-intensive distributed training procedures. This session is tailored for machine learning engineers who are deploying their LLMs and seeking to improve their models.

Activating operational metadata with Airflow, Atlan and OpenLineage

2024-07-01 · Airflow Summit 2024

session

by Kacper Muda

Airflow AWS Azure dbt GCP Python Spark SQL

OpenLineage is an open standard for lineage data collection, integrated into the Airflow codebase, facilitating lineage collection across providers like Google, Amazon, and more. Atlan Data Catalog is a 3rd generation active metadata platform that is a single source of trust unifying cataloging, data discovery, lineage, and governance experience. We will demonstrate what OpenLineage is and how, with minimal and intuitive setup across Airlfow and Atlan, it presents unified workflows view, efficient cross-platform lineage collection, including column level, in various technologies (Python, Spark, dbt, SQL etc.) and clouds (AWS, Azure, GCP, etc.) - all orchestrated by Airflow. This integration enables further use case unlocks on automated metadata management by making the operational pipelines dataset-aware for self-service exploration. It also will demonstrate real world challenges and resolutions for lineage consumers in improving audit and compliance accuracy through column-level lineage traceability across the data estate. The talk will also briefly overview the most recent OpenLineage developments and planned future enhancements.

The Decision Maker's Handbook to Data Science: AI and Data Science for Non-Technical Executives, Managers, and Founders

2024-06-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Stylianos Kampakis

AI/ML Data Science LLM data data-science

Data science is expanding across industries at a rapid pace, and the companies first to adopt best practices will gain a significant advantage. To reap the benefits, decision makers need to have a confident understanding of data science and its application in their organization. This third edition delves into the latest advancements in AI, particularly focusing on large language models (LLMs), with clear distinctions made between AI and traditional data science, including AI's ability to emulate human decision-making. Author Stylianos Kampakis introduces you to the critical aspect of ethics in AI, an area of growing importance and scrutiny. The narrative examines the ethical considerations intrinsic to the development and deployment of AI technologies, including bias, fairness, transparency, and accountability. You’ll be provided with the expertise and tools required to develop a solid data strategy that is continuously effective. Ethics and legal issues surrounding data collection and algorithmic bias are some common pitfalls that Kampakis helps you avoid, while guiding you on the path to build a thriving data science culture at your organization. This updated edition also includes plenty of case studies, tools for project assessment, and expanded content for hiring and managing data scientists. Data science is a language that everyone at a modern company should understand across departments. Friction in communication arises most often when management does not connect with what a data scientist is doing or how impactful data collection and storage can be for their organization. The Decision Maker’s Handbook to Data Science bridges this gap and readies you for both the present and future of your workplace in this engaging, comprehensive guide. What You Will Learn Integrate AI with other innovative technologies Explore anticipated ethical, regulatory, and technical landscapes that will shape the future of AI and data science Discover how to hire and manage data scientists Build the right environment in order to make your organization data-driven Who This Book Is For Startup founders, product managers, higher level managers, and any other non-technical decision makers who are thinking to implement data science in their organization and hire data scientists. A secondary audience includes people looking for a soft introduction into the subject of data science.

Mastering Marketing Data Science

2024-04-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Iain Brown

AI/ML Analytics Data Science GenAI Marketing NLP Python SAS data data-science

Unlock the Power of Data: Transform Your Marketing Strategies with Data Science In the digital age, understanding the symbiosis between marketing and data science is not just an advantage; it's a necessity. In Mastering Marketing Data Science: A Comprehensive Guide for Today's Marketers, Dr. Iain Brown, a leading expert in data science and marketing analytics, offers a comprehensive journey through the cutting-edge methodologies and applications that are defining the future of marketing. This book bridges the gap between theoretical data science concepts and their practical applications in marketing, providing readers with the tools and insights needed to elevate their strategies in a data-driven world. Whether you're a master's student, a marketing professional, or a data scientist keen on applying your skills in a marketing context, this guide will empower you with a deep understanding of marketing data science principles and the competence to apply these principles effectively. Comprehensive Coverage: From data collection to predictive analytics, NLP, and beyond, explore every facet of marketing data science. Practical Applications: Engage with real-world examples, hands-on exercises in both Python & SAS, and actionable insights to apply in your marketing campaigns. Expert Guidance: Benefit from Dr. Iain Brown's decade of experience as he shares cutting-edge techniques and ethical considerations in marketing data science. Future-Ready Skills: Learn about the latest advancements, including generative AI, to stay ahead in the rapidly evolving marketing landscape. Accessible Learning: Tailored for both beginners and seasoned professionals, this book ensures a smooth learning curve with a clear, engaging narrative. Mastering Marketing Data Science is designed as a comprehensive how-to guide, weaving together theory and practice to offer a dynamic, workbook-style learning experience. Dr. Brown's voice and expertise guide you through the complexities of marketing data science, making sophisticated concepts accessible and actionable.

Build Your Second Brain One Piece At A Time

2024-04-28 · Data Engineering Podcast Listen

podcast_episode

by Tsavo Knott (Pieces) , Tobias Macey

AI/ML Analytics Cloud Computing Dagster Data Engineering Data Lake Data Lakehouse Delta GenAI Hudi Iceberg Python +3 more

Summary Generative AI promises to accelerate the productivity of human collaborators. Currently the primary way of working with these tools is through a conversational prompt, which is often cumbersome and unwieldy. In order to simplify the integration of AI capabilities into developer workflows Tsavo Knott helped create Pieces, a powerful collection of tools that complements the tools that developers already use. In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementDagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.Your host is Tobias Macey and today I'm interviewing Tsavo Knott about Pieces, a personal AI toolkit to improve the efficiency of developersInterview IntroductionHow did you get involved in machine learning?Can you describe what Pieces is and the story behind it?The past few months have seen an endless series of personalized AI tools launched. What are the features and focus of Pieces that might encourage someone to use it over the alternatives?model selectionsarchitecture of Pieces applicationlocal vs. hybrid vs. online modelsmodel update/delivery processdata preparation/serving for models in context of Pieces appapplication of AI to developer workflowstypes of workflows that people are building with piecesWhat are the most interesting, innovative, or unexpected ways that you have seen Pieces used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Pieces?When is Pieces the wrong choice?What do you have planned for the future of Pieces?Contact Info LinkedInParting Question From your perspective, what is the biggest barrier to adoption of machine learning today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story.Links PiecesNPU == Neural Processing UnitTensor ChipLoRA == Low Rank AdaptationGenerative Adversarial NetworksMistralEmacsVimNeoVimDartFlutte

What it Takes to Build Your Own Analytics Solution

2024-04-10 · Data Universe 2024

Face To Face

by Kishore Banala

Analytics

Join Netflix' Kishore Banala for a deep-dive into strategies for building custom analytics solutions. From data collection approaches, to the nuances of real-time vs batch data movement, to leveraging columnar data stores tailored for analytical use cases, Kishore will provide practical examples and case studies such as identifying unique site visitors. We'll conclude with a look at the art of visualizing data with open-source tools that empower you to create compelling visualizations. Come along for insights and suggestions drawn from practical experience in building large scale analytics solutions.

Healthcare Big Data Analytics

2024-03-18 · O'Reilly Data Science Books O'Reilly Amazon

book

by Rutvij H. Jhaveri , Victor, de Albuquerque , Akash Kumar Bhoi , Ranjit Panigrahi

Analytics Big Data Data Analytics IoT data data-science healthcare-analytics

This book highlights how optimized big data applications can be used for patient monitoring and clinical diagnosis. In fact, IoT-based applications are data-driven and mostly employ modern optimization techniques. The book also explores challenges, opportunities, and future research directions, discussing the stages of data collection and pre-processing, as well as the associated challenges and issues in data handling and setup.

#174 The Future of Marketing Analytics with Cory Munchbach, CEO at BlueConic

2024-01-18 · DataFramed Listen

podcast_episode

by Richie (DataCamp) , Cory Munchbach (BlueConic)

Analytics GDPR/CCPA Marketing

Cookies were invented to help online shoppers, simply as an identifier so that online carts weren’t lost to the ether. Marketers quickly saw the power of using cookies for more than just maintaining session states, and moved to use them as part of their targeted advertising. Before we knew it, our online habits were being tracked, without our clear consent. The unregulated cookie-boom lasted until 2018 with the advent of GDPR and the CCPA. Since then marketers have been evolving their practices, looking for alternatives to cookie-tracking that will perform comparatively, and with the cookie being phased out in 2024, technologies like fingerprinting and new privacy-centric marketing strategies will play a huge role in how products meet users in the future. Cory Munchbach has spent her career on the cutting edge of marketing technology and brings years working with Fortune 500 clients from various industries to BlueConic. Prior to BluConic, she was an analyst at Forrester Research where she covered business and consumer technology trends and the fast-moving marketing tech landscape. A sought-after speaker and industry voice, Cory’s work has been featured in Financial Times, Forbes, Raconteur, AdExchanger, The Drum, Venture Beat, Wired, AdAge, and Adweek. A life-long Bostonian, Cory has a bachelor’s degree in political science from Boston College and spends a considerable amount of her non-work hours on various volunteer and philanthropic initiatives in the greater Boston community. In the episode, Richie and Cory cover successful marketing strategies and their use of data, the types of data used in marketing, how data is leveraged during different stages of the customer life cycle, the impact of privacy laws on data collection and marketing strategies, tips on how to use customer data while protecting privacy and adhering to regulations, the importance of data skills in marketing, the future of marketing analytics and much more. Links Mentioned in the Show: BlueConicMattel CreationsGoogle: Prepare for third-party cookie restrictionsData Clean Rooms[Course] Marketing Analytics for Business

Using Data To Illuminate The Intentionally Opaque Insurance Industry

2023-10-09 · Data Engineering Podcast Listen

podcast_episode

by Max Cho (CoverageCat) , Tobias Macey

AI/ML Analytics BI CI/CD Cloud Computing Data Engineering Data Management Data Quality Data Science Datafold dbt Modern Data Stack +4 more

Summary

The insurance industry is notoriously opaque and hard to navigate. Max Cho found that fact frustrating enough that he decided to build a business of making policy selection more navigable. In this episode he shares his journey of data collection and analysis and the challenges of automating an intentionally manual industry.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold As more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. . Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES. You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free! Your host is Tobias Macey and today I'm interviewing Max Cho about the wild world of insurance companies and the challenges of collecting quality data for this opaque industry

Interview

Introduction How did you get involved in the area of data management? Can you describe what CoverageCat is and the story behind it? What are the different sources of data that you work with?

What are the most challenging aspects of collecting that data? Can you describe the formats and characteristics (3 Vs) of that data?

What are some of the ways that the operational model of insurance companies have contributed to its opacity as an industry from a data perspective? Can you describe how you have architected your data platform?

How have the design and goals changed since you first started working on it? What are you optimizing for in your selection and implementation process?

What are the sharp edges/weak points that you worry about in your existing data flows?

How do you guard against those flaws in your day-to-day operations?

What are the

Data Engineering and Data Science

2023-09-26 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Vinay Jha Pillai , Niranjanamurthy M. , Kukatlapalli Pradeep Kumar , Hari Murthy , Aynur Unal

AI/ML Data Engineering Data Science data data-science

DATA ENGINEERING and DATA SCIENCE Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the “one-stop shop” for the concepts and applications of data science and engineering for data scientists across many industries. The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to be working across the spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, but any individual data engineer doesn’t need to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information. In this exciting new volume, the team of editors and contributors sketch the broad outlines of data engineering, then walk through more specific descriptions that illustrate specific data engineering roles. Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

Learning Data Science

2023-09-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Sam Lau , Joseph Gonzalez , Deborah Nolan

Data Science Pandas Python data data-science

As an aspiring data scientist, you appreciate why organizations rely on data for important decisions—whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data. Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas. Refine a question of interest to one that can be studied with data Pursue data collection that may involve text processing, web scraping, etc. Glean valuable insights about data through data cleaning, exploration, and visualization Learn how to use modeling to describe the data Generalize findings beyond the data

#143 Fighting the Climate Crisis with Data

2023-06-26 · DataFramed Listen

podcast_episode

by Jean-Pierre Pélicier (ENGIE)

Analytics

Every year we become increasingly aware of the urgency of the climate crisis, and with that, the need to usher in renewable energies and scale their adoption has never been more important. However, as we look at the ways to scale the adoption of renewable energy, data stands out as a key lever to accelerate a greener future. Today’s guest is Jean-Pierre Pélicier, CDO at ENGIE. ENGIE is one of the largest energy producers in the world and definitely one of the largest in Europe. They operate in more than 48 countries and have committed to becoming carbon neutral by 2045. Data plays a crucial part in these plans. In the episode, Jean-Pierre shares his unique perspective on how data is not just transforming the renewable energy industry but also redefining the way we approach the climate crisis. From harnessing the power of data to optimize energy production and distribution to leveraging advanced analytics to predict and mitigate environmental impacts, Jean-Pierre highlights the ways data continues to be an invaluable tool in our quest for a sustainable future. Also discussed in the episode are the challenges of data collection and quality in the energy sector, the importance of fostering a data culture within an organization, and aligning data strategy with a company's strategic objectives.

talk-data.com

Activity Trend

Top Events

Top Speakers

#259 Getting the Data For Your Data-Driven Decisions with Jonathan Bloch & Scott Voigt

Using Data to Create Liveable Cities - Rachel Lim

Book Review: The Privacy Engineer’s Manifesto

Data Equality: Building an Inclusive Future

Is Your Data Office Ready To Mitigate Data Risk?

Unlock the Secrets of Data-Driven Strategies That Drive Profit Growth

Mastering CSRD Compliance: CIO Strategies for Effective ESG Data Management and Double Materiality Assessment

#253: Adopting a Just In Time, Just Enough Data Mindset with Matt Gershoff

Taming AI, or how we build the alignment pipeline

Activating operational metadata with Airflow, Atlan and OpenLineage

The Decision Maker's Handbook to Data Science: AI and Data Science for Non-Technical Executives, Managers, and Founders

Mastering Marketing Data Science

Build Your Second Brain One Piece At A Time

What it Takes to Build Your Own Analytics Solution

Healthcare Big Data Analytics

#174 The Future of Marketing Analytics with Cory Munchbach, CEO at BlueConic

Using Data To Illuminate The Intentionally Opaque Insurance Industry

Data Engineering and Data Science

Learning Data Science

#143 Fighting the Climate Crisis with Data