Analytics

Banner Inflation, Banking Braves Out

2023-08-11 · Moody's Talks - Inside Economics Listen

podcast_episode

by Sebnem Kalemli-Ozcan (University of Maryland) , Cris deRitis , Bernard Yaros (Moody's Analytics) , Mark Zandi (Moody's Analytics) , Marisa DiNatale (Moody's Analytics)

The Inside Economics team dissects the July report on consumer price inflation and concludes that inflation is on track to be back to the Fed’s inflation target by this time next year. Well, OK, Cris thought more likely the end of next year. The discussion then turned to modest fallout from the banking crisis earlier this year (at least so far) and Fed policy with University of Maryland economics professor Sebnem Kalemli-Ozcan. For more from Sebnem Kalemli-Ozcan, click here. For the full transcript, click here. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

Ramp's $8 Billion Data Strategy (W/ Ian Macomber and Ryan Delgado)

2023-08-11 · The Analytics Engineering Podcast Listen

podcast_episode

by Tristan Handy (dbt Labs) , Ian Macomber (Ramp) , Julia Schottenstein (dbt Labs) , Ryan Delgado (Ramp)

Analytics Engineering Data Engineering Data Science dbt

Ian Macomber, head of analytics engineering and data science at Ramp and formerly the VP of analytics and data engineering at Drizly, and Ryan Delgado, a staff software engineer at Ramp, have played pivotal roles in establishing Ramp's data team from the ground up and are spearheading the development of their comprehensive roadmap. In this conversation with Tristan and Julia, Ian and Ryan share insights on how Ramp's data team transformed unstructured data from contracts into valuable insights to enable faster decision-making. The $8 billion company values speed and empowers teams to build, ship, and measure products quickly. Ian and Ryan also talked about their approach to adopting new tech and elevating data as an equal player alongside product engineering and design. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

70: He’s Hired 30 Data Analysts; Here’s What You Should Know

2023-08-09 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith , Jesse Morris

AI/ML Data Analytics

Join me and the man who has interviewed 300 data analysts, Jesse Morris, in this episode as we discuss what it is like to get hired as a data analyst.

Throughout the episode, Jesse imparts golden nuggets of wisdom, shedding light on what employers seek in prospective candidates, why being a data analyst is awesome, and what tools you should use along the way.

Tune in now! 🎧

🤝Connect with Jesse Morris

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(05:53) - The Data Analyst Hiring Process 💎

(17:48) - Attitude, passion, & communication > Technical Skills 😎

(25:03) - Things you can do to stand out in the job hunt 📈

(31:44) - Volunteering for a non-profit can help you land a job 🤝

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

#149 Expanding the Scope of Generative AI in the Enterprise with Bal Heroor, CEO and Principal at Mactores

2023-08-07 · DataFramed Listen

podcast_episode

by Bal Heroor (Mactores) , Richie (DataCamp)

AI/ML Data Analytics Data Governance GenAI LLM

Generative AI is here to stay. Even in the 8 months since the public release of ChatGPT, there is an abundance of AI tools to help make us more productive at work and ease the stress of planning and executing our daily lives, among other things. Already, many of us are wondering what is to come in the next 8 months, the next year, and the next decade of AI’s evolution. In the grand scheme of things, this really is just the beginning. But what should we expect in this Cambrian explosion of technology? What are the use cases being developed behind the scenes? What do we need to be mindful of when training the next generations of AI? Can we combine multiple LLMs to get better results?

Bal Heroor is CEO and Principal at Mactores and has led over 150 business transformations driven by analytics and cutting-edge technology. His team at Mactores is researching and building AI, AR/VR, and quantum computing solutions for businesses to gain a competitive advantage. Bal is also the Co-Founder of Aedeon, the first hyper-scale Marketplace for Data Analytics and AI talent.

In the episode, Richie and Bal explore common use cases for generative AI, how it's evolving to solve enterprise problems, the challenges of data governance and the importance of explainable AI, and the challenges of tracking the lineage of AI and data in large organizations. Bal also touches on the shift from general-purpose generative AI models to more specialized models, fascinating use cases in the manufacturing industry, what to consider when adopting AI solutions in business, and much more.

Links mentioned in the show:
Pulsar
Trifacta
AWS Clarify
[Course] Introduction to ChatGPT
[Course] Implementing AI Solutions in Business
[Course] Generative AI Concepts

Quantifying The Return On Investment For Your Data Team

2023-08-06 · Data Engineering Podcast Listen

podcast_episode

by Barr Moses (Monte Carlo) , Anna Filippova (dbt Labs) , Tobias Macey

AI/ML Data Engineering Data Management dbt GenAI Modern Data Stack Monte Carlo Python SaaS Snowflake SQL

Summary

As businesses increasingly invest in technology and talent focused on data engineering and analytics, they want to know whether they are benefiting. So how do you calculate the return on investment for data? In this episode Barr Moses and Anna Filippova explore that question and provide useful exercises to start answering that in your company.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Barr Moses and Anna Filippova about how and whether to measure the ROI of your data team.

Interview

Introduction
How did you get involved in the area of data management?
What are the typical motivations for measuring and tracking the ROI for a data team?

Who is responsible for collecting that information?
How is that information used and by whom?

What are some of the downsides/risks of tracking this metric? (law of unintended consequences)
What are the inputs to the number that constitutes the "investment"? Infrastructure, payroll of employees on team, time spent working with other teams?
What are the aspects of data work and its impact on the business that complicate a calculation of the "return" that is generated?
How should teams think about measuring data team ROI?
What are some concrete ROI metrics data teams can use?

What level of detail is useful?
What dimensions should be used for segmenting the calculations?

How can visibility into this ROI metric be best used to inform the priorities and project scopes of the team?
With so many tools in the modern data stack today, what is the role of technology in helping drive or measure this impact?
How do your respective solutions, Monte Carlo and dbt, help teams measure and scale data value?
With generative AI on the upswing of the hype cycle, what are the impacts that you see it having on data teams?

What are the unrealistic expectations that it will produce?
How can it speed up time to delivery?

What are the most interesting, innovative, or unexpected ways that you have seen data team ROI calculated and/or used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on measuring the ROI of data teams?
When is measuring ROI the wrong choice?

Contact Info

Barr

Anna

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Monte Carlo

Podcast Episode

dbt

Podcast Episode

JetBlue Snowflake Con Presentation
Generative AI
Large Language Models

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: RudderStack

Near Perfect, New Probability

2023-08-04 · Moody's Talks - Inside Economics Listen

podcast_episode

by Dante DeAntonio (Moody's Analytics) , Cris deRitis , Mark Zandi (Moody's Analytics)

It’s jobs Friday, and Mark, Cris and Dante discuss the near perfect (Dante’s description) July employment report. Job growth remains strong, but it is moderating, and should help convince the Federal Reserve that its interest rate hikes are over. The group identified a few nits in the numbers, but the report was so good Cris lowered his odds the economy will suffer a recession in the coming year. For the full transcript, click here. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

69: Skills, Networking, Portfolio: How Brad Yarbro Landed a Data Job

2023-08-02 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith , Brad Yarbro (Protective Life)

AI/ML Data Analytics

In today's episode, I had the privilege of interviewing the incredible Brad Yarbro, a senior data scientist at Protective Life. 🎙️

His journey from an economics student to a data professional is beyond inspiring.

Listen and get inspired to kickstart your own data career! 📊💼

⁠🤝Connect with Brad

Timestamps:

(05:18) - 🌟 Gain Work Experience: Opportunities for Students!

(12:18) - 💼 From Analyst to Supply Chain Guru: A Journey

(16:37) - 📊 Data Pros Unite: Meet the Data and Business Teams

(22:06) - 📸 Quality Analysis with Cutting-Edge Tech in Defense

(29:12) - 💡 Data Career Advice: Networking Leads to Success

ANI / AGI / ASI

2023-08-02 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

by Tim , Kevin Missoorten , Berg

AI/ML

DataTopics is a podcast presented by Kevin Missoorten to talk about the fuzzy and misunderstood concepts in the world of data, analytics, and AI and get to the bottom of things.

In this episode, together with guests Berg and Tim, we dive deep into the fascinating world of Artificial General Intelligence (AGI). The AI applications we see today are often referred to as “narrow AI” (ANI), basically excellent at performing one task and one task only. AGI or Strong AI, refers to the hypothetical intelligence of a machine that exhibits the ability to understand, learn, and apply knowledge in a way comparable to a human being. But how do we define “comparable to a human being”? Which human being? Are there other definitions? How do topics like alignment and singularity come into the picture? How can we contribute to making sure when we reach this state, we are able to harness this capability? Do we need a moratorium?

Tune in to DataTopics to hear our discussion on these topics and more!

DataTopics is brought to you by Dataroots. Music: The Gentlemen - DivKid. The thumbnail is generated by Midjourney.

Strategies For A Successful Data Platform Migration

2023-07-31 · Data Engineering Podcast Listen

podcast_episode

by Rob Goretsky , Gleb Mezhanskiy (Datafold) , Tobias Macey

AI/ML Airflow Amazon EMR BigQuery Dagster Data Engineering Data Management Data Science Datafold dbt ELK GitHub +9 more

Summary

All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Modern data teams are using Hex to 10x their data impact. Hex combines a notebook style UI with an interactive report builder. This allows data teams to both dive deep to find insights and then share their work in an easy-to-read format to the whole org. In Hex you can use SQL, Python, R, and no-code visualization together to explore, transform, and model data. Hex also has AI built directly into the workflow to help you generate, edit, explain and document your code. The best data teams in the world such as the ones at Notion, AngelList, and Anthropic use Hex for ad hoc investigations, creating machine learning models, and building operational dashboards for the rest of their company. Hex makes it easy for data analysts and data scientists to collaborate together and produce work that has an impact. Make your data team unstoppable with Hex. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial for your team!

Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy and Rob Goretsky about when and how to think about migrating your data stack.

Interview

Introduction
How did you get involved in the area of data management?
A migration can be anything from a minor task to a major undertaking. Can you start by describing what constitutes a migration for the purposes of this conversation?
Is it possible to completely avoid having to invest in a migration?
What are the signals that point to the need for a migration?

What are some of the sources of cost that need to be accounted for when considering a migration? (both in terms of doing one, and the costs of not doing one)
What are some signals that a migration is not the right solution for a perceived problem?

Once the decision has been made that a migration is necessary, what are the questions that the team should be asking to determine the technologies to move to and the sequencing of execution?
What are the preceding tasks that should be completed before starting the migration to ensure there is no breakage downstream of the changing component(s)?
What are some of the ways that a migration effort might fail?
What are the major pitfalls that teams need to be aware of as they work through a data platform migration?
What are the opportunities for automation during the migration process?
What are the most interesting, innovative, or unexpected ways that you have seen teams approach a platform migration?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform migrations?
What are some ways that the technologies and patterns that we use can be evolved to reduce the cost/impact/need for migrations?

Contact Info

Gleb

LinkedIn @glebmm on Twitter

Rob

LinkedIn RobGoretsky on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Datafold

Podcast Episode

Informatica
Airflow
Snowflake

Podcast Episode

Redshift
Eventbrite
Teradata
BigQuery
Trino
EMR == Elastic Map-Reduce
Shadow IT

Podcast Episode

Mode Analytics
Looker
Sunk Cost Fallacy
data-diff

Podcast Episode

SQLGlot
Dagster
dbt

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Hex: Hex is a collaborative workspace for data science and analytics. A single place for teams to explore, transform, and visualize data into beautiful interactive reports. Use SQL, Python, R, no-code and AI to find and share insights across your organization. Empower everyone in an organization to make an impact with data. Sign up today at https://www.dataengineeringpodcast.com/hex and get 30 days free!

RudderStack: Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Data Wrangling with SQL

2023-07-31 · O'Reilly SQL Books O'Reilly Amazon

book

by Shivangi Saxena , Raghav Kandarpa

Data Analytics SQL

Develop a comprehensive understanding of data wrangling with SQL to transform raw data into actionable insights. This hands-on guide, 'Data Wrangling with SQL,' leads you through fundamentals to advanced techniques for cleaning, analyzing, and engineering data. By mastering these techniques, you'll improve your data analysis capabilities and solve real-world data challenges efficiently.

What this Book will help me do

Understand and implement data wrangling steps using SQL, including handling missing data and optimizing queries.
Master advanced SQL features like subqueries, aggregate functions, and common table expressions for effective data transformations.
Apply data cleaning techniques to ensure data consistency and prepare it for deeper analysis and reporting.
Optimize the structure and performance of SQL queries to work seamlessly with large datasets and improve decision-making processes.
Gain practical skills with hands-on examples and exercises to consolidate your SQL abilities for real-world applications.

Author(s)

Raghav Kandarpa and Shivangi Saxena are experienced professionals in data analytics and database management. Their combined expertise in teaching SQL and working on real-world data analysis projects makes them ideal mentors for learning practical data wrangling concepts. They emphasize simplicity and clarity in their approach, offering a practical learning experience.

Who is it for?

This book is designed for data analysts, data scientists, and professionals dealing with business insights who aim to enhance their SQL skills for data wrangling and transformation. It suits those with basic SQL knowledge looking to refine their grasp of data manipulation techniques. Beginners to intermediate-level practitioners in data analysis will find practical guidance here for real-world data challenges. Readers aspiring to use SQL effectively for database analysis and decision-making will benefit greatly.

Perfect PCE, Problematic Politics

2023-07-28 · Moody's Talks - Inside Economics Listen

podcast_episode

by Matt Robison (Beyond Politics) , Cris deRitis , Mark Zandi (Moody's Analytics) , Marisa DiNatale (Moody's Analytics)

Mark, Cris and Marisa (yes, she is back) welcome Matt Robison of the Beyond Politics podcast to talk policy and politics. The discussion ranges from the risk of a government shutdown and Bidenomics to a consideration of whether the nation’s politics are as fractured as they seem and who is going to be the next President. It goes without saying there was also a fulsome conversation about this past week’s economic data - could the numbers have been any better? Even “supercore” inflation was up just 0.2% month over month and 4.2% year over year in June. For more from Matt Robison, click here. For the full transcript, click here. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

dbt Labs on dbt (w/ Daniel Le)

2023-07-28 · The Analytics Engineering Podcast Listen

podcast_episode

by Daniel Le (dbt Labs) , Julia Schottenstein (dbt Labs)

Analytics Engineering Cloud Computing dbt SaaS

Daniel Le is the CFO at dbt Labs where he has built multiple teams. He is also the former head of FP&A and operations at Zoom, and he helped scale FP&A as the former finance director at Okta. In this conversation with Julia, Daniel shares his view as CFO on the challenges SaaS companies face and the importance of finance teams creating a holistic view of their business. Daniel gives advice to data leaders about how they can automate business processes with dbt Cloud and use self-service analytics to automate revenue recognition, generate consistent headcount analytics, and more to impact their organization. Read more about Daniel's story here. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Cross-Platform Data Lineage with OpenLineage

2023-07-28 · Databricks DATA + AI Summit 2023 Watch

video

by Willy Lulciuc (WeWork) , Julien Le Dem (Astronomer)

AI/ML Airflow Flink Data Quality Databricks dbt Spark

There are more data tools available than ever before, and it is easier to build a pipeline than it has ever been. These tools and advancements have created an explosion of innovation. As a result, data within today's organizations has become increasingly distributed and can't be contained within a single brain, a single team, or a single platform. Data lineage can help by tracing the relationships between datasets and providing a map of your entire data universe.

OpenLineage provides a standard for lineage collection that spans multiple platforms, including Apache Airflow, Apache Spark™, Flink®, and dbt. This empowers teams to diagnose and address widespread data quality and efficiency issues in real time. In this session, we will show how to trace data lineage across Apache Spark and Apache Airflow. There will be a walk-through of the OpenLineage architecture and a live demo of a running pipeline with real-time data lineage.

Talk by: Julien Le Dem, Willy Lulciuc

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Internet-Scale Analytics: Migrating a Mission Critical Product to the Cloud

2023-07-28 · Databricks DATA + AI Summit 2023 Watch

video

by Yaniv Kunda

Cloud Computing Data Lakehouse Databricks DWH Cyber Security

While we may not all agree on a “If it ain’t broke, don’t fix it” approach, we can all agree that “If it shows any crack, migrate it to the cloud and completely re-architect it.” Akamai’s CSI (Cloud Security Intelligence) group is responsible for processing massive amounts of security events arriving from our edge network, which is estimated to process 30% of internet traffic, making it accessible by various internal consumers powering customer-facing products.

In this session, we will visit the reasons for migrating one of our mission critical security products and its 10GB ingest pipeline to the cloud, examine our new architecture and its benefits and touch on the challenges we faced during the process (and still do). While our requirements are unique and our solution contains a few proprietary components, this session will provide you with several concepts involving popular off-the-shelf products you can easily use in your own cloud environment.

Talk by: Yaniv Kunda

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

If a Duck Quacks in the Forest and Everyone Hears, Should You Care?

2023-07-28 · Databricks DATA + AI Summit 2023 Watch

video

by Ryan Boyd (Databricks)

Cloud Computing Data Lakehouse Databricks DuckDB DWH Pandas SQL

YES! "Duck posting" has become an internet meme for praising DuckDB on Twitter. Nearly every quack using DuckDB has done it once or twice. But, why all the fuss? With advances in CPUs, memory, SSDs, and the software that enables it all, our personal machines are powerful beasts relegated to handling a few Chrome tabs and sitting 90% idle. As data engineers and data analysts, this seems like a waste that's not only expensive, but also impacting the environment.

In this session, you will see how DuckDB brings SQL analytics capabilities that only recently required a large cluster to a 2MB standalone executable on your laptop. This session will explain the architecture of DuckDB that enables high performance analytics on a laptop: great query optimization, vectorized execution, continuous improvements in compression and more. We will show its capabilities using live demos, from the pandas library to WASM, to the command-line. We'll demonstrate performance on large datasets, and talk about how we're exploring using the laptop to augment cloud analytics workloads.

Talk by: Ryan Boyd

Using Lakehouse to Fight Cancer: Ontada’s Journey to Establish a RWD Platform on Databricks Lakehouse

2023-07-28 · Databricks DATA + AI Summit 2023 Watch

video

by Donghwa Kim

BI Data Lakehouse Databricks DWH NLP Oracle

Ontada, a McKesson business, is an oncology real-world data and evidence, clinical education, and provider technology business dedicated to transforming the fight against cancer. Core to Ontada’s mission is using real-world data (RWD) and evidence generation to improve patient health outcomes and to accelerate life science research.

To support its mission, Ontada embarked on a journey to migrate its enterprise data warehouse (EDW) from an on-premise Oracle database to Databricks Lakehouse. This move allows Ontada to now consume data from any source, including structured and unstructured data from its own EHR and genomics lab results, and realize faster time to insight. In addition, using the Lakehouse has helped Ontada eliminate data silos, enabling the organization to realize the full potential of RWD – from running traditional descriptive analytics to extracting biomarkers from unstructured data. The session will cover the following topics:

Oracle to Databricks: migration best practices and lessons learned
People, process, and tools: expediting innovation while protecting patient information using Unity Catalog
Getting the most out of the Databricks Lakehouse: from BI to genomics, running all analytics under one platform
Hyperscale biomarker abstraction: reducing the manual effort needed to extract biomarkers from large unstructured data (medical notes, scanned/faxed documents) using spaCy and John Snow Labs NLP libraries

Join this session to hear how Ontada is transforming RWD to deliver safe and effective cancer treatment.

Talk by: Donghwa Kim

Data Architecture: IQVIA's Migration to Databricks Lakehouse for High-Performance Analytics

2023-07-27 · Databricks DATA + AI Summit 2023 Watch

video

by Venkat Dasari , William Zanine

AI/ML Data Lakehouse Databricks ETL/ELT

As the healthcare and life science (HLS) industry has grown and evolved, a need has emerged for scalable and cost-effective ETL solutions capable of processing billions of records at terabyte scale. IQVIA has one of the largest global healthcare data networks in the world, with over one million data sources providing access to 1.2B non-identified patient records and 100 billion healthcare records processed annually in over 100 countries. IQVIA’s ability to combine, centralize, and integrate various sources of HLS data enables clinical-to-commercial operational intelligence and omnichannel analytics for its clients. Databricks Lakehouse allows IQVIA to onboard the rapidly growing number of clients while delivering strong business value to customers, cost-efficiently and at scale.

During this session, you will learn more about how IQVIA is leveraging Databricks Lakehouse as well as how HLS organizations can soon access IQVIA data assets through the Databricks Marketplace for quick and secure data sharing.

Talk by: Venkat Dasari and William Zanine

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Data Caching Strategies for Data Analytics and AI

2023-07-27 · Databricks DATA + AI Summit 2023 Watch

video

by Beinan Wang , Chunxu Tang

AI/ML Data Analytics Databricks SQL

The increasing popularity of data analytics and artificial intelligence (AI) has led to a dramatic increase in the volume of data being used in these fields, creating a growing need for enhanced computational capability. Cache plays a crucial role as an accelerator for data and AI computations, but it is important to note that these domains have different data access patterns, requiring different cache strategies. In this session, you will see our observations on data access patterns in the analytical SQL and AI training domains based on practical experience with large-scale systems. We will discuss the evaluation results of various caching strategies for analytical SQL and AI and provide caching recommendations for different use cases. Over the years, we have learned some best practices from big internet companies about the following aspects of our journey:

Traffic pattern for analytical SQL and cache strategy recommendation
Traffic pattern for AI training and how we can measure the cache efficiency for different AI training processes
Cache capacity planning based on real-time metrics of the working set
Adaptive caching admission and eviction for uncertain traffic patterns
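As a rough illustration of the topics above (not the speakers' method), the sketch below measures cache efficiency as a hit ratio and estimates a working set from an access trace. It assumes a plain LRU eviction policy; the class `LRUCache`, the helper functions, and both traces are hypothetical and chosen only to contrast a skewed access pattern with a repeated sequential scan.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key):
        """Record one access; return True on a hit, False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return False

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(trace, capacity):
    """Replay a trace through an LRU cache and return the hit ratio."""
    cache = LRUCache(capacity)
    for key in trace:
        cache.access(key)
    return cache.hit_ratio()


def working_set_size(trace, window):
    """Count distinct keys among the last `window` accesses."""
    return len(set(trace[-window:]))


# Hypothetical traces: a skewed pattern with a few hot keys,
# and a cyclic scan that reads the same four blocks twice.
skewed_trace = ["a", "b", "a", "c", "a", "b", "a", "b"]
scan_trace = [0, 1, 2, 3] * 2

print(simulate(skewed_trace, capacity=2))
print(simulate(scan_trace, capacity=3))
print(simulate(scan_trace, capacity=4))
print(working_set_size(scan_trace, window=8))
```

In this toy setup, the cyclic scan gets no hits from an LRU cache one slot smaller than its working set, while the skewed trace still benefits from a small cache. That is one reason to size capacity from a measured working set rather than from total data volume.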

Talk by: Chunxu Tang and Beinan Wang

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Data Democratization at Michelin

2023-07-27 · Databricks DATA + AI Summit 2023 Watch

video

by Fabien Cochet , Philippe Leonhart

Data Analytics Data Lakehouse Databricks DWH

Too often, business decisions in large organizations are based on time-consuming and labor-intensive data extracts and fragile Excel or Access sheets that require significant manual intervention. The teams that prepare these manual reports have invaluable heuristic knowledge that, when combined with meaningful data and tools, can drive smart business decisions. Imagine a world where these business teams are empowered with tools that help them build meaningful reports despite their limited technical expertise.

In this session, we will discuss:

The value derived from investing in developing citizen data personas within a business organization
How we successfully built a citizen data analytics culture within Michelin
Real examples of the impact of this initiative on the business and on the people themselves

The audience will walk away with some convincing arguments for building a citizen data culture in their organization and a how-to cookbook that they can use to cultivate citizen data personas. Finally, they can interactively uncover key success factors in the case of Michelin that can help drive a similar initiative in their respective companies.

Talk by: Philippe Leonhart and Fabien Cochet

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?

2023-07-27 · Databricks DATA + AI Summit 2023 Watch

video

by Oz Katz (Treeverse)

Arrow Big Data Data Engineering Data Lakehouse Databricks Delta DWH Polars Rust

Rust is a unique language whose traits make it very appealing for data engineering. In this session, we'll walk through the aspects of the language that make it such a good fit for big data processing: how it improves performance, how it provides greater safety guarantees, and how its compatibility with a wide range of existing tools positions it to become a major building block for the future of analytics.

We will also take a hands-on look through real code examples at a few emerging technologies built on top of Rust that utilize these capabilities, and learn how to apply them to our modern lakehouse architecture.

Talk by: Oz Katz

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

talk-data.com
