talk-data.com

Topic: Business Intelligence (BI)

Tags: data_visualization · reporting · analytics

1211 activities tagged

Activity Trend

Peak of 111 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1211 activities · Newest first

What skills should you learn when studying to be a Data Analyst?

Join me with data legend Luke Barousse to discuss where you should focus your time.

Is it Python? Is it SQL? Is it Excel? Is it Power BI?

Listen to find out 👀

Connect with Luke Barousse:

🤝 Connect on LinkedIn

▶️ Subscribe on YouTube

📊 DataNerd.tech

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(03:42) - Analyzing 1.2M data jobs (DataNerd.tech)

(06:21) - The most important data skills

(12:13) - More senior skills

(22:52) - Data job titles

Connect with Avery:

📺 Subscribe on YouTube

🎙 Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get:

✅ A discount on your enrollment
🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Summary

Data systems are inherently complex and often require the integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations, offering a single location for visibility and error handling so that data platform engineers can manage that complexity. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and its applications, to help inform its implementation in your environment.
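To make the idea concrete, here is a minimal, hedged sketch of dependency-aware orchestration using Dagster's asset API; the asset names and data are invented for illustration and are not from the episode.

# Minimal sketch: Dagster infers execution order from asset dependencies
# and gives one place to observe runs and handle failures.
from dagster import asset, materialize

@asset
def raw_orders():
    # Stand-in for an extraction step (an API pull, a database query, etc.).
    return [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]

@asset
def order_total(raw_orders):
    # Declaring raw_orders as a parameter makes the dependency explicit,
    # so the orchestrator sequences the two steps automatically.
    return sum(order["amount"] for order in raw_orders)

if __name__ == "__main__":
    result = materialize([raw_orders, order_total])  # runs in dependency order
    assert result.success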

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm welcoming back Nick Schrock to talk about the state of the ecosystem for data orchestration.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by defining what data orchestration is and how it differs from other types of orchestration systems? (e.g. container orchestration, generalized workflow orchestration, etc.)
What are the misconceptions about the applications of/need for/cost to implement data orchestration?

How do those challenges of customer education change across roles/personas?

Because of the multi-faceted nature of data in an organization, how does that influence the capabilities and interfaces that are needed in an orchestration engine?
You have been working on Dagster for five years now. How have the requirements/adoption/application for orchestrators changed in that time?
One of the challenges for any orchestration engine is to balance the need for robust and extensible core capabilities with a rich suite of integrations to the broader data ecosystem. What are the factors that you have seen have the most influence in driving adoption of a given engine?
What are the most interesting, innovative, or unexpected ways that you have seen data orchestration implemented and/or used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Dagster?

For the past few years, we've seen the importance of data literacy and why organizations must invest in a data-driven culture, mindset, and skillset. However, as generative AI tools like ChatGPT have risen to prominence in the past year, AI literacy has never been more important. But how do we begin to approach AI literacy? Is it an extension of data literacy, a complement, or a new paradigm altogether? How should you get started on your AI literacy ambitions?

Cindi Howson is the Chief Data Strategy Officer at ThoughtSpot and host of The Data Chief podcast. Cindi is a data analytics, AI, and BI thought leader and an expert with a flair for bridging business needs with technology. As Chief Data Strategy Officer at ThoughtSpot, she advises top clients on data strategy and best practices to become data-driven, speaks internationally on top trends such as AI ethics, and influences ThoughtSpot’s product strategy.

Cindi was previously a Gartner Research Vice President, the lead author for the data and analytics maturity model and the analytics and BI Magic Quadrant, and a popular keynote speaker. She introduced new research in data and AI for good, NLP/BI Search, and augmented analytics, bringing both BI bake-offs and innovation panels to Gartner globally. She’s frequently quoted in MIT, Harvard Business Review, and Information Week. She is rated a top 12 influencer in big data and analytics by Analytics Insight, Onalytica, Solutions Review, and Humans of Data.

In the episode, Cindi and Adel discuss how generative AI accelerates an organization’s data literacy, how leaders can think beyond data literacy and start to think about AI literacy, the importance of responsible use of AI, how to best communicate the value of AI within your organization, what generative AI means for data teams, AI use-cases in the data space, the psychological barriers blocking AI adoption, and much more. 

Links Mentioned in the Show:

The Data Chief Podcast
ThoughtSpot Sage
BloombergGPT
Radar: Data & AI Literacy
Course: AI Ethics
Course: Generative AI Concepts
Course: Implementing AI Solutions in Business

Summary

Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration under your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Your host is Tobias Macey and today I'm interviewing Adrian Brudaru about dlt, an open source Python library for data loading.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what dlt is and the story behind it?

What is the problem you want to solve with dlt?
Who is the target audience?

The obvious comparison is with systems like Singer/Meltano/Airbyte in the open source space, or Fivetran/Matillion/etc. in the commercial space. What are the complexities or limitations of those tools that leave an opening for dlt?
Can you describe how dlt is implemented?
What are the benefits of building it in Python?
How have the design and goals of the project changed since you first started working on it?
How does that language choice influence the performance and scaling characteristics?
What problems do users solve with dlt?
What are the interfaces available for extending/customizing/integrating with dlt?
Can you talk through the process of adding a new source/destination?
What is the workflow for someone building a pipeline with dlt? (A sketch follows this list.)
How does the experience scale when supporting multiple connections?
Given the limited scope of extract and load, and the composable design of dlt it seems like a purpose built companion to dbt (down to th
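As a concrete companion to the pipeline-workflow question above, here is a minimal, hedged sketch of dlt's core API; the rows and the local DuckDB destination are placeholders chosen for illustration, not details from the episode.

# Minimal dlt sketch: declare a pipeline, hand it data, and let dlt
# infer the schema, normalize the rows, and load them.
import dlt

pipeline = dlt.pipeline(
    pipeline_name="example_pipeline",
    destination="duckdb",          # illustrative local destination
    dataset_name="example_data",
)

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

load_info = pipeline.run(rows, table_name="items")
print(load_info)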

Business Intelligence Career Master Plan

Embark on your business intelligence career with 'Business Intelligence Career Master Plan'. This book provides you with a clear roadmap, actionable insights, and expert advice to help you navigate the challenges of building a successful career in BI. You'll learn everything from identifying your starting point in BI to developing critical skills in data analysis, visualization, and management.

What this book will help me do:
Understand various business intelligence roles and their responsibilities to find your ideal BI career path.
Develop expertise in using tools like Power BI and databases like AdventureWorks to handle and analyze data effectively.
Master the art of creating informative and compelling data visualizations to tell impactful data stories.
Gain the technical skills needed for programming and system development to excel in the BI field.
Learn how to automate and optimize BI workflows to enhance productivity and efficiency.

Author(s):
The authors, Chavez and Moncada, excel in mentoring aspiring business intelligence professionals. With vast experience in BI systems and project management, they aim to make technical concepts accessible and fascinating. Their hands-on guidance empowers readers to build essential skills and thrive in the BI field.

Who is it for?
This book is ideal for aspiring business intelligence developers and data analysts eager to advance their careers. If you're passionate about data and enjoy solving complex problems, this resource will equip you with the knowledge and tools to succeed. Starting with a foundational understanding of common tools like Excel and SQL is recommended to get the most out of this book.

Mastering Tableau 2023 - Fourth Edition

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau.

What this book will help me do:
Master the latest Tableau 2023 features and use cases to tackle analytics challenges.
Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation.
Integrate Tableau with programming languages such as Python and R to enhance analytics.
Create engaging, visually impactful dashboards for effective data storytelling.
Understand and apply data governance to ensure data quality and compliance.

Author(s):
Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing.

Who is it for?
This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

Summary

Data persistence is one of the most challenging aspects of computer systems. In the era of the cloud most developers rely on hosted services to manage their databases, but what if you are a cloud service? In this episode Vignesh Ravichandran explains how his team at Cloudflare provides PostgreSQL as a service to their developers for low latency and high uptime services at global scale. This is an interesting and insightful look at pragmatic engineering for reliability and scale.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Vignesh Ravichandran about building an internal database as a service platform at Cloudflare.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing the different database workloads that you have at Cloudflare?

What are the different methods that you have used for managing database instances?

What are the requirements and constraints that you had to account for in designing your current system?
Why Postgres?
Optimizations for Postgres
Simplification from not supporting multiple engines
Limitations in Postgres that make multi-tenancy challenging (see the sketch after this list)
Scale of operation (data volume, request rate)
What are the most interesting, innovative, or unexpected ways that you have seen your DBaaS used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on your internal database platform?
When is an internal database as a service the wrong choice?
What do you have planned for the future of Postgres hosting at Cloudflare?
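For the multi-tenancy point above, here is a deliberately generic, hedged sketch of one isolation pattern on a shared Postgres cluster; it is illustrative only, not Cloudflare's actual design, and the role name, connection limit, and DSN are invented.

# Illustrative tenant provisioning on shared Postgres (not Cloudflare's code).
import psycopg2

def provision_tenant(dsn: str, tenant: str) -> None:
    # `tenant` must be a trusted identifier; production code would quote
    # or validate it instead of interpolating it into SQL.
    with psycopg2.connect(dsn) as conn:  # commits the transaction on clean exit
        with conn.cursor() as cur:
            cur.execute(f"CREATE ROLE {tenant} LOGIN CONNECTION LIMIT 20")
            cur.execute(f"CREATE SCHEMA {tenant} AUTHORIZATION {tenant}")
            cur.execute(f"REVOKE ALL ON SCHEMA {tenant} FROM PUBLIC")

provision_tenant("dbname=app user=admin", "tenant_a")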

Contact Info

LinkedIn Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.

Extending Microsoft Business Central with Power Platform

Unlock the full potential of Microsoft Business Central by integrating it with the Power Platform through this practical and hands-on guide. With step-by-step tutorials, you'll learn how to combine the capabilities of tools like Power Apps, Power Automate, and Dataverse to build scalable and efficient business solutions. By the end of the book, you'll be equipped to streamline business processes and add significant value.

What this book will help me do:
Effectively deploy Power Platform functionalities for Microsoft Business Central projects.
Seamlessly connect Business Central with cloud and on-premises services.
Leverage Dataverse and virtual tables to enhance data modeling and accessibility.
Build custom applications using Power Apps and automate workflows with Power Automate.
Generate advanced visual reports with Power BI directly integrated with Business Central.

Author(s):
Kim Congleton and Shawn Sissenwein are industry professionals with extensive experience in ERP systems and Microsoft technologies. With deep knowledge of Business Central and the Power Platform, they bring practical insights into maximizing business value through technological advancements. Their teaching approach focuses on hands-on learning, real-world application, and empowering readers with actionable skills.

Who is it for?
This book is ideal for Business Central users, consultants, and solution architects aiming to enhance Business Central's capabilities through the Power Platform. If you're familiar with Business Central's basics and seek to optimize and extend its functionality without requiring extensive programming knowledge, then this guide is tailored for you.

Summary

Generative AI has unlocked a massive opportunity for content creation. There is also an unfulfilled need for experts to be able to share their knowledge and build communities. Illumidesk was built to take advantage of this intersection. In this episode Greg Werner explains how they are using generative AI as an assistive tool for creating educational material, as well as building a data-driven experience for learners.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Greg Werner about building IllumiDesk, a data-driven and AI-powered online learning platform.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Illumidesk is and the story behind it?
What are the challenges that educators and content creators face in developing and maintaining digital course materials for their target audiences?
How are you leaning on data integrations and AI to reduce the initial time investment required to deliver courseware?
What are the opportunities for collecting and collating learner interactions with the course materials to provide feedback to the instructors?
What are some of the ways that you are incorporating pedagogical strategies into the measurement and evaluation methods that you use for reports?
What are the different categories of insights that you need to provide across the different stakeholders/personas who are interacting with the platform and learning content?
Can you describe how you have architected the Illumidesk platform?
How have the design and goals shifted since you first began working on it?
What are the strategies that you have used to allow for evolution and adaptation of the system in order to keep pace with the ecosystem of generative AI capabilities?
What are the failure modes of the content generation that you need to account for?
What are the most interesting, innovative, or unexpected ways that you have seen Illumidesk used?

In a lively conversation, we dive into the world of data professionals and their essential skills, with a special focus on the powerful Power BI and other tools. Discover how analytics tools like Power BI are shaping the future of the data and analytics field.

In this episode of Data Hackers — the largest AI and Data Science community in Brazil — meet two professionals who are passionate about the data field and leading references on the subject: Karine Lago — specialist in Business Intelligence, Power BI, and Excel, awarded by Microsoft more than seven times, and an author; and Letícia Smirelli — Chief Product Officer (CPO), Power BI Specialist, Microsoft Data Analyst Associate, and DataViz & Dashboard Design specialist; both are partners at Nexos Educação.

Remember that you can find all of the Data Hackers community podcasts on Spotify, iTunes, Google Podcasts, Castbox, and many other platforms. If you prefer, you can also listen to the episode right here in this post!

Link Medium: https://medium.com/data-hackers/power-bi-dashboards-e-a-carreira-de-analista-de-dados-data-hackers-podcast-72-829986f5f2a1

What we covered in the episode

Meet our guests:

Karine Lago — specialist in Business Intelligence, author, and Power BI and Excel expert, awarded by Microsoft more than seven times;
Letícia Smirelli — Chief Product Officer (CPO), Power BI Specialist, Microsoft Data Analyst Associate, and DataViz & Dashboard Design specialist.

Data Hackers panel:

Paulo Vasconcellos, Gabriel Lages, and Monique Femme

Reference links:

Tech and Cheers — Meetup ed. Data Connect (São Paulo): https://www.sympla.com.br/evento/tech-and-cheers-meetup-ed-data-connect/2110360
https://towardsdatascience.com/whats-the-difference-between-analytics-and-statistics-cd35d457e17

Tech and Cheers — ed. Mulher.ADA (Blumenau): https://www.sympla.com.br/evento/tech-and-cheers-meetup-ed-mulher-ada/2109236

World Economic Forum (The Future of Jobs Report 2023): https://www.weforum.org/reports/the-future-of-jobs-report-2023/
Karine Lago's YouTube channel: https://www.youtube.com/@KarineLago
Karine Lago's page: https://keepo.io/karinedolago/?fbclid=PAAaZ32JXyRtPv7wcHcfaxtKA5TOU9VRaCt_F_nb7zhAptO4AtthorxiHWCdg_aem_Ab53sgYj0AXg1wHrOP9-c_K7pwoMqX0psYWAvNMAanqh5pafTHBFb3bnshKB534J9AA
Letícia Smirelli's YouTube channel: https://www.youtube.com/@LeticiaSmirelli
Letícia Smirelli's page: https://keepo.io/leticia/?fbclid=PAAabu7cvnFTkkFw1UiJrDMIXiMJ45Av6XKlCXIfWAUiRH2c4kiSZzo7FX6TY_aem_Ab7BHn25MaVK22HFw9zXNfsYv5k5Y5o9WLMGZeFB9wSSSAV3d7EDA0JuGjXWSqd_SEs

Summary

Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful, you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data. In this episode Ariel Pohoryles explains what they are and how they work together to increase your chances of success.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about the seven principles of modern data pipelines.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by defining what you mean by a "modern" data pipeline?
At Rivery you published a white paper identifying seven principles of modern data pipelines:

1. Zero infrastructure management
2. ELT-first mindset
3. Speaks SQL and Python
4. Dynamic multi-storage layers
5. Reverse ETL & operational analytics
6. Full transparency
7. Faster time to value

What are the applications of data that you focused on while identifying these principles?
How does the application of these principles influence the ability of organizations and their data teams to encourage and keep pace with the use of data in the business?
What are the technical components of a pipeline infrastructure that are necessary to support a "modern" workflow?
How do the technologies involved impact the organizational involvement with how data is applied throughout the business?
When using managed services, what are the ways that the pricing model acts to encourage/discourage experimentation/exploration with data?
What are the most interesting, innovative, or unexpected ways that you have seen these seven principles implemented/applied?
What are the most interesting, unexpected, or challenging lessons that you have learned while working with customers to adapt to these principles?
What are the cases where some/all of these principles are undesirable/impractical to implement?
What are the opportunities for further advancement/sophistication in the ways that teams work with and gain value from data?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned somethi

Using Lakehouse to Fight Cancer: Ontada’s Journey to Establish an RWD Platform on Databricks Lakehouse

Ontada, a McKesson business, is an oncology real-world data and evidence, clinical education, and technology business dedicated to transforming the fight against cancer. Core to Ontada’s mission is using real-world data (RWD) and evidence generation to improve patient health outcomes and to accelerate life science research.

To support its mission, Ontada embarked on a journey to migrate its enterprise data warehouse (EDW) from an on-premises Oracle database to the Databricks Lakehouse. This move allows Ontada to consume data from any source, including structured and unstructured data from its own EHR and genomics lab results, and realize faster time to insight. In addition, using the Lakehouse has helped Ontada eliminate data silos, enabling the organization to realize the full potential of RWD – from running traditional descriptive analytics to extracting biomarkers from unstructured data. The session will cover the following topics:

  • Oracle to Databricks: migration best practices and lessons learned
  • People, process, and tools: expediting innovation while protecting patient information using Unity Catalog
  • Getting the most out of the Databricks Lakehouse: from BI to genomics, running all analytics under one platform
  • Hyperscale biomarker abstraction: reducing the manual effort needed to extract biomarkers from large unstructured data (medical notes, scanned/faxed documents) using spaCy and John Snow Labs NLP libraries (a sketch follows below)

Join this session to hear how Ontada is transforming RWD to deliver safe and effective cancer treatment.
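As a generic taste of the NLP step named in the biomarker bullet, here is a hedged sketch using spaCy's small general-purpose English model; the clinical note is invented, and real biomarker abstraction (as described in the talk) relies on domain models such as John Snow Labs' clinical NLP rather than this model.

# Generic spaCy entity pass; assumes: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
note = "Patient tested EGFR-positive in March; no ALK rearrangement was detected."
doc = nlp(note)

for ent in doc.ents:
    # A general model surfaces generic entities; clinical models would
    # label biomarkers like EGFR and ALK directly.
    print(ent.text, ent.label_)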

Talk by: Donghwa Kim

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Sisense-Developing Data Products: Infusion & Composability Are Changing Expectations

Composable analytics is the next progression of business intelligence. We will discuss how current analytics rely on two key principles: composability and agility. By modularizing our analytics capabilities, we can rapidly “compose” new data applications. An organization uses these building blocks to deliver customized analytics experiences at a customer level.

This session will orient business intelligence leaders to composable data and analytics.

  • How data teams can use composable analytics to decrease application development time.
  • How an organization can leverage existing and new tools to maximize value-based, data-driven insights.
  • Requirements for effectively deploying composable analytics.
  • Utilizing no-code, low-code, and high-code analytics capabilities.
  • Extracting full value from your customer data and metadata.
  • Leveraging analytics building blocks to create new products and revenue streams.

Talk by: Scott Castle

Here’s more to explore:
Why the Data Lakehouse Is Your Next Data Warehouse: https://dbricks.co/3Pt5unq
Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

JetBlue’s Real-Time AI & ML Digital Twin Journey Using Databricks

JetBlue has embarked on an AI and ML transformation over the past year. Databricks has been instrumental in this transformation thanks to the ability to integrate streaming pipelines, ML training using MLflow, ML API serving using the model registry, and more in one cohesive platform. Real-time streams of weather, aircraft sensors, FAA data feeds, JetBlue operations, and more feed the world's first AI and ML operating system orchestrating a digital twin, known as BlueSky, for efficient and safe operations. JetBlue has over 10 ML products (multiple models per product) in production across multiple verticals, including dynamic pricing, customer recommendation engines, supply chain optimization, customer sentiment NLP, and several more.

The core JetBlue data science and analytics team consists of Operations Data Science, Commercial Data Science, AI and ML Engineering, and Business Intelligence. To support rapid growth and a faster go-to-market strategy, the team has built an internal Data Catalog + AutoML + AutoDeploy wrapper called BlueML using Databricks features, empowering data scientists, including advanced analysts, to train and deploy ML models in fewer than five lines of code.
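BlueML itself is internal to JetBlue, so as a hedged sketch only, the MLflow primitives the talk says it builds on (autologging plus tracked runs) look roughly like this; the dataset and model are stand-ins.

# Not BlueML: a sketch of the underlying MLflow autologging workflow.
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

mlflow.autolog()  # records params, metrics, and the fitted model automatically

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
with mlflow.start_run(run_name="demo"):
    LogisticRegression(max_iter=1000).fit(X, y)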

Talk by: Derrick Olson and Rob Bajra

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Why a Major Japanese Financial Institution Chose Databricks To Accelerate its Data AI-Driven Journey

In this session, NTT DATA presents a case study involving one of the largest and most prominent financial institutions in Japan. The project involved migrating the institution's largest data analysis platform to Databricks, which required careful navigation of very strict security requirements while accommodating the needs of evolving technical solutions so they could support a wide variety of company structures. This session is for those who want to accelerate their business by effectively utilizing AI as well as BI.

NTT DATA is one of the largest system integrators in Japan, providing data analytics infrastructure to leading companies to help them effectively drive the democratization of data and AI, as many in the Japanese market are now adding AI to their BI offerings.

Talk by: Yuki Saito

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks SQL: Why the Best Serverless Data Warehouse is a Lakehouse

Many organizations rely on complex cloud data architectures that create silos between applications, users and data. This fragmentation makes it difficult to access accurate, up-to-date information for analytics, often resulting in the use of outdated data. Enter the lakehouse, a modern data architecture that unifies data, AI, and analytics in a single location.

This session explores why the lakehouse is the best data warehouse, featuring success stories, use cases and best practices from industry experts. You'll discover how to unify and govern business-critical data at scale to build a curated data lake for data warehousing, SQL and BI. Additionally, you'll learn how Databricks SQL can help lower costs and get started in seconds with on-demand, elastic SQL serverless warehouses, and how to empower analytics engineers and analysts to quickly find and share new insights using their preferred BI and SQL tools such as Fivetran, dbt, Tableau, or Power BI.

Talk by: Miranda Luna and Cyrielle Simeone

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Real-Time Streaming Solution for Call Center Analytics: Business Challenges and Technical Enablement

A large international client with a business footprint in North America, Europe, and Africa reached out to us with an interest in having a real-time streaming solution designed and implemented for its call center handling incoming and outgoing client calls. The client had a previous bad experience with another vendor, who overpromised and underdelivered on the latency of the streaming solution. The previous vendor delivered an overly complex streaming data pipeline, resulting in data taking over five minutes to reach a visualization layer. The client felt that the architecture was too complex and involved too many services integrated together.

Our immediate challenges involved gaining the client's trust and proving that our design and implementation quality would supersede the previous experience. To resolve the immediate challenge of the overly complicated pipeline design, we deployed a Databricks Lakehouse architecture with Azure Databricks at the center of the solution. Our reference architecture integrated Genesys Cloud → App Services → Event Hub → Databricks → Data Lake → Power BI.

The streaming solution proved to be low latency (seconds) during the POV stage, which led to subsequent productionization of the pipeline, with deployment of jobs and DLT pipelines, including a multi-notebook workflow and the business and performance metrics dashboards relied on by call center staff for day-to-day performance monitoring and improvement.
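As one hedged reading of the Event Hub → Databricks hop in that reference architecture, Structured Streaming can consume Event Hubs through its Kafka-compatible endpoint; the namespace and topic below are placeholders, not details from the engagement.

# Sketch of the ingestion step; runs in a Databricks notebook where `spark`
# is predefined. Full SASL/JAAS auth options are omitted for brevity.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
    .option("kafka.security.protocol", "SASL_SSL")
    .option("subscribe", "<eventhub-name>")
    .load()
)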

Talk by: Natalia Demidova

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Streaming Data Analytics with Power BI and Databricks

This session comprises a series of end-to-end technical demos illustrating the synergy between Databricks and Power BI for streaming use cases, along with considerations about when to choose which scenario (a sketch of Scenario 1 follows the list):

Scenario 1: DLT + Power BI Direct Query and Auto Refresh

Scenario 2: Structured Streaming + Power BI streaming datasets

Scenario 3: DLT + Power BI composite datasets
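As a hedged sketch of the Delta Live Tables half of Scenario 1, the table below could back a Power BI Direct Query report with auto refresh; the source path, format, and filter are invented, and the code runs only inside a Databricks DLT pipeline, where `spark` is provided.

# Illustrative DLT table (not code from the session).
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Streaming events surfaced to Power BI via Direct Query")
def events_live():
    return (
        spark.readStream.format("cloudFiles")   # Auto Loader
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events")            # placeholder path
        .where(col("event_type").isNotNull())
    )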

Talk by: Liping Huang and Marius Panga

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Building Apps on the Lakehouse with Databricks SQL

BI applications are undoubtedly one of the major consumers of a data warehouse. Nevertheless, the prospect of accessing data using standard SQL is appealing to many more stakeholders than just the data analysts. We’ve heard from customers that they experience an increasing demand to provide access to data in their lakehouse platforms from external applications beyond BI, such as e-commerce platforms, CRM systems, SaaS applications, or custom data applications developed in-house. These applications require an “always on” experience, which makes Databricks SQL Serverless a great fit.

In this session, we give an overview of the approaches available to application developers to connect to Databricks SQL and create modern data applications tailored to the needs of users across an entire organization. We discuss when to choose one of the Databricks native client libraries for languages such as Python, Go, or Node.js, and when to use the SQL Statement Execution API, the newest addition to the toolset. We also explain when ODBC and JDBC might not be the best fit for the task and when they are your best friends. Live demos are included.
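For instance, a minimal connection with the Databricks SQL Connector for Python (one of the native client libraries mentioned above) looks like the sketch below; the hostname, HTTP path, and token are placeholders.

# pip install databricks-sql-connector; credentials below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<sql-warehouse-http-path>",
    access_token="<personal-access-token>",
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1 AS ok")
        print(cursor.fetchall())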

Talk by: Adriana Ispas and Chris Stevens

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks SQL Serverless Under the Hood: How We Use ML to Get the Best Price/Performance

Join this session to learn how Databricks SQL Serverless warehouses use ML to make large improvements in price/performance for both ETL and BI workloads. We will demonstrate how they can cater to an organization’s peak concurrency needs for BI and showcase the latest advancements in resource-based scheduling, autoscaling, and caching enhancements that allow for seamless performance and workload management. We will dive deep into new features such as Predictive I/O and Intelligent Workload Management, and show new price/performance benchmarks.

Talk by: Gaurav Saraf, Mostafa Mokhtar, and Jeremy Lewallen

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc