talk-data.com

Topic: Databricks

Tags: big_data, analytics, spark

1286 tagged activities

Activity Trend: peak of 515 activities per quarter (2020-Q1 to 2026-Q1)

Activities

1286 activities · Newest first

For the fourth year in a row, we still believe in the Databricks IPO, and Paulo Vasconcellos has already gone all in on Stability AI being sold in 2024. And if you listened to the Trends episode we recorded last year, you know we got almost every prediction right!

Now it's that time of year when we try to predict what will be trending in Data and AI for 2024! Join us for this conversation with our community managers, Marlesson Santana and Pietro Oliveira, and the Data Hackers panel.

Place your bets!

As a reminder, you can find all of the Data Hackers community podcasts on Spotify, iTunes, Google Podcasts, Castbox, and many other platforms. If you prefer, you can also listen to the episode right here in this post!


Meet our guests:

Pietro Oliveira and Marlesson Santana

Our Data Hackers panel:

Monique Femme — Head of Community Management at Data Hackers
Paulo Vasconcellos — Co-founder of Data Hackers and Principal Data Scientist at Hotmart
Gabriel Lages — Co-founder of Data Hackers and Data & Analytics Sr. Director at Hotmart

Topics covered in the episode — reference links:

Listen to the 2023 Trends episode: https://medium.com/data-hackers/as-tend%C3%AAncias-para-dados-e-ai-em-2023-data-hackers-podcast-62-2dff6fdddb6e
The Brazilian behind the world's most downloaded AI — Data Hackers Podcast 70: https://medium.com/data-hackers/o-brasileiro-com-a-ia-mais-baixada-do-mundo-data-hackers-podcast-70-e13a8c66fbcd
News article: "Is it true? Louvre Museum catches fire and video goes viral": https://www.folhavitoria.com.br/geral/noticia/01/2024/e-verdade-museu-do-louvre-pega-fogo-e-video-viraliza-assustador-viral
Pika (AI video): https://pika.art/login
Elections in Argentina: AI becomes a campaign weapon: https://olhardigital.com.br/2023/11/17/pro/ia-vira-arma-de-campanha-durante-eleicoes-na-argentina/
Magalu Cloud: https://medium.com/data-hackers/magalu-cloud-por-dentro-da-primeira-cloud-brasileira-em-hiperescala-data-hackers-epis%C3%B3dio-79-3ca324ddf66e

At Databricks, we see organizations looking to leverage the power of AI, not only to deliver intelligent solutions but also to provide intelligent user interfaces. Join us as we delve into how lakehouse architecture forms the backbone of intelligent data platforms, integrating AI to enhance user interaction and self-management. Discover how this evolution is democratizing data and AI access for all data workers in modern organizations, paving the way for the next generation of data- and AI-enabled solutions.

Make your data AI ready with Microsoft Fabric and Azure Databricks | BRK221H

Bring your data into the era of AI with Microsoft Fabric, a powerful all-in-one, AI-powered analytics solution for enterprises that covers everything from data movement to data science, real-time analytics, and business intelligence. Learn how Azure Databricks and Microsoft Fabric work together seamlessly to offer customers a modern, price-performant analytics solution that helps teams turn data into a competitive advantage.

To learn more, please check out these resources:
  • https://aka.ms/Ignite23CollectionsBRK221H
  • https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill
  • https://aka.ms/azure-ignite2023-dataaiblog

Speakers: Justyna Lucznik, Kristen Christensen, Patrick Baumgartner, Eric McChesney, Hannah Chen, Wangui McKelvey, Arthi Ramasubramanian Iyer, Chris Finlan, Christian Wade, Ed Donahue, Kasper de Jonge, Mohammad Ali, Ravs Kaur, Steve Howard, Jessica Hawk, Amir Netz, Arun Ulagaratchagan

Session Information: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

BRK221H | English (US) | Data

MSIgnite

Unify your data across domains, clouds, and engines in OneLake | BRK223H

OneLake simplifies your data estate for the next generation, empowering you to leverage your investments for a truly hybrid and multi-cloud strategy. Learn how to bring your data across clouds, accounts, and engines together faster and more efficiently than ever before. Are you using Azure Databricks? Do you want to understand how to combine it with Microsoft Fabric? We've got you covered. No matter where your data is, OneLake is where we accelerate your data potential, together.

To learn more, please check out these resources:
  • https://aka.ms/Ignite23CollectionsBRK223H
  • https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill
  • https://aka.ms/azure-ignite2023-dataaiblog

Speakers: Joshua Caplan, Priya Sathy, Thasmika Gokal, Tyler Mays-Childers, Ed Donahue, Matthew Hicks, Swetha Mannepalli, Trevor Olson

Session Information: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

BRK223H | English (US) | Data

MSIgnite

Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion.  MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition.  In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of Databricks, and the importance of responsible AI practices. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

No Compromises: Analytics Engineering on the Lakehouse (London) - Coalesce 2023

The Lakehouse architecture has emerged as the ideal data architecture for the AI age. It unites the structured world of analytics with the rapidly evolving world of AI. But what really makes something a “Lakehouse,” and why should you care? In this session, Databricks discusses the key components of a lakehouse, what to look for when adopting this paradigm, and how Databricks and dbt together enable analytics engineers, analysts, and data scientists to collaborate on a single unified platform.

Speaker: Thor List, Senior Field Engineering Manager, Databricks

Register for Coalesce at https://coalesce.getdbt.com

No compromises: Analytics engineering on the Lakehouse - Coalesce 2023

The Lakehouse architecture unites the structured world of analytics with the rapidly evolving world of AI.

But what really makes something a “Lakehouse,” and why should you care? In this session, Databricks discusses the key components of a lakehouse, what to look for when adopting this paradigm, and how Databricks and dbt together enable analytics engineers, analysts, and data scientists to collaborate on a single unified platform. You’ll hear from a customer about how they leveraged dbt Cloud on Databricks to deliver powerful customer experiences quickly and efficiently. Ken Wong shares the latest capabilities of the Databricks platform and provides a sneak peek of upcoming features.

Speakers: Ken Wong, Senior Director of Product Management, Databricks; Samuel Garfield, Analytics Engineer, Retool

Register for Coalesce at https://coalesce.getdbt.com

10x-ing developer experience with Databricks, Delta, and dbt Cloud - Coalesce 2023

In this session, gain strategic guidance on how to deploy dbt Cloud seamlessly to a team of 5-85 people. You'll learn best practices across development and automation that will ensure stability and high standards as you scale the number of developers using dbt Cloud and the number of models built up to the low thousands.

This session is a great fit for folks with beginner through intermediate levels of experience with dbt. In basketball terms, this talk covers mid-range shooting skills, but does not go into detail about 3-pointers, let alone half court shots. Likewise, this talk is not for people who are brand new to dbt and aren't familiar with the basic architecture of dbt and the modern data stack.

Speaker: Chris Davis, Senior Staff Engineer, Udemy, Inc.

Register for Coalesce at https://coalesce.getdbt.com

Summary

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: it’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Decodable is and the story behind it?

What are the notable changes to the Decodable platform since we last spoke? (October 2021)
What are the industry shifts that have influenced the product direction?

What are the problems that customers are trying to solve when they come to Decodable?
When you launched, your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?
What are the developer experience challenges that are particular to working with streaming data?

How have you worked to address that in the Decodable platform and interfaces?

As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
When is Decodable the wrong choice?
What do you have planned for the future of Decodable?

Contact Info

esammer on GitHub LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

  • Decodable (Podcast Episode)
  • Understanding the Apache Flink Journey; Flink (Podcast Episode)
  • Debezium (Podcast Episode)
  • Kafka; Redpanda (Podcast Episode)
  • Kinesis; PostgreSQL (Podcast Episode)
  • Snowflake (Podcast Episode)
  • Databricks; Startree; Pinot (Podcast Episode)
  • Rockset (Podcast Episode)
  • Druid; InfluxDB; Samza; Storm; Pulsar (Podcast Episode)
  • ksqlDB (Podcast Episode)
  • dbt; GitHub Actions; Airbyte; Singer; Splunk; Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Neo4j: NODES Conference

NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks: - Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript - Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g.: graph neural networks, responsible AI) - Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)

Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!

RudderStack:

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize:

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Datafold:

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…

Architecting Data and Machine Learning Platforms

All cloud architects need to know how to build data platforms that enable businesses to make data-driven decisions and deliver enterprise-wide intelligence in a fast and efficient way. This handbook shows you how to design, build, and modernize cloud native data and machine learning platforms using AWS, Azure, Google Cloud, and multicloud tools like Snowflake and Databricks. Authors Marco Tranquillin, Valliappa Lakshmanan, and Firat Tekiner cover the entire data lifecycle from ingestion to activation in a cloud environment using real-world enterprise architectures. You'll learn how to transform, secure, and modernize familiar solutions like data warehouses and data lakes, and you'll be able to leverage recent AI/ML patterns to get accurate and quicker insights to drive competitive advantage.

You'll learn how to:

  • Design a modern and secure cloud native or hybrid data analytics and machine learning platform
  • Accelerate data-led innovation by consolidating enterprise data in a governed, scalable, and resilient data platform
  • Democratize access to enterprise data and govern how business teams extract insights and build AI/ML capabilities
  • Enable your business to make decisions in real time using streaming pipelines
  • Build an MLOps platform to move to a predictive and prescriptive analytics approach

Vector and Raster Data Unification Through H3 | M. Colic | Tech Lead Public Sector UK&I | Databricks

Milos Colic, Tech Lead Public Sector UK&I at Databricks, demonstrates how raster and vector geospatial data can be standardised into a unified domain.

This unification facilitates an easy plugin/plugout capability for all raster and vector layers. Databricks used these principles to design an easy, scalable and extensible Flood Risk for Physical Assets solution using H3 as a unification grid.

To learn more about H3 check out: https://docs.carto.com/data-and-analysis/analytics-toolbox-for-bigquery/sql-reference/h3
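The core idea of H3 as a unification grid can be sketched in a few lines: index a vector feature and a raster pixel centroid to the same cell resolution, and the two layers become joinable on the cell ID. Below is a minimal, hedged illustration assuming the h3-py package (v4 API); the coordinates, resolution, and flood-depth value are made-up example inputs, not figures from the talk.

```python
# Minimal sketch: unify a vector point and a raster pixel centroid on an H3 grid.
# Assumes the h3-py package (v4 API); all values below are illustrative.
import h3

RESOLUTION = 9  # grid resolution; choose per use case

# A "vector" feature: e.g. a physical asset location (lat, lng)
asset_lat, asset_lng = 51.5079, -0.0877

# A "raster" sample: e.g. the centroid of a flood-depth pixel, with its value
pixel_lat, pixel_lng, flood_depth_m = 51.5081, -0.0874, 0.35

asset_cell = h3.latlng_to_cell(asset_lat, asset_lng, RESOLUTION)
pixel_cell = h3.latlng_to_cell(pixel_lat, pixel_lng, RESOLUTION)

# Once both layers carry an H3 cell ID, combining them is a plain equality join.
if asset_cell == pixel_cell:
    print(f"Asset falls in cell {asset_cell}; modelled flood depth = {flood_depth_m} m")
```

At scale the same cell ID serves as the join key across entire raster and vector layers, which is what makes layers pluggable once they are indexed to the grid.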

Data Warehousing using Fivetran, dbt and DBSQL

In this video you will learn how to use Fivetran to ingest data from Salesforce into your Lakehouse. After the data has been ingested, you will learn how to transform it using dbt. Then we will use Databricks SQL to query, visualize, and govern your data. Lastly, we will show you how you can use AI functions in Databricks SQL to call large language models.

Read more about Databricks SQL https://docs.databricks.com/en/sql/index.html#what-is-databricks-sql
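As a rough illustration of that last step, Databricks SQL exposes AI functions such as ai_query for calling a model serving endpoint directly from SQL; the sketch below runs one from a notebook via spark.sql. The endpoint name, catalog/schema/table, and column are hypothetical placeholders, and ai_query availability depends on your workspace configuration.

```python
# Hedged sketch: calling a large language model from Databricks SQL with ai_query.
# "my_llm_endpoint", main.sales.salesforce_feedback, and the "feedback" column are
# hypothetical placeholders; run in a Databricks notebook where `spark` is defined.
summary_df = spark.sql("""
    SELECT
      account_id,
      feedback,
      ai_query(
        'my_llm_endpoint',
        CONCAT('Summarize this customer feedback in one sentence: ', feedback)
      ) AS feedback_summary
    FROM main.sales.salesforce_feedback
""")
summary_df.show(truncate=False)
```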

Distributing Data Governance: How Unity Catalog Allows for a Collaborative Approach

As one of the world’s largest providers of content delivery network (CDN) and security solutions, Akamai owns thousands of data assets of various shapes and sizes, some reaching multiple petabytes. Several departments within the company leverage Databricks for their data and AI workloads, which means we have over a hundred Databricks workspaces within a single Databricks account, where some assets are shared across products and some are product-specific.

In this presentation, we will describe how to use the capabilities of Unity Catalog to distribute the administration burden between departments, while still maintaining a unified governance model.

We will also share the benefits we’ve found in using Unity Catalog, beyond just access management, such as:

  • Visibility into which data assets we have in the organization
  • Ability to identify and potentially eliminate duplicate data workloads between departments
  • Removing boilerplate code for accessing external sources
  • Increasing innovation of product teams by exposing the data assets in a better, more efficient way
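To give a flavour of what distributing the administration burden can look like in practice, one common pattern is a catalog per department whose admin group holds broad privileges on it, plus a shared catalog that every product team can read. The sketch below uses standard Unity Catalog GRANT syntax from a notebook; the catalog and group names are invented for illustration and are not Akamai's actual setup.

```python
# Hedged sketch of per-department delegation with Unity Catalog grants.
# Catalog and group names ("cdn_dept", "security_dept", "shared_assets", ...) are
# illustrative; run from a Databricks notebook where `spark` is available.
statements = [
    # Each department administers its own catalog...
    "GRANT ALL PRIVILEGES ON CATALOG cdn_dept TO `cdn-data-admins`",
    "GRANT ALL PRIVILEGES ON CATALOG security_dept TO `security-data-admins`",
    # ...while shared, read-only assets are exposed to every product team.
    "GRANT USE CATALOG ON CATALOG shared_assets TO `all-product-teams`",
    "GRANT USE SCHEMA ON SCHEMA shared_assets.reference TO `all-product-teams`",
    "GRANT SELECT ON SCHEMA shared_assets.reference TO `all-product-teams`",
]
for stmt in statements:
    spark.sql(stmt)
```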

Talk by: Gilad Asulin and Pulkit Chadha

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Cross-Platform Data Lineage with OpenLineage

There are more data tools available than ever before, and it is easier to build a pipeline than it has ever been. These tools and advancements have created an explosion of innovation, and as a result the data within today's organizations is increasingly distributed and can no longer be contained within a single brain, a single team, or a single platform. Data lineage can help by tracing the relationships between datasets and providing a map of your entire data universe.

OpenLineage provides a standard for lineage collection that spans multiple platforms, including Apache Airflow, Apache Spark™, Flink®, and dbt. This empowers teams to diagnose and address widespread data quality and efficiency issues in real time. In this session, we will show how to trace data lineage across Apache Spark and Apache Airflow. There will be a walk-through of the OpenLineage architecture and a live demo of a running pipeline with real-time data lineage.
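For a concrete feel of the Spark side of this, the OpenLineage Spark integration is typically enabled by attaching its listener and pointing it at a lineage backend through Spark configs. The sketch below is a rough, hedged example: the artifact version, config key names, and the http://localhost:5000 backend URL are assumptions to adapt to the OpenLineage release and backend (e.g. Marquez) you actually deploy.

```python
# Rough sketch: emitting OpenLineage events from a Spark job.
# Package version, config keys, and backend URL below are assumptions; check the
# OpenLineage documentation for the release you deploy.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("openlineage-demo")
    # Pull the OpenLineage Spark agent onto the classpath.
    .config("spark.jars.packages", "io.openlineage:openlineage-spark_2.12:1.9.1")
    .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
    # Where to send lineage events (e.g. a Marquez instance).
    .config("spark.openlineage.transport.type", "http")
    .config("spark.openlineage.transport.url", "http://localhost:5000")
    .config("spark.openlineage.namespace", "demo_pipeline")
    .getOrCreate()
)

# Any reads and writes now emit lineage events describing inputs and outputs.
df = spark.range(100).withColumnRenamed("id", "event_id")
df.write.mode("overwrite").parquet("/tmp/openlineage_demo/events")
```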

Talk by: Julien Le Dem, Willy Lulciuc

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Internet-Scale Analytics: Migrating a Mission Critical Product to the Cloud

While we may not all agree with an “If it ain’t broke, don’t fix it” approach, we can all agree that “If it shows any crack, migrate it to the cloud and completely re-architect it.” Akamai’s CSI (Cloud Security Intelligence) group is responsible for processing massive amounts of security events arriving from our edge network, which is estimated to handle 30% of internet traffic, and for making that data accessible to various internal consumers powering customer-facing products.

In this session, we will visit the reasons for migrating one of our mission-critical security products and its 10GB ingest pipeline to the cloud, examine our new architecture and its benefits, and touch on the challenges we faced during the process (and still face). While our requirements are unique and our solution contains a few proprietary components, this session will provide you with several concepts involving popular off-the-shelf products you can easily use in your own cloud environment.

Talk by: Yaniv Kunda

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: EY | Business Value Unleashed: Real-World Accelerating AI & Data-Centric Transformation

Data and AI are revolutionizing industries and transforming businesses at an unprecedented pace. These advancements pave the way for groundbreaking outcomes such as fresh revenue streams, optimized working capital, and captivating, personalized customer experiences.

Join Hugh Burgin, Luke Pritchard and Dan Diasio as we explore a range of real-world examples of AI and data-driven transformation opportunities being powered by Databricks, including business value realized and technical solutions implemented. We will focus on how to integrate and leverage business insights, a diverse network of cloud-based solutions and Databricks to unleash new business value opportunities. By highlighting real-world use cases we will discuss:

  • Examples of how Manufacturing, Retail, Financial Services and other sectors are using Databricks services to scale AI, gain insights that matter and secure their data
  • The ways data monetization is changing how companies view data and incentivizing better data management
  • Examples of Generative AI and LLMs changing how businesses operate, how their customers engage, and what you can do about it

Talk by: Hugh Burgin and Luke Pritchard

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc