talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

4055 tagged

Activity Trend: 471 peak/qtr, 2020-Q1 to 2026-Q1

Activities

4055 activities · Newest first

Data Exploration and Preparation with BigQuery

In "Data Exploration and Preparation with BigQuery," Michael Kahn provides a hands-on guide to understanding and utilizing Google's powerful data warehouse solution, BigQuery. This comprehensive book equips you with the skills needed to clean, transform, and analyze large datasets for actionable business insights.

What this Book will help me do

Master the process of exploring and assessing the quality of datasets.
Learn SQL for performing efficient and advanced data transformations in BigQuery.
Optimize the performance of BigQuery queries for speed and cost-effectiveness.
Discover best practices for setting up and managing BigQuery resources.
Apply real-world case studies to analyze data and derive meaningful insights.

Author(s)

Michael Kahn is an experienced data engineer and author specializing in big data solutions and technologies. With years of hands-on experience working with Google Cloud Platform and BigQuery, he has assisted organizations in optimizing their data pipelines for effective decision-making. His accessible writing style ensures complex topics become approachable, enabling readers of various skill levels to succeed.

Who is it for?

This book is tailored for data analysts, data engineers, and data scientists who want to learn how to effectively use BigQuery for data exploration and preparation. Whether you're new to BigQuery or looking to deepen your expertise in working with large datasets, this book provides clear guidance and practical examples to achieve your goals.

Kafka Troubleshooting in Production: Stabilizing Kafka Clusters in the Cloud and On-premises

This book provides Kafka administrators, site reliability engineers, and DataOps and DevOps practitioners with a list of real production issues that can occur in Kafka clusters and how to solve them. The production issues covered are assembled into a comprehensive troubleshooting guide for those engineers who are responsible for the stability and performance of Kafka clusters in production, whether those clusters are deployed in the cloud or on-premises. This book teaches you how to detect and troubleshoot the issues, and eventually how to prevent them. Kafka stability is hard to achieve, especially in high throughput environments, and the purpose of this book is not only to make troubleshooting easier, but also to prevent production issues from occurring in the first place. The guidance in this book is drawn from the author's years of experience in helping clients and internal customers diagnose and resolve knotty production problems and stabilize their Kafka environments. The book is organized into recipe-style troubleshooting checklists that field engineers can easily follow when under pressure to fix an unstable cluster. This is the book you will want by your side when the stakes are high, and your job is on the line. 
What You Will Learn
Monitor and resolve production issues in your Kafka clusters
Provision Kafka clusters with the lowest costs and still handle the required loads
Perform root cause analyses of issues affecting your Kafka clusters
Know the ways in which your Kafka cluster can affect its consumers and producers
Prevent or minimize data loss and delays in data streaming
Forestall production issues through an understanding of common failure points
Create checklists for troubleshooting your Kafka clusters when problems occur

Who This Book Is For
Site reliability engineers tasked with maintaining stability of Kafka clusters, Kafka administrators who troubleshoot production issues around Kafka, DevOps and DataOps experts who are involved with provisioning Kafka (whether on-premises or in the cloud), developers of Kafka consumers and producers who wish to learn more about Kafka

Summary

Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. One of the core complexities that needs to be addressed is the fractal set of integrations that need to be managed across the individual components. In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Developing event-driven pipelines is going to be a lot easier - Meet Functions! Memphis functions enable developers and data engineers to build an organizational toolbox of functions to process, transform, and enrich ingested events “on the fly” in a serverless manner using AWS Lambda syntax, without boilerplate, orchestration, error handling, and infrastructure in almost any language, including Go, Python, JS, .NET, Java, SQL, and more. Go to dataengineeringpodcast.com/memphis today to get started!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey and today I'll be sharing an update on my own journey of building a data platform, with a particular focus on the challenges of tool integration and maintaining a single source of truth

Interview

Introduction
How did you get involved in the area of data management?
data sharing
weight of history

existing integrations with dbt
switching cost for e.g. SQLMesh
de facto standard of Airflow

Single source of truth

permissions management across application layers
Database engine
Storage layer in a lakehouse
Presentation/access layer (BI)
Data flows
dbt -> table level lineage
orchestration engine -> pipeline flows

task based vs. asset based

Metadata platform as the logical place for horizontal view

Contact Info

LinkedIn Website

Parting Questio

Unify your data across domains, clouds, and engines in OneLake | BRK223H

OneLake simplifies your data estate for the next generation, empowering you to leverage your investments for a truly hybrid and multi cloud strategy. Learn how to bring your data across clouds, accounts, and engines together faster and more efficiently than ever before. Are you using Azure Databricks? Do you want to understand how to combine that with Microsoft Fabric? We've got you covered. No matter where your data is, OneLake is where we accelerate your data potential, together.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK223H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill * https://aka.ms/azure-ignite2023-dataaiblog

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Joshua Caplan * Priya Sathy * Thasmika Gokal * Tyler Mays-Childers * Ed Donahue * Matthew Hicks * Swetha Mannepalli * Trevor Olson

𝗦𝗲𝘀𝘀𝗶𝗼𝗻 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

BRK223H | English (US) | Data

MSIgnite

Master Platform Engineering: Architecting Scalable and Resilient Systems | BRK209

Organizations have diverse application estates that lead to operations teams managing variable application stacks and complex integrations that are error-prone and difficult to re-use. This session covers platform engineering best practices so you can provide development teams with a consistent and automated experience that empowers them to ship new functionality quickly, securely, and use cloud services as efficiently as possible.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK209H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Dan Sol * Mark Weitzel * Russell Conard * Ed Donahue

BRK209 | English (US) | AI & Apps

MSIgnite

AI and Kubernetes: A winning combination for Modern App Development | BRK208H

The future of app development is at the intersection of AI and cloud-native technologies like Kubernetes. Whether you’re a Dev team using generative AI or an Ops team balancing innovation with security, compliance, and cost, Azure has the tools to help you succeed. Discover how: 1. Cutting-edge features in Azure Kubernetes Service, Azure Functions, & Azure Container Apps help seamlessly bring your intelligent apps to production. 2. AI assistance built into Azure empowers Dev and Ops to scale.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK208H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill * https://aka.ms/azure-ignite2023-dataaiblog

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Devanshi Joshi * Jorge Palma * Kamala Dasika * Daria Grigoriu * Tara E Walker * Ed Donahue * Nate Ceres * Simon Jakesch * Thiago Almeida

BRK208H | English (US) | AI & Apps

MSIgnite

The consequences of data not being easily accessible within an organization are profound. Good decision-making often relies on good information, and with crucial insights locked behind closed doors, decision-makers may have to rely on incomplete information, stifling their ability to innovate through a lack of comprehensive data access or an inability to leverage data to its full potential. The ramifications of this are not merely operational; they extend to the core of an organization's ability to thrive in the data-driven era. However, democratizing access to data is only the first hurdle in driving a data-led organization: employees also need to feel confident in their ability to use data, try new tools, and adopt new processes. But who is best placed to show us the benefits of accessing and utilizing data, and the cultural benefits it can bring?

Lilac Schoenbeck is the Vice President of Strategic Initiatives at Rocket Software. Lilac has two decades of experience in enterprise software, data center technology and cloud, with wider experience in product marketing, pricing and packaging, corporate strategy, M&A integrations and product management. Lilac is passionate about delivering exceptional technology to IT teams that helps them drive value for their businesses.

In the episode, Richie and Lilac explore data democratization and the importance of having widespread data capabilities across an organization, common data problems that data democratization can solve, tooling to facilitate better access and use of data, tool and process adoption, confidence with data, good data culture, processes to encourage good data usage and much more.

Links mentioned in the show
Rocket Software
What Does Democratizing Data Mean? Unlocking the Power of Data Cultures
Democratizing Data in Large Enterprises
[Course] Introduction to Data Culture

Summary

The dbt project has become overwhelmingly popular across analytics and data engineering teams. While it is easy to adopt, there are many potential pitfalls. Dustin Dorsey and Cameron Cyr co-authored a practical guide to building your dbt project. In this episode they share their hard-won wisdom about how to build and scale your dbt projects.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro. That’s three free boards at dataengineeringpodcast.com/miro.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey and today I'm interviewing Dustin Dorsey and Cameron Cyr about how to design your dbt projects

Interview

Introduction
How did you get involved in the area of data management?
What was your path to adoption of dbt?

What did you use prior to its existence?
When/why/how did you start using it?

What are some of the common challenges that teams experience when getting started with dbt?

How does prior experience in analytics and/or software engineering impact those outcomes?

You recently wrote a book to give a crash course in best practices for dbt. What motivated you to invest that time and effort?

What new lessons did you learn about dbt in the process of writing the book?

The introduction of dbt is largely res

Get superior price and performance with Azure cloud-scale databases | BRK224H

Improve performance with the latest capabilities for Azure SQL Databases, Azure Database for PostgreSQL, and SQL Server enabled by Azure Arc for hybrid and multi-cloud. You’ll learn how customers enabled ongoing innovation by migrating to Azure Database for MySQL. This session will cover tactical ways to get the most from your applications with the databases that are easy to use, deliver unmatched price/performance, support open-source and enable transformative AI technologies.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK224H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill * https://aka.ms/ArcSQL * https://aka.ms/azure-ignite2023-dataaiblog

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Chandra Gavaravarapu * Maximilian Conrad * Shireesh Thota * Simon Faber * Vlad Rabenok * Xiaoxuan Guo * Ed Donahue * Aditya Badramraju * Bob Ward * Denzil Ribeiro * Parikshit Savjani

BRK224H | English (US) | Data

MSIgnite

Get ready for ESG Regulation with Microsoft Cloud for Sustainability | BRK272H

New and emerging reporting regulations like CSRD will require many organizations to provide transparency into their environmental, social, and governance (ESG) sustainability progress. This poses a challenge for many companies and means collecting more environmental, social, and governance data than ever before. Come learn how expanded functionality in Microsoft Cloud for Sustainability, including AI, water, and waste data capabilities, can help address these requirements.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK272H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Brandon Potter * Kevin Magarian * Shefy Manayil Kareem * Alejandro Gutierrez * Ravi Gangadharan * Ed Donahue * Heather Wikene * Nikol Vladinska * Robin Smith * Rohan Jha * Shelly Bakke

BRK272H | English (US) | Data

MSIgnite

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services

This book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform. Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models. The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects. Readers will learn how to set up a Google Colaboratory account and run Jupyter notebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and SQL.

What You Will Learn
Set up a GCP account and project
Explore BigQuery and its use cases, including machine learning
Understand Google Cloud AI Platform and its capabilities
Use Vertex AI for training and deploying machine learning models
Explore Google Cloud Dataproc and its use cases for big data processing
Create and share data visualizations and reports with Looker Data Studio
Explore Google Cloud Dataflow and its use cases for batch and stream data processing
Run data processing pipelines on Cloud Dataflow
Explore Google Cloud Storage and its use cases for data storage
Get an introduction to Google Cloud SQL and its use cases for relational databases
Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming

Who This Book Is For
Data scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects

Unlock innovation with AI by migrating enterprise apps to App Service | BRK207H

Discover why Azure App Service is fast growing as the managed platform of choice for migrating on-premises .NET and Java apps to the cloud. Learn how to deploy your web applications with ease, using built-in support for containers and for tools like GitHub and DevOps. Secure your apps with SSL, authentication, and firewall features. We'll also share the latest tools and innovations to lower the cost and time to complete your migration projects.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK207H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill * https://aka.ms/azure-ignite2023-dataaiblog

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Gaurav Seth * Scott Hunter * Tulika Chaudharie * Byron Tardif * Ed Donahue * Michael YenChi Ho * Stefan Schackow * Yutang Lin

BRK207H | English (US) | AI & Apps

MSIgnite

Bring enhanced manageability to SQL Server anywhere with Azure Arc | OD45

Join this discussion to discover how connecting your SQL Servers to Azure can enhance your management, security, and governance capabilities with live demos. SQL Server enabled by Azure Arc is a hybrid cloud solution that allows you to manage, secure and govern your SQL Server estate running anywhere from Azure. Our experts will also explore different options for deploying Azure Arc to your SQL Servers at scale.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsOD45 * https://aka.ms/ArcSQL

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Dhananjay Mahajan * Lance Wright * Nikita Takru * Raj Pochiraju

OD45 | English (US) | Data

MSIgnite

Network Innovation and Cloud Computing Converge with Alkira and Azure | OD27

Microsoft Azure is renowned for its high performance, reliability, and scale. With over 60 regions globally, Azure provides the infrastructure necessary for the Alkira cloud network platform to bridge the gap between all types of networking in the cloud or on-premises. In this session, join Alkira’s CEO Amir Khan and a current customer, as they discuss Azure, Alkira, and cloud network strategy based on real-world experiences.

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Amir Khan * Sukruth Srikantha

OD27 | English (US) | AI & Apps

MSIgnite

Accelerate your business with the Microsoft commercial marketplace | OD25

The Microsoft commercial marketplace is the most comprehensive marketplace to optimize your entire cloud portfolio. With a wide range of offerings covering categories like AI, security, and business applications, as well as tailored solutions for industry-specific needs such as healthcare and financial services, our marketplace streamlines your cloud operations, maximizing the value of your cloud investments and fostering innovation.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsOD25 * https://Azure.com/marketplace * https://aka.ms/marketplacecustomerdocs * https://www.youtube.com/watch?v=QrmQKVlksJs

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Felipe Ospina * Tricia Apperson * Will Kearl

OD25 | English (US) | AI & Apps

MSIgnite

He's BACK! Roger Premo, General Manager, Corporate Strategy and Ventures Development at IBM. How the world has changed in a short year. Generative AI and more!

02:29 Meet Roger Premo Take 2
05:52 A Changing World
08:18 Generative AI
12:48 Both Sides of the Story
14:22 Hybrid Cloud and AI
20:50 IBM's watsonx
25:53 What Have We Learned?
27:46 Enterprise Models
29:59 Hugging Face
31:03 IBM's Differentiation
32:23 The 2 min Bar Pitch
35:57 Three Questions
42:21 An Intentional Hybrid Cloud Architecture
46:40 Responsible AI

LinkedIn: https://www.linkedin.com/in/ropremo/
Website: https://www.ibm.com/watsonx

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Summary

Software development involves an interesting balance of creativity and repetition of patterns. Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI powered assistant for software engineers. In this episode Eran Yahav shares the journey that he has taken in building this product and the ways that it enhances the ability of humans to get their work done, and when the humans have to adapt to the tool.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold, a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Eran Yahav about building an AI powered developer assistant at Tabnine

Interview

Introduction
How did you get involved in machine learning?
Can you describe what Tabnine is and the story behind it?
What are the individual and organizational motivations for using AI to generate code?

What are the real-world limitations of generative AI for creating software? (e.g. size/complexity of the outputs, naming conventions, etc.) What are the elements of skepticism/overs

In today's cloud ecosystem, many laud the visible pillars of AWS's Well-Architected Framework, yet an essential component often remains in the shadows: Infrastructure as Code (IAC). Elizabeth Adeotun Adegbaju, a DevOps Engineer with a rich history in AWS cloud infrastructure, unravels the indispensable role of IAC in fortifying each of the renowned AWS pillars. Through this illuminating talk, attendees will gain insights into the intricate interplay between IAC and AWS's principles of operational excellence, cost optimization, reliability, performance efficiency, security, and sustainability. Dive deep into real-world examples, understand the potential pitfalls of overlooking IAC, and emerge with a renewed appreciation for its foundational significance in cloud architecture. This session is a clarion call for organizations to recognize and harness the power of IAC, positioning it not just as an option but as an imperative in achieving success in the cloud.