talk-data.com

Topic

Big Data

data_processing analytics large_datasets

1217 tagged

Activity Trend

Peak: 28 activities per quarter (2020-Q1 to 2026-Q1)

Activities

1217 activities · Newest first

In today’s data-driven era, ensuring data reliability and enhancing our testing and development capabilities are paramount. Local unit testing has its merits but falls short when dealing with the volume of big data. One major challenge is running Spark jobs pre-deployment to ensure they produce expected results and handle production-level data volumes. In this talk, we will discuss how Autodesk leveraged Astronomer to improve pipeline development. We’ll explore how it addresses challenges with sensitive and large data sets that cannot be transferred to local machines or non-production environments. Additionally, we’ll cover how this approach supports over 10 engineers working simultaneously on different feature branches within the same repo. We will highlight the benefits, such as conflict-free development and testing, and eliminating concerns about data corruption when running DAGs on production Airflow servers. Join me to discover how solutions like Astronomer empower developers to work with increased efficiency and reliability. This talk is perfect for those interested in big data, cloud solutions, and innovative development practices.

One of the biggest challenges organizations, big or small, face when they want to work with data effectively is lack of access to it. This is where building a data platform comes in. But building a data platform is no easy feat. It's not just about centralizing data in the data warehouse; it's also about making sure that data is actionable, trustworthy, and usable. So, how do you make sure your data platform is up to par? Shuang Li is Group Product Manager at Box. With experience building data, analytics, ML, and observability platform products for both external and internal customers, Shuang is passionate about the insights, optimizations, and predictions that big data and AI/ML make possible. Throughout her career, she transitioned from academia to engineering, from engineering to product management, and then from individual contributor to emerging product executive. In the episode, Adel and Shuang explore her career journey, including transitioning from academia to engineering and her work on Google Fiber, how to build a data platform, ingestion pipelines, processing pipelines, challenges and milestones in building a data platform, data observability and quality, developer experience, data democratization, future trends, and a lot more.

Links mentioned in the show:
- Box
- Connect with Shuang on LinkedIn
- [Course] Understanding Modern Data Architecture
- Related episode: Scaling Enterprise Analytics with Libby Duane Adams, Chief Advocacy Officer and Co-Founder of Alteryx
- New to DataCamp? Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for Business

Machine Learning Powered Auto Remediation in Netflix Data Platform

Speakers: Stephanie Vezich Tamayo (Senior Machine Learning Engineer at Netflix) and Binbing Hou (Senior Software Engineer at Netflix)

This tech talk is part of the Data Engineering Open Forum at Netflix 2024. At Netflix, hundreds of thousands of workflows and millions of jobs run every day on our big data platform, but diagnosing and remediating job failures can impose considerable operational burdens. To handle errors efficiently, Netflix developed a rule-based classifier for error classification called “Pensive.” However, as the system has increased in scale and complexity, Pensive has been facing challenges due to its limited support for operational automation, especially for handling memory configuration errors and unclassified errors. To address these challenges, we have developed a new feature called “Auto Remediation,” which integrates the rule-based classifier with an ML service.

If you are interested in attending a future Data Engineering Open Forum, we highly recommend you join our Google Group (https://groups.google.com/g/data-engineering-open-forum) to stay tuned to event announcements.
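The architecture described above, a rule-based classifier backed by an ML service for the errors the rules miss, can be sketched as follows. The rule patterns, class names, and remediation actions here are hypothetical illustrations; the actual Pensive rules and Auto Remediation policies are not public.

```python
import re

# Hypothetical rule table: regex pattern -> (error class, remediation action).
RULES = [
    (re.compile(r"OutOfMemoryError|Container killed .* memory"),
     ("MEMORY_CONFIG", "increase executor memory and retry")),
    (re.compile(r"FileNotFoundException"),
     ("MISSING_INPUT", "retry after upstream backfill")),
]

def ml_classify(log_text: str) -> tuple:
    """Stub for the ML service that handles errors the rules do not cover.

    A real service would return a predicted error class and a suggested
    configuration change; here we just return a placeholder.
    """
    return ("UNCLASSIFIED", "escalate to on-call")

def classify_and_remediate(log_text: str) -> dict:
    """Try the rule-based classifier first; fall back to the ML service."""
    for pattern, (error_class, action) in RULES:
        if pattern.search(log_text):
            return {"class": error_class, "action": action, "source": "rules"}
    error_class, action = ml_classify(log_text)
    return {"class": error_class, "action": action, "source": "ml"}

print(classify_and_remediate("java.lang.OutOfMemoryError: Java heap space"))
```

The key design point is that the rules remain the first line of classification, so existing behavior is preserved, while the ML fallback turns previously unclassified failures into automatable remediations.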

Having a strong personal brand is one of the best things you can do to stand out from your competition in today's difficult job market. In this episode, you'll learn why brand building should be at the top of your list and, more importantly, hear actionable tips that you can use to make progress right away. We'll be sharing some of the best strategies, actionable advice, and personal anecdotes from two of the best personal brand builders in data, Kate Strachnyi and Kristen Kehrer.

You'll leave with a concrete path to building your brand and accelerating your career, starting today.

What you'll learn:
- Why personal brands matter more than ever in 2024
- What a strong personal brand looks like
- How to start building your personal brand online

Register for free to be part of the next live session: https://bit.ly/3XB3A8b

About our guests: As the founder of DATAcated, Kate Strachnyi helps companies amplify their brand and expertise in artificial intelligence, machine learning, and data science. Kate is a content creator with over 200k followers across LinkedIn, YouTube, Instagram, and other platforms. She also runs a DATAcated Plus program with 25+ influencers who can be hired to 'make a splash' on social media. As a marketing and branding expert, Kate has been recognized as a LinkedIn Top Voice in Data Science and Analytics for 2018 and 2019, and as a DataIQ USA100 for 2022. Kate is also the author of ColorWise: A Data Storyteller's Guide to the Intentional Use of Color. https://www.datacated.com/brand-builder

Kristen Kehrer has been providing innovative and practical statistical modeling solutions in the utilities, healthcare, and eCommerce sectors since 2010. Alongside her professional accomplishments, she achieved recognition as a LinkedIn Top Voice in Data Science & Analytics in 2018. Kristen is also the founder of Data Moves Me, LLC, and has previously served as a faculty member and subject matter expert at the Emeritus Institute of Management and UC Berkeley Ext.

Kristen lights up on stage and has spoken at conferences including ODSC, DataScienceGO, BI+Analytics Conference, Boye Conference, and Big Data LDN.

She holds a Master of Science degree in Applied Statistics from Worcester Polytechnic Institute and a Bachelor of Science degree in Mathematics.

https://www.datamovesme.com/

Databricks Certified Associate Developer for Apache Spark Using Python

This book serves as the ultimate preparation for aspiring Databricks Certified Associate Developers specializing in Apache Spark. Dive deep into Spark's components, its applications, and exam techniques to achieve certification and expand your practical skills in big data processing and real-time analytics using Python.

What this book will help you do:
- Deeply understand Apache Spark's core architecture for building big data applications.
- Write optimized SQL queries and leverage the Spark DataFrame API for efficient data manipulation.
- Apply advanced Spark functions, including UDFs, to solve complex data engineering tasks.
- Use Spark Streaming capabilities to implement real-time and near-real-time processing solutions.
- Get hands-on preparation for the certification exam with mock tests and practice questions.

Author(s): Saba Shah is a seasoned data engineer with extensive experience working at Databricks and leading data science teams. With her in-depth knowledge of big data applications and Spark, she delivers clear, actionable insights in this book. Her approach emphasizes practical learning and real-world applications.

Who is it for? This book is ideal for data professionals such as engineers and analysts aiming to achieve Databricks certification. It is particularly helpful for individuals with moderate Python proficiency who are keen to understand Spark from scratch. If you're transitioning into big data roles, this guide prepares you comprehensively.

Data Engineering with Databricks Cookbook

In "Data Engineering with Databricks Cookbook," you'll learn how to efficiently build and manage data pipelines using Apache Spark, Delta Lake, and Databricks. This recipe-based guide offers techniques to transform, optimize, and orchestrate your data workflows.

What this book will help you do:
- Master Apache Spark for data ingestion, transformation, and analysis.
- Optimize data processing and improve query performance with Delta Lake.
- Manage streaming data processing with Spark Structured Streaming.
- Implement DataOps and DevOps workflows tailored for Databricks.
- Enforce data governance policies using Unity Catalog for scalable solutions.

Author(s): Pulkit Chadha is a Senior Solutions Architect at Databricks. With extensive experience in data engineering and big data applications, he brings practical insights into implementing modern data solutions. His educational writings focus on empowering data professionals with actionable knowledge.

Who is it for? This book is ideal for data engineers, data scientists, and analysts who want to deepen their knowledge of managing and transforming large datasets. Readers should have an intermediate understanding of SQL, Python programming, and basic data architecture concepts. It is especially well suited for professionals working with Databricks or similar cloud-based data platforms.

Databricks ML in Action

Dive into the Databricks Data Intelligence Platform and learn how to harness its full potential for creating, deploying, and maintaining machine learning solutions. This book covers everything from setting up your workspace to integrating state-of-the-art tools such as AutoML and Vector Search, imparting practical skills through detailed examples and code.

What this book will help you do:
- Set up and manage a Databricks workspace tailored for effective data science workflows.
- Implement monitoring to ensure data quality and detect drift efficiently.
- Build, fine-tune, and deploy machine learning models seamlessly using Databricks tools.
- Operationalize AI projects, including feature engineering, data pipelines, and workflows, on the Databricks Lakehouse architecture.
- Leverage integrations with popular tools like OpenAI's ChatGPT to expand your AI project capabilities.

Author(s): This book is authored by Stephanie Rivera, Anastasia Prokaieva, Amanda Baker, and Hayley Horn, seasoned experts in data science and machine learning from Databricks. Their collective years of expertise in big data and AI technologies ensure a rich and insightful perspective. Through their work, they strive to make complex concepts accessible and actionable.

Who is it for? This book serves as an ideal guide for machine learning engineers, data scientists, and technically inclined managers. It's well suited for those transitioning to the Databricks environment or seeking to deepen their Databricks-based machine learning implementation skills. Whether you're an ambitious beginner or an experienced professional, this book provides clear pathways to success.

This session will detail the process of architecting enterprise-grade big data pipelines, encompassing the orchestration of ephemeral Dataproc clusters, customization through custom images, and the strategic incorporation of GPU resources. Real-world use cases, best practices, challenges, and future trends in this domain will also be discussed, providing actionable insights for implementing cutting-edge big data solutions.


Learn about real-time AI-powered insights with BigQuery continuous queries, and how this new feature is poised to revolutionize data engineering by empowering event-driven and AI-driven data pipelines with Vertex AI, Pub/Sub, and Bigtable, all through the familiar language of SQL. Hear how UPS used big data on millions of shipped packages to reduce package theft, learn about their work on more efficient claims processing, and find out why they are looking to BigQuery to accelerate time to insights and smarter business outcomes.


session
by Skarpi Hedinsson (LA Rams), Sarah Kennedy (Google Cloud), Edward Green (McLaren Racing), Sean Curtis (Major League Baseball (MLB)), and Daniel Brusilovsky (Golden State Warriors)

How can you distill enormous volumes of disparate data and use it to surprise and delight your customers? How are you evolving your infrastructure and processes to support greater demands to use data faster and more productively? From the playing field to the racetrack, leading organizations like MLB, the Golden State Warriors, McLaren, and the LA Rams grapple with massive volumes of data. This customer panel brings together technology leaders to discuss the universal challenges along with their innovative solutions for maximizing the value of big data and AI across their businesses.


The data engineer's role has shifted amidst the rise of big data, cloud computing, and AI-driven analytics. This panel chat explores the ever-changing landscape of essential skills and the automation of outdated ones. With a myriad of architectural options available, we'll dissect how organizations navigate the complexities to tailor solutions to their specific needs. Let's unravel the intricacies of building scalable data systems, pinpointing common breakpoints and strategies for efficient scaling. Come along as we delve into constructing the foundation of the data-driven future.

Join Jamie Underwood and Andy Hannah for a transformative deep dive into the Responsible Data Revolution. In this session, they'll explore the crucial intersection of innovation and advanced analytics, delving into the legal aspects surrounding it. With Jamie's expertise in navigating the intricacies of intellectual property and Andy's deep entrepreneurial and analytical experience, we'll uncover the ethical and legal considerations that arise in the era of big data and AI. 

From privacy concerns to intellectual property rights, they'll discuss the evolving landscape of data governance and responsible innovation. Gain insights into strategies for leveraging data ethically and responsibly while maximizing its potential for transformative innovation. This session promises to equip you with the knowledge and tools to navigate the complex terrain of the data revolution responsibly.

Choosing the Right Abstraction Level for Your Kafka Project by Carlos Manuel Duclos-Vergara

Big Data Europe, onsite and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q

Join our next Big Data Europe conference on 22-25 November 2022, where you will be able to learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning, and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.

Keyword search is dead! And so are Solr and Elasticsearch? by Daniel Wrigley


Neural Networks on the Source Code by Jameel Nabbo


Towards Human-AI Teaming: Challenges and Opportunities of Human in the Loop AI Training by Clodéric Mars & Sagar Kurandwad

An Introduction to Streaming SQL with Materialize by Marta Paes


Big or Small Data in the Food Industry? by Antía Fernández


Complex AI Forecasting Methods for Investments Portfolio Optimization by Paweł Skrzypek & Anna Warno
