talk-data.com

Topic

API

Application Programming Interface (API)

Tags: integration, software_development, data_exchange

856 tagged activities

Activity Trend

Peak of 65 activities per quarter, 2020-Q1 through 2026-Q1

Activities

856 activities · Newest first

Send us a text

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we explore:

- LLMs Gaming the System: Uncover how LLMs are using political sycophancy and tool-using flattery to game the system. Dive deeper: paper, chain-of-thought prompting & post on X.
- Recording Industry Association of America (RIAA) Sues AI Music Generators: They are taking on Suno and Udio for using copyrighted music to train their models. Some AI-generated music that is very similar to existing songs: song 1, song 2, song 3. More on GenAI: Midjourney creating copyrighted images, and ChatGPT reciting email addresses.
- AI-Powered Olympic Recaps: NBC's personalized daily recaps with Al Michaels' voice offer a new way to catch up on the Olympics.
- Figma's AI Redesign: Discover Figma's new AI tools that speed up design and creativity. We debate the tool's value and its application in the design process.
- Rabbit R1 Security Flaws: Hackers exposed hardcoded API keys in Rabbit R1's source code, leading to major security issues. Find out more.
- Pyinstrument for Python: Meet Pyinstrument, the easy-to-use Python profiler that helps optimize code performance. Explore it on GitHub.
- The Ultimate Font (Bart's dreams come true): Explore the groundbreaking integration of TrueType fonts with AI for dynamic text rendering. Discover more here.
- Hot Takes on AI Competition: Google claims no one has a moat in AI, sparking debate on the future of open-source models. We also explore the Ladybird Browser Project, an independently funded project aiming to build a cutting-edge browser engine.

When developing Machine Learning (ML) models, the biggest challenges are often infrastructural. How do we deploy our model and expose an inference API? How can we retrain? Can we continuously evaluate performance and monitor model drift? In this talk, we will present how we are tackling these problems at the Philadelphia Phillies by developing a suite of tools that enable our software engineering and analytics teams to train, test, evaluate, and deploy ML models - that can be entirely orchestrated in Airflow. This framework abstracts away the infrastructural complexities that productionizing ML Pipelines presents and allows our analysts to focus on developing robust baseball research for baseball operations stakeholders across player evaluation, acquisition, and development. We’ll also look at how we use Airflow, MLflow, MLServer, cloud services, and GitHub Actions to architect a platform that supports our framework for all points of the ML Lifecycle.
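
To make the shape of such a pipeline concrete, here is a minimal sketch of an Airflow TaskFlow DAG with train, evaluate, and deploy stages. The task bodies and the MLflow-style model URI are hypothetical placeholders, not the Phillies' actual framework.

```python
# A minimal sketch (not the Phillies' actual framework) of an ML
# lifecycle DAG in Airflow TaskFlow style; the model URI and task
# bodies are hypothetical placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@weekly", start_date=datetime(2024, 1, 1), catchup=False)
def ml_lifecycle():
    @task
    def train() -> str:
        # Fit the model and log the artifact (e.g., to MLflow).
        return "runs:/example-run-id/model"  # hypothetical model URI

    @task
    def evaluate(model_uri: str) -> str:
        # Compare candidate metrics against the production baseline;
        # raise here to stop the pipeline if the new model underperforms.
        return model_uri

    @task
    def deploy(model_uri: str) -> None:
        # Promote the approved artifact to the serving layer
        # (e.g., an MLServer deployment behind an inference API).
        print(f"Deploying {model_uri}")

    deploy(evaluate(train()))


ml_lifecycle()
```

Keeping each stage a separate task is what lets Airflow handle retries, monitoring, and retraining schedules instead of custom infrastructure.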

Soon we will finally switch to a 100% React UI, with a full separation between the API and the UI as well. While we are making such a big change, let's also take the opportunity to imagine whole new interfaces rather than simply modernizing the existing views. How can we use design to help you better understand what is going on with your DAG? Come listen to some of our proposed ideas, and bring your own big ideas, as the second half will be an open discussion.

In his presentation, Elad will provide a novel take on Airflow, highlighting its versatility beyond conventional use for scheduled pipelines. He'll discuss its application as an on-demand tool for initiating and halting jobs, mainly in data science settings such as dataset enrichment and batch prediction via API calls, complete with real-time status tracking and alerts. The talk aims to encourage a fresh approach to Airflow utilization, but will also delve into the technical aspects of implementing DAG triggering and cancellation logic. What the audience will learn:

- A real-life use case of leveraging Airflow capabilities beyond traditional pipeline scheduling, with innovative integration as the infrastructure for an ML platform
- Triggering on-demand DAGs through the API (see the sketch after this list)
- Cancelling running DAGs
- A demonstration of an end-to-end ML pipeline utilizing AWS SageMaker for batch predictions
- Some more Airflow best practices

Join us to learn from Wix's experience and best practices!
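
The following is a hedged sketch of on-demand DAG control through the Airflow stable REST API, not Wix's actual code; the host, credentials, and the `batch_prediction` DAG id are placeholders.

```python
# Trigger and cancel Airflow DAG runs over the stable REST API.
# Host, credentials, and DAG id below are hypothetical.
import requests

AIRFLOW_API = "http://localhost:8080/api/v1"
AUTH = ("admin", "admin")  # hypothetical basic-auth credentials

# Trigger an on-demand run, passing runtime parameters via `conf`.
run = requests.post(
    f"{AIRFLOW_API}/dags/batch_prediction/dagRuns",
    auth=AUTH,
    json={"conf": {"dataset": "customers_2024_06"}},
).json()
print(run["dag_run_id"], run["state"])

# Later, cancel the run by updating its state; Airflow then stops the
# run's active tasks.
requests.patch(
    f"{AIRFLOW_API}/dags/batch_prediction/dagRuns/{run['dag_run_id']}",
    auth=AUTH,
    json={"state": "failed"},
)
```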

Airflow is all about schedules: we use cron strings and Timetables to define schedules, and there's an Airflow Scheduler component that manages those timetables, and a lot more, to ensure that DAGs and tasks run on those schedules. But what do you do if your data isn't available on a schedule? What if data is coming from many sources, at varying times, and your job is to make sure it's all as up to date as possible? An event-driven data pipeline may be the answer. An event-driven architecture (EDA) is an architecture pattern that uses events to decouple an application's components. It relies on external events, not an internal schedule, to create loosely coupled data pipelines that determine when to take action and what actions to take. In this session, we will discuss the design considerations when using Airflow in an EDA and the tools Airflow offers to make this happen, including Datasets, the REST API, Dynamic Task Mapping, custom Timetables, Sensors, and queues.
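
As a minimal illustration of the Datasets tool mentioned above, here is a sketch of a producer/consumer pair where the consumer DAG is triggered by a dataset update rather than a clock; the S3 URI and DAG ids are illustrative.

```python
# Event-driven scheduling with Airflow Datasets: the consumer DAG runs
# whenever the producer updates the dataset, not on a schedule.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task

orders = Dataset("s3://lake/raw/orders")


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def orders_producer():
    @task(outlets=[orders])
    def land_orders():
        # Write the new file; finishing this task emits a dataset event.
        ...

    land_orders()


@dag(schedule=[orders], start_date=datetime(2024, 1, 1), catchup=False)
def orders_consumer():
    @task
    def refresh_orders_table():
        # Runs as soon as `orders` receives an update, however it arrives.
        ...

    refresh_orders_table()


orders_producer()
orders_consumer()
```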

There are many Airflow tutorials. However, many don't show the full process of sourcing, transforming, testing, alerting, documenting, and finally supplying data. This talk will go over how to piece together an end-to-end Airflow project that transforms raw data to be consumable by the business. It will include how various technologies can all be orchestrated by Airflow to satisfy the needs of analysts, engineers, and business stakeholders. The talk will be divided into the following sections:

- Introduction: Introducing the business problem and how we came up with the solution design
- Data sourcing: Fetching and storing API data using basic operators and hooks (see the sketch after this list)
- Transformation and testing: How to use dbt to build and test models based on the raw data
- Alerting: Alerting the necessary parties when any part of this DAG fails, using Slack
- Consumption: How to make dynamic data accessible to business stakeholders
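
Here is a hedged sketch of the data sourcing step: pulling API data with Airflow's HttpHook inside a TaskFlow task. The `source_api` connection id, endpoint, and landing path are assumptions, not the talk's actual project.

```python
# Fetch raw API data with HttpHook and land it for downstream dbt
# models. Connection id, endpoint, and path are hypothetical.
import json

from airflow.decorators import task
from airflow.providers.http.hooks.http import HttpHook


@task
def fetch_raw_data(ds: str) -> str:
    hook = HttpHook(method="GET", http_conn_id="source_api")
    response = hook.run(endpoint=f"/v1/records?date={ds}")
    path = f"/tmp/raw_{ds}.json"
    with open(path, "w") as f:
        json.dump(response.json(), f)
    return path  # downstream dbt models pick up the landed file
```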

Gen AI has taken the computing world by storm. As enterprises and startups have started to experiment with LLM applications, it has become clear that providing the right context to these applications is critical. This process, known as retrieval-augmented generation (RAG), relies on adding custom data to the large language model so that the efficacy of the response can be improved. Processing custom data and integrating with enterprise applications is a strength of Apache Airflow. This talk goes into the details of a vision to enhance Apache Airflow to more intuitively support RAG, with additional capabilities and patterns. Specifically, these include (a sketch of the overall RAG flow follows this list):

- Support for unstructured data sources such as text, extending to image, audio, video, and custom sensor data
- LLM model invocation, including both external model services through APIs and local models using container invocation
- Automatic index refreshing, with a focus on unstructured data lifecycle management, to avoid cumbersome and expensive index creation on vector databases
- Templates for hallucination reduction via testing and scoping strategies
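
The sketch below shows the generic RAG pattern the talk builds on: chunk custom documents, embed them into a vector index, and answer questions with retrieved context. `embed`, `index`, and `llm` stand in for any model API and vector database; none of these are Airflow APIs.

```python
# Generic RAG flow: index refresh plus context-grounded answering.
# `embed`, `index`, and `llm` are stand-ins, not real library APIs.
from typing import Callable


def refresh_index(docs: list[str], embed: Callable, index) -> None:
    for doc in docs:
        # Naive fixed-size chunking; real pipelines use smarter splitting.
        chunks = [doc[i:i + 500] for i in range(0, len(doc), 500)]
        for chunk in chunks:
            index.upsert(vector=embed(chunk), metadata={"text": chunk})


def answer(question: str, embed: Callable, index, llm: Callable) -> str:
    # Retrieve the most relevant chunks and prepend them as context,
    # grounding the response in custom data.
    context = index.query(embed(question), top_k=5)
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}"
    return llm(prompt)
```

The automatic index refreshing the talk proposes amounts to running something like `refresh_index` as a recurring, lifecycle-aware Airflow task instead of a one-off script.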

Are you looking to harness the full potential of data-driven pipelines with Apache Airflow? This session will dive into the newly introduced conditional expressions for advanced dataset scheduling in Airflow, a feature highly requested by the Airflow community. Attendees will learn how to effectively use logical operators to create complex dependencies that trigger DAGs based on dataset updates in real-world scenarios. We'll also explore the innovative DatasetOrTimeSchedule, which combines time-based and dataset-triggered scheduling for unparalleled flexibility. Furthermore, attendees will discover the latest API endpoints that facilitate external updates and resets of dataset events, streamlining workflow management across different deployments. This talk also aims to explain:

- The basics of using conditional expressions for dataset scheduling (see the sketch after this list)
- How to integrate time-based schedules with dataset triggers
- Practical applications of the new API endpoints for enhanced dataset management
- Real-world examples of how these features can optimize your data workflows
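
As a sketch of what this looks like in DAG code, here is a hedged example of conditional dataset expressions combined with a cron fallback via DatasetOrTimeSchedule, as introduced around Airflow 2.9; the dataset URIs and DAG id are placeholders.

```python
# Run when (a AND b) or c is updated, with a nightly cron fallback.
# Dataset URIs are made up; assumes Airflow 2.9+ semantics.
from datetime import datetime

from airflow.datasets import Dataset
from airflow.decorators import dag, task
from airflow.timetables.datasets import DatasetOrTimeSchedule
from airflow.timetables.trigger import CronTriggerTimetable

a = Dataset("s3://lake/staging/a")
b = Dataset("s3://lake/staging/b")
c = Dataset("s3://lake/staging/c")


@dag(
    schedule=DatasetOrTimeSchedule(
        timetable=CronTriggerTimetable("0 2 * * *", timezone="UTC"),
        datasets=((a & b) | c),
    ),
    start_date=datetime(2024, 1, 1),
    catchup=False,
)
def conditional_consumer():
    @task
    def rebuild():
        # Triggered by the dataset condition or the 02:00 UTC fallback.
        ...

    rebuild()


conditional_consumer()
```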

“More data lineage” was the second most popular feature request in the 2023 Airflow Survey. However, despite the integration of OpenLineage in Airflow 2.7 through AIP-53, the most popular operator in Airflow, PythonOperator, isn’t covered by lineage support. With the addition of the TaskFlow API, Airflow Datasets, Airflow ObjectStore, and many other small changes, writing DAGs without other operators is easier than ever. That’s why lineage collection in Airflow is moving beyond covering specific operators to covering Hooks and Object Storage. In this session, you’ll learn how the newly added AIP-62 will let you author DAGs the way you love while keeping your data pipelines well covered by lineage.
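
The following is a hedged sketch of the DAG style AIP-62 targets: plain TaskFlow Python plus Airflow's Object Storage API, whose reads and writes give lineage collection something to observe without specialized operators. The bucket names are made up.

```python
# Plain-Python ETL over ObjectStoragePath; inputs and outputs stay
# visible to lineage because I/O goes through the storage abstraction.
from datetime import datetime

from airflow.decorators import dag, task
from airflow.io.path import ObjectStoragePath

SRC = ObjectStoragePath("s3://raw-bucket/events.csv")
DST = ObjectStoragePath("s3://clean-bucket/events.csv")


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def python_native_etl():
    @task
    def copy_cleaned():
        # No SQL operator here, yet the task's input and output paths
        # remain observable for lineage collection.
        DST.write_text(SRC.read_text().lower())

    copy_cleaned()


python_native_etl()
```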

Application Programming Interfaces (APIs) are as pervasive as they are critical to the functioning of the modern world. That personalized and content-rich product page with a sub-second load time on Amazon? That's just a couple-hundred API calls working their magic. Every experience on your mobile device? Loaded with APIs. But, just because they're everywhere doesn't mean that they spring forth naturally from the keystrokes of a developer. There's a lot more going on that requires real thought and planning, and the boisterous arrival of AI to mainstream modernity has made the role of APIs and their underlying infrastructure even more critical. On this episode, Moe, Julie, and Tim dug into the fascinating world with API Maven Marco Palladino, the co-founder and CTO at Kong, Inc. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

The Evolution of Delta Lake from Data + AI Summit 2024

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks, explains why Delta Lake is the most adopted open lakehouse format.

Includes:

- Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, and Delta)
- Delta Lake Liquid Clustering
- Delta Lake production-ready catalog (Iceberg REST API)
- The growth and strength of the Delta ecosystem
- Delta Kernel
- DuckDB integration with Delta (see the sketch after this list)
- Delta 4.0
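
On the DuckDB integration specifically, here is a hedged sketch of querying a Delta table in place with DuckDB's delta extension; the local table path is a placeholder.

```python
# Query a Delta table directly from DuckDB via the `delta` extension.
# The table path is made up.
import duckdb

con = duckdb.connect()
con.sql("INSTALL delta; LOAD delta;")  # fetches the extension once

# delta_scan reads the table's transaction log and Parquet files
# directly; no Spark cluster involved.
con.sql(
    "SELECT status, count(*) AS n "
    "FROM delta_scan('./data/orders_delta') "
    "GROUP BY status"
).show()
```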

AWS re:Inforce 2024 - Cloud upgrade: Modern TLS encryption for all AWS service connections (DAP304)

In this session, hear how AWS is upgrading the cloud by enabling the modern encryption protocol Transport Layer Security (TLS) 1.3 globally and removing outdated TLS 1.0/1.1 for all AWS service API endpoints. Learn about the higher security and faster connection performance TLS 1.3 provides and how you can check that your applications are using it. Find out how we minimized the customer impact of these large changes across millions of differing client configurations, including some of the unique challenges we encountered and how we overcame those obstacles. Lastly, learn about future TLS technologies for AWS, including QUIC and post-quantum cryptography.
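
On checking that your applications use modern TLS, here is a minimal sketch of one way to inspect the negotiated TLS version from Python; `sts.amazonaws.com` is just an example endpoint, and the floor is set to TLS 1.2 in line with the deprecation described above.

```python
# Check which TLS version this client negotiates with an endpoint.
# The endpoint is an example; the context refuses TLS 1.0/1.1.
import socket
import ssl

HOST = "sts.amazonaws.com"

ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse outdated TLS

with socket.create_connection((HOST, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        print(tls.version())  # e.g. 'TLSv1.3' where supported
```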

Learn more about AWS re:Inforce at https://go.aws/reinforce.


Databricks Certified Associate Developer for Apache Spark Using Python

This book serves as the ultimate preparation for aspiring Databricks Certified Associate Developers specializing in Apache Spark. Dive deep into Spark's components, its applications, and exam techniques to achieve certification and expand your practical skills in big data processing and real-time analytics using Python.

What this book will help me do:

- Deeply understand Apache Spark's core architecture for building big data applications
- Write optimized SQL queries and leverage the Spark DataFrame API for efficient data manipulation (see the sketch after this blurb)
- Apply advanced Spark functions, including UDFs, to solve complex data engineering tasks
- Use Spark Streaming capabilities to implement real-time and near-real-time processing solutions
- Get hands-on preparation for the certification exam with mock tests and practice questions

Author(s): Saba Shah is a seasoned data engineer with extensive experience working at Databricks and leading data science teams. With her in-depth knowledge of big data applications and Spark, she delivers clear, actionable insights in this book. Her approach emphasizes practical learning and real-world applications.

Who is it for? This book is ideal for data professionals such as engineers and analysts aiming to achieve Databricks certification. It is particularly helpful for individuals with moderate Python proficiency who are keen to understand Spark from scratch. If you're transitioning into big data roles, this guide prepares you comprehensively.
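
As a small taste of two exam topics the book covers, here is a PySpark sketch of DataFrame transformations and a Python UDF; the toy data is made up and not from the book.

```python
# DataFrame API plus a Python UDF in PySpark; toy data is invented.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("cert-prep").getOrCreate()

df = spark.createDataFrame([("alice", 34), ("bob", 41)], ["name", "age"])

# Prefer built-in functions where possible; reach for a UDF only when
# the logic has no built-in equivalent, since UDFs bypass the optimizer.
shout = udf(lambda s: s.upper(), StringType())

df.filter(col("age") > 35).withColumn("name", shout(col("name"))).show()
```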

Send us a text

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we dive deep into the fascinating and complex world of AI with our special guest, Senne Batsleer:

- De Mol + AI Voices: Exploring the use of AI-generated voices to disguise the mole in the Belgian TV show "The Mole". Our guest, Senne Batsleer, shares insights from their experience with AI voice technology.
- Scarlett Johansson vs OpenAI: Delving into the controversy of OpenAI using a voice eerily similar to Scarlett Johansson's in their new AI model. Read more in The Guardian and The Washington Post.
- Elon Musk's xAI Raises $6B: A look into Elon Musk's latest venture, xAI, and its ambitious funding round, aiming to challenge AI giants like OpenAI and Microsoft.
- OpenAI and News Corp's $250M Deal: The implications of OpenAI's data deal with News Corp.
- Google AI Search Risks: Examining Google's AI search providing potentially dangerous answers based on outdated Reddit comments. Find out more on The Verge and BBC.
- Humane's AI Pin Looking for a Buyer: Discussing the struggles of Humane's wearable AI device and its search for a buyer following a rocky debut.
- PostgREST Turns Databases into APIs: Exploring the concept of turning PostgreSQL databases directly into RESTful APIs, enhancing real-time applications (see the sketch after this list).
- Risks of Expired Domain Names: Highlighting the dangers of expired domains and how they can be exploited by hackers.
- The 'Dead Internet' Theory: Debating the rise of bots on the web and their potential to surpass human activity online.
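
On the PostgREST item, here is a hedged sketch of the idea: once PostgREST fronts a PostgreSQL database, each table becomes a REST resource with filtering expressed in the query string. The host and the `films` table are hypothetical.

```python
# Talking to a hypothetical PostgREST endpoint: tables become REST
# resources, and filters live in the query string.
import requests

BASE = "http://localhost:3000"  # PostgREST's default port

# GET /films?year=gte.2000&select=title,year -> filtered JSON rows
rows = requests.get(
    f"{BASE}/films",
    params={"year": "gte.2000", "select": "title,year"},
).json()

# POST inserts a row; the database's own grants and row-level security
# policies decide whether the request is allowed.
requests.post(f"{BASE}/films", json={"title": "Memento", "year": 2000})
```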

Kafka Streams in Action, Second Edition

Everything you need to implement stream processing on Apache Kafka® using Kafka Streams and the ksqlDB event streaming database. Kafka Streams in Action, Second Edition guides you through setting up and maintaining your stream processing with Kafka. Inside, you’ll find comprehensive coverage of not only Kafka Streams, but the entire toolbox you’ll need for effective streaming—from the components of the Kafka ecosystem, to Producer and Consumer clients, Connect, and Schema Registry.

In Kafka Streams in Action, Second Edition you’ll learn how to:

- Design streaming applications in Kafka Streams with the KStream and the Processor API
- Integrate external systems with Kafka Connect
- Enforce data compatibility with Schema Registry
- Build applications that respond immediately to events in either Kafka Streams or ksqlDB
- Craft materialized views over streams with ksqlDB

This totally revised new edition of Kafka Streams in Action has been expanded to cover more of the Kafka platform used for building event-based applications. You’ll also find full coverage of ksqlDB, an event streaming database that makes it a snap to create applications that respond immediately to events, such as real-time push and pull updates.

About the Technology: Enterprise applications need to handle thousands—even millions—of data events every day. With an intuitive API and flawless reliability, the lightweight Kafka Streams library has earned a spot at the center of these systems. Kafka Streams provides exactly the power and simplicity you need to manage real-time event processing or microservices messaging.

About the Book: Kafka Streams in Action, Second Edition teaches you how to create event streaming applications on the amazing Apache Kafka platform. This thoroughly revised new edition now covers a wider range of streaming architectures and includes data integration with Kafka Connect. As you go, you’ll explore real-world examples that introduce components and brokers, schema management, and the other essentials. Along the way, you’ll pick up practical techniques for blending Kafka with Spring, low-level control of processors and state stores, storing event data with ksqlDB, and testing streaming applications.

What's Inside:

- Design efficient streaming applications
- Integrate external systems with Kafka Connect
- Enforce data compatibility with Schema Registry

About the Reader: For Java developers. No knowledge of Kafka or streaming applications required.

About the Author: Bill Bejeck is a Confluent engineer and a Kafka Streams contributor with over 15 years of software development experience. Bill is also a committer on the Apache Kafka® project.

Quotes:

"Comprehensive streaming data applications are only a few years away from becoming the reality, and this book is the guide the industry has been waiting for to move beyond the hype." - Adi Polak, Director, Developer Experience Engineering, Confluent

"Covers all the key aspects of building applications with Kafka Streams. Whether you are getting started with stream processing or have already built Kafka Streams applications, it is an essential resource." - Mickael Maison, Principal Software Engineer, Red Hat

"Serves as both a learning and a resource guide, offering a perfect blend of ‘how-to’ and ‘why-to.’ Even if you have been using Kafka Streams for many years, I highly recommend this book." - Neil Buesing, CTO & Co-founder, Kinetic Edge

Digital Transformation of SAP Supply Chain Processes: Build Mobile Apps Using SAP BTP and SAP Mobile Services

Take a high-level tour of SAP OData integrations with frontend technologies like Angular using the SAP Mobile Services platform. This book will give you a different perspective on executing SAP transactions on iOS using Angular instead of SAP-provided Fiori-based applications. You’ll start by learning about SAP supply chain processes such as Goods Receipt, Transfer Posting, Goods Issue, and Inventory Search. You’ll then move on to understanding the thought process involved in integrating SAP's backend (SAP ECC) with an Angular iOS app using SAP Mobile Services running on SAP BTP. All this will serve as a guide tailored to SAP functional and technical consultants actively engaged in client-facing roles. You’ll follow a roadmap for modernizing and streamlining supply chain operations by leveraging Angular iOS apps. Digital Transformation of SAP Supply Chain Processes provides the essential tools for businesses looking to stay competitive in today's technology-driven landscape.

What You Will Learn:

- Study the fundamental procedures to set up the Authorization Endpoint, Token Endpoint, and base URL within SAP Mobile Services
- Manage attachments in mobile applications and store them in an external content repository
- Gain proficiency in testing OData services using the Postman API client with the OAuth protocol
- Acquire knowledge about JSON messages, the CORS protocol, and the X-CSRF token exchange (see the sketch after this blurb)
- Link Zebra printers through the Zebra Native Printing app on iOS to print SAP forms on mobile printers

Who This Book Is For: SAP consultants with an interest in the digital transformation of SAP supply chain processes to iOS-based SAP transactions.
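
On the X-CSRF token exchange specifically, here is a hedged sketch of the common fetch-then-post pattern for SAP OData writes; the service URL, credentials, and payload are placeholders, not the book's actual endpoints.

```python
# X-CSRF token exchange for an OData write, sketched with requests.
# URL, credentials, and payload are hypothetical.
import requests

session = requests.Session()
session.auth = ("SAP_USER", "SAP_PASSWORD")  # or an OAuth bearer token

SERVICE = "https://mobile-services.example.com/odata/GoodsReceipt"

# Step 1: a safe GET with "X-CSRF-Token: Fetch" returns a token bound
# to the session cookies.
head = session.get(SERVICE, headers={"X-CSRF-Token": "Fetch"})
token = head.headers["x-csrf-token"]

# Step 2: modifying requests must carry the fetched token.
resp = session.post(
    f"{SERVICE}/PostGoodsReceipt",
    headers={"X-CSRF-Token": token},
    json={"Material": "MAT-100", "Quantity": "5"},
)
resp.raise_for_status()
```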

Protocol Buffers Handbook

The "Protocol Buffers Handbook" by Clément Jean offers an in-depth exploration of Protocol Buffers (Protobuf), a powerful data serialization format. Learn everything from syntax and schema evolution to custom validations and cross-language integrations. With practical examples in Go and Python, this guide empowers you to efficiently serialize and manage structured data across platforms. What this Book will help me do Develop advanced skills in using Protocol Buffers (Protobuf) for efficient data serialization. Master the key concepts of Protobuf syntax and schema evolution for compatibility. Learn to create custom validation plugins and tailor Protobuf processes. Integrate Protobuf with multiple programming environments, including Go and Python. Automate Protobuf projects using tools like Buf and Bazel to streamline workflows. Author(s) Clément Jean is a skilled programmer and technical writer specializing in data serialization and distributed systems. With substantial experience in developing scalable microservices, he shares valuable insights into using Protocol Buffers effectively. Through this book, Clément offers a hands-on approach to Protobuf, blending theory with practical examples derived from real-world scenarios. Who is it for? This book is perfect for software engineers, system integrators, and data architects who aim to optimize data serialization and APIs, regardless of their programming language expertise. Beginners will grasp foundational Protobuf concepts, while experienced developers will extend their knowledge to advanced, practical applications. Those working with microservices and heavily data-dependent systems will find this book especially relevant.

Software Engineering for Data Scientists

Data science happens in code. The ability to write reproducible, robust, scalable code is key to a data science project's success—and is absolutely essential for those working with production code. This practical book bridges the gap between data science and software engineering, and clearly explains how to apply the best practices from software engineering to data science. Examples are provided in Python, drawn from popular packages such as NumPy and pandas. If you want to write better data science code, this guide covers the essential topics that are often missing from introductory data science or coding classes, including how to:

- Understand data structures and object-oriented programming
- Clearly and skillfully document your code
- Package and share your code
- Integrate data science code with a larger code base
- Learn how to write APIs
- Create secure code
- Apply best practices to common tasks such as testing, error handling, and logging (see the sketch after this list)
- Work more effectively with software engineers
- Write more efficient, maintainable, and robust code in Python
- Put your data science projects into production
- And more
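
As a generic illustration (not taken from the book) of a few of these practices, here is a small Python function with a docstring, explicit error handling, and logging instead of print statements.

```python
# Docstrings, explicit errors, and logging in a small data function.
import logging

logger = logging.getLogger(__name__)


def normalize(values: list[float]) -> list[float]:
    """Scale values linearly to the [0, 1] range.

    Raises:
        ValueError: if `values` is empty or constant.
    """
    if not values:
        raise ValueError("cannot normalize an empty list")
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot normalize a constant series")
    logger.debug("normalizing %d values (min=%s, max=%s)", len(values), lo, hi)
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    print(normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```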