talk-data.com

Topic

API (Application Programming Interface)

Tags: integration, software_development, data_exchange

856 tagged activities

Activity Trend: peak of 65 activities/quarter, 2020-Q1 to 2026-Q1

Activities

856 activities · Newest first

Feed The Alligators With the Lights On: How Data Engineers Can See Who Really Uses Data | Stemma

ABOUT THE TALK: At Lyft, Mark Grover built the Amundsen data catalog so data scientists could navigate hundreds of thousands of tables and distinguish trustworthy data from sandboxed, out-of-date data. When he took Amundsen open source, he helped dozens of data teams support a variety of demands to make data discoverable and self-serve. Mark frequently sees processes that seem “good enough” come back to bite data teams. In this talk, Mark takes us deep into the query logs and APIs where all of that metadata lives, and demonstrates how to use it so you don’t lose any fingers during your next data change.

ABOUT THE SPEAKER: Mark Grover is the co-founder/CEO of Stemma - a modern data catalog for building a self-serve data culture, used by Grafana, iRobot, SoFi, Convoy and many others. He is the co-creator of the leading open-source data catalog, Amundsen, used by Lyft, Instacart, Square, ING, Snap and many more! Mark was previously a developer on Apache Spark at Cloudera and is a committer and PMC member on a few open-source Apache projects. He is a co-author of Hadoop Application Architectures.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data-related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

ChatGPT has leaped into the forefront of our lives: everyone from students to multinational organizations is seeing value in adding a chat interface to an LLM. But OpenAI has been concentrating on this for years, steadily developing one of the most viral digital products of this century. In this episode of our AI series, we sit down with Logan Kilpatrick. Logan currently leads developer relations at OpenAI, supporting developers building with DALL-E, the OpenAI API, and ChatGPT. Logan takes us through OpenAI’s products, API, and models, and provides insights into the many use cases of ChatGPT.

Logan shares fascinating detail on ChatGPT’s plugins and how they can be used to build agents that help us in a variety of contexts. He also discusses the future integration of LLMs into our daily lives and how it will add structure to the unstructured, difficult-to-leverage data we generate and interact with every day. Logan also touches on the powerful image input features in GPT-4: how they can help those with partial sight improve their quality of life, and how they can be applied to various other use cases.

Throughout the episode, we unpack the need for collaboration and innovation, since ChatGPT becomes more powerful when integrated with other pieces of software. We cover key discussion points about today’s AI tools, in particular what could be built in-house by OpenAI and what could be built in the public domain. Logan also discusses the ecosystem forming around ChatGPT and how it will all become connected going forward. Finally, Logan shares tips for getting better responses from ChatGPT and things to consider when integrating it into your organization’s product.

This episode provides a deep dive into the world of GPT models from within the eye of the storm, offering valuable insights to those interested in AI and its practical applications in our daily lives.

The Modern Data Stack has brought a lot of new buzzwords into the data engineering lexicon: "data mesh", "data observability", "reverse ETL", "data lineage", "analytics engineering". In this light-hearted talk we will demystify the evolving revolution that will define the future of data analytics & engineering teams.

Our journey begins with the PyData Stack: pandas pipelines powering ETL workflows...clean code, tested code, data validation, perfect for in-memory workflows. As demand for self-serve analytics grows, new data sources bring more APIs to model, more code to maintain, DAG workflow orchestration tools, new nuances to capture ("the tax team defines revenue differently"), more dashboards, more not-quite-bugs ("but my number says this...").

This data maturity journey is a well-trodden path with common pitfalls & opportunities. After dashboards comes predictive modelling ("what will happen"), prescriptive modelling ("what should we do?"), perhaps eventually automated decision making. Getting there is much easier with the advent of the Python Powered Modern Data Stack.

In this talk, we will cover the shift from ETL to ELT and the open-source Modern Data Stack tools you should know, with a focus on how dbt's new Python integration is changing how data pipelines are built, run, tested & maintained. By understanding the latest trends & buzzwords, attendees will gain a deeper insight into Python's role at the core of the future of data engineering.
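As a taste of that integration, here is a minimal sketch of what a dbt Python model (dbt 1.3+) looks like. The upstream model name ("stg_orders") and column names are hypothetical, and the exact DataFrame type dbt hands you depends on the warehouse adapter:

```python
# A minimal dbt Python model sketch. Hypothetical upstream model
# "stg_orders" with customer_id and revenue columns.
def model(dbt, session):
    # Materialize the result as a table in the warehouse.
    dbt.config(materialized="table")

    # Reference an upstream dbt model; the DataFrame type depends on
    # the adapter (e.g. a Snowpark DataFrame on Snowflake).
    orders = dbt.ref("stg_orders")
    df = orders.to_pandas() if hasattr(orders, "to_pandas") else orders

    # Plain pandas from here on: total revenue per customer.
    # Whatever is returned becomes the model's relation.
    return df.groupby("customer_id", as_index=False)["revenue"].sum()
```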

Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage massive vectorization and high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation and auto-vectorization/parallelization to scale ES to your favorite accelerators. In this talk we will get to know the core API and how to solve distributed black-box optimization problems with evolution strategies.
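evosax packages this kind of ask/tell loop behind a common, JAX-native interface. As a rough illustration of the pattern (this is a library-free sketch, not evosax's exact API), a minimal OpenAI-ES-style loop in plain JAX might look like:

```python
# A minimal ask/tell evolution strategy in plain JAX, illustrating
# the pattern evosax wraps (NOT evosax's exact API).
import jax
import jax.numpy as jnp


def sphere(x):
    # Toy black-box objective: minimize the sum of squares.
    return jnp.sum(x ** 2)


def ask(key, mean, sigma, popsize):
    # Sample a population of candidate solutions around the mean.
    noise = jax.random.normal(key, (popsize, mean.shape[0]))
    return mean + sigma * noise, noise


def tell(mean, sigma, noise, fitness, lr=0.1):
    # Estimate a descent direction from fitness-weighted noise.
    weights = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    grad = (weights[:, None] * noise).mean(axis=0)
    return mean - lr * grad / sigma


key = jax.random.PRNGKey(0)
mean, sigma = jnp.zeros(10), 0.5
for _ in range(200):
    key, subkey = jax.random.split(key)
    population, noise = ask(subkey, mean, sigma, popsize=64)
    fitness = jax.vmap(sphere)(population)  # auto-vectorized evaluation
    mean = tell(mean, sigma, noise, fitness)
print(sphere(mean))  # should be close to 0
```

The fitness evaluation is where vectorization and accelerators pay off: jax.vmap (or pmap across devices) evaluates the whole population in one shot.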

Asynchronous programming is a type of parallel programming in which a unit of work runs separately from the primary application thread. After execution, it notifies the main thread of the worker's completion or failure. There are numerous benefits to using it, such as improved application performance, enhanced responsiveness, and more effective use of the CPU.

Asynchronicity seems to be a big reason why Node.js is so popular for server-side programming. Most of the code we write, especially in I/O-heavy applications like websites, depends on external resources: anything from a remote database query to a POST call against an external API. As soon as you ask for any of these resources, your code sits waiting for the response with nothing to do. With asynchronous programming, you allow your code to handle other tasks while waiting for those resources to respond.

In this session, we are going to talk about asynchronous programming in Python: its benefits and the multiple ways to implement it.
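As a preview of the core idiom, here is a minimal asyncio sketch using the third-party aiohttp library; the URLs are placeholders:

```python
# Fetch several placeholder URLs concurrently instead of in turn.
import asyncio

import aiohttp  # third-party: pip install aiohttp


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def main():
    urls = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ]
    async with aiohttp.ClientSession() as session:
        # While one request waits on the network, the others proceed.
        pages = await asyncio.gather(*(fetch(session, u) for u in urls))
    print([len(page) for page in pages])


asyncio.run(main())
```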

Let’s say you are the ruler of a remote island. For it to succeed and thrive you can’t expect it to be isolated from the world. You need to establish trade routes, offer your products to other islands, and import items from them. Doing this will certainly make your economy grow! We’re not actually going to talk about land masses or commerce; rather, you should think of your application as an island that needs to connect to other applications to succeed. Unfortunately, the sea is treacherous and not always consistent, much like the networks you use to connect your application to the world.

We will explore some techniques and libraries in the Python ecosystem that make your life easier when dealing with external services. Covering asynchronicity, caching, testing, and building abstractions on top of the APIs you consume, you will definitely learn some strategies to build your connected application gracefully and avoid those pesky 2 AM errors that keep you awake.
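For a flavor of those strategies, here is a small sketch combining retries with exponential backoff and a naive in-memory cache. The endpoint URL is hypothetical, and production code would likely reach for a dedicated library (e.g. tenacity or requests-cache) instead:

```python
# Two defensive patterns for flaky external services: retries with
# exponential backoff plus a naive in-memory response cache.
import time

import requests

_cache: dict[str, object] = {}


def get_with_retries(url: str, retries: int = 3, backoff: float = 1.0):
    # Serve repeat requests from the cache.
    if url in _cache:
        return _cache[url]
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            _cache[url] = response.json()
            return _cache[url]
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            # Back off longer after each failure: 1s, 2s, 4s, ...
            time.sleep(backoff * 2 ** attempt)


items = get_with_retries("https://api.example.com/items")
```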

Get ready to level up your big data processing skills! Join us for an introductory talk on Apache Spark, the distributed computing system used by tech giants like Netflix and Amazon. We'll cover PySpark DataFrames and how to use them. Whether you're a Python developer new to big data or looking to explore new technologies, this talk is for you. You'll gain foundational knowledge about Apache Spark and its capabilities, and learn how to leverage DataFrames and SQL APIs to efficiently process large amounts of data. Don't miss out on this opportunity to up your big data game!
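To set expectations, a first PySpark program along the lines the talk covers might look like this; the column names are illustrative:

```python
# The same aggregation expressed with the DataFrame API and with SQL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("intro-demo").getOrCreate()

df = spark.createDataFrame(
    [("alice", 3), ("bob", 5), ("alice", 2)],
    ["user", "purchases"],
)

# DataFrame API: builds a lazy plan, executed when .show() is called.
df.groupBy("user").agg(F.sum("purchases").alias("total")).show()

# The equivalent SQL API over the same data.
df.createOrReplaceTempView("purchases")
spark.sql(
    "SELECT user, SUM(purchases) AS total FROM purchases GROUP BY user"
).show()
```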

Over the past decade, developers, researchers, and the community have successfully built tens of thousands of data applications using Spark. Since then, the use cases and requirements of data applications have evolved: today every application, from web services running in application servers and interactive environments such as notebooks and IDEs, to phones and edge devices such as smart home devices, wants to leverage the power of data.

However, Spark's driver architecture is monolithic, running client applications on top of a scheduler, optimizer and analyzer. This architecture makes it hard to address these new requirements: there is no built-in capability to remotely connect to a Spark cluster from languages other than SQL.

Spark Connect introduces a decoupled client-server architecture for Apache Spark that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere. It can be embedded in modern data applications, in IDEs, Notebooks and programming languages.

This talk highlights how simple it is to connect to Spark using Spark Connect from any data application or IDE. We will do a deep dive into the architecture of Spark Connect and give an outlook on how the community can participate in extending Spark Connect to new programming languages and frameworks - to bring the power of Spark everywhere.
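For a sense of how little client code is involved, here is a hedged sketch of a Spark Connect session in PySpark 3.4+. The host name is a placeholder; 15002 is the documented default Spark Connect port:

```python
# A thin Spark Connect client: the DataFrame plan is serialized and
# sent to the remote server instead of running a local driver.
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://spark-server:15002").getOrCreate()

df = spark.range(10).filter("id % 2 = 0")
df.show()  # executed on the remote cluster, rows streamed back
```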

An exchange of views on FastAPI in practice.

FastAPI is great: it helps many developers create REST APIs based on the OpenAPI standard and run them asynchronously. It has a thriving community and educational documentation.

FastAPI does a great job of getting people started with APIs quickly.

This talk will point out some obstacles and dark spots that we wish we had known about beforehand, and highlight solutions to them.
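For context, the talk's starting point is how little code a working endpoint takes; a minimal illustrative example (the route and model are hypothetical):

```python
# A minimal FastAPI app with a validated request body.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Item(BaseModel):
    name: str
    price: float


@app.post("/items/")
async def create_item(item: Item) -> Item:
    # The body is parsed and validated against Item, and the OpenAPI
    # schema is generated from these type hints automatically.
    return item
```

Served with an ASGI server (for example, uvicorn main:app --reload), the app also exposes interactive documentation at /docs out of the box.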

In this talk, we will introduce the audience to DoWhy, a library for causal machine learning (ML). We will introduce typical problems where causal ML can be applied and will specifically do a deep dive on root cause analysis using DoWhy. To do this, we will lay out what typical problem spaces for causal ML look like and what kinds of problems we're trying to solve, and then show how to use DoWhy's API to solve them. Expect to see a lot of code with a hands-on example. We will close the session by zooming out a bit to talk about the PyWhy organization governing DoWhy.
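As a rough outline of that root-cause workflow, here is a sketch based on DoWhy's graphical causal model (gcm) module, on synthetic data with an assumed X → Y → Z graph; API details vary across versions, so treat this as an outline rather than a definitive recipe:

```python
# Root cause analysis sketch with DoWhy's gcm module.
import networkx as nx
import numpy as np
import pandas as pd
from dowhy import gcm

# Assumed causal graph: X causes Y, Y causes Z.
causal_model = gcm.StructuralCausalModel(nx.DiGraph([("X", "Y"), ("Y", "Z")]))

# Synthetic training data consistent with that graph.
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
Y = 2 * X + rng.normal(size=1000)
Z = 3 * Y + rng.normal(size=1000)
data = pd.DataFrame({"X": X, "Y": Y, "Z": Z})

# Let DoWhy pick a mechanism per node, then fit them to the data.
gcm.auto.assign_causal_mechanisms(causal_model, data)
gcm.fit(causal_model, data)

# Attribute an anomalous observation of Z to its upstream causes.
anomaly = pd.DataFrame({"X": [0.1], "Y": [10.0], "Z": [30.0]})
attributions = gcm.attribute_anomalies(causal_model, "Z", anomaly_samples=anomaly)
print(attributions)  # per-node anomaly attribution scores
```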

AutoML, or automated machine learning, offers the promise of transforming raw data into accurate predictions with minimal human intervention, expertise, and manual experimentation. In this talk, we will introduce AutoGluon, a cutting-edge toolkit that enables AutoML for tabular, multimodal and time series data. AutoGluon emphasizes usability, enabling a wide variety of tasks from regression to time series forecasting and image classification through a unified and intuitive API. We will focus specifically on tabular and time series tasks, where AutoGluon is the current state of the art, and demonstrate how AutoGluon can be used to achieve competitive performance on tabular and time series competition data sets. We will also discuss the techniques used to automatically build and train these models, peeking under the hood of AutoGluon.
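A minimal example of that unified API for the tabular case; the file paths and label column are hypothetical:

```python
# AutoGluon tabular sketch: one fit() call trains, tunes and
# ensembles a portfolio of models.
from autogluon.tabular import TabularDataset, TabularPredictor

train = TabularDataset("train.csv")  # anything pandas can read
test = TabularDataset("test.csv")

predictor = TabularPredictor(label="target").fit(train)

predictions = predictor.predict(test)
print(predictor.leaderboard(test))  # per-model scores on the test set
```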

In this talk, I'll show how large language models such as GPT-3 complement rather than replace existing machine learning workflows. Initial annotations are gathered from the OpenAI API via zero- or few-shot learning, then corrected by a human decision maker using an annotation tool. The resulting annotations can then be used to train and evaluate models as normal. This process yields higher accuracy than the OpenAI API alone can achieve, with the added benefit that you own and control the model at runtime.
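A hedged sketch of the zero-shot drafting step, using the OpenAI Python client (openai>=1.0 style); the model name and label set are illustrative, and every draft is meant to be reviewed by a human annotator:

```python
# Draft classification labels with the OpenAI chat completions API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_label(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": "Classify the text as POSITIVE, NEGATIVE or "
                           "NEUTRAL. Reply with the label only.",
            },
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()


# Drafts are then loaded into an annotation tool for human correction
# before being used to train or evaluate a model.
print(draft_label("The onboarding flow was painless."))
```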

In a modern data stack, data is collected from various sources, such as databases, APIs, and third-party applications. This data is then processed and transformed into a usable format for analysis. However, data quality can suffer at every stage of this process, leading to unreliable insights and flawed decision-making.

One of the biggest challenges of maintaining data quality in a modern data stack is the sheer volume and variety of data. With so much data coming in from different sources, ensuring that all data is accurate, complete, and consistent can be challenging.

Another challenge is data lineage. With data flowing through multiple systems, it can be difficult to track its origin and how it has been transformed over time. This lack of transparency can make it challenging to identify and address issues with data quality.

A modern data stack combines the different tools, technologies, and processes that businesses use to collect, store, analyze, and visualize data. It is designed to provide a unified and streamlined approach to data management, allowing organizations to make data-driven decisions quickly and efficiently.

The modern data stack differs from the traditional one in several ways. Traditionally, data stacks were built on a monolithic architecture that relied on expensive hardware and software licenses. These stacks were challenging to manage, slow to scale, and often resulted in data silos that hindered collaboration between different teams.

On the other hand, the modern data stack is built using a modular architecture that leverages cloud computing, open-source software, and APIs. This approach allows organizations to use the best-of-breed tools for each step of the data pipeline, resulting in a more flexible, scalable, and cost-effective solution.

Summary

With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that are being sold. Product analytics has grown into its own category and brought with it several services with generational differences in how they approach the problem. NetSpring is a warehouse-native product analytics service that allows you to gain powerful insights into your customers and their needs by combining your event streams with the rest of your business data. In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today!

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Your host is Tobias Macey and today I'm interviewing Priyendra Deshwal about how NetSpring is using the data warehouse to deliver a more flexible and detailed view of your product analytics.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what NetSpring is and the story behind it?

What are the activities that constitute "product analytics" and what are the roles/teams involved in those activities?

When teams first come to you, what are the common challenges that they are facing and what are the solutions that they have attempted to employ?

Can you describe some of the challenges involved in bringing product analytics into enterprise or highly regulated environments/industries?

How does a warehouse-native approach simplify that effort?

There are many different players (both commercial and open source) in the product analytics space. Can you share your view on the role that NetSpring plays in that ecosystem?

How is the NetSpring platform implemented to be able to best take advantage of modern warehouse technologies and the associated data stacks?

What are the pre-requisites for an organization's infrastructure/data maturity for being able to benefit from NetSpring?

How have the goals and implementation of the NetSpring platform evolved from when you first started working on it?

Can you describe the steps involved in integrating NetSpring with an organization's existing warehouse?

What are the signals that NetSpring uses to understand the customer journeys of different organizations?

How do you manage the variance of the data models in the warehouse while providing a consistent experience for your users?

Given that you are a product organization, how are you using NetSpring to power NetSpring?

What are the most interesting, innovative, or unexpected ways that you have seen NetSpring used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on NetSpring?

When is NetSpring the wrong choice?

What do you have planned for the future of NetSpring?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

NetSpring
ThoughtSpot
Product Analytics
Amplitude
Mixpanel
Customer Data Platform
GDPR
CCPA
Segment (Podcast Episode)
RudderStack (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

TimeXtender: TimeXtender is a holistic, metadata-driven solution for data integration, optimized for agility. TimeXtender provides all the features you need to build a future-proof infrastructure for ingesting, transforming, modelling, and delivering clean, reliable data in the fastest, most efficient way possible.

You can't optimize for everything all at once. That's why we take a holistic approach to data integration that optimizes for agility instead of fragmentation. By unifying each layer of the data stack, TimeXtender empowers you to build data solutions 10x faster while reducing costs by 70%-80%. We do this for one simple reason: because time matters.

Go to dataengineeringpodcast.com/timextender today to get started for free!

RudderStack:

RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.

RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.

Visit dataengineeringpodcast.com/rudderstack to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.

Data Council: Join us at the event for the global data community, Data Council Austin. From March 28-30th 2023, we'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount off tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit: dataengineeringpodcast.com/data-council Promo Code: dataengpod20

Support Data Engineering Podcast

In this episode, Conor and Bryce talk to Zach Laine about APL, Haskell, the problem Three Consecutive Odds, and why C++ developers should learn other languages.

Link to Episode 119 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast · Conor Hoekstra · Bryce Adelstein Lelbach

About the Guest: Zach Laine has been using C++ in industry for 15 years, focusing on data visualization, numeric computing, games, generic programming, and good library design. He finds the process of writing bio blurbs to be a little uncomfortable.

Show Notes

Date Recorded: 2023-02-16
Date Released: 2023-03-03
ADSP Episode 117: OOP, C++ Containers, APIs, EOP & More with Zach Laine!
ADSP Episode 118: C++ Allocators with Zach Laine! (Part 2)
APL
BQN
C++98 std::count_if
Anamorphisms
C++20 std::views::split
C++23 std::views::chunk
C++23 std::views::chunk_by
ADSP Episode 115: Max Gap in C++23
ADSP Episode 116: Max Gap Count in C++23
C++98 std::adjacent_difference
C++23 std::views::adjacent_transform
Three Consecutive Odds
C++98 std::transform
C++17 std::transform_reduce
C++23 std::views::adjacent
C++23 std::views::slide
Haskell fromEnum
ArrayCast Episode: Michael Higginson, 2022 Dyalog Contest Winner
Reverse Polish notation
P2672 Exploring the Design Space for a Pipeline Operator
Duo Lingo
Daniela Engert Duo Lingo Streak
Category Theory for Programmers - Bartosz Milewski
C++23 std::views::filter
Collection Oriented Programming

In this episode, Conor and Bryce talk to Zach Laine about C++ allocators!

Link to Episode 118 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast · Conor Hoekstra · Bryce Adelstein Lelbach

About the Guest: Zach Laine has been using C++ in industry for 15 years, focusing on data visualization, numeric computing, games, generic programming, and good library design. He finds the process of writing bio blurbs to be a little uncomfortable.

Show Notes

Date Recorded: 2023-02-16
Date Released: 2023-02-24
ADSP Episode 117: OOP, C++ Containers, APIs, EOP & More with Zach Laine!
C++ std::allocator
C++ std::vector
static_vector
An Introduction to Container Adapters in C++
C++ std::stack
MISRA Standard
Thrust thrust::host_vector & thrust::device_vector
C++ STL-Like Algorithm Libraries
BoostCon / C++Now
BoostCon 2011 - Bryce Lelbach: AST Construction with the Universal Tree
BoostCon 2011 - Bryce Lelbach: AST Construction with the Universal Tree ~ Slides
Boost Spirit
Boost Spirit utree
Canada Wide Science Fair
Conor’s Science Fair Project SCI II
Planarian
Methylprednisolone
Conor’s Science Fair Project Project Poker
David Stone on Twitter

Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

API Analytics for Product Managers

In API Analytics for Product Managers, you will learn how to approach APIs as products that drive revenue and business growth. The book provides actionable insights on researching, strategizing, marketing, and evaluating the performance of APIs in SaaS contexts.

What this book will help you do:

  • Learn to develop long-term strategies for managing APIs as a product.
  • Master the concepts of the API lifecycle and API maturity for better management.
  • Understand and apply key metrics to measure activation, retention, and engagement of APIs.
  • Design support models for APIs that ensure scalability and efficiency.
  • Gain techniques for deriving actionable business insights from metrics analysis.

Author(s): Deepa Goyal is an experienced product manager who specializes in API lifecycle management and analytics strategies. With years of industry experience, she has developed deep expertise in scaling and optimizing APIs to deliver business value. Her practical and results-oriented writing style makes complex topics accessible for professionals looking to enhance their API strategies.

Who is it for? Ideal for product managers, engineers, and executives in SaaS companies looking to maximize the potential of APIs. This book is especially suited for individuals with foundational knowledge of APIs aiming to refine their analytical and strategic skills. Readers will gain actionable insights to track API performance effectively and implement metrics-driven decisions. It's a must-read for those focused on leveraging APIs for business growth.

In this episode, Conor and Bryce talk to Zach Laine!

Link to Episode 117 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast · Conor Hoekstra · Bryce Adelstein Lelbach

About the Guest: Zach Laine has been using C++ in industry for 15 years, focusing on data visualization, numeric computing, games, generic programming, and good library design. He finds the process of writing bio blurbs to be a little uncomfortable.

Show Notes

Date Recorded: 2023-02-16
Date Released: 2023-02-17
UT Austin
Object Oriented Programming
C++ virtual
Dynamic and Static Polymorphism
Ad Hoc Polymorphism
Parametric Polymorphism
Rank Polymorphism
Elements of Programming (Free PDF)
The Structure and Interpretation of Computer Programs
C++23 std::flat_map
C++17 std::string_view
C++20 std::span
C++20 std::basic_string::starts_with
C++20 std::basic_string::ends_with
C++20 std::basic_string::contains

Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

In today’s episode, we’re joined by Gleb Polyakov. Gleb is the CEO and Co-Founder of Nylas, a platform that allows developers to automate manual, repetitive everyday tasks with little to no code.

We talk about:

  • How Nylas works, the benefits it provides and who it targets.
  • The definition of first-party data and why it’s important.
  • The growth of the API economy.
  • The new roles of sales and marketing when selling to developers.
  • The trend of using education as a sales technique.

Gleb Polyakov - https://www.linkedin.com/in/gpolyakov Nylas - https://www.linkedin.com/company/nylas/

This episode is brought to you by Qrvey

The tools you need to take action with your data, on a platform built for maximum scalability, security and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com.

Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

#saas #analytics #AWS #BI