talk-data.com talk-data.com

Event

PyConDE & PyData Berlin 2023

2023-04-17 – 2023-04-19 PyData

Activities tracked

4

Filtering by: Analytics ×

Sessions & talks

Showing 1–4 of 4 · Newest first

Search within this event →

Postmodern Architecture: The Python Powered Modern Data Stack

2023-04-19
talk

The Modern Data Stack has brought a lot of new buzzwords into the data engineering lexicon: "data mesh", "data observability", "reverse ETL", "data lineage", "analytics engineering". In this light-hearted talk we will demystify the evolving revolution that will define the future of data analytics & engineering teams.

Our journey begins with the PyData Stack: pandas pipelines powering ETL workflows...clean code, tested code, data validation, perfect for in-memory workflows. As demand for self-serve analytics grows, new data sources bring more APIs to model, more code to maintain, DAG workflow orchestration tools, new nuances to capture ("the tax team defines revenue differently"), more dashboards, more not-quite-bugs ("but my number says this...").

This data maturity journey is a well-trodden path with common pitfalls & opportunities. After dashboards comes predictive modelling ("what will happen"), prescriptive modelling ("what should we do?"), perhaps eventually automated decision making. Getting there is much easier with the advent of the Python Powered Modern Data Stack.

In this talk, we will cover the shift from ETL to ELT, the open-source Modern Data Stack tools you should know, with a focus on how dbt's new Python integration is changing how data pipelines are built, run, tested & maintained. By understanding the latest trends & buzzwords, attendees will gain a deeper insight into Python's role at the core of the future of data engineering.

Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem

2023-04-19
talk

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing, and is becoming the de facto standard for tabular data. This talk will give an overview of the recent developments both in Apache Arrow itself as how it is being adopted in the PyData ecosystem (and beyond) and can improve your day-to-day data analytics workflows.

Bayesian Marketing Science: Solving Marketing's 3 Biggest Problems

2023-04-18
talk

In this talk I will present two new open-source packages that make up a powerful and state-of-the-art marketing analytics toolbox. Specifically, PyMC-Marketing is a new library built on top of the popular Bayesian modeling library PyMC. PyMC-Marketing allows robust estimation of customer acquisition costs (via media mix modeling) as well as customer lifetime value. In addition, I will show how we can estimate the effectiveness of marketing campaigns using a new Bayesian causal inference package called CausalPy. The talk will be applied with a real-world case-study and many code examples. Special emphasis will be placed on the interplay between these tools and how they can be combined together to make optimal marketing budget decisions in complex scenarios.

WALD: A Modern & Sustainable Analytics Stack

2023-04-17
talk

The name WALD-stack stems from the four technologies it is composed of, i.e. a cloud-computing Warehouse like Snowflake or Google BigQuery, the open-source data integration engine Airbyte, the open-source full-stack BI platform Lightdash, and the open-source data transformation tool DBT.

Using a Formula 1 Grand Prix dataset, I will give an overview of how these four tools complement each other perfectly for analytics tasks in an ELT approach. You will learn the specific uses of each tool as well as their particular features. My talk is based on a full tutorial, which you can find under waldstack.org.