talk-data.com

Topic

TensorFlow

machine_learning deep_learning neural_networks

73 tagged

Activity Trend: peak 10 activities/quarter, 2020-Q1 to 2026-Q1

Activities

73 activities · Newest first

Machine learning models are often assumed to be the preserve of large tech companies running big, powerful models for a wide array of tasks. Increasingly, however, machine learning models are also found in edge devices such as smart watches.

ML engineers are learning how to compress models and fit them into smaller and smaller devices while retaining accuracy, effectiveness, and efficiency. The goal is to empower domain experts in any industry around the world to effectively use machine learning models without having to become experts in the field themselves.
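One common compression technique in this space is post-training quantization via TensorFlow Lite (an area Daniel worked on at Google). The following is a minimal sketch of that idea, not code from the episode; the toy model is a placeholder:

```python
import tensorflow as tf

# A small stand-in model for whatever you actually want to deploy.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Post-training quantization shrinks weights toward 8-bit integers,
# trading a little accuracy for a much smaller on-device footprint.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file can then be executed by an on-device interpreter on microcontrollers, phones, or wearables.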

Daniel Situnayake is the Founding TinyML Engineer and Head of Machine Learning at Edge Impulse, a leading development platform for embedded machine learning used by over 3,000 enterprises across more than 85,000 ML projects globally. Dan has over 10 years of experience as a software engineer at companies including Google (where he worked on TensorFlow Lite) and Loopt, and co-founded Tiny Farms, America's first insect farming technology company. He wrote the book TinyML and the forthcoming AI at the Edge.

Daniel joins the show to talk about his work in edge ML, the biggest challenges facing the field of embedded machine learning, potential use cases for machine learning models on edge devices, and the best tips for aspiring machine learning engineers and data science practitioners getting started with embedded machine learning.

Predicting Repeat Admissions to Substance Abuse Treatment with Machine Learning

In our presentation, we will walk through a model created to predict repeat admissions to substance abuse treatment centers. The goal is to predict early who will be at high risk for relapse so care can be tailored to put additional focus on these patients. We used the Treatment Episode Data Set (TEDS) Admissions data, which includes every publicly funded substance abuse treatment admission in the US.

While longitudinal data is not available in the data set, we were able to predict with 88% accuracy and an F-score of 0.85 which admissions were first or repeat admissions. Our solution used a scikit-learn random forest model and leveraged MLflow to track model metrics and choose the most effective model. Our pipeline tested over 100 models of different types, ranging from gradient-boosted trees to deep neural networks in TensorFlow.
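A minimal sketch of the pattern described here, training a scikit-learn random forest and logging its metrics to MLflow so candidate models can be compared; the synthetic data is a placeholder, not the TEDS schema:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the TEDS admissions features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    preds = model.predict(X_test)

    # Log metrics so this run can be compared against other candidates
    # in the MLflow tracking UI.
    mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
    mlflow.log_metric("f1", f1_score(y_test, preds))
```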

To improve model interpretability, we used Shapley values to measure which variables were most important for predicting readmission. These model metrics along with other valuable data are visualized in an interactive Power BI dashboard designed to help practitioners understand who to focus on during treatment. We are in discussions with companies and researchers who may be able to leverage this model in substance abuse treatment centers in the field.
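Shapley-value attribution for a tree ensemble is typically computed with the shap library. A sketch, assuming the placeholder model and test split from the snippet above rather than the authors' actual pipeline:

```python
import shap

# TreeExplainer computes Shapley values efficiently for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# The summary plot ranks features by mean absolute contribution,
# showing which variables drive the readmission prediction.
shap.summary_plot(shap_values, X_test)
```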


Quick to Production with the Best of Both Apache Spark and Tensorflow on Databricks

Using TensorFlow with big datasets has long been an impediment to building deep learning models because of the added complexity of running it in a distributed setting and the complicated MLOps code it requires. Recent advancements in TensorFlow 2 and some extension libraries for Spark have now simplified much of this. This talk focuses on how we can leverage the best of both Spark and TensorFlow to build machine learning and deep learning models with minimal MLOps code, letting Spark handle the grunt work so we can focus on feature engineering and building the model itself. This design also lets us use any library in the TensorFlow ecosystem (such as TensorFlow Recommenders) with the same boilerplate code. For businesses like ours, fast prototyping and quick experimentation are key to building completely new experiences in an efficient, iterative way, and it is always preferable to have tangible results before putting more resources into a project. This design gives us that capability: more time for research, building models, testing quickly, and rapidly iterating, plus the flexibility to use our framework of choice at any stage of the machine learning lifecycle. In this talk, we will go through some of the best new features of both Spark and TensorFlow, how to go from single-node training to distributed training with very few extra lines of code, how to leverage MLflow as a central model store, and finally how to use these models for batch and real-time inference.
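One of the Spark extension libraries in this space is spark-tensorflow-distributor; the speakers' exact setup may differ, but the "few extra lines" hand-off from single-node to distributed Keras training looks roughly like this sketch:

```python
from spark_tensorflow_distributor import MirroredStrategyRunner

def train():
    # Everything inside this function runs on the Spark executors;
    # the distributor wraps it in a tf.distribute mirrored strategy.
    import tensorflow as tf

    (x, y), _ = tf.keras.datasets.mnist.load_data()
    x = x.reshape(-1, 784).astype("float32") / 255.0

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=1, batch_size=64)

# Going from single-node to distributed is mostly this one wrapper call.
MirroredStrategyRunner(num_slots=2, use_gpu=False).run(train)
```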


We talked about:

Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart

Links:

Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
GitHub ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face BigScience T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Automatic speech recognition is a compute-intensive task that depends on complex deep learning models. To do this at scale, we leveraged the power of TensorFlow, Kubernetes, and Airflow. In this session, you will learn about our journey to tackle this problem, the main challenges, and how Airflow made it possible to create a solution that is powerful yet simple and flexible.
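The session doesn't name its exact operators, but a common way to combine these pieces is to have Airflow schedule each transcription batch as a Kubernetes pod running a TensorFlow-based ASR container. A sketch with a hypothetical image and paths:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG("asr_batch", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # Each run launches a pod from a (hypothetical) ASR image; Kubernetes
    # handles the compute, Airflow handles scheduling and retries.
    transcribe = KubernetesPodOperator(
        task_id="transcribe_audio",
        name="transcribe-audio",
        image="example.com/asr-tensorflow:latest",
        cmds=["python", "transcribe.py"],
        arguments=["--input", "s3://bucket/audio/{{ ds }}"],
    )
```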

Tensorflow Extended (TFX) can run machine learning pipelines on Airflow, but by default every step runs in the same workers where the Airflow DAG is running. This can lead to excessive resource usage, and it breaks the assumption that Airflow is a scheduler: it also becomes the data processing platform. In this session, we will see how to use TFX with third-party services on top of Google Cloud Platform. The data processing steps can run in Dataflow, Spark, Flink, and other runners (parallelizing the processing of data and scaling up to petabytes), and the training steps can run in Vertex AI or other external services. After this workshop, you will have learned how to externalize any heavyweight TFX computing outside Airflow while maintaining Airflow as the orchestrator for your machine learning pipelines.
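In practice, the externalization pattern boils down to pointing TFX's Beam-based components at a remote runner while keeping Airflow as the orchestrator. A sketch under assumptions (placeholder GCP project and bucket, pipeline components elided):

```python
import datetime

from tfx.orchestration import pipeline
from tfx.orchestration.airflow.airflow_dag_runner import (
    AirflowDagRunner,
    AirflowPipelineConfig,
)

# Beam flags route the heavyweight data processing to Dataflow instead of
# running it inside the Airflow workers.
beam_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/tmp",
]

tfx_pipeline = pipeline.Pipeline(
    pipeline_name="my_pipeline",
    pipeline_root="gs://my-bucket/pipeline-root",
    components=[],  # ExampleGen, Transform, Trainer, ... go here
    beam_pipeline_args=beam_args,
)

# Airflow stays the scheduler: the runner emits a DAG object that the
# Airflow scheduler picks up from this file.
DAG = AirflowDagRunner(AirflowPipelineConfig(airflow_dag_config={
    "schedule_interval": None,
    "start_date": datetime.datetime(2023, 1, 1),
})).run(tfx_pipeline)
```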

At Credit Karma, we enable financial progress for more than 100 million members by recommending personalized financial products when they interact with our application. In this talk we introduce our machine learning platform for building interactive, production model-building workflows that serve relevant financial products to Credit Karma users.

Vega, Credit Karma's machine learning platform, has three major components: 1) QueryProcessor for feature and training data generation, backed by Google BigQuery; 2) PipelineProcessor for feature transformations, offline scoring, and model analysis, backed by Apache Beam; and 3) ModelProcessor for running TensorFlow and scikit-learn models, backed by Google AI Platform, which gives data scientists the flexibility to explore different kinds of machine learning and deep learning models, from gradient-boosted trees to neural networks with complex structures.

Vega exposes a unified Python API for feature generation, modeling ETL, model training, and model analysis. It supports interactive notebooks and Python scripts that run these components in local mode with sampled data and in cloud mode for large-scale distributed computing. Vega lets data scientists chain processors through Python code to define an entire workflow, then automatically generates the execution plan for deploying the workflow on Apache Airflow for offline model experiments and refreshes. Overall, with the unified Python API and automated Airflow DAG generation, Vega has improved the efficiency of ML engineering: using Airflow we deploy more than 20K features and 100 models daily.

Deep Learning with Python, Second Edition

Printed in full color! Unlock the groundbreaking advances of deep learning with this extensively revised new edition of the bestselling original. Learn directly from the creator of Keras and master practical Python deep learning techniques that are easy to apply in the real world.

In Deep Learning with Python, Second Edition you will learn: deep learning from first principles; image classification and image segmentation; timeseries forecasting; text classification and machine translation; text generation, neural style transfer, and image generation.

Deep Learning with Python has taught thousands of readers how to put the full capabilities of deep learning into action. This extensively revised, full-color second edition introduces deep learning using Python and Keras, and is loaded with insights for both novice and experienced ML practitioners. You'll learn practical techniques that are easy to apply in the real world, and important theory for perfecting neural networks.

About the Technology: Recent innovations in deep learning unlock exciting new software capabilities like automated language translation, image recognition, and more. Deep learning is quickly becoming essential knowledge for every software developer, and modern tools like Keras and TensorFlow put it within your reach, even if you have no background in mathematics or data science. This book shows you how to get started.

About the Book: Deep Learning with Python, Second Edition introduces the field of deep learning using Python and the powerful Keras library. In this revised and expanded new edition, Keras creator François Chollet offers insights for both novice and experienced machine learning practitioners. As you move through this book, you'll build your understanding through intuitive explanations, crisp color illustrations, and clear examples. You'll quickly pick up the skills you need to start developing deep-learning applications.

What's Inside: Deep learning from first principles. Image classification and image segmentation. Time series forecasting. Text classification and machine translation. Text generation, neural style transfer, and image generation. Printed in full color throughout.

About the Reader: For readers with intermediate Python skills. No previous experience with Keras, TensorFlow, or machine learning is required.

About the Author: François Chollet is a software engineer at Google and creator of the Keras deep-learning library.

Quotes:
"Chollet is a master of pedagogy and explains complex concepts with minimal fuss, cutting through the math with practical Python code. He is also an experienced ML researcher and his insights on various model architectures or training tips are a joy to read." - Martin Görner, Google
"Immerse yourself into this exciting introduction to the topic with lots of real-world examples. A must-read for every deep learning practitioner." - Sayak Paul, Carted
"The modern classic just got better." - Edmon Begoli, Oak Ridge National Laboratory
"Truly the bible of deep learning." - Yiannis Paraskevopoulos, University of West Attica
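For a flavor of the book's code-first style, here is a short Keras workflow in the spirit of its opening example (a sketch written for this listing, not an excerpt from the book):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load MNIST and flatten/scale the images to [0, 1].
(train_x, train_y), (test_x, test_y) = keras.datasets.mnist.load_data()
train_x = train_x.reshape(-1, 28 * 28).astype("float32") / 255.0
test_x = test_x.reshape(-1, 28 * 28).astype("float32") / 255.0

# A two-layer dense network is already enough for strong test accuracy here.
model = keras.Sequential([
    layers.Dense(512, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_x, train_y, epochs=5, batch_size=128)
print(model.evaluate(test_x, test_y))
```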

In this episode, Bryce and Conor interview Dave Abrahams about how he went from programming BASIC to APL to C++!

About the Guest: Dave Abrahams is a contributor to the C++ standard, a founding contributor of the Boost C++ Libraries project and of the BoostCon/C++Now conference, and was a principal designer of the Swift programming language. He recently spent seven years at Apple, culminating in the creation of the declarative SwiftUI framework, worked at Google on Swift for TensorFlow, and is now a principal scientist at Adobe, where he and Sean Parent are rebooting the Software Technology Lab.

Date Recorded: 2021-10-03
Date Released: 2021-10-29

Links:
ADSP Episode 48: Special Guest Dave Abrahams!
Algorithms + Data Structures = Programs
Niklaus Wirth
Combinatory Logic
Stepanov's "Notes on Higher Order Programming in Scheme"
PDP-8
BASIC Computer Games by David Ahl
Rutgers University
PDP-10
TECO
APL
Princeton University
Aaron Hsu's Co-dfns GPU Compiler
Swift Programming Language
Conor's Galaxy Brain Programming Languages
Ben Deane's "Six languages worth knowing"
Lisp Machine
Emacs
Composer's Mosaic
THINK C
"Exception handling: a false sense of security" - Tom Cargill

Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

In this episode, Bryce and Conor interview Dave Abrahams and talk about C++Now (aka BoostCon), C++ and Swift!

About the Guest: Dave Abrahams is a contributor to the C++ standard, a founding contributor of the Boost C++ Libraries project and of the BoostCon/C++Now conference, and was a principal designer of the Swift programming language. He recently spent seven years at Apple, culminating in the creation of the declarative SwiftUI framework, worked at Google on Swift for TensorFlow, and is now a principal scientist at Adobe, where he and Sean Parent are rebooting the Software Technology Lab.

Date Recorded: 2021-10-03
Date Released: 2021-10-22

Links:
C++Now (formerly BoostCon)
Swift Programming Language
C++ Move Constructors
Boost C++ Libraries
C++ Standard Template Library
Stepanov Website
Chris Lattner on Twitter
Jeremy Siek's Profile
Rust Programming Language
C++ std::mutex
C++ std::shared_mutex
The Day The Standard Library Died (blog that mentions the std::string ABI break)

Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

As the Apache Airflow project grows, we seek both to incorporate rising technologies and to find novel ways to expose them to our users. Ray is one of the fastest-growing distributed computation systems on the market today. In this talk, we will introduce the Ray decorator and Ray backend. These features, built with the help of the Ray maintainers at Anyscale, will allow data scientists to natively integrate their distributed pandas, XGBoost, and TensorFlow jobs into their Airflow pipelines with a single decorator. By merging the orchestration of Airflow and the distributed computation of Ray, this coordination of technologies opens Airflow users to a whole host of new possibilities when designing their pipelines.
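The core Ray primitive this integration builds on is the remote-function decorator. A sketch of plain Ray below; the Airflow-side decorator described in the talk (shipped via a Ray provider package) wraps tasks in a similar style:

```python
import ray

ray.init()  # connect to (or locally start) a Ray cluster

@ray.remote
def score_chunk(chunk):
    # Placeholder for a pandas/XGBoost/TensorFlow workload.
    return sum(chunk)

# Fan work out across the cluster, then gather the results.
futures = [score_chunk.remote(list(range(i, i + 100)))
           for i in range(0, 1000, 100)]
print(sum(ray.get(futures)))
```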

Distributed Data Systems with Azure Databricks

In 'Distributed Data Systems with Azure Databricks', you will explore the capabilities of Microsoft Azure Databricks as a platform for building and managing big data pipelines. Learn how to process, transform, and analyze data at scale while developing expertise in training distributed machine learning models and integrating them into enterprise workflows.

What this book will help me do:
Design and implement Extract, Transform, Load (ETL) pipelines using Azure Databricks.
Conduct distributed training of machine learning models using TensorFlow and Horovod.
Integrate Azure Databricks with Azure Data Factory for optimized data pipeline orchestration.
Utilize Delta Engine for efficient querying and analysis of data within Delta Lake.
Employ Databricks Structured Streaming to manage real-time production-grade data flows.

Author(s): Palacio is an experienced data engineer and cloud computing specialist with extensive knowledge of the Microsoft Azure platform. With years of practical application of Databricks in enterprise settings, Palacio provides clear, actionable insights through relatable examples and brings a passion for innovative solutions to the field of big data automation.

Who is it for? This book is ideal for data engineers, machine learning engineers, and software developers looking to master Azure Databricks for large-scale data processing and analysis. Readers should have basic familiarity with cloud platforms, an understanding of data pipelines, and a foundational grasp of Python and machine learning concepts. It is perfect for those wanting to create scalable and manageable data workflows.

AI and Machine Learning for Coders

If you're looking to make a career move from programmer to AI specialist, this is the ideal place to start. Based on Laurence Moroney's extremely successful AI courses, this introductory book provides a hands-on, code-first approach to help you build confidence while you learn key topics. You'll understand how to implement the most common scenarios in machine learning, such as computer vision, natural language processing (NLP), and sequence modeling for web, mobile, cloud, and embedded runtimes. Most books on machine learning begin with a daunting amount of advanced math. This guide is built on practical lessons that let you work directly with the code.

You'll learn:
How to build models with TensorFlow using skills that employers desire
The basics of machine learning by working with code samples
How to implement computer vision, including feature detection in images
How to use NLP to tokenize and sequence words and sentences
Methods for embedding models in Android and iOS
How to serve models over the web and in the cloud with TensorFlow Serving
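The tokenize-and-sequence step from the NLP material, for instance, looks roughly like this in TensorFlow/Keras (a sketch in the course's style, not text from the book):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = [
    "I love my dog",
    "I love my cat",
    "Do you think my dog is amazing?",
]

# Build a word index, then turn sentences into padded integer sequences
# that a neural network can consume.
tokenizer = Tokenizer(num_words=100, oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, padding="post")

print(tokenizer.word_index)
print(padded)
```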

Hands-on Time Series Analysis with Python: From Basics to Bleeding Edge Techniques

Learn the concepts of time series, from traditional to bleeding-edge techniques. This book uses comprehensive examples to clearly illustrate statistical approaches and methods of analyzing time series data and its utilization in the real world. All the code is available in Jupyter notebooks.

You'll begin by reviewing time series fundamentals, the structure of time series data, pre-processing, and how to craft features through data wrangling. Next, you'll look at traditional time series techniques like ARMA, SARIMAX, VAR, and VARMA using popular frameworks like statsmodels and pmdarima. The book also explains building classification models using sktime, and covers advanced deep learning-based techniques like ANN, CNN, RNN, LSTM, GRU, and autoencoders to solve time series problems using TensorFlow. It concludes by explaining the popular framework fbprophet for modeling time series analysis.

After reading Hands-On Time Series Analysis with Python, you'll be able to apply these techniques in industries such as oil and gas, robotics, manufacturing, government, banking, retail, healthcare, and more.

What You'll Learn:
Basic to advanced concepts of time series
How to design, develop, train, and validate time series methodologies
Smoothing, ARMA, ARIMA, SARIMA, SARIMAX, VAR, and VARMA techniques, and how to optimally tune parameters to yield the best results
How to leverage bleeding-edge techniques such as ANN, CNN, RNN, LSTM, GRU, and autoencoders to solve both univariate and multivariate problems, using two types of data preparation methods for time series
Univariate and multivariate problem solving using fbprophet

Who This Book Is For: Data scientists, data analysts, financial analysts, and stock market researchers
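As a small illustration of the classical-modeling side, here is a SARIMAX fit with statsmodels on synthetic data (the book's own examples and parameter choices will differ):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic monthly series: linear trend plus a yearly seasonal cycle.
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(
    np.linspace(10, 30, 96)
    + 5 * np.sin(2 * np.pi * np.arange(96) / 12)
    + np.random.default_rng(0).normal(0, 1, 96),
    index=idx,
)

# The seasonal order (P, D, Q, s) captures the 12-month cycle.
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)
print(result.forecast(steps=12))
```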

As the field of data science grows in popularity, companies find themselves in need of a single common language that can connect their data science teams and data infrastructure teams. Data scientists want rapid iteration, infrastructure engineers want monitoring and security controls, and product owners want their solutions deployed in time for quarterly reports. This talk discusses how to build an Airflow-based data platform that takes advantage of popular ML tools (Jupyter, TensorFlow, Spark) while creating an easy-to-manage, easy-to-monitor ecosystem for data infrastructure and support teams. We will take an idea from a single-machine Jupyter notebook, to a cross-service Spark + TensorFlow pipeline, to a canary-tested, production-ready model served on Google Cloud Functions, and show how Apache Airflow can connect all layers of a data team to deliver rapid results.
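That notebook-to-production journey might be stitched together as an Airflow DAG along these lines; task names and operators are illustrative, not the speakers' exact setup:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder callables; in the real pipeline these would trigger a Spark
# feature job, TensorFlow training, a canary comparison against the
# currently-serving model, and a deploy to Google Cloud Functions.
def preprocess(): ...
def train(): ...
def canary_test(): ...
def deploy(): ...

with DAG("ml_pipeline", start_date=datetime(2023, 1, 1),
         schedule_interval="@weekly", catchup=False) as dag:
    t1 = PythonOperator(task_id="preprocess", python_callable=preprocess)
    t2 = PythonOperator(task_id="train", python_callable=train)
    t3 = PythonOperator(task_id="canary_test", python_callable=canary_test)
    t4 = PythonOperator(task_id="deploy", python_callable=deploy)

    # Linear dependency chain: each stage gates the next.
    t1 >> t2 >> t3 >> t4
```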

Summary Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. Dataform is a platform that helps you apply engineering principles to your data transformations and table definitions, including unit testing SQL scripts, defining repeatable pipelines, and adding metadata to your warehouse to improve your team’s communication. In this episode CTO and co-founder of Dataform Lewis Hemens joins the show to explain his motivation for creating the platform and company, how it works under the covers, and how you can start using it today to get your data warehouse under control.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

This week’s episode is also sponsored by Datacoral. They provide an AWS-native, serverless data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to manage any infrastructure. Datacoral’s customers report that their data engineers are able to spend 80% of their work time invested in data transformations, rather than pipeline maintenance. Raghu Murthy, founder and CEO of Datacoral, built data infrastructures at Yahoo! and Facebook, scaling from mere terabytes to petabytes of analytic data. He started Datacoral with the goal to make SQL the universal data programming language. Visit Datacoral.com today to find out more.

Are you working on data, analytics, or AI using platforms such as Presto, Spark, or Tensorflow? Check out the Data Orchestration Summit on November 7 at the Computer History Museum in Mountain View. This one-day conference is focused on the key data engineering challenges and solutions around building analytics and AI platforms. Attendees will hear from companies including Walmart, Netflix, Google, and DBS Bank on how they leveraged technologies such as Alluxio, Presto, Spark, and Tensorflow, and you will also hear from creators of open source projects including Alluxio, Presto, Airflow, Iceberg, and more! Use discount code PODCAST for 25% off of your ticket, and the first five people to register get free tickets! Register now, as early bird tickets are ending this week! Attendees will take away learnings, swag, a free voucher to visit the museum, and a chance to win the latest iPad Pro!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Lewis Hemens about Dataform, a platform that helps analysts apply engineering principles to their data warehouse.

Summary Building a data platform that works equally well for data engineering and data science is a task that requires familiarity with the needs of both roles. Data engineering platforms have a strong focus on stateful execution and tasks that are strictly ordered based on dependency graphs. Data science platforms provide an environment that is conducive to rapid experimentation and iteration, with data flowing directly between stages. Jeremiah Lowin has gained experience in both styles of working, leading him to be frustrated with all of the available tools. In this episode he explains his motivation for creating a new workflow engine that marries the needs of data engineers and data scientists, how it helps to smooth the handoffs between teams working on data projects, and how the design lets you focus on what you care about while it handles the failure cases for you. It is exciting to see a new generation of workflow engine that is learning from the benefits and failures of previous tools for processing your data pipelines.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced, and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey, and today I’m interviewing Jeremiah Lowin about Prefect, a workflow platform for data engineering.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what Prefect is and your motivation for creating it?
What are the axes along which a workflow engine can differentiate itself, and which of those have you focused on for Prefect?
In some of your blog posts and your PyData presentation you discuss the concept of negative vs. positive engineering. Can you briefly outline what you mean by that and the ways that Prefect handles the negative cases for you?
How is Prefect itself implemented and what tools or systems have you relied on most heavily for inspiration?
How do you manage passing data between stages in a pipeline when they are running across distributed nodes?
What was your decision making process when deciding to use Dask as your supported execution engine?

For tasks that require specific resources or dependencies how do you approach the idea of task affinity?

Does Prefect support managing tasks that bridge network boundaries?
What are some of the features or capabilities of Prefect that are misunderstood or overlooked by users which you think should be exercised more often?
What are the limitations of the open source core as compared to the cloud offering that you are building?
What were your assumptions going into this project and how have they been challenged or updated as you dug deeper into the problem domain and received feedback from users?
What are some of the most interesting/innovative/unexpected ways that you have seen Prefect used?
When is Prefect the wrong choice?
In your experience working on Airflow and Prefect, what are some of the common challenges and anti-patterns that arise in data engineering projects?

What are some best practices and industry trends that you are most excited by?

What do you have planned for the future of the Prefect project and company?

Contact Info

LinkedIn
@jlowin on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Prefect
Airflow
Dask

Podcast Episode

Prefect Blog
PyData Presentation
Tensorflow
Workflow Engine

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Deep Learning for Search

Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on!

About the Technology: Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how.

About the Book: Deep Learning for Search teaches you to improve your search results with neural networks. You’ll review how DL relates to search basics like indexing and ranking. Then, you’ll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you’ll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn!

What's Inside: Accurate and relevant rankings. Searching across languages. Content-based image search. Search with recommendations.

About the Reader: For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed.

About the Author: Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA). He currently works at Adobe, developing search and indexing infrastructure components and researching the areas of natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including BerlinBuzzwords, the International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili.

Quotes:
"A practical approach that shows you the state of the art in using neural networks, AI, and deep learning in the development of search engines." - From the Foreword by Chris Mattmann, NASA JPL
"A thorough and thoughtful synthesis of traditional search and the latest advancements in deep learning." - Greg Zanotti, Marquette Partners
"A well-laid-out deep dive into the latest technologies that will take your search engine to the next level." - Andrew Wyllie, Thynk Health
"Hands-on exercises teach you how to master deep learning for search-based products." - Antonio Magnaghi, System1

Machine Learning for Finance

Dive deep into how machine learning is transforming the financial industry with 'Machine Learning for Finance'. This comprehensive guide explores cutting-edge concepts in machine learning while providing practical insights and Python code examples to help readers apply these techniques to real-world financial scenarios. Whether tackling fraud detection, financial forecasting, or sentiment analysis, this book equips you with the understanding and tools needed to excel.

What this book will help me do:
Understand and implement machine learning techniques for structured data, natural language, images, and text.
Learn Python-based tools and libraries such as scikit-learn, Keras, and TensorFlow for financial data analysis.
Apply machine learning for tasks like predicting financial trends, detecting fraud, and customer sentiment analysis.
Explore advanced topics such as neural networks, generative adversarial networks (GANs), and reinforcement learning.
Gain hands-on experience with machine learning debugging, product launch preparation, and addressing bias in data.

Author(s): James Le and Jannes Klaas are experts in machine learning applications in financial technology. Jannes has extensive experience training financial professionals on implementing machine learning strategies in their work and pairs this with a deep academic understanding of the topic. Their dedication to empowering readers to confidently integrate AI and machine learning into financial applications shines through in this user-focused, richly detailed book.

Who is it for? This book is tailored for financial professionals, data scientists, and enthusiasts aiming to harness machine learning's potential in finance. Readers should have a foundational understanding of mathematics, statistics, and Python programming. If you work in financial services and are curious about applications ranging from fraud detection to trend forecasting, this resource is for you. It's designed for those looking to advance their skills and make impactful contributions in financial technology.