TensorFlow

Data Science and Engineering at Enterprise Scale

2019-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jerome Nilmeier

AI/ML Analytics Data Science Python Spark SQL Data Streaming data data-science

As enterprise-scale data science sharpens its focus on data-driven decision making and machine learning, new tools have emerged to help facilitate these processes. This practical ebook shows data scientists and enterprise developers how the notebook interface, Apache Spark, and other collaboration tools are particularly well suited to bridge the communication gap between their teams. Through a series of real-world examples, author Jerome Nilmeier demonstrates how to generate a model that enables data scientists and developers to share ideas and project code. You’ll learn how data scientists can approach real-world business problems with Spark and how developers can then implement the solution in a production environment.

Dive deep into data science technologies, including Spark, TensorFlow, and the Jupyter Notebook
Learn how Spark and Python notebooks enable data scientists and developers to work together
Explore how the notebook environment works with Spark SQL for structured data
Use notebooks and Spark as a launchpad to pursue supervised, unsupervised, and deep learning data models
Learn additional Spark functionality, including graph analysis and streaming
Explore the use of analytics in the production environment, particularly when creating data pipelines and deploying code

Machine Learning In The Enterprise

2019-02-11 · Data Engineering Podcast Listen

podcast_episode

by Kevin Dewalt (Prolego), Tobias Macey

AI/ML Airflow CI/CD Data Engineering Data Management Data Science DevOps Git Jenkins Keras Pandas PyTorch

Summary

Machine learning is a class of technologies that promise to revolutionize business. Unfortunately, it can be difficult to identify and execute on ways that it can be used in large companies. Kevin Dewalt founded Prolego to help Fortune 500 companies build, launch, and maintain their first machine learning projects so that they can remain competitive in our landscape of constant change. In this episode he discusses why machine learning projects require a new set of capabilities, how to build a team from internal and external candidates, and how an example project progressed through each phase of maturity. This was a great conversation for anyone who wants to understand the benefits and tradeoffs of machine learning for their own projects and how to put it into practice.

Introduction

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show, please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.
Your host is Tobias Macey, and today I’m interviewing Kevin Dewalt about his experiences at Prolego, building machine learning projects for Fortune 500 companies.

Interview

Introduction
How did you get involved in the area of data management?
For the benefit of software engineers and team leaders who are new to machine learning, can you briefly describe what machine learning is and why it is relevant to them?
What is your primary mission at Prolego and how did you identify, execute on, and establish a presence in your particular market?

How much of your sales process is spent on educating your clients about what AI or ML are and the benefits that these technologies can provide?

What have you found to be the technical skills and capacity necessary for being successful in building and deploying a machine learning project?

When engaging with a client, what have you found to be the most common areas of technical capacity or knowledge that are needed?

Everyone talks about a talent shortage in machine learning. Can you suggest a recruiting or skills development process for companies which need to build out their data engineering practice?
What challenges will teams typically encounter when creating an efficient working relationship between data scientists and data engineers?
Can you briefly describe a successful project of developing a first ML model and putting it into production?

What is the breakdown of how much time was spent on different activities such as data wrangling, model development, and data engineering pipeline development?
When releasing to production, can you share the types of metrics that you track to ensure the health and proper functioning of the models?
What does a deployable artifact for a machine learning/deep learning application look like?

What basic technology stack is necessary for putting the first ML models into production?

How does the build vs. buy debate break down in this space and what products do you typically recommend to your clients?

What are the major risks associated with deploying ML models and how can a team mitigate them?
Suppose a software engineer wants to break into ML. What data engineering skills would you suggest they learn? How should they position themselves for the right opportunity?

Contact Info

Email: Kevin Dewalt [email protected] and Russ Rands [email protected]
Connect on LinkedIn: Kevin Dewalt and Russ Rands
Twitter: @kevindewalt

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Prolego
Download our book: Become an AI Company in 90 Days
Google Rules Of ML
AI Winter
Machine Learning
Supervised Learning
O’Reilly Strata Conference
GE Rebranding Commercials
Jez Humble: Stop Hiring Devops Experts (And Start Growing Them)
SQL
ORM
Django
RoR
TensorFlow
PyTorch
Keras
Data Engineering Podcast Episode About Data Teams
DevOps For Data Teams – DevOps Days Boston Presentation by Tobias
Jupyter Notebook
Data Engineering Podcast: Notebooks at Netflix
Pandas

Podcast Interview

Joel Grus

JupyterCon Presentation
Data Science From Scratch

Expensify
Airflow

James Meickle Interview

Git
Jenkins
Continuous Integration
Practical Deep Learning For Coders Course by Jeremy Howard
Data Carpentry

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Hands-On Deep Learning with Apache Spark

2019-01-31 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Guglielmo Iozzia

AI/ML Keras RNNs Scala Spark apache-spark data data-engineering

"Hands-On Deep Learning with Apache Spark" is an essential resource for mastering distributed deep learning frameworks and applications on Apache Spark. Through practical examples and guided tutorials, this book teaches you to deploy scalable deep learning solutions for handling complex data challenges efficiently.

What this Book will help me do
Understand how to set up Apache Spark for deep learning workflows.
Gain practical insight into implementing neural networks, including CNNs and RNNs, on distributed platforms.
Learn to train and optimize models using popular frameworks like TensorFlow and Keras.
Develop expertise in analyzing large datasets with textual and image-based deep learning methods.
Acquire skills to deploy trained models for real-world applications in distributed environments.

Author(s)
Guglielmo Iozzia is an accomplished software engineer and data scientist with a strong background in distributed computing and machine learning. With years of experience working with Apache Spark and deep learning technologies, Iozzia brings a wealth of practical knowledge to the table. Their passion for providing clear, hands-on guidance makes this book an approachable and valuable resource for learners of all levels.

Who is it for?
This book is aimed at Scala developers, data scientists, and data analysts who are looking to extend their skill set to include distributed deep learning on Apache Spark. It's ideally suited for readers familiar with machine learning basics and those with prior exposure to Apache Spark workflows. If you aim to create scalable machine learning solutions that handle complex data, this book offers precisely what you need.

Hands-On Artificial Intelligence for Beginners

2018-10-31 · O'Reilly AI & ML Books O'Reilly Amazon

book

by David Dindi, Patrick D. Smith

AI/ML RNNs ai-ml artificial-intelligence-ai artificial intelligence (ai) data

"Hands-On Artificial Intelligence for Beginners" is your gateway to understanding and implementing modern AI technologies. This book introduces foundational AI concepts, delves into machine learning, deep learning, and neural networks, and guides you through practical applications in real-world scenarios.

What this Book will help me do
Understand and apply core AI and machine learning principles using tools like TensorFlow.
Develop and train artificial neural networks for various applications.
Implement advanced models like CNNs, RNNs, and generative models to solve real-world tasks.
Explore reinforcement learning techniques and their game-playing strategies.
Design, deploy, and optimize scalable AI systems for long-term use.

Author(s)
David Dindi and Patrick D. Smith are experts in Artificial Intelligence with extensive teaching and development experience. They dedicate their writing to demystifying complex ideas and making them accessible to learners. Their commitment to hands-on practice ensures that readers build concrete skills while grasping theoretical concepts.

Who is it for?
If you're an aspiring data scientist or developer keen to break into Artificial Intelligence, this book is perfect for you. Beginners with basic programming knowledge will feel comfortable progressing through the material. Readers looking for practical illustrations of AI concepts will benefit greatly from the hands-on approach. This book is tailored for learners aiming to build and deploy real-world AI systems efficiently.

Python Data Analytics: With Pandas, NumPy, and Matplotlib

2018-09-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Fabio Nelli

AI/ML Analytics Data Analytics DataViz JavaScript Keras Matplotlib NumPy Pandas Python PyTorch Scikit-learn

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This revision is fully updated with new content on social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation.

Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Second Edition is an invaluable reference with its examples of storing, accessing, and analyzing data.

What You'll Learn
Understand the core concepts of data analysis and the Python ecosystem
Go in depth with pandas for reading, writing, and processing data
Use tools and techniques for data visualization and image analysis
Examine popular deep learning libraries Keras, Theano, TensorFlow, and PyTorch

Who This Book Is For
Experienced Python developers who need to learn about Pythonic tools for data analysis

Apache Spark Deep Learning Cookbook

2018-07-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Ahmed Sherif, Amrith Ravindra, Michal Malohlava, Adnan Masood

AI/ML Big Data Keras NLP Python RNNs Spark apache-spark data data-engineering

Embark on a journey to master distributed deep learning with the "Apache Spark Deep Learning Cookbook". Designed specifically for leveraging the capabilities of Apache Spark, TensorFlow, and Keras, this book offers over 80 problem-solving recipes to efficiently train and deploy state-of-the-art neural networks, addressing real-world AI challenges.

What this Book will help me do
Set up and configure a working Apache Spark environment optimized for deep learning tasks.
Implement distributed training practices for deep learning models using TensorFlow and Keras.
Develop and test neural networks such as CNNs and RNNs targeting specific big data problems.
Apply Spark's built-in libraries and integrations for enhanced NLP and computer vision applications.
Effectively manage and preprocess large datasets using Spark DataFrames for machine learning tasks.

Author(s)
Authors Ahmed Sherif and Amrith Ravindra bring years of experience in deep learning, Apache Spark use cases, and hands-on practical training. Their collective expertise has contributed to designing this cookbook approach, focusing on clarity and usability for readers tackling challenging machine learning scenarios.

Who is it for?
This book is ideal for IT professionals, data scientists, and software developers with a foundational understanding of machine learning concepts and Apache Spark framework capabilities. If you aim to scale deep learning and integrate efficient computing with Spark's power, this guide is for you. Familiarity with Python will help maximize the book's potential.

IBM PowerAI: Deep Learning Unleashed on IBM Power Systems Servers

2018-03-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alfonso Jara, Dino Quintero, Shota Tsukamoto, Richard Wale, Bruno C. Faria, Bing He, Chris Parsons

AI/ML IBM data data-engineering

Abstract

This IBM® Redbooks® publication is a guide to the IBM PowerAI Deep Learning solution. This book provides an introduction to artificial intelligence (AI) and deep learning (DL), IBM PowerAI, and components of IBM PowerAI, deploying IBM PowerAI, guidelines for working with data and creating models, an introduction to IBM Spectrum™ Conductor Deep Learning Impact (DLI), and case scenarios. IBM PowerAI started as a package of software distributions of many of the major DL software frameworks for model training, such as TensorFlow, Caffe, Torch, Theano, and the associated libraries, such as CUDA Deep Neural Network (cuDNN). The IBM PowerAI software is optimized for performance by using the IBM Power Systems™ servers that are integrated with NVLink. The AI stack foundation starts with servers with accelerators. Graphical processing unit (GPU) accelerators are well-suited for the compute-intensive nature of DL training, and servers with the highest CPU to GPU bandwidth, such as IBM Power Systems servers, enable the high-performance data transfer that is required for larger and more complex DL models. This publication targets technical readers, including developers, IT specialists, systems architects, brand specialists, sales teams, and anyone looking for a guide about how to understand the IBM PowerAI Deep Learning architecture, framework configuration, application and workload configuration, and user infrastructure.

Snorkel: Extracting Value From Dark Data with Alex Ratner - Episode 15

2018-01-22 · Data Engineering Podcast Listen

podcast_episode

by Alex Ratner (Snorkel), Tobias Macey

AI/ML Big Data Data Collection Data Engineering Data Management GitHub Linux PyTorch

Summary

The majority of the conversation around machine learning and big data pertains to well-structured and cleaned data sets. Unfortunately, that is just a small percentage of the information that is available, so the rest of the sources of knowledge in a company are housed in so-called “Dark Data” sets. In this episode Alex Ratner explains how the work that he and his fellow researchers are doing on Snorkel can be used to extract value by leveraging labeling functions written by domain experts to generate training sets for machine learning models. He also explains how this approach can be used to democratize machine learning by making it feasible for organizations with smaller data sets than those required by most tooling.
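The idea of using labeling functions to generate training sets can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not use Snorkel's API, the spam/ham task and the three labeling functions are hypothetical, and the votes are combined by a simple majority, whereas Snorkel itself fits a generative model over the labeling functions' outputs.

```python
from collections import Counter

ABSTAIN = None  # a labeling function may decline to vote on an example


# Hypothetical labeling functions a domain expert might write.
def lf_contains_offer(text):
    return "spam" if "free offer" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return "ham" if text.lower().startswith(("hi ", "hello ")) else ABSTAIN


def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 3 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_offer, lf_has_greeting, lf_many_exclamations]


def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes by simple majority (a stand-in
    for a learned label model); return ABSTAIN on no votes or a tie."""
    votes = [vote for vote in (lf(text) for lf in lfs) if vote is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence, leave the example unlabeled
    return ranked[0][0]


def build_training_set(texts):
    """Keep only the examples that received a weak label."""
    labeled = []
    for text in texts:
        label = weak_label(text)
        if label is not ABSTAIN:
            labeled.append((text, label))
    return labeled
```

Calling `build_training_set` on a list of unlabeled strings returns `(text, label)` pairs that could then be fed to any ordinary classifier; examples on which the functions abstain or disagree evenly are left out.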

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
You can help support the show by checking out the Patreon page which is linked from the site.
To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
Your host is Tobias Macey, and today I’m interviewing Alex Ratner about Snorkel and Dark Data.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by sharing your definition of dark data and how Snorkel helps to extract value from it?
What are some of the most challenging aspects of building labelling functions and what tools or techniques are available to verify their validity and effectiveness in producing accurate outcomes?
Can you provide some examples of how Snorkel can be used to build useful models in production contexts for companies or problem domains where data collection is difficult to do at large scale?
For someone who wants to use Snorkel, what are the steps involved in processing the source data and what tooling or systems are necessary to analyse the outputs for generating usable insights?
How is Snorkel architected and how has the design evolved over its lifetime?
What are some situations where Snorkel would be poorly suited for use?
What are some of the most interesting applications of Snorkel that you are aware of?
What are some of the other projects that you and your group are working on that interact with Snorkel?
What are some of the features or improvements that you have planned for future releases of Snorkel?

Contact Info

Website
ajratner on GitHub
@ajratner on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Stanford DAWN
HazyResearch
Snorkel
Christopher Ré
Dark Data
DARPA
Memex
Training Data
FDA
ImageNet
National Library of Medicine
Empirical Studies of Conflict
Data Augmentation
PyTorch
TensorFlow
Generative Model
Discriminative Model
Weak Supervision


CRDTs and Distributed Consensus with Christopher Meiklejohn - Episode 14

2018-01-15 · Data Engineering Podcast Listen

podcast_episode

by Christopher Meiklejohn (LASP), Tobias Macey

Cassandra Data Engineering Data Management Delta Docker DynamoDB GitHub Kubernetes Linux

Summary

As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources the requirement to distribute the computational resources for managing that information becomes more pronounced. In order to ensure that all of the distributed nodes in our systems agree with each other we need to build mechanisms to properly handle replication of data and conflict resolution. In this episode Christopher Meiklejohn discusses the research he is doing with Conflict-Free Replicated Data Types (CRDTs) and how they fit in with existing methods for sharing and sharding data. He also shares resources for systems that leverage CRDTs, how you can incorporate them into your systems, and when they might not be the right solution. It is a fascinating and informative treatment of a topic that is becoming increasingly relevant in a data driven world.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure.
When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.
You can help support the show by checking out the Patreon page which is linked from the site.
To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers.
Your host is Tobias Macey, and today I’m interviewing Christopher Meiklejohn about establishing consensus in distributed systems.

Interview

Introduction
How did you get involved in the area of data management?
You have dealt with CRDTs with your work in industry, as well as in your research. Can you start by explaining what a CRDT is, how you first began working with them, and some of their current manifestations?
Other than CRDTs, what are some of the methods for establishing consensus across nodes in a system and how does increased scale affect their relative effectiveness?
One of the projects that you have been involved in which relies on CRDTs is LASP. Can you describe what LASP is and what your role in the project has been?
Can you provide examples of some production systems or available tools that are leveraging CRDTs?
If someone wants to take advantage of CRDTs in their applications or data processing, what are the available off-the-shelf options, and what would be involved in implementing custom data types?
What areas of research are you most excited about right now?
Given that you are currently working on your PhD, do you have any thoughts on the projects or industries that you would like to be involved in once your degree is completed?

Contact Info

Website
cmeiklejohn on GitHub
Google Scholar Citations

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Basho
Riak
Syncfree
LASP
CRDT
Mesosphere
CAP Theorem
Cassandra
DynamoDB
Bayou System (Xerox PARC)
Multivalue Register
Paxos
RAFT
Byzantine Fault Tolerance
Two Phase Commit
Spanner
ReactiveX
TensorFlow
Erlang
Docker
Kubernetes
Erleans
Orleans
Atom Editor
Automerge
Martin Kleppmann
Akka
Delta CRDTs
Antidote DB
Kops
Eventual Consistency
Causal Consistency
ACID Transactions
Joe Hellerstein


Project Common Voice

2017-08-11 · Data Skeptic Listen

podcast_episode

by Kyle Polich, Andre Natal (Mozilla)

Thanks to our sponsor Springboard. In this week's episode, guest Andre Natal from Mozilla joins our host, Kyle Polich, to discuss a couple of exciting new developments in open source speech recognition systems, including Project Common Voice. In June 2017, Mozilla launched Common Voice, a new open source project that complements the TensorFlow-based DeepSpeech implementation. DeepSpeech is a deep learning-based voice recognition system that was designed by Baidu, which they describe in greater detail in their research paper. DeepSpeech is a speech-to-text engine, and Mozilla hopes that, in the future, they can use Common Voice data to train their DeepSpeech engine.

Peter Morgan, CEO, Deep Learning Partnership

2017-06-01 · The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics Listen

podcast_episode

by Peter Morgan (Deep Learning Partnership), Vishal Kumar (AnalyticsWeek)

AI/ML Data Science IBM Keras Marketing NLP

ERRATA (as reported by Peter): The book Peter mentioned (at 46:20) by Stuart Russell, "Do the Right Thing", was published in 2003, and not recently.

In this session, Peter Morgan, CEO of Deep Learning Partnership, sat down with Vishal Kumar, CEO of AnalyticsWeek, and shared his thoughts on Deep Learning, Machine Learning and Artificial Intelligence. They discussed best practices for picking the right solution and the right vendor, and what some of the key terms mean.

Here's Peter's bio: Peter Morgan is a scientist-entrepreneur who started out in high energy physics, enrolled in the PhD program at the University of Massachusetts at Amherst. After leaving UMass and founding his own company, Peter moved into computer networks, designing, implementing and troubleshooting global IP networks for companies such as Cisco, IBM and BT Labs. After getting an MBA and dabbling in financial trading algorithms, Peter worked for three years on an experiment led by Stanford University to measure the mass of the neutrino. Since 2012, he has been working in Data Science and Deep Learning, founding an AI solutions company in January 2016.

As an entrepreneur, Peter has founded companies in the AI, social media, and music industries. He has also served on the advisory board of technology startups. Peter is a popular speaker at conferences, meetups and webinars. He has cofounded and currently organizes meetups in the deep learning space. Peter has business experience in the USA, UK and Europe.

Today, as CEO of Deep Learning Partnership, he leads the strategic direction and business development across products and services. This includes sales and marketing, lead generation, client engagement, recruitment, content creation and platform development. Deep Learning technologies used include computer vision and natural language processing, and frameworks like TensorFlow, Keras and MXNet. Deep Learning Partnership designs and implements AI solutions for its clients across all business domains.

Interested in sharing your thought leadership with our global listeners? Register your interest @ http://play.analyticsweek.com/guest/

WHAT IS POSSIBLE WITH DATA & MACHINE LEARNING IN 2017?

2017-02-01 · Superweek 2017

talk

by Tahir Fayyaz (Google Cloud Platform Team, specialising in Data & Machine Learning, BigQuery expert)

AI/ML Cloud Computing GCP

Advancements in data and machine learning have produced some fascinating results. We will explore some exciting recent projects that use Google Cloud and the open source library TensorFlow, with the aim of inspiring you to build things you may have thought were not possible.

Etsy: Connecting shoppers with special items with Google AI

Google Cloud Next '25

demo

AI/ML LLM product-bigquery product-cloud-bigtable product-cloud-run product-dataflow-cloud-storage product-tensorflow product-vertex-ai

Step into Etsy’s "Museum of Extraordinary Objects" where Gemini on Vertex AI curates 100M+ unique goods from makers around the world. Discover how Google AI connects Etsy's extraordinary items with the right buyers—transforming the art of finding what you love, faster.
