talk-data.com

Topic

NLP

Natural Language Processing (NLP)

ai machine_learning text_analysis

252 tagged

Activity Trend: 24 peak/qtr, 2020-Q1 to 2026-Q1

Activities

252 activities · Newest first

We talked about:

Alvaro’s background
Working as a QA (Quality Assurance) engineer
Transitioning from QA to Machine Learning
Gathering knowledge about ML field
Searching for an ML job (improving soft skills and CV)
Data science interview skills
Zoomcamp projects
Zoomcamp project deployment
How to not undersell yourself during interviews
Alvaro’s experience with interviews during his transition
Alvaro’s Zoomcamp notes
Alvaro’s coach
The importance of mathematical knowledge to a transition into ML
Preparing for technical interviews
Alvaro’s typical workday
Alvaro’s team’s tech stack
The importance of a technical background to transitioning into ML

Links:

Alvaro's CV: https://www.dropbox.com/s/89hkt3ug0toqa2n/CV%20nou%20-%20angl%C3%A8s.pdf?dl=0
GitHub profile: https://github.com/ziritrion
LinkedIn profile: https://www.linkedin.com/in/alvaronavas/

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Practical MATLAB Deep Learning: A Projects-Based Approach

Harness the power of MATLAB for deep-learning challenges. Practical MATLAB Deep Learning, Second Edition, remains a one-of-a-kind book that provides an introduction to deep learning and using MATLAB's deep-learning toolboxes. In this book, you’ll see how these toolboxes provide the complete set of functions needed to implement all aspects of deep learning. This edition includes new and expanded projects, and covers generative deep learning and reinforcement learning. Over the course of the book, you'll learn to model complex systems and apply deep learning to problems in those areas.

Applications include:
Aircraft navigation
An aircraft that lands on Titan, a moon of Saturn, using reinforcement learning
Stock market prediction
Natural language processing
Music creation using generative deep learning
Plasma control
Earth sensor processing for spacecraft
MATLAB Bluetooth data acquisition applied to dance physics

What You Will Learn
Explore deep learning using MATLAB and compare it to algorithms
Write a deep learning function in MATLAB and train it with examples
Use MATLAB toolboxes related to deep learning
Implement tokamak disruption prediction
Now includes reinforcement learning

Who This Book Is For
Engineers, data scientists, and students wanting a book rich in examples on deep learning using MATLAB.

We talked about:

Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distance supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online

Links:

My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/


Comet for Data Science

Discover how to manage and optimize the life cycle of your data science projects with Comet! By the end of this book, you will master preparing, analyzing, building, and deploying models, as well as integrating Comet into your workflow.

What this Book will help me do
Master managing data science workflows with Comet.
Confidently prepare and analyze your data for effective modeling.
Deploy and monitor machine learning models using Comet tools.
Integrate Comet with DevOps and GitLab workflows for production readiness.
Apply Comet to advanced topics like NLP, deep learning, and time series analysis.

Author(s)
Angelica Lo Duca is an experienced author and data scientist with years of expertise in data science workflows and tools. She brings practical insights into integrating platforms like Comet into modern data science tasks.

Who is it for?
If you are a data science practitioner or programmer looking to understand and implement efficient project lifecycles using Comet, this book is tailored for you. A basic background in data science and programming is highly recommended, but prior expertise in Comet is unnecessary.

Hands-On Healthcare Data

Healthcare is the next frontier for data science. Using the latest in machine learning, deep learning, and natural language processing, you'll be able to solve healthcare's most pressing problems: reducing cost of care, ensuring patients get the best treatment, and increasing accessibility for the underserved. But first, you have to learn how to access and make sense of all that data. This book provides pragmatic and hands-on solutions for working with healthcare data, from data extraction to cleaning and harmonization to feature engineering. Author Andrew Nguyen covers specific ML and deep learning examples with a focus on producing high-quality data. You'll discover how graph technologies help you connect disparate data sources so you can solve healthcare's most challenging problems using advanced analytics.

You'll learn:
Different types of healthcare data: electronic health records, clinical registries and trials, digital health tools, and claims data
The challenges of working with healthcare data, especially when trying to aggregate data from multiple sources
Current options for extracting structured data from clinical text
How to make trade-offs when using tools and frameworks for normalizing structured healthcare data
How to harmonize healthcare data using terminologies, ontologies, and mappings and crosswalks

Building a Lakehouse for Data Science at DoorDash

DoorDash was using a data warehouse but found that they needed more data transparency, lower costs, and the ability to handle streaming data as well as batch data. With an engineering team rooted in big data backgrounds at Uber and LinkedIn, they moved to a Lakehouse architecture intuitively, without knowing about the term. In this session, learn more about how they arrived at that architecture, the process of making the move, and the results they have seen. While addressing both data analysts and data scientists from their lakehouse, this session will focus on their machine learning operations, and how their efficiencies are enabling them to tackle more advanced use cases such as NLP and image classification.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

State-of-the-Art Natural Language Processing with Apache Spark NLP

This session teaches how & why to use the open-source Spark NLP library. Spark NLP provides state-of-the-art accuracy, speed, and scalability for language understanding by delivering production-grade implementations of recent research advances. Spark NLP is the most widely used NLP library in the enterprise today; provides thousands of current, supported, pre-trained models for 200+ languages out of the box; and is the only open-source NLP library that can natively scale to use any Apache Spark cluster.

We’ll walk through Python code running common NLP tasks like document classification, named entity recognition, sentiment analysis, spell checking, question answering, and translation. The discussion of each task includes the latest advances in deep learning and transfer learning used to tackle it. We’ll also cover new free tools for data annotation, no-code active learning & transfer learning, easily deploying NLP models as production-grade services, and sharing models you’ve trained.


Achieve Machine Learning Hyper-Productivity with Transformers and Hugging Face

According to the latest State of AI report, "transformers have emerged as a general-purpose architecture for ML. Not just for Natural Language Processing, but also Speech, Computer Vision or even protein structure prediction." Indeed, the Transformer architecture has proven very efficient on a wide variety of Machine Learning tasks. But how can we keep up with the frantic pace of innovation? Do we really need expert skills to leverage these state-of-the-art models? Or is there a shorter path to creating business value in less time? In this code-level talk, we'll gradually build and deploy a demo involving several Transformer models. Along the way, you'll learn about the portfolio of open source and commercial Hugging Face solutions, how they can help you become hyper-productive in order to deliver high-quality Machine Learning solutions faster than ever before.


Adversarial AI—The Nature of the Threat, Impacts, and Mitigation Strategies

Adversarial AI/ML is an emerging research area focused on the vulnerabilities of Artificial Intelligence (AI)/Machine Learning (ML) models to adversarial exploitation such as data poisoning, adversarial perturbations, and inference and extraction attacks. This research area is of particular interest to domains where AI/ML models play an essential role in mission-critical decision-making processes. In this presentation, we will review the four principal categories of Adversarial AI. We will discuss each of them, supported by relevant examples, and consider the future implications. We will present in greater depth our research in Adversarial NLP, backed by specific examples of data poisoning and adversarial perturbation attacks on NLP classifiers. We will conclude the presentation by discussing the current mitigation approaches and methods, and offer some general recommendations for how best to address Adversarial AI exploits.


Health Care and Life Sciences Experience at Data + AI Summit 2022

Welcome data teams and executives in the Healthcare and Life Sciences industry! This year’s Data + AI Summit is jam-packed with talks, demos and discussions on the biggest innovations in patient care and drug R&D. To help you take full advantage of the Healthcare and Life Sciences experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

Healthcare and Life Sciences Industry Forum: Our capstone event for Healthcare and Life Sciences attendees at Summit featuring keynotes and panel discussions with Walgreens, Takeda, Optum, and Humana followed by networking. More details in the agenda below.
Healthcare and Life Sciences Lounge: Stop by our industry lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from our partners including ZS Associates, John Snow Labs and others.
Session Talks: Over 10 technical talks on topics including healthcare NLP, knowledge graphs for R&D, commercial analytics, and predicting hospital readmissions.


In 2020, OpenAI launched GPT-3, a large language AI model that is demonstrating the potential to radically change how we interact with software, and open up a completely new paradigm for cognitive software applications.

Today’s episode features Sandra Kublik and Shubham Saboo, authors of GPT-3: Building Innovative NLP Products Using Large Language Models. We discuss what makes GPT-3 unique, transformative use-cases it has ushered in, the technology powering GPT-3, its risks and limitations, whether scaling models is the path to “Artificial General Intelligence”, and more.

Announcement

For the next seven days, DataCamp Premium and DataCamp for Teams are free. Gain free access by going here.

We talked about:

Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart

Links:

Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
GitHub ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face bigscience/T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp


Advanced Analytics with PySpark

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques, including classification, clustering, collaborative filtering, and anomaly detection, to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.

Familiarize yourself with Spark's programming model and ecosystem
Learn general approaches in data science
Examine complete implementations that analyze large public datasets
Discover which machine learning tools make sense for particular problems
Explore code that can be adapted to many uses

AI-Powered Business Intelligence

Use business intelligence to power corporate growth, increase efficiency, and improve corporate decision making. With this practical book featuring hands-on examples in Power BI with basic Python and R code, you'll explore the most relevant AI use cases for BI, including improved forecasting, automated classification, and AI-powered recommendations. And you'll learn how to draw insights from unstructured data sources like text, document, and image files. Author Tobias Zwingmann helps BI professionals, business analysts, and data analysts understand high-impact areas of artificial intelligence. You'll learn how to leverage popular AI-as-a-service and AutoML platforms to ship enterprise-grade proofs of concept without the help of software engineers or data scientists.

Learn how AI can generate business impact in BI environments
Use AutoML for automated classification and improved forecasting
Implement recommendation services to support decision-making
Draw insights from text data at scale with NLP services
Extract information from documents and images with computer vision services
Build interactive user frontends for AI-powered dashboard prototypes
Implement an end-to-end case study for building an AI-powered customer analytics dashboard

An estimated 80 to 90 percent of the data in an enterprise is text. Sadly, this rich information is mostly neglected for analytical purposes. Textual data is typically full of information, but also very complex to interpret computationally and statistically. Why? Because textual data is both content and context. The same words and sentences can have very different meanings depending on the context. Textual data is truly a goldmine, but how can we mine it without being digital superpowers like Google, Microsoft or Facebook?

To answer this question and many more relating to interpretation of textual data, I recently spoke to Bill Inmon. Bill is the Founder, Chairman and CEO of Forest Rim Technology and author of more than 60 books on data warehousing. He is often described as the Father of Data Warehousing due to his pioneering efforts in making data and data technologies available to organisations across all industries and sizes.

In this episode of Leaders of Analytics, we discuss:
How Bill became the Father of Data Warehousing
The history of data warehousing and the most exciting developments in this space today
The typical challenges holding us back from extracting value from textual data
The concept of the “Textual ETL” and its benefits over other text data storage and analytics approaches
Why NLP is not the best approach for textual data analytics
The biggest opportunities for textual analytics today and in the future, and much more.

Connect with Bill:
Forest Rim Technology: https://www.forestrimtech.com/
Bill on LinkedIn: https://www.linkedin.com/in/billinmon/

Artificial Intelligence with Power BI

Discover how to enhance your data analysis with 'Artificial Intelligence with Power BI,' a resource designed to teach you how to leverage Power BI's AI capabilities. You will learn practical methods for enriching your analytics with forecasting, anomaly detection, and machine learning, equipping you to create intelligent, insightful BI reports.

What this Book will help me do
Learn how to apply AI capabilities such as forecasting and anomaly detection to enrich your reports and drive actionable insights.
Explore data preparation techniques optimized for AI, ensuring your datasets are structured for advanced analytics.
Develop skills to integrate Azure Machine Learning and Cognitive Services into Power BI, expanding your analytical toolset.
Understand how to build Q&A interfaces and integrate Natural Language Processing into your BI solutions.
Gain expertise in training and deploying your own machine learning models to achieve tailored insights and predictive analytics.

Author(s)
Diepeveen is an experienced data analyst and Power BI expert with a passion for making advanced analytics accessible to professionals. With years of hands-on experience working in the data analytics field, they deliver insights using intuitive, practical approaches through clear and engaging tutorials.

Who is it for?
This book is ideal for data analysts and BI developers who aim to expand their analytics capabilities with AI. Readers should already be familiar with Power BI and be looking for a resource that teaches them how to incorporate predictive and advanced AI techniques into their reporting workflow. Whether you're seeking to gain a professional edge or enhance your organization's data storytelling and insights, this guide is perfect for you.

The Kaggle Book

The Kaggle Book is an essential guide for anyone aiming to excel in data science through Kaggle competitions. With expert advice from Kaggle Grandmasters, you'll learn practical techniques for handling data, creating robust models, and improving your ranking in competitions. This book is packed with insights on advanced topics like ensembling, validation, and evaluation metrics.

What this Book will help me do
Master the Kaggle platform, including its Notebooks, Datasets, and Discussion capabilities.
Enhance model performance using techniques like feature engineering, AutoML, and ensembling strategies.
Apply advanced validation schemes to improve the reliability of your predictions.
Tackle diverse competition types, including NLP, computer vision, and optimization challenges.
Build a professional portfolio to showcase your data science expertise and attract career opportunities.

Author(s)
Konrad Banachewicz and Luca Massaron, authoritative Kaggle Grandmasters, bring their wealth of experience in competitive data science to this book. They have collectively competed in numerous Kaggle challenges and possess deep insights into what differentiates successful Kagglers. Their guidance combines practicality with expertise, making this book a must-have for aspiring data scientists looking to make an impact.

Who is it for?
This book is tailored for data analysts and scientists interested in enhancing their Kaggle performance, as well as those new to Kaggle who wish to explore competitive data science. It suits individuals with basic knowledge of machine learning, aiming to develop and demonstrate their skills further. The content is valuable for practitioners aiming to build a professional profile or secure roles in the tech industry.

In this episode, Bryce and Conor finish their interview with Andrei Alexandrescu.

Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach

Website
ADSP: The Podcast

About the Guest: Andrei Alexandrescu specializes in all aspects of designing and implementing software systems, as well as Machine Learning applied to Natural Language Processing and Speech Recognition. He has authored three best-selling books (The D Programming Language, 2010; C++ Coding Standards, 2004; Modern C++ Design, 2001), and dozens of papers and articles in conference proceedings and trade magazines.

Show Notes
Date Recorded: 2022-02-15
Date Released: 2022-03-11
Andrei Alexandrescu on Twitter
D Programming Language
Category Theory for Programmers by Bartosz Milewski
Iterators Must Go by Andrei Alexandrescu, BoostCon 2009
C++Now 2017: Ali Çehreli “Competitive Advantage with D”
Fastware - Andrei Alexandrescu - NDC London 2017
CUDA C++ Programming Guide
ADSP Episode 51: Efficiency vs Speed

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

In this episode, Bryce and Conor interview Andrei Alexandrescu.

Twitter
ADSP: The Podcast
Conor Hoekstra
Bryce Adelstein Lelbach

Website
ADSP: The Podcast

About the Guest: Andrei Alexandrescu specializes in all aspects of designing and implementing software systems, as well as Machine Learning applied to Natural Language Processing and Speech Recognition. He has authored three best-selling books (The D Programming Language, 2010; C++ Coding Standards, 2004; Modern C++ Design, 2001), six peer-reviewed papers, and dozens of articles in trade magazines.

Show Notes
Date Recorded: 2022-02-15
Date Released: 2022-03-11
Andrei Alexandrescu on Twitter
Eric Niebler ADSP Episodes
Sean Parent ADSP Episodes
Chandler Carruth ADSP Episodes
Patricia Aas ADSP Episodes
Pacific Northwest C++ Users’ Group
Modern C++ Design by Andrei Alexandrescu
D Programming Language
C++23 chunk_by proposal
D chunkBy
Real Networks
University of Washington
Emotional Code - Kate Gregory [ACCU Conference 2019]
Impostor Syndrome

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

We talked about:

Ivan’s role at Personio
Ivan’s background
Studying technical management
Managing a software team
NLP teams
NLP engineers
Becoming an NLP engineer
Computer vision
NLP engineer vs ML engineer
Conversational designers
Linguistics outside of chatbots
When does a team need an NLP engineer or a linguist?
The future of NLP
NLP pipelines
GPT-3
Problems of GPT-3
Does GPT-3 make everything obsolete?
What NLP actually is?
Does NLP solve problems better than humans?
State of language translation
NLP Pandect

Links:

https://github.com/ivan-bilan/The-NLP-Pandect
https://github.com/ivan-bilan/The-Engineering-Manager-Pandect
https://github.com/ivan-bilan/The-Microservices-Pandect
Ivan's presentation about NLP: https://www.youtube.com/watch?v=VRur3xey31s
