Yusan Lin shares her research on using data science to explore the fashion industry in this episode. She has applied techniques from data mining, natural language processing, and social network analysis to explore who the innovators in the fashion world are and how their influence affects other designers. If you found this episode interesting and would like to read more, Yusan's papers Text-Generated Fashion Influence Model: An Empirical Study on Style.com and The Hidden Influence Network in the Fashion Industry are worth reading.
Natural Language Processing (NLP)
Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they're also a good way to dive into the discipline without actually understanding data science. In this book, you'll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today's messy glut of data holds answers to questions no one's even thought to ask. This book provides you with the know-how to dig those answers out.
- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
This episode overviews some of the fundamental concepts of natural language processing, including stemming, n-grams, part-of-speech tagging, and the bag-of-words approach.
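The concepts listed above can be sketched in a few lines of plain Python. This is an illustrative assumption, not the episode's own code: the toy stemmer below just strips common suffixes, whereas real systems use e.g. the Porter algorithm, and part-of-speech tagging (which needs a trained model) is omitted.

```python
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-token windows over the sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(tokens):
    # Bag of words: discard order, keep per-token counts.
    return Counter(tokens)

def naive_stem(word):
    # Toy suffix-stripping stemmer; production systems use e.g. Porter.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

tokens = "the cat sat on the mat".split()
print(ngrams(tokens, 2))     # bigrams over the sentence
print(bag_of_words(tokens))  # word counts, order discarded
print(naive_stem("cats"))    # 'cat'
```

Note how the bigrams retain local word order that the bag-of-words representation deliberately throws away; that trade-off is exactly why the two representations suit different tasks.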
This book focuses on the basic concepts and related technologies of data mining for social media. Topics include: big data and social data, data mining for forming a hypothesis, multivariate analysis for verifying the hypothesis, web mining and media mining, natural language processing, social big data applications, and scalability. It explains analytical techniques such as modeling, data mining, and multivariate analysis for social big data. This book differs from other similar books in that it presents the overall picture of social big data, from fundamental concepts to applications, while standing on academic foundations.
During the reception of a piece of information, we are never passive. Depending on its origin and content, and on our personal beliefs and convictions, we bestow upon this piece of information, spontaneously or after reflection, a certain amount of confidence. Too much confidence shows a degree of naivety, whereas an absolute lack of it condemns us as paranoid. These two attitudes are symmetrically detrimental, not only to the proper perception of this information but also to its use. Between these two extremes, each person generally adopts an intermediate position when faced with the reception of information, depending on its provenance and credibility. We still need to understand and explain how these judgements are formed, in what context, and to what end. Spanning the approaches offered by philosophy, military intelligence, algorithmics, and information science, this book presents the concepts of information and the confidence placed in it, the methods that militaries (the first to be aware of the need) have or should have adopted, the tools that help them, and the prospects they have opened up. Beyond the military context, the book shows how to evaluate information for the good of other fields such as economic intelligence and, more globally, informational monitoring by governments and businesses.
Contents
1. Information: Philosophical Analysis and Strategic Applications, Mouhamadou El Hady Ba and Philippe Capet.
2. Epistemic Trust, Gloria Origgi.
3. The Fundamentals of Intelligence, Philippe Lemercier.
4. Information Evaluation in the Military Domain: Doctrines, Practices and Shortcomings, Philippe Capet and Adrien Revault d'Allonnes.
5. Multidimensional Approach to Reliability Evaluation of Information Sources, Frédéric Pichon, Christophe Labreuche, Bertrand Duqueroie and Thomas Delavallade.
6. Uncertainty of an Event and its Markers in Natural Language Processing, Mouhamadou El Hady Ba, Stéphanie Brizard, Tanneguy Dulong and Bénédicte Goujon.
7. Quantitative Information Evaluation: Modeling and Experimental Evaluation, Marie-Jeanne Lesot, Frédéric Pichon and Thomas Delavallade.
8. When Reported Information Is Second Hand, Laurence Cholvy.
9. An Architecture for the Evolution of Trust: Definition and Impact of the Necessary Dimensions of Opinion Making, Adrien Revault d'Allonnes.
About the Authors
Philippe Capet is a project manager and research engineer at Ektimo, working mainly on information management and control in military contexts. Thomas Delavallade is an advanced studies engineer at Thales Communications & Security, working on social media mining in the context of crisis management, cybersecurity and the fight against cybercrime.
This IBM® Redbooks® publication describes visual development, visualization, adapters, analytics, and accelerators for IBM InfoSphere® Streams (V3), a key component of the IBM Big Data platform. Streams was designed to analyze data in motion, and can perform analysis on incredibly high volumes with high velocity, using a wide variety of analytic functions and data types. The Visual Development environment extends Streams Studio with drag-and-drop development, provides round tripping with existing text editors, and is ideal for rapid prototyping. Adapters facilitate getting data in and out of Streams, and V3 supports WebSphere MQ, Apache Hadoop Distributed File System, and IBM InfoSphere DataStage. Significant analytics include the native Streams Processing Language, SPSS Modeler analytics, Complex Event Processing, TimeSeries Toolkit for machine learning and predictive analytics, Geospatial Toolkit for location-based applications, and Annotation Query Language for natural language processing applications. Accelerators for Social Media Analysis and Telecommunications Event Data Analysis sample programs can be modified to build production level applications. Want to learn how to analyze high volumes of streaming data or implement systems requiring high performance across nodes in a cluster? Then this book is for you. Please note that the additional material referenced in the text is not available from IBM.
To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment. This handy glossary also includes a chapter of key terms that help define many of these tool categories:
- NoSQL Databases: document-oriented databases using a key/value interface rather than SQL
- MapReduce: tools that support distributed computing on large datasets
- Storage: technologies for storing data in a distributed way
- Servers: ways to rent computing power on remote machines
- Processing: tools for extracting valuable information from large datasets
- Natural Language Processing: methods for extracting information from human-created text
- Machine Learning: tools that automatically perform data analyses, based on the results of a one-off analysis
- Visualization: applications that present meaningful data graphically
- Acquisition: techniques for cleaning up messy public data sources
- Serialization: methods to convert data structure or object state into a storable format
A beginner-friendly workshop covering how LLMs work, NLP basics, transformers & attention, prompt engineering, and building AI agents with Retrieval-Augmented Generation (RAG). Includes a live demo: Your First AI Agent.
Hands-on, beginner-friendly workshop covering LLM basics, Python, retrieval-augmented generation (RAG), prompt engineering, an introduction to LangChain, and workflow automation with LangGraph, including a live demo of building your first AI agent.
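The RAG pattern these workshops teach can be reduced to a minimal sketch: retrieve a relevant document, then ground the prompt in it. The word-overlap retriever and the sample documents below are assumptions for illustration; a real pipeline would use embeddings and an actual LLM call (e.g. via LangChain), both omitted here.

```python
# Minimal RAG sketch: retrieve the most relevant document by word overlap,
# then assemble a grounded prompt. The LLM call itself is omitted.
docs = [
    "LangGraph builds stateful agent workflows as graphs.",
    "Prompt engineering shapes model behavior via instructions.",
]

def retrieve(query, documents):
    # Toy retriever: rank documents by shared lowercase tokens with the query.
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query, documents):
    # Ground the model's answer in the retrieved context.
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

print(build_prompt("What does LangGraph do?", docs))
```

The grounding instruction in the prompt is the essence of RAG: the model is steered to answer from retrieved text rather than from its parametric memory alone.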
AI is reshaping NetOps from scripted automation to intelligent, data-driven workflows. We will show use cases such as incident triage, knowledge retrieval, traffic analysis, and prediction, and contrast legacy monitoring with ML, NLP, and LLMs. See how RAG, text-to-SQL, and agent workflows enable real-time insights across hybrid data. We will outline data pipelines and MLOps; address accuracy, reliability, cost, and compliance; and weigh build versus buy. We will also cover API integration and human-in-the-loop guardrails.
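As an illustration of the text-to-SQL step mentioned above, here is a minimal sketch against an in-memory SQLite table. The `incidents` table and the `text_to_sql` function are hypothetical: in a real workflow an LLM would translate the operator's question into SQL, whereas here a hand-written stub stands in for that step.

```python
import sqlite3

# Hypothetical incidents table for a NetOps workflow.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (id INTEGER, severity TEXT)")
conn.executemany("INSERT INTO incidents VALUES (?, ?)",
                 [(1, "critical"), (2, "low"), (3, "critical")])

def text_to_sql(question):
    # Stub standing in for the LLM translation step.
    if "critical" in question.lower():
        return "SELECT COUNT(*) FROM incidents WHERE severity = 'critical'"
    return "SELECT COUNT(*) FROM incidents"

sql = text_to_sql("How many critical incidents are open?")
print(conn.execute(sql).fetchone()[0])  # → 2
```

Keeping the generated SQL visible before execution is one place a human-in-the-loop guardrail naturally fits: an operator can review the query the model produced before it runs against production data.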
We examine the capabilities and challenges of using Large Language Models (LLMs) in task-oriented dialogue settings, particularly in situated, dynamic, Minecraft-like environments. Our work focuses on two interconnected aspects: using LLMs as Minecraft agents in builder and architect roles, and their ability to ask clarification questions in asynchronous instruction-giver/instruction-follower settings. To achieve this we prepared a new unified corpus that combines annotations for reference, ambiguity, and discourse structure, enabling systematic evaluation of clarification behavior. Through platform-based interaction and comparison with human data, we find notable differences: humans rarely ask clarification questions for referential ambiguity but often do for task uncertainty, while LLMs show the opposite tendency. We further explore whether LLMs' question-asking behavior is influenced by their reasoning capabilities, observing that explicit reasoning increases both the frequency and relevance of clarification questions. Our findings highlight both the promise and current limitations of LLMs in handling ambiguity and improving interactive task performance.
Foundations of LLMs and Python Basics; Understanding Natural Language Processing; Transformers and Attention; LLM Development: Fine-Tuning and Prompt Engineering; Retrieval-Augmented Generation (RAG); Introduction to LLM Agents; Advanced Topics for Production LLM Applications