talk-data.com
Activities & events
**Named Entity Recognition**
2019-06-08 · 18:16 · Linh Da (guest), Kyle Polich (host)

Kyle and Linh Da discuss the class of approaches called "Named Entity Recognition" or NER. NER algorithms take any string as input and return a list of "entities" (specific facts and agents in the text) along with a classification of each entity's type (e.g. person, date, place).
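A minimal sketch of what such a call can look like in practice, using spaCy's pretrained English pipeline; spaCy and the `en_core_web_sm` model are illustrative choices, not tools named in the episode:

```python
# A hedged sketch: extract named entities with spaCy's small English model.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")  # pretrained pipeline with an NER component
doc = nlp("Kyle interviewed Lucy Park in Seoul on January 4, 2019.")

# Each entity carries its surface text and a predicted type label.
for ent in doc.ents:
    print(ent.text, ent.label_)
# Expected output, roughly: PERSON, PERSON, GPE (place), and DATE entities
```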
**Neural Turing Machines**
2019-05-25 · 16:05 · Linh Da (guest), Kyle Polich (host)

Kyle and Linh Da discuss the concepts behind the neural Turing machine.
**Very Large Corpora and Zipf's Law**
2019-01-18 · 16:00 · Linh Da (guest), Kyle Polich (host)

The earliest efforts to apply machine learning to natural language tended to convert every token (every word, more or less) into a unique feature. While techniques like stemming may have cut the number of unique tokens down, researchers always faced a high-dimensional problem. The Naive Bayes algorithm was celebrated in NLP applications because of its ability to process high-dimensional data efficiently. Of course, other algorithms were applied to natural language tasks as well. While different algorithms had different strengths and weaknesses on different NLP problems, an early paper titled Scaling to Very Very Large Corpora for Natural Language Disambiguation popularized one somewhat surprising idea: for many NLP tasks, simply providing a large corpus of examples not only improved accuracy, but also revealed that, asymptotically, some algorithms gained more from very, very large corpora than others. Although not explicitly about NLP, the noteworthy paper The Unreasonable Effectiveness of Data emphasizes this point further while paying homage to the classic treatise The Unreasonable Effectiveness of Mathematics in the Natural Sciences. In this episode, Kyle shares a few thoughts along these lines with Linh Da. The discussion winds up with a brief introduction to Zipf's law. When applied to natural language, Zipf's law states that the frequency of any given word in a corpus (regardless of language) will be inversely proportional to its rank in the frequency table.
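A small sketch of how one might check Zipf's law empirically; the inline corpus and tokenizer are stand-in assumptions, and a real check would use a much larger text:

```python
# A rough empirical check of Zipf's law: frequency ~ C / rank.
from collections import Counter
import re

# Stand-in corpus; in practice you would load a large text file here.
text = """
the quick brown fox jumps over the lazy dog and the dog barks at the fox
while the quick cat watches the dog and the fox from the fence
"""

tokens = re.findall(r"[a-z']+", text.lower())
counts = Counter(tokens)

# Under Zipf's law, freq * rank stays roughly constant down the table.
for rank, (word, freq) in enumerate(counts.most_common(5), start=1):
    print(f"{rank}  {word:<8} freq={freq:<3} freq*rank={freq * rank}")
```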
**Let's Talk About Natural Language Processing**
2019-01-04 · 16:15 · Kyle Polich (host), Lucy Park (guest)

This episode reboots our podcast with the theme of Natural Language Processing for the next few months. We begin with introductions of Yoshi and Linh Da and then get into a broad discussion about natural language processing: what it is, what some of the classic problems are, and just a bit on approaches. Finishing out the show is an interview with Lucy Park about her work on the KoNLPy library for Korean NLP in Python: http://konlpy.org/en/latest/. If you want to share your NLP project, please join our Slack channel. We're eager to see what listeners are working on!
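As a hedged sketch of what KoNLPy provides (the Okt tagger and this usage follow the linked docs, but treat the details as illustrative rather than anything covered in the interview):

```python
# A minimal sketch of Korean morphological analysis with KoNLPy.
# Requires: pip install konlpy (plus a Java runtime for the underlying taggers).
from konlpy.tag import Okt

okt = Okt()  # one of several tagger backends KoNLPy wraps
sentence = "자연어 처리는 재미있습니다."  # "Natural language processing is fun."

print(okt.morphs(sentence))  # split into morphemes
print(okt.pos(sentence))     # (morpheme, part-of-speech) pairs
```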
**[MINI] Automated Feature Engineering**
2017-02-24 · 16:00 · Linh Da (guest), Kyle Polich (host)

If a CEO wants to know the state of their business, they ask their highest-ranking executives. These executives, in turn, should know the state of the business through reports from their subordinates. This structure is roughly analogous to a process observed in deep learning, where each layer of the hierarchy reports up different types of observations, KPIs, and summaries to be interpreted by the layer above. In deep learning, this process can be thought of as automated feature engineering. DNNs built to recognize objects in images may learn structures that behave like edge detectors in the first hidden layer. Subsequent layers learn to compose more abstract features from lower-level outputs. This episode explores that analogy in the context of automated feature engineering. Linh Da and Kyle discuss a particular image in this episode. The image included below in the show notes is drawn from the work of Lee, Grosse, Ranganath, and Ng in their paper Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.
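As an illustrative sketch of that layered composition (PyTorch is an assumption here, not a framework used in the episode), a stack of convolutional layers in which each layer builds features from the outputs of the layer below:

```python
# A hedged sketch: each Conv2d layer learns features composed from the layer
# below it, mirroring the edges -> parts -> objects hierarchy described above.
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),    # layer 1: edge-like detectors
    nn.ReLU(),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),   # layer 2: compositions of edges
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer 3: more abstract parts
    nn.ReLU(),
)

x = torch.randn(1, 1, 28, 28)  # a dummy grayscale image
features = feature_extractor(x)
print(features.shape)          # torch.Size([1, 32, 28, 28])
```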
**[MINI] Primer on Deep Learning**
2017-02-10 · 16:00 · Linh Da (guest), Kyle Polich (host)

In this episode, we give a high-level description of deep learning. Kyle presents a simple game (pictured below), which is really more of a puzzle, to try to give Linh Da the basic concept. Thanks to our sponsor for this week, the Data Science Association. Please check out their upcoming Dallas conference at dallasdatascience.eventbrite.com
**[MINI] The CAP Theorem**
2016-06-17 · 15:00 · Linh Da (guest), Kyle Polich (host)

A distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. System architects need to think carefully about how to balance the needs of their application across these competing objectives. Linh Da and Kyle discuss the CAP theorem using the analogy of a phone tree for alerting people about a school snow day.
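As a toy sketch of the trade-off (the `Replica` class and its behavior are purely illustrative): when a node is cut off from the primary, it must either refuse reads (favoring consistency) or serve possibly stale data (favoring availability).

```python
# A toy sketch of the CAP trade-off during a network partition.
class Replica:
    def __init__(self):
        self.data = {"snow_day": "no"}  # possibly stale local copy
        self.partitioned = False        # True when cut off from the primary

    def read(self, key, favor_consistency=True):
        if self.partitioned and favor_consistency:
            # CP choice: refuse to answer rather than risk a stale value.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP choice (or healthy network): answer from the local copy.
        return self.data[key]

replica = Replica()
replica.partitioned = True
print(replica.read("snow_day", favor_consistency=False))  # "no" (maybe stale)
try:
    replica.read("snow_day", favor_consistency=True)
except RuntimeError as err:
    print("consistent read failed:", err)
```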
**[MINI] Bargaining**
2016-05-06 · 15:00 · Linh Da (guest), Kyle Polich (host)

Bargaining is the process of two (or more) parties attempting to agree on the price for a transaction. Game-theoretic approaches attempt to find two strategies from which neither party is motivated to deviate; these strategies are said to be in equilibrium with one another. The equilibria available in bargaining depend on the transaction mechanism and the information the parties hold. Discounting (how long parties are willing to wait) has a significant effect on this process. This episode discusses some of the choices Kyle and Linh Da made in deciding what offer to make on a house.
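As one worked illustration of how discounting shapes the outcome, here is the first mover's equilibrium share in Rubinstein's alternating-offers model; the model choice is an assumption, since the episode does not name a specific mechanism.

```python
# A sketch of Rubinstein's alternating-offers bargaining model, where
# delta1 and delta2 are each party's per-round discount factors (patience).
def first_mover_share(delta1: float, delta2: float) -> float:
    # In equilibrium the proposer keeps (1 - delta2) / (1 - delta1 * delta2).
    return (1 - delta2) / (1 - delta1 * delta2)

# A patient proposer facing an impatient opponent captures most of the surplus.
print(first_mover_share(0.95, 0.50))  # ~0.95
print(first_mover_share(0.50, 0.95))  # ~0.10
```

In this model the more patient party captures most of the surplus, which is one concrete sense in which discounting matters.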
**Potholes**
2016-03-25 · 15:24 · Linh Da (guest), Kyle Polich (host), Ben Berkowitz (CEO and founder @ SeeClickFix), Chelsea Ursaner (@ LA City Open Data Team), Russ Klettke (Editor @ pothole.info)

Co-host Linh Da was in a biking accident after hitting a pothole. She sustained an injury that required stitches. This is the story of our quest to file a 311 complaint and track it through the City of Los Angeles's open data portal. My guests this episode are Chelsea Ursaner (LA City Open Data Team), Ben Berkowitz (CEO and founder of SeeClickFix), and Russ Klettke (Editor of pothole.info).
**[MINI] z-scores**
2015-05-15 · 05:08 · Linh Da (guest), Kyle Polich (host)

This week's episode discusses z-scores, also known as standard scores. The z-score describes the distance (in standard deviations) between an observation and the mean of the population. A closely related topic is the 68-95-99.7 rule, which tells us that (approximately) 68% of a normally distributed population lies within one standard deviation of the mean, 95% within two, and 99.7% within three. Kyle and Linh Da discuss z-scores in the context of human height. If you'd like to calculate your own z-score for height, you can do so below. They further discuss how a z-score can also describe the likelihood that a statistical result is due to chance: if a finding is significant at the 3σ level, it is roughly 99.7% likely not to be due to chance, or only about 0.3% likely to be due to chance.
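A quick sketch of the calculation itself; the height figures below are illustrative placeholders, not numbers from the episode:

```python
# z-score: how many standard deviations an observation sits from the mean.
def z_score(x: float, mean: float, std: float) -> float:
    return (x - mean) / std

# Illustrative numbers for adult height (assumed, for demonstration only):
mean_cm, std_cm = 170.0, 10.0
print(z_score(185.0, mean_cm, std_cm))  # 1.5 -> 1.5 standard deviations above the mean
```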