talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516 tagged

Activity Trend: peak of 68 per quarter, 2020-Q1 to 2026-Q1

Activities

1516 activities · Newest first

Beginning Mathematica and Wolfram for Data Science: Applications in Data Analysis, Machine Learning, and Neural Networks

Enhance your data science programming and analysis with the Wolfram programming language and Mathematica, an applied mathematical tools suite. The book will introduce you to the Wolfram programming language and its syntax, as well as the structure of Mathematica and its advantages and disadvantages. You’ll see how to use the Wolfram language for data science from a theoretical and practical perspective. Learning this language makes your data science code better because it is very intuitive and comes with pre-existing functions that can provide a welcoming experience for those who use other programming languages. You’ll cover how to use Mathematica where data management and mathematical computations are needed. Along the way you’ll appreciate how Mathematica provides a complete integrated platform: it has a mixed syntax as a result of its symbolic and numerical calculations, allowing it to carry out various processes without superfluous lines of code. You’ll learn to use its notebooks as a standard format, which also serves to create detailed reports of the processes carried out.

What You Will Learn
Use Mathematica to explore data and describe the concepts using Wolfram language commands
Create datasets, work with data frames, and create tables
Import, export, analyze, and visualize data
Work with the Wolfram data repository
Build reports on the analysis
Use Mathematica for machine learning, with different algorithms, including linear, multiple, and logistic regression; decision trees; and data clustering

Who This Book Is For
Data scientists new to using Wolfram and Mathematica as a language/tool to program in. Programmers should have some prior programming experience, but can be new to the Wolfram language.

In today's episode of your favorite data science podcast, we talk about the R language and how it has been used in data science, machine learning, and data analysis. To give us a real master class on everything the tool has to offer, we invited Gabriela de Queiroz (Machine Learning Manager at IBM and founder of R-Ladies) and Athos Damiani (data scientist and co-founder of Curso R). Come along with us, this episode is great!

See our Medium post for links and references: https://medium.com/data-hackers/linguagem-r-com-gabriela-de-queiroz-e-athos-damiani-data-hackers-podcast-35-a21b053e2636

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract

Hosted by Al Martin, VP, IBM Expert Services Delivery, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts.

This week on Making Data Simple, we have Kristen Summers, a Distinguished Engineer in Cloud and Cognitive Expert Labs. Kristen has worked in artificial intelligence and data science, holds a PhD in computer science, and leads data science within our Expert Labs.

Show Notes
2:08 - More time needs to be spent on culture and talent management.
3:55 - What does data-driven culture mean?
8:49 - What do you see driving fundamental culture?
11:14 - What common tool do we have?
12:55 - What is communicated about data?
14:42 - How do you know you’re doing it well?
17:29 - How do you define AI talent?
23:18 - Describe a Data Scientist?
27:25 - Common Organizational Structures
31:49 - How do you manage and grow AI talent?
IBM Skills Academy

Connect with the Team
Producer Kate Brown - LinkedIn.
Producer Steve Templeton - LinkedIn.
Host Al Martin - LinkedIn and Twitter.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Big Data Science in Finance

Explains the mathematics, theory, and methods of Big Data as applied to finance and investing

Data science has fundamentally changed Wall Street: applied mathematics and software code are increasingly driving finance and investment-decision tools. Big Data Science in Finance examines the mathematics, theory, and practical use of the revolutionary techniques that are transforming the industry. Designed for mathematically-advanced students and discerning financial practitioners alike, this energizing book presents new, cutting-edge content based on world-class research taught in the leading Financial Mathematics and Engineering programs in the world. Marco Avellaneda, a leader in quantitative finance, and quantitative methodology author Irene Aldridge help readers harness the power of Big Data. Comprehensive in scope, this book offers in-depth instruction on how to separate signal from noise, how to deal with missing data values, and how to utilize Big Data techniques in decision-making. Key topics include data clustering, data storage optimization, Big Data dynamics, Monte Carlo methods and their applications in Big Data analysis, and more.

This valuable book:
Provides a complete account of Big Data that includes proofs, step-by-step applications, and code samples
Explains the difference between Principal Component Analysis (PCA) and Singular Value Decomposition (SVD)
Covers vital topics in the field in a clear, straightforward manner
Compares, contrasts, and discusses Big Data and Small Data
Includes Cornell University-tested educational materials such as lesson plans, end-of-chapter questions, and downloadable lecture slides

Big Data Science in Finance: Mathematics and Applications is an important, up-to-date resource for students in economics, econometrics, finance, applied mathematics, industrial engineering, and business courses, and for investment managers, quantitative traders, risk and portfolio managers, and other financial practitioners.

We talked about developer advocacy for data science.

We covered

The role of a developer advocate
The skills needed for the job and the responsibilities
How to become a developer advocate

You can find Elle on:

Twitter: https://twitter.com/DrElleOBrien
LinkedIn: https://linkedin.com/in/drelleobrien
DVC's YouTube channel: https://www.youtube.com/channel/UC37rp97Go-xIX3aNFVHhXfQ

Join DataTalks.Club: https://datatalks.club

Rob Thomas, leader of IBM’s Data and AI division, talks with host Al Martin about the need to de-mystify AI. In particular, Rob recommends a "fail fast" approach to data science: run wide-ranging but short-term experiments, and expect disappointment on the way to insight. This episode offers a host of such suggestions, plus thoughtful leadership advice and tips for motivation. Part 1 of 2.


Show Notes
00:00 - Check us out on YouTube and SoundCloud.
00:10 - Connect with Producer Steve Moore on LinkedIn and Twitter.
00:15 - Connect with Producer Liam Seston on LinkedIn and Twitter.
00:20 - Connect with Producer Rachit Sharma on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn and Twitter.
00:55 - Connect with Rob Thomas on LinkedIn and Twitter.
04:01 - Discover what big data and A.I. have in store.
06:22 - Learn more about IBM's Think conference here.
06:48 - Read more on Watson anywhere here.
10:24 - There is no A.I. without I.A.
23:21 - Check out Al's talk at Think 2019 with Jeff Jonas here.
23:51 - Check out this interview with Rob Thomas at Think 2019 here.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Predictive Analytics: Data Mining, Machine Learning and Data Science for Practitioners, 2nd Edition

Use Predictive Analytics to Uncover Hidden Patterns and Correlations and Improve Decision-Making

Using predictive analytics techniques, decision-makers can uncover hidden patterns and correlations in their data and leverage these insights to improve many key business decisions. In this thoroughly updated guide, Dr. Dursun Delen illuminates state-of-the-art best practices for predictive analytics for both business professionals and students. Delen provides a holistic approach covering key data mining processes and methods, relevant data management techniques, tools and metrics, advanced text and web mining, big data integration, and much more. Balancing theory and practice, Delen presents intuitive conceptual illustrations, realistic example problems, and real-world case studies, including lessons from failed projects. It is all designed to help you gain a practical understanding you can apply for profit.

* Leverage knowledge extracted via data mining to make smarter decisions
* Use standardized processes and workflows to make more trustworthy predictions
* Predict discrete outcomes (via classification), numeric values (via regression), and changes over time (via time-series forecasting)
* Understand predictive algorithms drawn from traditional statistics and advanced machine learning
* Discover cutting-edge techniques, and explore advanced applications ranging from sentiment analysis to fraud detection

Applied Data Science Using PySpark: Learn the End-to-End Predictive Model-Building Cycle

Discover the capabilities of PySpark and its application in the realm of data science. This comprehensive guide with hand-picked examples of daily use cases will walk you through the end-to-end predictive model-building cycle with the latest techniques and tricks of the trade. Applied Data Science Using PySpark is divided into six sections which walk you through the book. In section 1, you start with the basics of PySpark focusing on data manipulation. We make you comfortable with the language and then build upon it to introduce you to the mathematical functions available off the shelf. In section 2, you will dive into the art of variable selection where we demonstrate various selection techniques available in PySpark. In section 3, we take you on a journey through machine learning algorithms, implementations, and fine-tuning techniques. We will also talk about different validation metrics and how to use them for picking the best models. Sections 4 and 5 go through machine learning pipelines and various methods available to operationalize the model and serve it through Docker/an API. In the final section, you will cover reusable objects for easy experimentation and learn some tricks that can help you optimize your programs and machine learning pipelines. By the end of this book, you will have seen the flexibility and advantages of PySpark in data science applications. This book is recommended to those who want to unleash the power of parallel computing by simultaneously working with big datasets.

What You Will Learn
Build an end-to-end predictive model
Implement multiple variable selection techniques
Operationalize models
Master multiple algorithms and implementations

Who This Book Is For
Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.

Implementing dbt at large enterprises
video
by Ryan Goltz (Chesapeake Energy), Ben Singleton (JetBlue), Amy Chen (Fishtown Analytics)

What does it look like to implement dbt at an organization where the number of employees is in the thousands? In this video we'll learn from the people who have answered exactly this question at organizations like JetBlue and Chesapeake Energy.

Speakers:
Chris Holliday (Moderator), Senior VP, Client Management with Visual BI
Amy Chen, Solutions Architect with Fishtown Analytics
Ryan Goltz, Lead Data Strategist with Chesapeake Energy
Ben Singleton, Director of Data Science & Analytics with JetBlue

We talked about: 

Dat's career so far and the startup he co-founded (Priceloop)
Who to hire first in a data team
How to hire the first data scientist

And many other things!

You can find Dat on LinkedIn: https://www.linkedin.com/in/dat-tran-a1602320/

Join DataTalks.Club: https://datatalks.club

Summary

Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams with varying levels of success. In this episode Jesse Anderson shares the lessons that he has learned while working with dozens of businesses across industries to determine the team structures and communication styles that have generated the best results. If you are struggling to deliver value from big data, or just starting down the path of building the organizational capacity to turn raw information into valuable products, then this is a conversation that you don’t want to miss.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.

Your host is Tobias Macey and today I’m interviewing Jesse Anderson about best practices for organizing and managing data teams.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of how you view the mission and responsibilities of a data team?

What are the critical elements of a successful data team? Beyond the core pillars of data science, data engineering, and operations, what other specialized roles do you find hel

Applied Regression Modeling, 3rd Edition

Master the fundamentals of regression without learning calculus with this one-stop resource

The newly and thoroughly revised 3rd Edition of Applied Regression Modeling delivers a concise but comprehensive treatment of the application of statistical regression analysis for those with little or no background in calculus. Accomplished instructor and author Dr. Iain Pardoe has reworked many of the more challenging topics, included learning outcomes and additional end-of-chapter exercises, and added coverage of several brand-new topics including multiple linear regression using matrices. The methods described in the text are clearly illustrated with multi-format datasets available on the book's supplementary website. In addition to a fulsome explanation of foundational regression techniques, the book introduces modeling extensions that illustrate advanced regression strategies, including model building, logistic regression, Poisson regression, discrete choice models, multilevel models, Bayesian modeling, and time series forecasting. Illustrations, graphs, and computer software output appear throughout the book to assist readers in understanding and retaining the more complex content.

Applied Regression Modeling covers a wide variety of topics, like:
Simple linear regression models, including the least squares criterion, how to evaluate model fit, and estimation/prediction
Multiple linear regression, including testing regression parameters, checking model assumptions graphically, and testing model assumptions numerically
Regression model building, including predictor and response variable transformations, qualitative predictors, and regression pitfalls
Three fully described case studies, including one each on home prices, vehicle fuel efficiency, and pharmaceutical patches

Perfect for students of any undergraduate statistics course in which regression analysis is a main focus, Applied Regression Modeling also belongs on the bookshelves of non-statistics graduate students, including MBAs, and of students of vocational, professional, and applied courses like data science and machine learning.

Deployment and Usage Guide for Running AI Workloads on Red Hat OpenShift and NVIDIA DGX Systems with IBM Spectrum Scale

This IBM® Redpaper publication describes the architecture, installation procedure, and results for running a typical training application that works on an automotive data set in an orchestrated and secured environment that provides horizontal scalability of GPU resources across physical node boundaries for deep neural network (DNN) workloads. This paper is mostly relevant for systems engineers, system administrators, or system architects that are responsible for data center infrastructure management and typical day-to-day operations such as system monitoring, operational control, asset management, and security audits. This paper also describes IBM Spectrum® LSF® as a workload manager and IBM Spectrum Discover as a metadata search engine to find the right data for an inference job and automate the data science workflow. With the help of this solution, the data location, which may be on different storage systems, and time of availability for the AI job can be fully abstracted, which provides valuable information for data scientists.

podcast_episode
by Kyle Polich, Jonathan Lai (University of Rochester), Jiebo Luo (University of Rochester), Neil Yeung (University of Rochester)

As the COVID-19 pandemic continues, the public (or at least those with Twitter accounts) are sharing their personal opinions about mask-wearing via Twitter. What does this data tell us about public opinion? How does it vary by demographic? What, if anything, can make people change their minds? Today we speak to Neil Yeung and Jonathan Lai, undergraduate students in the Department of Computer Science at the University of Rochester, and Professor of Computer Science Jiebo Luo, to discuss their recent paper, Face Off: Polarized Public Opinions on Personal Face Mask Usage during the COVID-19 Pandemic.

Works Mentioned
https://arxiv.org/abs/2011.00336

Emails:
Neil Yeung [email protected]
Jonathan Lai [email protected]
Jiebo Luo [email protected]

Thanks to our sponsors!
Springboard School of Data offers a comprehensive career program encompassing data science, analytics, engineering, and Machine Learning. All courses are online and tailored to fit the lifestyle of working professionals. Up to 20 Data Skeptic listeners will receive $500 scholarships. Apply today at springboard.com/datasketpic
Check out Brilliant's group theory course to learn about object-oriented design! Brilliant is great for learning something new or to get an easy-to-look-at review of something you already know. Check them out at Brilliant.org/dataskeptic to get 20% off of a year of Brilliant Premium!

Leading with AI and Analytics: Build Your Data Science IQ to Drive Business Value

Lead your organization to become evidence-driven

Data. It’s the benchmark that informs corporate projections, decision-making, and analysis. But why do many organizations that see themselves as data-driven fail to thrive? In Leading with AI and Analytics, two renowned experts from the Kellogg School of Management show business leaders how to transform their organization to become evidence-driven, which leads to real, measurable changes that can help propel their companies to the top of their industries. The availability of unprecedented technology-enabled tools has made AI (Artificial Intelligence) an essential component of business analytics. But what’s often lacking are the leadership skills to integrate these technologies to achieve maximum value. Here, the authors provide a comprehensive game plan for developing that all-important human factor to get at the heart of data science: the ability to apply analytical thinking to real-world problems. Each of these tools and techniques comes to powerful life through a wealth of powerful case studies and real-world success stories.

Inside, you’ll find the essential tools to help you:
Develop a strong data science intuition quotient
Lead and scale AI and analytics throughout your organization
Move from “best-guess” decision making to evidence-based decisions
Craft strategies and tactics to create real impact

Written for anyone in a leadership or management role, from C-level/unit team managers to rising talent, this powerful, hands-on guide meets today’s growing need for real-world tools to lead and succeed with data.

Essential Statistics for Non-STEM Data Analysts

Essential Statistics for Non-STEM Data Analysts is your comprehensive guide to mastering the statistical concepts needed for data science. By working through real-world datasets and Python-based examples, you'll learn how to interpret data and build insightful analyses. This book demystifies statistics, making it accessible to anyone aiming to become proficient in data analysis.

What this Book will help me do
Learn how to preprocess, clean, and prepare data for analysis using Python.
Master the foundations of statistical methods such as hypothesis testing and probability theory.
Develop skills to interpret and explain statistical results in the context of data science.
Understand how statistical concepts apply to machine learning tasks like classification and regression.
Build confidence in statistical principles to tackle interviews and enhance your career prospects.

Author(s)
Li is an experienced data scientist and educator with a strong focus on making abstract statistical concepts intuitive and applicable. With a background in designing data science curriculums, Li has a passion for teaching statistics to individuals from diverse and often non-mathematical backgrounds. Through clear explanations and practical examples, Li aims to empower everyone to excel in data analysis and machine learning.

Who is it for?
This book caters specifically to data analysts, data science enthusiasts, and developers eager to enhance their statistical knowledge. It's crafted for readers transitioning into data science who may lack a strong mathematical or statistics background. If you have a basic grasp of Python programming and a keen interest in understanding how to work effectively with data, this book is a perfect fit. Beginners and students aiming to familiarize themselves with statistical foundations for data-oriented careers will greatly benefit from this resource.

Machine Learning and Data Science Blueprints for Finance

Over the next few decades, machine learning and data science will transform the finance industry. With this practical book, analysts, traders, researchers, and developers will learn how to build machine learning algorithms crucial to the industry. You'll examine ML concepts and over 20 case studies in supervised, unsupervised, and reinforcement learning, along with natural language processing (NLP). Ideal for professionals working at hedge funds, investment and retail banks, and fintech firms, this book also delves deep into portfolio management, algorithmic trading, derivative pricing, fraud detection, asset price prediction, sentiment analysis, and chatbot development. You'll explore real-life problems faced by practitioners and learn scientifically sound solutions supported by code and examples.

This book covers:
Supervised learning regression-based models for trading strategies, derivative pricing, and portfolio management
Supervised learning classification-based models for credit default risk prediction, fraud detection, and trading strategies
Dimensionality reduction techniques with case studies in portfolio management, trading strategy, and yield curve construction
Algorithms and clustering techniques for finding similar objects, with case studies in trading strategies and portfolio management
Reinforcement learning models and techniques used for building trading strategies, derivatives hedging, and portfolio management
NLP techniques using Python libraries such as NLTK and scikit-learn for transforming text into meaningful representations

Reinforcement Learning

Reinforcement learning (RL) will deliver one of the biggest breakthroughs in AI over the next decade, enabling algorithms to learn from their environment to achieve arbitrary goals. This exciting development avoids constraints found in traditional machine learning (ML) algorithms. This practical book shows data science and AI professionals how to learn by reinforcement and enable a machine to learn by itself. Author Phil Winder of Winder Research covers everything from basic building blocks to state-of-the-art practices. You'll explore the current state of RL, focus on industrial applications, learn numerous algorithms, and benefit from dedicated chapters on deploying RL solutions to production. This is no cookbook; it doesn't shy away from math and expects familiarity with ML.

Learn what RL is and how the algorithms help solve problems
Become grounded in RL fundamentals including Markov decision processes, dynamic programming, and temporal difference learning
Dive deep into a range of value and policy gradient methods
Apply advanced RL solutions such as meta learning, hierarchical learning, multi-agent, and imitation learning
Understand cutting-edge deep RL algorithms including Rainbow, PPO, TD3, SAC, and more
Get practical examples through the accompanying website

The Big R-Book

Introduces professionals and scientists to statistics and machine learning using the programming language R Written by and for practitioners, this book provides an overall introduction to R, focusing on tools and methods commonly used in data science, and placing emphasis on practice and business use. It covers a wide range of topics in a single volume, including big data, databases, statistical machine learning, data wrangling, data visualization, and the reporting of results. The topics covered are all important for someone with a science/math background that is looking to quickly learn several practical technologies to enter or transition to the growing field of data science. The Big R-Book for Professionals: From Data Science to Learning Machines and Reporting with R includes nine parts, starting with an introduction to the subject and followed by an overview of R and elements of statistics. The third part revolves around data, while the fourth focuses on data wrangling. Part 5 teaches readers about exploring data. In Part 6 we learn to build models, Part 7 introduces the reader to the reality in companies, Part 8 covers reports and interactive applications and finally Part 9 introduces the reader to big data and performance computing. It also includes some helpful appendices. 
Provides a practical guide for non-experts with a focus on business users
Contains a unique combination of topics including an introduction to R, machine learning, mathematical models, data wrangling, and reporting
Uses a practical tone and integrates multiple topics in a coherent framework
Demystifies the hype around machine learning and AI by enabling readers to understand the provided models and program them in R
Shows readers how to visualize results in static and interactive reports
Supplementary material includes PDF slides based on the book’s content, as well as all the extracted R code, and is available to everyone on a Wiley Book Companion Site

The Big R-Book is an excellent guide for science, technology, engineering, or mathematics students who wish to make a successful transition from the academic world to the professional. It will also appeal to all young data scientists, quantitative analysts, and analytics professionals, as well as those who make mathematical models.