talk-data.com

Topic: Data Science (tags: machine_learning, statistics, analytics)

1516 tagged activities

Activity trend, 2020-Q1 to 2026-Q1: peak of 68 per quarter

Activities

1516 activities · Newest first

Practical MATLAB Modeling with Simulink: Programming and Simulating Ordinary and Partial Differential Equations

Employ the essential, hands-on tools and functions of MATLAB's ordinary differential equation (ODE) and partial differential equation (PDE) packages, which are explained and demonstrated via interactive examples and case studies. This book contains dozens of simulations and solved problems via m-files/scripts and Simulink models that help you learn to program and model more difficult, complex problems involving ODEs and PDEs. You’ll become efficient with many of the built-in tools and functions of MATLAB/Simulink while solving more complex engineering and scientific computing problems that require differential equations. Practical MATLAB Modeling with Simulink explains various practical issues of programming and modeling. After reading and using this book, you'll be proficient at using MATLAB and at applying the source code from the book's examples as templates for your own projects in data science or engineering.

What You Will Learn: model complex problems using MATLAB and Simulink; gain the programming and modeling essentials of MATLAB using ODEs and PDEs; use numerical methods to solve 1st and 2nd order ODEs; solve stiff, higher order, coupled, and implicit ODEs; employ numerical methods to solve 1st and 2nd order linear PDEs; solve stiff, higher order, coupled, and implicit PDEs.

Who This Book Is For: engineers, programmers, data scientists, and students majoring in engineering, applied/industrial math, data science, and scientific computing. This book continues where Apress' Beginning MATLAB and Simulink leaves off.

Modern Big Data Architectures

Provides an up-to-date analysis of big data and multi-agent systems. The term Big Data refers to cases where data sets are too large or too complex for traditional data-processing software. With the spread of new concepts such as Edge Computing and the Internet of Things, the production, processing, and consumption of this data are becoming more and more distributed. As a result, applications increasingly require multiple agents that can work together. A multi-agent system (MAS) is a self-organized computer system that comprises multiple intelligent agents interacting to solve problems that are beyond the capacities of individual agents. Modern Big Data Architectures examines modern concepts and architectures for Big Data processing and analytics. This unique, up-to-date volume provides a joint analysis of big data and multi-agent systems, with emphasis on distributed, intelligent processing of very large data sets. Each chapter contains practical examples and detailed solutions suitable for a wide variety of applications. The author, an internationally recognized expert in Big Data and distributed Artificial Intelligence, demonstrates how base concepts such as agent, actor, and micro-service have reached a point of convergence, enabling next-generation systems to be built by incorporating the best aspects of the field.

This book: illustrates how data sets are produced and how they can be utilized in various areas of industry and science; explains how to apply common computational models and state-of-the-art architectures to process Big Data tasks; and discusses current and emerging Big Data applications of Artificial Intelligence.

Modern Big Data Architectures: A Multi-Agent Systems Perspective is a timely and important resource for data science professionals and students involved in Big Data analytics, machine learning, and artificial intelligence.

Announcing Journal Club

I am pleased to announce that Data Skeptic is launching a new spin-off show called "Journal Club", with similar themes but a very different format from the Data Skeptic everyone is used to. In Journal Club, a regular panel and occasional guest panelists will discuss interesting news items and one featured journal article every week in a roundtable discussion. Each week, I'll be joined by Lan Guo and George Kemp for a discussion of interesting data science news articles and a featured journal or pre-print article. We hope this podcast will give listeners an introduction to the works we cover and to how people discuss these works. Our topics will often coincide with the original Data Skeptic podcast's current Interpretability theme, but right now we have few rules about what we pick. We enjoy discussing these items with each other, and we hope you will too. In the coming weeks, we will start opening up the guest chair more often to bring new voices to our discussion. After that, we'll be looking for ways to engage with our audience. Keep reading and thanks for listening! Kyle

Build a Career in Data Science

You are going to need more than technical knowledge to succeed as a data scientist. Build a Career in Data Science teaches you what school leaves out, from how to land your first job to the lifecycle of a data science project, and even how to become a manager.

About the Technology: What are the keys to a data scientist’s long-term success? Blending your technical know-how with the right “soft skills” turns out to be a central ingredient of a rewarding career.

About the Book: Build a Career in Data Science is your guide to landing your first data science job and developing into a valued senior employee. By following clear and simple instructions, you’ll learn to craft an amazing resume and ace your interviews. In this demanding, rapidly changing field, it can be challenging to keep projects on track, adapt to company needs, and manage tricky stakeholders. You’ll love the insights on how to handle expectations, deal with failures, and plan your career path in the stories from seasoned data scientists included in the book.

What's Inside: creating a portfolio of data science projects; assessing and negotiating an offer; leaving gracefully and moving up the ladder; interviews with professional data scientists.

About the Reader: for readers who want to begin or advance a data science career.

About the Authors: Emily Robinson is a data scientist at Warby Parker. Jacqueline Nolis is a data science consultant and mentor.

Quotes: "Full of useful advice, real-case scenarios, and contributions from professionals in the industry." (Sebastián Palma Mardones, ArchDaily) "The perfect companion for someone who wants to be a successful data scientist!" (Gustavo Gomes, Brightcove) "Insightful overview of all aspects of a data science career." (Krzysztof Jędrzejewski, Pearson) "Highly recommended." (Hagai Luger, Clarizen)

Podcast episode by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Dr. Joe Sutherland (Search Discovery), Moe Kiss (Canva), Michael Helbling (Search Discovery)
NLP

Did you know that there were monks in the 1400s doing text-based sentiment analysis? Can you name the 2016 movie that starred Amy Adams as a linguist? Have you ever lain awake at night wondering whether stopword removal is ever problematic? Is the best therapist you ever had named ELIZA? The common theme across all of these questions is the broad and deep topic of natural language processing (NLP), a topic we've been wanting to form and exchange words regarding for quite some time. Dr. Joe Sutherland, the Head of Data Science at Search Discovery, joined the discussion and converted many of his thoughts on the subject into semantic constructs that, ultimately, were digitized into audio files for your auditory consumption. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract: This week on Making Data Simple, our guest is Emmanuel Letouzé, PhD, Director and Co-Founder of Data-Pop Alliance. Emmanuel comes from an extensive academic background, specializing in subjects such as data science, political science, economics, and demography. He speaks about key takeaways from his research and about applying that knowledge to new projects and issues, including the recent COVID-19 pandemic.

Connect with Emmanuel: LinkedIn, Datapopalliance.org, Twitter. Art: Manu Cartoons, Twitter.

Show Notes: 03:37 - Get some answers to your questions on social distancing in this NYTimes article here. 16:19 - Learn more about demography here. 20:31 - AI can now make demographic predictions knowing only your name. Find out more here.

Connect with the Team: Producer Liam Seston (LinkedIn). Producer Lana Cosic (LinkedIn). Producer Meighann Helene (LinkedIn). Producer Mark Simmonds (LinkedIn). Host Al Martin (LinkedIn and Twitter).

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Abstract: This week on Making Data Simple, we are joined by Professor Yoon Chung Han of San Jose State University. Yoon is a digital media artist who leverages data science methodologies in her creative projects. She talks us through some of the innovative projects she has been working on, while offering her insight on the state of the industry, specifically from an academic perspective.

Connect with Yoon: Portfolio Website, LinkedIn, Twitter, Instagram.

Show Notes: 02:26 - New to digital art? Check out this helpful article on where to start. 06:38 - Click here to learn more about the importance of art. 16:40 - Learn about data visualization here. 18:39 - Find out more about Processing and its capabilities here.

Pandas 1.x Cookbook - Second Edition

The 'Pandas 1.x Cookbook' offers a recipe-based guide to mastering the powerful Python library pandas. You will gain practical knowledge for handling and manipulating data efficiently, from the fundamentals to advanced techniques. The book is an essential resource for exploring and analyzing datasets with pandas.

What this Book will help me do: understand and apply data exploration techniques in pandas; use pandas to manipulate, aggregate, and clean datasets to extract meaningful insights; combine pandas with Matplotlib and Seaborn to create effective visualizations; perform time series analysis and transform datasets for machine learning; implement workflows for handling large-scale data that exceeds your computer's memory.

Author(s): Matthew Harrison and Theodore Petrou are highly experienced educators and practitioners in data science and Python programming. With their extensive expertise in using pandas, they provide insights through practical exercises and approachable narratives. Their aim is to make complex concepts accessible to learners of varying skill levels.

Who is it for? This book is ideal for Python programmers, analysts, and data scientists seeking to expand their data handling and analysis capabilities. It caters both to beginners who are new to pandas and to those looking to deepen their understanding of its advanced features. If your goal is to explore, clean, and analyze complex datasets efficiently, this book is tailored for you.

Abstract: This week on Making Data Simple, our guest is Aarti Cherian, Program Director for IBM's Cloud Pak for Data and Watson Data Science Marketing. Aarti discusses key marketing tactics currently used by teams at IBM.

Connect with Aarti: LinkedIn, Twitter.

Show Notes: 3:01 - Check out this article on marketing techniques for tech companies. 14:41 - Not sure what B to B means? Find out here. 21:24 - Learn more about IBM Cloud Pak for Data here. 21:28 - Learn more about IBM Cloud Pak for Data System here.

Hey there, Data Hackers! Welcome to another episode of the Data Science podcast from the biggest Data Science community in Brazil-zil-zil! In today's episode we'll talk about one of the data field's best friends: Statistics!

For today's episode, we invited statisticians Luciana Lima (Head of Analytics at A3Data) and André Calaça (Co-founder of Oper) to talk about how they work with Data Science, how data scientists can learn statistics, how statisticians can become data scientists, and whether Python really is better than R.

Abstract: This week on Making Data Simple, our guest is Frank Kane, Founder of Sundog Software. Frank's diverse career has given him a thorough understanding of various data science and business concepts. The conversation ranges from how Frank got his start developing video games to the importance of maintaining your skills for personal marketability. Host Al Martin and Frank also discuss growing concerns about ethics in computing that may be unfamiliar to some.

Connect with Frank: LinkedIn, Sundog Education, Udemy, Twitter.

Show Notes: 07:07 - See here how Netflix uses machine learning to recommend what to watch. 13:21 - "Good intentions are not enough." Click here to check out a similar Forbes article. 15:52 - Check out this Medium article emphasizing the need for humanities majors in tech. 27:18 - Here are 5 ways to keep up your coding skills while working as a manager.

Advances in Data Science

Data science unifies statistics, data analysis, and machine learning to achieve a better understanding of the masses of data produced today and to improve prediction. Special kinds of data (symbolic, network, complex, compositional) are increasingly frequent in data science. These data require specific methodologies, but reference works in this field are lacking. Advances in Data Science fills this gap. It presents a collection of up-to-date contributions by eminent scholars following two international workshops held in Beijing and Paris. The 10 chapters are organized into four parts: Symbolic Data, Complex Data, Network Data, and Clustering. They include fundamental contributions as well as applications to several domains, including business and the social sciences.

Principles of Managerial Statistics and Data Science

Introduces readers to the principles of managerial statistics and data science, with an emphasis on the statistical literacy of business students. Through a statistical perspective, this book introduces readers to the topic of data science, including Big Data, data analytics, and data wrangling. Chapters include multiple examples showing the application of the theoretical aspects presented. It features practice problems designed to ensure that readers understand the concepts and can apply them using real data. Over 100 open data sets used for examples and problems come from regions throughout the world, allowing the instructor to adapt the application to local data with which students can identify. Applications with these data sets include: assessing whether searches during a police stop in San Diego are dependent on the driver’s race; visualizing the association between fat percentage and moisture percentage in Canadian cheese; modeling taxi fares in Chicago using data from millions of rides; and analyzing mean sales per unit of legal marijuana products in Washington state. Topics covered in Principles of Managerial Statistics and Data Science include: data visualization; descriptive measures; probability; probability distributions; mathematical expectation; confidence intervals; and hypothesis testing. Analysis of variance, simple linear regression, and multiple linear regression are also included. In addition, the book offers contingency tables, Chi-square tests, non-parametric methods, and time series methods.
The textbook: includes academic material usually covered in introductory Statistics courses, but with a data science twist and less emphasis on theory; relies on Minitab to present how to perform tasks with a computer; presents and motivates the use of data from open portals; focuses on developing an intuition for how the procedures work; exposes readers to the potential of Big Data and current failures in its use; and features an appendix with solutions to some practice problems. Supplementary material includes a companion website that houses PowerPoint slides; an Instructor's Manual with tips, a syllabus model, and project ideas; R code to reproduce examples and case studies; and information about the open portal data.

Principles of Managerial Statistics and Data Science is a textbook for undergraduate and graduate students taking managerial Statistics courses, and a reference book for working business professionals.

Statistics and Probability with Applications for Engineers and Scientists Using MINITAB, R and JMP, 2nd Edition

Introduces basic concepts in probability and statistics to data science students, as well as engineers and scientists. Aimed at undergraduate/graduate-level engineering and natural science students, this timely, fully updated edition of a popular book on statistics and probability shows how real-world problems can be solved using statistical concepts. It removes Excel exhibits and replaces them with R software throughout, and updates both MINITAB and JMP software instructions and content. A new chapter discussing data mining (including big data, classification, machine learning, and visualization) is featured. Another new chapter covers cluster analysis methodologies in hierarchical, nonhierarchical, and model-based clustering. The book also offers a chapter on Response Surfaces that previously appeared on the book’s companion website. Statistics and Probability with Applications for Engineers and Scientists using MINITAB, R and JMP, Second Edition is broken into two parts. Part I covers topics such as: describing data graphically and numerically, elements of probability, discrete and continuous random variables and their probability distributions, distribution functions of random variables, sampling distributions, estimation of population parameters and hypothesis testing. Part II covers: elements of reliability theory, data mining, cluster analysis, analysis of categorical data, nonparametric tests, simple and multiple linear regression analysis, analysis of variance, factorial designs, response surfaces, and statistical quality control (SQC) including phase I and phase II control charts. The appendices contain statistical tables and charts and answers to selected problems.
Features two new chapters, one on Data Mining and another on Cluster Analysis. Now contains R exhibits including code, graphical display, and some results. MINITAB and JMP have been updated to their latest versions. Emphasizes the p-value approach and includes related practical interpretations. Offers a more applied statistical focus, and features modified examples to better exhibit statistical concepts. Supplemented with an instructor-only solutions manual on the book’s companion website.

Statistics and Probability with Applications for Engineers and Scientists using MINITAB, R and JMP is an excellent text for graduate-level data science students, as well as for engineers and scientists. It is also an ideal introduction to applied statistics and probability for undergraduate students in engineering and the natural sciences.

Summary

Every business collects data in some fashion, but sometimes the true value of the collected information only emerges when it is combined with other data sources. Data trusts are a legal framework that allows businesses to collaboratively pool their data. This lets the members of the trust increase the value of their individual repositories and gain new insights that would otherwise require the substantial effort of duplicating the data owned by their peers. In this episode, Tom Plagge and Greg Mundy explain how the BrightHive platform serves to establish and maintain data trusts, the technical and organizational challenges they face, and the outcomes they have witnessed. If you are curious about data sharing strategies or data collaboratives, then listen now to learn more!

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too, with worldwide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Tom Plagge and Gregory Mundy about BrightHive, a platform for building data trusts.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what a data trust is?

Why might an organization want to build one?

What is BrightHive and what is its origin story?

Beyond having a storage location with access controls, what are the components of a data trust that are necessary for it to be viable?

What are some of the challenges that are common in establishing an agreement among organizations that are participating in a data trust?

What are the responsibilities of each of the participants in a data trust?

For an individual or organization who wants to participate in an existing trust, what is involved in gaining access?

How does BrightHive support the process of building a data trust?

How is ownership of derivative data sets/data products and associated intellectual property handled in the context of a trust?

How is the technical architecture of BrightHive implemented and how has it evolved since it first started?

What are some of the ways that you approach the challenge of data privacy in these sharing agreements?

What are some legal and technical guards that you implement to encourage ethical uses of the data contained in a trust?

What is the motivation for releasing the technical elements of BrightHive as open source?

What are some of the most interesting, innovative, or inspirational ways that you have seen BrightHive used?

Since BrightHive is a shared platform for empowering other organizations to collaborate, I imagine there is a strong focus on long-term sustainability. How are you approaching that problem, and what is the business model for BrightHive?

What have you found to be the most interesting/unexpected/challenging aspects of building and growing the technical and business infrastructure of BrightHive?

What do you have planned for the future of BrightHive?

Contact Info

Tom

LinkedIn tplagge on GitHub

Gregory

LinkedIn gregmundy on GitHub @graygoree on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Links

BrightHive, Data Science For Social Good, Workforce Data Initiative, NASA, NOAA, Data Trust, Data Collaborative, Public Benefit Corporation, Terraform, Airflow (Podcast.init Episode), Dagster (Podcast Episode), Secure Multi-Party Computation, Public Key Encryption, AWS Macie, Blockchain, Smart Contracts

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


FIRESIDE WITH KRIS EWALD

With the proliferation of event-based analytics and sophisticated real-time data science, and the parallel maturing of token-based economics, there is a crossroads where they meet that is worth a fireside chat. Often we get stuck in privacy debates or broad statements about data as the new oil, but the topic deserves a much more nuanced discussion and a look ahead at where we can go...

The Data Science Workshop

The Data Science Workshop is designed for beginners looking to step into the rigorous yet rewarding world of data science. Taking a hands-on approach, this book demystifies key concepts and guides you gently into creating practical machine learning models with Python.

What this Book will help me do: understand supervised and unsupervised learning and their applications; gain hands-on experience with Python libraries like scikit-learn and pandas for data manipulation; learn practical use cases of machine learning techniques such as regression and clustering; discover techniques to ensure robustness in machine learning with hyperparameter tuning and ensembling; develop efficiency in feature engineering with automated tools to accelerate workflows.

Author(s): Anthony So, Thomas Joseph, Robert Thas John, and Andrew Worsley are seasoned experts in data science and Python programming. Along with Dr. Samuel Asare, they bring decades of experience and practical knowledge to this book, delivering an engaging and approachable learning experience.

Who is it for? This book is targeted toward beginners in data science who are eager to acquire foundational knowledge and practical skills. It appeals to those who prefer a structured, hands-on approach to learning, possibly with some prior programming experience or an interest in Python. Professionals aspiring to pivot into data-oriented roles, or students aiming to strengthen their understanding of data science concepts, will find this book particularly valuable. If you're looking to gain confidence in implementing data science projects and solving real-world problems, this text is for you.