talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516

tagged

Activity Trend: peak of 68 per quarter, 2020-Q1 to 2026-Q1

Activities

1516 activities · Newest first

Mastering Large Datasets with Python

Modern data science solutions need to be clean, easy to read, and scalable. In Mastering Large Datasets with Python, author J.T. Wolohan teaches you how to take a small project and scale it up using a functionally influenced approach to Python coding. You’ll explore methods and built-in Python tools that lend themselves to clarity and scalability, like the high-performing parallelism method, as well as distributed technologies that allow for high data throughput. The abundant hands-on exercises in this practical tutorial will lock in these essential skills for any large-scale data science project.
About the Technology
Programming techniques that work well on laptop-sized data can slow to a crawl, or fail altogether, when applied to massive files or distributed datasets. By mastering the powerful map and reduce paradigm, along with the Python-based tools that support it, you can write data-centric applications that scale efficiently without requiring codebase rewrites as your requirements change.
About the Book
Mastering Large Datasets with Python teaches you to write code that can handle datasets of any size. You’ll start with laptop-sized datasets that teach you to parallelize data analysis by breaking large tasks into smaller ones that can run simultaneously. You’ll then scale those same programs to industrial-sized datasets on a cluster of cloud servers. With the map and reduce paradigm firmly in place, you’ll explore tools like Hadoop and PySpark to efficiently process massive distributed datasets, speed up decision-making with machine learning, and simplify your data storage with AWS S3.
What's Inside
An introduction to the map and reduce paradigm
Parallelization with the multiprocessing module and pathos framework
Hadoop and Spark for distributed computing
Running AWS jobs to process large datasets
About the Reader
For Python programmers who need to work faster with more data.
About the Author
J.T. Wolohan is a lead data scientist at Booz Allen Hamilton, and a PhD researcher at Indiana University, Bloomington.
Quotes
A clear and efficient path to mastery of the map and reduce paradigm for developers of all levels. - Justin Fister, GrammarBot
An amazing book for anybody looking to add parallel processing and the map/reduce pattern to their toolkit. - Gary Bake, Radius Payment Solutions
Learn fundamentals of MapReduce and other core concepts and save money on expensive hardware! - Al Krinker, USPTO
A comprehensive guide to the fundamentals of efficient Python data processing. - Craig Pfeifer, MITRE Corporation
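The blurb above describes breaking a large task into smaller ones that run simultaneously and then combining the results. A minimal sketch of that map and reduce pattern with Python's standard multiprocessing module might look like the following. It is an illustration only, not code from the book; the function names and the sample lines are hypothetical.

```python
from functools import reduce
from multiprocessing import Pool


def count_words(line):
    """Map step: count the words in one line of text."""
    return len(line.split())


def add(total, count):
    """Reduce step: fold one partial count into the running total."""
    return total + count


def total_words(lines, processes=2):
    """Count words across all lines: map in parallel, then reduce serially."""
    with Pool(processes=processes) as pool:
        counts = pool.map(count_words, lines)
    return reduce(add, counts, 0)


if __name__ == "__main__":
    # Hypothetical sample data, used only for illustration.
    sample = ["map and reduce", "scale from a laptop", "to a cluster"]
    print(total_words(sample))  # prints 10
```

Because the map step has no shared state, the same two functions work unchanged with the built-in map, which is what makes this style straightforward to move from a single process to a pool of workers.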

Data Science Programming All-in-One For Dummies

Your logical, linear guide to the fundamentals of data science programming
Data science is exploding, in a good way, with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models. Data Science Programming All-In-One For Dummies is a compilation of the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which programming languages are best for specific data science needs. It also gives you the guidelines to build your own projects to solve problems in real time.
Get grounded: the ideal start for new data professionals
What lies ahead: learn about specific areas that data is transforming
Be meaningful: find out how to tell your data story
See clearly: pick up the art of visualization
Whether you’re a beginning student or already mid-career, get your copy now and add even more meaning to your life, and everyone else’s!
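Two of the topics named above, linear regression and cross-validation of models, can be sketched together in a few lines of standard-library Python. This is a rough illustration under stated assumptions, not material from the book: the helper names fit_line and k_fold_mse are hypothetical, the model has a single predictor, and folds are formed by taking every k-th point.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def k_fold_mse(xs, ys, k=3):
    """Average held-out mean squared error over k folds."""
    fold_errors = []
    for fold in range(k):
        # Every k-th point, offset by the fold index, is held out for testing.
        test_idx = set(range(fold, len(xs), k))
        pairs = list(enumerate(zip(xs, ys)))
        train = [p for i, p in pairs if i not in test_idx]
        test = [p for i, p in pairs if i in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        fold_errors.append(mean((y - (slope * x + intercept)) ** 2 for x, y in test))
    return mean(fold_errors)
```

Each point is held out exactly once, so the returned value estimates how well the fitted line predicts data it was not trained on.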

Practical Data Science with R, Second Edition

Practical Data Science with R, Second Edition takes a practice-oriented approach to explaining basic principles in the ever expanding field of data science. You’ll jump right to real-world use cases as you apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.
About the Technology
Evidence-based decisions are crucial to success. Applying the right data analysis techniques to your carefully curated business data helps you make accurate predictions, identify trends, and spot trouble in advance. The R data analysis platform provides the tools you need to tackle day-to-day data analysis and machine learning tasks efficiently and effectively.
About the Book
Practical Data Science with R, Second Edition is a task-based tutorial that leads readers through dozens of useful data analysis practices using the R language. By concentrating on the most important tasks you’ll face on the job, this friendly guide is comfortable both for business analysts and data scientists. Because data is only useful if it can be understood, you’ll also find fantastic tips for organizing and presenting data in tables, as well as snappy visualizations.
What's Inside
Statistical analysis for business pros
Effective data presentation
The most useful R tools
Interpreting complicated predictive models
About the Reader
You’ll need to be comfortable with basic statistics and have an introductory knowledge of R or another high-level programming language.
About the Authors
Nina Zumel and John Mount founded a San Francisco–based data science consulting firm. Both hold PhDs from Carnegie Mellon University and blog on statistics, probability, and computer science.
Quotes
Full of useful shared experience and practical advice. Highly recommended. - From the Foreword by Jeremy Howard and Rachel Thomas
Great examples and an informative walk-through of the data science process. - David Meza, NASA
Offers interesting perspectives that cover many aspects of practical data science; a good reference. - Pascal Barbedor, BL SET
R you ready to get data science done the right way? - Taylor Dolezal, Disney Studios

Practical DataOps: Delivering Agile Data Science at Scale

Gain a practical introduction to DataOps, a new discipline for delivering data science at scale inspired by practices at companies such as Facebook, Uber, LinkedIn, Twitter, and eBay. Organizations need more than the latest AI algorithms, hottest tools, and best people to turn data into insight-driven action and useful analytical data products. Processes and thinking employed to manage and use data in the 20th century are a bottleneck for working effectively with the variety of data and advanced analytical use cases that organizations have today. This book provides the approach and methods to ensure continuous rapid use of data to create analytical data products and steer decision making. Practical DataOps shows you how to optimize the data supply chain from diverse raw data sources to the final data product, whether the goal is a machine learning model or other data-orientated output. The book provides an approach to eliminate wasted effort and improve collaboration between data producers, data consumers, and the rest of the organization through the adoption of lean thinking and agile software development principles. This book helps you to improve the speed and accuracy of analytical application development through data management and DevOps practices that securely expand data access, and rapidly increase the number of reproducible data products through automation, testing, and integration. The book also shows how to collect feedback and monitor performance to manage and continuously improve your processes and output. 
What You Will Learn
Develop a data strategy for your organization to help it reach its long-term goals
Recognize and eliminate barriers to delivering data to users at scale
Work on the right things for the right stakeholders through agile collaboration
Create trust in data via rigorous testing and effective data management
Build a culture of learning and continuous improvement through monitoring deployments and measuring outcomes
Create cross-functional self-organizing teams focused on goals not reporting lines
Build robust, trustworthy data pipelines in support of AI, machine learning, and other analytical data products
Who This Book Is For
Data science and advanced analytics experts, CIOs, CDOs (chief data officers), chief analytics officers, business analysts, business team leaders, and IT professionals (data engineers, developers, architects, and DBAs) supporting data teams who want to dramatically increase the value their organization derives from data. The book is ideal for data professionals who want to overcome challenges of long delivery time, poor data quality, high maintenance costs, and scaling difficulties in getting data science output and machine learning into customer-facing production.

How will Data Science end in the coming years? Will the field die for good, or will it adapt? What can we do to save it and keep bringing value to the world? That is what you will find out in today's conversation on the Data Hackers podcast.

For this episode, we brought together all of the Data Hackers Community Managers to imagine what could put an end to Data Science in the coming years. Joining us on today's episode are Marlesson Santana, Mario Filho, and Pietro Oliveira, sharing their opinions in a relaxed way.

Visit our post for the references we mentioned: https://medium.com/data-hackers/como-data-science-ir%C3%A1-morrer-data-hackers-podcast-18-112e545e73d9

Beginning MATLAB and Simulink: From Novice to Professional

Employ essential and hands-on tools and functions of the MATLAB and Simulink packages, which are explained and demonstrated via interactive examples and case studies. This book contains dozens of simulation models and solved problems via m-files/scripts and Simulink models which help you to learn programming and modeling essentials. You’ll become efficient with many of the built-in tools and functions of MATLAB/Simulink while solving engineering and scientific computing problems. Beginning MATLAB and Simulink explains various practical issues of programming and modeling in parallel by comparing MATLAB and Simulink. After reading and using this book, you'll be proficient at using MATLAB and applying the source code from the book's examples as templates for your own projects in data science or engineering.
What You Will Learn
Get started using MATLAB and Simulink
Carry out data visualization with MATLAB
Gain the programming and modeling essentials of MATLAB
Build a GUI with MATLAB
Work with integration and numerical root finding methods
Apply MATLAB to differential equations-based models and simulations
Use MATLAB for data science projects
Who This Book Is For
Engineers, programmers, data scientists, and students majoring in engineering and scientific computing.

The Decision Maker's Handbook to Data Science: A Guide for Non-Technical Executives, Managers, and Founders

Data science is expanding across industries at a rapid pace, and the companies first to adopt best practices will gain a significant advantage. To reap the benefits, decision makers need to have a confident understanding of data science and its application in their organization. It is easy for novices to the subject to feel paralyzed by intimidating buzzwords, but what many don’t realize is that data science is in fact quite multidisciplinary: useful in the hands of business analysts, communications strategists, designers, and more. With the second edition of The Decision Maker’s Handbook to Data Science, you will learn how to think like a veteran data scientist and approach solutions to business problems in an entirely new way. Author Stylianos Kampakis provides you with the expertise and tools required to develop a solid data strategy that is continuously effective. Ethics and legal issues surrounding data collection and algorithmic bias are some common pitfalls that Kampakis helps you avoid, while guiding you on the path to build a thriving data science culture at your organization. This updated and revised second edition includes plenty of case studies, tools for project assessment, and expanded content for hiring and managing data scientists.
Data science is a language that everyone at a modern company should understand across departments. Friction in communication arises most often when management does not connect with what a data scientist is doing or how impactful data collection and storage can be for their organization. The Decision Maker’s Handbook to Data Science bridges this gap and readies you for both the present and future of your workplace in this engaging, comprehensive guide.
What You Will Learn
Understand how data science can be used within your business.
Recognize the differences between AI, machine learning, and statistics.
Become skilled at thinking like a data scientist, without being one.
Discover how to hire and manage data scientists.
Comprehend how to build the right environment in order to make your organization data-driven.
Who This Book Is For
Startup founders, product managers, higher level managers, and any other non-technical decision makers who are thinking of implementing data science in their organization and hiring data scientists. A secondary audience includes people looking for a soft introduction to the subject of data science.

Reporting, Predictive Analytics, and Everything in Between

Business decisions today are tactical and strategic at the same time. How do you respond to a competitor’s price change? Or to specific technology changes? What new products, markets, or businesses should you pursue? Decisions like these are based on information from only one source: data. With this practical report, technical and non-technical leaders alike will explore the fundamental elements necessary to embark on a data analytics initiative. Is your company planning or contemplating a data analytics initiative? Authors Brett Stupakevich, David Sweenor, and Shane Swiderek from TIBCO guide you through several analytics options. IT leaders, product developers, analytics leaders, data analysts, data scientists, and business professionals will learn how to deploy analytic components in streaming and embedded systems using one of five platforms.
You’ll examine:
Analytics platforms including embedded BI, reporting, data exploration & discovery, streaming BI, and data science & machine learning
The business problems each option solves and the capabilities and requirements of each
How to identify the right analytics type for your particular use case
Key considerations and the level of investment for each analytics platform

Advanced Statistics with Applications in R

Advanced Statistics with Applications in R fills the gap between several excellent theoretical statistics textbooks and many applied statistics books where teaching reduces to using existing packages. This book looks at what is under the hood. Many statistics issues, including the recent p-value crisis, are caused by misunderstanding of statistical concepts due to the poor theoretical background of practitioners and applied statisticians. This book is the product of forty years of experience teaching probability and statistics and their applications for solving real-life problems. There are more than 442 examples in the book: basically every probability or statistics concept is illustrated with an example accompanied by R code. Many examples, such as Who said π? What team is better? The fall of the Roman empire, James Bond chase problem, Black Friday shopping, Free fall equation: Aristotle or Galilei, and many others are intriguing. These examples cover biostatistics, finance, physics and engineering, text and image analysis, epidemiology, spatial statistics, sociology, etc. Advanced Statistics with Applications in R teaches students to use theory for solving real-life problems through computations: there are about 500 R code examples and 100 datasets. These data can be freely downloaded from the author's website dartmouth.edu/~eugened. This book is suitable as a text for senior undergraduate students majoring in statistics or data science, or for graduate students. Many researchers who apply statistics on a regular basis will find explanations of many fundamental concepts from the theoretical perspective, illustrated by concrete real-world applications.

Managing Data Science

Discover how to successfully manage data science projects and build high-performing teams with 'Managing Data Science.' This book provides actionable insights on handling the entire data science workflow, from conception to production, and addresses common challenges with practical strategies.
What this Book will help me do
Understand the fundamentals of building scalable and efficient data science pipelines.
Acquire techniques to manage every stage of data science projects effectively, from prototype to production.
Learn proven strategies for assembling, cultivating, and sustaining a skilled data science team.
Explore the latest tools, methodologies, and best practices in ModelOps and DevOps for data science.
Gain insights into troubleshooting and optimizing data science workflows to achieve organizational goals.
Author(s)
Dubovikov is a seasoned expert in data science and project management, bringing years of hands-on experience to both domains. With a passion for leveraging data to drive business success, the author guides readers through building sustainable practices and effective teams in the growing field of data science.
Who is it for?
This book is perfect for data science professionals, project managers, and business leaders seeking practical guidance to reap the benefits of data-driven decision-making. Designed for readers with a foundational understanding of data science, it helps bridge the gap between technical expertise and managerial efficiency.

A new episode of your Data Science podcast is out! This time we talk about one of the most important steps on the journey to landing a job in Data Science: the hiring process.

What kinds of questions are asked in a hiring process? What is the technical test like? How do you negotiate salary? We discuss this and much more in this very thorough episode. To help us with the conversation, we invited Juliana Forlin, Data Lead Teacher at Ironhack, to tell us about her experience with hiring processes and interviews for Data Science.

Visit our Medium post for the episode links: https://medium.com/data-hackers/processos-seletivos-em-data-science-data-hackers-podcast-17-95d7968bbd2

Business Analytics, Volume II

This business analytics (BA) text discusses models based on fact-based data that measure past business performance and guide an organization in visualizing and predicting future business performance and outcomes. It provides a comprehensive overview of analytics in general with an emphasis on predictive analytics. Given the booming interest in analytics and data science, this book is timely and informative. It brings many terms, tools, and methods of analytics together. The first three chapters provide an introduction to BA, the importance of analytics, and the types of BA (descriptive, predictive, and prescriptive), along with the tools and models. Business intelligence (BI) and a case on descriptive analytics are discussed. Additionally, the book discusses the most widely used predictive models, including regression analysis, forecasting, and data mining, and introduces recent applications of predictive analytics: machine learning, neural networks, and artificial intelligence. The concluding chapter discusses the current state, job outlook, and certifications in analytics.

Clustering Methodology for Symbolic Data

Covers everything readers need to know about clustering methodology for symbolic data, including new methods and headings, while providing a focus on multi-valued list data, interval data and histogram data
This book presents all of the latest developments in the field of clustering methodology for symbolic data, paying special attention to the classification methodology for multi-valued list, interval-valued and histogram-valued data, along with numerous worked examples. The book also offers an expansive discussion of data management techniques showing how to break a large complex dataset into more manageable datasets ready for analysis. Filled with examples, tables, figures, and case studies, Clustering Methodology for Symbolic Data begins by offering chapters on data management, distance measures, general clustering techniques, partitioning, divisive clustering, and agglomerative and pyramid clustering.
Provides new classification methodologies for histogram valued data reaching across many fields in data science
Demonstrates how to manage a large complex dataset into manageable datasets ready for analysis
Features very large contemporary datasets such as multi-valued list data, interval-valued data, and histogram-valued data
Considers classification models by dynamical clustering
Features a supporting website hosting relevant data sets
Clustering Methodology for Symbolic Data will appeal to practitioners of symbolic data analysis, such as statisticians and economists within the public sectors. It will also be of interest to postgraduate students of, and researchers within, web mining, text mining and bioengineering.

Mastering pandas - Second Edition

Mastering pandas is the ultimate guide to harnessing the power of the pandas library for data analysis. Covering everything from installation to advanced techniques, this book provides comprehensive instructions and examples to help you perform efficient data manipulation and visualization. Explore key features of pandas, such as multi-indexing and time series analysis, and become proficient in actionable analytics.
What this Book will help me do
Master importing and managing datasets of various formats using pandas.
Expertly handle missing data and clean datasets for robust analysis.
Create powerful visualizations and reports using pandas and Jupyter notebooks.
Leverage advanced indexing and grouping techniques to derive insights.
Utilize pandas for time series analysis to analyze trends and patterns.
Author(s)
Kumar is an experienced data scientist specializing in data analysis and visualization using Python. With a deep understanding of the pandas library, the author has been helping professionals and enthusiasts alike to make data-driven decisions. Known for an example-driven teaching style, Kumar bridges complex theoretical concepts with practical applications in data science.
Who is it for?
If you're a data scientist, analyst, or Python developer seeking to enhance your data analysis capabilities, this book is for you. Prior knowledge of Python is beneficial but not mandatory, as foundational concepts are explained. This guide spans beginner to advanced topics, accommodating users looking to deepen their skills and those aiming to start with pandas.

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.
Abstract
Our guest for this week is Carlvin Paris, IBM North America Data Science & AI Sales Leader. Host Al Martin helps curate the conversation towards decision optimization and analytics, while Carlvin offers his insight into the state of the data science industry. Tune in for a technical, yet approachable discussion.
Connect with Carlvin
LinkedIn
Show Notes
05:37 - Take a look at this article which aims to explain the journey of a data scientist.
08:33 - Learn more about descriptive, predictive, and prescriptive analytics here.
26:00 - Check out the IBM DTE site and YouTube channel to increase your knowledge of other Data and A.I. concepts.
26:30 - Take a look at the Informs website through this link.
Connect with the Team
Producer Liam Seston - LinkedIn.
Producer Lana Cosic - LinkedIn.
Producer Meighann Helene - LinkedIn.
Host Al Martin - LinkedIn and Twitter.
The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Today's guest is the talented and intelligent Lillian Pierson. Lillian is a Data Science Instagram rock star with 600K+ followers across social media. Currently, she is the data science instructor for multiple courses on LinkedIn Learning, as well as an author, entrepreneur, coach and social media genius. In this new BI Masterclass, Lillian is going to teach you how to build a solid enterprise-wide data strategy that scales. Stay tuned to get Lillian's best tips for social media, data strategies, and collecting data for use cases.
In this episode, you'll learn:
[10:07] The definition of enterprise-wide data strategy and who it is for
[11:35] Lillian's "Aha" moment and her two different types of clients
[13:21] Key Quote: "What is the point of all this big data talk? The point is that they generate business value from that new knowledge." - Lillian Pierson
For full show notes, her book giveaway, and the links mentioned visit: https://bibrainz.com/podcast/36
Sponsor
This exciting season of AOF is sponsored by our BI Data Storytelling Mastery Accelerator 3-Day Live workshop. Our second one is coming up on Jan 28-30 and registration is open! At the end of three days, you'll leave with the tools, techniques, and resources you need to engage your users. Register today!
Enjoyed the Show? Please leave us a review on iTunes

Cognitive Computing Featuring the IBM Power System AC922

This IBM® Redpaper publication describes the advantages of using IBM Power System AC922 for cognitive solutions, and how it can enhance clients' businesses. In order to optimize the hardware and software, IBM partners with NVIDIA, Mellanox, H2O.ai, SQream, Kinetica, and other prominent companies to design the Power AC922 server, specifically enhanced for the cognitive era. Most of its outstanding hardware features, such as NVIDIA NVLink 2.0 and PCIe 4.0, are described in this publication to illustrate the advantages that clients can realize in comparison with IBM competitors. We also include a brief description about what cognitive computing is, and how to use IBM Watson® Machine Learning cognitive solutions to bring more value to your business ecosystem. Additionally, we show performance charts that show the advantages of using Power AC922 versus x86 competitors. In the last chapter, we describe the most remarkable use cases in which IBM solves real problems using cognitive solutions. This IBM Redpaper publication is aimed at IT technical audiences, especially decision-making levels that need a full look at the benefits and improvements that an IBM Cognitive Solution can offer. It also provides valuable information to data science professionals, enabling them to plan their modeling needs. Finally, it offers information to the infrastructure support group in charge of maintaining the solution.

Mastering Spark with R

If you’re like most R users, you have deep knowledge and love for statistics. But as your organization continues to collect huge amounts of data, adding tools such as Apache Spark makes a lot of sense. With this practical book, data scientists and professionals working with large-scale data applications will learn how to use Spark from R to tackle big data and big compute problems. Authors Javier Luraschi, Kevin Kuo, and Edgar Ruiz show you how to use R with Spark to solve different data analysis problems. This book covers relevant data science topics, cluster computing, and issues that should interest even the most advanced users.
Analyze, explore, transform, and visualize data in Apache Spark with R
Create statistical models to extract information and predict outcomes; automate the process in production-ready workflows
Perform analysis and modeling across many machines using distributed computing techniques
Use large-scale data from multiple sources and different formats with ease from within Spark
Learn about alternative modeling frameworks for graph processing, geospatial analysis, and genomics at scale
Dive into advanced topics including custom transformations, real-time data processing, and creating custom Spark extensions