data

Hands-On Web Scraping with Python

2019-07-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Anish Chapagain

API Python Cyber Security Selenium data-science data-science-tasks web-scraping

This book, "Hands-On Web Scraping with Python", is your comprehensive guide to mastering web scraping techniques and tools. Harnessing the power of Python libraries like Scrapy, Beautiful Soup, and Selenium, you'll learn how to extract and analyze data from websites effectively and efficiently. What this Book will help me do Master the foundational concepts of web scraping using Python. Efficiently use libraries such as Scrapy, Beautiful Soup, and Selenium for data extraction. Handle advanced scenarios such as forms, logins, and dynamic content in scraping. Leverage XPath, CSS selectors, and Regex for precise data targeting and processing. Improve scraping reliability and manage challenges like cookies, API use, and web security. Author(s) None Chapagain is an accomplished Python programmer and an expert in web scraping methodologies. With years of experience in applying Python to solve practical data challenges, they bring a clear and insightful approach to teaching these skills. Readers appreciate their practical examples and ready-to-use guidance for real-world applications. Who is it for? This book is designed for Python developers and data enthusiasts eager to master web scraping. Whether you're a beginner looking to dep dive into new techniques or an analyst needing reliable data extraction methods, this book offers clear guidance. A basic understanding of Python is recommended to fully benefit from this text.

IBM Personal Communications and IBM z/OS TTLS Enablement: Technical Enablement Series

2019-07-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Chris Van Wagner

IBM Cyber Security data-engineering

The purpose of this document is to complete the task of introducing Transport Layer Security to z/OS® so IBM Personal Communications (PCOMM) uses TLS security. This document walks you through enabling Tunneled Transport Layer Security (TTLS) on your IBM z/OS for use with a PCOMM TN3270 connection. When you complete this task, you require a certificate to access your TN3270 PCOMM session. You work with the following products and components: TN3270 TCPIP PAGENT INET (maybe) IBM RACF® This document assumes that the reader has extensive knowledge of z/OS security administration and these products and components. This document is part of the Technical Enablement Series that was created at the IBM Client Experience Centers.

Data Science Strategy For Dummies

2019-07-11 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ulrika Jägare

Analytics Big Data Data Science data-science

All the answers to your data science questions Over half of all businesses are using data science to generate insights and value from big data. How are they doing it? Data Science Strategy For Dummies answers all your questions about how to build a data science capability from scratch, starting with the “what” and the “why” of data science and covering what it takes to lead and nurture a top-notch team of data scientists. With this book, you’ll learn how to incorporate data science as a strategic function into any business, large or small. Find solutions to your real-life challenges as you uncover the stories and value hidden within data. Learn exactly what data science is and why it’s important Adopt a data-driven mindset as the foundation to success Understand the processes and common roadblocks behind data science Keep your data science program focused on generating business value Nurture a top-quality data science team In non-technical language, Data Science Strategy For Dummies outlines new perspectives and strategies to effectively lead analytics and data science functions to create real value.

Bayesian Statistics the Fun Way

2019-07-09 · O'Reilly Data Science Books O'Reilly Amazon

book

by Will Kurt

bayesian-statistics data-science data-science-tasks statistics

Probability and statistics are increasingly important in a huge range of professions. But many people use data in ways they don’t even understand, meaning they aren’t getting the most from it. Bayesian Statistics the Fun Way will change that. This book will give you a complete understanding of Bayesian statistics through simple explanations and un-boring examples. Find out the probability of UFOs landing in your garden, how likely Han Solo is to survive a flight through an asteroid belt, how to win an argument about conspiracy theories, and whether a burglary really was a burglary, to name a few examples. By using these off-the-beaten-track examples, the author actually makes learning statistics fun. And you’ll learn real skills, like how to: •How to measure your own level of uncertainty in a conclusion or belief •Calculate Bayes theorem and understand what it’s useful for •Find the posterior, likelihood, and prior to check the accuracy of your conclusions •Calculate distributions to see the range of your data •Compare hypotheses and draw reliable conclusions from them Next time you find yourself with a sheaf of survey results and no idea what to do with them, turn to Bayesian Statistics the Fun Way to get the most value from your data.

IBM FlashSystem 900 Model AE3 Product Guide

2019-07-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eike Schenk , Jon Herd , Detlef Helmbrecht

Analytics Cloud Computing IBM data-engineering

Today's global organizations depend on the ability to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900 Model AE3, they can make faster decisions based on real-time insights. Thus, they unleash the power of demanding applications, including these: Online transaction processing (OLTP) and analytical databases Virtual desktop infrastructures (VDIs) Technical computing applications Cloud environments Easy to deploy and manage, IBM FlashSystem® 900 Model AE3 is designed to accelerate the applications that drive your business. Powered by IBM FlashCore® Technology, IBM FlashSystem Model AE3 provides the following characteristics: Accelerate business-critical workloads, real-time analytics, and cognitive applications with the consistent microsecond latency and extreme reliability of IBM FlashCore technology Improve performance and help lower cost with new inline data compression Help reduce capital and operational expenses with IBM enhanced 3D triple-level cell (3D TLC) flash Protect critical data assets with patented IBM Variable Stripe RAID™ Power faster insights with IBM FlashCore including hardware-accelerated nonvolatile memory (NVM) architecture, purpose-engineered IBM MicroLatency® modules and advanced flash management FlashSystem 900 Model AE3 can be configured in capacity points as low as 14.4 TB to 180 TB usable and up to 360 TB effective capacity after RAID 5 protection and compression. You can couple this product with either 16 Gbps, 8 Gbps Fibre Channel, 16 Gbps NVMe over Fibre Channel, or 40 Gbps InfiniBand connectivity. Thus, the IBM FlashSystem 900 Model AE3 provides extreme performance to existing and next generation infrastructure.

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize V8.2.1

2019-07-04 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pawel Brodacki , Jon Tate , Frank Enders , Rodrigo Suzuki , Sergey Kubin , Jack Armstrong , Danilo Miyasiro , Tiago Bastos

IBM data-engineering

This IBM® Redbooks® publication is a detailed technical guide to the IBM System Storage® SAN Volume Controller (SVC), which is powered by IBM Spectrum™ Virtualize V8.2.1. IBM SAN Volume Controller is a virtualization appliance solution that maps virtualized volumes that are visible to hosts and applications to physical volumes on storage devices. Each server within the storage area network (SAN) has its own set of virtual storage addresses that are mapped to physical addresses. If the physical addresses change, the server continues running by using the same virtual addresses that it had before. Therefore, volumes or storage can be added or moved while the server is still running. The IBM virtualization technology improves the management of information at the block level in a network, which enables applications and servers to share storage devices on a network.

Global Contract Logistics

2019-07-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Steven Morgan

data-engineering log-data

Tackles the growing complexity of contracting logistics in a technologically accelerating world.

IBM Hybrid Solution for Scalable Data Solutions using IBM Spectrum Scale

2019-07-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

Cloud Computing IBM data-engineering

This document is intended to facilitate the deployment of the scalable hybrid cloud solution for data agility and collaboration using IBM® Spectrum Scale across multiple public clouds. To complete the tasks it describes, you must understand IBM Spectrum Scale and IBM Spectrum Scale Active File Management (AFM). The information in this document is distributed on an basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM Spectrum Scale Active File Management are supported and entitled, and where the issues are specific to a blueprint implementation.

Big Data Simplified

2019-06-30 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sayan Goswami , Sourabh Mukherjee , Amit Kumar Das

Big Data Cassandra Data Science Hadoop Hive IoT Kafka MongoDB NoSQL Python Spark data-engineering

"Big Data Simplified blends technology with strategy and delves into applications of big data in specialized areas, such as recommendation engines, data science and Internet of Things (IoT) and enables a practitioner to make the right technology choice. The steps to strategize a big data implementation are also discussed in detail. This book presents a holistic approach to the topic, covering a wide landscape of big

data technologies like Hadoop 2.0 and package implementations, such as Cloudera. In-depth discussion of associated technologies, such as MapReduce, Hive, Pig, Oozie, ApacheZookeeper, Flume, Kafka, Spark, Python and NoSQL databases like Cassandra, MongoDB, GraphDB, etc., is also included.

Associations and Correlations

2019-06-28 · O'Reilly Data Science Books O'Reilly Amazon

book

by Lee Baker

Analytics Data Analytics data-science data-science-tasks statistics

"Associations and Correlations: Unearth the powerful insights buried in your data" is a comprehensive guide for understanding and utilizing associations and correlations in data analysis. This book walks you through methods of classifying data, selecting appropriate statistical tests, and interpreting results effectively. By the end, you'll have mastered how to reveal data insights clearly and reliably. What this Book will help me do Identify and prepare datasets suitable for analysis with confidence. Understand and apply the principles of associations and correlations in data analytics. Use statistical tests to uncover univariate and multivariate relationships. Classify and interpret data into qualitative and quantitative segments effectively. Develop visual representations of data relationships to communicate insights clearly. Author(s) Lee Baker is an experienced statistician and data scientist with a passion for education. With years of teaching and mentoring professionals in data analysis, Lee excels in breaking down complex statistical concepts into understandable insights. Lee's approachable style aims to empower learners to harness their data's full potential. Who is it for? This book is designed for budding data analysts and data scientists, targeting those starting their journey into data analytics. It serves well as an introduction to the fundamentals of associations and correlations, making it suitable for beginners. If you seek a foundational understanding or a recap of key concepts, this book is for you.

Multicloud Storage as a Service using vRealize Automation and IBM Spectrum Storage

2019-06-27 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

AWS Cloud Computing IBM VMware cloud-storage data-engineering storage-repositories

This document is intended to facilitate the deployment of the Multicloud Solution for Business Continuity and Storage as service by using IBM Spectrum Virtualize for Public Cloud on Amazon Web Services (AWS). To complete the tasks it describes, you must understand IBM FlashSystem 9100, IBM Spectrum Virtualize for Public Cloud, IBM Spectrum Connect, VMware vRealize Orchestrator, and vRealize Automation and AWS Cloud. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storwize or IBM FlashSystem storage devices are supported and entitled and where the issues are specific to a blueprint implementation.

Data Engineers vs. Data Scientists

2019-06-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jesse Anderson (Big Data Institute)

data-science data-science-as-a-profession

Data engineers and data scientists are not interchangeable—and misperceptions of their roles can hurt teams and compromise productivity. This article clears up the differences of each role and how to best optimize these roles.

Probability and Statistics for Computer Scientists, 3rd Edition

2019-06-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael Baron

data-science data-science-tasks statistics

Probability and statistical methods, simulation techniques, and modeling tools. This third edition textbook adds R, including codes for data analysis examples, helps students solve problems, make optimal decisions in select stochastic models, probabilities and forecasts, and evaluate performance of computer systems and networks.

R Cookbook, 2nd Edition

2019-06-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Paul Teetor , JD Long

R data-science data-science-tools r

Perform data analysis with R quickly and efficiently with more than 275 practical recipes in this expanded second edition. The R language provides everything you need to do statistical work, but its structure can be difficult to master. These task-oriented recipes make you productive with R immediately. Solutions range from basic tasks to input and output, general statistics, graphics, and linear regression. Each recipe addresses a specific problem and includes a discussion that explains the solution and provides insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an intermediate user, this book will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process. Create vectors, handle variables, and perform basic functions Simplify data input and output Tackle data structures such as matrices, lists, factors, and data frames Work with probability, probability distributions, and random variables Calculate statistics and confidence intervals and perform statistical tests Create a variety of graphic displays Build statistical models with linear regressions and analysis of variance (ANOVA) Explore advanced statistical techniques, such as finding clusters in your data

Streaming Data

2019-06-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andy Oram

AI/ML Analytics Big Data CI/CD Cloud Computing DevOps Kubernetes Data Streaming data-engineering streaming-architecture streaming-messaging

Managers and staff responsible for planning, hiring, and allocating resources need to understand how streaming data can fundamentally change their organizations. Companies everywhere are disrupting business, government, and society by using data and analytics to shape their business. Even if you don’t have deep knowledge of programming or digital technology, this high-level introduction brings data streaming into focus. You won’t find math or programming details here, or recommendations for particular tools in this rapidly evolving space. But you will explore the decision-making technologies and practices that organizations need to process streaming data and respond to fast-changing events. By describing the principles and activities behind this new phenomenon, author Andy Oram shows you how streaming data provides hidden gems of information that can transform the way your business works. Learn where streaming data comes from and how companies put it to work Follow a simple data processing project from ingesting and analyzing data to presenting results Explore how (and why) big data processing tools have evolved from MapReduce to Kubernetes Understand why streaming data is particularly useful for machine learning projects Learn how containers, microservices, and cloud computing led to continuous integration and DevOps

The Care and Feeding of Data Scientists

2019-06-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michelangelo D'Agostino , Katie Malone

Agile/Scrum Analytics Data Science data-science data-science-as-a-profession

As a discipline, data science is relatively young, but the job of managing data scientists is younger still. Many people undertake this management position without the tools, mentorship, or role models they need to do it well. This report examines the steps necessary to build, manage, sustain, and retain a growing data science team. You’ll learn how data science management is similar to but distinct from other management types. Michelangelo D’Agostino, VP of Data Science and Engineering at ShopRunner, and Katie Malone, Director of Data Science at Civis Analytics, provide concrete tips for balancing and structuring a data science team. The authors provide tips for balancing and structuring a data science team, recruiting and interviewing the best candidates, and keeping them productive and happy once they're in place. In this report, you'll: Explore data scientist archetypes, such as operations and research, that fit your organization Devise a plan to recruit, interview, and hire members for your data science team Retain your hires by providing challenging work and learning opportunities Explore Agile and OKR methodology to determine how your team will work together Provide your team with a career ladder through guidance and mentorship

Digital Processing of Random Oscillations

2019-06-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Viacheslav Karmalita

ELK data-science data-science-tasks stata statistics

This book deals with the autoregressive method for digital processing of random oscillations. The method is based on a one-to-one transformation of the numeric factors of the Yule series model to linear elastic system characteristics. This parametric approach allowed to develop a formal processing procedure from the experimental data to obtain estimates of logarithmic decrement and natural frequency of random oscillations. A straightforward mathematical description of the procedure makes it possible to optimize a discretization of oscillation realizations providing efficient estimates. The derived analytical expressions for confidence intervals of estimates enable a priori evaluation of their accuracy. Experimental validation of the method is also provided. Statistical applications for the analysis of mechanical systems arise from the fact that the loads experienced by machineries and various structures often cannot be described by deterministic vibration theory. Therefore, a sufficient description of real oscillatory processes (vibrations) calls for the use of random functions. In engineering practice, the linear vibration theory (modeling phenomena by common linear differential equations) is generally used. This theory’s fundamental concepts such as natural frequency, oscillation decrement, resonance, etc. are credited for its wide use in different technical tasks. In technical applications two types of research tasks exist: direct and inverse. The former allows to determine stochastic characteristics of the system output X(t) resulting from a random process E(t) when the object model is considered known. The direct task enables to evaluate the effect of an operational environment on the designed object and to predict its operation under various loads. The inverse task is aimed at evaluating the object model on known processes E(t) and X(t), i.e. finding model (equations) factors. This task is usually met at the tests of prototypes to identify (or verify) its model experimentally. To characterize random processes a notion of "shaping dynamic system" is commonly used. This concept allows to consider the observing process as the output of a hypothetical system with the input being stationary Gauss-distributed ("white") noise. Therefore, the process may be exhaustively described in terms of parameters of that system. In the case of random oscillations, the "shaping system" is an elastic system described by the common differential equation of the second order: X ̈(t)+2hX ̇(t)+ ω_0^2 X(t)=E(t), where ω0 = 2π/Т0 is the natural frequency, T0 is the oscillation period, and h is a damping factor. As a result, the process X(t) can be characterized in terms of the system parameters – natural frequency and logarithmic oscillations decrement δ = hT0 as well as the process variance. Evaluation of these parameters is subjected to experimental data processing based on frequency or time-domain representations of oscillations. It must be noted that a concept of these parameters evaluation did not change much during the last century. For instance, in case of the spectral density utilization, evaluation of the decrement values is linked with bandwidth measurements at the points of half-power of the observed oscillations. For a time-domain presentation, evaluation of the decrement requires measuring covariance values delayed by a time interval divisible by T0. Both estimation procedures are derived from a continuous description of research phenomena, so the accuracy of estimates is linked directly to the adequacy of discrete representation of random oscillations. This approach is similar a concept of transforming differential equations to difference ones with derivative approximation by corresponding finite differences. The resulting discrete model, being an approximation, features a methodical error which can be decreased but never eliminated. To render such a presentation more accurate it is imperative to decrease the discretization interval and to increase realization size growing requirements for computing power. The spectral density and covariance function estimates comprise a non-parametric (non-formal) approach. In principle, any non-formal approach is a kind of art i.e. the results depend on the performer’s skills. Due to interference of subjective factors in spectral or covariance estimates of random signals, accuracy of results cannot be properly determined or justified. To avoid the abovementioned difficulties, the application of linear time-series models with well-developed procedures for parameter estimates is more advantageous. A method for the analysis of random oscillations using a parametric model corresponding discretely (no approximation error) with a linear elastic system is developed and presented in this book. As a result, a one-to-one transformation of the model’s numerical factors to logarithmic decrement and natural frequency of random oscillations is established. It allowed to develop a formal processing procedure from experimental data to obtain the estimates of δ and ω0. The proposed approach allows researchers to replace traditional subjective techniques by a formal processing procedure providing efficient estimates with analytically defined statistical uncertainties.

Getting Started with Tableau 2019.2 - Second Edition

2019-06-14 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tristan Guillevin

Analytics BI DataViz Tableau data-science data-science-tasks data-visualization

"Getting Started with Tableau 2019.2" is your primer to mastering the latest version of Tableau, a leading tool for data visualization and analysis. Whether you're new to Tableau or looking to upgrade your skills, this book will guide you through both foundational and advanced features, enabling you to create impactful dashboards and visual analytics. What this Book will help me do Understand and utilize the latest features introduced in Tableau 2019.2, including natural language queries in Ask Data. Learn how to connect to diverse data sources, transform data by pivoting fields, and split columns effectively. Gain skills to design intuitive data visualizations and dashboards using various Tableau mark types and properties. Develop interactive and storytelling-based dashboards to communicate insights visually and effectively. Discover methods to securely share your analyses through Tableau Server, enhancing collaboration. Author(s) Tristan Guillevin is an experienced data visualization consultant and an expert in Tableau. Having helped several organizations adopt Tableau for business intelligence, he brings a practical and results-oriented approach to teaching. Tristan's philosophy is to make data accessible and actionable for everyone, no matter their technical background. Who is it for? This book is ideal for Tableau users and data professionals looking to enhance their skills on Tableau 2019.2. If you're passionate about uncovering insights from data but need the right tools to communicate and collaborate effectively, this book is for you. It's suited for those with some prior experience in Tableau but also offers introductory content for newcomers. Whether you're a business analyst, data enthusiast, or BI professional, this guide will build solid foundations and sharpen your Tableau expertise.

Deep Learning for Search

2019-06-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Tommaso Teofili

AI/ML Data Science Java NLP TensorFlow data-engineering search

Deep Learning for Search teaches you how to improve the effectiveness of your search by implementing neural network-based techniques. By the time you're finished with the book, you'll be ready to build amazing search engines that deliver the results your users need and that get better as time goes on! About the Technology Deep learning handles the toughest search challenges, including imprecise search terms, badly indexed data, and retrieving images with minimal metadata. And with modern tools like DL4J and TensorFlow, you can apply powerful DL techniques without a deep background in data science or natural language processing (NLP). This book will show you how. About the Book Deep Learning for Search teaches you to improve your search results with neural networks. You’ll review how DL relates to search basics like indexing and ranking. Then, you’ll walk through in-depth examples to upgrade your search with DL techniques using Apache Lucene and Deeplearning4j. As the book progresses, you’ll explore advanced topics like searching through images, translating user queries, and designing search engines that improve as they learn! What's Inside Accurate and relevant rankings Searching across languages Content-based image search Search with recommendations About the Reader For developers comfortable with Java or a similar language and search basics. No experience with deep learning or NLP needed. About the Author Tommaso Teofili is a software engineer with a passion for open source and machine learning. As a member of the Apache Software Foundation, he contributes to a number of open source projects, ranging from topics like information retrieval (such as Lucene and Solr) to natural language processing and machine translation (including OpenNLP, Joshua, and UIMA). He currently works at Adobe, developing search and indexing infrastructure components, and researching the areas of natural language processing, information retrieval, and deep learning. He has presented search and machine learning talks at conferences including BerlinBuzzwords, International Conference on Computational Science, ApacheCon, EclipseCon, and others. You can find him on Twitter at @tteofili. Quotes A practical approach that shows you the state of the art in using neural networks, AI, and deep learning in the development of search engines. - From the Foreword by Chris Mattmann, NASA JPL A thorough and thoughtful synthesis of traditional search and the latest advancements in deep learning. - Greg Zanotti, Marquette Partners A well-laid-out deep dive into the latest technologies that will take your search engine to the next level. - Andrew Wyllie, Thynk Health Hands-on exercises teach you how to master deep learning for search-based products. - Antonio Magnaghi, System1

IBM Storage Solutions for Blockchain Platform Version 1.2

2019-06-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

Blockchain Cloud Computing IBM data-engineering ibm-tivoli

This Blueprint is intended to define the infrastructure that is required for a blockchain remote peer and to facilitate the deployment of IBM Blockchain Platform on IBM Cloud Private using that infrastructure. This infrastructure includes the necessary document handler components, such as IBM Blockchain Document Store, and covers the required storage for on-chain and off-chain blockchain data. To complete these tasks, you must have a basic understanding of each of the used components or have access the correct educational material to gain that knowledge.

talk-data.com

Activity Trend

Top Events

Top Speakers

Hands-On Web Scraping with Python

IBM Personal Communications and IBM z/OS TTLS Enablement: Technical Enablement Series

Data Science Strategy For Dummies

Bayesian Statistics the Fun Way

IBM FlashSystem 900 Model AE3 Product Guide

Implementing the IBM System Storage SAN Volume Controller with IBM Spectrum Virtualize V8.2.1

Global Contract Logistics

IBM Hybrid Solution for Scalable Data Solutions using IBM Spectrum Scale

Big Data Simplified

Associations and Correlations

Multicloud Storage as a Service using vRealize Automation and IBM Spectrum Storage

Data Engineers vs. Data Scientists

Probability and Statistics for Computer Scientists, 3rd Edition

R Cookbook, 2nd Edition

Streaming Data

The Care and Feeding of Data Scientists

Digital Processing of Random Oscillations

Getting Started with Tableau 2019.2 - Second Edition

Deep Learning for Search

IBM Storage Solutions for Blockchain Platform Version 1.2