talk-data.com

Topic

GitHub

version_control collaboration code_hosting

31

tagged

Activity Trend: peak of 79 per quarter, 2020-Q1 to 2026-Q1

Activities

31 activities · Newest first

Generative AI on Microsoft Azure

Companies are now moving generative AI projects from the lab to production environments. To support these increasingly sophisticated applications, they're turning to advanced practices such as multiagent architectures and complex code-based frameworks. This practical handbook shows you how to leverage cutting-edge techniques using Microsoft's powerful ecosystem of tools to deploy trustworthy AI systems tailored to your organization's needs. Written for and by AI professionals, Generative AI on Microsoft Azure goes beyond the technical core aspects, examining underlying principles, tools, and practices in depth, from the art of prompt engineering to strategies for fine-tuning models to advanced techniques like retrieval-augmented generation (RAG) and agentic AI. Through real-world case studies and insights from top experts, you'll learn how to harness AI's full potential on Azure, paving the way for groundbreaking solutions and sustainable success in today's AI-driven landscape.
• Understand the technical foundations of generative AI and how the technology has evolved over the last few years
• Implement advanced GenAI applications using Microsoft services like Azure AI Foundry, Copilot, GitHub Models, Azure Databricks, and Snowflake on Azure
• Leverage patterns, tools, frameworks, and platforms to customize AI projects
• Manage, govern, and secure your AI-enabled systems with responsible AI practices
• Build upon expert guidance to avoid common pitfalls, future-proof your applications, and more

Hands-On Software Engineering with Python - Second Edition

Grow your software engineering discipline by incorporating and mastering best-practice examples for design, development, testing, and deployment in a realistic Python project structure.
Key Features
• Understand what makes Software Engineering a discipline, distinct from basic programming
• Gain practical insight into updating, refactoring, and scaling an existing Python system
• Implement robust testing, CI/CD pipelines, and cloud-ready architecture decisions
Book Description
Software engineering is more than coding; it’s the strategic design and continuous improvement of systems that serve real-world needs. This newly updated second edition of Hands-On Software Engineering with Python expands on its foundational approach to help you grow into a senior or staff-level engineering role. Fully revised for today’s Python ecosystem, this edition includes updated tooling, practices, and architectural patterns. You’ll explore key changes across five minor Python versions, examine new features like dataclasses and type hinting, and evaluate modern tools such as Poetry, pytest, and GitHub Actions. A new chapter introduces high-performance computing in Python, and the entire development process is enhanced with cloud-readiness in mind. You’ll follow a complete redesign and refactor of a multi-tier system from the first edition, gaining insight into how software evolves, and what it takes to do that responsibly. From system modeling and SDLC phases to data persistence, testing, and CI/CD automation, each chapter builds your engineering mindset while updating your hands-on skills. By the end of this book, you'll have mastered modern Python software engineering practices and be equipped to revise and future-proof complex systems with confidence.
What you will learn
• Distinguish software engineering from general programming
• Break down and apply each phase of the SDLC to Python systems
• Create system models to plan architecture before writing code
• Apply Agile, Scrum, and other modern development methodologies
• Use dataclasses, pydantic, and schemas for robust data modeling
• Set up CI/CD pipelines with GitHub Actions and cloud build tools
• Write and structure unit, integration, and end-to-end tests
• Evaluate and integrate tools like Poetry, pytest, and Docker
Who this book is for
This book is for Python developers with a basic grasp of software development who want to grow into senior or staff-level engineering roles. It’s ideal for professionals looking to deepen their understanding of software architecture, system modeling, testing strategies, and cloud-aware development. Familiarity with core Python programming is required, as the book focuses on applying engineering principles to maintain, extend, and modernize real-world systems.

Generative AI for Software Developers

Master Generative AI in software development with hands-on guidance, from coding and debugging to testing and deployment, using GitHub Copilot, Amazon Q Developer, and OpenAI APIs to build scalable, AI-powered applications.
Key Features
• Hands-on guidance for mastering AI-powered coding, debugging, and deployment with real-world examples
• Comprehensive coverage of GenAI concepts, prompt engineering, fine-tuning, and SDLC integration
• Practical strategies for architecting and scaling production-ready AI-driven applications
Book Description
Generative AI for Software Developers is your practical guide to mastering AI-powered development and staying ahead in a fast-changing industry. Through a structured, hands-on approach, this book helps you understand, implement, and optimize Generative AI in modern software engineering. From AI-assisted coding, debugging, and documentation to testing, deployment, and system design, it equips you with the skills to integrate AI seamlessly into your workflows. You’ll work with tools such as GitHub Copilot, Amazon Q Developer, and OpenAI APIs while learning strategies for prompt engineering, fine-tuning, and building scalable AI-powered applications. Featuring real-world use cases, best practices, and expert insights, this book bridges the gap between experimenting with AI and production deployment. Whether you’re an aspiring AI developer, experienced engineer, or solutions architect, this guide gives you the clarity, confidence, and tactical knowledge to thrive in the GenAI-driven future of software development. Armed with these insights, you’ll be ready to build, integrate, and scale intelligent solutions that enhance every stage of the software development lifecycle.
What you will learn
• Build a secure GenAI application with expert guidance
• Understand the fundamentals of GenAI and its applications in software engineering
• Automate coding tasks with tools like GitHub Copilot, Amazon Q Developer, and OpenAI APIs
• Apply AI for debugging, testing, documentation, and deployment workflows
• Get to grips with prompt engineering and fine-tuning techniques to optimize AI outputs
• Implement best practices for architecting and scaling AI-powered applications
• Build end-to-end GenAI projects, moving from experimentation to production
Who this book is for
This book is for software developers, engineers, architects, and tech professionals who want to understand the core concepts of Generative AI and its real-world applications, master AI-driven development workflows to improve efficiency and code quality, and leverage tools like GitHub Copilot, Amazon Q Developer, and OpenAI APIs to automate coding tasks.

Building Integrations with MuleSoft

This concise yet comprehensive guide shows developers and architects how to tackle data integration challenges with MuleSoft. Authors Pooja Kamath and Diane Kesler take you step-by-step through the process necessary to build robust and scalable integration solutions. Supported by real-world use cases, Building Integrations with MuleSoft teaches you to identify and resolve performance bottlenecks, handle errors, and ensure the reliability and scalability of your integration solutions. You'll explore MuleSoft's robust set of connectors and their components, and use them to connect to systems and applications from legacy databases to cloud services.
• Ask the right questions to determine your use case, define requirements, decide on reuse versus rebuild, and create sequence and context diagrams
• Master tools like the Anypoint Platform, Anypoint Studio, Code Builder, GitHub, and Maven
• Design APIs with RAML and OAS and craft effective requests and responses
• Write MUnit tests, validate DataWeave expressions, and use Postman Collections
• Deploy Mule applications to CloudHub, use API Manager to create API proxies, and secure APIs with Mule OAuth 2.0
• Learn message orchestration techniques for routers, transactions, error handling, For Each, Parallel For Each, and batch processing

Modern Business Analytics

Deriving business value from analytics is a challenging process. Turning data into information requires a business analyst who is adept at multiple technologies including databases, programming tools, and commercial analytics tools. This practical guide shows programmers who understand analysis concepts how to build the skills necessary to achieve business value. Author Deanne Larson, data science practitioner and academic, helps you bridge the technical and business worlds to meet these requirements. You'll focus on developing these skills with R and Python using real-world examples. You'll also learn how to leverage methodologies for successful delivery. Learning methodology combined with open source tools is key to delivering successful business analytics and value. This book shows you how to:
• Apply business analytics methodologies to achieve successful results
• Cleanse and transform data using R and Python
• Use R and Python to complete exploratory data analysis
• Create predictive models to solve business problems in R and Python
• Use Python, R, and business analytics tools to handle large volumes of data
• Commit code to GitHub to collaborate with data engineers and data scientists
• Measure success in business analytics

The Definitive Guide to KQL: Using Kusto Query Language for operations, defending, and threat hunting

Turn the avalanche of raw data from Azure Data Explorer, Azure Monitor, Microsoft Sentinel, and other Microsoft data platforms into actionable intelligence with KQL (Kusto Query Language). Experts in information security and analysis guide you through what it takes to automate your approach to risk assessment and remediation, speeding up detection time while reducing manual work using KQL. This accessible and practical guide, designed for a broad range of people with varying experience in KQL, will quickly make KQL second nature for information security. Solve real problems with Kusto Query Language and build your competitive advantage:
• Learn the fundamentals of KQL: what it is and where it is used
• Examine the anatomy of a KQL query
• Understand why data summation and aggregation are important
• See examples of data summation, including count, countif, and dcount
• Learn the benefits of moving from raw data ingestion to a more automated approach for security operations
• Unlock how to write efficient and effective queries
• Work with advanced KQL operators, advanced data strings, and multivalued strings
• Explore KQL for day-to-day admin tasks, performance, and troubleshooting
• Use KQL across Azure, including app services and function apps
• Delve into defending and threat hunting using KQL
• Recognize indicators of compromise and anomaly detection
• Learn to access and contribute to hunting queries via GitHub and workbooks via Microsoft Entra ID

AI-Assisted Programming

Get practical advice on how to leverage AI development tools for all stages of code creation, including requirements, planning, design, coding, debugging, testing, and documentation. With this book, beginners and experienced developers alike will learn how to use a wide range of tools, from general-purpose LLMs (ChatGPT, Gemini, and Claude) to code-specific systems (GitHub Copilot, Tabnine, Cursor, and Amazon CodeWhisperer). You'll also learn about more specialized generative AI tools for tasks such as text-to-image creation. Author Tom Taulli provides a methodology for modular programming that aligns effectively with the way prompts create AI-generated code. This guide also describes the best ways of using general-purpose LLMs to learn a programming language, explain code, or convert code from one language to another. This book examines:
• The core capabilities of AI-based development tools
• Pros, cons, and use cases of popular systems such as GitHub Copilot and Amazon CodeWhisperer
• Ways to use ChatGPT, Gemini, Claude, and other generic LLMs for coding
• Using AI development tools for the software development lifecycle, including requirements, planning, coding, debugging, and testing
• Prompt engineering for development
• Using AI-assisted programming for tedious tasks like creating regular expressions, starter code, object-oriented programming classes, and GitHub Actions
• How to use AI-based low-code and no-code tools, for example to create professional UIs

The Complete Developer

Whether you’ve been in the developer kitchen for decades or are just taking the plunge to do it yourself, The Complete Developer will show you how to build and implement every component of a modern stack, from scratch. You’ll go from a React-driven frontend to a fully fleshed-out backend with Mongoose, MongoDB, and a complete set of REST and GraphQL APIs, and back again through the whole Next.js stack. The book’s easy-to-follow, step-by-step recipes will teach you how to build a web server with Express.js, create custom API routes, deploy applications via self-contained microservices, and add a reactive, component-based UI. You’ll leverage command line tools and full-stack frameworks to build an application whose no-effort user management rides on GitHub logins. You’ll also learn how to:
• Work with modern JavaScript syntax, TypeScript, and the Next.js framework
• Simplify UI development with the React library
• Extend your application with REST and GraphQL APIs
• Manage your data with the MongoDB NoSQL database
• Use OAuth to simplify user management, authentication, and authorization
• Automate testing with Jest, test-driven development, stubs, mocks, and fakes
Whether you’re an experienced software engineer or new to DIY web development, The Complete Developer will teach you to succeed with the modern full stack. After all, control matters.
Covers: Docker, Express.js, JavaScript, Jest, MongoDB, Mongoose, Next.js, Node.js, OAuth, React, REST and GraphQL APIs, and TypeScript

Low-Code AI

Take a data-first and use-case-driven approach with Low-Code AI to understand machine learning and deep learning concepts. This hands-on guide presents three problem-focused ways to learn: no-code ML using AutoML, low-code using BigQuery ML, and custom code using scikit-learn and Keras. In each case, you'll learn key ML concepts by using real-world datasets with realistic problems. Business and data analysts get a project-based introduction to ML/AI using a detailed, data-driven approach: loading and analyzing data; feeding data into an ML model; building, training, and testing; and deploying the model into production. Authors Michael Abel and Gwendolyn Stripling show you how to build machine learning models for retail, healthcare, financial services, energy, and telecommunications. You'll learn how to:
• Distinguish between structured and unstructured data and the challenges they present
• Visualize and analyze data
• Preprocess data for input into a machine learning model
• Differentiate between the regression and classification supervised learning models
• Compare different ML model types and architectures, from no code to low code to custom training
• Design, implement, and tune ML models
• Export data to a GitHub repository for data management and governance

M-statistics

A comprehensive resource providing new statistical methodologies and demonstrating how new approaches work for applications. M-statistics introduces a new approach to statistical inference, redesigning the fundamentals of statistics and improving on the classical methods we already use. This book targets exact optimal statistical inference for a small sample under one methodological umbrella. Two competing approaches are offered, maximum concentration (MC) and mode (MO) statistics, hence the symbolic equation M = MC + MO. M-statistics defines an estimator as the limit point of the MC or MO exact optimal confidence interval when the confidence level approaches zero, yielding the MC and MO estimators, respectively. Neither mean nor variance plays a role in M-statistics theory. Novel statistical methodologies in the form of double-sided unbiased and short confidence intervals and tests apply to major statistical parameters:
• Exact statistical inference for small sample sizes is illustrated with effect size and coefficient of variation, the rate parameter of the Pareto distribution, two-sample statistical inference for normal variance, and the rate of exponential distributions.
• M-statistics is illustrated with discrete, binomial, and Poisson distributions. Novel estimators eliminate paradoxes with the classic unbiased estimators when the outcome is zero.
• Exact optimal statistical inference applies to correlation analysis including Pearson correlation, squared correlation coefficient, and coefficient of determination. New MC and MO estimators along with optimal statistical tests, accompanied by respective power functions, are developed.
• M-statistics is extended to the multidimensional parameter and illustrated with the simultaneous statistical inference for the mean and standard deviation, shape parameters of the beta distribution, the two-sample binomial distribution, and finally, nonlinear regression.
Our new developments are accompanied by respective algorithms and R code, available on GitHub, and as such readily available for applications. M-statistics is suitable for professionals and students alike. It is highly useful for theoretical statisticians and teachers, researchers, and data science analysts as an alternative to classical and approximate statistical inference.

R Packages, 2nd Edition

Turn your R code into packages that others can easily install and use. With this fully updated edition, developers and data scientists will learn how to bundle reusable R functions, sample data, and documentation together by applying the package development philosophy used by the team that maintains the "tidyverse" suite of packages. In the process, you'll learn how to automate common development tasks using a set of R packages, including devtools, usethis, testthat, and roxygen2. Authors Hadley Wickham and Jennifer Bryan from Posit (formerly known as RStudio) help you create packages quickly, then teach you how to get better over time. You'll be able to focus on what you want your package to do as you progressively develop greater mastery of the structure of a package. With this book, you will:
• Learn the key components of an R package, including code, documentation, and tests
• Streamline your development process with devtools and the RStudio IDE
• Get tips on effective habits such as organizing functions into files
• Get caught up on important new features in the devtools ecosystem
• Learn about the art and science of unit testing, using features in the third edition of testthat
• Turn your existing documentation into a beautiful and user-friendly website with pkgdown
• Gain an appreciation of the benefits of modern code hosting platforms, such as GitHub

Machine Learning for High-Risk Applications

The past decade has witnessed the broad adoption of artificial intelligence and machine learning (AI/ML) technologies. However, a lack of oversight in their widespread implementation has resulted in some incidents and harmful outcomes that could have been avoided with proper risk management. Before we can realize AI/ML's true benefit, practitioners must understand how to mitigate its risks. This book describes approaches to responsible AI, a holistic framework for improving AI/ML technology, business processes, and cultural competencies that builds on best practices in risk management, cybersecurity, data privacy, and applied social science. Authors Patrick Hall, James Curtis, and Parul Pandey created this guide for data scientists who want to improve real-world AI/ML system outcomes for organizations, consumers, and the public.
• Learn technical approaches for responsible AI across explainability, model validation and debugging, bias management, data privacy, and ML security
• Learn how to create a successful and impactful AI risk management practice
• Get a basic guide to existing standards, laws, and assessments for adopting AI technologies, including the new NIST AI Risk Management Framework
• Engage with interactive resources on GitHub and Colab

R 4 Data Science Quick Reference: A Pocket Guide to APIs, Libraries, and Packages

In this handy, quick reference book you'll be introduced to several R data science packages, with examples of how to use each of them. All concepts will be covered concisely, with many illustrative examples using the following APIs: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more. With R 4 Data Science Quick Reference, you'll have the code, APIs, and insights to write data science-based applications in the R programming language. You'll also be able to carry out data analysis. All source code used in the book is freely available on GitHub.
What You'll Learn
• Implement applicable R 4 programming language specification features
• Import data with readr
• Work with categories using forcats, time and dates with lubridate, and strings with stringr
• Format data using tidyr and then transform that data using magrittr and dplyr
• Write functions with R for data science, data mining, and analytics-based applications
• Visualize data with ggplot2 and fit data to models using modelr
Who This Book Is For
Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.

Python for Data Analysis, 3rd Edition

Get the definitive handbook for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.10 and pandas 1.4, the third edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You'll learn the latest versions of pandas, NumPy, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.
• Use the Jupyter notebook and IPython shell for exploratory computing
• Learn basic and advanced features in NumPy
• Get started with data analysis tools in the pandas library
• Use flexible tools to load, clean, transform, merge, and reshape data
• Create informative visualizations with matplotlib
• Apply the pandas groupby facility to slice, dice, and summarize datasets
• Analyze and manipulate regular and irregular time series data
• Learn how to solve real-world data analysis problems with thorough, detailed examples

Learn dbatools in a Month of Lunches

If you work with SQL Server, dbatools is a lifesaver. This book will show you how to use this free and open source PowerShell module to automate just about every SQL Server task you can imagine, all in just one month! In Learn dbatools in a Month of Lunches you will learn how to:
• Perform instance-to-instance and customized migrations
• Automate security audits, tempdb configuration, alerting, and reporting
• Schedule and monitor PowerShell tasks in SQL Server Agent
• Bulk-import any type of data into SQL Server
• Install dbatools in secure environments
Written by a group of expert authors including dbatools creator Chrissy LeMaire, Learn dbatools in a Month of Lunches teaches you techniques that will make you more effective, and more efficient, than you ever thought possible. In twenty-eight lunchbreak lessons, you’ll learn the most important use cases of dbatools and the favorite functions of its core developers. Stabilize and standardize your SQL Server environment, and simplify your tasks by building automation, alerting, and reporting with this powerful tool.
About the Technology
For SQL Server DBAs, automation is the key to efficiency. Using the open-source dbatools PowerShell module, you can easily execute tasks on thousands of database servers at once, all from the command line. dbatools gives you over 500 pre-built commands, with countless new options for managing SQL Server at scale. There’s nothing else like it.
About the Book
Learn dbatools in a Month of Lunches teaches you how to automate SQL Server using the dbatools PowerShell module. Each 30-minute lesson introduces a new automation that will make your daily duties easier. Following the expert advice of dbatools creator Chrissy LeMaire and other top community contributors, you’ll learn to script everything from backups to disaster recovery.
What's Inside
• Performing instance-to-instance and customized migrations
• Automating security audits, best practices, and standardized configurations
• Administering SQL Server Agent including running PowerShell scripts effectively
• Bulk-importing many types of data into SQL Server
• Executing advanced tasks and increasing efficiency for everyday administration
About the Reader
For DBAs, accidental DBAs, and systems engineers who manage SQL Server.
About the Authors
Chrissy LeMaire is a GitHub Star and the creator of dbatools. Rob Sewell is a data engineer and a passionate automator. Jess Pomfret and Cláudio Silva are data platform architects. All are Microsoft MVPs.
Quotes
All SQL Server professionals should learn dbatools. With its combination of knowledge transfer, anecdotes, and hands-on labs, this book is the perfect way. - From the Foreword by Anna Hoffman, Databases Product Management, Microsoft
Excellent guide for dbatools with lots of practical tips! Required reading for anyone interested in dbatools. - Ruben Vandeginste, PeopleWare
A must-have for any SQL server developer. - Raushan Kumar Jha, Microsoft
If you want to automate all vital aspects of SQL Server, wait no more! Learn dbatools in a month, with guidance from the best minds in the business. - Ranjit Sahai, RAM Consulting

Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you the best way to develop new software packages for R. Beginning Data Science in R 4, Second Edition details how data science is a combination of statistics, computational science, and machine learning. You’ll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this. Modern data analysis requires computational skills and usually a minimum of programming. After reading and using this book, you'll have what you need to get started with R programming with data science applications. Source code to support your next projects is available at github.com/Apress/beg-data-science-r4.
What You Will Learn
• Perform data science and analytics using statistics and the R programming language
• Visualize and explore data, including working with large data sets found in big data
• Build an R package
• Test and check your code
• Practice version control
• Profile and optimize your code
Who This Book Is For
Those with some data science or analytics background, but not necessarily experience with the R programming language.

Deep Learning

Ever since computers began beating us at chess, they've been getting better at a wide range of human activities, from writing songs and generating news articles to helping doctors provide healthcare. Deep learning is the source of many of these breakthroughs, and its remarkable ability to find patterns hiding in data has made it the fastest growing field in artificial intelligence (AI). Digital assistants on our phones use deep learning to understand and respond intelligently to voice commands; automotive systems use it to safely navigate road hazards; online platforms use it to deliver personalized suggestions for movies and books – the possibilities are endless. Deep Learning: A Visual Approach is for anyone who wants to understand this fascinating field in depth, but without any of the advanced math and programming usually required to grasp its internals. If you want to know how these tools work, and use them yourself, the answers are all within these pages. And, if you’re ready to write your own programs, there are also plenty of supplemental Python notebooks in the accompanying GitHub repository to get you going. The book’s conversational style, extensive color illustrations, illuminating analogies, and real-world examples expertly explain the key concepts in deep learning, including:
• How text generators create novel stories and articles
• How deep learning systems learn to play and win at human games
• How image classification systems identify objects or people in a photo
• How to think about probabilities in a way that’s useful to everyday life
• How to use the machine learning techniques that form the core of modern AI
Intellectual adventurers of all kinds can use the powerful ideas covered in Deep Learning: A Visual Approach to build intelligent systems that help us better understand the world and everyone who lives in it. It’s the future of AI, and this book allows you to fully envision it.

Hands-On Data Visualization

Tell your story and show it with data, using free and easy-to-learn tools on the web. This introductory book teaches you how to design interactive charts and customized maps for your website, beginning with simple drag-and-drop tools such as Google Sheets, Datawrapper, and Tableau Public. You'll also gradually learn how to edit open source code templates like Chart.js, Highcharts, and Leaflet on GitHub. Hands-On Data Visualization takes you step-by-step through tutorials, real-world examples, and online resources. This practical guide is ideal for students, nonprofit organizations, small business owners, local governments, journalists, academics, and anyone who wants to take data out of spreadsheets and turn it into lively interactive stories. No coding experience is required.
• Build interactive charts and maps and embed them in your website
• Understand the principles for designing effective charts and maps
• Learn key data visualization concepts to help you choose the right tools
• Convert and transform tabular and spatial data to tell your data story
• Edit and host Chart.js, Highcharts, and Leaflet map code templates on GitHub
• Learn how to detect bias in charts and maps produced by others

Advanced R 4 Data Programming and the Cloud: Using PostgreSQL, AWS, and Shiny

Program for data analysis using R and learn practical skills to make your work more efficient. This revised book explores how to automate running code and the creation of reports to share your results, as well as writing functions and packages. It includes key R 4 features such as a new color palette for charts, an enhanced reference counting system, and normalization of matrix and array types where matrix objects now formally inherit from the array class, eliminating inconsistencies. Advanced R 4 Data Programming and the Cloud is not designed to teach advanced R programming nor to teach the theory behind statistical procedures. Rather, it is designed to be a practical guide moving beyond merely using R; it shows you how to program in R to automate tasks. This book will teach you how to manipulate data in modern R structures and includes connecting R to databases such as PostgreSQL, cloud services such as Amazon Web Services (AWS), and digital dashboards such as Shiny. Each chapter also includes a detailed bibliography with references to research articles and other resources that cover relevant conceptual and theoretical topics.
What You Will Learn
• Write and document R functions using R 4
• Make an R package and share it via GitHub or privately
• Add tests to R code to ensure it works as intended
• Use R to talk directly to databases and do complex data management
• Run R in the Amazon cloud
• Deploy a Shiny digital dashboard
• Generate presentation-ready tables and reports using R
Who This Book Is For
Working professionals, researchers, and students who are familiar with R and basic statistical techniques such as linear regression and who want to learn how to take their R coding and programming to the next level.

Spark in Action, Second Edition

The Spark distributed data processing platform provides an easy-to-implement tool for ingesting, streaming, and processing data from any source. In Spark in Action, Second Edition, you’ll learn to take advantage of Spark’s core features and incredible processing speed, with applications including real-time computation, delayed evaluation, and machine learning. Spark skills are a hot commodity in enterprises worldwide, and with Spark’s powerful and flexible Java APIs, you can reap all the benefits without first learning Scala or Hadoop.
About the Technology
Analyzing enterprise data starts by reading, filtering, and merging files and streams from many sources. The Spark data processing engine handles this varied volume like a champ, delivering speeds 100 times faster than Hadoop systems. Thanks to SQL support, an intuitive interface, and a straightforward multilanguage API, you can use Spark without learning a complex new ecosystem.
About the Book
Spark in Action, Second Edition, teaches you to create end-to-end analytics applications. In this entirely new book, you’ll learn from interesting Java-based examples, including a complete data pipeline for processing NASA satellite data. And you’ll discover Java, Python, and Scala code samples hosted on GitHub that you can explore and adapt, plus appendixes that give you a cheat sheet for installing tools and understanding Spark-specific terms.
What's Inside
• Writing Spark applications in Java
• Spark application architecture
• Ingestion through files, databases, streaming, and Elasticsearch
• Querying distributed datasets with Spark SQL
About the Reader
This book does not assume previous experience with Spark, Scala, or Hadoop.
About the Author
Jean-Georges Perrin is an experienced data and software architect. He is France’s first IBM Champion and has been honored for 12 consecutive years.
Quotes
This book reveals the tools and secrets you need to drive innovation in your company or community. - Rob Thomas, IBM
An indispensable, well-paced, and in-depth guide. A must-have for anyone into big data and real-time stream processing. - Anupam Sengupta, GuardHat Inc.
This book will help spark a love affair with distributed processing. - Conor Redmond, InComm Product Control
Currently the best book on the subject! - Markus Breuer, Materna IPS