talk-data.com

Topic

CSV

Comma-Separated Values (CSV)

tabular_data text_based human_readable

4 tagged

Activity Trend: peak of 8 per quarter, 2020-Q1 to 2026-Q1

Activities

Showing filtered results

Filtering by: O'Reilly Data Engineering Books
Learning Spark, 2nd Edition

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matter. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:
Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow

Apache Spark 2.x for Java Developers

Delve into mastering big data processing with 'Apache Spark 2.x for Java Developers.' This book provides a practical guide to implementing Apache Spark using the Java APIs, offering a unique opportunity for Java developers to leverage Spark's powerful framework without transitioning to Scala.
What this book will help me do
Learn how to process data from formats like XML, JSON, and CSV using Spark Core.
Implement real-time analytics using Spark Streaming and third-party tools like Kafka.
Understand data querying with Spark SQL and master SQL schema processing.
Apply machine learning techniques with Spark MLlib to real-world scenarios.
Explore graph processing and analytics using Spark GraphX.
Author(s)
Kumar and Gulati, experienced professionals in Java development and big data, bring their wealth of practical experience and passion for teaching to this book. With a clear and concise writing style, they aim to simplify Spark for Java developers, making big data approachable.
Who is it for?
This book is perfect for Java developers who are eager to expand their skillset into big data processing with Apache Spark. Whether you are a seasoned Spark user or first diving into big data concepts, this book meets you at your level. With practical examples and straightforward explanations, you can unlock the potential of Spark in real-world scenarios.

SQL Server 2012 Data Integration Recipes: Solutions for Integration Services and Other ETL Tools

SQL Server 2012 Data Integration Recipes provides focused and practical solutions to real world problems of data integration. Need to import data into SQL Server from an outside source? Need to export data and send it to another system? SQL Server 2012 Data Integration Recipes has your back. You'll find solutions for importing from Microsoft Office data stores such as Excel and Access, from text files such as CSV files, from XML, from other database brands such as Oracle and MySQL, and even from other SQL Server databases. You'll learn techniques for managing metadata, transforming data to meet the needs of the target system, handling exceptions and errors, and much more. What DBA or developer isn't faced with the need to move data back and forth?
Author Adam Aspin brings 10 years of extensive ETL experience involving SQL Server, and especially satellite products such as Data Transformation Services and SQL Server Integration Services. Extensive coverage is given to Integration Services, Microsoft's flagship tool for data integration in SQL Server environments. Coverage is also given to the broader range of tools such as OPENDATASOURCE, linked servers, OPENROWSET, Migration Assistant for Access, BCP Import, and BULK INSERT just to name a few. If you're looking for a resource to cover data integration and ETL across the gamut of Microsoft's SQL Server toolset, SQL Server 2012 Data Integration Recipes is the one book that will meet your needs.
Provides practical and proven solutions towards creating resilient ETL environments
Clearly answers the tough questions which professionals ask
Goes beyond the tools to a thorough discussion of the underlying techniques
Covers the gamut of data integration, beyond just SSIS
Includes example databases and files to allow readers to test the recipes
What you'll learn
Import and export to and from CSV files, XML files, and other text-based sources.
Move data between SQL databases, including SQL Server and others such as Oracle Database and MySQL.
Discover and manage metadata held in various database systems.
Remove duplicates and consolidate from multiple sources.
Transform data to meet the needs of target systems.
Profile source data as part of the discovery process.
Log and manage errors and exceptions during an ETL process.
Improve efficiency by detecting and processing only changed data.
Who this book is for
SQL Server 2012 Data Integration Recipes is written for developers wishing to find fast and reliable solutions for importing and exporting to and from SQL Server. The book appeals to DBAs as well, who are often tasked with implementing ETL processes. Developers and DBAs moving to SQL Server from other platforms will find the succinct, example-based approach ideal for quickly applying their general ETL knowledge to the specific tools provided as part of a SQL Server environment.

Using XML with Legacy Business Applications

"This volume offers relentlessly pragmatic solutions to help your business applications get the most out of XML, with a breezy style that makes the going easy. Mike has lived this stuff; he has a strong command of the solutions and the philosophy that underlies them." --Eve Maler, XML Standards Architect, Sun Microsystems
Businesses running legacy applications that do not support XML can face a tough choice: Either keep their legacy applications or switch to newer, XML-enhanced applications. XML presents both challenges and opportunities for organizations as they struggle with their data. Does this dilemma sound familiar? What if you could enable a legacy application to support XML? You can. In Using XML with Legacy Business Applications, e-commerce expert Michael C. Rawlins outlines usable techniques for solving day-to-day XML-related data exchange problems. Using an easy-to-understand cookbook approach, Rawlins shows you how to build XML support into legacy business applications using Java and C++. The techniques are illustrated by building converters for legacy formats. Converting CSV files, flat files, and X12 EDI to and from XML will never be easier!
Inside you'll find:
A concise tutorial for learning to read W3C XML schemas
An introduction to using XSLT to transform between different XML formats
Simple, pragmatic advice on transporting XML documents securely over the Internet
For developers working with either MSXML with Visual C++ or Java and Xerces:
See Chapter 3 for a step-by-step guide to enabling existing business applications to export XML documents
See Chapter 2 for a step-by-step guide to enabling existing business applications to import XML documents
See Chapter 5 for code examples and tips for validating XML documents against schemas
See Chapter 12 for general tips on building commerce support into an application
For end users who need a simple and robust conversion utility:
See Chapter 7 for converting CSV files to and from XML
See Chapter 8 for converting flat files to and from XML
See Chapter 9 for converting X12 EDI to and from XML
See Chapter 11 for tips on how to use these techniques together for complex format conversions
The resource-filled companion Web site (www.rawlinsecconsulting.com/booksupplement) includes executable versions of the utilities described in the book, full source code in C++ and Java, XSLT stylesheets, bug fixes, sample input and output files, and more.