talk-data.com

Topic

Scala

programming_language functional_programming jvm

110 tagged

Activity Trend: 12 peak/qtr, 2020-Q1 to 2026-Q1

Activities

110 activities · Newest first

Spark GraphX in Action

Spark GraphX in Action starts out with an overview of Apache Spark and the GraphX graph processing API. This example-based tutorial then teaches you how to configure GraphX and how to use it interactively. Along the way, you'll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.

About the Technology: GraphX is a powerful graph processing API for the Apache Spark analytics engine that lets you draw insights from large datasets. GraphX gives you unprecedented speed and capacity for running massively parallel and machine learning algorithms.

About the Book: Spark GraphX in Action begins with the big picture of what graphs can be used for. This example-based tutorial teaches you how to use GraphX interactively. You'll start with a crystal-clear introduction to building big data graphs from regular data, and then explore the problems and possibilities of implementing graph algorithms and architecting graph processing pipelines. Along the way, you'll collect practical techniques for enhancing applications and applying machine learning algorithms to graph data.

What's Inside: Understanding graph technology; Using the GraphX API; Developing algorithms for big graphs; Machine learning with graphs; Graph visualization.

About the Reader: Readers should be comfortable writing code. Experience with Apache Spark and Scala is not required.

About the Authors: Michael Malak has worked on Spark applications for Fortune 500 companies since early 2013. Robin East has worked as a consultant to large organizations for over 15 years and is a data scientist at Worldpay.

Quotes:
Learn complex graph processing from two experienced authors…A comprehensive guide. - Gaurav Bhardwaj, 3Pillar Global
The best resource to go from GraphX novice to expert in the least amount of time. - Justin Fister, PaperRater
A must-read for anyone serious about large-scale graph data mining! - Antonio Magnaghi, OpenMail
Reveals the awesome and elegant capabilities of working with linked data for large-scale datasets. - Sumit Pal, Independent consultant

I'm joined this week by Jon Morra, director of data science at eHarmony, to discuss a variety of ways in which machine learning and data science are being applied to help connect people for successful long-term relationships. Interesting open source projects mentioned in the interview include Face-parts, a web service for detecting faces and extracting a robust set of fiducial markers (features) from the image, and Aloha, a Scala-based machine learning library. You can learn more about these and other interesting projects at the eHarmony GitHub page. In the wrap-up, Jon mentioned the LA Machine Learning meetup, which he runs. This is a great resource for LA residents, separate from and complementary to the datascience.la groups, so consider signing up for all of the above. I hope to see you there in the future.

Spark

Production-targeted Spark guidance with real-world use cases.

Spark: Big Data Cluster Computing in Production goes beyond general Spark overviews to provide targeted guidance toward using lightning-fast big-data clustering in production. Written by an expert team well-known in the big data community, this book walks you through the challenges in moving from proof-of-concept or demo Spark applications to live Spark in production. Real use cases provide deep insight into common problems, limitations, challenges, and opportunities, while expert tips and tricks help you get the most out of Spark performance. Coverage includes Spark SQL, Tachyon, Kerberos, MLlib, YARN, and Mesos, with clear, actionable guidance on resource scheduling, database connectors, streaming, security, and much more.

Spark has become the tool of choice for many Big Data problems, with more active contributors than any other Apache Software project. General introductory books abound, but this book is the first to provide deep insight and real-world advice on using Spark in production. Specific guidance, expert tips, and invaluable foresight make this guide an incredibly useful resource for real production settings.

Review Spark hardware requirements and estimate cluster size
Gain insight from real-world production use cases
Tighten security, schedule resources, and fine-tune performance
Overcome common problems encountered using Spark in production

Spark works with other big data tools including MapReduce and Hadoop, and uses languages you already know like Java, Scala, Python, and R. Lightning speed makes Spark too good to pass up, but understanding limitations and challenges in advance goes a long way toward easing actual production implementation. Spark: Big Data Cluster Computing in Production tells you everything you need to know, with real-world production insight and expert guidance, tips, and tricks.

Big Data Analytics with Spark: A Practitioner’s Guide to Using Spark for Large-Scale Data Processing, Machine Learning, and Graph Analytics, and High-Velocity Data Stream Processing

This book is a step-by-step guide for learning how to use Spark for different types of big-data analytics projects, including batch, interactive, graph, and stream data analysis as well as machine learning. It covers Spark core and its add-on libraries, including Spark SQL, Spark Streaming, GraphX, MLlib, and Spark ML. Big Data Analytics with Spark shows you how to use Spark and leverage its easy-to-use features to increase your productivity. You learn to perform fast data analysis using its in-memory caching and advanced execution engine, employ in-memory computing capabilities for building high-performance machine learning and low-latency interactive analytics applications, and much more. Moreover, the book shows you how to use Spark as a single integrated platform for a variety of data processing tasks, including ETL pipelines, BI, live data stream processing, graph analytics, and machine learning. The book also includes a chapter on Scala, the hottest functional programming language, and the language that underlies Spark. You’ll learn the basics of functional programming in Scala, so that you can write Spark applications in it. What's more, Big Data Analytics with Spark provides an introduction to other big data technologies that are commonly used along with Spark, such as HDFS, Avro, Parquet, Kafka, Cassandra, HBase, Mesos, and so on. It also provides an introduction to machine learning and graph concepts. So the book is self-sufficient; all the technologies that you need to know to use Spark are covered. The only thing that you are expected to have is some programming knowledge in any language.

Apache Spark Graph Processing

Dive into the world of large-scale graph data processing with Apache Spark's GraphX API. This book introduces you to the core concepts of graph analytics and teaches you how to leverage Spark for handling and analyzing massive graphs. From building to analyzing, you'll acquire a comprehensive skillset to work with graph data efficiently.

What this Book will help me do: Learn to utilize Apache Spark GraphX API to process and analyze graph data. Master transforming raw datasets into sophisticated graph structures. Explore visualization and analysis techniques for understanding graphs. Understand and build custom graph operations tailored to your needs. Implement advanced graph algorithms like clustering and iterative processing.

Author(s): Rindra Ramamonjison is a seasoned data engineer with vast experience in big data technologies and graph processing. With a passion for explaining complex concepts in simple terms, Rindra builds on his professional expertise to guide readers in mastering cutting-edge Spark tools.

Who is it for? This book is tailored for data scientists and software developers looking to delve into graph data processing at scale. Ideal for those with basic knowledge of Scala and Apache Spark, it equips readers with the tools and techniques to derive insights from complex network datasets. Whether you're diving deeper into big data or exploring graph-specific analytics, this book is your guide.

Neo4j Cookbook

Dive into Neo4j and uncover how to harness its powerful capabilities in graph data analysis with the Neo4j Cookbook. Across 75 well-structured recipes, you'll learn to apply practical techniques in modeling, querying, and visualizing graph databases, enabling you to address real-world challenges efficiently.

What this Book will help me do: Access Neo4j from popular programming languages such as Java, Python, and Scala, enabling easier integration into your projects. Migrate data seamlessly from various data stores, including SQL and NoSQL, into Neo4j, maintaining data consistency. Use best practices for data modeling with Neo4j to optimize performance and scalability for your applications. Analyze social data from sources like Facebook and Twitter, revealing valuable insights from connections and relationships. Integrate geospatial data to enable location-based queries and nearest-point searches, opening up advanced application features.

Author(s): Ankur Goel, the author of Neo4j Cookbook, is an experienced technologist with an extensive background in handling database solutions and applications. Passionate about simplifying complex systems, Ankur excels in teaching essential database concepts through clear and actionable recipes. His writing is rooted in practical insights, reflecting his hands-on experience in the industry.

Who is it for? This book is ideal for developers and data engineers who currently use or plan to integrate Neo4j into their workflows. If you are migrating from a traditional database system or delving into graph databases for the first time, this book offers structured guidance. Readers should have a fundamental understanding of programming and familiarity with database concepts for the best experience. It caters to individuals aiming to build or enhance data-driven applications using Neo4j's robust graph modeling.

Advanced Analytics with Spark

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

Learning Spark

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates.

Camel in Action

NEWER EDITION AVAILABLE: Camel in Action, Second Edition is now available. An eBook of this older edition is included at no additional cost when you buy the revised edition! A limited number of pBook copies of this edition are still available. Please contact Manning Support to inquire about purchasing previous edition copies.

Camel in Action is a Camel tutorial full of small examples showing how to work with the integration patterns. It starts with core concepts like sending, receiving, routing, and transforming data. It then shows you the entire lifecycle and goes in depth on how to test, deal with errors, scale, deploy, and even monitor your app: details you can find only in the Camel code itself. Written by the developers of Camel, this book distills their experience and practical insights so that you can tackle integration tasks like a pro.

About the Technology: Apache Camel is a Java framework that lets you implement the standard enterprise integration patterns in a few lines of code. With a concise but sophisticated DSL you snap integration logic into your app, Lego-style, using Java, XML, or Scala. Camel supports over 80 common transports such as HTTP, REST, JMS, and Web Services.

What's Inside: Valuable examples in Java and XML; Explanations of complex patterns; Error handling, testing, deploying, managing, and running Camel; Accessible to beginners, useful to experts.

About the Authors: Claus Ibsen is a principal engineer working for FuseSource specializing in the enterprise integration space. He has worked on Apache Camel for the last three years, where he is a PMC member, a key contributor, and heads the development and roadmap. Claus lives in Sweden near Malmo with his wife and dog. Jonathan Anstey is a software engineer with varied experience in manufacturing control systems, build infrastructure, and enterprise integration. Lately, Jon has been working on Apache Camel as a PMC member and an active committer while at FuseSource. When he is not hacking on Camel he likes to spend time with his wife and daughter in St. John's, Newfoundland.

Quotes:
I highly recommend this book. It kicks ass! - James Strachan, Cofounder of Apache Camel
Strikes the right balance between core concepts and running code. - Gregor Hohpe, Coauthor of Enterprise Integration Patterns
Comprehensive guide to enterprise integration with Camel. - Gordon Dickens, Chariot Solutions
A deep book... with great examples. - Jeroen Benckhuijsen, Atos Origin
Great content from the source developers. - Domingo Suarez Torres, SynergyJ
A must-have. - Tijs Rademakers, Atos Origin

Free is an interesting construct, judging by the amount of ink and, occasionally, blood spilled over it. It is, however, rather poorly understood in our community. It finally clicked for me recently, thanks to a most cryptic explanation: Free is just the defunctionalization of a monad in its most unpleasant configuration. Strangely, that unlocked everything. In this talk, I try to convey this intuition and show the techniques needed to invent Free.
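One way to read the defunctionalization idea is sketched below in plain Scala, with no libraries. It is an illustration under stated assumptions and not the material from the talk: the names `FreeSketch`, `KV`, `Put`, `Get`, `step`, and `run` are hypothetical, and the interpreter is deliberately simple rather than stack-safe. The monad operations `pure` and `flatMap` become data constructors, and a separate `run` function plays the role of the "apply" step that gives them meaning again.

```scala
object FreeSketch {
  // The monad operations reified as data: building a program only
  // allocates these nodes, nothing is executed yet.
  sealed trait Free[F[_], A] {
    def flatMap[B](f: A => Free[F, B]): Free[F, B] = FlatMap[F, A, B](this, f)
    def map[B](f: A => B): Free[F, B] = flatMap(a => Pure[F, B](f(a)))
  }
  final case class Pure[F[_], A](value: A) extends Free[F, A]
  final case class Suspend[F[_], A](instruction: F[A]) extends Free[F, A]
  final case class FlatMap[F[_], A, B](source: Free[F, A], next: A => Free[F, B])
      extends Free[F, B]

  // A hypothetical instruction set: a tiny key-value store.
  sealed trait KV[A]
  final case class Put(key: String, value: Int) extends KV[Unit]
  final case class Get(key: String) extends KV[Option[Int]]

  type Program[A] = Free[KV, A]
  def put(key: String, value: Int): Program[Unit] = Suspend[KV, Unit](Put(key, value))
  def get(key: String): Program[Option[Int]] = Suspend[KV, Option[Int]](Get(key))

  // Meaning of a single instruction against an immutable store.
  def step[A](instruction: KV[A], store: Map[String, Int]): (Map[String, Int], A) =
    instruction match {
      case Put(k, v) => (store.updated(k, v), ())
      case Get(k)    => (store, store.get(k))
    }

  // The "apply" function: walks the data structure and threads the store
  // through it. Plain recursion, so deep programs can overflow the stack.
  def run[A](program: Free[KV, A], store: Map[String, Int]): (Map[String, Int], A) =
    program match {
      case Pure(value)          => (store, value)
      case Suspend(instruction) => step(instruction, store)
      case FlatMap(source, next) =>
        val result = run(source, store)
        run(next(result._2), result._1)
    }

  // A program is just a value; the for-comprehension builds FlatMap nodes.
  val program: Program[Int] =
    for {
      _ <- put("x", 20)
      _ <- put("y", 22)
      x <- get("x")
      y <- get("y")
    } yield x.getOrElse(0) + y.getOrElse(0)

  def main(args: Array[String]): Unit =
    println(run(program, Map.empty))
}
```

Running `FreeSketch.run(FreeSketch.program, Map.empty)` yields a store holding `x` and `y` together with the result 42. Because the program is only data until `run` is called, a different interpreter (logging, testing, a real database) could be swapped in without touching `program`.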