talk-data.com

People (1 result)

Doug Cutting

creator of Avro

Showing 7 results

Activities & events

Title & Speakers	Event
AI-Powered Search 2025-01-20 Trey Grainger – author Apply cutting-edge machine learning techniques—from crowdsourced relevance and knowledge graph learning, to Large Language Models (LLMs)—to enhance the accuracy and relevance of your search results. Delivering effective search is one of the biggest challenges you can face as an engineer. AI-Powered Search is an in-depth guide to building intelligent search systems you can be proud of. It covers the critical tools you need to automate ongoing relevance improvements within your search applications. Inside you’ll learn modern, data-science-driven search techniques like: Semantic search using dense vector embeddings from foundation models Retrieval augmented generation (RAG) Question answering and summarization combining search and LLMs Fine-tuning transformer-based LLMs Personalized search based on user signals and vector embeddings Collecting user behavioral signals and building signals boosting models Semantic knowledge graphs for domain-specific learning Semantic query parsing, query-sense disambiguation, and query intent classification Implementing machine-learned ranking models (Learning to Rank) Building click models to automate machine-learned ranking Generative search, hybrid search, multimodal search, and the search frontier AI-Powered Search will help you build the kind of highly intelligent search applications demanded by modern users. Whether you’re enhancing your existing search engine or building from scratch, you’ll learn how to deliver an AI-powered service that can continuously learn from every content update, user interaction, and the hidden semantic relationships in your content. You’ll learn both how to enhance your AI systems with search and how to integrate large language models (LLMs) and other foundation models to massively accelerate the capabilities of your search technology. About the Technology Modern search is more than keyword matching. Much, much more. 
Search that learns from user interactions, interprets intent, and takes advantage of AI tools like large language models (LLMs) can deliver highly targeted and relevant results. This book shows you how to up your search game using state-of-the-art AI algorithms, techniques, and tools. About the Book AI-Powered Search teaches you to create a search that understands natural language and improves automatically the more it is used. As you work through dozens of interesting and relevant examples, you’ll learn powerful AI-based techniques like semantic search on embeddings, question answering powered by LLMs, real-time personalization, and Retrieval Augmented Generation (RAG). What's Inside Sparse lexical and embedding-based semantic search Question answering, RAG, and summarization using LLMs Personalized search and signals boosting models Learning to Rank, multimodal, and hybrid search About the Reader For software developers and data scientists familiar with the basics of search engine technology. About the Author Trey Grainger is the Founder of Searchkernel and former Chief Algorithms Officer and SVP of Engineering at Lucidworks. Doug Turnbull is a Principal Engineer at Reddit and former Staff Relevance Engineer at Spotify. Max Irwin is the Founder of Max.io and former Managing Consultant at OpenSource Connections. Quotes Belongs on the shelf of every search practitioner! - Khalifeh AlJadda, Google A treasure map! Now you have decades of semantic search knowledge at your fingertips. - Mark Moyou, NVIDIA Modern and comprehensive! Everything you need to build world-class search experiences. - Kelvin Tan, SearchStax Kick starts your ability to implement AI search with easy to understand examples. - David Meza, NASA data data-engineering search AI/ML LLM RAG	O'Reilly AI & ML Books
Data Serialization Formats with Doug Cutting and Julien Le Dem - Episode 8 2017-11-22 · 14:00 Julien Le Dem – creator of Parquet , Doug Cutting – creator of Avro , Tobias Macey – host Summary With the wealth of formats for sending and storing data it can be difficult to determine which one to use. In this episode Doug Cutting, creator of Avro, and Julien Le Dem, creator of Parquet, dig into the different classes of serialization formats, what their strengths are, and how to choose one for your workload. They also discuss the role of Arrow as a mechanism for in-memory data sharing and how hardware evolution will influence the state of the art for data formats. Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production, and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today. Enterprise add-ons and professional support are available for added peace of mind. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. This is your host Tobias Macey, and today I’m interviewing Julien Le Dem and Doug Cutting about data serialization formats and how to pick the right one for your systems. 
Interview Introduction How did you first get involved in the area of data management? What are the main serialization formats used for data storage and analysis? What are the tradeoffs that are offered by the different formats? How have the different storage and analysis tools influenced the types of storage formats that are available? You’ve each developed a new on-disk data format, Avro and Parquet respectively. What were your motivations for investing that time and effort? Why is it important for data engineers to carefully consider the format in which they transfer their data between systems? What are the switching costs involved in moving from one format to another after you have started using it in a production system? What are some of the new or upcoming formats that you are each excited about? How do you anticipate the evolving hardware, patterns, and tools for processing data to influence the types of storage formats that maintain or grow their popularity? Contact Information Doug: cutting on GitHub Blog @cutting on Twitter Julien Email @J_ on Twitter Blog julienledem on GitHub Links Apache Avro Apache Parquet Apache Arrow Hadoop Apache Pig Xerox Parc Excite Nutch Vertica Dremel White Paper Twitter Blog on Release of Parquet CSV XML Hive Impala Presto Spark SQL Brotli ZStandard Apache Drill Trevni Apache Calcite The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast Arrow Avro CI/CD CSV Data Engineering Data Management GitHub Hadoop Hive Linux Parquet Presto Spark SQL Vertica XML	Data Engineering Podcast
Event O'Reilly Data Engineering Books 2013-10-06
Oracle Big Data Handbook 2013-10-06 Keith Laker – author , Gokula Mishra – author , David Segleau – author , Brian Macdonald – author , Mark Hornick – author , Debra Harding – author , Robert Stackowiak – author , Helen Sun – author , Khader Mohiuddin – author , Tom Plunkett – author , Bruce Nelson – author Transform Big Data into Insight "In this book, some of Oracle's best engineers and architects explain how you can make use of big data. They'll tell you how you can integrate your existing Oracle solutions with big data systems, using each where appropriate and moving data between them as needed." -- Doug Cutting, co-creator of Apache Hadoop Cowritten by members of Oracle's big data team, Oracle Big Data Handbook provides complete coverage of Oracle's comprehensive, integrated set of products for acquiring, organizing, analyzing, and leveraging unstructured data. The book discusses the strategies and technologies essential for a successful big data implementation, including Apache Hadoop, Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle NoSQL Database, Oracle Endeca, Oracle Advanced Analytics, and Oracle's open source R offerings. Best practices for migrating from legacy systems and integrating existing data warehousing and analytics solutions into an enterprise big data infrastructure are also included in this Oracle Press guide. 
Understand the value of a comprehensive big data strategy Maximize the distributed processing power of the Apache Hadoop platform Discover the advantages of using Oracle Big Data Appliance as an engineered system for Hadoop and Oracle NoSQL Database Configure, deploy, and monitor Hadoop and Oracle NoSQL Database using Oracle Big Data Appliance Integrate your existing data warehousing and analytics infrastructure into a big data architecture Share data among Hadoop and relational databases using Oracle Big Data Connectors Understand how Oracle NoSQL Database integrates into the Oracle Big Data architecture Deliver faster time to value using in-database analytics Analyze data with Oracle Advanced Analytics (Oracle R Enterprise and Oracle Data Mining), Oracle R Distribution, ROracle, and Oracle R Connector for Hadoop Analyze disparate data with Oracle Endeca Information Discovery Plan and implement a big data governance strategy and develop an architecture and roadmap data data-engineering oracle-database-solutions Analytics Big Data Data Governance DWH Hadoop NoSQL Oracle RDBMS
Hadoop: The Definitive Guide, 2nd Edition 2010-10-05 Tom White – author Discover how Apache Hadoop can unleash the power of your data. This comprehensive resource shows you how to build and maintain reliable, scalable, distributed systems with the Hadoop framework -- an open source implementation of MapReduce, the algorithm on which Google built its empire. Programmers will find details for analyzing datasets of any size, and administrators will learn how to set up and run Hadoop clusters. This revised edition covers recent changes to Hadoop, including new features such as Hive, Sqoop, and Avro. It also provides illuminating case studies that illustrate how Hadoop is used to solve specific problems. Looking to get the most out of your data? This is your book. Use the Hadoop Distributed File System (HDFS) for storing large datasets, then run distributed computations over those datasets with MapReduce Become familiar with Hadoop’s data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Analyze datasets with Hive, Hadoop’s data warehousing system Take advantage of HBase, Hadoop’s database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems "Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." --Doug Cutting, Cloudera data data-engineering Hadoop Avro Cloud Computing DWH Apache HBase HDFS Hive
Lucene in Action, Second Edition 2010-07-08 Erik Hatcher – author , Otis Gospodnetic – author , Michael McCandless – author When Lucene first appeared, this superfast search engine was nothing short of amazing. Today, Lucene still delivers. Its high-performance, easy-to-use API, features like numeric fields, payloads, near-real-time search, and huge increases in indexing and searching speed make it the leading search tool. And with clear writing, reusable examples, and unmatched advice, Lucene in Action, Second Edition is still the definitive guide to effectively integrating search into your applications. This totally revised book shows you how to index your documents, including formats such as MS Word, PDF, HTML, and XML. It introduces you to searching, sorting, and filtering, and covers the numerous improvements to Lucene since the first edition. Source code is for Lucene 3.0.1. What's Inside Performing hot backups Using numeric fields Tuning for indexing or searching speed Boosting matches with payloads Creating reusable analyzers Adding concurrency with threads Four new case studies Much more! About the Authors Michael McCandless is a Lucene PMC member and committer with more than a decade of experience building search engines. Erik Hatcher and Otis Gospodnetić are the authors of the first edition of Lucene in Action and long-time contributors to Lucene, Solr, Mahout, and other Lucene-based projects. Quotes ... brings you up to speed. - Doug Cutting, Founder of Lucene, Nutch, and Hadoop This new edition has it all. - Chad Davis, Blackdog Software, Author of Struts 2 in Action Very readable, full of expert tips. - Rick Wagner, Acxiom Corp. Elegant, and easy to read - just like Lucene itself. - Shai Erera, IBM Haifa Research Labs For a Lucene developer, it's required reading. - Stuart Caborn, Thoughtworks data data-engineering search lucene API Hadoop HTML IBM XML
Hadoop: The Definitive Guide 2009-06-05 Tom White – author Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you: Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence Discover common pitfalls and advanced features for writing real-world MapReduce programs Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud Use Pig, a high-level query language for large-scale data processing Take advantage of HBase, Hadoop's database for structured and semi-structured data Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject. "Now you have the opportunity to learn about Hadoop from a master -- not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo! data data-engineering Hadoop Cloud Computing Apache HBase HDFS
Hands-On Microsoft Access: A Practical Guide to Improving Your Access Skills 2005-08-24 Bob Schneider – author Praise for Hands-On Microsoft Access “Bob has distilled the essence of database design and Access development into a highly valuable and easily understandable resource that I wish was available when I first started out.” —Graham R. Seach, Microsoft Access MVP “If you’ve been using Access with that typical uncertainty, asking yourself 'Just how could I do that?' or 'Why isn’t this working?', if you’d like to know what you’re doing before you hit the wall, this book is probably perfect for you.” —Olaf Rabbachin, CEO, IntuiDev IT-solutions “Life at the cutting edge of Access development is exciting and very challenging. The knowledge and experience gained over many years of research and trial-and-error has been hard won. But Bob's new book encapsulates the knowledge we now take for granted, and for the first time the beginner is afforded the opportunity to bypass all that hard work. In this his latest work, Bob has distilled the essence of database design and Access development into a highly valuable and easily understandable resource that I wish was available when I first started out.” —Graham R Seach, MCP, MCAD, MCSD, Microsoft Access MVP, author “This is an excellent book for beginners, with an easy reading style. It is now on my recommended list of books that I hand out in every Access class that I teach.” —M.L. “Sco” Scofield, Microsoft Access MVP, MCSD, Senior Instructor, Scofield Business Services “If you've been using Access with that typical uncertainty, asking yourself 'Just how could I do that?' or 'Why isn't this working?', or if you'd like to know what you're doing before you hit the wall, this book is perfect for you. Access is a tremendous product and a database is created using a few clicks; but without at least some theoretical background you're bound to encounter problems soon. 
I wish a book like this one would've been available when I started getting deeper into working with Access some ten years ago.” —Olaf Rabbachin, CEO, IntuiDev IT-solutions “This book is for any level DB developer/user. It is packed full of real-world examples and solutions that are not the normal Northwind database that most Access books use. The examples and the technical content surrounding them are the real strength of the book. Schneider uses real-world scenarios that make for excellent reading. It made me want to go and redo a lot of my older Access DBs that were not written as well as they could have been. This book taught me different approaches to doing some routine tasks.” —Ron Crumbaker, Microsoft MVP – SMS “While a very powerful application (or perhaps because of its power), Microsoft Access does have a steep learning curve and can be intimidating to new users. Bob Schneider has managed to write a book that's both understandable and enjoyable to read. His examples should be understandable to all readers, and he extends them in a logical manner. This book should leave the reader well equipped to make use of what many consider to be the best desktop database product available.” —Doug Steele, Microsoft Access MVP “The author takes what is potentially a very dry subject and adds fantastic color through entertaining analogies and metaphors. For instance, his examples using the NBA, the Beatles, and Donald Rumsfeld help us 'get it' without realizing we have just traversed what could be very stale database theory. Brilliant!” —Kel Good, MCT, MCSD for Microsoft.NET, Custom Software Development Inc. (www.customsoftware.ca) Go from Access “beginner” to Access “master”! Millions of people use Microsoft Access, but only a small fraction of them are really comfortable with it. If you're ready to go “beyond the wizards”—and become a confident, highly effective Access user— Hands-On Microsoft Access was written for you. 
In plain English, Bob Schneider helps you master crucial principles for building flexible, powerful databases. Discover how to enter data more easily, retrieve it more freely, manipulate it more successfully, analyze it with greater sophistication, and share it more effectively. Schneider's dozens of hands-on examples thoroughly demystify Access, and his friendly, conversational style makes it more approachable than ever before. Hands-On Microsoft Access presents solutions for the challenges you're most likely to encounter, including How do Access objects and interfaces fit together, and when should I use each one? What's the best way for me to organize my fields into tables? How can I modify the tables, forms, and reports an Access wizard created for me? How can I design forms and reports for people to use more effectively? How do primary keys and relationships work, and why are they so important? How do I make sure my data stays consistent and accurate? How do I build queries that give me the right information—quickly, efficiently, and reliably? How do I use data from other sources, or deliver Access data to other people or programs? What are PivotTables and PivotCharts, what can I do with them, and how do I use them? Written for Access 2003, this book also contains special instructions for Access 2002 users and extensive coverage of issues relevant to Access 95, 97, and 2000. data data-engineering database-management-tools microsoft-access C#/.NET Microsoft
