talk-data.com

People (27 results)

Romeo

AI Research Engineer · IBM Research Europe

Maddie Shang

Sr. AI Research Engineer · OpenMined

Mohammad Nassar

Cloud Research Engineer · IBM Haifa

Showing 16 results

Activities & events

Title & Speakers	Event
[Notes]How to Build a Portfolio That Reflects Your Real Skills 2025-12-28 · 18:00 These are the notes of the previous "How to Build a Portfolio That Reflects Your Real Skills" event: Properties of an ideal portfolio repository: Built to prove employable skills and readiness for real work Fewer projects, carefully chosen to match job requirements Clean, readable, refactored code that follows best practices Detailed READMEs (setup, features, tech stack, decisions, how to deploy, testing strategy, etc.) Logical, meaningful commits that show development process <- you can follow the git history for important commits/features Clear architecture (layers, packages, separation of concerns) <- use best practices Unit and integration tests included and explained <- also talk about them in the README Proper validation, exceptions, and edge case handling Polished, complete, production-like projects only “Can this person work on our codebase?” <- reviewers will ask this Written for recruiters, hiring managers, and senior engineers Uses industry-relevant and job-listed technologies <- tech stack should match the CV Well-scoped, realistic features similar to real products Consistent style, structure, and conventions across projects Environment variables, clear setup steps, sample configs Minimal, justified dependencies with clear versioning Proper logging and meaningful log messages No secrets committed, basic security best practices applied Shows awareness of scaling, performance, and future growth <- at least have a "possible improvements" section in the README A list of ADRs explains design choices and trade-offs <- should be a part of the documentation 📌 Backend & Frontend Portfolio Project Ideas These projects are intentionally reusable across tech stacks. Following tutorials and reusing patterns is expected; what matters is: understanding the architecture explaining trade-offs documenting decisions clearly ☕ Junior Java Backend Developer (Spring Boot) 1.
Shop Manager Application A monolithic Spring Boot app designed with microservice-style boundaries. Features Secure user registration & login Role-based access control using JWT REST APIs for: Users Products Inventory Orders Automatic inventory updates when orders are placed CSV upload for bulk product & inventory import Clear service boundaries (UserService, OrderService, InventoryService, etc.) Engineering Focus Clean architecture (controllers, services, repositories) Global exception handling Database migrations (Flyway/Liquibase) Unit & integration testing Clear README explaining architecture decisions 2. Parallel Data Processing Engine Backend service for processing large datasets efficiently. Features Upload large CSV/log files Split data into chunks Process chunks in parallel using: `ExecutorService` `CompletableFuture` Aggregate and return results Demonstrates Java concurrency Thread pools & async execution Performance optimization 3. Distributed Task Queue System Simple async job processing system. Features One service submits tasks Another service processes them asynchronously Uses Kafka or RabbitMQ Tasks: report generation, data transformation Demonstrates Message-driven architecture Async workflows Eventual consistency 4. Rate Limiting & Load Control Service Standalone service that protects APIs from abuse. Features Token bucket or sliding window algorithms Redis-backed counters Per-user or per-IP limits Demonstrates Algorithmic thinking Distributed state API protection patterns 5. Search & Indexing Backend Document or record search service. Features In-memory inverted index Text search, filters, ranking Optional Elasticsearch integration Demonstrates Data structures Read-optimized design Trade-offs between custom vs external tools 6. Distributed Configuration & Feature Flag Service Centralized config service for other apps. 
Features Key-value configuration store Feature flags Caching & refresh mechanisms Demonstrates Caching strategies Consistency vs availability trade-offs System design for shared services 🐹 Mid-Level Go Backend Developer (Non-Kubernetes) 1. High-Throughput Event Processing Pipeline Multi-stage concurrent pipeline. Features HTTP/gRPC ingestion Validation & transformation stages Goroutines & channels Worker pools, batching, backpressure Graceful shutdown 2. Distributed Job Scheduler & Worker System Async job execution platform. Features Job scheduling & delayed execution Retries & idempotency Job states (pending, running, failed, completed) Message queue or gRPC-based workers 3. In-Memory Caching Service Redis-like cache written from scratch. Features TTL support Eviction strategies (LRU/LFU) Concurrent-safe access Optional disk persistence 4. Rate Limiting & Traffic Shaping Gateway Reverse-proxy-style rate limiter. Features Token bucket / leaky bucket Circuit breakers Redis-backed distributed limits 5. Log Aggregation & Query Engine Incrementally built system: Step-by-step REST API + Postgres (store logs, query logs) Optimize for massive concurrency Replace DB with in-memory data structures Add streaming endpoints using channels & batching 🐍 Mid-Level Python Backend Developer 1. Asynchronous Task Processing System Async job execution platform. Features Async API submission Worker pool (asyncio or Celery-like) Retries & failure handling Job status tracking Idempotency 2. Event-Driven Data Pipeline Streaming data processing service. Features Event ingestion Validation & transformation Batching & backpressure handling Output to storage or downstream services 3. Distributed Rate Limiting Service API protection service. Steps Step 1: Use an existing rate-limiting library Step 2: Implement token bucket / sliding window yourself 4. Search & Indexing Backend Search system for logs or documents. 
Features Custom indexing or Elasticsearch Filtering & time-based queries Read-heavy optimization 5. Configuration & Feature Flag Service Shared configuration backend. Steps Step 1: Use a caching library Step 2: Implement your own cache (explain in README) 🟦 Mid-Level TypeScript Backend Developer 1. Asynchronous Job Processing System Queue-based task execution. Features BullMQ / RabbitMQ / Redis Retries & scheduling Status tracking 2. Real-Time Chat / Notification Service WebSocket-based system. Features Presence tracking Message persistence Real-time updates 3. Rate Limiting & API Gateway API gateway with protections. Features Token bucket / sliding window Response caching Request logging 4. Search & Filtering Engine Search backend for products, logs, or articles. Features In-memory index or Elasticsearch Pagination & sorting 5. Feature Flag & Configuration Service Centralized config management. Features Versioning Rollout strategies Caching 🟨 Mid-Level Node.js Backend Developer 1. Async Task Queue System Background job processor. Features Bull / Redis / RabbitMQ Retries & scheduling Status APIs 2. Real-Time Chat / Notification Service Socket-based system. Features Rooms Presence tracking Message persistence 3. Rate Limiting & API Gateway Traffic control service. Features Per-user/API-key limits Logging Optional caching 4. Search & Indexing Backend Indexing & querying service. 5. Feature Flag / Configuration Service Shared backend for app configs. ⚛️ Mid-Level Frontend Developer (React / Next.js) 1. Dynamic Analytics Dashboard Interactive data visualization app. Features Charts & tables Filters & live updates React Query / Redux / Zustand Responsive layouts 2. E-Commerce Store Full shopping experience. Features Product listings Search, filters, sorting Cart & checkout SSR/SSG with Next.js 3. Real-Time Chat / Collaboration App Live multi-user UI. Features WebSockets or Firebase Presence indicators Real-time updates 4. CMS / Blogging Platform SEO-focused content app. 
Features SSR for SEO Markdown or API-based content Admin editing interface 5. Personalized Analytics / Recommendation UI Data-heavy frontend. Features Filtering & lazy loading Large dataset handling User-specific insights 6. AI Chatbot App — “My House Plant Advisor” LLM-powered assistant with production-quality UX. Core Features Chat interface with real-time updates Input normalization & validation Offensive content filtering Unsupported query detection Rate limiting (per user) Caching recent queries Conversation history per session Graceful fallbacks & error handling Advanced Features Prompt tuning (beginner vs expert users) Structured advice formatting (cards, bullets) Local LLM support Analytics dashboard (popular questions) Voice input/output (speech-to-text, TTS) ✅ Final Advice You do NOT need to build everything. Instead, pick 1–2 strong projects per role and focus on depth: Explain the architecture clearly Document trade-offs (why you chose X over Y) Show incremental improvements Prove you understand why, not just how 📌 Portfolio Quality Signals (Very Important) Have a large, organic commit history → A single or very few commits is a strong indicator of copy-paste work. Prefer 3–5 complex projects over 20 simple ones → Many tiny projects often signal shallow understanding. 🎯 Why This Helps in Interviews Working on serious projects gives you: Real hands-on practice Concrete anecdotes (stories you can tell in interviews) A safe way to learn technologies you don’t fully know yet Better focus and long-term learning discipline A portfolio that can be ported to another tech stack later (Java → Go, Node → Python, etc.) 
🎥 Demo & Documentation Best Practices Create a 2–3 minute demo / walkthrough video Show the app running Explain what problem it solves Highlight one or two technical decisions At the top of every README: Add a plain-English paragraph explaining what the project does Assume the reader is a complete beginner 🤝 Open Source & Personal Projects (Interview Signal) Always mention that you have contributed to Open Source or built personal projects. Shows team spirit Shows you can read, understand, and navigate an existing codebase Signals that you can onboard into a real-world repository Makes you sound like an engineer, not just a tutorial follower	[Notes]How to Build a Portfolio That Reflects Your Real Skills
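Several of the project ideas in these notes (the Java, Go, Python, and TypeScript rate-limiting services) name the token bucket algorithm. The following is a minimal in-memory sketch in Python of what that algorithm does, not code from the event. The class name `TokenBucket`, its parameters, and the injectable `clock` are illustrative assumptions; a Redis-backed or per-user variant, as the notes suggest, would keep the same refill logic but store the state elsewhere.

```python
import time


class TokenBucket:
    """Hypothetical in-memory token bucket.

    Holds at most `capacity` tokens and refills at `rate` tokens per second.
    """

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.rate = float(rate)
        self.clock = clock              # injectable so tests can control time
        self.tokens = float(capacity)   # the bucket starts full
        self.updated = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens and return True if enough are available."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A per-user or per-IP limiter can be approximated by keeping one such bucket per key in a dictionary; sharing limits across several service instances is what the Redis-backed counters in the notes are for.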
AI-Powered Search 2025-01-20 Trey Grainger – author Apply cutting-edge machine learning techniques—from crowdsourced relevance and knowledge graph learning, to Large Language Models (LLMs)—to enhance the accuracy and relevance of your search results. Delivering effective search is one of the biggest challenges you can face as an engineer. AI-Powered Search is an in-depth guide to building intelligent search systems you can be proud of. It covers the critical tools you need to automate ongoing relevance improvements within your search applications. Inside you’ll learn modern, data-science-driven search techniques like: Semantic search using dense vector embeddings from foundation models Retrieval augmented generation (RAG) Question answering and summarization combining search and LLMs Fine-tuning transformer-based LLMs Personalized search based on user signals and vector embeddings Collecting user behavioral signals and building signals boosting models Semantic knowledge graphs for domain-specific learning Semantic query parsing, query-sense disambiguation, and query intent classification Implementing machine-learned ranking models (Learning to Rank) Building click models to automate machine-learned ranking Generative search, hybrid search, multimodal search, and the search frontier AI-Powered Search will help you build the kind of highly intelligent search applications demanded by modern users. Whether you’re enhancing your existing search engine or building from scratch, you’ll learn how to deliver an AI-powered service that can continuously learn from every content update, user interaction, and the hidden semantic relationships in your content. You’ll learn both how to enhance your AI systems with search and how to integrate large language models (LLMs) and other foundation models to massively accelerate the capabilities of your search technology. About the Technology Modern search is more than keyword matching. Much, much more. 
Search that learns from user interactions, interprets intent, and takes advantage of AI tools like large language models (LLMs) can deliver highly targeted and relevant results. This book shows you how to up your search game using state-of-the-art AI algorithms, techniques, and tools. About the Book AI-Powered Search teaches you to create a search that understands natural language and improves automatically the more it is used. As you work through dozens of interesting and relevant examples, you’ll learn powerful AI-based techniques like semantic search on embeddings, question answering powered by LLMs, real-time personalization, and Retrieval Augmented Generation (RAG). What's Inside Sparse lexical and embedding-based semantic search Question answering, RAG, and summarization using LLMs Personalized search and signals boosting models Learning to Rank, multimodal, and hybrid search About the Reader For software developers and data scientists familiar with the basics of search engine technology. About the Author Trey Grainger is the Founder of Searchkernel and former Chief Algorithms Officer and SVP of Engineering at Lucidworks. Doug Turnbull is a Principal Engineer at Reddit and former Staff Relevance Engineer at Spotify. Max Irwin is the Founder of Max.io and former Managing Consultant at OpenSource Connections. Quotes Belongs on the shelf of every search practitioner! - Khalifeh AlJadda, Google A treasure map! Now you have decades of semantic search knowledge at your fingertips. - Mark Moyou, NVIDIA Modern and comprehensive! Everything you need to build world-class search experiences. - Kelvin Tan, SearchStax Kick starts your ability to implement AI search with easy to understand examples. - David Meza, NASA data data-engineering search AI/ML LLM RAG	O'Reilly AI & ML Books
Text and Vector Search from Scratch 2024-05-27 · 14:00 Alexey Grigorev – Founder @ DataTalks.Club Hands-on workshop on building a search engine from scratch, focusing on text search and vector search. Topics include in-memory text search, tokenization and preprocessing, inverted index construction, embeddings, converting text to vectors, cosine similarity, and strategies to combine text and vector search. The session includes practical coding in a Jupyter Notebook using Python to implement both text and vector search approaches. Python jupyter notebook embeddings inverted index cosine similarity	Implement a Search Engine
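The topics listed for this workshop (tokenization, inverted index construction, converting text to vectors, cosine similarity) can be sketched in a few lines of plain Python. This is an illustrative sketch and not the workshop's notebook: all function names are hypothetical, and simple term-count vectors stand in for the learned embeddings the session covers.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Inverted index: token -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def text_search(index, query):
    """Ids of documents that contain every query token (boolean AND)."""
    postings = [index.get(tok, set()) for tok in tokenize(query)]
    return set.intersection(*postings) if postings else set()


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(docs, query, top_k=3):
    """Rank documents by cosine similarity of term-count vectors.

    Assumption: term counts are a stand-in for real embeddings.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), i) for i, d in enumerate(docs)]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))[:top_k]
    return [i for score, i in ranked if score > 0]
```

One simple way to combine the two approaches, under the same assumptions, is to use `text_search` as a filter and rank only the surviving documents with `vector_search`.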
Event Google Cloud Next '24 2024-04-11
Building generative AI experiences for the enterprise on Google Cloud 2024-04-11 · 21:05 Eddie Zhou – Founding Engineer @ Glean Building an assistant capable of answering complex, company-specific questions and executing workflows requires first building a powerful Retrieval Augmented Generation (RAG) system. Founding engineer Eddie Zhou explains how Glean built its RAG system on Google Cloud, combining a domain-adapted search engine with dynamic prompts to harness the full capabilities of Gemini's reasoning engine. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.
Accelerate analytics and semantic search in real-time with AlloyDB for PostgreSQL 2024-04-10 · 22:00 Sam Idicula – Senior Staff Software Engineer @ Google Cloud , Sridhar Ranganathan – Product Manager @ Google Cloud , Fei Meng – Head of Data Platform @ Nuro Your transactional data powers many applications – from Analytics to generative AI and interactive online systems. AlloyDB unifies all these workloads onto a single, high-performance platform to extend your real-time data. This session dives into two built-in features: AlloyDB AI and the Analytics Accelerator. We'll show the key technologies behind these features, including Google's fast vector search and the columnar engine that enables fast analytical queries, hybrid transaction, and analytics use cases. We’ll share how customers simplified their Analytical and gen AI apps with these two features.
Building generative AI experiences for the enterprise on Google Cloud 2024-04-10 · 16:30 Eddie Zhou – Founding Engineer @ Glean Building an assistant capable of answering complex, company-specific questions and executing workflows requires first building a powerful Retrieval Augmented Generation (RAG) system. Founding engineer Eddie Zhou explains how Glean built its RAG system on Google Cloud, combining a domain-adapted search engine with dynamic prompts to harness the full capabilities of Gemini's reasoning engine. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.
How IPRally built their ML platform on Ray and GKE 2024-04-10 Nathan Beach – Group Product Manager @ Google Cloud , Juho Kallio – CTO @ IPRally Learn how the patent search engine company IPRally created a custom compute platform to enable higher-scale data processing and deep learning. The solution relies on Ray Core and Google Kubernetes Engine, and harvests the cheapest resources from all around the world. In addition to efficiency, the goal was to build the best environment for machine learning R&D. This has been achieved through an integration with Weights & Biases as the experiment tracking system. In this session, we’ll walk through the solution at a high level. Please note: seating is limited and on a first-come, first-served basis; standing areas are available.
AI for Search Success at QAD 2024-04-09 · 19:50 Joey Jablonski – VP of Global Solutions @ Pythian , Jim Josey – Vice President Information Technology Services @ QAD This presentation explores deploying retrieval augmented generation (RAG) on Vertex AI Search to enhance QAD's internal data search (Jira, Confluence, Google Sites). Discover how GenAI improves query responses, utilizing a user-friendly web app on Google App Engine to counteract the loss of institutional knowledge. Join us for insights into this innovative enterprise search solution. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.
Building generative AI experiences for the enterprise on Google Cloud 2024-04-09 Eddie Zhou – Founding Engineer @ Glean Building an assistant capable of answering complex, company-specific questions and executing workflows requires first building a powerful Retrieval Augmented Generation (RAG) system. Founding engineer Eddie Zhou explains how Glean built its RAG system on Google Cloud, combining a domain-adapted search engine with dynamic prompts to harness the full capabilities of Gemini's reasoning engine. By attending this session, your contact information may be shared with the sponsor for relevant follow up for this event only.

Elastic Berlin Meetup @ Zalando: September Edition 2023-09-14 · 16:30 Join us for a meetup on September 14th at 18.30 at Zalando, Berlin! 18:30: Join us for a drink Please have your full and real name in your profile description because the security team will check if you're registered when you arrive. 18:45: Rankquest: Benchmarking Search API Ranking with Elasticsearch (Jilles van Gurp) Search Ranking is something that many companies that use Elasticsearch struggle with. Something we noticed while helping various clients is that many companies never evolve to having a systematic approach for testing their search ranking quality. It's too abstract for them; they don't know where to start with this, and they don't really get how this should be done or even why this is important. In this presentation we present and unveil our new ranking tool, Rankquest Studio, which aims to address some of these issues. Rankquest emerged out of our frustration with existing tools and approaches in this space and we'll reflect a bit on the requirements we have for this before diving into a demo. Rankquest Studio is open source, web-based, and easy to use, and it can be used to build out test benchmarks for evaluating your search solutions. 19:15: Generative Black-Box Testing for Evaluating Search Quality (Oliver Trosien @ Zalando) We present a novel generative Black-Box Testing approach that uses semantically equivalent queries (e.g. “rote Kleider”, “Kleid rot”) for introspecting the quality of a search engine. At Zalando, we developed a tool for finding search quality problems at scale with the help of mass-generating semantically equivalent query variants. This is a novel way to find relevance problems that complements other approaches that use customer metrics or ground truth data. The tool allows easy extension with new test scenarios and languages by non-technical native language speakers. 
We will show how it was used to continuously monitor search relevance, to find regressions, and how you can implement such a tool yourself. Here are some questions that we’ll discuss in the session: How can you track search quality at scale? What kind of quality issues does generative black-box testing (not) uncover? Can it be used to test semantic or vector search quality? 20.00: Pizza Special thanks to our hosts, Zalando!	Elastic Berlin Meetup @ Zalando: September Edition
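The core idea in the second talk, that semantically equivalent queries should return roughly the same results, can be illustrated with a small consistency check. The sketch below is an assumption about how such a check might look, not Zalando's tool: `search` is a hypothetical callable returning a ranked list of result ids, and the Jaccard overlap metric and the 0.5 threshold are arbitrary illustrative choices.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two result lists, ignoring order: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def inconsistent_pairs(search, variants, k=10, threshold=0.5):
    """Flag pairs of equivalent queries whose top-k results diverge.

    `search` is a hypothetical function: query string -> ranked list of ids.
    Returns a list of (query_a, query_b, overlap) tuples below `threshold`.
    """
    results = {q: search(q)[:k] for q in variants}
    flagged = []
    for qa, qb in combinations(variants, 2):
        overlap = jaccard(results[qa], results[qb])
        if overlap < threshold:
            flagged.append((qa, qb, overlap))
    return flagged
```

Running such a check over many generated variants and tracking the number of flagged pairs over time is one plausible way to monitor for regressions without ground truth data.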
Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile 2022-07-22 · 18:21 Traditional Security Information and Event Management (SIEM) approaches do not scale well for data sources producing 30 TiB per day, which led HSBC to create a Cybersecurity Lakehouse with Delta and Spark. The platform was built to overcome several conventional technical constraints: traditional platforms limit the amount of data available for long-term analytics, and their query languages are difficult to scale and time-consuming to run. In this talk, we’ll learn how to implement (or actually reverse-engineer) a language with Scala and translate it into what Apache Spark understands, the Catalyst engine. We’ll guide you through the technical journey of building equivalents of a query language into Spark. We’ll learn how HSBC's business benefited from this innovation, including reduced time and resources for Cyber data processing migration, improved Cyber threat Incident Response, and fast onboarding of HSBC Cyber Analysts onto Spark with the Cybersecurity Lakehouse platform. Analytics Data Lakehouse Databricks Delta Scala Cyber Security Spark	Databricks DATA + AI Summit 2023 YouTube
Event O'Reilly Data Engineering Books 2015-11-17
Elasticsearch in Action 2015-11-17 Radu Gheorghe – author , Matthew Lee Hinman – author , Roy Russo – author Elasticsearch in Action teaches you how to build scalable search applications using Elasticsearch. You'll ramp up fast, with an informative overview and an engaging introductory example. Within the first few chapters, you'll pick up the core concepts you need to implement basic searches and efficient indexing. With the fundamentals well in hand, you'll go on to gain an organized view of how to optimize your design. Perfect for developers and administrators building and managing search-oriented applications. About the Technology Modern search seems like magic: you type a few words and the search engine appears to know what you want. With the Elasticsearch real-time search and analytics engine, you can give your users this magical experience without having to do complex low-level programming or understand advanced data science algorithms. You just install it, tweak it, and get on with your work. About the Book Elasticsearch in Action teaches you how to write applications that deliver professional quality search. As you read, you'll learn to add basic search features to any application, enhance search results with predictive analysis and relevancy ranking, and use saved data from prior searches to give users a custom experience. This practical book focuses on Elasticsearch's REST API via HTTP. Code snippets are written mostly in bash using cURL, so they're easily translatable to other languages. What's Inside What is a great search application? Building scalable search solutions Using Elasticsearch with any language Configuration and tuning About the Reader This book is for developers and administrators building and managing search-oriented applications. About the Authors Radu Gheorghe is a search consultant and software engineer. Matthew Lee Hinman develops highly available, cloud-based systems. Roy Russo is a specialist in predictive analytics. 
Quotes To understand how a modern search infrastructure works is a daunting task. Radu, Matt, and Roy make it an engaging, hands-on experience. - Sen Xu, Twitter Inc. An indispensable guide to the challenges of search of semi-structured data. - Artur Nowak, Evidence Prime The best resource for a complex topic. Highly recommended. - Daniel Beck, juris GmbH Took me from confused to confident in a week. - Alan McCann, Givsum.com data data-engineering search elasticsearch Analytics API Bash Cloud Computing Data Science ELK
ElasticSearch Blueprints 2015-07-24 Vineeth Mohan – author Dive into search technology with "ElasticSearch Blueprints"! This is the perfect project-based guide to help you master Elasticsearch. You will learn how to build and design scalable, effective search solutions, improve search relevancy, manage data efficiently, perform analytics, and visualize your data in comprehensive ways. What this Book will help me do Build and fine-tune scalable search engine features with Elasticsearch. Design and implement accurate ecommerce search solutions using filters. Analyze and visualize data with Elasticsearch's powerful data aggregation capabilities. Increase search relevancy and enhance user query assistance using analyzers. Incorporate enhanced data organization methods, including parent-child relationships. Author(s) Vineeth Mohan is an experienced professional specializing in search technologies. With a strong technical background, they have engaged deeply with Elasticsearch, creating solutions that address practical challenges. Their approach focuses on making technical topics accessible, guiding readers step-by-step through projects. Who is it for? This book is tailored for data professionals, application developers, and enthusiasts eager to delve into search technologies. Whether you're beginning with Elasticsearch or aiming to refine your skills, this guide will advance your expertise. By working through practical cases, you'll gain confidence in using Elasticsearch effectively to meet diverse requirements. data data-engineering search elasticsearch Analytics ELK
ElasticSearch Cookbook - Second Edition 2015-01-28 Alberto Paro – author The "ElasticSearch Cookbook - Second Edition" is a hands-on guide featuring over 130 advanced recipes to help you harness the power of ElasticSearch, a leading search and analytics engine. Through insightful examples and practical guidance, you'll learn to implement efficient search solutions, optimize queries, and manage ElasticSearch clusters effectively. What this Book will help me do Design and configure ElasticSearch topologies optimized for your specific deployment needs. Develop and utilize custom mappings to optimize your data indexes. Execute advanced queries and filters to refine and retrieve search results effectively. Set up and monitor ElasticSearch clusters for optimal performance. Extend ElasticSearch capabilities through plugin development and integrations using Java and Python. Author(s) Alberto Paro is a technology expert with years of experience working with ElasticSearch, Big Data solutions, and scalable cloud architecture. He has authored multiple books and technical articles on ElasticSearch, leveraging his extensive knowledge to provide practical insights. His approachable and detail-oriented style makes complex concepts accessible to technical professionals. Who is it for? This book is best suited for software developers and IT professionals looking to use ElasticSearch in their projects. Readers should be familiar with JSON, as well as basic programming skills in Java. It is ideal for those who have an understanding of search applications and want to deepen their expertise. Whether you're integrating ElasticSearch into a web application or optimizing your system's search capabilities, this book will provide the skills and knowledge you need. data data-engineering search elasticsearch Analytics Big Data Cloud Computing ELK Java JSON Python
Solr in Action 2014-03-25 Trey Grainger – author , Timothy Potter – author Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities. About the Technology About the Book Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents. Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning. What's Inside How to scale Solr for big data Rich real-world examples Solr as a NoSQL data store Advanced multilingual, data, and relevancy tricks Coverage of versions through Solr 4.7 About the Reader This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required. About the Authors Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies. Quotes The knowledge and techniques you need. 
- From the Foreword by Yonik Seeley, Creator of Solr Readable and immediately applicable ... an excellent book. - John Viviano, InterCorp, Inc. The go-to guide for Solr ... a definitive resource for both beginners and experts. - Scott Anthony, Business Instruments A well-dosed combination of deep technical knowledge and real-world experience. - Alexandre Madurell, Piksel, Inc. data data-engineering search solr Analytics Big Data Data Analytics Java NoSQL
ElasticSearch Server 2013-02-21 Rafal Kuc – author , Marek Rogozinski – author ElasticSearch Server is an excellent resource for mastering the ElasticSearch open-source search engine. This book takes you through practical steps to implement, configure, and optimize search capabilities, suitable for various data sets and applications, making faster and more accurate search outcomes accessible. What this Book will help me do Understand the core concepts of ElasticSearch, including data indexing, dynamic mapping, and search analysis. Develop practical skills in writing queries and filters to retrieve precise and relevant results. Learn to set up and efficiently manage ElasticSearch clusters for scalability and real-time performance. Implement advanced ElasticSearch functions like autocompletion, faceting, and geo-search. Utilize optimization techniques for cluster monitoring, health-checks, and tuning for reliable performance. Author(s) The authors of ElasticSearch Server are industry professionals with extensive experience in search technologies and system architecture. They have contributed to multiple tools and publications in the field of data search and analytics. Their writing aims to distill complex technical concepts into practical knowledge, making it valuable for readers from all backgrounds. Who is it for? This book is perfect for developers, system architects, and IT professionals seeking a robust and scalable search solution for their projects. Whether you're new to ElasticSearch or looking to deepen your expertise, this book will serve as a practical guide to implement ElasticSearch effectively. The only prerequisites are a basic understanding of databases and general query concepts, so prior search server knowledge is not required. data data-engineering search elasticsearch Analytics ELK
