talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
| Teague Sterling – Director of Computational Biology Systems @ 23andMe<br><br>We introduce Explore23, a privacy-forward, secure, extensible, and easy-to-use web application for browsing the multimodal data collected as part of the 23andMe, Inc. Research cohort, built heavily on the DuckDB ecosystem. Although the 23andMe Research program has collected a large number of data types from the more than 11M customers who have consented to participate, there has not yet been a comprehensive tool for exploring and visualizing the cohort, a capability that is invaluable for genomics-driven target discovery and validation. Any such tool also had to extend to future data types and applications, scale to large participant and variant cohorts, be comprehensible to non-experts and external parties, and, most importantly, protect research participant privacy. Explore23 uses DuckDB and the DuckDB extension ecosystem extensively throughout the lifecycle of the data it showcases. A combination of pre-processing, backend result generation, and WASM-powered Mosaic integrations enables rapid search and visualization across the wide range of collected datasets. These datasets span the stages of the 23andMe research "pipeline", including raw survey questions, curated condition-based cohorts, genetic variants, and GWAS results. Two features are of particular interest: the variant browser, which enables rapid, in-browser visualization of the more than 170 million imputed and genotyped genetic variants in the 23andMe genetic panels; and the phenotypic pedigree summaries, which merge columnar datasets with graph queries (via DuckPGQ) to rapidly identify related participants in the 23andMe research cohort who share specific conditions. Each feature posed challenges, both internal and external: finding and contextualizing specific datasets for groups not already familiar with the data (even browsing surveys, for example), and managing data scale.<br><br>The front end serves data that has been pre-processed through rigorous masking logic to protect participant privacy. In sum, Explore23 is an invaluable tool for research scientists exploring the immense complexity and diversity of the whole 23andMe research cohort. It highlights the incredible versatility of the DuckDB ecosystem in unifying data access from raw result processing up through in-browser visualizations. | Small Data SF 2025 |
| Solr in Action<br>2014-03-25<br>Trey Grainger – author, Timothy Potter – author<br><br>Solr in Action is a comprehensive guide to implementing scalable search using Apache Solr. This clearly written book walks you through well-documented examples ranging from basic keyword searching to scaling a system for billions of documents and queries. It will give you a deep understanding of how to implement core Solr capabilities.<br><br>About the Technology:<br>Whether you're handling big (or small) data, managing documents, or building a website, it is important to be able to quickly search through your content and discover meaning in it. Apache Solr is your tool: a ready-to-deploy, Lucene-based, open source, full-text search engine. Solr can scale across many servers to enable real-time queries and data analytics across billions of documents.<br><br>About the Book:<br>Solr in Action teaches you to implement scalable search using Apache Solr. This easy-to-read guide balances conceptual discussions with practical examples to show you how to implement all of Solr's core capabilities. You'll master topics like text analysis, faceted search, hit highlighting, result grouping, query suggestions, multilingual search, advanced geospatial and data operations, and relevancy tuning.<br><br>What's Inside:<br>How to scale Solr for big data<br>Rich real-world examples<br>Solr as a NoSQL data store<br>Advanced multilingual, data, and relevancy tricks<br>Coverage of versions through Solr 4.7<br><br>About the Reader:<br>This book assumes basic knowledge of Java and standard database technology. No prior knowledge of Solr or Lucene is required.<br><br>About the Authors:<br>Trey Grainger is a director of engineering at CareerBuilder. Timothy Potter is a senior member of the engineering team at LucidWorks. The authors work on the scalability and reliability of Solr, as well as on recommendation engine and big data analytics technologies.<br><br>Quotes:<br>"The knowledge and techniques you need." - From the Foreword by Yonik Seeley, Creator of Solr<br>"Readable and immediately applicable ... an excellent book." - John Viviano, InterCorp, Inc.<br>"The go-to guide for Solr ... a definitive resource for both beginners and experts." - Scott Anthony, Business Instruments<br>"A well-dosed combination of deep technical knowledge and real-world experience." - Alexandre Madurell, Piksel, Inc. | O'Reilly Data Engineering Books |
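The phenotypic pedigree summaries described in the Explore23 abstract combine a condition table with a traversal of family relationships. The abstract does this with DuckPGQ graph queries inside DuckDB; the minimal sketch below swaps in a plain breadth-first search in Python to illustrate the same idea. The edge list, the `CONDITIONS` mapping, the participant IDs, and the `relatives_sharing_condition` helper are all hypothetical and do not reflect 23andMe's schema or code.

```python
from collections import deque

# Hypothetical relationship edges (undirected): pairs of participant IDs
# treated as first-degree relatives. Not a real 23andMe schema.
EDGES = [
    ("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p2", "p5"), ("p6", "p7"),
]

# Hypothetical condition table: participant ID -> set of reported conditions.
CONDITIONS = {
    "p1": {"asthma"},
    "p2": set(),
    "p3": {"asthma", "migraine"},
    "p4": {"asthma"},
    "p5": {"migraine"},
    "p6": {"asthma"},
    "p7": {"asthma"},
}


def build_graph(edges):
    """Build an undirected adjacency list from relationship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def relatives_sharing_condition(graph, conditions, start, condition, max_hops):
    """Return {relative: hops} for every participant within max_hops of
    `start` who also has `condition`. The start participant is excluded."""
    hops_from_start = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops_from_start[node] == max_hops:
            continue  # stop expanding once the hop limit is reached
        for neighbor in graph.get(node, ()):
            if neighbor not in hops_from_start:
                hops_from_start[neighbor] = hops_from_start[node] + 1
                queue.append(neighbor)
    # Join the traversal result against the condition table.
    return {
        person: hops
        for person, hops in hops_from_start.items()
        if person != start and condition in conditions.get(person, set())
    }


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(relatives_sharing_condition(g, CONDITIONS, "p1", "asthma", 2))  # {'p3': 2}
```

Hop count here is only a rough proxy for degree of relationship. A graph-query engine such as DuckPGQ would let the same bounded-path search be written declaratively over the relationship table instead of as a hand-written loop.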
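The Explore23 abstract says only that front-end data passes through "rigorous masking logic"; it does not describe that logic. One widely used technique for releasing aggregate health counts is small-cell suppression, where any count below a minimum threshold is hidden. The sketch below shows that technique in isolation, plus a simple complementary step. The threshold of 10, the `suppress_small_counts` name, and the sample categories are hypothetical and are not Explore23's actual rules.

```python
from typing import Dict, Optional


def suppress_small_counts(
    counts: Dict[str, int], min_count: int = 10
) -> Dict[str, Optional[int]]:
    """Hide (set to None) any category count below min_count.

    Hypothetical illustration of small-cell suppression, not 23andMe's method.
    """
    # Primary suppression: hide every cell under the threshold.
    masked: Dict[str, Optional[int]] = {
        key: (n if n >= min_count else None) for key, n in counts.items()
    }
    suppressed = [key for key, value in masked.items() if value is None]
    # Complementary suppression: if exactly one cell is hidden, it could be
    # recovered by subtracting the visible cells from a published total,
    # so also hide the smallest remaining visible cell.
    if len(suppressed) == 1:
        visible = [key for key, value in masked.items() if value is not None]
        if visible:
            smallest = min(visible, key=lambda key: counts[key])
            masked[smallest] = None
    return masked


if __name__ == "__main__":
    by_age = {"18-29": 4, "30-44": 120, "45-59": 87, "60+": 15}
    print(suppress_small_counts(by_age))
```

This single pass does not defend against every inference attack (for example, differencing across overlapping queries), which is one reason privacy masking is usually applied during pre-processing rather than left to the front end.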