Summary
Archaeologists collect and create a variety of data as part of their research and exploration. Open Context is a platform for cleaning, curating, and sharing this data. In this episode Eric Kansa describes how they process, clean, and normalize the data that they host, the challenges they face in scaling ETL processes that require domain-specific knowledge, and how the connections between datasets that they expose are being used for interesting projects.
Introduction
Hello and welcome to the Data Engineering Podcast, the show about modern data management
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show please leave a review on iTunes or Google Play Music, tell your friends and co-workers, and share it on social media.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Your host is Tobias Macey and today I’m interviewing Eric Kansa about Open Context, a platform for publishing, managing, and sharing research data
Interview
Introduction
How did you get involved in the area of data management?
I did some database and GIS work for my dissertation in archaeology back in the late 1990s. I got frustrated at the lack of comparative data, and I got frustrated at all the work I put into creating data that nobody would likely use. So I decided to focus my energies on research data management.
Can you start by describing what Open Context is and how it started?
Open Context is an open access data publishing service for archaeology. It started because we needed better ways of disseminating structured data and digital media than is possible with conventional articles, books, and reports.
What are your protocols for determining which data sets you will work with?
Datasets need to come from research projects that meet the normal standards of professional conduct (laws, ethics, professional norms) articulated by archaeology’s professional societies.
What are some of the challenges unique to research data?
What are some of the unique requirements for processing, publishing, and archiving research data?
You have to work on a shoe-string budget, essentially providing "public goods". Archaeologists typically don’t have much discretionary money available, and publishing and archiving data are not yet very common practices.
Another issue is that it will take a long time to publish enough data to power the kinds of "meta-analyses" that draw upon many datasets. Lots of archaeological data describes very particular places and times. Because datasets can be so particularistic, finding data relevant to your interests can be hard, so we face a monumental task in supplying enough data to satisfy many, many particularistic interests.
How much education is necessary around your content licensing for researchers who are interested in publishing their data with you?
We require use of Creative Commons licenses, and strongly encourage the CC-BY license or CC0 (public domain dedication) to keep things simple and easy to understand.
Can you describe the system architecture that you use for Open Context?
Open Context is a Django (Python) application with a Postgres database and an Apache Solr index. It runs on Google Cloud services on Debian Linux.
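To make that stack a little more concrete, here is a minimal sketch of how records could flow from the Django ORM (backed by Postgres) into a Solr index using the pysolr client. The `Artifact` model, its fields, and the Solr URL are hypothetical assumptions for illustration only, not Open Context's actual schema or indexing pipeline.

```python
# Hypothetical sketch: reindexing Django records into Apache Solr with pysolr.
# The Artifact model and its fields are invented for this example; the real
# Open Context schema and pipeline will differ.
import pysolr

from myapp.models import Artifact  # hypothetical Django app and model

# Point at a Solr core (localhost URL is an assumption for a dev setup).
solr = pysolr.Solr("http://localhost:8983/solr/opencontext", timeout=10)


def reindex_artifacts(batch_size=500):
    """Stream records out of Postgres (via the Django ORM) into Solr."""
    docs = []
    for artifact in Artifact.objects.iterator(chunk_size=batch_size):
        docs.append({
            "id": str(artifact.uuid),
            "label": artifact.label,
            "project": artifact.project_name,
        })
        if len(docs) >= batch_size:
            solr.add(docs)  # send one batch of documents to Solr
            docs = []
    if docs:
        solr.add(docs)  # flush the final partial batch
    solr.commit()  # make the newly added documents searchable
```

Batching through the ORM's iterator keeps memory use flat on large tables, and deferring the commit until the end means search only sees the refreshed index once the whole run has finished.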
Wh