talk-data.com

Topic

JSON Schema

schemas data_validation data_modelling

Activities

8 activities · Newest first

Advanced JSON Schema handling and Event Demuxing

This session explores advanced JSON Schema handling (inference and evolution) and event demuxing. Topics include: how from_json is used today and its challenges; how to use Variant for rapidly changing schemas; how from_json in Lakeflow Declarative Pipelines with a primed schema helps simplify schema handling; demultiplexing patterns for scalable stream processing; and simplifying event demuxing with Lakeflow Declarative Pipelines.
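The demultiplexing idea can be sketched in plain Python. This is only an illustration of the pattern, not the Lakeflow Declarative Pipelines API: the function name `demux_events`, the `event_type` field, and the `_unrouted` bucket are all assumptions made for this example.

```python
import json
from collections import defaultdict


def demux_events(raw_lines, type_field="event_type"):
    """Group JSON-encoded events by the string value of `type_field`.

    Lines that are not JSON objects, or that lack a string type field,
    are kept under "_unrouted" so nothing is silently dropped.
    """
    routed = defaultdict(list)
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            routed["_unrouted"].append(line)
            continue
        kind = event.get(type_field) if isinstance(event, dict) else None
        if not isinstance(kind, str):
            routed["_unrouted"].append(line)
            continue
        routed[kind].append(event)
    return dict(routed)


# Hypothetical mixed stream carrying two event types and one bad record.
raw = [
    '{"event_type": "click", "x": 1}',
    '{"event_type": "view", "page": "/home"}',
    '{"event_type": "click", "x": 2}',
    "not json",
]
streams = demux_events(raw)
```

Each key in `streams` then stands for one downstream stream that can be given its own schema, which is the point of demultiplexing a mixed event feed.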

Using JSON schema to set the (dbt) stage for product analytics - Coalesce 2023

Surfline uses Segment to collect product analytics events to understand how surfers use their forecasts and live surf cameras across 9000+ surf spots worldwide. An open source tool was developed to define and manage product analytics event schemas using JSON Schema; these schemas are then used to build dbt staging models for all events.

With this solution, the data team has more time to build intermediate and mart models in dbt, knowing that the staging layer fully reflects Surfline’s product analytics events. This presentation is a real-life example of how schemas (or data contracts) can be used as a medium to build consensus, enforce standards, improve data quality, and speed up the dbt workflow for product analytics.
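As a rough sketch of how an event schema could drive a staging model, the snippet below turns the `properties` of a JSON Schema into a `select` statement. It is not the Surfline tool: the `staging_select` helper, the type mapping, the example event, and the `segment` source name are all hypothetical, and the SQL type names would need adjusting for a given warehouse.

```python
# Hypothetical mapping from JSON Schema types to SQL types (an assumption).
SQL_TYPES = {
    "string": "varchar",
    "integer": "bigint",
    "number": "double",
    "boolean": "boolean",
}


def staging_select(schema, source_relation):
    """Build a staging `select` that casts each schema property to a SQL type."""
    columns = []
    for name, spec in schema.get("properties", {}).items():
        json_type = spec.get("type")
        # Unknown or non-string (union) types fall back to varchar in this sketch.
        sql_type = SQL_TYPES.get(json_type, "varchar") if isinstance(json_type, str) else "varchar"
        columns.append(f"    cast({name} as {sql_type}) as {name}")
    if not columns:
        raise ValueError("schema has no properties to select")
    return "select\n" + ",\n".join(columns) + f"\nfrom {source_relation}"


# Hypothetical event schema, invented for this example.
schema = {
    "title": "spot_viewed",
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "spot_id": {"type": "string"},
        "duration_seconds": {"type": "number"},
    },
}
sql = staging_select(schema, "{{ source('segment', 'spot_viewed') }}")
print(sql)
```

Generating the staging layer this way keeps column names and types in one place (the schema), so a schema change shows up as a diff in the generated model.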

Speaker: Greg Clunies, Senior Analytics Engineer, Surfline

Register for Coalesce at https://coalesce.getdbt.com/

Getting jiggy with jsonschema: The power of contracts for building data systems

Is your SQL query the problem, or is it how you ask for the data you need, when you need it? In this deep dive, Jake Thomas shares his hypothesis for why jsonschema is the ticket to contract-driven communication, system interoperability, and an overall improvement to data processing quality of life.

Check the slides here: https://docs.google.com/presentation/d/1kiGyQF7NUWfx-5RyIyeEwSUCwqtIdrXADeI2iixUgiI/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Summary

Using a multi-model database in your applications can greatly reduce the amount of infrastructure and complexity required. ArangoDB is a storage engine that supports document, key/value, and graph data formats, as well as being fast and scalable. In this episode Jan Steemann and Jan Stücke explain where Arango fits in the crowded database market, how it works under the hood, and how you can start working with it today.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey, and today I’m interviewing Jan Stücke and Jan Steemann about ArangoDB, a multi-model distributed database for graph, document, and key/value storage.

Interview

Introduction

How did you get involved in the area of data management? Can you give a high-level description of what ArangoDB is and the motivation for creating it?

What is the story behind the name?

How is ArangoDB constructed?

How does the underlying engine store the data to allow for the different ways of viewing it?

What are some of the benefits of multi-model data storage?

When does it become problematic?

For users who are accustomed to a relational engine, how do they need to adjust their approach to data modeling when working with Arango? How does it compare to OrientDB? What are the options for scaling a running system?

What are the limitations in terms of network architecture or data volumes?

One of the unique aspects of ArangoDB is the Foxx framework for embedding microservices in the data layer. What benefits does that provide over a three tier architecture?

What mechanisms do you have in place to prevent data breaches from security vulnerabilities in the Foxx code? What are some of the most interesting or surprising uses of this functionality that you have seen?

What are some of the most challenging technical and business aspects of building and promoting ArangoDB? What do you have planned for the future of ArangoDB?

Contact Info

Jan Steemann

jsteemann on GitHub

@steemann on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

ArangoDB, Köln, Multi-model Database, Graph Algorithms, Apache 2, C++, ArangoDB Foxx, Raft Protocol, Target Partners, RocksDB, AQL (ArangoDB Query Language), OrientDB, PostgreSQL, OrientDB Studio, Google Spanner, 3-Tier Architecture, Thomson-Reuters, Arango Search, Dell EMC, Google S2 Index, ArangoDB Geographic Functionality, JSON Schema

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

JSON at Work

JSON is becoming the backbone for meaningful data interchange over the internet. This format is now supported by an entire ecosystem of standards, tools, and technologies for building truly elegant, useful, and efficient applications. With this hands-on guide, author and architect Tom Marrs shows you how to build enterprise-class applications and services by leveraging JSON tooling and message/document design. JSON at Work provides application architects and developers with guidelines, best practices, and use cases, along with lots of real-world examples and code samples. You’ll start with a comprehensive JSON overview, explore the JSON ecosystem, and then dive into JSON’s use in the enterprise.

Get acquainted with JSON basics and learn how to model JSON data. Learn how to use JSON with Node.js, Ruby on Rails, and Java. Structure JSON documents with JSON Schema to design and test APIs. Search the contents of JSON documents with JSON Search tools. Convert JSON documents to other data formats with JSON Transform tools. Compare JSON-based hypermedia formats, including HAL and jsonapi. Leverage MongoDB to store and access JSON documents. Use Apache Kafka to exchange JSON-based messages between services.

Summary

Yelp needs to be able to consume and process all of the user interactions that happen in their platform in as close to real-time as possible. To achieve that goal they embarked on a journey to refactor their monolithic architecture to be more modular and modern, and then they open sourced it! In this episode Justin Cunningham joins me to discuss the decisions they made and the lessons they learned in the process, including what worked, what didn’t, and what he would do differently if he was starting over today.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you’re ready to launch your next project you’ll need somewhere to deploy it. Check out Linode at www.dataengineeringpodcast.com/linode?utm_source=rss&utm_medium=rss and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show you can leave a review on iTunes or Google Play Music, and tell your friends and co-workers. Your host is Tobias Macey, and today I’m interviewing Justin Cunningham about Yelp’s data pipeline.

Interview with Justin Cunningham

Introduction

How did you get involved in the area of data engineering? Can you start by giving an overview of your pipeline and the type of workload that you are optimizing for? What are some of the dead ends that you experienced while designing and implementing your pipeline? As you were picking the components for your pipeline, how did you prioritize the build vs buy decisions and what are the pieces that you ended up building in-house? What are some of the failure modes that you have experienced in the various parts of your pipeline and how have you engineered around them? What are you using to automate deployment and maintenance of your various components and how do you monitor them for availability and accuracy? While you were re-architecting your monolithic application into a service oriented architecture and defining the flows of data, how were you able to make the switch while verifying that you were not introducing unintended mutations into the data being produced? Did you plan to open-source the work that you were doing from the start, or was that decision made after the project was completed? What were some of the challenges associated with making sure that it was properly structured to be amenable to making it public? What advice would you give to anyone who is starting a brand new project and how would that advice differ for someone who is trying to retrofit a data management architecture onto an existing project?

Keep in touch

Yelp Engineering Blog

Email

Links

Kafka, Redshift, ETL, Business Intelligence, Change Data Capture, LinkedIn Data Bus, Apache Storm, Apache Flink, Confluent, Apache Avro, Game Days, Chaos Monkey, Simian Army, PaaSta, Apache Mesos, Marathon, SignalFX, Sensu, Thrift, Protocol Buffers, JSON Schema, Debezium, Kafka Connect, Apache Beam

Introduction to JavaScript Object Notation

What is JavaScript Object Notation (JSON) and how can you put it to work? This concise guide helps busy IT professionals get up and running quickly with this popular data interchange format, and provides a deep understanding of how JSON works. Author Lindsay Bassett begins with an overview of JSON syntax, data types, formatting, and security concerns before exploring the many ways you can apply JSON today. From Web APIs and server-side language libraries to NoSQL databases and client-side frameworks, JSON has emerged as a viable alternative to XML for exchanging data between different platforms. If you have some programming experience and understand HTML and JavaScript, this is your book.

Learn why JSON syntax represents data in name-value pairs. Explore JSON data types, including object, string, number, and array. Find out how you can combat common security concerns. Learn how the JSON schema verifies that data is formatted correctly. Examine the relationship between browsers, web APIs, and JSON. Understand how web servers can both request and create data. Discover how jQuery and other client-side frameworks use JSON. Learn why the CouchDB NoSQL database uses JSON to store data.
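To make the idea of a schema verifying that data is formatted correctly concrete, here is a minimal sketch of a checker that understands only three JSON Schema keywords: `type`, `required`, and `properties`. It is not a conforming validator and is not taken from the book; the helper names `check` and `_is_type` and the example schema are assumptions for illustration.

```python
def _is_type(value, name):
    """Return True if `value` matches the JSON Schema primitive type `name`."""
    if not isinstance(name, str):
        return True  # union types such as ["string", "null"] are not checked here
    if name == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if name == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if name == "null":
        return value is None
    python_types = {"object": dict, "array": list, "string": str, "boolean": bool}
    expected = python_types.get(name)
    return True if expected is None else isinstance(value, expected)


def check(instance, schema, path="$"):
    """Return a list of error strings; an empty list means the instance passed."""
    errors = []
    expected = schema.get("type")
    if expected is not None and not _is_type(instance, expected):
        return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:
                errors.extend(check(instance[key], subschema, f"{path}.{key}"))
    return errors


# Hypothetical schema and documents for illustration.
person_schema = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
}
ok = check({"name": "Ada", "age": 36}, person_schema)
wrong_type = check({"name": "Ada", "age": "36"}, person_schema)
missing = check({"age": 1}, person_schema)
```

Here `ok` is empty, `wrong_type` reports the string-valued `age`, and `missing` reports the absent `name`. A production system would use a full validator library, which also covers keywords such as `enum`, `items`, and `$ref`.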

Implementing IBM CICS JSON Web Services for Mobile Applications

This IBM® Redbooks® publication provides information about how you can connect mobile devices to IBM Customer Information Control System (CICS®) Transaction Server (CICS TS), either by using existing enterprise services already hosted on CICS or by developing new services that support new lines of business. This book describes the steps to develop, configure, and deploy a mobile application that connects either directly to CICS TS, or to CICS via IBM Worklight® Server. It also describes the advantages that your organization can realize by using Worklight Server with CICS.

In addition, this Redbooks publication provides a broad understanding of the new CICS architecture that enables you to make new and existing mainframe applications available as web services using JavaScript Object Notation (JSON), and provides support for the transformation between JSON and application data. While doing so, we provide information about each resource definition, and its role when CICS handles or makes a request. We also describe how to move your CICS applications, and business, into the mobile space, and how to prepare your CICS environment for the following scenarios: taking an existing CICS application and exposing it as a JSON web service; creating a new CICS application, based on a JSON schema; and using CICS as a JSON client.

This Redbooks publication provides information about the installation and configuration steps for both Worklight Studio and Worklight Server. Worklight Studio is the Eclipse interface that a developer uses to implement a Worklight native or hybrid mobile application, and can be installed into an Eclipse instance. Worklight Server is where components developed for the server side (written in Worklight Studio), such as adapters and custom server-side authentication logic, run.

CICS applications and their associated data constitute some of the most valuable assets owned by an enterprise. Therefore, the protection of these assets is an essential part of any CICS mobile project. This Redbooks publication, after a review of the main mobile security challenges, outlines the options for securing CICS JSON web services, and reviews how products, such as Worklight and IBM DataPower®, can help. It then shows examples of security configurations in CICS and Worklight.