Generate Avro Schema from JSON
In this article, we will see how to create an Avro schema from a JSON object, data, or file.
I will be using a .NET Core C# application.
Avro is an open-source schema specification for data serialization that provides serialization and data exchange services for Apache Hadoop.
For more details on Avro please visit the article Avro schemas with example.
Avro is a language-agnostic format that facilitates the exchange of data between programs written in any language.
We will take a sample JSON file and convert it into an Avro schema.
For .NET, I did not find a preferred free or open-source utility for this. Most of the available tools were licensed, whereas I wanted a simple and easy technique.
I found that we can first convert the JSON into a C# type, either manually or using tools, and then generate the schema from that type using Microsoft.Hadoop.Avro.
The sample JSON data is shown below:
{
  "AccountId": 1212312,
  "AccountName": "TheCodebuzz",
  "SubAccounts": [
    {
      "AccountId": 34535,
      "AccountType": "Saving"
    }
  ]
}
C# Type from JSON
Create a C# type from the sample JSON. You can easily generate C# classes from JSON using the Visual Studio Paste Special feature (Edit > Paste Special > Paste JSON As Classes).
Alternatively, you can use NJsonSchema or any other available method to create the classes.
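As a rough illustration, classes for the sample JSON might look like the sketch below. The class names AccountDetails and SubAccounts follow the names used later in this article. The property names are taken from the JSON keys, which is an assumption: the generated schema shown later lists the array field as Accounts, so the original property name may have differed. The Program class and the use of System.Text.Json (available from .NET Core 3.0) are added only to check that the classes match the JSON.

```csharp
using System;
using System.Text.Json;

// Element type of the "SubAccounts" array in the sample JSON.
public class SubAccounts
{
    public int AccountId { get; set; }
    public string AccountType { get; set; }
}

// Root type for the sample JSON document.
public class AccountDetails
{
    public int AccountId { get; set; }
    public string AccountName { get; set; }
    public SubAccounts[] SubAccounts { get; set; }
}

public static class Program
{
    // The sample JSON from this article, as a verbatim string ("" escapes a quote).
    public const string SampleJson = @"{
      ""AccountId"": 1212312,
      ""AccountName"": ""TheCodebuzz"",
      ""SubAccounts"": [ { ""AccountId"": 34535, ""AccountType"": ""Saving"" } ]
    }";

    public static void Main()
    {
        // Deserialize to confirm the classes match the shape of the JSON.
        AccountDetails details = JsonSerializer.Deserialize<AccountDetails>(SampleJson);

        Console.WriteLine(details.AccountName);                 // TheCodebuzz
        Console.WriteLine(details.SubAccounts[0].AccountType);  // Saving
    }
}
```

Plain classes with public properties like these are sufficient for the public member resolver used in the next section.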
Install Microsoft.Hadoop.Avro
Install the Microsoft.Hadoop.Avro NuGet package.
When creating the schema, specify AvroSerializerSettings with AvroPublicMemberContractResolver as the resolver.
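using Microsoft.Hadoop.Avro;
// Resolve all public members; no [DataContract]/[DataMember] attributes are required.
AvroSerializerSettings settings = new AvroSerializerSettings();
settings.Resolver = new AvroPublicMemberContractResolver();
// WriterSchema holds the generated Avro schema; ToString() returns it as JSON.
var result = AvroSerializer.Create<AccountDetails>(settings).WriterSchema.ToString();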
This option does not require you to add the [DataContract] or [DataMember] attributes to your class/type definition.
The generated Avro schema is shown below:
{
  "type": "record",
  "name": "AvrosampleNetCore.AccountDetails",
  "fields": [
    {
      "name": "AccountId",
      "type": "int"
    },
    {
      "name": "AccountName",
      "type": "string"
    },
    {
      "name": "Accounts",
      "type": [
        "null",
        {
          "type": "array",
          "items": {
            "type": "record",
            "name": "AvrosampleNetCore.SubAccounts",
            "fields": [
              {
                "name": "AccountId",
                "type": "int"
              },
              {
                "name": "AccountType",
                "type": [ "null", "string" ]
              }
            ]
          }
        }
      ]
    }
  ]
}
Binary serialization for Avro
If your type is marked with the [DataContract] attribute and its properties with [DataMember], you can generate the schema from your class as follows.
In this case, specify AvroSerializerSettings with AvroDataContractResolver when creating the schema.
AvroSerializerSettings settings = new AvroSerializerSettings();
settings.Resolver = new AvroDataContractResolver();
var result = AvroSerializer.Create<AccountDetails>(settings).WriterSchema.ToString();
Or
var result = AvroSerializer.Create<AccountDetails>().WriterSchema.ToString();
If no settings are specified, Microsoft.Hadoop.Avro uses the data contract serialization settings by default.
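To tie this back to binary serialization, here is a minimal sketch of a type annotated with [DataContract]/[DataMember] and a round trip through Avro binary using the default settings. It assumes the Microsoft.Hadoop.Avro package is installed. The RoundTrip helper and the Program class are hypothetical names introduced for illustration, and the property names again follow the sample JSON rather than the author's actual class.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using Microsoft.Hadoop.Avro;

[DataContract]
public class SubAccounts
{
    [DataMember] public int AccountId { get; set; }
    [DataMember] public string AccountType { get; set; }
}

[DataContract]
public class AccountDetails
{
    [DataMember] public int AccountId { get; set; }
    [DataMember] public string AccountName { get; set; }
    [DataMember] public SubAccounts[] SubAccounts { get; set; }
}

public static class Program
{
    // Hypothetical helper: serialize to Avro binary in memory, then read it back.
    public static AccountDetails RoundTrip(AccountDetails input)
    {
        // No settings passed, so the default data contract settings apply.
        var serializer = AvroSerializer.Create<AccountDetails>();

        using (var buffer = new MemoryStream())
        {
            serializer.Serialize(buffer, input);

            // Rewind the stream before reading the bytes back.
            buffer.Seek(0, SeekOrigin.Begin);
            return serializer.Deserialize(buffer);
        }
    }

    public static void Main()
    {
        var original = new AccountDetails
        {
            AccountId = 1212312,
            AccountName = "TheCodebuzz",
            SubAccounts = new[]
            {
                new SubAccounts { AccountId = 34535, AccountType = "Saving" }
            }
        };

        AccountDetails copy = RoundTrip(original);
        Console.WriteLine(copy.AccountName);
        Console.WriteLine(copy.SubAccounts[0].AccountType);
    }
}
```

The sketch assigns a value to every member to stay clear of null handling, which depends on the resolver settings in use.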
Happy Coding !!