Generate Avro Schema from Database

In this article, we will see an approach to creating an Avro schema from SQL database. I shall be using a simple C# .NET Core-based application.

Avro is an open-source schema specification for data serialization that provides serialization and data exchange services for Apache Hadoop.

For more details on Avro please visit the article Avro schemas with example

Avro is a language-agnostic format that can be used for any language that facilitates the exchange of data between programs.

Today in this article, we will cover below aspects,

Getting Started

Create any .NET or .NET Core application.

Please install System.Data.SqlClient Nuget package

PM> Install-Package System.Data.SqlClient -Version 4.8.2

Also please add below using to your module,

using System.Data.SqlClient;

Connect to SQL Table

Example:

 
using (SqlConnection _con = new SqlConnection(connectionString))
            {
                string queryStatement = "SELECT * FROM dbo.employee ORDER BY employeeID";

                using (SqlCommand _cmd = new SqlCommand(queryStatement, _con))
                {
                    employeeTable = new DataTable("Employees");

                    SqlDataAdapter _dap = new SqlDataAdapter(_cmd);

                    _con.Open();
                    _dap.Fill(employeeTable);
                    _con.Close();
                }
            }

Convert DataTable to JSON

Please use Newtonsoft to serialize DataTable to JSON string as below,

string JsonString = JsonConvert.SerializeObject(employeeTable);

C# Type from JSON

Create a type from Sample JSON. You can create C# classes from JSON schema using Visual Studio Paste special utility easily.

Or

You can use NJsonschema to create classes or any other available methods.

 public class Employees
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public string Address { get; set; }
        public string ZipCode { get; set; }
        public string State { get; set; }
        public string Country { get; set; }
        public string EmployeeID { get; set; }
    }

Install Microsoft.Hadoop.Avro

Please install Microsoft.Hadoop.Avro using Nuget package.

Specify AvroSerializerSettings using AvroPublicMemberContractResolver while creating the schema.

AvroSerializerSettings settings = new AvroSerializerSettings();
settings.Resolver = new AvroPublicMemberContractResolver();
var result = AvroSerializer.Create<Employee>(settings).WriterSchema.ToString();

The generated Avro schema for simple schema is as below,

{
  "type": "record",
  "name": "AvrosampleNetCore.Employees",
  "fields": [
    {
      "name": "FirstName",
      "type": "string"
    },
    {
      "name": "LastName",
      "type": "string"
    },
    {
      "name": "Address",
      "type": "string"
    },
    {
      "name": "ZipCode",
      "type": "string"
    },
    {
      "name": "State",
      "type": "string"
    },
    {
      "name": "Country",
      "type": "string"
    },
    {
      "name": "EmployeeID",
      "type": "string"
    }
  ]
}

Generate Avro schema for complex or Array or List types

Also, you can generate Avro schema for generic complex or Array or List types as below,

AvroSerializerSettings settings = new AvroSerializerSettings();
settings.Resolver = new AvroPublicMemberContractResolver();
var result = AvroSerializer.Create<Employee[]>(settings).WriterSchema.ToString();
{
  "type": "array",
  "items": {
    "type": "record",
    "name": "AvrosampleNetCore.Employees",
    "fields": [
      {
        "name": "FirstName",
        "type": "string"
      },
      {
        "name": "LastName",
        "type": "string"
      },
      {
        "name": "Address",
        "type": "string"
      },
      {
        "name": "ZipCode",
        "type": "string"
      },
      {
        "name": "State",
        "type": "string"
      },
      {
        "name": "Country",
        "type": "string"
      },
      {
        "name": "EmployeeID",
        "type": "string"
      }
    ]
  }
}

If no settings is specified Avro uses Binary Data contract serialization settings by default.

Do you have any comments or ideas or any better suggestions to share?

Please sound off your comments below.

Happy Coding !!



Please bookmark this page and share it with your friends. Please Subscribe to the blog to receive notifications on freshly published(2024) best practices and guidelines for software design and development.



Leave a Reply

Your email address will not be published. Required fields are marked *