Java - Apache Spark Connect to MongoDB

Today in this article, we will see how to connect Apache Spark to MongoDB from Java, with examples.

We will use the MongoDB Spark Connector, which lets a Java Spark application read data from and write data to MongoDB.

Here’s a basic example demonstrating how to read data from MongoDB, perform a transformation, and then write the results back to MongoDB.

We will work with an Orders collection in the TheCodeBuzz database; sample documents are shown in the write section below.

Install MongoDB Connector for Apache Spark in Java

You can download the connector from the official MongoDB website or use Maven/Gradle to include it as a dependency in your project.

Please add the MongoDB Spark Connector to your project using one of the following.

Gradle Dependency:

implementation 'org.mongodb.spark:mongo-spark-connector_2.12:3.0.1' // Replace with the latest version

Maven Dependency:

<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.12</artifactId>
    <version>3.0.1</version>
</dependency>

Please replace the above with the latest supported version.

For more details please visit the official website: https://www.mongodb.com/products/spark-connector

Create a SparkSession Object

Create a SparkSession object to use for read and write operations:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class SparkMongoDBTheCodeBuzz {
    public static void main(String[] args) {
        // Configure Spark with the MongoDB input and output URIs
        SparkConf sparkConf = new SparkConf()
                .setAppName("MongoDB Spark Connector Example")
                .setMaster("local[*]")
                .set("spark.mongodb.input.uri", "mongodb://host/TheCodeBuzz.Orders")
                .set("spark.mongodb.output.uri", "mongodb://host/TheCodeBuzz.Orders");

        // Create a SparkContext
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Create a SparkSession
        SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
    }
}

We create the SparkSession with the MongoDB connector configuration.

We will use this SparkSession object to read data from and write data to MongoDB.
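As a quick aside, once spark.mongodb.input.uri is set on the session, the collection can also be loaded directly as a DataFrame through Spark's DataFrameReader. Below is a minimal sketch, assuming the SparkSession created above; the "mongo" short format name applies to the 3.0.x connector series.

// Requires: import org.apache.spark.sql.Dataset; import org.apache.spark.sql.Row;
// Reads the collection configured by spark.mongodb.input.uri as a DataFrame
Dataset<Row> ordersDF = spark.read().format("mongo").load();
ordersDF.printSchema();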

Read Data from MongoDB using Spark

To read from MongoDB, the spark.mongodb.input.uri configuration setting must be specified; the complete example below includes it:

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.bson.Document;

public class SparkMongoDBTheCodeBuzz {
    public static void main(String[] args) {
        // Configure Spark with the MongoDB input and output URIs
        SparkConf sparkConf = new SparkConf()
                .setAppName("MongoDB Spark Connector Example")
                .setMaster("local[*]")
                .set("spark.mongodb.input.uri", "mongodb://host/TheCodeBuzz.Orders")
                .set("spark.mongodb.output.uri", "mongodb://host/TheCodeBuzz.Orders");

        // Create a SparkContext
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // Create a SparkSession
        SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();

        // Read data from MongoDB as an RDD of BSON documents
        JavaMongoRDD<Document> mongoRDD = MongoSpark.load(jsc);

        // Perform transformations or analyses as needed
        // For example, you can convert the RDD to a DataFrame
        Dataset<Row> mongoDF = mongoRDD.toDF();

        // Show the data
        mongoDF.show();
    }
}
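If you need to read another collection without changing the session-level input URI, the connector's ReadConfig lets you override settings per load. Below is a minimal sketch mirroring the pattern in the connector's documentation; the "Archive" collection name is just an illustration.

// Per-read overrides, reusing the JavaSparkContext "jsc" from above.
// Requires: import com.mongodb.spark.config.ReadConfig;
//           import java.util.HashMap; import java.util.Map;
Map<String, String> readOverrides = new HashMap<>();
readOverrides.put("collection", "Archive");                      // hypothetical collection
readOverrides.put("readPreference.name", "secondaryPreferred");  // read from a secondary if available

ReadConfig readConfig = ReadConfig.create(jsc).withOptions(readOverrides);
JavaMongoRDD<Document> customRdd = MongoSpark.load(jsc, readConfig);
customRdd.toDF().show();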

Write data to MongoDB using Java Spark

To write data to MongoDB, pass your DataFrame to MongoSpark.write() and call save() on the resulting writer, as shown below.

We will write the following sample JSON data to the collection:

[
  {
    "Order": "journal2",
    "qty": 25,
    "books": [
      "white"
    ],
    "domain": [
      14
    ],
    "Link": "https://www.thecodebuzz.com/order/23456"
  },
  {
    "Order": "journal1",
    "qty": 25,
    "books": [
      "black"
    ],
    "domain": [
      15
    ],
    "Link": "https://www.thecodebuzz.com/order/2324"
  }
]

Let’s save this JSON to a file at any convenient location and read it from the code below.



        // Read the JSON file into a DataFrame (multiLine because the file
        // contains a pretty-printed JSON array rather than JSON Lines)
        Dataset<Row> df = spark.read().option("multiLine", true).json("path/to/json/file");

        // Write data to the TheCodeBuzz.Orders collection in MongoDB
        MongoSpark.write(df)
                .option("database", "TheCodeBuzz")
                .option("collection", "Orders")
                .mode("append")
                .save();

        // Stop the Spark session
        spark.stop();
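Similarly, if you want to direct a particular write at a different collection without touching the session-level output URI, you can set options on the DataFrameWriter before handing it to MongoSpark.save(). Below is a minimal sketch following the connector's documented pattern; the "OrdersBackup" collection name is illustrative.

// Write the same DataFrame to a hypothetical "OrdersBackup" collection,
// overriding only the collection name; the database still comes from
// spark.mongodb.output.uri configured on the session.
MongoSpark.save(
        df.write()
          .option("collection", "OrdersBackup")
          .mode("append"));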


That’s all! Happy coding!

Does this help you fix your issue?

Do you have any better solutions or suggestions? Please sound off your comments below.





