Java – Apache Spark: Connect to MongoDB
Today in this article, we will see how to connect Apache Spark to MongoDB from Java, with examples.
We will make use of the MongoDB Spark Connector, which connects Spark to MongoDB and allows us to read and write data between Java and MongoDB.
Here’s a basic example demonstrating how to read data from MongoDB, perform a transformation, and then write the results back to MongoDB.
We will use an Orders collection in a TheCodeBuzz database; sample documents are shown later in this article.
Install MongoDB Connector for Apache Spark in Java
You can download the connector from the official MongoDB website or use Maven/Gradle to include it as a dependency in your project.
Add the MongoDB Spark Connector as a dependency using Gradle or Maven.
Gradle dependency:
implementation 'org.mongodb.spark:mongo-spark-connector_2.12:3.0.1' // Replace with the latest version
Maven Dependency:
<dependency>
<groupId>org.mongodb.spark</groupId>
<artifactId>mongo-spark-connector_2.12</artifactId>
<version>3.0.1</version>
</dependency>
Replace the version above with the latest supported release.
For more details please visit the official website: https://www.mongodb.com/products/spark-connector
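The connector locates the source and sink through connection-string style URIs of the form mongodb://host/&lt;database&gt;.&lt;collection&gt;, which we will set below as spark.mongodb.input.uri and spark.mongodb.output.uri. A tiny, hypothetical helper (the class and method names are ours, not part of the connector) to illustrate the shape:

```java
// Hypothetical helper illustrating the URI convention used by the
// MongoDB Spark Connector: mongodb://host/<database>.<collection>
public class MongoUriBuilder {

    public static String uriFor(String host, String database, String collection) {
        // Everything after the last '/' is parsed as database.collection
        return "mongodb://" + host + "/" + database + "." + collection;
    }

    public static void main(String[] args) {
        // prints mongodb://localhost:27017/TheCodeBuzz.Orders
        System.out.println(uriFor("localhost:27017", "TheCodeBuzz", "Orders"));
    }
}
```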
Create a SparkSession Object
Create a SparkSession object to use for read and write operations:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.*;
import org.bson.Document;
import java.util.HashMap;
import java.util.Map;
public class SparkMongoDBTheCodeBuzz {
public static void main(String[] args) {
// Configure Spark
SparkConf sparkConf = new SparkConf()
.setAppName("MongoDB Spark Connector Example")
.setMaster("local[*]")
.set("spark.mongodb.input.uri", "mongodb://host/TheCodeBuzz.Orders")
.set("spark.mongodb.output.uri", "mongodb://host/TheCodeBuzz.Orders");
// Create a SparkContext
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
// Create a SparkSession
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
}
}
We create a SparkSession with the MongoDB connector configuration. We will use this SparkSession object to read data from and write data to MongoDB.
Read Data from MongoDB using Spark
To read from MongoDB, the spark.mongodb.input.uri setting must point at the source collection (configured above); the data is then loaded with MongoSpark.load():
import com.mongodb.spark.MongoSpark;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.*;
import org.bson.Document;
public class SparkMongoDBTheCodeBuzz {
public static void main(String[] args) {
// Configure Spark
SparkConf sparkConf = new SparkConf()
.setAppName("MongoDB Spark Connector Example")
.setMaster("local[*]")
.set("spark.mongodb.input.uri", "mongodb://host/TheCodeBuzz.Orders")
.set("spark.mongodb.output.uri", "mongodb://host/TheCodeBuzz.Orders");
// Create a SparkContext
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
// Create a SparkSession
SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();
// Read data from MongoDB and convert the BSON documents to a DataFrame
// (Document is not a Java bean, so use the connector's toDF() helper
// instead of spark.createDataFrame(rdd, Document.class))
Dataset<Row> mongoDF = MongoSpark.load(jsc).toDF();
// Show the data
mongoDF.show();
}
}
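When you only need a subset of the collection, the connector can push an aggregation pipeline down to MongoDB with withPipeline(), so filtering happens server-side before any documents are shipped to Spark. A sketch under the setup above (the class and method names are ours; it requires the connector on the classpath and a reachable MongoDB deployment, so it is not runnable standalone):

```java
import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.Document;
import java.util.Collections;

public class OrdersPipelineExample {

    // Sketch only: "jsc" is the JavaSparkContext configured earlier with
    // spark.mongodb.input.uri pointing at TheCodeBuzz.Orders.
    public static long countLargeOrders(JavaSparkContext jsc) {
        JavaMongoRDD<Document> ordersRdd = MongoSpark.load(jsc);

        // The $match stage executes inside MongoDB, so only orders with
        // qty >= 25 are transferred over the network to Spark.
        JavaMongoRDD<Document> filtered = ordersRdd.withPipeline(
                Collections.singletonList(
                        Document.parse("{ $match: { qty: { $gte: 25 } } }")));

        return filtered.count();
    }
}
```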
Write data to MongoDB using Java Spark
To write data to MongoDB, call MongoSpark.write() with your DataFrame; the target database and collection come from spark.mongodb.output.uri or from writer options.
Let's use the following sample JSON data,
[
{
"Order": "journal2",
"qty": 25,
"books": [
"white"
],
"domain": [
14
],
"Link": "https://www.thecodebuzz.com/order/23456"
},
{
"Order": "journal1",
"qty": 25,
"books": [
"black"
],
"domain": [
15
],
"Link": "https://www.thecodebuzz.com/order/2324"
}
]
Let's save this JSON to a temporary file and read it with Spark,
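Note that Spark's JSON reader expects one JSON object per line (JSON Lines) by default, so the pretty-printed array above should be flattened before handing the path to spark.read().json(...). A small plain-Java helper (the class name is ours) that writes the two sample orders in that shape:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Writes the sample Orders documents as JSON Lines (one object per line),
// the layout Spark's default JSON reader expects.
public class OrdersJsonWriter {

    public static Path writeSampleOrders() throws IOException {
        List<String> lines = Arrays.asList(
            "{\"Order\":\"journal2\",\"qty\":25,\"books\":[\"white\"],\"domain\":[14],\"Link\":\"https://www.thecodebuzz.com/order/23456\"}",
            "{\"Order\":\"journal1\",\"qty\":25,\"books\":[\"black\"],\"domain\":[15],\"Link\":\"https://www.thecodebuzz.com/order/2324\"}");
        Path file = Files.createTempFile("orders", ".json");
        return Files.write(file, lines);
    }

    public static void main(String[] args) throws IOException {
        // Prints the temp-file path to pass to spark.read().json(...)
        System.out.println(writeSampleOrders());
    }
}
```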
// Assume df is a DataFrame with the data you want to write to MongoDB
Dataset<Row> df = spark.read().json("path/to/json/file");
// Write data to MongoDB
MongoSpark.write(df).option("database", "TheCodeBuzz").option("collection", "Orders").mode("append").save();
// Stop the Spark session
spark.stop();
That’s all! Happy coding!
Do you have any better solutions or suggestions? Please sound off in the comments below.
Please bookmark this page and share it with your friends. Please subscribe to the blog to receive notifications on freshly published (2024) best practices and guidelines for software design and development.