Loading data from Sql Server to Solr with a Data Import Handler

Here we will see how to import data in Solr with an existing Sql Server Database to achieve powerful search. Apache Solr is undoubtedly one of the best open source enterprise search engine which is based on Lucene. In most of the cases solr would not be your primary data storage location as it is mainly used as search platform. Solr provides you the ability to perform complex full-text search with real-time indexing and fast performance. To import data from Sql Server Database you need to use Apache Solr’s Data Import Handler.

The Data Import handler is a way of importing data from a database using JDBC drivers and indexing it.

Prerequisites to import data would be:

  • An running instance of solr.
  • An sample core created with basic configuration.

The below example is for Full import Data from Sql Server Database to Solr.

  • MS SQL Server connector

You need to download Microsoft JDBC Driver 4.0 for SQL Server from:

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774

Copy file ‘sqljdbc4.jar’ to ‘lib’ folder

  • Solrconfig.xml configuration

Add the following configuration lines to solrconfig.xml which will instruct Solr to load data import handler and SQL Server JDBC driver jar files.

 <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />  
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />  

Add the following configuration lines to solrconfig.xml which will instruct Solr to register the Data Import handler.

 <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">  
             <lst name="defaults">  
                  <str name="config">data-config.xml</str>  
             </lst>  
  </requestHandler>  

Note: Config parameter is required as it specifies the location where Data Import Handler configuration file is located. This file contains details of how to connect to data source using JDBC, how to fetch data and how to process it to generate Solr documents that will be indexed.

  • Data-config.xml configuration

Go to <local path>\solr-6.1.0\server\solr\<your core name>\conf and create a new file data-config.xml. Following is the configuration which will be saved in this file. This configuration tells which entity to get the data from and it’s fields that should be saved in solr.

 <dataConfig>   
  <dataSource type="JdbcDataSource"    
     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"    
     url="jdbc:sqlserver://servername\instancename;databaseName=mydb"    
     user="<username>"    
     password="<password>"   
  />   
  <document>   
   <entity name="Employee"    
   pk="id"   
   query="select Empid,Empname,EmpAge,EmpDoj,Dept_id from Employee">   
     <field column="Empid" name="id"/>   
     <field column="Empname" name="Empname"/>     
     <field column="EmpAge" name="EmpAge"/>    
     <field column="EmpDoj" name="EmpDoj"/>    
         <entity name="Employee_Department"    
              query="select Deptname from Department where Deptid='${Employee.Dept_id}'">   
                 <field column="Deptname" name="Deptname"/>                  
         </entity>   
   </entity>   
  </document>   
  </dataConfig>   
  • Scheme.xml configuration
    Make sure you have your unique key id configured in schema file as below.
 <uniqueKey>id</uniqueKey>  

Add required fields in schema file which you have defined in Data-config.xml.

 <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>  
  <field name="Empname" type="text_general" multiValued="false" indexed="true" stored="true"/>  
  <field name="EmpAge" type="long" indexed="true" stored="true" multiValued="false"/>  
  <field name="EmpDoj" type="date" multiValued="false" indexed="true" stored="true"/>  
  <field name="Deptname" type="text_general" multiValued="false" indexed="true" stored="true"/>  

This completes all configuration that would be required for data import. Now we need to open solr admin panel to access the Dataimport section of the core you created and execute the import handler.

solr import data

You can view your imported records in solr admin panel.

In next blog we will add the ability to import data incrementally.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s