Loading data from Sql Server to Solr with a Data Import Handler

Here we will see how to import data in Solr with an existing Sql Server Database to achieve powerful search. Apache Solr is undoubtedly one of the best open source enterprise search engine which is based on Lucene. In most of the cases solr would not be your primary data storage location as it is mainly used as search platform. Solr provides you the ability to perform complex full-text search with real-time indexing and fast performance. To import data from Sql Server Database you need to use Apache Solr’s Data Import Handler.

The Data Import handler is a way of importing data from a database using JDBC drivers and indexing it.

Prerequisites to import data would be:

  • An running instance of solr.
  • An sample core created with basic configuration.

The below example is for Full import Data from Sql Server Database to Solr.

  • MS SQL Server connector

You need to download Microsoft JDBC Driver 4.0 for SQL Server from:

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=11774

Copy file ‘sqljdbc4.jar’ to ‘lib’ folder

  • Solrconfig.xml configuration

Add the following configuration lines to solrconfig.xml which will instruct Solr to load data import handler and SQL Server JDBC driver jar files.

 <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-\d.*\.jar" />  
  <lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-extras-\d.*\.jar" />  

Add the following configuration lines to solrconfig.xml which will instruct Solr to register the Data Import handler.

 <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">  
             <lst name="defaults">  
                  <str name="config">data-config.xml</str>  
             </lst>  
  </requestHandler>  

Note: Config parameter is required as it specifies the location where Data Import Handler configuration file is located. This file contains details of how to connect to data source using JDBC, how to fetch data and how to process it to generate Solr documents that will be indexed.

  • Data-config.xml configuration

Go to <local path>\solr-6.1.0\server\solr\<your core name>\conf and create a new file data-config.xml. Following is the configuration which will be saved in this file. This configuration tells which entity to get the data from and it’s fields that should be saved in solr.

 <dataConfig>   
  <dataSource type="JdbcDataSource"    
     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"    
     url="jdbc:sqlserver://servername\instancename;databaseName=mydb"    
     user="<username>"    
     password="<password>"   
  />   
  <document>   
   <entity name="Employee"    
   pk="id"   
   query="select Empid,Empname,EmpAge,EmpDoj,Dept_id from Employee">   
     <field column="Empid" name="id"/>   
     <field column="Empname" name="Empname"/>     
     <field column="EmpAge" name="EmpAge"/>    
     <field column="EmpDoj" name="EmpDoj"/>    
         <entity name="Employee_Department"    
              query="select Deptname from Department where Deptid='${Employee.Dept_id}'">   
                 <field column="Deptname" name="Deptname"/>                  
         </entity>   
   </entity>   
  </document>   
  </dataConfig>   
  • Scheme.xml configuration
    Make sure you have your unique key id configured in schema file as below.
 <uniqueKey>id</uniqueKey>  

Add required fields in schema file which you have defined in Data-config.xml.

 <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>  
  <field name="Empname" type="text_general" multiValued="false" indexed="true" stored="true"/>  
  <field name="EmpAge" type="long" indexed="true" stored="true" multiValued="false"/>  
  <field name="EmpDoj" type="date" multiValued="false" indexed="true" stored="true"/>  
  <field name="Deptname" type="text_general" multiValued="false" indexed="true" stored="true"/>  

This completes all configuration that would be required for data import. Now we need to open solr admin panel to access the Dataimport section of the core you created and execute the import handler.

solr import data

You can view your imported records in solr admin panel.

In next blog we will add the ability to import data incrementally.

Advertisements

Common Table Expression – Sql Server

Here we will see the basics and implementation of CTE in sql server. A common table expression (CTE) is a temporary result set which will be accessible within the next execution scope of a query. The Common Table Expression was introduced in the SQL Server 2005. It basically defines a temporary view which can be referenced in the same query just as a view. A CTE is similar to a derived table but is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.

Syntax of CTE is as follows:

WITH CTEName (Column aliases)
AS (Subquery)
SELECT statement
FROM CTEName;

The select statement must be very next to the CTE and the name is mandatory while the argument is optional.
Continue reading Common Table Expression – Sql Server

Pass DataTable To Store Procedure Using Table Value Parameter.

Here we will see how to pass a datatable as parameter to stored procedure in C# using the sql server user defined table type. One of the way to speed up your application is to reduce the trips to your database especially by avoiding a call to stored procedure multiple times.

Table-valued parameters let you pass a collection of table rows as a parameter to your stored procedure. They allow passing entire sets of rows from ADO.Net code to SQL Server.

Below we will see the steps that needs to be followed to implement this functionality.

Continue reading Pass DataTable To Store Procedure Using Table Value Parameter.

Difference Between DDL, DML and DCL Commands.

DDL
Data Definition Language (DDL) statements are used to define the database structure or schema. Some examples:

CREATE – to create objects in the database
ALTER – alters the structure of the database
DROP – delete objects from the database
TRUNCATE – remove all records from a table, including all spaces allocated for the records are removed
COMMENT – add comments to the data dictionary
RENAME – rename an object
DML
Data Manipulation Language (DML) statements are used for managing data within schema objects. Some examples:

SELECT – retrieve data from the a database
INSERT – insert data into a table
UPDATE – updates existing data within a table
DELETE – deletes all records from a table, the space for the records remain
MERGE – UPSERT operation (insert or update)
CALL – call a PL/SQL or Java subprogram
EXPLAIN PLAN – explain access path to data
LOCK TABLE – control concurrency
DCL
Data Control Language (DCL) statements. Some examples:

GRANT – gives user’s access privileges to database
REVOKE – withdraw access privileges given with the GRANT command

TCL
Transaction Control (TCL) statements are used to manage the changes made by DML statements. It allows statements to be grouped together into logical transactions.

COMMIT – save work done
SAVEPOINT – identify a point in a transaction to which you can later roll back
ROLLBACK – restore database to original since the last COMMIT
SET TRANSACTION – Change transaction options like isolation level and what rollback segment to use