Monday, January 02, 2006

Distributed Queries

Introduction

Many times we face the scenario where we have to execute some query on more than one database. One such example may be you have separate databases for each regional office and corporate office. At the end of year, you want o compile accounts and the effect of each regional office should reflect in these accounts. So you have to query each regional database but how to do this.

Well it is not so difficult. SQL server provides us system stored procedures and SQL functions that allow us to perform such operations.

Using the code

It is really easy to perform operations on different databases using distributed queries. Distributed queries provide SQL Server users with access to:
  • Distributed data stored in multiple instances of SQL Server.
  • Heterogeneous data stored in various relational and non-relational data sources accessed using an OLE DB provider.

Distributed queries can allow users to access another data source (for example, files, non-relational data sources such as Active Directory™, other SQL Servers and so on) using the security context of the Microsoft Windows account under which the SQL Server service is running. SQL Server 2000 impersonates the login appropriately for Windows NT logins; however, that is not possible for SQL Server logins.

Tables and views in external data sources can be referenced directly in SELECT, INSERT, UPDATE, and DELETE Transact-SQL statements. Because distributed queries use OLE DB as the underlying interface, distributed queries can access traditional relational DBMS systems with SQL query processors, as well as data managed by data sources of varying capabilities and sophistication. As long as the software owning the data exposes it in a tabular rowset through an OLE DB provider, the data can be used in distributed queries.

The simple way of performing a select operation against different data sources is:

Exec sp_addlinkedserver @server='Test', @srvproduct='', @provider='SQLOLEDB', @datasrc='192.168.1.254', @catalog='LuckyERP_LAL_Live'

Exec sp_serveroption 'Test', 'data access', 'true'

Exec sp_addlinkedsrvlogin 'Test', 'true'

Select * From OPENQUERY(Test, 'SELECT * FROM dbo.SalesContractHeader')

Where

@server: Is the label for the data-source that will be used to refer to it.

@srvproduct: Is the product name of the OLEDB data-source to be linked as the linked server. In case of SQL Server, you can either set it as ‘SQL Server’, in that case you can’t specify other parameters or you can leave it empty and set other parameters like catalog name and the IP address or host name of the SQL Server.

@provider: Is the name of the OLEDB provider for the data-source.

@datasrc: Is the name of the datasource as interpreted by the OLEDB provider. It can be either IP address of the server or its host name.

@catalog: Is the name of the catalog. In case of SQL Server, it is the name of the database to be linked to.

Executing sp_serveroption is optional. It is used to configure linked server. You can refer msdn for details. In our case, I enable ‘data access’ to linked server. The default behavior also allows data access to linked server. So doing this is not necessary.

Sp_addlinkedsvrlogin creates or updates a mapping between logins on the local instance of Microsoft SQL Server and remote logins on the linked server. For details refer msdn. The first parameter of this stored procedure is the local name of the remote server as specified in sp_addlinkedserver. The second parameter is @useself. A value of true specifies that SQL server authenticated logins use their own credentials to connect to remote server. In case the credentials are different then use false and specify locallogin, remote login and remote password after @useself parameter as 3rd, 4th and 5th parameters.

Using this stored procedure is also optional as default behavior is to use SQL server authenticated logins’ credentials to connect to remote server.

Now after all things are properly configured, we come to actually executing a distributed query. Will it be different than executing ordinary select or DML queries? I don’t think so. There is only a slight difference. Use OPENQUERY to execute the specified pass through query on the given linked server and here we are done with executing a distributed query.

Points of Interest

Have you notice that it is not necessary that the linked server is always SQL Server. Yup you can use any OLEDB source as the data-source in distributed queries including SQL Server, Oracle, MS-Access, ODBC data-source, File system of Indexing Service, MS-Excel Sheets and IBM DB2 Databases. Wow what a long list and you can also develop your own data-source by implementing OLEDB interface. Thanks to Microsoft SQL we are able to perform operations on heterogeneous data stored in such diversified data-sources.