Dr. GUI is NOT Dr. Database

.NET database and distributed computing tools

BFC | Visual Studio | Database Technology | Distributed Computing | Database Classes

The Base One Foundation Component Library (BFC)

How Base One's Scroll Cache simplifies database browsing & improves performance

>> Why does the "Scroll Cache" technology used for improving grid control performance require adding or changing indexes?

Base One's database model takes into account the special place of indexing in guaranteeing high-performance in large database applications. The index dictionary helps to clarify which queries can be done efficiently and which ones can't. Indexing, access path selection, and the judicious use of redundant data are NOT left as an after-thought, but are considered to be critical ingredients in how queries should be handled.

Scroll Cache supports scrolling forwards and backwards through a result set. It is efficient for moving back and forth a record at a time and a page at a time, as well as for going to the last record and the first one. The architecture is scalable because large-scale usage is assumed, but wrapped in a way that makes it easy to use on a small-scale.

Good performance is achieved by combining a data structure with well planned indexes and components that "force" the database system to use those indexes. Scroll Cache automatically forces index searches and prevents sorts, the single most important cause of poor performance in large database applications. Programmers specify queries in the simplest way, and these are automatically adjusted to improve performance dramatically.

>> What's so hard about programming Scroll Cache behavior yourself, if it just uses regular SQL to do its magic?

A Scroll Cache record set is a "database buffer" (area of main memory) designed to provide high efficiency scrolling of data by retrieving a batch of records at a time from the database and caching them locally on the client machine or application server. Data needed for several rapid scrolling operations, one after the other, can automatically be cached locally (by retrieving a larger block of records than is displayed at one time).

Scroll Cache releases all locks as soon as its buffer is full, which increases concurrency between multiple users. By freeing resources after every operation and by insuring that indexes are used and sorts avoided, the load on the database server can be reduced greatly.

Scroll Cache automatically constructs and executes a new query when additional data (outside of what is in the local cache) is needed to fulfill a scroll request. At the same time, Scroll Cache limits the sort order of the result set that may be requested to indexed orderings that can be generated quickly. So, programming is greatly simplified, and the database interface takes care of the details of leveraging the indexes that the programmer has described in the data dictionary.

Old stuff, but still relevant:

MSDN News magazine from Microsoft regularly features a column called "Ask Dr.GUI" where answers are given for various technical questions. The column is usually good, very informative and useful for the developer community.

However in the September/October 2001 issue of MSDN News in the column "Ask Dr.GUI #60" there is a simple question regarding the best way to browse through an SQL result set. Dr. GUI's proposed solution is both inefficient and impractical!

The question posed is important and a common one that is relevant to most database applications and illustrates why Base One's Scroll Cache database class is needed. Here's the extract from Dr. GUI's column in MSDN (Microsoft Developer Network):

Jumping Records

Dear Dr. GUI:

I am developing a Web application using Active Server Pages (ASP). I want to display a list of items that come from a database and I need to divide the list into pages. For each page, the user requests that I open an ADO recordset, divide it into pages, and send back the appropriate page. The problem in this process is that for each page view, my database (SQL Server) has to return a recordset with all records and the Internet Information Services (IIS) displays one page out of it. I was wondering if there was a better way to make SQL Server only return the records I need. I know that to get the first 25 records (page #1), I can use SELECT TOP 25, but how can I get records 26-50 (page #2)? Do I need to do SELECT TOP 50 and jump to record #26?

Dan Avni

Dr. GUI replies:

All this jumping back and forth feels like a game of hopscotch to the good doctor. Let's get back to having both feet planted firmly on the ground and answer your question.

You can certainly use the SELECT TOP 25 FROM Table for the first page of 25 rows. For the next 25 rows, if you query the database again and jump to record #26, the data might have changed by that time and record #26 may no longer be the appropriate row. Further, if you do not use the ORDER BY clause in the select statements, then the order in which the rows are listed might not be consistent if there is no clustered index on the table. Ideally, SQL Server should return the entire recordset and the selection of pages should be handled at the ADO level. If the number of rows being returned to IIS is very big, then you can use a Select Into statement to store all the rows from the initial Select statement in a temporary table. Then you can use a cursor to browse through the temporary table, in chunks of 25 rows at a time. Please note: it is important that you properly manage transactions (preferably not using a transaction in this case) and also the connection to SQL Server. Otherwise the preceding solution could lead to blocking of resources.

Flaws in the solution suggested by Dr. GUI

"You can certainly use the SELECT TOP 25 FROM TABLE …"

This SQL query will work only with Microsoft SQL Server and not with other database systems because TOP 25 is nonstandard SQL.

"Ideally, SQL Server should return the entire recordset…"

What if the entire result set consists of hundreds of thousands of records? The more users the worse this "solution" becomes. The system would crawl. If the entire result set is returned just to display a few pages every application would crawl because of network and server overload resulting from repeatedly fetching and transferring large amounts of unneeded data.

"You can use a Select Into statement…"

SELECT INTO needs an output table name, which is the start of a complex mess. What table name should be used? If a table name is programmatically created, who drops the table? If the table gets dropped after the browsing is finished, what happens if the program exits abnormally and the table does not get dropped? Using "temporary" tables leads to complexity and nonstandard SQL. Once again, the more users, the worse it gets, since each user might need to create a table using SELECT INTO. That means estimating a potentially large, disk space requirements and coding for out-of-space conditions. If the same table is shared, all users would be required to compromise on the timeliness of the data and might have to accept old data created by the very first user. You get the idea. Using SELECT INTO creates extra programming work and adds needless database overhead and disk requirements – when all you want to do is scroll around and browse data.

"Please note: it is important that you properly manage transactions…"

"… (preferably not using a transaction in this case)"

Dr.GUI warns that programmers must use great care in transaction processing settings to avoid locking up the database. Transaction processing exists to protect the integrity, reliability, and recoverability of many database applications, most of which also need to incorporate a browsing feature. Certainly, Dr. GUI can't mean that mixing transaction processing and browsing in a single application is prohibited?

Wouldn't it be better if the transaction processing settings for browsing could be taken care of automatically? Browsing though a result set usually requires a standard set of scrolling operations, such as: moving page by page, going to the last page, and skipping to a page containing a particular value. In the GUI world, users love to browse with table controls (grids) and scroll bars. In web applications, users should be able to scroll using hyperlinks, such as Next Page, Previous Page, First Page and Last Page.

In Dr.GUI's scheme, all database applications that require transaction processing also require expert custom programming to prevent slow speed and occasional deadlocks. Programmers might get to discover performance problems and bugs related to transaction processing only late in the testing process. Even worse, it may be the users who discover these problems in production. Not so with Base One's Scroll Cache components.

Scroll Cache and the ideal way to implement browsing

Now, let us examine features users need for browsing through a result set page by page:

Next Page / Previous Page feature should be present. When Next Page is clicked the next page of data in the logical order to the current page should get displayed. Similarly the Previous Page also should function.
First Page / Last Page feature should be present. If Last Page is chosen then the last page of the result set should be displayed. At any point if the First Page is chosen then the very first page of the result set should be displayed.
Next Record / Previous Record or Next Line / Previous Line feature should be present. That means the current page of data should scroll up or down by one row.
At any given time, all the rows on the whole page of data currently displayed should be in sync, a valid snapshot of the database at an instant in time. In addition, it should be easy for programmers to guarantee each page shows the most current data. That means for example if Next Page is chosen and if no action has been done on the database then the next logical page of data in a given order should get displayed. If some other users have added new data or have deleted or changed any existing data then that change should get reflected as the Next Page of data is fetched. Same thing applies all other browsing operations such as Previous Page, First Page, Last Page, Next Line, Previous Line, etc.
It should be easy for programmers to insure that no data is stored locally or in a temporary table. Proper, current data pages should be easy to obtain afresh from the database. For example if the same current page is refreshed and if no changes have occurred in the database then the same page gets displayed. If any changes such as addition of new records that belong to the current page or change or deletion of records belonging to current page have taken place then the refresh operation should reflect those changes.
The programmer should be able to easily control exactly the number of rows to be fetched during display processing. No unnecessary rows should be fetched or transmitted.
Page fetching operation whether it is first page, last page, previous or next page it should take constant time. For example fetching the last page need not have to take a long time even if the result set consists of millions of records.
No locks should be held or no resources should get blocked waiting on some user action while the browsing is in progress. Immediately after the desired page of data (not the entire result set) is obtained from the database, all resources should be released.
Browsing operation and all other database operations should take place under a valid transaction mode. Any dynamic changes that are made to the database by all the users using the same database should be visible whenever a new data page is obtained or when the current page is refreshed. Once a page is displayed, the user should be allowed to change or delete the records from the displayed page. In the meantime if any other user modifies the same record, then the user should get notified of the change. After seeing the modification done by some other user, the user should be allowed to retry the change or delete operation. Whenever a user changes or deletes a record or set of records in the current page and the transaction is committed, the current page should reflect those change when it gets refreshed. In the same way if any new records are added that belong to the current page, they should appear when the current page gets refreshed.

All of these features are completely provided by the Scroll Cache Record Set class of Base One’s Foundation Component Library (BFC). Scroll Cache is a C++ class in the Database Library that can be used by any Visual C++ application that interacts with IBM, Microsoft, MySQL, Oracle and Sybase database systems. Scroll Cache is also available with COM and .NET interfaces, with C++, C#, VB, VB.NET, ASP, and ASP.NET samples showing usage of this database component.

BFC provides efficient, elegant and easy-to-implement solutions that address the complete range of database related issues. Scroll Cache is just one example of the components that the BFC toolkit provides for all sorts of database applications, big or small, web application or client server, 2-tier or 3-tier.

Bottom line: Scroll Cache speeds up queries and simplifies programming. It greatly reduces the load on the database server by freeing resources after every database operation and by insuring indexes are used - to avoid sorts. The result is greatly increased scalability - far more simultaneous end-users without degrading performance.

BFC | Visual Studio | Database Technology | Distributed Computing | Database Classes


Home	Products	Consulting	Case Studies	Order	Contents	Contact	About Us

Copyright © 2012, Base One International Corporation