Author Archive

Enabling Danish for SQL Server FullText

Monday, August 9th, 2010

SQL Server FullText lets you search large amounts of text quickly, and it is easy to use in applications that need string searching. It hasn’t changed much since SQL Server 2000.
A simple getting-started tutorial can be found on Code Project.

The Danish, Polish and Turkish wordbreaker and stemmer implementations for SQL Server FullText are not developed by Microsoft and are therefore not enabled by default. The libraries are, however, installed together with SQL Server and are therefore present on disk.

To make use of the Danish language capabilities in SQL Server 2008, register the libraries in the registry and reload the FullText languages:

  1. Download & run the DanishFulltext.reg file on the server. It registers the wordbreaker, the stemmer and the default location of the thesaurus XML file.
  2. Run exec sp_fulltext_service 'update_languages' in Management Studio.

Now verify that Danish is enabled with this query: SELECT name FROM sys.fulltext_languages

Note: DanishFullText.reg assumes that SQL Server is installed as a default instance (not a named instance). For a named instance, modify the file by changing MSSQL10.MSSQLSERVER to MSSQL10.<instance name>.
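As a rough sketch of that edit, the replacement can be scripted. This assumes the named instance uses the installer's default instance ID pattern MSSQL10.<instance name>. The helper name retarget_reg_text and the sample key path are illustrative assumptions and are not taken from the actual .reg file.

```python
DEFAULT_INSTANCE_ID = "MSSQL10.MSSQLSERVER"


def retarget_reg_text(reg_text: str, instance_name: str) -> str:
    """Return reg_text with the default instance ID swapped for a named instance.

    Assumes the named instance's ID is "MSSQL10." + instance_name, which is
    the installer default; a custom instance ID would need a different value.
    """
    name = instance_name.strip()
    if not name:
        raise ValueError("instance_name must not be empty")
    return reg_text.replace(DEFAULT_INSTANCE_ID, "MSSQL10." + name)


# Hypothetical sample line; the key paths in the real file may differ.
sample = r"[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSearch\Language\dan]"
print(retarget_reg_text(sample, "SQL2008"))
```

The function only works on text, so reading and writing the file (including its encoding) is left to the caller.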

The same applies to Polish and Turkish – they are not registered by default. See more in the MSDN article How to: Load Licensed Third-Party Word Breakers.

Languages supported by SQL Server 2008 FullText out of the box: Arabic, Bengali (India), Brazilian, British English, Bulgarian, Catalan, Chinese (Hong Kong SAR, PRC), Chinese (Macau SAR), Chinese (Singapore), Croatian, Danish, Dutch, English, French, German, Gujarati, Hebrew, Hindi, Icelandic, Indonesian, Italian, Japanese, Kannada, Korean, Latvian, Lithuanian, Malay – Malaysia, Malayalam, Marathi, Neutral, Norwegian (Bokmål), Polish, Portuguese, Punjabi, Romanian, Russian, Serbian (Cyrillic), Serbian (Latin), Simplified Chinese, Slovak, Slovenian, Spanish, Swedish, Tamil, Telugu, Thai, Traditional Chinese, Turkish, Ukrainian, Urdu, Vietnamese.

Java 4-ever

Sunday, July 4th, 2010

I find this video hilarious…

You should use the best tools at hand to solve the problem. That said, choosing between Java and .Net doesn’t really matter in most cases. There are, however, some areas where Java is the better choice and vice versa.

I can’t wait to see it in the cinema :-)

PS. I do develop with Java even though I do not blog much about it.

Update: YouTube removed the video due to copyright claims. You can still see it at JavaZone.

Meeting the SQL Azure Development Team

Wednesday, June 30th, 2010

Last week I was at Microsoft HQ in Redmond, WA, USA. I was invited by the SQL Azure Development Team to look at some of the new unreleased features and comment on features in their roadmap.

Unfortunately, most of the content was confidential and I am under NDA, so I may not disclose any details. Sorry :-/

During the week with the SQL Azure Development Team I was fortunate to be engaged in detailed technical discussions about some of the upcoming feature releases – mainly the SQL Server features not currently available in SQL Azure. It was interesting and enlightening to discuss their technical challenges and why they have built SQL Azure the way they have.

All in all, my conclusion after this event is that Microsoft takes SQL Azure seriously and it will become a major player in the RDBMS world. It will not just be a SQL Server in the cloud, but a separate product with different market segments and different features. I am looking forward to a bright future with SQL Azure :-)

Check for breaking changes in APIs

Tuesday, June 8th, 2010

Have you ever had the need to compare interfaces of two versions of the same framework?

If you have, then ApiChange is a tool for you. It’s open source, powerful and easy to use :-)

I gave it a spin comparing current trunk version 2.9.2 of Lucene.Net with the latest official release version 2.4.0.

I downloaded ApiChange and ran the following command in a command prompt:

ApiChange.exe -Diff -old C:\Lucene.Net_2_4_0\Lucene.Net.dll -new C:\trunk\Lucene.Net.dll

The output lists all the differences, but here is a summary:

  • 23 public types were removed
  • 96 public types were added
  • 158 public types were changed

Cool little tool with other features such as:

  • Diff public types for breaking changes.
  • Who uses a method?
  • Who uses a type?
  • Who implements an interface?
  • Who references me?
  • What format has the binary (32/64, Managed C++, Pure IL, Unmanaged)?
  • Search for all event subscribers and unsubscribers.

It’s based on Mono Cecil – a free IL parser – and not on reflection as I initially thought. Go check it out…

Levels of reuse in Software Development

Tuesday, June 1st, 2010

One of the promises of object-orientation is reuse. Developing new software systems is expensive, and maintaining them is even more expensive. Reuse is therefore sensible from both a business and a technology perspective.

With assistance of Erich Gamma, I have identified four levels of reuse.

First level of reuse: Copy/paste

Duplicating code or functionality makes it easy to reuse. It’s a real timesaver at first, but keeping all the duplicates up-to-date and maintaining them is a horrifying task. Not to mention the problems when you forget to update one or more duplicates…

“Copy and paste programming is a pejorative term to describe highly repetitive computer programming code apparently produced by copy and paste operations. It is frequently symptomatic of a lack of programming competence, or an insufficiently expressive development environment, as subroutines or libraries would normally be used instead. In certain contexts it has legitimate value, if used with care.” Wikipedia

Second level of reuse: Class libraries

Reuse at class level or a set of classes in a software library is common and also fairly easy with object-oriented languages.

“Libraries contain code and data that provide services to independent programs. This allows the sharing and changing of code and data in a modular fashion. Some executables are both standalone programs and libraries, but most libraries are not executables …” Wikipedia

Third level of reuse: Design Patterns

Patterns allow you to reuse design ideas and concepts independent of concrete code.

“In software engineering, a design pattern is a general reusable solution to a commonly occurring problem in software design. A design pattern is not a finished design that can be transformed directly into code. It is a description or template for how to solve a problem that can be used in many different situations. Object-oriented design patterns typically show relationships and interactions between classes or objects, without specifying the final application classes or objects that are involved.” Wikipedia

Fourth level of reuse: Frameworks

A framework is an object-oriented abstract design for solving a specific problem. Frameworks are often very specialized, like unit testing frameworks and object-relational mapping frameworks, but they can also be large, complex or domain specific.

“A software framework … is an abstraction in which common code providing generic functionality can be selectively overridden or specialized by user code providing specific functionality. Frameworks are a special case of software libraries in that they are reusable abstractions of code wrapped in a well-defined API, yet they contain some key distinguishing features that separate them from normal libraries.” Wikipedia
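To make the "selectively overridden" idea concrete, here is a minimal sketch in Python of a toy framework. The class and method names (Check, prepare, verify, clean_up) are made up for illustration and do not come from any real unit testing framework. The point is only that the framework owns the control flow in run(), while user code fills in the hooks.

```python
from abc import ABC, abstractmethod


class Check(ABC):
    """Toy framework base class: run() fixes the order of the steps."""

    def run(self) -> str:
        self.prepare()
        try:
            self.verify()
            return "passed"
        except AssertionError:
            return "failed"
        finally:
            self.clean_up()

    def prepare(self) -> None:
        """Optional hook; does nothing unless overridden."""

    def clean_up(self) -> None:
        """Optional hook; does nothing unless overridden."""

    @abstractmethod
    def verify(self) -> None:
        """Mandatory hook supplied by user code."""


class AdditionCheck(Check):
    """User code: only the specific behaviour is written here."""

    def verify(self) -> None:
        assert 1 + 1 == 2


print(AdditionCheck().run())
```

With a class library the calls go the other way: user code decides when to call the library.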

It’s all about being pragmatic – not all software will reach the fourth level of reuse and be structured as frameworks, and frankly it shouldn’t. That said, copy/paste style development is unquestionably the wrong path.

What level is your company at?

Ageing pictogram

Wednesday, May 19th, 2010

I’m in Prague, Czech Republic, for Apache Lucene EuroCon 2010. While wandering around, I saw this drawing on a house wall.

I find it hilarious – especially the natural shadow over the coffins. It was pure coincidence that I was there at the time of day when the doorway cast its shadow over the coffins :-)

Finding Missing Indexes with SQL Server DMVs

Monday, May 10th, 2010

Some time ago I wrote about easy index wins for SQL Server 2005.

SQL Server maintains statistics about indexes you should consider creating. This time I’ll show you a query against the DMVs (Dynamic Management Views) that lists index candidates. This method works for SQL Server 2005 SP2 and later versions.

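The query is based on three DMVs and returns index candidates where the calculated improvement measure is greater than 10. The measure is an estimate-based score (average query cost × estimated impact × number of seeks and scans) rather than a literal percentage: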

SELECT
  migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure_pct,
  QUOTENAME(db_name(mid.database_id)) AS [database],
  QUOTENAME(OBJECT_SCHEMA_NAME(mid.object_id, mid.database_id)) AS [schema],
  QUOTENAME(OBJECT_NAME(mid.object_id, mid.database_id)) AS [table],
  'CREATE INDEX [missing_index_' + CONVERT(varchar(64), NEWID()) + ']'
  + ' ON ' + mid.statement
  + ' (' + ISNULL (mid.equality_columns, '')
  + CASE
      WHEN mid.equality_columns IS NOT NULL
        AND mid.inequality_columns IS NOT NULL THEN ','
      ELSE ''
    END
  + ISNULL(mid.inequality_columns, '')
  + ')'
  + ISNULL(' INCLUDE (' + mid.included_columns + ')', '')
    AS create_index_statement,
  migs.*,
  mid.database_id,
  mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
  INNER JOIN sys.dm_db_missing_index_group_stats migs
    ON migs.group_handle = mig.index_group_handle
  INNER JOIN sys.dm_db_missing_index_details mid
    ON mig.index_handle = mid.index_handle
WHERE
  migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) *
    (migs.user_seeks + migs.user_scans) > 10
ORDER BY
  migs.avg_total_user_cost * migs.avg_user_impact *
    (migs.user_seeks + migs.user_scans) DESC

It is important to note that these are only candidates and that the improvements are based on estimates. The estimated improvement does not take into account the extra disk space required or the cost of maintaining the indexes during updates, inserts and deletes. Furthermore, it makes no recommendation about whether to use clustered or non-clustered indexes.
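To get a feel for how the filter in the query behaves, the arithmetic can be reproduced outside the database. The following is a small Python sketch. The field names mirror the DMV columns used in the query, while the class name, the function names and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MissingIndexStats:
    avg_total_user_cost: float  # average cost of the queries that would benefit
    avg_user_impact: float      # estimated benefit in percent (0-100)
    user_seeks: int
    user_scans: int


def improvement_measure(s: MissingIndexStats) -> float:
    """Same expression as the first column and the WHERE clause of the query."""
    return s.avg_total_user_cost * (s.avg_user_impact / 100.0) * (s.user_seeks + s.user_scans)


def is_candidate(s: MissingIndexStats, threshold: float = 10.0) -> bool:
    """Mirrors the '> 10' filter in the WHERE clause."""
    return improvement_measure(s) > threshold


# Hypothetical rows: an expensive, frequently used query and a cheap, rare one.
hot = MissingIndexStats(avg_total_user_cost=12.5, avg_user_impact=80.0, user_seeks=40, user_scans=10)
cold = MissingIndexStats(avg_total_user_cost=0.5, avg_user_impact=20.0, user_seeks=3, user_scans=0)

for row in (hot, cold):
    print(round(improvement_measure(row), 2), is_candidate(row))
```

Because the measure multiplies cost by usage count, a cheap query that runs very often can outrank an expensive query that runs rarely.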

This blog post is inspired by Bart Duncan’s Are you using SQL’s missing index DMVs?

Visual Studio 2010 keyboard shortcuts

Sunday, May 2nd, 2010

I am fond of keyboard shortcuts and wish I was a keyboard-shortcut-ninja and didn’t have to rely on the mouse all the time.

Using keyboard shortcuts boosts productivity and is ergonomically the better choice, as it reduces the risk of getting tennis elbow/mouse elbow.

Source of keyboard shortcuts for Visual Studio 2010:

Removing SVN folders with PowerShell

Saturday, April 24th, 2010


I needed to remove the .svn folders from an existing Visual Studio solution that a customer emailed me, so I could commit it to another SVN repository.

If I had access to the original SVN repository, I could have used the export function, as it does not include the .svn folders – but no, it should not be that easy.

What the heck, I have been putting off working with PowerShell for way too long. It should be a familiar environment, as it is object-oriented with a C#-like syntax and full access to the .Net Framework Base Class Libraries (BCL).

Here it goes – my first PowerShell script…

function RemoveSvnFolders([string]$path)
{
    Write-Host "Removing .svn folders in path $path recursively"

    Get-ChildItem $path -Include ".svn" -Force -Recurse |
        Where { $_.PSIsContainer -eq $true } |
        Foreach {
            # Delete the .svn folder and everything below it
            Remove-Item $_.FullName -Force -Recurse
        }
}

The Write-Host Cmdlet just writes the content to the console window.

If you are like me, a PowerShell novice – start with the Getting Started with Windows PowerShell article and use the free tool PowerGUI from Quest Software. It’s a PowerShell IDE with an integrated syntax-highlighting editor and debugger.

In line 5 the Get-ChildItem Cmdlet iterates the path recursively and filters the result to include only “.svn” files and folders. The Force parameter allows the cmdlet to get items that cannot otherwise be accessed by the user, such as hidden or system files. The Get-ChildItem Cmdlet can also iterate the registry.

Afterwards the result from the Get-ChildItem Cmdlet is piped to the Where-Object Cmdlet (Where is an alias for Where-Object). PSIsContainer is a property that is true for folders. Items for which it is true are passed on to the next stage of the pipeline. I could have written the following instead:

Where {$_.mode -match "d"}

Use the statement below to list all properties for the files and folders in the current folder:

Get-ChildItem | format-list -property *

Foreach (here an alias for the ForEach-Object Cmdlet) iterates over every item and deletes the folder with the Remove-Item Cmdlet.

Calling the function is as simple as:

RemoveSvnFolders "c:\svn\My Solution"
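For comparison, the same traversal can be sketched in Python. This is only an illustration of the idea (walk the tree, delete every folder named .svn) and not a replacement for the PowerShell script. The function name is made up, and unlike Remove-Item -Force this sketch does not clear read-only attributes, so it may fail on read-only files.

```python
import os
import shutil


def remove_svn_folders(root: str) -> list:
    """Delete every folder named .svn below root and return the removed paths."""
    removed = []
    for current, dirnames, _filenames in os.walk(root, topdown=True):
        if ".svn" in dirnames:
            target = os.path.join(current, ".svn")
            shutil.rmtree(target)
            # Prune the folder from the walk so os.walk does not try to enter it.
            dirnames.remove(".svn")
            removed.append(target)
    return removed
```

Calling remove_svn_folders(r"c:\svn\My Solution") would mirror the PowerShell call above.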

TechNet has a myriad of articles, starting from the root Windows PowerShell Core, as well as more task-oriented ones like A Task-Based Guide to Windows PowerShell Cmdlets and Piping and the Pipeline in Windows PowerShell.

Remove SVN folders PowerShell Script.

Happy PowerShelling… :-)

Miracle Open World 2010 Lucene Presentation

Tuesday, April 20th, 2010

The conference is over and it was a great success. I met a lot of new people and had lots of technical discussions about .Net, graph databases, freetext search, SQL Server, Oracle Service Bus, debugging with WinDbg and extensions.

The slides and demo code for my Lucene session are available here:

My session “Making freetext search with Lucene.Net work for you” abstract:

Lucene is an open source, full-featured text search engine library that makes searching in large amounts of text lightning fast. Lucene is in use by many large sites like Wikipedia, LinkedIn, MySpace etc.

It is easy to get started with Lucene, but there are many pitfalls… In this session you will learn about the dos and don’ts for indexing and searching, tools, scaling, new features in version 2.9 and some of the more advanced features.

This presentation will use the Microsoft .Net implementation of Lucene named Lucene.Net, but its content also applies to other ported versions of Lucene.