Wednesday, October 13, 2010

Log4Net performance issue with WCF

We ran into an interesting production problem recently that had an interesting resolution. After upgrading our application server to use WCF calls, we found that performance was lagging in production. We were able to replicate this problem in our development environments and spent a good deal of time trying to resolve the issue.  Avenues we explored included profiling calls to the databases (using Sql Server Profiler and filtering on database name and login Id), mining activity logs in the database, pouring over configuration files, utilizing WiresharkSysInternals Process Explorer , SysInternals ProcessMonitor (and filtering by processName and PID), and the Windows PerfMon (performance Monitor - viewing the defaults and various performance objects).

Interesting things I learned/discovered were:
- IE browsers (IE 7 and older) default to only two connections to the web server See this link
- It seems IE processes javascript before rendering a page while Firefox process javascript after the page is rendered in some cases
- The default for WCF connections to the database is serializable rather the read committed!
- WCF connections default to some low threasholds for ConcurrentCalls, ConcurrentSessions, and ConcurrentInstances.  These can be changed using the serviceThrottling node and attributes of a service behavior you create.  See this link for more detail
- In dos, you can filter a netstat to only show specific ports and statuses like this:
netstat /n | find "ESTABLISHED" | find "8085"
- Use /n with netstat to not resolve dns names - without /n, dns name resolution can make netstat take a while.
- In dos, you can use the 'tasklist' command to show something akin to a process list in the Task Manager
- Retrieve IIS process names from the command line by executing this command:
cscript c:\WINDOWS\system32\iisapp.vbs

In the end, the resolution to our problem was to turn off a custom implementation of Log4Net that had running.  In the process monitor we saw that log4net was re-initializing, re-reading it's configs, and re-creating all the log files for EVERY page request.  We discovered this because it seemed that there was a several second lag time between when the request was executed on the browser and when the web server actually acted on it.  Disabling log4net entirely improved our performance by a significant amount. (10's of seconds)

Update - Nov 26, 2010.  One of the reasons we were running into the issue above is the log4net configuration loader was placing a lock (being very pessimistic) on the configuration file every time it was loading.  Agreed, this solution should likely have been using a static logger - one instance for the application.  However, given the way it was implemented, I'm thinking that we could have avoided a bunch of hurt if we had made the configuration loader a little more optimistic and used:
<lockingModel type="log4net.Appender.FileAppender+MinimalLock" />

No comments: