PHP strip_tags problem

I found that PHP's strip_tags function does not cleanly remove all markup from the content I was processing.  First, an anchor tag that contains a line break was not removed correctly.  Second, style information is left behind: the <style> tags themselves are stripped, but the CSS between them stays in the output.  In the following function I also added a regex to remove everything between <script> tags, though that may not be necessary in every case.

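function strip_all_tags($content)
{
    // Replace line breaks with spaces so no tag spans more than one line.
    $content = preg_replace('/\r?\n/', ' ', $content);
    // Remove script and style blocks together with their contents.
    // The i flag also matches uppercase tags; U makes .* lazy.
    $content = preg_replace('/<script.*<\/script>/iU', ' ', $content);
    $content = preg_replace('/<style.*<\/style>/iU', ' ', $content);
    // Note: strtolower() also lowercases the visible text that is returned.
    $content = strip_tags(strtolower($content));
    return $content;
}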

The function first replaces every line break with a space, so strip_tags no longer has trouble with tags that span several lines.  Since strip_tags removes the <style> and <script> tags but keeps the text between them, the regexes delete those blocks entirely before strip_tags runs.
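
To see the difference, here is a minimal sketch that runs plain strip_tags and a variant of the function above on the same input. The name strip_all_tags_demo and the sample HTML are made up for illustration, and this variant leaves out strtolower so the text keeps its original case.

```php
<?php
// Illustrative variant of the function above (hypothetical name).
function strip_all_tags_demo(string $content): string
{
    // Replace line breaks with spaces so no tag spans more than one line.
    $content = preg_replace('/\r?\n/', ' ', $content);
    // Drop script and style blocks including their contents
    // (i = case-insensitive, s = dot matches newlines, .*? = lazy match).
    $content = preg_replace('/<script\b.*?<\/script>/is', ' ', $content);
    $content = preg_replace('/<style\b.*?<\/style>/is', ' ', $content);
    return strip_tags($content);
}

// Sample input: a style block plus an anchor tag broken across two lines.
$html = "<style>p { color: red; }</style><p>Hello <a\nhref=\"#\">world</a></p>";

echo "strip_tags only:     [", strip_tags($html), "]\n";
echo "strip_all_tags_demo: [", strip_all_tags_demo($html), "]\n";
```

With plain strip_tags the CSS rule should still show up in the output, while the variant leaves only the visible text.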

MySQL on 32-bit processor vs 64-bit processor

This is a continuation of the earlier experiment on MySQL on Windows vs Linux. After more analysis and observation, it appeared that the difference in benchmark speed was related not to Windows and Linux but to the processor. The previous test just happened to have 64-bit AMD processors in the Linux machines and 32-bit Intel Xeons in the Windows machines. We hypothesized that the higher throughput of the 64-bit processors let them return the query in about half the time a 32-bit machine needed.

View the previous results

So we decided to run the same query again on a Windows machine with an AMD 64 X2. This time the query took 7.35 seconds, almost the same as the other AMD 64s running Linux.

One more test, running the query on a Windows machine with a 64-bit Intel chip, would further support our hypothesis.

MySQL server has gone away

I have a web spider written in PHP that runs constantly, inserting and updating data in a MySQL database.  Today it kept stopping on one record with the error message “MySQL server has gone away”.  It was clearly not a timeout, because restarting the process produced the same message immediately.  Nor was the script closing the connection to the DB itself, as it had worked just fine for over a year.

Finally, I found that the max_allowed_packet setting in my.cnf was too small.  It was set to 2MB, and when MySQL receives a query larger than that limit, it assumes something has gone wrong and closes the connection.  I increased the setting to 4MB and everything is working fine now.
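
A crawler could also guard against this before sending a query. The sketch below is only an illustration: the helper name, the `pages` table, and the 1KB of headroom are assumptions, not part of the original spider. It compares the byte length of the SQL text with the packet limit.

```php
<?php
// Hypothetical helper: true when the SQL text is safely below the
// server's max_allowed_packet limit (headroom is an assumed safety margin).
function query_fits_packet(string $sql, int $maxAllowedPacket, int $headroom = 1024): bool
{
    // strlen() counts bytes, which is what the packet limit measures.
    return strlen($sql) + $headroom <= $maxAllowedPacket;
}

// The live limit can be read from the server with:
//   SHOW VARIABLES LIKE 'max_allowed_packet';
$limit = 4 * 1024 * 1024; // 4MB, the value used above

// Made-up queries against a hypothetical "pages" table.
$small = "UPDATE pages SET body = 'short text' WHERE id = 1";
$large = "UPDATE pages SET body = '" . str_repeat('x', 5 * 1024 * 1024) . "' WHERE id = 1";

var_dump(query_fits_packet($small, $limit)); // bool(true)
var_dump(query_fits_packet($large, $limit)); // bool(false)
```

When the check fails, the script can skip or truncate the record and log it, instead of losing the connection.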

MySQL on Windows vs Linux

I was working on optimizing a MySQL database today and accidentally stumbled into a benchmarking exercise. The original MySQL database is hosted on Windows Server 2003. I develop against a MySQL server running on Ubuntu Linux. The Linux server ran the same query twice as fast as the Windows server, with no query caching involved. I know that Linux also does its own I/O caching, so I even tried running the query after a fresh reboot to rule out that factor. Then I got help from a colleague and we started tweaking my.cnf/my.ini to make sure the settings were the same, and each time the results came back about the same. MySQL on Windows was consistently slower than MySQL on Linux.

Then we decided to load the database onto other servers to get more data points for this MySQL performance test. We ran the same query, which returns 429 rows of data and uses 13 table joins and a couple of sub-queries. All queries were run from the command line client on the servers themselves to avoid network lag.  All servers are running MySQL 5.0.x.

Results

1. Ubuntu Linux: 0.70 seconds
2. CentOS: 0.78 seconds
3. Windows 2003 Server: 1.40 seconds
4. Windows 2003 Server: 1.42 seconds
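
To make the gap easier to read, the timings can be expressed relative to the fastest server. This is a small sketch using only the numbers listed above; the helper name and the array labels are made up for illustration.

```php
<?php
// Query times in seconds, copied from the results list above.
$times = [
    'Ubuntu Linux'            => 0.70,
    'CentOS'                  => 0.78,
    'Windows 2003 Server (3)' => 1.40,
    'Windows 2003 Server (4)' => 1.42,
];

// Hypothetical helper: how many times slower $seconds is than $baseline.
function relative_slowdown(float $seconds, float $baseline): float
{
    return $seconds / $baseline;
}

$fastest = min($times);
foreach ($times as $server => $seconds) {
    // Print server name, raw time, and time relative to the fastest server.
    printf("%-24s %.2fs  %.2fx\n", $server, $seconds, relative_slowdown($seconds, $fastest));
}
```

Both Windows servers come out at roughly twice the time of the Ubuntu machine, while CentOS is only about 11% slower.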

Server hardware

1. Ubuntu Linux
AMD ATHLON 64 X2 4200+
2GB DDR400
200GB 7200RPM SATA/150

2. CentOS
Dual Opteron 240
2GB DDR ECC
120GB 7200RPM SATA/300

3. Windows 2003 Server
Dual Xeon
2GB DDR ECC
7200RPM PATA

4. Windows 2003 Server
2 Dual Xeon (4 CPUs)
8GB DDR ECC
3 73GB 10,000RPM SCSI in RAID 5

MySQL SQL_NO_CACHE Benchmark Problem

I have been trying to benchmark the SQL calls an application makes to MySQL, without success. Even when I added SQL_NO_CACHE to the SELECT statement, the results still seemed to be cached. Apparently, Linux does its own caching of disk reads as well, which gave me misleading benchmark results whenever I hit the same data twice.

For those using Linux kernel 2.6.16 or higher, you can clear the Linux cache by running the following command as root.
echo 3 > /proc/sys/vm/drop_caches

Unfortunately for me, I am stuck on 2.5.xx and will have to figure out another method.
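
A benchmark script could check the kernel version before relying on drop_caches. The following is a rough sketch, assuming the script runs on the Linux machine being tested; the helper name is made up for illustration.

```php
<?php
// Hypothetical helper: true when the kernel release is 2.6.16 or newer,
// the version named above as the minimum for /proc/sys/vm/drop_caches.
function kernel_supports_drop_caches(string $release): bool
{
    // Keep only the leading numeric part, e.g. "2.6.18" from "2.6.18-92.el5".
    if (!preg_match('/^\d+(\.\d+)*/', $release, $m)) {
        return false;
    }
    return version_compare($m[0], '2.6.16', '>=');
}

// php_uname('r') returns the release of the kernel PHP is running on.
var_dump(kernel_supports_drop_caches(php_uname('r')));
var_dump(kernel_supports_drop_caches('2.5.75')); // bool(false)
var_dump(kernel_supports_drop_caches('2.6.16')); // bool(true)
```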

Open Source Is Much More Serious These Days

If you still think open source applications are just the wild west without any sort of structured support, think again. The Drupal Conference happening in Boston is an example of how organized open source has become. No longer are conventions just for Microsoft, IBM, Oracle, etc.

Web Startup Taking Feature Away From Basic Membership

I have been using Biznik for a while and have been pleased with their features and the community they have built. The changes they made to the site and services had been well thought out and fair to users, until the latest one, which took away the ability to see who has viewed your profile. That feature had been available to all members, including the basic free membership, since day one, but they decided to remove it in order to push basic members to upgrade to a paid membership. The change stirred up criticism from many viewpoints, ranging from “features should never be taken away” to “a business can change the features it offers at any time in order to be profitable”. One interesting and important point was that taking away a feature that helps both paying and non-paying members connect is always a bad thing, because it undermines the building of a community. Since the community is the value Biznik offers, they are in fact lowering their value proposition to all members by removing this feature. I am sure it was a difficult decision for them: they must increase revenue to stay afloat, but in doing so they might reduce their value and stifle growth. A rough estimate of their current annual revenue is $77,000, which means they are certainly just squeaking by with three people working and technology costs to cover.

Server Sporadically Responding To Pings

Mystery solved.  One of our servers became very sluggish in responding to requests.  When I pinged the server continuously, about 10% of the pings timed out.  It turned out that the NIC was malfunctioning.  I replaced the Ethernet card and all is well again.

zen cart Warning: session_start() No such file or directory (2) in /dir/public_html/includes/functions/sessions.php on line 102

The error comes up when trying to use file-based session storage instead of database storage. Apparently, the directory used to store sessions is not set in the configuration files but in the database table “configuration”.

If you upgrade PHP to version 5.2.1, the database session storage breaks, so I had to switch to file-system-based storage. The catch was that I could no longer log into the admin area because the session could not start, and the configuration.php files did not define the session directory.

The way to fix this is to log into your database and update the session directory record.

UPDATE configuration SET configuration_value='/your/new/directory/' WHERE configuration_title='session directory';
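
Before pointing the store at a new directory, it may help to confirm that PHP can actually write there, since a missing or read-only directory gives the same session_start() warning. This is a minimal sketch: the helper name is made up, and the path is the placeholder from the statement above. It should be run as the same user the web server runs as.

```php
<?php
// Hypothetical helper: true when $dir exists and PHP can write to it.
function session_dir_is_usable(string $dir): bool
{
    return is_dir($dir) && is_writable($dir);
}

$dir = '/your/new/directory/'; // placeholder path, replace with the real one
if (session_dir_is_usable($dir)) {
    echo "OK: $dir can be used for session files\n";
} else {
    echo "Problem: $dir is missing or not writable by PHP\n";
}
```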

System Specs:
PHP 5.2.1
Zen Cart 1.3.5
Apache 1.3.37
MySQL 5.0.45

.NET Error “The password provided is invalid. Please enter a valid password value.”

When using the SqlMembershipProvider, you may encounter the error message “The password provided is invalid. Please enter a valid password value.”, which does not say what the actual problem is. Apparently, the default settings in the .NET membership configuration require a password of at least seven characters (minRequiredPasswordLength) with at least one non-alphanumeric character (minRequiredNonalphanumericCharacters), such as one of “!@#$%^&*()”.