Split String in MySQL

The easiest way to split string based on the delimiter is using loop. But it is also the most inefficient way. The following is to demonstrate how to use a utility table number and a scalar function to split a string. It is much faster than loop.

— First create the utility table number:

CREATE TABLE IF NOT EXISTS number (
number int not null,
PRIMARY KEY (number)
) ENGINE=InnoDB ROW_FORMAT=COMPACT
COMMENT ‘Utility table with continuous numbers’;

— populate number table with numbers
INSERT INTO number(number)
SELECT @row := @row + 1 as number
FROM (select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t2,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t3,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t4,
(select 0 union all select 1 union all select 3 union all select 4 union all select 5 union all select 6 union all select 6 union all select 7 union all select 8 union all select 9) t5,
(SELECT @row:=0) r

The code above populates 100000 rows (1 – 100000) into the table number. The query cross joins 5 subqueries, each has 10 rows built by the UNION ALL.  The (SELECT @row”=0) just initialize the session variable @row to 0. We can remove it from this INSERT query  by  initializing it ahead of it, e.g. SET @row=0;

— Let’s create the scalar function:

DELIMITER $$
DROP FUNCTION IF EXISTS ufn_split_string
$$

CREATE FUNCTION ufn_split_string(
x text,
delim VARCHAR(12),
pos INT
)
RETURNS VARCHAR(255)
LANGUAGE SQL
DETERMINISTIC
RETURN REPLACE(SUBSTRING(SUBSTRING_INDEX(x, delim, pos),
CHAR_LENGTH(SUBSTRING_INDEX(x, delim, pos -1)) + 1),
delim, ”)
$$

The function returns the Nth (specified by parameter pos) delimited substring in the string x. The delimiter is delim. It uses SUBSTRING_INDEX MySQL built-in function, which returns a substring from a string before the specified number of occurrences of the delimiter. The scalar function calls SUBSTRING_INDEX twice to get the Nth and and (N-1)th occurrence of the delimiter and REPLACE the latter one with empty string in the first one so we get the substring delimited by (N-1)th and Nth occurrence of the delimiter.

To split a delimited string, we need to use the scalar function ufn_split_string and the number table together:

SET @str=’AA,BB,cc,DD,ee’;
SELECT a.val FROM (
SELECT CAST(ufn_split_string(@str,’,’,n.number) AS char(255)) AS val
FROM number n
WHERE n.number<=(CHAR_LENGTH(@str)/2)) a
where a.val<>”;

it returns:

val
AA
BB
cc
DD
ee

If you know the number of items in the string, you can replace (CHAR_LENGTH(@str)/2) with the real number.

Advertisements
Posted in MySQL | Leave a comment

String concatenation gotcha

In SQL Server, there are a few ways to concatenate string values from a column in a table, e.g. using FOR XML TYPE method. The following example is another method we all use:

USE tempdb;

GO

CREATE TABLE MyTestTable (id int not null identity, col1 nvarchar(30) NOT NULL, col2 nvarchar(20) not null);
INSERT MyTestTable(col1,col2) VALUES (‘aaa’,’aa’),(‘bbbb’,’bb’),(‘cccc’,’cc’),(‘dddd’,’dd’);
GO

DECLARE @str nvarchar(max);
SET @str=”;
SELECT @str=@str+col1+’,’ FROM MyTestTable ORDER BY id;
SELECT @str;

SET @str=”;
SELECT @str=@str+col1+’,’ FROM MyTestTable ORDER BY col1;
SELECT @str;
GO

— We take this as granted. But does it always work?

— It does NOT work if ordered by a computed column in the subquery
DECLARE @str nvarchar(max);
SET @str=”;
SELECT @str=@str+tmpcol+’,’ FROM (SELECT col2, col1+CAST(id as nvarchar(20)) AS tmpcol FROM MyTestTable) t ORDER BY t.tmpcol;
SELECT @str;
SET @str=”;
SELECT @str=@str+tmpcol+’,’ FROM (SELECT col2, col1+’_’+col2 AS tmpcol FROM MyTestTable) t ORDER BY tmpcol;
SELECT @str;
GO

— It does NOT work if ordered by a computed column in the subquery
— even just data type change

DECLARE @str nvarchar(max);
SET @str=”;
SELECT @str=@str+tmpcol+’,’ FROM (SELECT col2, CAST(id as nvarchar(20)) AS tmpcol FROM MyTestTable) t ORDER BY tmpcol;
SELECT @str;
SET @str=”;
— even value not changed at all
SELECT @str=@str+tmpcol+’,’ FROM (SELECT col2, col1+” AS tmpcol FROM MyTestTable) t ORDER BY tmpcol;
SELECT @str;
GO
— It works if ordered by a physical column in the subquery
DECLARE @str nvarchar(max);
SET @str=”;
SELECT @str=@str+tmpcol+’,’ FROM (SELECT col2, col1+CAST(id as nvarchar(20)) AS tmpcol FROM MyTestTable) t ORDER BY col2;
SELECT @str;
GO

— It is the same result if change the table MyTestTable to have PK on any column.
— Tested on SQL Server 2008R2 RTM Developer Edition

Posted in SQL server | Leave a comment

T-SQL Suggestion: Support GROUP_CONCAT

MySQL has a handy aggregation function GROUP_CONCAT to concatenate string when grouping data. AFAIK, SQL Server lacks the similar functionality. So I created a MS connect item for this feature support.

Please vote at: https://connect.microsoft.com/SQLServer/feedback/details/764820/support-group-concat-aggregation-function

Posted in SQL server | Leave a comment

Change MySQL data folder on SELinux

When installing MySQL on SELinux (Security Enhanced Linux), you may get the following errors in mysqld.log if you

changed the default data directory for mysql database even you granted all necessary privileges to the user mysql:
111206  1:46:00 [Warning] Can’t create test file /data/mysql/devmysql.lower-test
111206  1:46:00 [Warning] Can’t create test file /data/mysql/devmysql.lower-test
/usr/libexec/mysqld: Can’t change dir to ‘/data/mysql/’ (Errcode: 13)
111206  1:46:00 [ERROR] Aborting

This is because on SELinux such as CentOS 6, all processes and files are labeled in a way that represents

security-relevant information. This information is called the SELinux context. For files, this is viewed using the

ls -Z command:
$ ls -Z file1
-rw-rw-r–  user1 group1 unconfined_u:object_r:user_home_t:s0 file1

In this example, SELinux provides a user (unconfined_u), a role (object_r), a type (user_home_t), and a level (s0).

This information is used to make access control decisions. On DAC systems, access is controlled based on Linux user

and group IDs. SELinux policy rules are checked after DAC rules. SELinux policy rules are not used if DAC rules

deny access first.

So when we change the mysql datadir to a different folder, besides granting the access permission for the mysql

user, we also need to  change the lable of the new folder.

First we need to know what it is the correct labeling using -Z command on the default data dir:
[root@xxx ~]# ls -lh -Zd /var/lib/mysql
drwxr-xr-x. mysql mysql system_u:object_r:mysqld_db_t:s0 /var/lib/mysql

Now change the label for the new location:
chcon -R -u system_u -r object_r -t mysqld_db_t /data/mysql

To verify the lable is changed:
ls -lh -Zd /data/mysql
drwxr-xr-x. mysql mysql system_u:object_r:mysqld_db_t:s0 /data/mysql

Note the mysql config file has a different lable, do not use that one for the data dir.

[root@xxx ~]# ls -lh -Z /etc/my.cnf
-rw-r–r–. root root system_u:object_r:mysqld_etc_t:s0 /etc/my.cnf

If binlog file is put in a different location (e.g. /mysql-log/mysql), we also need to change the label for the

root folder (/mysql-log) and the subfolder /mysql-log/mysql.

Now we can install the system tables:
mysql_install_db –datadir=/data/mysql –user=mysql

 

Posted in Linux, MySQL | 10 Comments

Backup related files with Databases by ZRM for MySQL

Zmanda Recovery Manager for MySQL (ZRM for MySQL) is a robust backup and recovery solution for MySQL server. It can also be used to backup files related to databases. In the backup command mysql-zrm-backup,
it has an option to specify a list of files to be backedup togetehr with the databses in the backup set:

 mysql-zrm-backup
                  [–all-databases] | [–databases “dbname1 dbname2″…] |
                  [–database “name” [–tables tname1 tname2 …]]
                  [–exclude-pattern pattern]
                  [–routines | –no-routines]
                  [–mysql-binpath MySQL_command_dir]
                  [–mysql-binlog-path MySQL_binary_log_dir]
                  [–config-file-list=”config_file_pathname”]

The –config-file-list=”config_file_pathname” is used to specify the file that contains the files on the database server to be backed up.

1) The “config_file_pathname” is the full path name on the ZRM Server, not on the ZRM client (i.e. the database server is being backed up), e.g. /etc/mysql-zrm/server1-backup/files_to_backup.txt
2) In the file, each line is a full path file name to be backed up on the database server, e.g.:
/etc/my.cnf
/var/lib/archive_files/arc_20110101.dat
/var/lib/archive_files/arc_20110102.dat
/var/lib/archive_files/arc_20110103.dat

3) ZRM for MySQL preserves the permission and timestamp etc when backing up the files sepcified in the config file. So the id of the owner and the id of the group of the files must be identical on both ZRM clients and ZRM server.
e.g. if the owner (id=27) and group (id=27) of /etc/my.cnf on server1 is mysql, the user/group of mysql on the ZRM server must have the same id;

4) The account used to run ZRM must be able to access those files;

Posted in MySQL | Leave a comment

Inconsistency in view column data type

When we crate views in MySQL, the data types of the view columns should be derived from the base tables. However MySQL sometimes goes “extra miles”.

USE test;

DROP TABLE IF EXISTS property;

CREATE TABLE property (user_id int not null, property_id int NOT NULL, property_value varchar(512) NOT NULL, PRIMARY KEY CLUSTERED(user_id,property_id));

INSERT property (user_id,property_id,property_value)
VALUES (1,1,’21’),(1,2,’M’),(1,3,’aaa@test.com’),
(2,1,’34’),(2,2,’F’),(2,3,’fff@test.com’),
(3,1,’28’),(3,2,’M’),(3,3,’bbb@test.com’);

— THe view is basically pivot the table and give meaningful column name
DROP VIEW IF EXISTS person;

CREATE VIEW person
AS
SELECT user_id
, MAX(CASE WHEN t.property_id=1 THEN t.property_value ELSE NULL END ) AS age
, MAX(CASE WHEN t.property_id=2 THEN t.property_value ELSE NULL END ) AS gender
, MAX(CASE WHEN t.property_id=3 THEN t.property_value ELSE NULL END ) AS email
FROM property t
GROUP BY t.user_id;

SELECT * FROM person;

SELECT * FROM information_schema.columns where table_schema=’test’ AND table_name=’person’;

From the output of the information_schema.columns, the data type of the columns (age,gender,email) in the view is varchar(512), which is same as in the base table.

If we change the data type of property_value to varchar(n), where n is bigger than 512, and re-run the script, the data type of the columns (age,gender,email) in the view will be changed to longtext, which is a big suprise. MySQL is inconsistent in defining the data types for the view columns.

Now let’s cast the data types of the view columns to what are adequate and meaningful in the view, e.g.
age: unsigned integer
gender: char(1)
email:varchar(512)

The view will be like:
DROP VIEW IF EXISTS person;

CREATE VIEW person
AS
SELECT user_id
, CAST(MAX(CASE WHEN t.property_id=1 THEN t.property_value ELSE NULL END ) AS unsigned) AS age
, CAST(MAX(CASE WHEN t.property_id=2 THEN t.property_value ELSE NULL END ) AS char(1)) AS gender
, CAST(MAX(CASE WHEN t.property_id=3 THEN t.property_value ELSE NULL END ) AS char(512)) AS email
FROM property t
GROUP BY t.user_id;

Check with information_schema.columns, the data type for age is changed to bigint, but the column type becomes ‘bigint(67) unsigned’. In one more complex view, it produced ‘bigint(12288)’. When the database is dumped and ported to another server, the script failed because 12288 is too big for display. The error message is ‘Error Code: 1439
Display width out of range for column xxxx (max = 255) 0.000 sec’

To prevent this happens, a better way is to convert to decimal, e.g. decimal(20,0):

DROP VIEW IF EXISTS person;

CREATE VIEW person
AS
SELECT user_id
, CAST(MAX(CASE WHEN t.property_id=1 THEN t.property_value ELSE NULL END ) AS decimal(20,0)) AS age
, CAST(MAX(CASE WHEN t.property_id=2 THEN t.property_value ELSE NULL END ) AS char(1)) AS gender
, CAST(MAX(CASE WHEN t.property_id=3 THEN t.property_value ELSE NULL END ) AS char(512)) AS email
FROM property t
GROUP BY t.user_id;

Check with information_schema.columns, the column type is ‘decimal(20,0)’.

The data type conversion functions (CAST and CONVERT) in MySQL only supports limited target data type. If they support all the data types that can be used in creating a table, this inconsistency should be able to avoided.

Posted in MySQL | Leave a comment

EDI Address

When you get an EDI document, how can you compose the EDI address? EDI address has the following format:
EDI://aaa:bb:ccc
where aaa = ISA06 field, bb = ISA05 field and ccc = GS02 field in EDI document.
For example, in the following EDI document:
ISA|00|          |00|          |14|GFSBB          |12|FOODBUYBB     
|100706|0726|U|00401|000001083|0|P|>~GS|IN|GFSBB|FOODBUYBB|20100706|0726|1083|X|004010~ST|810|0001~BIG|20100705|4444111||22031002…
The field delimeter is |. The 6th field of ISA is GFSBB, 5th is 14; The second field in the GS section is GFSBB. So the EDI address is:
EDI://GFSBB:14:GFSBB
Another example, whose delimiter is *.
ISA*00*          *00*          *14*118267624CCBP  *12*7043297824    
*100705*2042*U*00401*000004791*0*P*;&#21;GS*IN*118267624CCBP*7043297824*20100705*2042*4791*
The EDI address is EDI://118267624CCBP:14:118267624CCBP
Posted in BizTalk--EDI | Leave a comment