Incorrect String Value with MySQL and EF Core
When I attempted to insert it into a local development database, I notice an odd exception, something that I thought would be avoided because I'm using UTF8 in the character set of the MySQL 5.7:
I'm working on a project at the moment where I'm analysing the sentiment of Twitter Tweets. In this part of the project, I'm looking at the Tweet Text and when I attempted to insert it into a local development database, I notice an odd exception, something that I thought would be avoided because I'm using UTF8 in the character set of the MySQL 5.7:
Incorrect string value: '\xF0\x90\x8D\x83\xF0\x90...' for column 'content' at row 1
A little bit of Googling lead me a StackOverflow post (as always, love StackOverflow) which describes the cause and solution, you can read that post here. The crux of it is that you need to use utf8mb4 rather than utf8 in the character set and collation variables of the database so we can store the extended character set because the UTF8 only includes Unicode characters. With this change we get the full character set which will be stored in 4 bytes rather then the UTF8 character set which stores them in just 3 bytes.
To fix our db, implement the following:
1. Find the my.ini config file
This is running atop Windows 10 so I needed to find out where my.ini is stored, to do that I took a look at services.msc pane and looked at the config variables passed to the mysql server at startup, this is what I found:
"C:\Program Files\MySQL\MySQL Server 5.7\bin\mysqld.exe" --defaults-file="C:\ProgramData\MySQL\MySQL Server 5.7\my.ini" MySQL57
Bingo!
2. Change the character set variables
Then in the my.ini, I changed:
default-character-set=utf8 => default-character-set=utf8mb4
And
character-set-server=utf8 => character-set-server=utf8mb4
3. Restart Server and Recreate the Database.
I restart the server from services.msc, deleted the development database and then allowed EF Core Migrations to recreate the datbase on application start. (Pro tip: always, always create your database with EF Core Migrations so you can recreate it at any time for any reason.)
The collation of the db is then in a form supporting UTF8 extended character set fully in 4 bytes, ready to accept the full range of text from my Twitter Tweets.