The Leus

Planet MySQL

  • MySQL Server 8.0.18: Thanks for the Contributions
    In my blog series about external contributions to MySQL 8 we have reached version 8.0.18, which was released Monday 14 October 2019. Again the community has contributed to make MySQL better. Thank you. The contributions to MySQL 8.0.18 include several patches from Facebook as well as patches from Gillian Gunson, Przemysław Skibiński (Percona), and Daniel Black. The contributions are:
    - MySQL now provides more control over the use of compression to minimize the number of bytes sent over connections to the server. Previously, a given connection was either uncompressed or used the zlib compression algorithm. Now it is also possible to use the zstd algorithm (zstd 1.3), and to select a compression level for zstd connections. The permitted compression algorithms can be configured on the server side, as well as on the connection-origination side for connections by client programs and by servers participating in master/slave replication or Group Replication. For more information, see Connection Compression Control. Connection compression using the zstd algorithm requires that the server be built with zstd library support; the new WITH_ZSTD CMake option indicates whether to use the bundled or system zstd library. Legacy compression-control parameters, such as the --compress client option, are deprecated and will be removed in a future MySQL version. Thanks to Facebook for a contribution on which some of this work was based.
    - The sys.schema_unused_indexes view now filters out unique indexes. Thanks to Gillian Gunson for the contribution. (Bug #24798995, Bug #83257)
    - The new innodb_idle_flush_pct variable permits placing a limit on page flushing during idle periods, which can help extend the life of solid state storage devices. See Limiting Buffer Flushing During Idle Periods. Thanks to Facebook for the contribution. (Bug #27147088, Bug #88566)
    - Replication: The heartbeat interval for a replication slave, which is controlled by the MASTER_HEARTBEAT_PERIOD option of the CHANGE MASTER TO statement, can be specified with a resolution in milliseconds. Previously, the master’s binary log dump thread used a granularity of seconds to calculate whether a heartbeat signal should be sent to the slave, causing excessive heartbeat activity in the case of multiple skipped events. To remove this issue, all heartbeat-related calculations by the master and slave are now carried out using a granularity of nanoseconds for precision. Thanks to Facebook for the contribution. (Bug #29363787, Bug #94356)
    - When generating C source from SQL scripts, some utf8-encoded characters were split across lines. Thanks to Przemysław Skibiński from Percona for the patch. (Bug #30152555, Bug #96449)
    - With strict SQL mode enabled, the STR_TO_DATE() function did not properly handle values with time parts only. Thanks to Daniel Black for the contribution. (Bug #18090591, Bug #71386)
    If you have patches you would like to contribute you can do so via MySQL’s GitHub repository (requires signing the Oracle Contributor Agreement). Thank you for the contributions.
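    The effect of the heartbeat granularity change can be sketched in a few lines. This is a hypothetical Python illustration of the arithmetic, not MySQL's actual code; the function names are invented for the example:

    ```python
    # Sketch: why truncating timestamps to whole seconds can fire heartbeats
    # too eagerly, while nanosecond arithmetic compares elapsed time exactly.
    NS_PER_SEC = 1_000_000_000

    def due_ns(now_ns: int, last_event_ns: int, period_ns: int) -> bool:
        """Nanosecond granularity: exact elapsed-time comparison."""
        return now_ns - last_event_ns >= period_ns

    def due_sec(now_ns: int, last_event_ns: int, period_ns: int) -> bool:
        """Second granularity: both timestamps truncated before comparing."""
        return (now_ns // NS_PER_SEC) - (last_event_ns // NS_PER_SEC) \
            >= period_ns // NS_PER_SEC

    # Heartbeat period of 1 s; last event sent at t = 0.9 s, now t = 1.1 s.
    period = 1 * NS_PER_SEC
    last, now = int(0.9 * NS_PER_SEC), int(1.1 * NS_PER_SEC)

    print(due_sec(now, last, period))  # True: truncation sees a full second
    print(due_ns(now, last, period))   # False: only 0.2 s really elapsed
    ```

    With second granularity, 0.9 s truncates to 0 and 1.1 s truncates to 1, so the dump thread believes a full period has passed after only 0.2 s; nanosecond arithmetic avoids the spurious heartbeat.
    
    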

  • Just put the cold data over there
    There are several ways to use less SSD for an OLTP workload: choose a database engine that has less space amplification, store less data, or move the cold data elsewhere. The first approach is a plan while the others are goals. A plan is something you can implement; a goal requires a plan to get done. This matters when you want to decrease the cost of storage for a database workload, but not everyone needs to do that.
    The first approach assumes your DBMS supports an LSM with leveled compaction and compression (MariaDB and Percona Server include MyRocks; ScyllaDB and Cassandra are also options). The second approach, store less data, assumes you can get everyone to agree to remove data, and that is a hard conversation.
    The third approach, move the cold data elsewhere, is a wonderful goal. I wonder how often that goal is achieved. To implement this you must find data that won't (well, almost never) be read or written again and then move it to less expensive storage. I assume this has been more successful when implemented in storage than in a DBMS. The obvious example is an OS page cache, but there are also tiered storage servers. An LSM already classifies data as hot vs cold for writes -- data closer to the L0 was written more recently -- but that might not imply anything about read likelihood.
    I have read wonderful research papers that show how a DBMS might do this but I am still wary. Assume that data can be described by read and write likelihood attributes -- {read N, read forever} X {write N, write forever} -- then the goal is to find data that is read N, write N for which N has been reached. You frequently won't know the value of N, and monitoring will be needed to identify data that is rarely read or written, along with more CPU, RAM and IO to perform that monitoring. This is easier to do when the granularity of hot vs cold is per table, but that is rare in my experience. I assume the common case is a table with a mix of hot and cold data. Don't forget that it is a lousy experience when cold data becomes hot again.
    This post was inspired by something I read on a wonderful blog -- Blocks and Files (no snark, the blog is great). My summary of the post is that SSD endurance isn't a problem, just evict cold data to cold storage. Just remember that is a goal, not a plan.
    Update - I can't believe I forgot to mention the RocksDB persistent read cache that can use a faster persistent device (like Optane) to cache data read from a slower persistent device. The RocksDB-cloud effort from Rockset makes RocksDB work on S3 and uses the read cache to benefit from local storage. This post explains RocksDB-cloud.
    Update - I am not sure I reviewed this before, but there is an interesting paper that extends RocksDB to migrate cold data to cold storage -- see Mutant.
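    The access-recency monitoring described above can be sketched roughly as follows. This is a hypothetical Python illustration; the class name, keys, and threshold are invented for the example and do not come from any DBMS:

    ```python
    # Sketch: track the last read/write per key and flag keys that have been
    # untouched for longer than a threshold as candidates for cold storage.
    import time

    class TieringMonitor:
        def __init__(self, cold_after_s: float):
            self.cold_after_s = cold_after_s
            self.last_access = {}   # key -> timestamp of last read or write

        def touch(self, key, now=None):
            """Record a read or write of key."""
            self.last_access[key] = time.time() if now is None else now

        def cold_keys(self, now=None):
            """Keys not accessed for at least cold_after_s seconds."""
            now = time.time() if now is None else now
            return [k for k, t in self.last_access.items()
                    if now - t >= self.cold_after_s]

    mon = TieringMonitor(cold_after_s=3600)
    mon.touch("order:1", now=0)      # last touched an hour ago
    mon.touch("order:2", now=3000)   # touched recently
    print(mon.cold_keys(now=3600))   # ['order:1']
    ```

    Even this toy version shows the cost the post warns about: the monitor itself consumes memory per tracked key, and it says nothing about the painful case where a demoted key turns hot again.
    
    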

  • Group Replication delivery message service
    In the process of enhancing MySQL replication with new features for our users, we also improve the internals. A lot of work also goes into modernizing the codebase and making it easier to maintain and, obviously, to extend. In MySQL 8.0.18, we have extended the set of internal services with a group replication message passing service, thereby generalizing the communication framework for group replication.…

  • Setting up a High Availability for PMM
    We have been working with PMM for quite a long time and do most of the performance analysis with PMM for most of our clients. It also provides the flexibility to build our own custom dashboards. PMM has many advantages:
    - Easy to deploy (docker based)
    - Flexible and customizable
    - Query Analytics
    - One-stop solution for MySQL, MongoDB, ProxySQL & PostgreSQL
    - Orchestrator
    - Rich and deep metrics stats
    It is highly recommended for any production deployment; it is equivalent to an enterprise-grade monitoring and graphing tool. Recently we were working for a client on MySQL consulting to scale for the peak sale of the year, and we deployed PMM to view the performance insights of the databases. We started on-boarding a considerable number of servers under PMM, and everyone liked the way PMM was able to project the performance insights with a beautiful dashboard over a period of time.
    Need for HA and backup of the PMM server: When PMM started covering a broader group of servers, the question of HA for the PMM server was raised; of course it is good to have high availability for all possible solutions. What if the server crashes due to hardware failures? Does PMM support redundancy? Is the PMM server a single point of failure (SPOF)? What would be the backup strategy? We searched for many options to achieve HA with a containerized (docker) PMM server. We explored:
    - lsyncd – an rsync-based daemon that copies incremental changes at the block level; this failed when we tried to sync the entire /var/lib/docker and start the docker services back up
    - DRBD – would work, but adds complexity to setup and maintenance
    The accidental HA solution: As the name suggests, “it was accidental”. Let's see how it's done. We have the servers below, running on debian9 (Stretch) with the same version of docker.
    Source: (production) Destination: (Replica). The requirement here is to sync the metrics data between the source and the destination. On the source server we had a Prometheus data set of around 178G, so the initial sync took some time.
    Stop pmm-server on the source:
    # docker stop pmm-server
    Copy the existing data within the docker volume:
    # docker cp pmm-data:/opt/prometheus/data opt/prometheus/
    # docker cp pmm-data:/opt/consul-data opt/
    # docker cp pmm-data:/var/lib/mysql var/lib/
    # docker cp pmm-data:/var/lib/grafana var/lib/
    Once the initial copy is done, start the source PMM server back up:
    # docker start pmm-server
    Next, to transfer the data copy to the destination server, I used SCP here:
    # scp -r /backup/*
    Now on the destination server, make sure to have the same version of the PMM server as the source; I used version “1.17-1” here. Since the monitoring server used for testing does not have internet access enabled, I had to do an offline load of the PMM image as below:
    # docker load < pmm-server_1.17.tar
    071d8bd76517: Loading layer [==================================================>] 210.2MB/210.2MB
    6e66f289f547: Loading layer [==================================================>] 899.3MB/899.3MB
    Loaded image: percona/pmm-server:latest
    Create a docker volume the same as on the source server:
    # docker create -v /opt/prometheus/data -v /opt/consul-data -v /var/lib/mysql -v /var/lib/grafana --name pmm-data percona/pmm-server:latest /bin/true
    985630d4aa64375cc4834ceb0f66654bada98cda151636bcd63e9f9b0d3fe3a2
    Once the data volume is created, proceed to copy the data back into the docker data volume as below:
    # docker cp /backup/opt/prometheus/data pmm-data:/opt/prometheus/
    # docker cp /backup/opt/consul-data pmm-data:/opt/
    # docker cp /backup/var/lib/mysql pmm-data:/var/lib/
    # docker cp /backup/var/lib/grafana pmm-data:/var/lib/
    Once the data copy is done, the PMM data volume is ready to be used after the change of ownership as below.
    # docker run --rm --volumes-from pmm-data -it percona/pmm-server:latest chown -R pmm:pmm /opt/prometheus/data /opt/consul-data
    # docker run --rm --volumes-from pmm-data -it percona/pmm-server:latest chown -R grafana:grafana /var/lib/grafana
    # docker run --rm --volumes-from pmm-data -it percona/pmm-server:latest chown -R mysql:mysql /var/lib/mysql
    Now let's proceed with the final step and run pmm-server:
    # docker run -d -p 8080:80 --volumes-from pmm-data --name pmm-server -e SERVER_USER=fhuser -e SERVER_PASSWORD=FHPassword -e METRICS_RETENTION=1500h -e ORCHESTRATOR_ENABLED=true -e ORCHESTRATOR_USER='pmm_user' -e ORCHESTRATOR_PASSWORD='PMM_pass' -e METRICS_RESOLUTION=5s --restart always percona/pmm-server:latest
    Note: this command should be exactly the same as the command executed on the master. After a few moments the PMM service was started on the destination box, and we could see live metrics data polling in (graph below). We wanted to check the data sync between the source and destination, so we decided to let it run for a couple more days. After this test run period, we verified the graphs between source and destination, which seemed exactly the same; I am just sharing a sample graph for a DB server, comparing source and destination. Graph from destination. Graph from source.
    How does it work? The first time we saw data being polled into the backup we were astonished, and curious to know how it actually works. We had a long discussion with our expert team on this; what follows is just a conceptual understanding to explain the scenario. Let's go back to the official architecture of the PMM server by Percona, as below. As the architecture depicts, PMM has two parts: QAN (Query Analytics) and MM (Metrics Monitor), and it is for Metrics Monitor that we have achieved the redundancy. Metrics Monitor works by a “PULL” mechanism, i.e., Prometheus collects the metrics data from registered exporters.
    Since we have duplicated the destination server with the same registered exporters as the source, the destination makes one more pull from the DB servers and stores the data in its own data volume. What we have achieved is shown in the architecture below. To analyze this we decided to capture packets on the DB machine for those servers, and we could confirm it via TCP packet analysis with tcpdump:
    root@mydbops-db-8046755:/home/ tcpdump -c 20 -tttt -i eth0 src
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    2019-10-08 13:49:30.593668 IP > mydbops-db-8046755.42002: Flags [P.], seq 1314972321:1314972594, ack 1936498908, win 1444, options [nop,nop,TS val 130344282 ecr 540951618], length 273
    2019-10-08 13:49:30.614009 IP > mydbops-db-8046755.42002: Flags [.], ack 4126, win 1422, options [nop,nop,TS val 130344287 ecr 540952633], length 0
    2019-10-08 13:49:30.614022 IP > mydbops-db-8046755.42002: Flags [.], ack 9052, win 1390, options [nop,nop,TS val 130344287 ecr 540952633], length 0
    2019-10-08 13:49:30.712817 IP > mydbops-db-8046755.42002: Flags [P.], seq 1684281946:1684282219, ack 3948074006, win 1444, options [nop,nop,TS val 130344312 ecr 540951623], length 273
    2019-10-08 13:49:31.803979 IP > mydbops-db-8046755.42000: Flags [P.], seq 703692847:703693117, ack 1617572969, win 1444, options [nop,nop,TS val 130344585 ecr 540952412], length 270
    2019-10-08 13:49:31.816501 IP > mydbops-db-8046755.42000: Flags [.], ack 15606, win 1436, options [nop,nop,TS val 130344588 ecr 540952933], length 0
    20 packets captured
    23 packets received by filter
    0 packets dropped by kernel
    root@mydbops-db-8046755:/home/ tcpdump -c 20 -tttt -i eth0 src
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
    2019-10-08 13:49:42.615378 IP > mydbops-db-8046755.42000: Flags [P.], seq 2445371654:2445371924, ack 1808532851, win 1444, options [nop,nop,TS val 905019213 ecr 540954908], length 270
    2019-10-08 13:49:42.627810 IP > mydbops-db-8046755.42000: Flags [.], ack 4126, win 1428, options [nop,nop,TS val 905019217 ecr 540955636], length 0
    2019-10-08 13:49:42.627833 IP > mydbops-db-8046755.42000: Flags [.], ack 14262, win 1386, options [nop,nop,TS val 905019217 ecr 540955636], length 0
    2019-10-08 13:49:43.490791 IP > mydbops-db-8046755.42002: Flags [P.], seq 611310810:611311083, ack 2742084436, win 1444, options [nop,nop,TS val 905019432 ecr 540954843], length 273
    2019-10-08 13:49:43.512377 IP > mydbops-db-8046755.42002: Flags [.], ack 4126, win 1432, options [nop,nop,TS val 905019438 ecr 540955857], length 0
    2019-10-08 13:49:43.512388 IP > mydbops-db-8046755.42002: Flags [.], ack 9064, win 1410, options [nop,nop,TS val 905019438 ecr 540955857], length 0
    20 packets captured
    20 packets received by filter
    0 packets dropped by kernel
    Key takeaways:
    - This works well for replicating exactly the same set of servers already registered with the source. If a new server is added, I believe the same steps have to be followed again.
    - This is not officially supported, though it works and fulfills the requirement.
    - QAN HA is not done here, but it can be done, since the PMM server uses MySQL internally to store the QAN data; it can be replicated by exposing the MySQL port.
    - There can be minor differences or deviations in the metrics data, since the time at which each server polls the data might differ.
    As said earlier, it is good to have high availability for PMM as it plays a critical role.
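    The pull mechanism behind this accidental HA can be sketched conceptually. This is a hypothetical Python illustration: Exporter and MetricsServer are invented stand-ins for a node/mysqld exporter and a Prometheus instance inside a PMM server, not real PMM code:

    ```python
    # Sketch: two metrics servers independently scraping the same exporter.
    # Each keeps its own copy of the series; the copies are near-identical,
    # but differ slightly because the scrapes happen at different moments.

    class Exporter:
        """Stands in for an exporter on a monitored DB host."""
        def __init__(self):
            self.value = 0

        def scrape(self):
            self.value += 1        # the metric advances between scrapes
            return {"mysql_up": 1, "queries_total": self.value}

    class MetricsServer:
        """Stands in for one Prometheus instance inside a PMM server."""
        def __init__(self, exporter):
            self.exporter = exporter
            self.series = []       # this server's own stored samples

        def poll(self):
            self.series.append(self.exporter.scrape())

    exporter = Exporter()
    primary, replica = MetricsServer(exporter), MetricsServer(exporter)

    for _ in range(3):             # both servers pull on their own schedule
        primary.poll()
        replica.poll()

    print(len(primary.series), len(replica.series))  # 3 3
    ```

    Because each server pulls for itself, neither depends on the other, which is exactly why the cloned PMM server kept collecting on its own; the interleaved counter values also mirror the takeaway that the two data sets can deviate slightly.
    
    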

  • Automatic member fencing with OFFLINE_MODE in Group Replication
    Group Replication enables you to create fault-tolerant systems with redundancy by replicating the system state to a set of servers. Even if some of the servers subsequently fail, as long as it is not all of them or a majority, the system is still available.…


Russ has been playing around with Joomla, a different CMS, trying to rid his website of all the geeklog hackers... Time will tell if this CMS works.