clickhouse笔记01--快速部署clickhouse
1 介绍
ClickHouse是一个用于联机分析(OLAP)的列式数据库管理系统(DBMS),当前已经被很多大公司使用。
Clickhouse 是一款真正的列式数据库管理系统,具备数据压缩、数据的磁盘存储、多核心并行处理、多服务器分布式处理、支持SQL、向量引擎、实时的数据更新、索引、适合在线查询、支持近似计算、Adaptive Join Algorithm、支持数据复制和数据完整性、角色的访问控制 等特性。更多介绍信息可以参考 clickhouse 官方文档
笔者也是初次使用 clickhouse,因此将其部署方法和注意事项贴在此处,以方便有需要的小伙伴学习。
2 部署测试
2.1 准备配置
在 clickhouse 稳定版本安装包 准备最新稳定安装包,笔者准备的为 22.2.2.1 版本.
1) 下载安装包
clickhouse-client-22.2.2.1.tgz
clickhouse-common-static-22.2.2.1.tgz
clickhouse-common-static-dbg-22.2.2.1.tgz
clickhouse-server-22.2.2.1.tgz
2)解压安装包
tar -xzvf "clickhouse-common-static-22.2.2.1.tgz"
sudo "clickhouse-common-static-22.2.2.1/install/doinst.sh"
tar -xzvf "clickhouse-common-static-dbg-22.2.2.1.tgz"
sudo "clickhouse-common-static-dbg-22.2.2.1/install/doinst.sh"
tar -xzvf "clickhouse-server-22.2.2.1.tgz"
sudo "clickhouse-server-22.2.2.1/install/doinst.sh"
tar -xzvf "clickhouse-client-22.2.2.1.tgz"
sudo "clickhouse-client-22.2.2.1/install/doinst.sh"
3) vim /etc/clickhouse-server/config.xml
注释掉 listen_host 即可, 若为 ipv6 就注释第一个,若为 ipv4 就注释第二个,不注释掉的话只能在本机访问了
<listen_host>::</listen_host>
<listen_host>0.0.0.0</listen_host>
2.2 启动服务
# /etc/init.d/clickhouse-server start
# /etc/init.d/clickhouse-server status
Now there is no clickhouse-server process.
执行命令后,发现服务并未启动,查看 /etc/init.d/clickhouse-server 后发现该服务使用 clickhouse 用户启动,因此需要新建该用户,创建先关文件夹,并授权给 clickhouse 用户
创建clickhouse,此处直接把密码也设置为clickhouse
# adduser --home /home/clickhouse clickhouse
创建核心目录
# mkdir -p /var/lib/clickhouse
# chown clickhouse:clickhouse -R /var/lib/clickhouse
# mkdir -p /var/log/clickhouse-server
# chown clickhouse:clickhouse -R /var/log/clickhouse-server
再次执行命令发现服务正常启动:
# /etc/init.d/clickhouse-server start
# /etc/init.d/clickhouse-server status
/var/run/clickhouse-server/clickhouse-server.pid file exists and contains pid = 67486.
The process with pid = 67486 is running.
1 动服务后,您可以使用命令行客户端连接到它: clickhouse-client
2 也可以通过 curl http://192.168.2.136:8123 来查看服务是否ok,curl 返回 Ok.
3 也可以直接通过 mysql客户端连接 clickhouse, mysql -udefault -h192.168.2.136 -P9004
连接后就像平时查看sql 数据库一样访问数据即可。
2.3 测试
- 下载数据集(4.5Gb)
wget http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv
- 创建表
$ clickhouse-cliet :) use default; :) CREATE TABLE uk_price_paid ( price UInt32, date Date, postcode1 LowCardinality(String), postcode2 LowCardinality(String), type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0), is_new UInt8, duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0), addr1 String, addr2 String, street LowCardinality(String), locality LowCardinality(String), town LowCardinality(String), district LowCardinality(String), county LowCardinality(String), category UInt8 ) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
- 预处理 & 导入数据
clickhouse-local --input-format CSV --structure ' uuid String, price UInt32, time DateTime, postcode String, a String, b String, c String, addr1 String, addr2 String, street String, locality String, town String, district String, county String, d String, e String ' --query " WITH splitByChar(' ', postcode) AS p SELECT price, toDate(time) AS date, p[1] AS postcode1, p[2] AS postcode2, transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type, b = 'Y' AS is_new, transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration, addr1, addr2, street, locality, town, district, county, d = 'B' AS category FROM table" --date_time_input_format best_effort < pp-complete.csv | clickhouse-client --query "INSERT INTO uk_price_paid FORMAT TSV"
- 验证数据
针对该数据集更多测试参考: 快速上手/示例数据集/K Property Price Paid4.1 查看数据条数 :) SELECT count() FROM uk_price_paid; SELECT count() FROM uk_price_paid Query id: 828475ce-ad80-45de-8ed2-4c9f6fd8e871 ┌──count()─┐ │ 27265985 │ └──────────┘ 1 rows in set. Elapsed: 0.004 sec. 4.2 查看占用存储大小 :) SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid'; SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid' Query id: 04d2a185-0989-4649-86a5-e38e6edac861 ┌─formatReadableSize(total_bytes)─┐ │ 292.80 MiB │ └─────────────────────────────────┘ 1 rows in set. Elapsed: 0.019 sec.
3 注意事项
- 必须新建 clickhouse 用户,并创建相关目录,配置clickhouse权限,否则无法正常启动。
- 需要在 /etc/clickhouse-server/config.xml 中开启 <listen_host>0.0.0.0或者::</listen_host>, 否则只能本机访问。
4 说明
软件环境:
Ubuntu 21.04 Desktop
clickhouse 22.2.2.1
参考文档:
clickhouse 官方文档
clickhouse 稳定版本安装包
ClickHouse安装部署【非常详细】