
IBM Cloud Object Storage - Uploading Large Files from a Linux Host with rclone and the COS API

2021-01-14

Cloud object storage has become a mainstream public-cloud data service, but its reliance on the HTTP/HTTPS protocol (RESTful API), its flat data structure, and its dependence on the network impose some limits in certain archive and backup scenarios - for example, when a bucket is mounted as a file system via tools like s3fs, large file transfers and high-frequency I/O can be problematic. Using IBM Cloud Object Storage as the example, this article shows how rclone and the ICOS API can provide more stable file transfer together with bandwidth control.

1. Rclone

IBM Cloud official configuration guide:
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-rclone

[root@centos-s3fs ~]# yum install -y unzip
[root@centos-s3fs ~]# curl https://rclone.org/install.sh | sudo bash
...
rclone v1.53.3 has successfully installed.
Now run "rclone config" for setup. Check https://rclone.org/docs/ for more details.

[root@centos-s3fs ~]# rclone config
2020/12/19 09:55:26 NOTICE: Config file "/root/.config/rclone/rclone.conf" not found - using defaults
No remotes found - make a new one
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> icos-test
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / 1Fichier
   \ "fichier"
 2 / Alias for an existing remote
   \ "alias"
 3 / Amazon Drive
   \ "amazon cloud drive"
 4 / Amazon S3 Compliant Storage Provider (AWS, Alibaba, Ceph, Digital Ocean, Dreamhost, IBM COS, Minio, Tencent COS, etc)
   \ "s3"
... ...
Storage> 4
** See help for s3 backend at: https://rclone.org/s3/ **

Choose your S3 provider.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Amazon Web Services (AWS) S3
   \ "AWS"
 2 / Alibaba Cloud Object Storage System (OSS) formerly Aliyun
   \ "Alibaba"
 3 / Ceph Object Storage
   \ "Ceph"
 4 / Digital Ocean Spaces
   \ "DigitalOcean"
 5 / Dreamhost DreamObjects
   \ "Dreamhost"
 6 / IBM COS S3
   \ "IBMCOS"
... ...
provider> 6
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
Only applies if access_key_id and secret_access_key is blank.
Enter a boolean value (true or false). Press Enter for the default ("false").
Choose a number from below, or type in your own value
 1 / Enter AWS credentials in the next step
   \ "false"
 2 / Get AWS credentials from the environment (env vars or IAM)
   \ "true"
env_auth> 1
AWS Access Key ID.
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
access_key_id> xxxxxxxx
AWS Secret Access Key (password)
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
secret_access_key> xxxxxxxxxx
Region to connect to.
Leave blank if you are using an S3 clone and you don't have a region.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Use this if unsure. Will use v4 signatures and an empty region.
   \ ""
 2 / Use this only if v4 signatures don't work, eg pre Jewel/v10 CEPH.
   \ "other-v2-signature"
region> 2
Endpoint for IBM COS S3 API.
Specify if using an IBM COS On Premise.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / US Cross Region Endpoint
   \ "s3.us.cloud-object-storage.appdomain.cloud"
 2 / US Cross Region Dallas Endpoint
   \ "s3.dal.us.cloud-object-storage.appdomain.cloud"
 3 / US Cross Region Washington DC Endpoint
   \ "s3.wdc.us.cloud-object-storage.appdomain.cloud"
 4 / US Cross Region San Jose Endpoint
   \ "s3.sjc.us.cloud-object-storage.appdomain.cloud"
 5 / US Cross Region Private Endpoint
   \ "s3.private.us.cloud-object-storage.appdomain.cloud"
... ...
endpoint> s3.private.eu-de.cloud-object-storage.appdomain.cloud
Location constraint - must match endpoint when using IBM Cloud Public.
For on-prem COS, do not make a selection from this list, hit enter
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / US Cross Region Standard
   \ "us-standard"
 2 / US Cross Region Vault
   \ "us-vault"
 3 / US Cross Region Cold
   \ "us-cold"
 4 / US Cross Region Flex
   \ "us-flex"
 5 / US East Region Standard
   \ "us-east-standard"
 6 / US East Region Vault
   \ "us-east-vault"
... ...
location_constraint>
Canned ACL used when creating buckets and storing or copying objects.

This ACL is used for creating objects and if bucket_acl isn't set, for creating buckets too.

For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl

Note that this ACL is applied when server side copying objects as S3
doesn't copy the ACL from the source but rather writes a fresh one.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Owner gets FULL_CONTROL. No one else has access rights (default). This acl is available on IBM Cloud (Infra), IBM Cloud (Storage), On-Premise COS
   \ "private"
 2 / Owner gets FULL_CONTROL. The AllUsers group gets READ access. This acl is available on IBM Cloud (Infra), IBM Cloud (Storage), On-Premise IBM COS
   \ "public-read"
 3 / Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access. This acl is available on IBM Cloud (Infra), On-Premise IBM COS
   \ "public-read-write"
 4 / Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access. Not supported on Buckets. This acl is available on IBM Cloud (Infra) and On-Premise IBM COS
   \ "authenticated-read"
acl> 2
Edit advanced config? (y/n)
y) Yes
n) No (default)
y/n> n
Remote config
--------------------
[icos-test]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = other-v2-signature
endpoint = s3.private.eu-de.cloud-object-storage.appdomain.cloud
acl = public-read
--------------------
y) Yes this is OK (default)
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
icos-test            s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q>

This completes the rclone setup; the configuration above is saved in rclone.conf. For multiple clients, simply copy this file to the other Linux hosts instead of repeating the interactive setup on each one.

[root@centos-s3fs ~]# cat .config/rclone/rclone.conf
[icos-test]
type = s3
provider = IBMCOS
env_auth = false
access_key_id = xxx
secret_access_key = xxx
region = other-v2-signature
endpoint = s3.private.eu-de.cloud-object-storage.appdomain.cloud
acl = public-read
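Since rclone.conf is a plain INI file, it can also be generated programmatically when provisioning many hosts instead of copying it around by hand. A minimal sketch using Python's standard configparser (the credential values here are placeholders, not real keys):

```python
import configparser

# Build a remote definition equivalent to the interactive setup above.
conf = configparser.ConfigParser()
conf["icos-test"] = {
    "type": "s3",
    "provider": "IBMCOS",
    "env_auth": "false",
    "access_key_id": "PLACEHOLDER_KEY",         # placeholder - use real credentials
    "secret_access_key": "PLACEHOLDER_SECRET",  # placeholder
    "region": "other-v2-signature",
    "endpoint": "s3.private.eu-de.cloud-object-storage.appdomain.cloud",
    "acl": "public-read",
}

# Write the file; deploy it to ~/.config/rclone/rclone.conf on each host.
with open("rclone.conf", "w") as f:
    conf.write(f)
```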

A quick test: upload a 100 GB file

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold

[root@centos-s3fs ~]# rclone lsd icos-test:
          -1 2020-12-18 03:52:07        -1 eu-de-cold
          -1 2020-09-09 03:30:40        -1 liutao-cos
          -1 2020-11-25 12:56:39        -1 mariadb-backup
          -1 2020-05-21 13:52:16        -1 video-on-demand

[root@centos-s3fs ~]# rclone ls icos-test:eu-de-cold
107374182400 100G.file

[root@centos-s3fs ~]# rclone delete icos-test:eu-de-cold/100G.file

If the Linux host is also running other workloads, rclone will inevitably compete for part of the NIC's outbound capacity. Transfer bandwidth can be controlled with the following three parameters:

rclone-test-1:

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=52M --s3-upload-concurrency=15

# Due to machine resource limits, although we requested 15 concurrent uploads, the system only sustains 12
[root@centos-s3fs ~]# netstat -anp |grep 10.1.129.58 | wc -l 
12

With a 52M chunk size and 15-way concurrency, the transfer averages about 220 MB/s - essentially the limit of this instance.
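Note that these two flags also determine memory use: per the rclone s3 docs, each transfer buffers roughly chunk_size × upload_concurrency bytes in RAM, so larger chunks and higher concurrency trade memory for throughput. A quick sizing sketch for the two test runs:

```python
MiB = 1024 * 1024

def inflight_bytes(chunk_size_mib, concurrency):
    """Approximate upload buffer memory for one rclone transfer:
    chunk_size * upload_concurrency bytes held in flight."""
    return chunk_size_mib * MiB * concurrency

# 52M chunks with 15 concurrent parts keep ~780 MiB in flight
print(inflight_bytes(52, 15) // MiB)   # 780
# 16M chunks with 10 concurrent parts keep ~160 MiB in flight
print(inflight_bytes(16, 10) // MiB)   # 160
```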
rclone-test-2:

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=16M --s3-upload-concurrency=10

# Concurrency lowered to 10; 10 transfer connections are observed
[root@centos-s3fs ~]# netstat -anp |grep 10.1.129.58 | wc -l 
10

With a 16M chunk size and 10-way concurrency, the transfer drops to about 120 MB/s. Tuning these two parameters against the file size is usually enough to find the best settings for a given instance.
rclone-test-3:

[root@centos-s3fs ~]# rclone copy /data/100G.file icos-test:eu-de-cold --s3-chunk-size=16M --s3-upload-concurrency=10 --bwlimit "08:00,20M 12:00,30M 13:00,50M 18:00,80M 23:00,off"
2020/12/19 12:44:33 NOTICE: Scheduled bandwidth change. Limit set to 30MBytes/s

Because of the time slot currently in effect in the schedule, the bandwidth drops to around 30 MB/s:

Interface        RX      TX    12:50:11
eth0 	 416.318KB/s   32.2685MB/s
Interface        RX      TX    12:50:12
eth0 	 450.342KB/s   31.9155MB/s
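In an rclone --bwlimit timetable, each limit applies from its HH:MM mark until the next entry, wrapping around midnight (so before 08:00 the previous day's final entry, off, is still in effect). A small sketch of that lookup logic - a simplified re-implementation for illustration, not rclone's own code:

```python
def bwlimit_at(schedule, now):
    """Return the limit active at 'now' ("HH:MM") for an rclone-style
    timetable like "08:00,20M 12:00,30M 23:00,off". Each entry applies
    from its timestamp until the next one; before the first entry of
    the day, the previous day's last entry still applies."""
    entries = sorted(item.split(",") for item in schedule.split())
    active = entries[-1][1]          # wrap-around from the previous day
    for t, limit in entries:
        if t <= now:                 # "HH:MM" strings compare correctly
            active = limit
    return active

sched = "08:00,20M 12:00,30M 13:00,50M 18:00,80M 23:00,off"
print(bwlimit_at(sched, "12:50"))    # 30M - matches the NOTICE and bmon output above
```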

2. ICOS API

IBM COS provides a complete S3-compatible API, with SDKs for common languages such as Java, Python, Node.js, and Go, so developers can get started easily:
https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-sdk-about
Install the COS Python SDK from IBM's GitHub repository, https://github.com/IBM/ibm-cos-sdk-python. Below is the code sample from IBM's online documentation; just replace the COS endpoint, API key, and service instance CRN with the values for your own environment.

import ibm_boto3
from ibm_botocore.client import Config, ClientError

# Constants for IBM COS values
COS_ENDPOINT = "https://s3.private.us-south.cloud-object-storage.appdomain.cloud"
COS_API_KEY_ID = "xxx"
COS_INSTANCE_CRN = "xxx"

def upload_large_file(bucket_name, item_name, file_path):
    print("Starting large file upload for {0} to bucket: {1}".format(item_name, bucket_name))

    # set the chunk size to 5 MB
    part_size = 1024 * 1024 * 5

    # set the multipart threshold to 5 MB
    file_threshold = 1024 * 1024 * 5

    # Create client connection
    cos_cli = ibm_boto3.client("s3",
        ibm_api_key_id=COS_API_KEY_ID,
        ibm_service_instance_id=COS_INSTANCE_CRN,
        config=Config(signature_version="oauth"),
        endpoint_url=COS_ENDPOINT
    )
    
    # set the transfer threshold and chunk size in config settings
    transfer_config = ibm_boto3.s3.transfer.TransferConfig(
        multipart_threshold=file_threshold,
        multipart_chunksize=part_size
    )

    # create transfer manager
    transfer_mgr = ibm_boto3.s3.transfer.TransferManager(cos_cli, config=transfer_config)

    try:
        # initiate file upload
        future = transfer_mgr.upload(file_path, bucket_name, item_name)

        # wait for upload to complete
        future.result()

        print ("Large file upload complete!")
    except Exception as e:
        print("Unable to complete large file upload: {0}".format(e))
    finally:
        transfer_mgr.shutdown()
        
def main():
    upload_large_file('iso-image-bucket', '100G.file', '/data/100G.file')

if __name__ == "__main__":
    main()

Run the upload. If it consumes too much bandwidth, adjust part_size and file_threshold to tune the transfer speed.
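One constraint to keep in mind when raising part_size the other way: S3-style multipart uploads are commonly capped at 10,000 parts per object (rclone's --s3-max-upload-parts defaults to 10000 for the same reason; assuming IBM COS follows this common S3 limit), so part_size must grow with the file size. A sketch of the minimum viable part size:

```python
import math

MiB = 1024 * 1024
MAX_PARTS = 10_000          # common S3 multipart-upload part limit
MIN_PART = 5 * MiB          # smallest part size S3-style APIs accept

def min_part_size(file_size):
    """Smallest part_size (bytes) that keeps the upload under MAX_PARTS."""
    return max(MIN_PART, math.ceil(file_size / MAX_PARTS))

# The sample's 5 MB parts cannot cover a 100 GiB file in one multipart
# upload: that would need 20,480 parts, over the 10,000-part limit.
size = 100 * 1024**3
print(math.ceil(size / (5 * MiB)))   # 20480 parts - over the limit
print(min_part_size(size) // MiB)    # 10, i.e. just over 10 MiB per part
```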

[root@centos-s3fs ~]# python upload_large_file.py
Starting large file upload for 100G.file to bucket: iso-image-bucket
Large file upload complete!

Upload succeeded!

Summary:
Besides the approaches above, IBM COS also works with many S3-compatible tools such as Cyberduck, TntDrive, CloudBerry, and S3 Browser, as well as command-line tools like the AWS S3 CLI. And if users have a workstation with a browser, no tools need to be installed at all: the free built-in Aspera browser plug-in can transfer files across regions over the public internet, with transfer quality and speed that are well assured compared with third-party tools.

Happy Learning ! :)

Original article: https://blog.csdn.net/weixin_42599323/article/details/111412387
