# Hands-Free! Building an Unattended Document Conversion Service with LibreOffice 7.4.5 on CentOS 7
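In the push toward digital transformation, enterprises process large volumes of document format conversions every day. Manual conversion is slow and error-prone. This article walks through building a fully automated document conversion service on top of LibreOffice that performs unattended batch conversions such as Word to PDF. It is aimed at technical teams that handle contracts, reports, and similar documents.

## 1. Environment Preparation and LibreOffice Installation

### 1.1 Checking System Dependencies

Before deploying on CentOS 7, update the system and install the required dependencies:

```bash
yum update -y
yum install -y cairo cups-libs libSM fontconfig ttmkfdir unzip
```

### 1.2 Customized LibreOffice Installation

The official RPM packages take up a lot of disk space by default. The following command skips components a headless server does not need:

```bash
# Quote the patterns so the shell does not expand them against local files
yum localinstall --nogpgcheck LibreOffice_*.rpm \
    --exclude='libreoffice*gnome*' \
    --exclude='libreoffice*kde4*' \
    --exclude='libreoffice*pyuno*'
```

Verify the core functionality after installation:

```bash
/opt/libreoffice7.4/program/soffice --version
```

### 1.3 Chinese Font Setup

Open-source fonts are recommended in place of Windows fonts to avoid licensing risk:

```bash
yum install -y wqy-microhei-fonts
fc-cache -fv
```

## 2. Building the Core Conversion Service

### 2.1 A Reliable Conversion Script

Create `/opt/convert_service/convert.sh` with error handling and logging:

```bash
#!/bin/bash
BASE_DIR="/var/lib/doc_convert"
INPUT_DIR="$BASE_DIR/queue"
OUTPUT_DIR="$BASE_DIR/done"
LOG_FILE="/var/log/doc_convert/service.log"
# Export the variables so the child shells started by find can see them
export BASE_DIR INPUT_DIR OUTPUT_DIR LOG_FILE

function convert_file() {
    local input_file="$1"
    local filename
    filename=$(basename "$input_file")

    echo "$(date) - Processing: $filename" >> "$LOG_FILE"

    # Also check that the PDF exists: a zero exit status alone
    # may not guarantee that an output file was written
    if /opt/libreoffice7.4/program/soffice --headless --convert-to pdf \
            --outdir "$OUTPUT_DIR" "$input_file" 2>> "$LOG_FILE" \
        && [ -f "$OUTPUT_DIR/${filename%.*}.pdf" ]; then
        # processed/ and failed/ sit next to queue/, as created in section 2.2
        mv "$input_file" "$BASE_DIR/processed/"
        echo "$(date) - Conversion succeeded: $filename" >> "$LOG_FILE"
    else
        mv "$input_file" "$BASE_DIR/failed/"
        echo "$(date) - Conversion failed: $filename" >> "$LOG_FILE"
    fi
}

export -f convert_file

# Main processing loop: poll the queue every 5 seconds
while true; do
    find "$INPUT_DIR" -maxdepth 1 -type f \( -name '*.doc*' -o -name '*.odt' \) \
        -exec bash -c 'convert_file "$0"' {} \;
    sleep 5
done
```

### 2.2 Running as a System Service

Create the systemd unit `/etc/systemd/system/doc-convert.service`:

```ini
[Unit]
Description=Document Conversion Service
After=network.target

[Service]
ExecStart=/bin/bash /opt/convert_service/convert.sh
Restart=always
User=docuser
Group=docuser
Environment=HOME=/opt/convert_service

[Install]
WantedBy=multi-user.target
```

Initialize the service environment. The log directory is created here as well, because the script writes to it as `docuser`:

```bash
useradd -r -s /sbin/nologin docuser
mkdir -p /var/lib/doc_convert/{queue,done,processed,failed} /var/log/doc_convert
chown -R docuser:docuser /var/lib/doc_convert /var/log/doc_convert /opt/convert_service
systemctl daemon-reload
systemctl enable --now doc-convert.service
```

## 3. Advanced Features

### 3.1 Concurrent Processing

Use GNU parallel to process several files at once:

```bash
yum install -y parallel   # provided by the EPEL repository on CentOS 7

# Replace the find ... -exec call in convert.sh with:
find "$INPUT_DIR" -maxdepth 1 -type f \( -name '*.doc*' -o -name '*.odt' \) \
    -print0 | parallel -0 -j 4 convert_file
```

Concurrent `soffice` processes that share one user profile can block each other. When running jobs in parallel, consider giving each job its own profile directory, for example by adding `-env:UserInstallation=file:///tmp/lo_profile_$$` to the `soffice` call.

### 3.2 Monitoring and Alerting

Expose metrics for Prometheus through the node_exporter textfile collector. This assumes node_exporter is already running with `--collector.textfile.directory` pointing at the directory below.

```bash
# Create the directory read by node_exporter's textfile collector
mkdir -p /var/lib/node_exporter/textfile_collector

# The quoted 'EOF' keeps $(...) and $variables literal until metrics.sh runs
cat <<'EOF' > /opt/convert_service/metrics.sh
#!/bin/bash
processed=$(ls -1 /var/lib/doc_convert/processed | wc -l)
failed=$(ls -1 /var/lib/doc_convert/failed | wc -l)
queue=$(ls -1 /var/lib/doc_convert/queue | wc -l)

cat <<EOM > /var/lib/node_exporter/textfile_collector/doc_convert.prom
# HELP doc_convert_processed Total processed files
# TYPE doc_convert_processed counter
doc_convert_processed $processed
# HELP doc_convert_failed Total failed conversions
# TYPE doc_convert_failed counter
doc_convert_failed $failed
# HELP doc_convert_queue Current queue size
# TYPE doc_convert_queue gauge
doc_convert_queue $queue
EOM
EOF

# Add a cron job that refreshes the metrics every minute
(crontab -l 2>/dev/null; echo "* * * * * /bin/bash /opt/convert_service/metrics.sh") | crontab -
```

## 4. Integrating with Enterprise Applications

### 4.1 A REST API

Create a simple upload endpoint with Python Flask:

```python
from flask import Flask, request, jsonify
import os
import uuid

app = Flask(__name__)
UPLOAD_FOLDER = '/var/lib/doc_convert/queue'
ALLOWED_EXTENSIONS = {'doc', 'docx', 'odt'}


def allowed_file(filename):
    return '.' in filename and \
        filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route('/convert', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({'error': 'No file part'}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({'error': 'No selected file'}), 400
    if file and allowed_file(file.filename):
        # Lower-case the extension so the case-sensitive find patterns in convert.sh match
        filename = str(uuid.uuid4()) + os.path.splitext(file.filename)[1].lower()
        filepath = os.path.join(UPLOAD_FOLDER, filename)
        file.save(filepath)
        return jsonify({'message': 'File queued for conversion', 'job_id': filename}), 202
    return jsonify({'error': 'Invalid file type'}), 400


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

### 4.2 Spring Boot Integration

Add upload, status, and download endpoints to the Spring Boot application. The skeleton below leaves the bodies unimplemented:

```java
import java.util.Map;
import org.springframework.core.io.Resource;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/docs")
public class DocumentController {

    @PostMapping("/upload")
    public ResponseEntity<Map<String, String>> uploadDocument(
            @RequestParam("file") MultipartFile file) {
        // Upload the file to the conversion service
        throw new UnsupportedOperationException("not implemented");
    }

    @GetMapping("/status/{jobId}")
    public ResponseEntity<Map<String, String>> checkStatus(
            @PathVariable String jobId) {
        // Check the conversion status
        throw new UnsupportedOperationException("not implemented");
    }

    @GetMapping("/download/{jobId}")
    public ResponseEntity<Resource> downloadDocument(
            @PathVariable String jobId) {
        // Download the converted file
        throw new UnsupportedOperationException("not implemented");
    }
}
```

## 5. Performance Tuning and Troubleshooting

### 5.1 Memory Configuration

Edit the LibreOffice runtime configuration `/opt/libreoffice7.4/program/soffice.rc`. Stock Linux installs name this file `sofficerc`, and the keys below are not among the documented bootstrap settings, so confirm that they have an effect before relying on them.

```ini
[Office]
WorkingSet=256
CacheSize=128
MaxRecycledDocuments=20
```

### 5.2 Common Problems and Fixes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Conversion times out | Large files take a long time to process | Wrap the `soffice` call in coreutils `timeout` with a generous limit (`soffice` has no timeout option of its own) |
| Garbled Chinese text | Font cache not refreshed | Run `fc-cache -fv` |
| Service crashes | Out of memory | Lower the number of parallel jobs or add swap |
| Permission denied | SELinux restrictions | Set the file context with `chcon`, or temporarily switch SELinux to permissive mode with `setenforce 0` |

After running for a long time, the service may start leaking memory. Restarting it on a schedule mitigates this:

```bash
# Add a daily restart at 03:00, appending to the existing crontab instead of replacing it
(crontab -l 2>/dev/null; echo "0 3 * * * systemctl restart doc-convert.service") | crontab -
```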
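For the conversion timeouts listed among the common problems, one option is to enforce the time limit in the caller. The sketch below is a minimal illustration in Python 3.7+, not part of the service described above. The names `build_command` and `convert_with_timeout`, the 120-second default, and the per-job profile directory are assumptions made for the example. Only the `soffice` path and the `--headless --convert-to pdf --outdir` options come from the article.

```python
import os
import signal
import subprocess
from pathlib import Path

# Path taken from the article; the helper names below are hypothetical.
SOFFICE = "/opt/libreoffice7.4/program/soffice"


def build_command(input_file, out_dir, profile_dir):
    """Build the soffice call with a dedicated user profile directory."""
    profile_uri = Path(profile_dir).resolve().as_uri()
    return [
        SOFFICE,
        f"-env:UserInstallation={profile_uri}",
        "--headless", "--convert-to", "pdf",
        "--outdir", str(out_dir), str(input_file),
    ]


def convert_with_timeout(input_file, out_dir, profile_dir,
                         timeout_s=120, command=None):
    """Run one conversion and return (ok, detail).

    `command` lets a caller substitute another executable, which is
    mainly useful for testing without LibreOffice installed.
    """
    cmd = command or build_command(input_file, out_dir, profile_dir)
    # A new session gives the child its own process group, so helper
    # processes it spawns can be killed together with it.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True,
                            start_new_session=True)
    try:
        output, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited just as the timeout fired
        proc.communicate()  # reap the child and drain its pipe
        return False, f"timed out after {timeout_s}s"

    # Treat a missing PDF as a failure even if the exit status is 0.
    expected = Path(out_dir) / (Path(input_file).stem + ".pdf")
    if proc.returncode != 0 or not expected.is_file():
        return False, output.strip() or "no output file produced"
    return True, str(expected)
```

A caller could move the input file to the `failed` directory whenever the first element of the result is `False`, mirroring what `convert.sh` does for failed conversions.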