Blog notes: how to export the md files from cnblogs
Downloading zphj1987's articles
The script below downloads the md files from the site, 100 posts per page; run it a few times, bumping the page index in the script each time, until everything is fetched.
```python
import requests
import json

# The API token is obtained from the cnblogs site (left as a placeholder here).
url = "https://api.cnblogs.com/api/blogs/zphj1987/posts?pageSize=100&pageIndex=1"
headers = {"Authorization": "Bearer " + "------token obtained from cnblogs"}
r = requests.get(url, headers=headers)
reslist = json.loads(r.text)
print(json.dumps(reslist, ensure_ascii=False))
for val in reslist:
    print("article url:", val["Url"])
    print("article title:", val["Title"])
    print("post date:", val["PostDate"].replace("T", " "))
    # The raw markdown lives at the same address with .md instead of .html.
    mdurl = val["Url"].replace(".html", ".md")
    print("markdown url:", mdurl)
    headers = {
        "User-Agent": "Apifox/1.0.0 (https://apifox.com)",
        "Authorization": "Bearer ------------------token obtained from the cnblogs site----------------",
        "Accept": "*/*",
        "Host": "www.cnblogs.com",
        "Connection": "keep-alive",
    }
    content = requests.get(mdurl, headers=headers).content
    # Prepend a front-matter block; tags/categories are left as
    # 暂未分类 ("uncategorized") to sort out later.
    head_content = """---
title: %s
date: %s
tags: "暂未分类"
categories: "暂未分类"
---
""" % (val["Title"], val["PostDate"].replace("T", " "))
    # content is bytes, so encode the front matter before concatenating.
    all_content = head_content.encode("utf-8") + content
    with open("output/%s.md" % val["Title"], "wb") as file:
        file.write(all_content + b"\n")
```
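Instead of re-running the script with a hand-edited pageIndex, the pagination can be automated by walking pageIndex until the API returns an empty page. This is a minimal sketch, not part of the original script: the function name is mine, and `get` is a `requests.get`-style callable passed in explicitly.

```python
# Hypothetical helper (not in the original script): collect every post by
# increasing pageIndex until the cnblogs API returns an empty page.
# `get` is a requests.get-style callable, injected so the loop is testable.
def fetch_all_posts(blog, token, get, page_size=100):
    posts = []
    page = 1
    while True:
        url = ("https://api.cnblogs.com/api/blogs/%s/posts?pageSize=%d&pageIndex=%d"
               % (blog, page_size, page))
        batch = get(url, headers={"Authorization": "Bearer " + token}).json()
        if not batch:
            break  # an empty page means we are past the last post
        posts.extend(batch)
        page += 1
    return posts
```

Called as `fetch_all_posts("zphj1987", token, requests.get)`, it returns the full post list in one go, which the download loop above can then iterate over.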
Once the download finishes you have all the md files, but the img tags inside them still reference cnblogs resource addresses. We need to download those resources and then replace the reference paths inside the posts.
Getting the list of files that reference resources
```bash
grep "cnblogs" -R *.md | grep -v html > getlist.txt
```
Downloading the resources and rewriting the references
```bash
#!/bin/bash
file_path="getlist.txt"
while IFS= read -r line; do
    # Everything before the first ':' is the markdown file name.
    filename=$(echo "$line" | awk -F':' '{print $1}')
    # Re-join the rest of the line, since the URL itself contains ':'.
    httpad=$(echo "$line" | awk -F':' '{ for (i=2; i<=NF; i++) { printf "%s%s", $i, (i<NF ? ":" : "\n") } }')
    url=$(echo "$httpad" | grep -oE 'https://[^ )]+')
    imgname=$(echo "$url" | awk -F'/' '{print $NF}')
    newpath=/images/blog/$imgname
    # Print the sed command so it can be reviewed or run by hand.
    echo "sed -i '' 's|$url|$newpath|g' \"$filename\""
    # Note the double quotes: with single quotes $url/$newpath would not expand.
    # sed -i '' "s|$url|$newpath|g" "$filename"
done < "$file_path"
```
The sed line above is commented out: some article names contain special characters, so the loop cannot always run cleanly end to end. In that case, take the commands it prints and run them by hand.
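The loop above only rewrites the references; the images themselves still need to be fetched into the local path the sed rewrite points at. A minimal sketch, with helper names of my own (`extract_urls`, `download_all`, and the injected `fetch` callable are not from the original post), that reads the same getlist.txt:

```python
# Hypothetical helpers (not in the original post): pull each image URL out of
# the grep output in getlist.txt and save it under a local directory, keyed by
# its basename so it matches the /images/blog/<name> paths used by the sed step.
import os
import re

def extract_urls(lines):
    """Return the first https URL found on each getlist.txt line."""
    urls = []
    for line in lines:
        m = re.search(r'https://[^ )"\']+', line)
        if m:
            urls.append(m.group(0))
    return urls

def download_all(lines, outdir="images/blog", fetch=None):
    """Save each referenced image under outdir; fetch(url) returns its bytes."""
    os.makedirs(outdir, exist_ok=True)
    for url in extract_urls(lines):
        name = url.rsplit("/", 1)[-1]
        if fetch is not None:
            with open(os.path.join(outdir, name), "wb") as f:
                f.write(fetch(url))
```

With the requests library this would be called as `download_all(open("getlist.txt"), fetch=lambda u: requests.get(u).content)`.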
Summary
After the steps above the whole blog has been migrated out, which is much faster than handling each post by hand. The remaining work, sorting posts into categories, you can finish up yourself.