How to export the md files from cnblogs

Downloading the posts

The script below downloads the md files from the site, 100 posts per page; edit pageIndex in the script and run it a few times to fetch every page.

#!/usr/bin/env python3
# -*- coding:utf-8 -*-

import os
import json
import requests

# Current logged-in user info:
#url = 'https://api.cnblogs.com/api/users'
# Blog profile info:
#url = 'https://api.cnblogs.com/api/blogs/zphj1987'

# List of posts, 100 per page; bump pageIndex and rerun for the next page
url = "https://api.cnblogs.com/api/blogs/zphj1987/posts?pageSize=100&pageIndex=1"
headers = {"Authorization": "Bearer " + "------ token obtained from cnblogs ------"}

r = requests.get(url, headers=headers)
reslist = json.loads(r.text)
print(json.dumps(reslist, ensure_ascii=False))

# Headers used when fetching the raw markdown from www.cnblogs.com
md_headers = {
    "User-Agent": "Apifox/1.0.0 (https://apifox.com)",
    "Authorization": "Bearer ------------------ token obtained from cnblogs ----------------",
    "Accept": "*/*",
    "Host": "www.cnblogs.com",
    "Connection": "keep-alive",
}

os.makedirs('output', exist_ok=True)

for val in reslist:
    print(json.dumps(val, ensure_ascii=False))
    print("post url:", val["Url"])
    print("post title:", val["Title"])
    print("post date:", val["PostDate"].replace("T", " "))

    # The markdown source lives at the same URL with .md instead of .html
    mdurl = val["Url"].replace(".html", ".md")
    print("markdown url:", mdurl)

    content = requests.get(mdurl, headers=md_headers).text

    # Prepend a front-matter block so the file can be dropped into a static site generator
    head_content = """---
title: %s
date: %s
tags: "暂未分类"
categories: "暂未分类"
---
""" % (val["Title"], val["PostDate"].replace("T", " "))

    all_content = head_content + content
    # Note: titles containing '/' would need sanitizing before being used as a filename
    with open('output/%s.md' % val["Title"], 'w', encoding='utf-8') as file:
        file.write(all_content + '\n')
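
If the blog has more than 100 posts, editing pageIndex by hand for each run gets tedious. A minimal sketch of looping over the pages instead (assuming the same posts endpoint and token, and assuming an empty page marks the end of the list) could look like this:

#!/usr/bin/env python3
import json
import requests

headers = {"Authorization": "Bearer " + "------ token obtained from cnblogs ------"}
page = 1
posts = []
while True:
    url = "https://api.cnblogs.com/api/blogs/zphj1987/posts?pageSize=100&pageIndex=%d" % page
    batch = json.loads(requests.get(url, headers=headers).text)
    if not batch:  # assumption: an empty list means we are past the last page
        break
    posts.extend(batch)
    page += 1
print("fetched %d posts" % len(posts))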

Once the download finishes you have all the md files, but the img references inside them still point at cnblogs-hosted resources. We need to download those resources and then rewrite the references inside the posts.

Build the list of resource references

grep "cnblogs" -R *.md |grep -v html > getlist.txt

Download the resources and rewrite the references

#!/bin/bash

# Adjust the path if your list lives somewhere else
# grep "cnblogs" -R *.md | grep -v html > getlist.txt
file_path="getlist.txt"

# Read the list line by line
while IFS= read -r line; do
    # Part before the first ':' is the markdown file name
    filename=$(echo "$line" | awk -F':' '{print $1}')
    # Everything after the first ':' is the matching line (it may itself contain ':')
    httpad=$(echo "$line" | awk -F':' '{ for (i=2; i<=NF; i++) { printf "%s%s", $i, (i<NF ? ":" : "\n") } }')
    # Extract the image URL from the matching line
    url=$(echo "$httpad" | grep -oE 'https://[^ )]+')
    # Download the image into a local directory (uncomment to enable)
    #wget -N -P ../images/blog/ $url
    # Local path that will replace the cnblogs URL
    imgname=$(echo "$url" | awk -F'/' '{print $NF}')
    newpath=/images/blog/$imgname
    # Print the sed command so it can be re-run by hand if something goes wrong
    echo "sed -i '' 's|$url|$newpath|g' \"$filename\""
    # BSD/macOS sed syntax; on GNU sed drop the '' after -i
    sed -i '' "s|$url|$newpath|g" "$filename"

done < "$file_path"

Some of the lines above are left commented out. Occasionally a post title contains special characters and the command cannot run cleanly; in that case just copy the printed sed command and run it by hand.
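
The wget line in the script is commented out. If you prefer to fetch all the images in a separate pass, a small sketch along the same lines (assuming the getlist.txt format shown above and that the image URLs need no authentication) could be:

#!/usr/bin/env python3
# Download every cnblogs-hosted image referenced in getlist.txt into ../images/blog/
import os
import re
import requests

os.makedirs('../images/blog', exist_ok=True)
with open('getlist.txt', encoding='utf-8') as f:
    urls = set(re.findall(r'https://[^\s)]+', f.read()))
for url in urls:
    name = url.rsplit('/', 1)[-1]          # keep the original file name
    resp = requests.get(url)
    with open(os.path.join('../images/blog', name), 'wb') as out:
        out.write(resp.content)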

Summary

After the steps above, the whole blog has been migrated out, which is much faster than handling each post by hand. What remains is to sort out the categories yourself.