How should a site that lets users edit rich text defend against XSS?

cpdyj0

2019-03-12 10:05:38 +08:00

Find a parser on the backend and try running the content through it.

AngryPanda

2019-03-12 10:09:18 +08:00

lizheming

2019-03-12 10:11:18 +08:00

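The only reasonably clean approach is a whitelist, and there are two ways to go about it. One is what #1 described: parse the content with a parser, then filter it against a whitelist. The other is to define your own markup syntax and strip out all HTML that isn't part of it, e.g. Markdown and UBBCode (showing my age here)....
The two approaches are really much the same. The reason you need a parser is that filtering tags alone isn't enough: illegal attributes on legal tags also have to be removed, such as `onload`, `onerror` and other attributes that trigger JS. Doing that with regex gets messy, and even if you manage to write it, you can't maintain it.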
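A minimal sketch of the whitelist step described above. It assumes some HTML parser has already turned each element into a tag name plus a map of its (entity-decoded) attributes, so only the filtering logic is shown. The class name `WhitelistSketch` and the tag, attribute and protocol lists are illustrative and do not come from any library. Attributes need their own whitelist: `onerror` is simply absent from the list, and URL-carrying attributes get an extra scheme check.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical filter step run after a parser has produced tag names and attribute maps.
public class WhitelistSketch {

    // Tags allowed to survive; the caller drops any element not in this set.
    private static final Set<String> ALLOWED_TAGS = Set.of("a", "b", "i", "p", "img");

    // Allowed attributes per tag; on* handlers are never listed, so they are always dropped.
    private static final Map<String, Set<String>> ALLOWED_ATTRS = Map.of(
            "a", Set.of("href", "title"),
            "img", Set.of("src", "alt"));

    // Attributes that carry URLs and therefore need a protocol check.
    private static final Set<String> URL_ATTRS = Set.of("href", "src");
    private static final Set<String> ALLOWED_PROTOCOLS = Set.of("http", "https", "mailto");

    public static boolean isTagAllowed(String tag) {
        return ALLOWED_TAGS.contains(tag.toLowerCase(Locale.ROOT));
    }

    /** Returns only the whitelisted attributes of one element, in their original order. */
    public static Map<String, String> filterAttributes(String tag, Map<String, String> attrs) {
        Set<String> allowed = ALLOWED_ATTRS.getOrDefault(tag.toLowerCase(Locale.ROOT), Set.of());
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attrs.entrySet()) {
            String name = e.getKey().toLowerCase(Locale.ROOT);
            if (!allowed.contains(name)) {
                continue; // e.g. onerror, onload, style
            }
            if (URL_ATTRS.contains(name) && !isSafeUrl(e.getValue())) {
                continue; // e.g. javascript:alert(1)
            }
            kept.put(name, e.getValue());
        }
        return kept;
    }

    /** Accepts only URLs whose scheme is whitelisted; scheme-less URLs are rejected for simplicity. */
    static boolean isSafeUrl(String url) {
        // Browsers ignore tabs/newlines inside a URL (so "java\tscript:" still runs),
        // so whitespace and control characters are removed before reading the scheme.
        String cleaned = url.replaceAll("[\\s\\p{Cntrl}]", "").toLowerCase(Locale.ROOT);
        int colon = cleaned.indexOf(':');
        if (colon <= 0) {
            return false;
        }
        return ALLOWED_PROTOCOLS.contains(cleaned.substring(0, colon));
    }
}
```

For example, an `<img>` carrying `src="https://example.com/a.png"` and `onerror="alert(1)"` comes back with only `src`, and an `<a>` whose `href` is `java<TAB>script:alert(1)` loses its `href` entirely.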

Shynoob

2019-03-12 10:26:10 +08:00

For reference:
```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Wrapper class (name is illustrative) so the static members below compile on their own
public class EncodeUtils {

    private static final Logger logger = LoggerFactory.getLogger(EncodeUtils.class);

    // Precompiled XSS filter regexes
    private static final List<Pattern> xssPatterns = Arrays.asList(
            // <script|link|style|iframe ...> up to '>' or the matching closing tag, plus stray closing tags
            Pattern.compile("(<\\s*(script|link|style|iframe)([\\s\\S]*?)(>|<\\/\\s*\\2\\s*>))|(</\\s*(script|link|style|iframe)\\s*>)", Pattern.CASE_INSENSITIVE),
            // href/src with a javascript: or vbscript: URL, only when directly followed by '>'
            Pattern.compile("\\s*(href|src)\\s*=\\s*(\"\\s*(javascript|vbscript):[^\"]+\"|'\\s*(javascript|vbscript):[^']+'|(javascript|vbscript):[^\\s]+)\\s*(?=>)", Pattern.CASE_INSENSITIVE),
            // on* event-handler attributes, only when directly followed by '>'
            Pattern.compile("\\s*on[a-z]+\\s*=\\s*(\"[^\"]+\"|'[^']+'|[^\\s]+)\\s*(?=>)", Pattern.CASE_INSENSITIVE),
            // eval(...) and CSS expression(...) (matched from "xpression")
            Pattern.compile("(eval\\((.*?)\\)|xpression\\((.*?)\\))", Pattern.CASE_INSENSITIVE)
    );

    /**
     * XSS illegal-character filter. Content starting with <!--HTML--> only goes
     * through the regex rules (tags are kept).
     * @author ThinkGem
     */
    public static String xssFilter(String text) {
        String oriValue = StringUtils.trim(text);
        if (text != null) {
            String value = oriValue;
            for (Pattern pattern : xssPatterns) {
                Matcher matcher = pattern.matcher(value);
                if (matcher.find()) {
                    value = matcher.replaceAll(StringUtils.EMPTY);
                }
            }
            // If the value is not HTML, XML or JSON, also convert ", ', < and > to full-width characters.
            if (!StringUtils.startsWithIgnoreCase(value, "<!--HTML-->") // HTML
                    && !StringUtils.startsWithIgnoreCase(value, "<?xml ") // XML
                    && !StringUtils.contains(value, "id=\"FormHtml\"") // JFlow
                    && !(StringUtils.startsWith(value, "{") && StringUtils.endsWith(value, "}")) // JSON Object
                    && !(StringUtils.startsWith(value, "[") && StringUtils.endsWith(value, "]")) // JSON Array
            ) {
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < value.length(); i++) {
                    char c = value.charAt(i);
                    switch (c) {
                        case '>':
                            sb.append("＞");
                            break;
                        case '<':
                            sb.append("＜");
                            break;
                        case '\'':
                            sb.append("＇");
                            break;
                        case '\"':
                            sb.append("＂");
                            break;
                        // case '&':
                        //     sb.append("＆");
                        //     break;
                        // case '#':
                        //     sb.append("＃");
                        //     break;
                        default:
                            sb.append(c);
                            break;
                    }
                }
                value = sb.toString();
            }
            if (logger.isInfoEnabled() && !value.equals(oriValue)) {
                logger.info("xssFilter: {} <=<=<= {}", value, text);
            }
            return value;
        }
        return null;
    }

    // Precompiled SQL filter regex
    private static final Pattern sqlPattern = Pattern.compile("(?:')|(?:--)|(/\\*(?:.|[\\n\\r])*?\\*/)|(\\b(select|update|and|or|delete|insert|truncate|char|into|substr|ascii|declare|exec|count|master|drop|execute)\\b)", Pattern.CASE_INSENSITIVE);

    /**
     * SQL filter against injection: if the input contains select or similar SQL fragments,
     * an empty string is returned.
     * @author ThinkGem
     */
    public static String sqlFilter(String text) {
        if (text != null) {
            String value = text;
            Matcher matcher = sqlPattern.matcher(text);
            if (matcher.find()) {
                value = matcher.replaceAll(StringUtils.EMPTY);
            }
            if (!value.equals(text)) {
                logger.warn("sqlFilter: {} <=<=<= {}", value, text);
                return StringUtils.EMPTY;
            }
            return value;
        }
        return null;
    }
}
```

lastpass

2019-03-12 10:29:08 +08:00

Hmm, can't you just not use HTML mode?

wusatosi

2019-03-12 10:36:31 +08:00

Just wrap it in <noscript>, wouldn't that work?

mytry

2019-03-12 10:42:56 +08:00

Regex filtering on the backend will almost always leave XSS holes, and third-party parser libraries are also prone to them.
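To make the regex point concrete, here is a small check of the `on*` attribute pattern from the regex filter posted earlier in the thread, copied verbatim. The class name `RegexBypassDemo` is hypothetical. The pattern ends with `(?=>)`, so it only removes an event handler that is the last attribute before `>`. Moving the handler in front of another attribute lets it through untouched.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: applies one of the posted XSS regexes in isolation.
public class RegexBypassDemo {

    // The on* event-handler pattern from the regex-based xssFilter above.
    static final Pattern ON_ATTR = Pattern.compile(
            "\\s*on[a-z]+\\s*=\\s*(\"[^\"]+\"|'[^']+'|[^\\s]+)\\s*(?=>)",
            Pattern.CASE_INSENSITIVE);

    public static String strip(String html) {
        return ON_ATTR.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Handler is the last attribute, so the lookahead sees '>' and it is removed.
        System.out.println(strip("<img src=x onerror=alert(1)>")); // prints <img src=x>
        // Handler is followed by another attribute, so the lookahead fails and it survives.
        System.out.println(strip("<img onerror=alert(1) src=x>")); // prints <img onerror=alert(1) src=x>
    }
}
```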

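The safest approach: on the frontend, before rendering, use the browser's built-in DOMParser API to turn the string into a DOM tree, keep only the whitelisted tags and attributes, and then insert it into the document.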

corningsun

2019-03-12 10:44:33 +08:00

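XSS still has to be handled on the backend. I'd suggest storing the original text in the database as-is and transforming it once before it is returned to the frontend. The backend has plenty of ready-made libraries you can use.

For example, Jsoup comes with a number of predefined whitelists to choose from: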

```java
public static Whitelist relaxed() {
    return new Whitelist()
            .addTags(
                    "a", "b", "blockquote", "br", "caption", "cite", "code", "col",
                    "colgroup", "dd", "div", "dl", "dt", "em", "h1", "h2", "h3", "h4", "h5", "h6",
                    "i", "img", "li", "ol", "p", "pre", "q", "small", "span", "strike", "strong",
                    "sub", "sup", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "u",
                    "ul")

            .addAttributes("a", "href", "title")
            .addAttributes("blockquote", "cite")
            .addAttributes("col", "span", "width")
            .addAttributes("colgroup", "span", "width")
            .addAttributes("img", "align", "alt", "height", "src", "title", "width")
            .addAttributes("ol", "start", "type")
            .addAttributes("q", "cite")
            .addAttributes("table", "summary", "width")
            .addAttributes("td", "abbr", "axis", "colspan", "rowspan", "width")
            .addAttributes(
                    "th", "abbr", "axis", "colspan", "rowspan", "scope",
                    "width")
            .addAttributes("ul", "type")

            .addProtocols("a", "href", "ftp", "http", "https", "mailto")
            .addProtocols("blockquote", "cite", "http", "https")
            .addProtocols("cite", "cite", "http", "https")
            .addProtocols("img", "src", "http", "https")
            .addProtocols("q", "cite", "http", "https")
            ;
}
```

Usage example:

```java
import org.apache.commons.lang3.StringUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist; // renamed org.jsoup.safety.Safelist in jsoup 1.14.1+

public class JsoupUtil {

    private static final Whitelist WHITELIST = Whitelist.relaxed();

    static {
        // Rich-text editors implement some formatting through the style attribute,
        // e.g. red text is style="color:red;",
        // so style has to be allowed on every tag
        WHITELIST.addAttributes(":all", "style");
        WHITELIST.addAttributes(":all", "class");
        WHITELIST.addAttributes(":all", "target");
        WHITELIST.addAttributes(":all", "spellcheck");
    }

    private JsoupUtil() {
    }

    // Turn off pretty-printing so the cleaned HTML is not reformatted
    private static final Document.OutputSettings OUTPUT_SETTINGS = new Document.OutputSettings().prettyPrint(false);

    public static String clean(String content) {
        if (StringUtils.isBlank(content)) {
            return "";
        }
        return Jsoup.clean(content, "", WHITELIST, OUTPUT_SETTINGS);
    }

}
```

mytry

2019-03-12 10:45:26 +08:00

@mytry I wrote a demo of this a few years ago, take a look~ https://www.etherdream.com/FunnyScript/richtext_safe_render.html

wktrf

2019-03-12 10:45:50 +08:00

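As I recall, Tieba converts tags outright, e.g. an img tag becomes [image]baidu.com[image]. But I've heard scripts can also be hidden inside images, so uploaded images need processing too.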

krixaar

2019-03-12 10:46:33 +08:00

Have users write Markdown in a WYSIWYG editor, or use BBCode?

111111111111

2019-03-12 10:47:58 +08:00

Ever heard of UBB code?
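A minimal sketch of the custom-syntax approach (UBB/BBCode, or Tieba-style image tokens). It escapes every HTML special character first, then turns a small set of known tags back into markup. The class name `UbbSketch`, the tag set and the rule that links and images must be `http(s)://` URLs are assumptions for illustration, not any forum's actual implementation.

```java
import java.util.regex.Pattern;

// Hypothetical UBB renderer: escape everything, then whitelist-convert a few tags.
public class UbbSketch {

    private static final Pattern BOLD = Pattern.compile("\\[b\\](.*?)\\[/b\\]", Pattern.DOTALL);
    private static final Pattern ITALIC = Pattern.compile("\\[i\\](.*?)\\[/i\\]", Pattern.DOTALL);
    // Only http(s) URLs are converted, so javascript: and similar schemes stay as plain text.
    // Quotes and angle brackets are already entities at this point, so they cannot break out of the attribute.
    private static final Pattern URL = Pattern.compile("\\[url\\](https?://[^\\s\\[\\]]+)\\[/url\\]");
    private static final Pattern IMG = Pattern.compile("\\[img\\](https?://[^\\s\\[\\]]+)\\[/img\\]");

    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String render(String ubb) {
        // Step 1: no raw HTML survives; anything outside the UBB syntax becomes plain text.
        String html = escapeHtml(ubb);
        // Step 2: convert only the known tags back into markup.
        html = BOLD.matcher(html).replaceAll("<b>$1</b>");
        html = ITALIC.matcher(html).replaceAll("<i>$1</i>");
        html = URL.matcher(html).replaceAll("<a href=\"$1\">$1</a>");
        html = IMG.matcher(html).replaceAll("<img src=\"$1\">");
        return html;
    }
}
```

Because escaping happens before any conversion, `<script>` in the input only ever comes out as `&lt;script&gt;`. The one place attacker-controlled text lands in an attribute is the URL, which is why `[url]javascript:alert(1)[/url]` is left unconverted.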

oldmyth

2019-03-12 10:48:42 +08:00

I suggest just searching on Baidu
www.baidu.com

ikaros

2019-03-12 21:59:20 +08:00

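@lizheming Right, after thinking it over, Markdown is probably the better choice
@cpdyj0 I think that approach still has problems, mainly what #3 said: some attributes and the like also need to be banned
@Shynoob Banning tags alone isn't enough
@lastpass The rich-text editor I'm using right now submits HTML, and for a moment my brain glitched and I forgot options like Markdown exist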

cpdyj0

2019-03-13 00:06:23 +08:00

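@ikaros Yes. If you go the parsing route, first make sure the parser has as few serious bugs as possible, and then apply the whitelist: nothing that looks suspicious should be allowed through.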

laozhoubuluo

2019-03-13 09:24:16 +08:00

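If it's a new site, I'd still suggest UBB code or Markdown, because HTML has so many assorted attributes that it's easy to miss one when filtering......
If the content is already HTML, then just accept your fate......
The Discuz! X portal uses HTML, and when an XSS turned up in the portal there was no way to fix it without breaking things......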