实际开发过程中，会因为 lgn 倍的时间复杂度优化而牺牲代码可读性吗

V2EX = way to explore

V2EX 是一个关于分享和探索的地方

现在注册

已注册用户请登录

这是一个创建于 1782 天前的主题，其中的信息可能已经有所发展或是发生改变。

1. 场景描述：

TLDR: 在 Draft.js 富文本框架中，根据文本的 type 可以将每一行文本划分为

 'unstyled', 'header-one', 'header-two', ..., 'header-six', 'atomic', // 其他类型暂不考虑

其中，在处理 unstyled 文本时，

如果仅包含内置行内样式（BOLD, ITALIC, ...），那么直接根据 inlineStyleRanges 进行操作
如果出现了 Decorator 实现的行内样式（比如 @某人），那么必须对 inlineStyleRanges 和 EntityRanges 先进行合并操作，即根据 offset 排序数组（升序）

已知 inlineStyleRanges 和 EntityRanges 都是 sorted.

根据对象的 offset 属性，对两个 sorted 的数组进行升序排序 （ Merge Sorted Arrays)

2. 样例

类型定义：

type InlineStyleRangesType = {
    offset: number,
    length: number,
    style: 'BOLD' | 'CODE' | 'ITALIC' | 'STRIKETHROUGH' | 'UNDERLINE',
}[]

type EntityRangesType = {
    offset: number,
    length: number,
    key: number,
}[]

type SortedArrayType = {
    offset: number,
    length: number,
    key?: number,
    style?: 'BOLD' | 'CODE' | 'ITALIC' | 'STRIKETHROUGH' | 'UNDERLINE',
}[]

输入：

const a: InlineStyleRangesType = [
    { offset: 1, length: 1, style: "BOLD" },
    { offset: 9, length: 2, style: "ITALIC" },
]

const b: EntityRangesType = [
    { offset: 4, length: 1, key: 0 },
    { offset: 12, length: 1, key: 1 },
]

输出：

[
    { offset: 1, length: 1, style: "BOLD" }, 
    { offset: 4, length: 1, key: 0 },
    { offset: 9, length: 2, style: "ITALIC" },
    { offset: 12, length: 1, key: 1 },
]

3. 解法

本人解法

// version 1

function mergeSortedArrays(
    inlineStyleRanges: InlineStyleRangesType,
    entityRanges: EntityRangesType
): SortedArrayType {
    const ranges = []

    inlineStyleRanges.forEach((range) => ranges.push(range))
    entityRanges.forEach((range) => ranges.push(range))

    ranges.sort((a, b) => a.offset - b.offset)

    return ranges
}

Interview Cake 解法

// version 2

function mergeSortedArrays(
    inlineStyleRanges: InlineStyleRangesType,
    EntityRanges: EntityRangesType
): SortedArrayType {
    const ranges = (inlineStyleRanges as SortedArrayType).concat(EntityRanges)

    ranges.sort((a, b) => a.offset - b.offset)

    return ranges
}

// version 3

function mergeSortedArrays(
    inlineStyleRanges: InlineStyleRangesType,
    EntityRanges: EntityRangesType
): SortedArrayType {
    const ranges = []

    let currentIndexEntity = 0;
    let currentIndexInlineStyle = 0;
    let currentIndexMerged = 0;

    while (currentIndexMerged < (inlineStyleRanges.length + EntityRanges.length)) {

	const isInlineStyleRangesExhausted = currentIndexInlineStyle >= inlineStyleRanges.length;
	const isEntityRangesExhausted = currentIndexEntity >= EntityRanges.length;

	// Case: next comes from inlineStyleRanges
	// inlineStyleRanges must not be exhausted, and EITHER:
	// 1) EntityRanges IS exhausted, or
	// 2) The current element in inlineStyleRanges is less
	//    than the current element in EntityRanges
	if (!isInlineStyleRangesExhausted
		&& (isEntityRangesExhausted
		|| (inlineStyleRanges[currentIndexInlineStyle].offset < EntityRanges[currentIndexEntity].offset)
	)) {

		ranges[currentIndexMerged] = inlineStyleRanges[currentIndexInlineStyle];
		currentIndexInlineStyle++;

	// Case: next comes from EntityRanges
	} else {
		ranges[currentIndexMerged] = EntityRanges[currentIndexEntity];
		currentIndexEntity++;
	}

	currentIndexMerged++;
    }

    return ranges
}

4. 算法分析

	version 1	version 2	version 3
是否利用了 sorted 特性	否	否	是
时间复杂度 *	O(nlgn)	O(nlgn)	O(n)
空间复杂度	O(n)	O(n)	O(n)
代码可读性	★★★★★	★★★★	★

* version 2 其实要优于 version 1 （常数倍）

25 条回复 • 2020-10-22 12:55:52 +08:00

YadongZhang

2020-10-21 16:48:52 +08:00

v2ex 似乎不支持 ts 语法高亮

jatai

2020-10-21 18:06:33 +08:00 via Android

实际开发过程中, 即使没有任何技术难度, 也要牺牲代码的可读性, 为了显得不可替代

raaaaaar

2020-10-21 18:08:08 +08:00 via Android

代码可读性真的和性能冲突吗？我有点疑惑，看那些经常炫技的代码，代码规范做好，命名，封装，注释，文档这么一套下来，真的会可读性低？

Leonard

2020-10-21 18:09:12 +08:00

注释写得好就行

gfreezy

2020-10-21 18:12:33 +08:00

一行也就 1000 个字符串顶天了吧。啥排序在性能上都没啥区别吧。O(n^2) 在实际使用中都没啥区别吧？

如果确实追求性能，可以单独写个通用 mergesort 函数，贴个算法文档链接，再复杂的实现都没关系，一般都直接读文档就好了，没必要读代码。

dreamapple

2020-10-21 18:24:41 +08:00 via Android

脚本语言直接调用内置 sort 和手写实现 merge 真的不知道哪个快，profile 一下说不定差别不大。实际项目亿级别的我选择 mapreduce

YadongZhang

2020-10-21 18:26:21 +08:00 via Android

@gfreezy #5 好像是这样的，每行文本长度有限

xumng123

2020-10-21 20:45:09 +08:00 via iPhone

不是实时系统或者上百万的并发量，没啥区别

liyunlong41

2020-10-21 20:50:55 +08:00 via iPhone

突然想到力扣链表找环的问题，有快慢指针和哈希表两种做法，两者时间复杂度相同但是前者省内存，可是前者难于理解和维护，后者简单易懂，假如实际业务上有类似地方，我可能会用哈希表的做法。

anguiao

2020-10-21 21:02:02 +08:00 via Android

如果是写偏底层的类库，应该考虑一下性能问题。
如果只是业务代码的话，在没有碰到性能瓶颈之前，个人觉得还是可读性比较重要。

ipwx

2020-10-21 21:35:04 +08:00

不是，现在前端程序员都这样了嘛，version 3 也叫不可读？

Sasasu

2020-10-21 22:32:29 +08:00

version 3 就是刻意写的很难懂，cpp 标准库版本

template<class InputIt1, class InputIt2, class OutputIt>
OutputIt merge(InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first)
{
for (; first1 != last1; ++d_first) {
if (first2 == last2) {
return std::copy(first1, last1, d_first);
}
if (*first2 < *first1) {
*d_first = *first2;
++first2;
} else {
*d_first = *first1;
++first1;
}
}
return std::copy(first2, last2, d_first);
}

Sasasu

2020-10-21 22:36:44 +08:00

第一种就是刻意写的难懂，刻意使用下标，合并一个数组用光的条件到主循环里，合并判断条件到控制里

```
template<class InputIt1, class InputIt2,
class OutputIt, class Compare>
OutputIt merge(InputIt1 first1, InputIt1 last1,
InputIt2 first2, InputIt2 last2,
OutputIt d_first, Compare comp)
{
for (; first1 != last1; ++d_first) {
if (first2 == last2) {
return std::copy(first1, last1, d_first);
}
if (comp(*first2, *first1)) {
*d_first = *first2;
++first2;
} else {
*d_first = *first1;
++first1;
}
}
return std::copy(first2, last2, d_first);
}
```

ref https://en.cppreference.com/w/cpp/algorithm/merge

beidounanxizi

2020-10-22 00:54:37 +08:00

怎么说吧 premature optimize 是不可取的，lgN 也要看基数 N 啊 N 100 内有区别嘛？

YadongZhang

2020-10-22 01:02:49 +08:00 via Android

@beidounanxizi #14 复杂度说的是 asymptotic analysis

YadongZhang

2020-10-22 01:09:59 +08:00 via Android

这个函数会在另一个主函数里调用，而这个主函数是个 O(n^2) 的算法实现，优化的部分在这儿

YadongZhang

2020-10-22 01:22:51 +08:00 via Android

@ipwx #11 edge cases 的考量

islxyqwe

2020-10-22 08:17:49 +08:00

建一个 O(n)的<T>(a:T[],b:T[],cmp:(x:T,y:T)=>number):T[]然后调用

gimp

2020-10-22 08:25:17 +08:00

如何编写无法维护的代码，让自己稳拿铁饭碗 ;-)

https://coderlmn.github.io/frontEndCourse/unmaintainable.html

Nugine0

2020-10-22 09:07:18 +08:00 via Android

合并排序数组不是基础算法吗？应该有现成的轮子。
如果没有轮子，函数上面注释写清楚，加几个测试，没人会关心里面代码可不可读。

bleepbloop

2020-10-22 09:57:57 +08:00

工作好几年了，从没遇到数据量大到需要刻意优化算法的地步，可能我太菜了吧

py2ex

2020-10-22 10:28:44 +08:00

以为 lgn 是什么新词，原来是 log N

YadongZhang

2020-10-22 11:17:19 +08:00

@py2ex #22

CLRS 3e, Chap.1, Sec.3.2, P57

By equation (3.15) 换底公式, changing the base of a logarithm from one constant to another changes the value of the logarithm by only a constant factor, and so we shall often use the notation “lg n” when we don’t care about constant factors, such as in O-notation.

shadeofgod

2020-10-22 12:55:07 +08:00

合并两个有序数组的也可以很容易写出兼顾可读性的写法吧？

shadeofgod

2020-10-22 12:55:52 +08:00