How can I improve concurrency when using psutil in Python to compute network speed and list processes?

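I just started learning FastAPI and have written a few simple endpoints. I don't want to set up a full dashboard like netdata or glances; I'd rather expose the server's basic metrics and service status directly through an API. The network-speed and process-listing parts drag performance down badly: QPS is in the single digits. I don't know how to write this asynchronously. Are there better approaches? Any guidance would be appreciated, thanks.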

Network speed: this uses asyncio, but concurrency is still low, with QPS in the single digits.

import asyncio
import time

import psutil


async def calculate_network_speed():
    # Take one snapshot per sample so sent/recv come from the same reading.
    initial_time = time.monotonic()
    initial = psutil.net_io_counters()
    # Every request waits here for a full second before it can respond.
    await asyncio.sleep(1)
    current = psutil.net_io_counters()
    elapsed_time = time.monotonic() - initial_time
    # Speeds are in bytes per second.
    download_speed = (current.bytes_recv - initial.bytes_recv) / elapsed_time
    upload_speed = (current.bytes_sent - initial.bytes_sent) / elapsed_time
    return download_speed, upload_speed
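
One direction that might help, as a rough sketch rather than a tested fix: since the function above sleeps for one second inside every request, the measurement could instead run in a single background task, and the endpoint would only read the latest stored value. The class name `NetworkSpeedSampler`, the helper `read_psutil_counters`, and the `read_counters` parameter are all made up for illustration; only `psutil.net_io_counters()` comes from the original code.

```python
import asyncio
import time
from typing import Callable, Optional, Tuple


def read_psutil_counters() -> Tuple[int, int]:
    """Return (bytes_sent, bytes_recv) from psutil (hypothetical helper)."""
    import psutil  # imported lazily so the sampler can be tried with a fake reader

    counters = psutil.net_io_counters()
    return counters.bytes_sent, counters.bytes_recv


class NetworkSpeedSampler:
    """Measures network speed in one background task (illustrative name)."""

    def __init__(
        self,
        read_counters: Callable[[], Tuple[int, int]] = read_psutil_counters,
        interval: float = 1.0,
    ) -> None:
        self._read = read_counters
        self._interval = interval
        self.download_speed = 0.0  # bytes per second
        self.upload_speed = 0.0  # bytes per second
        self._task: Optional[asyncio.Task] = None

    async def _run(self) -> None:
        prev_sent, prev_recv = self._read()
        prev_time = time.monotonic()
        while True:
            # Only this task sleeps; requests never wait for the interval.
            await asyncio.sleep(self._interval)
            sent, recv = self._read()
            now = time.monotonic()
            elapsed = now - prev_time
            if elapsed > 0:
                self.upload_speed = (sent - prev_sent) / elapsed
                self.download_speed = (recv - prev_recv) / elapsed
            prev_sent, prev_recv, prev_time = sent, recv, now

    def start(self) -> None:
        """Start sampling; must be called while an event loop is running."""
        if self._task is None:
            self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        """Cancel the background task and wait for it to finish."""
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    def snapshot(self) -> Tuple[float, float]:
        """Return the most recent (download_speed, upload_speed) immediately."""
        return self.download_speed, self.upload_speed
```

Assuming this fits the app, `start()` would be called once when the application starts (for example from FastAPI's lifespan handler) and the endpoint would just return `sampler.snapshot()`. The values read as 0.0 until the first interval has elapsed.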

Process listing: this is synchronous, with a decorator that caches the result for a fixed time. QPS is around 50, which is also not high.

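# Note: time_cache (shown below) must be defined before this function in the actual module.
@time_cache(5)
def get_top_processes(slice: int = 10):
    processes = [
        (
            proc.info["pid"],
            proc.info["name"],
            proc.info["cpu_percent"] or 0.0,
            proc.info["memory_percent"] or 0.0,
            # cmdline is None when access is denied, so fall back to an empty list.
            " ".join(proc.info["cmdline"] or []),
        )
        for proc in psutil.process_iter(
            ["pid", "name", "cpu_percent", "memory_percent", "cmdline"]
        )
        # Treat missing values (None) as 0 so the comparison cannot raise TypeError.
        if (proc.info["cpu_percent"] or 0) > 0 or (proc.info["memory_percent"] or 0) > 0
    ]
    # Rank by CPU usage (index 2) and by memory usage (index 3).
    top_cpu = sorted(processes, key=lambda x: x[2], reverse=True)[:slice]
    top_mem = sorted(processes, key=lambda x: x[3], reverse=True)[:slice]
    return top_cpu, top_mem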

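import time
from functools import lru_cache, wraps


def time_cache(max_age=10, maxsize=128, typed=False):
    """Cache results like lru_cache, but expire them every max_age seconds."""
    def decorator(fn):
        @lru_cache(maxsize=maxsize, typed=typed)
        def _new(*args, __time_salt, **kwargs):
            return fn(*args, **kwargs)

        @wraps(fn)
        def wrapped(*args, **kwargs):
            # The salt changes once per max_age window, which forces a cache miss.
            return _new(*args, **kwargs, __time_salt=int(time.time() / max_age))

        return wrapped

    return decorator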
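
For the process list, one idea I am not sure about: when the cache expires, the synchronous scan may run on the event loop and hold up every other request in the meantime. Below is a minimal sketch of an async-friendly cache that pushes the blocking call into a worker thread with `asyncio.to_thread` (Python 3.9+) and lets only one refresh run at a time. `AsyncTTLCache` is a hypothetical name, and whether this is the real bottleneck here is an assumption, not something measured.

```python
import asyncio
import time
from typing import Any, Callable, Optional


class AsyncTTLCache:
    """Caches the result of a blocking function for max_age seconds (illustrative)."""

    def __init__(self, fn: Callable[[], Any], max_age: float = 5.0) -> None:
        self._fn = fn
        self._max_age = max_age
        self._value: Any = None
        self._has_value = False
        self._expires_at = 0.0
        self._lock: Optional[asyncio.Lock] = None  # created lazily inside the loop

    def _fresh(self) -> bool:
        return self._has_value and time.monotonic() < self._expires_at

    async def get(self) -> Any:
        # Fast path: serve the cached value without touching the lock.
        if self._fresh():
            return self._value
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:
            # Another request may have refreshed the value while we waited.
            if self._fresh():
                return self._value
            # Run the blocking function in a thread so the event loop stays free.
            self._value = await asyncio.to_thread(self._fn)
            self._expires_at = time.monotonic() + self._max_age
            self._has_value = True
            return self._value
```

Under these assumptions it could be used as `cache = AsyncTTLCache(lambda: get_top_processes(10), max_age=5)` with an undecorated `get_top_processes`, and an `async def` endpoint would return `await cache.get()`.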

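My site shows the output here: https://api.naizi.fun/status (installing a browser extension that auto-formats the response is enough to read it). Could someone explain how to write async Python properly, and are there any good references?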