software

May 6, 2023
Category: software
7 min read

In short, the commit introduced by Linux 6.2.13:

commit 0d30989fe9a176565d360376d4bc2ea1c61cbbac
Author: Liam R. Howlett <Liam.Howlett@oracle.com>
Date:   Fri Apr 14 14:59:19 2023 -0400

    mm/mmap: regression fix for unmapped_area{_topdown}

    commit 58c5d0d6d522112577c7eeb71d382ea642ed7be4 upstream.

    The maple tree limits the gap returned to a window that specifically fits
    what was asked.  This may not be optimal in the case of switching search
    directions or a gap that does not satisfy the requested space for other
    reasons.  Fix the search by retrying the operation and limiting the search
    window in the rare occasion that a conflict occurs.

    Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com
    Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

While fixing one bug, this commit introduced a new one that makes MAP_32BIT fail intermittently. Xilinx's Digilent driver uses this flag, so its mmap call fails and the FPGA is not recognized.

The new bug is fixed by the patch "[PATCH v2] maple_tree: Make maple state reusable after mas_empty_area()".

May 6, 2023
Category: software
5 min read

A bug introduced in Linux 6.2.13 prevents Vivado from recognizing the FPGA

English version

TLDR

In short, this commit, introduced in Linux 6.2.13:

commit 0d30989fe9a176565d360376d4bc2ea1c61cbbac
Author: Liam R. Howlett <Liam.Howlett@oracle.com>
Date:   Fri Apr 14 14:59:19 2023 -0400

    mm/mmap: regression fix for unmapped_area{_topdown}

    commit 58c5d0d6d522112577c7eeb71d382ea642ed7be4 upstream.

    The maple tree limits the gap returned to a window that specifically fits
    what was asked.  This may not be optimal in the case of switching search
    directions or a gap that does not satisfy the requested space for other
    reasons.  Fix the search by retrying the operation and limiting the search
    window in the rare occasion that a conflict occurs.

    Link: https://lkml.kernel.org/r/20230414185919.4175572-1-Liam.Howlett@oracle.com
    Fixes: 3499a13168da ("mm/mmap: use maple tree for unmapped_area{_topdown}")
    Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

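While fixing one bug, it introduced a new one that makes MAP_32BIT fail intermittently. The code for Xilinx's Digilent download cable uses this flag, so mmap fails and the FPGA cannot be recognized.

The new bug is fixed by the patch "[PATCH v2] maple_tree: Make maple state reusable after mas_empty_area()".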

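January 13, 2023
Category: software
2 min read

Running code-server on FreeBSD

Background

I have recently been porting open-source software to FreeBSD. Since VS Code does not officially support FreeBSD, I tried code-server instead.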

October 29, 2022
Category: software
4 min read

Running Nix on ppc64le Linux

Background

I previously tried running Nix on a ppc64le machine. Back then I cloned the source and compiled it, and I also wrote a Docker script:

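September 19, 2022
Category: software
2 min read

Compatibility problems caused by the outdated Fakeroot in Buildroot 2020.08

Background

I was recently adding new packages to an existing Buildroot 2020.09 tree, and the build failed with: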

mknod: ....../dev/console: Operation not permitted

One more piece of background: shortly before that, I had upgraded the system to Ubuntu 22.04 LTS.

August 5, 2022
Category: software
5 min read

From TeX to PDF

Background

Today xdvipdfmx reported an error, which made me curious about what the DVI format is and how TeX is turned into a PDF step by step.

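August 2, 2022
Category: software
6 min read

Building Rust projects with Nix

Background

Rust projects are usually managed with Cargo. Its drawback is that every project recompiles all of its dependencies, which takes a lot of disk space, and the build cache cannot be shared across projects. After some research, I found several Nix-based Rust build tools: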

cargo2nix: https://github.com/cargo2nix/cargo2nix
carnix: no longer maintained
crane: https://github.com/ipetkov/crane
crate2nix: https://github.com/kolloch/crate2nix
naersk: https://github.com/nix-community/naersk
nocargo: https://github.com/oxalica/nocargo

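Below I try out each of these tools.

July 26, 2022
Category: software
5 min read

The invalid date error and its relation to time zones

Background

While checking contest problems recently, @HarryChen noticed the following: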

$ date -d "1919-04-13"
date: invalid date ‘1919-04-13’
$ TZ=UTC date -d "1919-04-13"
Sun Apr 13 00:00:00 UTC 1919

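So the behavior depends on the time zone. Why, then, is 1919-04-13 an invalid date?

Time zones

In a given time zone, some local times simply do not exist; daylight saving time is the most common cause. The Timezone DB shows that on April 13, 1919 the offset changed from UTC+8 to UTC+9, so midnight became 01:00. A date given without a time means 00:00 on that day, which is why 1919-04-13 is rejected as invalid.

This data is stored in tzdata and can be inspected with the zdump tool:

$ zdump -v Asia/Shanghai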
Asia/Shanghai  Fri Dec 13 20:45:52 1901 UTC = Sat Dec 14 04:45:52 1901 CST isdst=0
Asia/Shanghai  Sat Dec 14 20:45:52 1901 UTC = Sun Dec 15 04:45:52 1901 CST isdst=0
Asia/Shanghai  Sat Apr 12 15:59:59 1919 UTC = Sat Apr 12 23:59:59 1919 CST isdst=0
Asia/Shanghai  Sat Apr 12 16:00:00 1919 UTC = Sun Apr 13 01:00:00 1919 CDT isdst=1
Asia/Shanghai  Tue Sep 30 14:59:59 1919 UTC = Tue Sep 30 23:59:59 1919 CDT isdst=1
Asia/Shanghai  Tue Sep 30 15:00:00 1919 UTC = Tue Sep 30 23:00:00 1919 CST isdst=0
Asia/Shanghai  Fri May 31 15:59:59 1940 UTC = Fri May 31 23:59:59 1940 CST isdst=0
Asia/Shanghai  Fri May 31 16:00:00 1940 UTC = Sat Jun  1 01:00:00 1940 CDT isdst=1
Asia/Shanghai  Sat Oct 12 14:59:59 1940 UTC = Sat Oct 12 23:59:59 1940 CDT isdst=1
Asia/Shanghai  Sat Oct 12 15:00:00 1940 UTC = Sat Oct 12 23:00:00 1940 CST isdst=0
Asia/Shanghai  Fri Mar 14 15:59:59 1941 UTC = Fri Mar 14 23:59:59 1941 CST isdst=0
Asia/Shanghai  Fri Mar 14 16:00:00 1941 UTC = Sat Mar 15 01:00:00 1941 CDT isdst=1
Asia/Shanghai  Sat Nov  1 14:59:59 1941 UTC = Sat Nov  1 23:59:59 1941 CDT isdst=1
Asia/Shanghai  Sat Nov  1 15:00:00 1941 UTC = Sat Nov  1 23:00:00 1941 CST isdst=0
Asia/Shanghai  Fri Jan 30 15:59:59 1942 UTC = Fri Jan 30 23:59:59 1942 CST isdst=0
Asia/Shanghai  Fri Jan 30 16:00:00 1942 UTC = Sat Jan 31 01:00:00 1942 CDT isdst=1
Asia/Shanghai  Sat Sep  1 14:59:59 1945 UTC = Sat Sep  1 23:59:59 1945 CDT isdst=1
Asia/Shanghai  Sat Sep  1 15:00:00 1945 UTC = Sat Sep  1 23:00:00 1945 CST isdst=0
Asia/Shanghai  Tue May 14 15:59:59 1946 UTC = Tue May 14 23:59:59 1946 CST isdst=0
Asia/Shanghai  Tue May 14 16:00:00 1946 UTC = Wed May 15 01:00:00 1946 CDT isdst=1
Asia/Shanghai  Mon Sep 30 14:59:59 1946 UTC = Mon Sep 30 23:59:59 1946 CDT isdst=1
Asia/Shanghai  Mon Sep 30 15:00:00 1946 UTC = Mon Sep 30 23:00:00 1946 CST isdst=0
Asia/Shanghai  Mon Apr 14 15:59:59 1947 UTC = Mon Apr 14 23:59:59 1947 CST isdst=0
Asia/Shanghai  Mon Apr 14 16:00:00 1947 UTC = Tue Apr 15 01:00:00 1947 CDT isdst=1
Asia/Shanghai  Fri Oct 31 14:59:59 1947 UTC = Fri Oct 31 23:59:59 1947 CDT isdst=1
Asia/Shanghai  Fri Oct 31 15:00:00 1947 UTC = Fri Oct 31 23:00:00 1947 CST isdst=0
Asia/Shanghai  Fri Apr 30 15:59:59 1948 UTC = Fri Apr 30 23:59:59 1948 CST isdst=0
Asia/Shanghai  Fri Apr 30 16:00:00 1948 UTC = Sat May  1 01:00:00 1948 CDT isdst=1
Asia/Shanghai  Thu Sep 30 14:59:59 1948 UTC = Thu Sep 30 23:59:59 1948 CDT isdst=1
Asia/Shanghai  Thu Sep 30 15:00:00 1948 UTC = Thu Sep 30 23:00:00 1948 CST isdst=0
Asia/Shanghai  Sat Apr 30 15:59:59 1949 UTC = Sat Apr 30 23:59:59 1949 CST isdst=0
Asia/Shanghai  Sat Apr 30 16:00:00 1949 UTC = Sun May  1 01:00:00 1949 CDT isdst=1
Asia/Shanghai  Fri May 27 14:59:59 1949 UTC = Fri May 27 23:59:59 1949 CDT isdst=1
Asia/Shanghai  Fri May 27 15:00:00 1949 UTC = Fri May 27 23:00:00 1949 CST isdst=0
Asia/Shanghai  Sat May  3 17:59:59 1986 UTC = Sun May  4 01:59:59 1986 CST isdst=0
Asia/Shanghai  Sat May  3 18:00:00 1986 UTC = Sun May  4 03:00:00 1986 CDT isdst=1
Asia/Shanghai  Sat Sep 13 16:59:59 1986 UTC = Sun Sep 14 01:59:59 1986 CDT isdst=1
Asia/Shanghai  Sat Sep 13 17:00:00 1986 UTC = Sun Sep 14 01:00:00 1986 CST isdst=0
Asia/Shanghai  Sat Apr 11 17:59:59 1987 UTC = Sun Apr 12 01:59:59 1987 CST isdst=0
Asia/Shanghai  Sat Apr 11 18:00:00 1987 UTC = Sun Apr 12 03:00:00 1987 CDT isdst=1
Asia/Shanghai  Sat Sep 12 16:59:59 1987 UTC = Sun Sep 13 01:59:59 1987 CDT isdst=1
Asia/Shanghai  Sat Sep 12 17:00:00 1987 UTC = Sun Sep 13 01:00:00 1987 CST isdst=0
Asia/Shanghai  Sat Apr 16 17:59:59 1988 UTC = Sun Apr 17 01:59:59 1988 CST isdst=0
Asia/Shanghai  Sat Apr 16 18:00:00 1988 UTC = Sun Apr 17 03:00:00 1988 CDT isdst=1
Asia/Shanghai  Sat Sep 10 16:59:59 1988 UTC = Sun Sep 11 01:59:59 1988 CDT isdst=1
Asia/Shanghai  Sat Sep 10 17:00:00 1988 UTC = Sun Sep 11 01:00:00 1988 CST isdst=0
Asia/Shanghai  Sat Apr 15 17:59:59 1989 UTC = Sun Apr 16 01:59:59 1989 CST isdst=0
Asia/Shanghai  Sat Apr 15 18:00:00 1989 UTC = Sun Apr 16 03:00:00 1989 CDT isdst=1
Asia/Shanghai  Sat Sep 16 16:59:59 1989 UTC = Sun Sep 17 01:59:59 1989 CDT isdst=1
Asia/Shanghai  Sat Sep 16 17:00:00 1989 UTC = Sun Sep 17 01:00:00 1989 CST isdst=0
Asia/Shanghai  Sat Apr 14 17:59:59 1990 UTC = Sun Apr 15 01:59:59 1990 CST isdst=0
Asia/Shanghai  Sat Apr 14 18:00:00 1990 UTC = Sun Apr 15 03:00:00 1990 CDT isdst=1
Asia/Shanghai  Sat Sep 15 16:59:59 1990 UTC = Sun Sep 16 01:59:59 1990 CDT isdst=1
Asia/Shanghai  Sat Sep 15 17:00:00 1990 UTC = Sun Sep 16 01:00:00 1990 CST isdst=0
Asia/Shanghai  Sat Apr 13 17:59:59 1991 UTC = Sun Apr 14 01:59:59 1991 CST isdst=0
Asia/Shanghai  Sat Apr 13 18:00:00 1991 UTC = Sun Apr 14 03:00:00 1991 CDT isdst=1
Asia/Shanghai  Sat Sep 14 16:59:59 1991 UTC = Sun Sep 15 01:59:59 1991 CDT isdst=1
Asia/Shanghai  Sat Sep 14 17:00:00 1991 UTC = Sun Sep 15 01:00:00 1991 CST isdst=0
Asia/Shanghai  Mon Jan 18 03:14:07 2038 UTC = Mon Jan 18 11:14:07 2038 CST isdst=0
Asia/Shanghai  Tue Jan 19 03:14:07 2038 UTC = Tue Jan 19 11:14:07 2038 CST isdst=0

As you can see, it lists the historical transitions of the Asia/Shanghai time zone. For the detailed history, see Time in China (中国时区).
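The same gap can also be checked programmatically. The following is a minimal Python sketch, not part of the original investigation: the helper name local_time_exists is hypothetical, and it assumes the system tz database is available to the standard zoneinfo module. It relies on a round trip through UTC: a wall-clock time that falls into a gap does not map back to itself.

```python
from datetime import datetime, timezone, tzinfo
from zoneinfo import ZoneInfo


def local_time_exists(naive: datetime, tz: tzinfo) -> bool:
    """Return True if the wall-clock time `naive` exists in time zone `tz`.

    Hypothetical helper: convert to UTC and back; a time inside a gap
    comes back shifted, so the round trip does not reproduce it.
    """
    aware = naive.replace(tzinfo=tz)
    round_trip = aware.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


if __name__ == "__main__":
    shanghai = ZoneInfo("Asia/Shanghai")
    for text in ("1919-04-12", "1919-04-13"):
        midnight = datetime.fromisoformat(text)  # no time given means 00:00
        print(text, local_time_exists(midnight, shanghai))
```

With tz data matching the zdump output above, midnight on 1919-04-12 should be reported as existing and midnight on 1919-04-13 as missing.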

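In addition, the historical switch from the Julian calendar to the Gregorian calendar also produced a range of dates that "do not exist"; see, for example, Setting October 14 ,1582 fails in java.sql.Date.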

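July 11, 2022
Category: software
3 min read

Switching a ConnectX-4 to Ethernet mode

Background

I was recently setting up the network in our server room and needed to use a ConnectX-4 as an Ethernet NIC. The card supports both InfiniBand and Ethernet but defaults to InfiniBand mode, so the mlxconfig tool is needed to switch it.

Installing MFT

Using mlxconfig requires installing MFT (Mellanox Firmware Tools). We run Debian bookworm, so download the DEB package: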

wget https://www.mellanox.com/downloads/MFT/mft-4.20.1-14-x86_64-deb.tgz
unar mft-4.20.1-14-x86_64-deb.tgz
cd mft-4.20.1-14-x86_64-deb

UPDATE 2022-10-28: the latest version, mft-4.21.0-99, fixes the build problem described below.

wget https://www.mellanox.com/downloads/MFT/mft-4.21.0-99-x86_64-deb.tgz
unar mft-4.21.0-99-x86_64-deb.tgz
cd mft-4.21.0-99-x86_64-deb

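Installing with sudo ./install.sh failed with a dkms error. The log showed that the kernel (5.18) is too new: a function's usage has changed, so calls to pci_unmap_single must be replaced with dma_unmap_single and the first argument adjusted, as Linux commit a2e759612e5ff3858856fe97be5245eecb84e29b describes: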

-           pci_unmap_single(dev->pci_dev, dev->dma_props[i].dma_map, DMA_MBOX_SIZE, DMA_BIDIRECTIONAL);
+           dma_unmap_single(&dev->pci_dev->dev, dev->dma_props[i].dma_map, DMA_MBOX_SIZE, DMA_BIDIRECTIONAL);

After this change, running sudo dkms install kernel-mft-dkms/4.20.1 by hand compiled successfully. Then install mft manually and start the service:

$ sudo dpkg -i DEBS/mft_4.20.1-14_amd64.deb
$ sudo mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
[warn] mst_pciconf is already loaded, skipping
Create devices
Unloading MST PCI module (unused) - Success
$ sudo mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded

MST devices:
------------
/dev/mst/mtxxxx_pciconf0         - PCI configuration cycles access.
                                   domain:bus:dev.fn=0000:xx:xx.0 addr.reg=yy data.reg=zz cr_bar.gw_offset=-1
                                   Chip revision is: 00

With everything installed, run mlxconfig to switch to Ethernet:

$ sudo mlxconfig -d /dev/mst/mtxxxx_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:
----------

Device type:    ConnectX4
Name:           REDACTED
Description:    ConnectX-4 VPI adapter card; FDR IB (56Gb/s) and 40GbE; dual-port QSFP28; PCIe3.0 x8; ROHS R6
Device:         /dev/mst/mtxxxx_pciconf0

Configurations:                              Next Boot       New
         LINK_TYPE_P1                        IB(1)           ETH(2)
         LINK_TYPE_P2                        IB(1)           ETH(2)

 Apply new Configuration? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

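To show the possible options and values of each setting: sudo mlxconfig -d /dev/mst/mtxxxx_pciconf0 show_confs

The whole installation procedure is scripted in the repository https://github.com/jiegec/mft-debian-bookworm.

UPDATE: newer MFT versions drop support for older NICs; for example, 4.22.1-LTS supports ConnectX-3 but 4.26.1-LTS does not.

VMware ESXi

To switch the NIC to Ethernet mode on ESXi, refer to the following documents: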

https://docs.nvidia.com/networking/pages/releaseview.action?pageId=15049813
https://docs.nvidia.com/networking/plugins/servlet/mobile?contentId=15051769#content/view/15051769

Commands (ESXi 7.0U3):

scp *.vib root@esxi:/some/path
esxcli software vib install -v /some/path/MEL_bootbank_mft_4.21.0.703-0.vib
esxcli software vib install -v /some/path/MEL_bootbank_nmst_4.21.0.703-1OEM.703.0.0.18434556.vib
reboot
/opt/mellanox/bin/mst start
/opt/mellanox/bin/mst status -vv
/opt/mellanox/bin/mlxfwmanager --query
/opt/mellanox/bin/mlxconfig -d mt4115_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
reboot

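The NIC then shows up.

July 1, 2022
Category: software
3 min read

Nix Cookbook

Background

I have recently been trying out NixOS and running Nix on macOS. Below are some small problems I ran into and how I approached them.

Home Manager

Home Manager describes the programs a single user sees by default, whereas the NixOS configuration applies to all users.

Configuration file

Configuration file: ~/.config/nixpkgs/home.nix

Apply the configuration:

home-manager switch

Apply a Flakes configuration and show what changed: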

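#!/usr/bin/env python3
# Switch the Home Manager flake configuration and print the closure diff.
import os

user = os.getenv("USER")
home = f"/nix/var/nix/profiles/per-user/{user}/"
# Resolve the profile generation before and after the switch.
old = home + os.readlink(f"{home}profile")
os.system("home-manager switch --flake .")
new = home + os.readlink(f"{home}profile")
# Show which packages changed between the two generations.
os.system(f"nix store diff-closures {old} {new}")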

Common settings

Commonly used Home Manager settings:

# Allow unfree
nixpkgs.config.allowUnfree = true;
nixpkgs.config.allowUnfreePredicate = (pkg: true);

# User wide packages
home.packages = with pkgs; [
  xxx
];

Generate the Nix configuration ~/.config/nix/nix.conf:

# Enable flakes & setup TUNA mirror
nix.package = pkgs.nix;
nix.settings = {
  experimental-features = [ "nix-command" "flakes" ];
  substituters = [ "https://mirrors.tuna.tsinghua.edu.cn/nix-channels/store" "https://cache.nixos.org/"];
};

Shell environment variables and PATH:

home.sessionVariables = {
  A = "B";
};
home.sessionPath = [
  "$HOME/.local/bin"
];

Offline Home Manager manual (open it with the home-manager-help command):

manual.html.enable = true;

Configuring direnv

direnv is a shell plugin that runs commands from .envrc when you enter a directory, for example to enter nix-shell automatically. Configuration:

programs.direnv.enable = true;

Then write a .envrc in the project directory:

use_nix

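When the shell enters the directory, it then picks up the nix-shell environment variables automatically.

Configuring fish

The fish configuration can be written in the Home Manager configuration, which then generates ~/.config/fish/config.fish automatically: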

programs.fish.enable = true;
programs.fish.shellAliases = {
  a = "b";
};
programs.fish.shellInit = ''
  # Rust
  set -x PATH ~/.cargo/bin $PATH
'';

Configuring git

Likewise, git can be configured in Home Manager:

programs.git.enable = true;
programs.git.lfs.enable = true;
programs.git.userName = "Someone";
programs.git.userEmail = "mail@example.com";
programs.git.extraConfig = {
  core = {
    quotepath = false;
  };
  pull = {
    rebase = false;
  };
};
programs.git.ignores = [
  ".DS_Store"
];

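The generated git configuration lives in ~/.config/git/config and ~/.config/git/ignore.

Useful tools

search.nixos.org

search.nixos.org searches the packages in nixpkgs and shows which platforms each one supports. Its drawbacks are that it does not show whether a package is unfree or broken, and some Darwin-specific packages are not listed.

Packaging

It is easy to write a default.nix to package your own project.

CMake

For a simple CMake program, default.nix can be written as follows: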

with import <nixpkgs> {};

stdenv.mkDerivation {
  name = "xyz";
  version = "1.0";

  src = ./.;

  nativeBuildInputs = [
    cmake
  ];

  buildInputs = [
    xxx
    yyy
  ];
}

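Build it with nix-build. The result is a symlink named result in the current directory that points to the install directory.

Because nix-build also creates a build directory, use a different directory name during development to avoid conflicts.

Qt

Qt projects are slightly more involved because there are several major Qt versions, so the packaging is split into two files. First, default.nix: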

with import <nixpkgs> {};

libsForQt5.callPackage ./xxx.nix { }

This means building with Qt 5, so the qtbase and other libraries passed into xxx.nix are the Qt 5 versions:

{ stdenv, qtbase, wrapQtAppsHook, cmake }:

stdenv.mkDerivation {
  name = "xxx";
  version = "1.0";

  src = ./.;

  nativeBuildInputs = [
    cmake
    wrapQtAppsHook # must-have for qt apps
  ];

  buildInputs = [
    qtbase
  ];
}

In practice, the program may fail at runtime with Could not initialize GLX. This problem can be solved by setting an environment variable through wrapProgram:

  # https://github.com/NixOS/nixpkgs/issues/66755#issuecomment-657305962
  # Fix "Could not initialize GLX" error
  postInstall = ''
    wrapProgram "$out/bin/xxx" --set QT_XCB_GL_INTEGRATION none
  '';

Development environment

Besides packaging, the packages needed for development are usually declared in shell.nix:

{ pkgs ? import <nixpkgs> {}
}:

pkgs.mkShell {
  buildInputs = with pkgs; [
    cmake
  ];
}

Then enter the development environment with nix-shell. To keep outside environment variables from leaking in, use nix-shell --pure.

Searching

Search for a package by name:

nix search nixpkgs xxx
nix-env -qaP yyy

Nixpkgs

You can first clone a copy from the TUNA mirror and then add the GitHub upstream as a remote.

Install from a local nixpkgs:

nix-env -f $PWD -iA xxx

Build from a local nixpkgs:

nix-build $PWD -A xxx

Open a shell from a local nixpkgs:

nix-shell -I nixpkgs=$PWD -p xxx

Nixpkgs branches

Nixpkgs has three main development branches:

master
staging-next
staging

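When opening a PR, target staging if many packages need to be rebuilt, and staging-next if only a few do.

CI automatically merges master into staging-next and staging-next into staging, so changes on master also propagate to staging.

Maintainers periodically merge staging into staging-next by hand, and then merge staging-next into master by hand. A cycle usually takes a bit over a week; search the PRs for staging-next to follow it.

Hydra builds packages on the master and staging-next branches but not on staging. Accordingly, the binary cache has builds for the first two branches but not for staging.

Reference: https://nixos.org/manual/nixpkgs/stable/#submitting-changes-commit-policy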

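Contributing

Things to keep in mind:

Update outdated idioms, e.g. mkDerivation -> stdenv.mkDerivation and the Qt hooks
When adding a patch, first submit a PR upstream if possible. If it has been merged, use the upstream commit directly; if not, fall back to the patch from the PR. Write a local patch only when there is no way to submit a PR, the upstream commit does not apply to the current version, or the patch is not generally applicable. The comment should state why the patch is needed, link the related issue, and say when the patch can be dropped; also give the patch a name
When you do not know the SHA256, comment it out or write an arbitrary value; nix build then downloads the source again and prints the correct hash
For packages that provide a command, add a testVersion test
For a PR that has gone a long time without review, reply in the thread on Discourse.
Before updating a package, search for related issues or PRs; if an issue exists, mention it when opening the PR

Some common problems:

With -fno-common enabled in the compiler, some link errors may appear
clang on Darwin has neither LTO nor Universal support enabled
The stack protector of gfortran on AArch64 Darwin does not work, so hardening has to be turned off
When a build fails because of -Werror, add -Wno-error=warning-type to NIX_CFLAGS_COMPILE according to the warning type
If configure is too old, add autoreconfHook

Read the documentation: https://github.com/NixOS/nixpkgs/blob/master/doc/contributing/quick-start.chapter.md and https://github.com/NixOS/nixpkgs/blob/master/doc/contributing/coding-conventions.chapter.md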

VSCode

Install the https://github.com/nix-community/vscode-nix-ide/ extension and use it together with rnix-lsp.

Miscellaneous

The nix copy command copies files between the stores of different machines; see nix copy - copy paths between Nix stores.

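May 31, 2022
Category: software
4 min read

Running a RISC-V virtual machine in libvirt

Background

I run several KVM-accelerated virtual machines in libvirt. It occurred to me that since libvirt is backed by QEMU, and QEMU can emulate other instruction sets, libvirt might also be able to run a RISC-V virtual machine. A quick search showed that this works for ARM (How To: Running Fedora-ARM under QEMU), so running an RV64 virtual machine under libvirt is worth a try.

Preparing the rootfs

The first step is to create a rootfs following the Debian document Creating a riscv64 chroot, and then package it with virt-make-fs.

First, generate a chroot with mmdebstrap: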

$ sudo mkdir -p /tmp/riscv64-chroot
$ sudo apt install mmdebstrap qemu-user-static binfmt-support debian-ports-archive-keyring
$ sudo mmdebstrap --architectures=riscv64 --include="debian-ports-archive-keyring" sid /tmp/riscv64-chroot "deb http://deb.debian.org/debian-ports sid main" "deb http://deb.debian.org/debian-ports unreleased main"

Inside the chroot, do some configuration:

$ sudo chroot /tmp/riscv64-chroot
$ apt update
$ apt install linux-image-riscv64 u-boot-menu vim
# set root password
$ passwd

Then edit /etc/default/u-boot and add the following settings:

# change ro to rw, set root device
U_BOOT_PARAMETERS="rw noquiet root=/dev/vda1"
# fdt is provided by qemu
U_BOOT_FDT_DIR="noexist"

Then run u-boot-update to generate the configuration file /boot/extlinux/extlinux.conf.

The rootfs is now ready.

Trying to boot in QEMU

Next, following Setting up a riscv64 virtual machine, boot the image in plain QEMU first to check that it works.

First create a qcow2 image:

$ sudo virt-make-fs --partition=gpt --type=ext4 --size=+10G --format=qcow2 /tmp/riscv64-chroot rootfs.qcow2
$ qemu-img info rootfs.qcow2
image: rootfs.qcow2
file format: qcow2
virtual size: 11.4 GiB (12231971328 bytes)
disk size: 1.33 GiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
    extended l2: false

Then start QEMU with the paths to OpenSBI and U-Boot:

$ sudo apt install qemu-system-misc opensbi u-boot-qemu
$ sudo qemu-system-riscv64 -nographic -machine virt -m 8G \
    -bios /usr/lib/riscv64-linux-gnu/opensbi/generic/fw_jump.elf \
    -kernel /usr/lib/u-boot/qemu-riscv64_smode/uboot.elf \
    -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-device,rng=rng0 \
    -append "console=ttyS0 rw root=/dev/vda1" \
    -device virtio-blk-device,drive=hd0 -drive file=rootfs.qcow2,format=qcow2,id=hd0 \
    -device virtio-net-device,netdev=usernet -netdev user,id=usernet,hostfwd=tcp::22222-:22

If the system works and prints the output below, the next step is configuring libvirt.

[    6.285024] Run /init as init process
Loading, please wait...
Starting version 251.1-1
[    7.743714] virtio_ring: module verification failed: signature and/or required key missing - tainting kernel
[    8.071762] virtio_blk virtio1: [vda] 23838189 512-byte logical blocks (12.2 GB/11.4 GiB)
[    8.181210]  vda: vda1
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
Warning: fsck not present, so skipping root file system
[    9.003143] EXT4-fs (vda1): mounted filesystem with ordered data mode. Quota mode: none.
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... done.
[    9.754151] Not activating Mandatory Access Control as /sbin/tomoyo-init does not exist.
[    9.808860] random: fast init done
[   10.651361] systemd[1]: Inserted module 'autofs4'
[   10.735574] systemd[1]: systemd 251.1-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
[   10.736902] systemd[1]: Detected architecture riscv64.

Welcome to Debian GNU/Linux bookworm/sid!

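Configuring libvirt

First open virt-manager. In the wizard, choose a custom architecture from the drop-down menu, select riscv64 and virt, then choose Import existing disk image and pick the qcow2 file created above.

The VM cannot be started right after creation, because OpenSBI and U-Boot are not configured yet. virt-aa-helper checks the OpenSBI and U-Boot paths and does not allow them to be under /usr/lib: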

/*
 * Don't allow access to special files or restricted paths such as /bin, /sbin,
 * /usr/bin, /usr/sbin and /etc. This is in an effort to prevent read/write
 * access to system files which could be used to elevate privileges. This is a
 * safety measure in case libvirtd is under a restrictive profile and is
 * subverted and trying to escape confinement.
 *
 * Note that we cannot exclude block devices because they are valid devices.
 * The TEMPLATE file can be adjusted to explicitly disallow these if needed.
 *
 * RETURN: -1 on error, 0 if ok, 1 if blocked
 */
    const char * const restricted[] = {
        "/bin/",
        "/etc/",
        "/lib",
        "/lost+found/",
        "/proc/",
        "/sbin/",
        "/selinux/",
        "/sys/",
        "/usr/bin/",
        "/usr/lib",
        "/usr/sbin/",
        "/usr/share/",
        "/usr/local/bin/",
        "/usr/local/etc/",
        "/usr/local/lib",
        "/usr/local/sbin/"
    };

So I manually copied U-Boot and OpenSBI to a directory under /var/lib:

$ sudo mkdir -p /var/lib/custom
$ cd /var/lib/custom
$ sudo cp -r /usr/lib/u-boot/qemu-riscv64_smode .
$ sudo cp -r /usr/lib/riscv64-linux-gnu .
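Why the copy helps can be illustrated with a small Python sketch. It is only a simplified model of the check, assuming that a path is rejected when it starts with one of the prefixes in the list quoted above; the function name is_restricted is hypothetical, and the real helper may do more than a plain prefix comparison.

```python
# Prefix list copied from the virt-aa-helper excerpt quoted above.
RESTRICTED_PREFIXES = (
    "/bin/", "/etc/", "/lib", "/lost+found/", "/proc/", "/sbin/",
    "/selinux/", "/sys/", "/usr/bin/", "/usr/lib", "/usr/sbin/",
    "/usr/share/", "/usr/local/bin/", "/usr/local/etc/",
    "/usr/local/lib", "/usr/local/sbin/",
)


def is_restricted(path: str) -> bool:
    """Simplified model: block a path that starts with a listed prefix.

    Assumption: plain string-prefix comparison only; the real helper
    may additionally resolve symlinks or apply other rules.
    """
    return path.startswith(RESTRICTED_PREFIXES)


if __name__ == "__main__":
    for candidate in (
        "/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf",
        "/var/lib/custom/qemu-riscv64_smode/uboot.elf",
    ):
        print(candidate, "blocked" if is_restricted(candidate) else "allowed")
```

Under this model the original location under /usr/lib is blocked, while the copy under /var/lib/custom is allowed because /var/lib does not match any prefix.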

Now edit the libvirt XML configuration:

  <os>
    <type arch='riscv64' machine='virt'>hvm</type>
    <loader type='rom'>/var/lib/custom/riscv64-linux-gnu/opensbi/generic/fw_jump.elf</loader>
    <kernel>/var/lib/custom/qemu-riscv64_smode/uboot.elf</kernel>
    <boot dev='hd'/>
  </os>

Nothing else needs to change. Further down you can see that virt-manager has already set qemu-system-riscv64:

<devices>
  <emulator>/usr/bin/qemu-system-riscv64</emulator>
  <disk type='file' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source file='/path/to/rootfs.qcow2'/>
    <target dev='vda' bus='virtio'/>
    <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
  </disk>

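Save and start the VM, and a Debian RV64 virtual machine is running in libvirt.

May 2, 2022
Category: software
2 min read

Building a LoongArch64 toolchain

Because of the Loongson Cup, I recently wanted to build my own LoongArch64 cross-compilation toolchain. I hit many pitfalls but eventually got it working.

My first idea was a newlib toolchain, which is simpler. I already had a repository for that, jiegec/riscv-toolchain, which builds a riscv64-unknown-elf toolchain by following riscv-gnu-toolchain. It turned out, however, that newlib does not support LoongArch yet; only glibc does, so I had to take the harder route.

So I started from riscv-toolchain and worked on loongarch64-unknown-linux-gnu, i.e. a toolchain with glibc, and ran into many problems. First, building libgcc could not find header files, so the glibc and Linux headers had to be installed into the sysroot first; working out whether the header path inside the sysroot should be include or usr/include also took a while. Building libgcc then failed in various other ways. In the end gcc stage1 and glibc built fine, while gcc stage2 reported link errors. Ignoring those errors still gave a usable toolchain that compiles working programs, since libc itself was fine.

Then I thought of trying crosstool-ng. I cloned upstream, copied the riscv parts and adapted them for loongarch, and switched linux/glibc/gcc/binutils-gdb in the config to custom locations so that I could use the latest upstream versions. Along the way I hit a crosstool-ng bug where it is incompatible with gcc 12/13; fortunately someone had posted a workaround in the discussion. With all that sorted out, I finally built a complete loongarch64-unknown-linux-gnu toolchain. The repository is jiegec/ct-ng-loongarch64, and it needs the loongarch branch of jiegec/crosstool-ng, which adds LoongArch support.

The component versions of the resulting toolchain are: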

loongarch64-unknown-linux-gnu-gcc (crosstool-NG 1.25.0_rc2.1_7e21141) 13.0.0 20220502 (experimental)
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

GNU ld (crosstool-NG 1.25.0_rc2.1_7e21141) 2.38.50.20220502
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
GNU gdb (crosstool-NG 1.25.0_rc2.1_7e21141) 13.0.50.20220502-git
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

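When I have time, I will get QEMU and a full system up and running.

UPDATE: GCC 12.1 has been released; I tried it, and this official release compiles correctly. For now a HEAD version of binutils and Loongson's glibc and linux are still required.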

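January 25, 2022
Category: software
2 min read

Building a RISC-V Vector 1.0 toolchain

The RVV 1.0 specification was finally released not long ago, but toolchain support is mostly at the stage of being upstreamed and not yet part of a release. RVV 1.0 work is currently most active in LLVM, so I combine LLVM Clang with a GCC Newlib toolchain: the former compiles RVV 1.0 code and the latter provides libc and other base libraries.

UPDATE: LLVM 14 has been released and supports RVV 1.0; just install LLVM 14 from https://apt.llvm.org or elsewhere.

LLVM Clang can be used straight from upstream. Build options: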

$ cmake -G Ninja ../llvm -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_INSTALL_PREFIX=/prefix/llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_TARGETS_TO_BUILD="RISCV"
$ ninja
$ ninja install
$ /prefix/llvm/bin/clang --version
clang version 14.0.0 (https://github.com/llvm/llvm-project.git 8d298355ca3778a47fd6b3110aeee03ea5e8e02b)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /data/llvm/bin

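A GCC toolchain is also needed for everything to work. A riscv-gnu-toolchain nightly build can be used directly, for example riscv64-elf-ubuntu-20.04-nightly-2022.01.17-nightly.tar.gz. Download and extract it to get a riscv directory; the GCC version is fairly recent: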

$ ~/riscv/bin/riscv64-unknown-elf-gcc --version
riscv64-unknown-elf-gcc (GCC) 11.1.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

However, linking a C++ program fails with: prefixed ISA extension must separate with _. The binutils in the riscv-gnu-toolchain repository is not new enough; upstream binutils has already fixed this. So clone upstream binutils, build it, and overwrite the binutils from riscv-gnu-toolchain:

UPDATE: binutils 2.38 has been released; use that version.

$ ../configure --target=riscv64-unknown-elf --prefix=$HOME/riscv --disable-gdb --disable-sim --disable-libdecnumber --disable-readline
$ make
$ make install
$ ~/riscv/bin/riscv64-unknown-elf-ld --version
GNU ld (GNU Binutils) 2.38.50.20220125
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

Then compile programs with clang and pass --gcc-toolchain=$HOME/riscv so that clang can find the GNU toolchain. This produces RVV 1.0 programs:

$ /llvm/bin/clang --target=riscv64-unknown-elf -O2 -march=rv64gcv1p0 -menable-experimental-extensions -mllvm --riscv-v-vector-bits-min=256 --gcc-toolchain=$HOME/riscv add.cpp -o add
$ /data/llvm/bin/llvm-objdump --mattr=+v -S add
   1020e: 57 70 04 c5   vsetivli        zero, 8, e32, m1, ta, mu
   10212: 07 64 03 02   vle32.v v8, (t1)
   10216: 87 e4 03 02   vle32.v v9, (t2)
   1021a: 93 07 07 fe   addi    a5, a4, -32
   1021e: 07 e5 07 02   vle32.v v10, (a5)
   10222: 87 65 07 02   vle32.v v11, (a4)
   10226: 57 14 85 02   vfadd.vv        v8, v8, v10
   1022a: d7 94 95 02   vfadd.vv        v9, v9, v11
   1022e: 93 07 05 fe   addi    a5, a0, -32
   10232: 27 e4 07 02   vse32.v v8, (a5)
   10236: a7 64 05 02   vse32.v v9, (a0)

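You can see that LLVM's auto-vectorization works. RVV intrinsics can also be written by hand.

December 29, 2021
Category: software
1 min read

Compatibility problem between XRDP and NVIDIA GPUs

Background

I was recently setting up XRDP and found that on machines with an NVIDIA GPU the remote desktop starts with a black screen. The error log shows: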

xf86OpenConsole: Cannot open virtual console 1 (Permission denied)

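Note from 2024: the xrdp shipped by some newer distributions no longer has this problem. Also, do not forget to install xorgxrdp.

Solution

The XRDP author describes the fix in issue #2010:

Edit /etc/xrdp/sesman.ini and add the following to the [Xorg] section: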

param=-configdir
param=/

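This effectively stops Xorg from loading the nvidia Xorg driver, which works around the problem.

December 26, 2021
Category: software
2 min read

Quick reference for NVIDIA driver and CUDA versions

Background

I have wrestled with NVIDIA drivers and CUDA quite a lot, so I am recording some frequently needed information for easy lookup.

Useful links

CUDA and NVIDIA driver compatibility

This can be found with apt show cuda-runtime-x-x: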

cuda 12.6 >= 560 (Release Notes: 525)
cuda 12.5 >= 555 (Release Notes: 525)
cuda 12.4 >= 550 (Release Notes: 525)
cuda 12.3 >= 545 (Release Notes: 525)
cuda 12.2 >= 535 (Release Notes: 525)
cuda 12.1 >= 530 (Release Notes: 525)
cuda 12.0 >= 525 (Release Notes: 525)
cuda 11.8 >= 520 (Release Notes: 450)
cuda 11.7 >= 515 (Release Notes: 450)
cuda 11.6 >= 510 (Release Notes: 450)
cuda 11.5 >= 495 (Release Notes: 450)
cuda 11.4 >= 470 (Release Notes: 450)
cuda 11.3 >= 465 (Release Notes: 450)
cuda 11.2 >= 460 (Release Notes: 450)
cuda 11.1 >= 455 (Release Notes: 450)
cuda 11.0 >= 450 (Release Notes: 450)
cuda 10.2 >= 440
cuda 10.1 >= 418
cuda 10.0 >= 410
cuda 9.2 >= 396
cuda 9.1 >= 390
cuda 9.0 >= 384

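The CUDA version shown by nvidia-smi is usually the CUDA version that matches the driver in the table above; for example, with kernel driver version 470 it shows CUDA 11.4.

In practice more driver versions are compatible than APT declares: the official documentation says CUDA 11.x works with NVIDIA >= 450 and CUDA 12.x works with NVIDIA >= 525.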
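The table lookup can be sketched in Python. This is only an illustration: the dictionary is copied from the APT column of the table above, and the function name newest_cuda_for_driver is hypothetical. It returns the newest CUDA release whose APT minimum is satisfied by a given driver major version.

```python
from typing import Optional

# Minimum driver major version per CUDA release (APT column of the table above).
MIN_DRIVER = {
    "12.6": 560, "12.5": 555, "12.4": 550, "12.3": 545, "12.2": 535,
    "12.1": 530, "12.0": 525, "11.8": 520, "11.7": 515, "11.6": 510,
    "11.5": 495, "11.4": 470, "11.3": 465, "11.2": 460, "11.1": 455,
    "11.0": 450, "10.2": 440, "10.1": 418, "10.0": 410, "9.2": 396,
    "9.1": 390, "9.0": 384,
}


def newest_cuda_for_driver(driver_major: int) -> Optional[str]:
    """Return the newest CUDA version whose minimum driver is <= driver_major."""
    candidates = [
        version for version, minimum in MIN_DRIVER.items()
        if minimum <= driver_major
    ]
    if not candidates:
        return None
    # Compare versions numerically so that "10.2" sorts above "9.2".
    return max(candidates, key=lambda v: tuple(int(part) for part in v.split(".")))


if __name__ == "__main__":
    for driver in (470, 535, 560):
        print(driver, newest_cuda_for_driver(driver))
```

For driver 470 this yields 11.4, matching the nvidia-smi example above.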

CUDA and GCC/Clang compatibility

This can be found in cuda/include/crt/host_config.h:

cuda 12.8: gcc <= 14, 3.2 < clang < 20
cuda 12.6: gcc <= 13, 3.2 < clang < 19
cuda 12.3: gcc <= 12, 3.2 < clang < 16
cuda 12.1: gcc <= 12, 3.2 < clang < 16
cuda 12.0: gcc <= 12, 3.2 < clang < 15
cuda 11.8: gcc <= 11, 3.2 < clang < 15
cuda 11.5: gcc <= 11
cuda 11.4: gcc <= 10
cuda 11.3: gcc <= 10, 3.2 < clang < 12
cuda 11.1: gcc <= 10, 3.2 < clang < 11
cuda 11.0: gcc <= 9, 3.2 < clang < 10
cuda 10.2: gcc <= 8, 3.2 < clang < 9
cuda 10.1: gcc <= 8, 3.2 < clang < 9
cuda 10.0: gcc <= 7
cuda 9.1: gcc <= 6

CUDA and GPU compatibility

Mapping between compiler flags and GPUs: https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

This can be found by searching for gpu-architecture in nvcc --help:

cuda 12.8 sm_50 to sm_120a
cuda 12.3 sm_50 to sm_90a
cuda 12.1 sm_50 to sm_90a
cuda 12.0 sm_50 to sm_90a
cuda 11.8 sm_35 to sm_90
cuda 11.4 sm_35 to sm_87
cuda 11.3 sm_35 to sm_86
cuda 11.1 sm_35 to sm_86
cuda 11.0 sm_35 to sm_80
cuda 10.2 sm_30 to sm_75
cuda 10.0 sm_30 to sm_75
cuda 9.1 sm_30 to sm_72
cuda 9.0 sm_30 to sm_70

The Compute Capability of a GPU can be found at https://developer.nvidia.com/cuda-gpus:

B200: 100
H100: 90
RTX 4090: 89
A100: 80
V100: 70
P100: 60

Upgrading the NVIDIA driver

After upgrading, rmmod the existing modules and then modprobe the new ones:

sudo rmmod nvidia_uvm nvidia_drm nvidia_modeset nvidia && sudo modprobe nvidia

If rmmod fails, run lsof /dev/nvidiactl to see what is holding the device. On DGX OS, these services must be stopped:

sudo systemctl stop nvsm.service
sudo systemctl stop nvidia-dcgm.service

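Besides /dev/nvidia*, which may be in use, also check /dev/dri/render* with lsof.

September 2, 2021
Category: software
1 min read

An unexpected side effect of locale on sorting

Background

While answering questions recently, I found that the same command behaved differently on different systems. I first suspected the bash version, but it turned out to be the locale. An example: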

$ cat poc.txt | tr '\n' ' '
1 + - * / \ a b A B 一 二 测 试 α
$ LANG="" sort poc.txt | tr '\n' ' '
* + - / 1 A B \ a b α 一 二 测 试
$ LANG="zh_CN.UTF-8" sort poc.txt | tr '\n' ' '
* + - / \ 1 测 二 试 一 a A b B α
$ LANG="en_US.UTF-8" sort poc.txt | tr '\n' ' '
* + - / \ 1 a A b B α 一 二 测 试

Note the order of 1, a and A: the result differs between locales.
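The difference can be reproduced in Python as a rough sketch. It assumes that an empty LANG (with LC_ALL and LC_COLLATE unset) makes sort compare raw bytes, and that the named locales are installed when sort_with_locale is called; both function names are hypothetical.

```python
import locale

# The lines of poc.txt from the example above.
ITEMS = ["1", "+", "-", "*", "/", "\\", "a", "b", "A", "B",
         "一", "二", "测", "试", "α"]


def sort_c_locale(items):
    """Byte-wise order, as used when no locale is in effect."""
    return sorted(items, key=lambda s: s.encode("utf-8"))


def sort_with_locale(items, name):
    """Collation order of locale `name`.

    Raises locale.Error if that locale is not installed on the system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)
    locale.setlocale(locale.LC_COLLATE, name)
    try:
        return sorted(items, key=locale.strxfrm)
    finally:
        # Restore the collation setting that was active before the call.
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    print(" ".join(sort_c_locale(ITEMS)))
    for name in ("zh_CN.UTF-8", "en_US.UTF-8"):
        try:
            print(name, " ".join(sort_with_locale(ITEMS, name)))
        except locale.Error:
            print(name, "is not installed")
```

The byte-wise result matches the LANG="" line above; the locale-specific results depend on the collation data shipped by the system's libc.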

This issue has also been discussed online:

https://unix.stackexchange.com/questions/75341/specify-the-sort-order-with-lc-collate-so-lowercase-is-before-uppercase
https://stackoverflow.com/questions/43448655/weird-behavior-of-bash-glob-regex-ranges

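July 24, 2021
Category: software
1 min read

Configuring homebridge-broadlink-rm-pro

Background

The batteries in my air conditioner's remote have been running low lately, and it sometimes shuts off by itself, so I dug out a Broadlink RM mini 3 that I bought a while ago to serve as a networked remote for the air conditioner. To make it easy to control from a phone, I set it up with both the official app (智慧星) and homebridge.

Steps

First, use the official app to connect the Broadlink RM mini 3 to the network, then configure homebridge-broadlink-rm-pro. The original plugin author no longer updates it much; this version is currently one of the more widely used forks.

After installation, a Scan Code switch shows up in the Home app. Turn it on, press buttons on the remote near the Broadlink RM mini 3, and the hex codes appear in the Homebridge log. Then write the configuration following the plugin's tutorial, for example: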

{
        "platform": "BroadlinkRM",
        "name": "Broadlink RM",
        "accessories": [
        {
                "name": "Air Conditioner",
                "type": "air-conditioner",
                "noHumidity": true,
                "minTemperature": 26,
                "maxTemperature": 28,
                "defaultCoolTemperature": 27,
                "data": {
                        "off": "2600...",
                        "cool28": {
                                "data": "2600..."
                        },
                        "cool27": {
                                "data": "2600..."
                        },
                        "cool26": {
                                "data": "2600..."
                        }
                }
        }]
}
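
When more temperatures are captured, writing the coolNN entries by hand gets repetitive. The sketch below generates the same structure as the example above from a mapping of temperature to captured code. The function name build_ac_accessory is made up for illustration, "2600..." stands for the truncated hex codes exactly as in the example, and picking the middle temperature as the default is just one possible choice.

```python
import json


def build_ac_accessory(off_code, cool_codes):
    """Build one air-conditioner accessory entry.

    cool_codes maps a temperature in degrees Celsius to the hex code
    captured from the Homebridge log for that setting.
    """
    temps = sorted(cool_codes)
    data = {"off": off_code}
    for t in temps:
        data[f"cool{t}"] = {"data": cool_codes[t]}
    return {
        "name": "Air Conditioner",
        "type": "air-conditioner",
        "noHumidity": True,
        "minTemperature": temps[0],
        "maxTemperature": temps[-1],
        "defaultCoolTemperature": temps[len(temps) // 2],
        "data": data,
    }


# "2600..." is a placeholder for the real captured codes.
accessory = build_ac_accessory(
    "2600...", {26: "2600...", 27: "2600...", 28: "2600..."}
)
config = {
    "platform": "BroadlinkRM",
    "name": "Broadlink RM",
    "accessories": [accessory],
}
print(json.dumps(config, indent=2))
```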

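This makes it easy to control the air conditioner temperature from a phone. I also tested asking Siri to "set the air conditioner to XX degrees", and that works as well.

P.S. For the Xiaomi air purifier, the plugin https://github.com/torifat/xiaomi-mi-air-purifier#readme can now be used; the one mentioned in an earlier blog post is no longer maintained.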

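July 16, 2021
Category: software
1 min read

Troubleshooting an Internal Server Error when Nginx handles POST requests

Introduction

A service recently started failing out of the blue. Users reported that an HTTP POST with a small body succeeded, while a POST with a large body returned 500 Internal Server Error.

Investigation

The backend log contained no trace of the failing request. The most recent entry in the Nginx log was several hours old, and the last few entries read: a client request body is buffered to a temporary file.

Conclusion

Further digging showed that the disk was full. When handling a POST body, Nginx writes the body to a temporary file if it exceeds a threshold: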

Syntax: client_body_buffer_size size;
Default: client_body_buffer_size 8k|16k;
Context: http, server, location
Sets buffer size for reading client request body. In case the request body is larger than the buffer, the whole body or only its part is written to a temporary file. By default, buffer size is equal to two memory pages. This is 8K on x86, other 32-bit platforms, and x86-64. It is usually 16K on other 64-bit platforms.

See https://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_buffer_size for details.

This explains why Nginx returned 500 without forwarding the request to the backend, and also why Nginx wrote no new error log entries.
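
A quick way to catch this condition early is to check the free space on the filesystem that holds the temporary files. The following is a minimal sketch using shutil.disk_usage. The function name has_room_for_body is hypothetical, and "." is only a stand-in path: the actual client body temp directory depends on how Nginx was built and configured.

```python
import shutil


def has_room_for_body(temp_dir: str, body_size: int) -> bool:
    """Return True if the filesystem holding temp_dir has body_size bytes free."""
    return shutil.disk_usage(temp_dir).free >= body_size


# "." is a stand-in; substitute the directory Nginx uses for request bodies.
print(has_room_for_body(".", 1 * 1024 * 1024))
```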

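March 13, 2021
Category: software
3 min read

Fractional Scaling in GNOME

Background

I recently noticed that some applications (including Google Chrome, Firefox and Visual Studio Code) become very sluggish under 125% Fractional Scaling. I found a few workarounds, but they are inelegant and tedious, so I took a closer look at how Fractional Scaling works.

Workarounds

Searching by keyword led me to "Chrome menus too slow after enabling fractional scaling in Ubuntu 20.04". Following its advice, I disabled hardware acceleration in Google Chrome, and the sluggishness did go away.

Hardware acceleration can likewise be disabled in VSCode, and Firefox has a corresponding setting. This does solve the problem, but repeating it for every affected application is rather tedious.

Another approach is to skip Fractional Scaling and just enlarge the fonts, but that is not quite the effect we want.

Findings

After some experiments on a physical machine, I noticed that only 125% is sluggish; the other scales (100%, 150%, 175%, 200%) are not.

Some searching online turned up the xrandr tool. Here is what I observed (the resolution in GNOME Settings was 1920x1080 throughout):

Scale	Resolution reported by xrandr	Transform reported by xrandr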
100%	1920x1080	diag(1.0, 1.0, 1.0)
125%	3072x1728	diag(1.6, 1.6, 1.0)
150%	2560x1440	diag(1.33, 1.33, 1.0)
175%	2208x1242	diag(1.15, 1.15, 1.0)
200%	1920x1080	diag(1.0, 1.0, 1.0)

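The xrandr documentation says that the transform is a 3x3 matrix, and that multiplying it by the coordinates of a point in the output gives the coordinates in the graphics buffer.

This suggests how fractional scaling works: render into a larger buffer, then use the transform to bring the final output back to 1920x1080. Indeed, dividing the resolution reported by xrandr by the corresponding transform value gives 1920x1080. However, this does not explain the difference between 100% and 200%, so some information must still be missing.

Digging through the mutter PR that implements fractional scaling, I found part of the scale implementation: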

if (clutter_actor_get_resource_scale (priv->actor, &resource_scale) &&
    resource_scale != 1.0f)
  {
    float paint_scale = 1.0f / resource_scale;
    cogl_matrix_scale (&modelview, paint_scale, paint_scale, 1);
  }

Then I found a piece of code that applies a ceiling to the scale:

if (_clutter_actor_get_real_resource_scale (priv->actor, &resource_scale))
  {
    ceiled_resource_scale = ceilf (resource_scale);
    stage_width *= ceiled_resource_scale;
    stage_height *= ceiled_resource_scale;
  }

This is what separates 100% from the other scales.

I also found the following in the code:

#define SCALE_FACTORS_PER_INTEGER 4
#define SCALE_FACTORS_STEPS (1.0 / (float) SCALE_FACTORS_PER_INTEGER)
#define MINIMUM_SCALE_FACTOR 1.0f
#define MAXIMUM_SCALE_FACTOR 4.0f

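This code restricts the scale to multiples of 25%.

I also tried xrandr --scale 1.5x1.5: all windows looked smaller, the resolution became 2880x1620, and the transform was diag(1.5, 1.5, 1.0).

Testing in a virtual machine

Next, I ran some tests in a virtual machine. To use fractional scaling in GNOME on Wayland, run: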

$ gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']"

Then I repeated a similar test (the resolution in GNOME Settings was 2560x1600 throughout):

Scale	Resolution reported by xrandr
100%	2560x1600
125%	2048x1280
150%	1704x1065
175%	1464x915
200%	1280x800

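In this test, the transform reported by xrandr was always the identity matrix. I also checked the output resolution with the wayinfo command from xyproto/wallutils: it was always 2560x1600, with a DPI of 96. The xinfo command from wallutils gave the same result as xrandr (via XWayland). There is one difference from the physical machine, though: the physical machine offers a toggle for enabling fractional scaling, with a warning about reduced performance below it, whereas the virtual machine shows no such prompt and simply lists a few Scale options.

I tried GNOME on X11 and found no fractional scaling there (no option to set the scale appears). There is a fork that implements it: https://github.com/puxplaying/mutter-x11-scaling, but I have not tried it.

I also tried xrandr --scale in the virtual machine; the output went black, and gdm had to be restarted to get back to the login screen.

Update: since the physical machine runs Ubuntu, I wondered whether Ubuntu carries the patch from the fork above, and then found this in the changelog: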

mutter (3.38.1-1ubuntu1) groovy; urgency=medium

  * Merge with debian, including new upstream version, remaining changes:
    - debian/gbp.conf: update upstream branch to point to ubuntu/master
    - debian/patches/x11-Add-support-for-fractional-scaling-using-Randr.patch:
      + X11: Add support for fractional scaling using Randr
  * d/p/clutter-backend-x11-Don-t-set-the-font-dpi-computed-on-X1.patch:
    - Dropped, applied upstream

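I also found the corresponding patch file. This explains why people online say that GNOME on X11 supports fractional scaling and needs it enabled via gsettings, while setting that option on Debian and Arch Linux did nothing for me: it is an Ubuntu-specific addition.

In the patch, I found this description of a setting: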

+    <key name="fractional-scale-mode" enum="org.gnome.mutter.X11.scale-mode">
+      <default>"scale-ui-down"</default>
+      <description>
+        Choose the scaling mode to be used under X11 via Randr extension.
+
+        Supported methods are:
+
+        • “scale-up”     — Scale everything up to the requested scale, shrinking
+                           the UI. The applications will look blurry when scaling
+                           at higher values and the resolution will be lowered.
+        • “scale-ui-down — Scale up the UI toolkits to the closest integer
+                           scaling value upwards, while scale down the display
+                           to match the requested scaling level.
+                           It increases the resolution of the logical display.
+      </description>
+    </key>

This explains the earlier observations: the default is scale-ui-down, which first scales the UI up to 2x (closest integer scaling value upwards) and then scales the display down (scale down the display to match the requested scaling level).
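
The arithmetic behind scale-ui-down can be checked against the physical-machine table. The sketch below is my own reading of the description, not code from mutter: it assumes the UI is rendered at ceil(scale) and the display is then shrunk by the factor ceil(scale) / scale, which is the value seen in the xrandr transform. It reproduces the 100%, 125%, 150% and 200% rows. For 175% it gives roughly 2194x1234 rather than the observed 2208x1242, so the scale actually applied is presumably adjusted slightly.

```python
import math
from fractions import Fraction


def framebuffer_size(width, height, scale):
    """Estimate the framebuffer size and transform factor under scale-ui-down."""
    scale = Fraction(scale)
    # Render at the next integer scale, then shrink back to the requested one.
    factor = Fraction(math.ceil(scale)) / scale
    return round(width * factor), round(height * factor), float(factor)


for percent in (100, 125, 150, 200):
    print(percent, framebuffer_size(1920, 1080, Fraction(percent, 100)))
# 125% gives (3072, 1728, 1.6), matching the xrandr output above.
```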

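February 15, 2021
Category: software
2 min read

LDAP authentication with SSSD

Introduction

I have been looking into replacing an old user system, and took the opportunity to learn about LDAP and SSSD. LDAP is a directory protocol; because user information can be stored in a directory, it has also become a common protocol for user authentication. SSSD is a daemon that connects the system's NSS and PAM mechanisms to LDAP.

Configuration

It is quite simple: install sssd and configure it: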

$ sudo apt install sssd
$ sudo vim /etc/sssd/sssd.conf
# file content:
[sssd]
config_file_version = 2
services = nss,pam
domains = LDAP

[domain/LDAP]
cache_credentials = true
enumerate = true
entry_cache_timeout = 10
ldap_network_timeout = 2

id_provider = ldap
auth_provider = ldap
chpass_provider = ldap

ldap_uri = ldap://127.0.0.1/
ldap_chpass_uri = ldap://127.0.0.1/
ldap_search_base = dc=example,dc=com
ldap_default_bind_dn = cn=localhost,ou=machines,dc=example,dc=com
ldap_default_authtok = REDACTED
$ sudo systemctl enable --now sssd

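Some fields need to be adapted to your environment; see the sssd.conf and sssd-ldap documentation.

Protocol

So how do users in LDAP map to Linux users? As the following filter shows, SSSD queries posixAccount: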

(&(objectclass=posixAccount)(uid=*)(uidNumber=*)(gidNumber=*))

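The posixAccount schema contains attributes corresponding to each field of /etc/passwd. Likewise, shadowAccount corresponds to /etc/shadow.

Once the entries are set up as required (the ldapvi tool is recommended), the new users show up in getent passwd.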
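
To make the correspondence concrete, here is a rough sketch that formats a posixAccount entry as an /etc/passwd style line. The attribute names are those of the posixAccount object class; the function name passwd_line, the sample entry and the fallback from gecos to cn are assumptions for illustration, and SSSD's actual mapping is configurable.

```python
def passwd_line(entry: dict) -> str:
    """Format a posixAccount entry as name:passwd:uid:gid:gecos:home:shell."""
    fields = [
        entry["uid"],                             # login name
        "x",                                      # placeholder, no hash here
        str(entry["uidNumber"]),
        str(entry["gidNumber"]),
        entry.get("gecos", entry.get("cn", "")),  # assumed fallback to cn
        entry["homeDirectory"],
        entry.get("loginShell", ""),
    ]
    return ":".join(fields)


# Hypothetical entry, not taken from a real directory.
entry = {
    "uid": "test",
    "cn": "Test User",
    "uidNumber": 10000,
    "gidNumber": 10000,
    "homeDirectory": "/home/test",
    "loginShell": "/bin/bash",
}
print(passwd_line(entry))
# test:x:10000:10000:Test User:/home/test:/bin/bash
```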

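The part above goes through the NSS interface; besides users, other NIS information can also be looked up via LDAP. Logging in, on the other hand, uses PAM authentication, which SSSD turns into an LDAP Bind: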

$ su test
Password:
# sssd: bind to dn of test user with password

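If the Bind succeeds, the login is considered successful; otherwise it fails.

When a user changes their password, SSSD by default uses the RFC 3062 LDAP Password Modify Extended Operation; if the server does not support it, the ldap modify method can be used instead, as described in the documentation.

SSSD can also be configured to support sudo in a similar way: add directory entries with objectClass=sudoRole. See man sudoers.ldap for how to write them.

For SSH, see the RedHat documentation and man sss_ssh_authorizedkeys to configure the authorized keys command. Then add an sshPublicKey attribute to the user, with the same content as ~/.ssh/id_*.pub.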

Fixing LaTeX not finding the Kaiti font on Big Sur (M1)

Background

I have been trying to port MiKTeX to Apple Silicon. With a few patches it works, but a new problem came up: the Kaiti font cannot be found:

~/Library/Application Support/MiKTeX/texmfs/install/tex/latex/ctex/fontset/ctex-fontset-macnew.def:99:
   Package fontspec Error:
      The font "Kaiti SC" cannot be found.

The miktex-fc-list command indeed does not list it:

$ /Applications/MiKTeX\ Console.app/Contents/bin/miktex-fc-list | grep Kaiti
# Nothing

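Searching online turned up one solution: the font is located at /System/Library/Frameworks/ApplicationServices.framework/Versions/A/Frameworks/ATS.framework/Versions/A/Support/FontSubsets/Kaiti.ttc, so installing it manually lets LaTeX find it. But rather than installing a second copy in the file system, I would prefer to have LaTeX look for it where it already is.

Solution

Following this lead, I located Kaiti.ttc: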

$ fd Kaiti.ttc
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc

As you can see, this differs from the path above. After looking into fontconfig, I found that miktex-fc-conflist lists the configuration files:

$ /Applications/MiKTeX\ Console.app/Contents/bin/miktex-fc-conflist
+ ~/Library/Application Support/MiKTeX/texmfs/config/fontconfig/config/localfonts2.conf: No description
+ ~/Library/Application Support/MiKTeX/texmfs/config/fontconfig/config/localfonts.conf: No description
...

Here is the content of localfonts.conf:

<?xml version="1.0" encoding="UTF-8"?>

<!--
  DO NOT EDIT THIS FILE! It will be replaced when MiKTeX is updated.
  Instead, edit the configuration file localfonts2.conf.
-->

<fontconfig>
<include>localfonts2.conf</include>
<dir>/Library/Fonts/</dir>
<dir>/System/Library/Fonts/</dir>
<dir>~/Library/Application Support/MiKTeX/texmfs/install/fonts/type1</dir>
<dir>~/Library/Application Support/MiKTeX/texmfs/install/fonts/opentype</dir>
<dir>~/Library/Application Support/MiKTeX/texmfs/install/fonts/truetype</dir>
</fontconfig>

So directories can be added here, but the file itself recommends editing localfonts2.conf instead. Following the same format, change it to:

<?xml version="1.0"?>
<fontconfig>
<dir>/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets</dir>
<!-- REMOVE THIS LINE
<dir>Your font directory here</dir>
<dir>Your font directory here</dir>
<dir>Your font directory here</dir>
     REMOVE THIS LINE -->
</fontconfig>

UPDATE: on newer versions of macOS, it is advisable to also add /System/Library/AssetsV2/com_apple_MobileAsset_Font7:

<dir>/System/Library/AssetsV2/com_apple_MobileAsset_Font7</dir>

With this, Kaiti SC can be found:

$ miktex-fc-list | grep Kaiti
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: Kaiti TC,楷體\-繁,楷体\-繁:style=Regular,標準體,常规体
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: Kaiti SC,楷體\-簡,楷体\-简:style=Regular,標準體,常规体
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: Kaiti SC,楷體\-簡,楷体\-简:style=Bold,粗體,粗体
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: Kaiti TC,楷體\-繁,楷体\-繁:style=Bold,粗體,粗体
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: Kaiti SC,楷體\-簡,楷体\-简:style=Black,黑體,黑体
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: Kaiti TC,楷體\-繁,楷体\-繁:style=Black,黑體,黑体
/System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc: STKaiti:style=Regular,標準體,Ordinær,Normal,Normaali,Regolare,レギュラー,일반체,Regulier,Обычный,常规体

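That settles it: LaTeX no longer has trouble finding the font.

If you use TeX Live, simply copy the Kaiti.ttc file found above into ~/Library/Fonts.

If TeX Live was installed via Nixpkgs, symlinking the relevant fonts into ~/Library/Fonts is recommended instead: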

cd ~/Library/Fonts
ln -s /System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/华文细黑.ttf # STHeiti
ln -s /System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/华文黑体.ttf # STHeiti
ln -s /System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/华文仿宋.ttf # STFangsong
ln -s /System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets/Kaiti.ttc # STKaiti

To find the font files shipped with the system and their font names:

fc-scan /System/Library/PrivateFrameworks/FontServices.framework/Versions/A/Resources/Fonts/Subsets

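February 9, 2021
Category: software
4 min read

COMMON symbols

Background

While building a program, I ran into an undefined symbol error. The situation was as follows:

At first, all sources were compiled into .o files and linked in one go, which produced no error.
Later, part of the code was turned into a static library: some of the sources were compiled into .o files and merged into a .a with ar, which was then linked together with the remaining .o files. This time the link failed: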

Undefined symbols for architecture arm64:
  "_abcd", referenced from:
    ...

On another machine (using gcc 10.2.0), the build works fine.

Yet the symbol can be found in the .o that is supposed to contain it:

$ objdump -t /path/to/abcd.o
0000000000000028         *COM*  00000008 _abcd

It is also present in the resulting static library .a (one definition plus several references):

$ objdump -t /path/to/libabc.a | grep abcd
0000000000000028         *COM*  00000008 _abcd
0000000000000000         *UND* _abcd
0000000000000000         *UND* _abcd
0000000000000000         *UND* _abcd
0000000000000000         *UND* _abcd
0000000000000000         *UND* _abcd

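This seemed odd, so I searched online and found a StackOverflow post covering this problem. The solution is simple:

Compile with -fno-common.

gcc 10 does not hit the error because its default changed from -fcommon to -fno-common.

What COMMON is

Of course, finding a fix is not enough; I wanted to understand the underlying mechanism.

Searching for what COMMON is led me to the article "Investigating linking with COMMON symbols in ELF".

The article explains what COMMON is for: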

Common symbols are a feature that allow a programmer to 'define' several variables of the same name in different source files. This is in contrast with the more popular way of doing, where you define a variable once in a source file, and reference it everywhere else in other source files, using extern. When common symbols are used, the linker will merge all symbols of the same name into a single memory location, the size of which is the largest type of the individual common symbol definitions. For example, if fileA.c defines an uninitialized 32-bit integer myint, and fileB.c defines an 8-bit char myint, then in the final executable, references to myint from both files will point to the same memory location (common location), and the linker will reserve 32 bits for that location.

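The article also describes the implementation: with -fcommon, an uninitialized global variable becomes a COMMON symbol; an initialized one is placed in .bss or .data according to its initial value. At link time, when several symbols share the same name, a set of rules decides where the final symbol ends up; if they conflict, we get the familiar multiple definition error.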
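
The size rule from the quoted passage can be modelled in a few lines. This sketch covers only that one rule (same-named COMMON definitions collapse into one location as large as the biggest of them); a real linker also deals with alignment and with initialized definitions of the same name. The function name merge_common is made up for illustration.

```python
def merge_common(definitions):
    """Merge COMMON definitions of the same name, keeping the largest size.

    definitions is an iterable of (name, size_in_bytes) pairs, one per
    uninitialized definition seen across the object files.
    """
    merged = {}
    for name, size in definitions:
        merged[name] = max(size, merged.get(name, 0))
    return merged


# fileA.c defines a 32-bit int myint, fileB.c an 8-bit char myint.
print(merge_common([("myint", 4), ("myint", 1)]))
# {'myint': 4}
```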

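Why would anyone need several variables with the same name that do not conflict and share memory? Elsewhere I read that COMMON exists for ancient code, and some mentioned FORTRAN. A quick search confirmed that FORTRAN is indeed the key.

COMMON in FORTRAN

Searching by keyword easily turns up articles on COMMON BLOCK in FORTRAN. In FORTRAN, COMMON is a way of passing arguments implicitly through global storage. Taking the example from one such article: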

      PROGRAM MAIN
      INTEGER A
      REAL    F,R,X,Y
      COMMON  R,A,F
      A = -14
      R = 99.9
      F = 0.2
      CALL SUB(X,Y)
      END

      SUBROUTINE SUB(P,Q)
      INTEGER I
      REAL    A,B,P,Q
      COMMON  A,I,B
      END

Both MAIN and SUB contain a COMMON statement. The variables listed after COMMON are stored in a single COMMON symbol and mapped, in order, onto that symbol's memory. Compile the code above and look at the symbols:

$ gfortran -g -c test.f -o test.o
$ objdump -t test.o

test.o: file format Mach-O arm64

SYMBOL TABLE:
0000000000000078 g     F __TEXT,__text _main
0000000000000000 g     F __TEXT,__text _sub_
000000000000000c         *COM*  00000010 ___BLNK__

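A COMMON symbol named ___BLNK__ shows up. Its first column reads 0xc, i.e. 12 bytes, which matches three 4-byte variables; the 00000010 column appears to be the 16-byte alignment rather than the size. Here is how the code references it: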

$ objdump -S --reloc test.o

test.o: file format Mach-O arm64

Disassembly of section __TEXT,__text:

0000000000000018 _MAIN__:
;         PROGRAM MAIN
      18: fd 7b be a9                   stp x29, x30, [sp, #-32]!
      1c: fd 03 00 91                   mov x29, sp
;         A = -14
      20: 00 00 00 90                   adrp    x0, #0
        0000000000000020:  ARM64_RELOC_GOT_LOAD_PAGE21  ___BLNK__
      24: 00 00 40 f9                   ldr x0, [x0]
        0000000000000024:  ARM64_RELOC_GOT_LOAD_PAGEOFF12   ___BLNK__
      28: a1 01 80 12                   mov w1, #-14
      2c: 01 04 00 b9                   str w1, [x0, #4]
;         R = 99.9
      30: 00 00 00 90                   adrp    x0, #0
        0000000000000030:  ARM64_RELOC_GOT_LOAD_PAGE21  ___BLNK__
      34: 00 00 40 f9                   ldr x0, [x0]
        0000000000000034:  ARM64_RELOC_GOT_LOAD_PAGEOFF12   ___BLNK__
      38: a1 99 99 52                   mov w1, #52429
      3c: e1 58 a8 72                   movk    w1, #17095, lsl #16
      40: 20 00 27 1e                   fmov    s0, w1
      44: 00 00 00 bd                   str s0, [x0]
;         F = 0.2
      48: 00 00 00 90                   adrp    x0, #0
        0000000000000048:  ARM64_RELOC_GOT_LOAD_PAGE21  ___BLNK__
      4c: 00 00 40 f9                   ldr x0, [x0]
        000000000000004c:  ARM64_RELOC_GOT_LOAD_PAGEOFF12   ___BLNK__
      50: a1 99 99 52                   mov w1, #52429
      54: 81 c9 a7 72                   movk    w1, #15948, lsl #16
      58: 20 00 27 1e                   fmov    s0, w1
      5c: 00 08 00 bd                   str s0, [x0, #8]
;         CALL SUB(X,Y)
      60: e1 63 00 91                   add x1, sp, #24
      64: e0 73 00 91                   add x0, sp, #28
      68: 00 00 00 94                   bl  #0 <_MAIN__+0x50>
        0000000000000068:  ARM64_RELOC_BRANCH26 _sub_
;         END
      6c: 1f 20 03 d5                   nop
      70: fd 7b c2 a8                   ldp x29, x30, [sp], #32
      74: c0 03 5f d6                   ret

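As shown, when MAIN references A, the address used is ___BLNK__+4; R is at ___BLNK__+0 and F is at ___BLNK__+8, which matches the order in the COMMON statement. So when SUB reads A, I and B, they correspond to R, A and F in MAIN. In this way, MAIN can implicitly pass arguments to any function.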
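
The shared layout can be imitated with Python's struct module. This is only a model of the idea, not how gfortran implements it: the blank COMMON block is a 12-byte buffer, MAIN writes it as (R, A, F) and SUB reads the same bytes as (A, I, B). The little-endian 4-byte REAL and INTEGER sizes are assumptions that match the offsets seen in the disassembly above.

```python
import struct

# MAIN declares COMMON R,A,F: REAL, INTEGER, REAL, 4 bytes each.
MAIN_LAYOUT = "<fif"
# SUB declares COMMON A,I,B with the same types in the same positions.
SUB_LAYOUT = "<fif"

# The shared blank COMMON block.
block = bytearray(struct.calcsize(MAIN_LAYOUT))

# MAIN: R = 99.9 at offset 0, A = -14 at offset 4, F = 0.2 at offset 8.
struct.pack_into(MAIN_LAYOUT, block, 0, 99.9, -14, 0.2)

# SUB reads the very same bytes under its own variable names.
sub_a, sub_i, sub_b = struct.unpack_from(SUB_LAYOUT, block, 0)
print(len(block), sub_i, round(sub_a, 1), round(sub_b, 1))
# 12 -14 99.9 0.2
```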

COMMON blocks can also be named, which makes it possible to separate parameters by purpose:

        PROGRAM MAIN
        INTEGER A
        REAL    F,R,X,Y
        COMMON  R,A,F
        COMMON /test/ X,Y
        A = -14
        R = 99.9
        F = 0.2
        CALL SUB(X,Y)
        END

        SUBROUTINE SUB(P,Q)
        INTEGER I
        REAL    A,B,P,Q
        COMMON  A,I,B
        END

This version adds a COMMON /test/ line. Look at the symbols:

$ objdump -t test.o

test.o: file format Mach-O arm64

SYMBOL TABLE:
0000000000000088 g     F __TEXT,__text _main
0000000000000000 g     F __TEXT,__text _sub_
000000000000000c         *COM*  00000010 ___BLNK__
0000000000000008         *COM*  00000010 _test_

As expected, a new COMMON symbol appears, holding the variables X and Y of the named COMMON block.

Now look at how the assembly references it:

;         CALL SUB(X,Y)
      60: 00 00 00 90                   adrp    x0, #0
                0000000000000060:  ARM64_RELOC_GOT_LOAD_PAGE21  _test_
      64: 00 00 40 f9                   ldr     x0, [x0]
                0000000000000064:  ARM64_RELOC_GOT_LOAD_PAGEOFF12       _test_
      68: 01 10 00 91                   add     x1, x0, #4
      6c: 00 00 00 90                   adrp    x0, #0
                000000000000006c:  ARM64_RELOC_GOT_LOAD_PAGE21  _test_
      70: 00 00 40 f9                   ldr     x0, [x0]
                0000000000000070:  ARM64_RELOC_GOT_LOAD_PAGEOFF12       _test_
      74: 00 00 00 94                   bl      #0 <_MAIN__+0x5c>
                0000000000000074:  ARM64_RELOC_BRANCH26 _sub_

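As shown, the first argument (x0) is _test_ and the second argument (x1) is _test_+4, again as expected.

At this point it becomes clear why COMMON symbols exist. Perhaps they were introduced so that C and FORTRAN code could share COMMON symbols; or perhaps some C libraries really do use a similar technique to implement certain features.

Solution

That said, this usage is discouraged nowadays: use extern where extern is called for, and remember to add -fno-common when building static libraries.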