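Background

More than ten CentOS 6 hosts that are joined to a Windows Server Active Directory domain no longer accept domain-account logins; only local system accounts can log in.

The services running on these hosts are healthy, and the network is working normally.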

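Hypotheses

  1. The hosts dropped out of the domain because no domain account had logged in for a long time;
  2. A Samba version upgrade broke the configuration file (settings lost or changed).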
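
The two hypotheses can be told apart with a quick check before anything is changed. The following is a minimal sketch, not the procedure used in this post. It assumes the Samba tools net and testparm are installed; the function names and the NET_BIN / TESTPARM_BIN override variables are hypothetical helpers introduced only for illustration.

```shell
#!/bin/bash
# Hypothetical helpers. NET_BIN and TESTPARM_BIN exist only so the
# commands can be swapped out; they are not Samba settings.
NET_BIN=${NET_BIN:-net}
TESTPARM_BIN=${TESTPARM_BIN:-testparm}

# Hypothesis 1: the machine account is no longer valid in AD.
# "net ads testjoin" exits non-zero when the join is broken.
check_join() {
    if "$NET_BIN" ads testjoin >/dev/null 2>&1; then
        echo "join: OK"
    else
        echo "join: BROKEN"
        return 1
    fi
}

# Hypothesis 2: smb.conf no longer loads cleanly after the upgrade.
# "testparm -s" loads the file without prompting and exits non-zero on error.
check_config() {
    if "$TESTPARM_BIN" -s "${1:-/etc/samba/smb.conf}" >/dev/null 2>&1; then
        echo "config: OK"
    else
        echo "config: SUSPECT"
        return 1
    fi
}
```

Running check_join and check_config as root on one affected host shows which direction to dig in first. Note that testparm only catches files that fail to load; a setting that is valid but wrong for the environment will still pass.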

Troubleshooting

  1. Log in as root and rejoin the domain

    # Flush the cache
    net cache flush
    rm -f /var/lib/samba/*.tdb
    # Rejoin the domain
    net ads join -U administrator@DOMAIN.com
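
The .tdb files removed above are winbind's local databases and, depending on the idmap backend in use, may include stored UID/GID mappings, so it can be worth keeping a copy before deleting them. A minimal sketch, assuming the /var/lib/samba location from the commands above; backup_tdb and its arguments are hypothetical names, not part of Samba.

```shell
#!/bin/bash
# backup_tdb SRC_DIR DEST_DIR
# Copies every *.tdb file in SRC_DIR to DEST_DIR before they are deleted.
# Hypothetical helper; prints the number of files copied.
backup_tdb() {
    local src=$1 dest=$2 count=0 f
    mkdir -p "$dest" || return 1
    for f in "$src"/*.tdb; do
        [ -e "$f" ] || continue   # the glob matched nothing
        cp -p "$f" "$dest"/ || return 1
        count=$((count + 1))
    done
    echo "$count"
}
```

For example, backup_tdb /var/lib/samba /root/samba-tdb-backup could be run just before the rm command; the backup directory name here is only an example.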
  2. After rejoining, login still fails, and id username cannot retrieve the domain user

    # The AD connectivity tests fail
    [root@server ~]# wbinfo -t
    checking the trust secret for domain EXAMPLE via RPC calls failed
    wbcCheckTrustCredentials(EXAMPLE): error code was NT_STATUS_DOMAIN_CONTROLLER_NOT_FOUND (0xc0000233)
    failed to call wbcCheckTrustCredentials: WBC_ERR_AUTH_ERROR
    Could not check secret
    [root@server ~]# wbinfo -P
    checking the NETLOGON for domain[EXAMPLE] dc connection to "" failed
    failed to call wbcPingDc: WBC_ERR_DOMAIN_NOT_FOUND
    [root@server ~]# wbinfo --verbose -i DOMAIN+username
    failed to call wbcGetpwnam: WBC_ERR_DOMAIN_NOT_FOUND
    Could not get info for user DOMAIN+username
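
When the same tests have to be repeated on many hosts, it helps to reduce the wbinfo output to its status codes. A minimal sketch: extract_status_codes is a hypothetical helper that pulls the NT_STATUS_* and WBC_ERR_* tokens out of whatever text it receives on standard input, and does nothing more.

```shell
#!/bin/bash
# extract_status_codes: read command output on stdin and print each
# distinct NT_STATUS_* / WBC_ERR_* token once, sorted.
# Prints nothing when the input contains no such token.
extract_status_codes() {
    LC_ALL=C grep -oE '(NT_STATUS|WBC_ERR)_[A-Z_]+' | LC_ALL=C sort -u
}
```

Typical use would be wbinfo -t 2>&1 | extract_status_codes, which for the failing output shown above would leave only the two error names.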
  3. Edit the configuration file and sync the domain accounts

    # There are BUILTIN domains on an AD server and the default "*" 'tdb' backend and range are needed to map the users not included in the other mapped domains. Not having this backend still causes a mapping error even if the user being mapped is not included in that range.

    [root@server ~]# vim /etc/samba/smb.conf
    ……
    idmap config * : backend = tdb
    idmap config DOMAIN:default = yes
    ……
    [root@server ~]# service winbind restart

    # Test again after the configuration change
    [root@server ~]# wbinfo -t
    checking the trust secret for domain FIRSTSHARE via RPC calls succeeded
    [root@server ~]# wbinfo -p
    Ping to winbindd succeeded

    Reference:

    [Ticket]: https://access.redhat.com/solutions/338723 “wbinfo -i search returns an error with a two domain Samba configuration”
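
With more than ten hosts it is easy to miss one. The sketch below checks whether a given smb.conf contains a default idmap backend line like the one added in this step. has_default_idmap is a hypothetical helper, and the match is a plain text search rather than a real smb.conf parser (it does not follow include files, for example).

```shell
#!/bin/bash
# has_default_idmap [FILE]
# Succeeds if FILE contains an uncommented
# "idmap config * : backend = ..." line; fails otherwise.
has_default_idmap() {
    local conf=${1:-/etc/samba/smb.conf}
    grep -Eq '^[[:space:]]*idmap[[:space:]]+config[[:space:]]+\*[[:space:]]*:[[:space:]]*backend[[:space:]]*=' "$conf"
}
```

It can be used as has_default_idmap || echo "default idmap backend missing" on each host.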

  4. The session is dropped right after connecting

    # /var/log/secure shows the entries below; this rules out the sshd_config settings and narrows the suspicion to the PAM modules
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=172.28.0.41 user=sujx
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_krb5[46730]: authentication fails for 'sujx' (sujx@domain.com): Authentication failure (KDC reply did not match expectations)
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_winbind(sshd:auth): getting password (0x00000210)
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_winbind(sshd:auth): pam_get_item returned a password
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_winbind(sshd:auth): user 'sujx' granted access
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_krb5[46730]: account checks fail for 'sujx': unknown reason -1765328237 (KDC reply did not match expectations)
    Feb 15 18:09:48 vlnx101025 sshd[46730]: pam_winbind(sshd:account): user 'sujx' granted access
    Feb 15 18:09:48 vlnx101025 sshd[46731]: fatal: Access denied for user sujx by PAM account configuration
    Feb 15 18:09:48 vlnx101025 sshd[46730]: Failed password for sujx from 172.28.0.41 port 9366 ssh2
    # /etc/pam.d/system-auth checked: no problems
    # /etc/security/pam_winbind.conf checked: no problems
    # /etc/sysconfig/authconfig checked: no problems
    # Go back to the log and start from "KDC reply did not match expectations"

    # Reproduce the error
    [root@server ~]# kinit sujx
    KDC reply did not match expectations
    # Authentication succeeds when the realm is given in uppercase
    [root@server ~]# kinit sujx@DOMAIN.COM
    [root@vlnx101025 ~]# klist
    Ticket cache: FILE:/tmp/krb5cc_0
    Default principal: sujx@DOMAIN.COM
    # Re-check the Samba configuration file
    [root@server ~]# vim /etc/samba/smb.conf
    ……
    # Change the realm option from lowercase to uppercase
    # realm = domain.com
    realm = DOMAIN.COM
    ……
    [root@server ~]# service winbind restart

    # Domain accounts can now log in normally
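
Kerberos realm names are case-sensitive and AD realms are conventionally written in uppercase, which is consistent with kinit succeeding only for sujx@DOMAIN.COM above. To find other hosts with a lowercase realm, something like the following could be used. It is a minimal sketch with hypothetical helper names; it reads the first uncommented realm line with plain text tools and does not resolve include files.

```shell
#!/bin/bash
# get_realm [FILE]: print the value of the first uncommented "realm =" line,
# with surrounding whitespace stripped.
get_realm() {
    local conf=${1:-/etc/samba/smb.conf}
    sed -n 's/^[[:space:]]*realm[[:space:]]*=[[:space:]]*//p' "$conf" \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# realm_is_uppercase [FILE]: succeed only if a realm is set and
# contains no lowercase letters.
realm_is_uppercase() {
    local realm
    realm=$(get_realm "$1")
    [ -n "$realm" ] && [ "$realm" = "$(printf '%s' "$realm" | tr '[:lower:]' '[:upper:]')" ]
}
```

For example, realm_is_uppercase || echo "realm needs fixing: $(get_realm)" reports a host that still carries the lowercase value.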

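Wrap-up

CentOS 7 with SSSD has been far more stable than CentOS 6 with Winbind; across a comparable number of machines, the login failure described above has rarely occurred.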