Linux内核代码审计之CVE-2018-5703

由于第一篇文章是针对网络子系统的,因此这篇也还是找一个网络子系统的漏洞去审计。本文同样是学习 Linux 内核漏洞时的记录,从草稿箱中翻出,稍加修改后在此分享(灌水)一下。

这次选的漏洞是CVE-2018-5703,是syzcaller扫出来的,漏洞报告见: https://groups.google.com/forum/#!msg/syzkaller-bugs/0PBeVnSzfqQ/5eXAlM46BQAJ

已知信息

  1. Linux master分支,crash的commit为 61233580f1f33c50e159c50e24d80ffd2ba2e06b
  2. 漏洞出在tcp_v6_syn_recv_sock函数
  3. 漏洞类型为slab out-of-bound write (实际上是类型混淆)
  4. 和tls相关

分析

tcp_v6_syn_recv_sock这个函数是定义在虚函数表的syn_recv_sock:

static const struct inet_connection_sock_af_ops ipv6_specific = {
  .queue_xmit    = inet6_csk_xmit,
  .send_check    = tcp_v6_send_check,
  .rebuild_header    = inet6_sk_rebuild_header,
  .sk_rx_dst_set     = inet6_sk_rx_dst_set,
  .conn_request      = tcp_v6_conn_request,
  .syn_recv_sock     = tcp_v6_syn_recv_sock,
  .net_header_len    = sizeof(struct ipv6hdr),
  .net_frag_header_len = sizeof(struct frag_hdr),
  .setsockopt    = ipv6_setsockopt,
  .getsockopt    = ipv6_getsockopt,
  .addr2sockaddr     = inet6_csk_addr2sockaddr,
  .sockaddr_len      = sizeof(struct sockaddr_in6),

  .compat_setsockopt = compat_ipv6_setsockopt,
  .compat_getsockopt = compat_ipv6_getsockopt,

  .mtu_reduced       = tcp_v6_mtu_reduced,
};

类似的虚函数表在分析上一个漏洞的时候有见过,调用的链路如下:

► f 0 c025b1b8 sk_clone_lock
   f 1 c02b1278 inet_csk_clone_lock+16
   f 2 c02c561c tcp_create_openreq_child+24
   f 3 c02c2574 tcp_v4_syn_recv_sock+48
   f 4 c02c5c08 tcp_check_req+696
   f 5 c02c28a4 tcp_v4_do_rcv+360
   f 6 c02c28a4 tcp_v4_do_rcv+360
   f 7 c02c4d84 tcp_v4_rcv+1736
   f 8 c02a72ac ip_local_deliver_finish+284
   f 9 c0265c10 __netif_receive_skb+940
   f 10 c0265cd8 process_backlog+112

首先需要理解什么时候会调用到syn_recv_sock,那就需要了解ops虚函数表的作用,connection_sock_af_ops定义在include/net/inet_connection_sock.h:

 /*
   * Pointers to address related TCP functions
   * (i.e. things that depend on the address family)
   */
  struct inet_connection_sock_af_ops

如注释所说,包含地址(Address Family)相关的TCP函数,connection sock表示带连接的sock,定义如下:

/** inet_connection_sock - INET connection oriented sock
*
* @icsk_accept_queue:     FIFO of established children
* @icsk_bind_hash:    Bind node
* @icsk_timeout:      Timeout
* @icsk_retransmit_timer: Resend (no ack)
* @icsk_rto:          Retransmit timeout
* @icsk_pmtu_cookie       Last pmtu seen by socket
* @icsk_ca_ops        Pluggable congestion control hook
* @icsk_af_ops        Operations which are AF_INET{4,6} specific
* @icsk_ulp_ops       Pluggable ULP control hook
* @icsk_ulp_data      ULP private data
* @icsk_ca_state:     Congestion control state
* @icsk_retransmits:      Number of unrecovered [RTO] timeouts
* @icsk_pending:      Scheduled timer event
* @icsk_backoff:      Backoff
* @icsk_syn_retries:      Number of allowed SYN (or equivalent) retries
* @icsk_probes_out:       unanswered 0 window probes
* @icsk_ext_hdr_len:      Network protocol overhead (IP/IPv6 options)
* @icsk_ack:          Delayed ACK control data
* @icsk_mtup;         MTU probing control data
*/
struct inet_connection_sock;

例如,TCP sock就内联了connection sock作为第一个元素,所以tcp_sock指针可以转换成inet_connection_sock指针而没有任何副作用。

struct tcp_sock {
  /* inet_connection_sock has to be the first member of tcp_sock */
  struct inet_connection_sock inet_conn;
  u16 tcp_header_len; /* Bytes of tcp header to send      */
  // ...
}

既然漏洞出在这个函数,我们就需要先了解这个函数实现的功能。由于其是connection sock的函数,且字段为syn_recv_sock,因此有理由猜测这是TCP协议中接收SYN报文的回调函数。根据TCP协议的状态机,SYN报文会在两种状态下转换,分别是:

  1. LISTEN状态,Passive Open,接收到SYN,返回SYN+ACK,状态转换成SYN-RECEIVED
  2. SYN-SENT状态,Active Open,接收到SYN+ACK后,再次发送ACK,完成三次握手,进入ESTABLISHED状态

tcp_v6_syn_recv_sock顾名思义也就是IPv6的tcp链接接收到SYN的处理流程,调用堆栈如下:

►f 0 ffffffff847f4090 tcp_v6_syn_recv_sock
   f 1 ffffffff84655422 tcp_get_cookie_sock+258
   f 2 ffffffff84847bdd cookie_v6_check+6013
   f 3 ffffffff847f9d6d tcp_v6_do_rcv+3661
   f 4 ffffffff847f9d6d tcp_v6_do_rcv+3661
   f 5 ffffffff847fc73e tcp_v6_rcv+8942
   f 6 ffffffff8472c76f ip6_input_finish+879
   f 7 ffffffff8472dbe9 ip6_input+233
   f 8 ffffffff8472dbe9 ip6_input+233
   f 9 ffffffff8472be09 ip6_rcv_finish+425
   f 10 ffffffff8472be09 ip6_rcv_finish+425

虽然有实现的大致文档,但是关于函数细节的文档通常是没有的,所以,RTFSC!函数原型:

static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
                   struct request_sock *req,
                   struct dst_entry *dst,
                   struct request_sock *req_unhash,
                   bool *own_req)

request_sock包含了一次连接请求所需要的信息:

request_sock

首先检查协议是不是IP协议,如果是的话表示当前地址是IPv4-to-IPv6的地址,转换成IPv4地址后按照IPv4处理:

  if (skb->protocol == htons(ETH_P_IP)) {
      /*
       *  v6 mapped
       */

      newsk = tcp_v4_syn_recv_sock(sk, skb, req, dst,
                       req_unhash, own_req);
      // ...
  }

如果是正常的IPv6协议,则走ipv6的处理流程:

newsk = tcp_create_openreq_child(sk, req, skb);

在分析WrongZone漏洞的时候也看到过类似的流程,TCP连接建立会创建新的sock对象newsk,所以用户空间accept在收到新连接时会返回对应newsk的文件描述符。

回忆一下WrongZone中的类型混淆漏洞,是由于在运行过程中ipv6 sk的sk_prot被修改了,而新建的sock所使用的slab依赖于sk->sk_prot->slab,从而导致克隆的实际上是修改过类型的sock,与原始sk类型不一致(sk->sk_prot_creator),导致类型混淆。

那么这里是否也是同样的问题呢?其实在翻漏洞报告的时候不小心瞄到了一句话:

tls_init() changes sk->sk_prot from IPv6 to IPv4, which leads to this bug.

修改sk->sk_prot从IPv6改为IPv4,效果和IPV6_ADDRFORM一样!后者的实现:

if (sk->sk_protocol == IPPROTO_TCP) {
                  struct inet_connection_sock *icsk = inet_csk(sk);
                  local_bh_disable();
                  sock_prot_inuse_add(net, sk->sk_prot, -1);
                  sock_prot_inuse_add(net, &tcp_prot, 1);
                  local_bh_enable();
                  sk->sk_prot = &tcp_prot; // **BOOM**
                  icsk->icsk_af_ops = &ipv4_specific;
                  sk->sk_socket->ops = &inet_stream_ops;
                  sk->sk_family = PF_INET;
                  tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
              }

tls_init的实现:

// net/tls/tls_main.c

static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx)
{
  sk->sk_prot = &tls_prots[ctx->tx_conf];
}

static int __init tls_register(void)
{
  build_protos(tls_prots, &tcp_prot);

  tcp_register_ulp(&tcp_tls_ulp_ops);

  return 0;
}

所以,newsk其实是从TCPv4 Slab中分配的,KASAN的告警触发在下面代码的memcpy处:

  newtcp6sk = (struct tcp6_sock *)newsk;
  inet_sk(newsk)->pinet6 = &newtcp6sk->inet6;

  newtp = tcp_sk(newsk);
  newinet = inet_sk(newsk);
  newnp = inet6_sk(newsk);

  memcpy(newnp, np, sizeof(struct ipv6_pinfo));

np为inet6_sk(sk),newnp为inet6_sk(newsk),inet6_sk定义如下:

static inline struct ipv6_pinfo *inet6_sk(const struct sock *__sk)
{
  return sk_fullsock(__sk) ? inet_sk(__sk)->pinet6 : NULL;
}

tcp6_sock结构:

struct tcp6_sock {
  struct tcp_sock   tcp;
  /* ipv6_pinfo has to be the last member of tcp6_sock, see inet6_sk_generic */
  struct ipv6_pinfo inet6;
};

发现问题了吗?newsk被强制转换成了tcp6_sock,后者的大小如下:

struct tcp6_sock {
        struct tcp_sock            tcp;                  /*     0  2528 */
        /* --- cacheline 39 boundary (2496 bytes) was 32 bytes ago --- */
        struct ipv6_pinfo          inet6;                /*  2528   160 */
        /* --- cacheline 42 boundary (2688 bytes) --- */

        /* size: 2688, cachelines: 42, members: 2 */
};

tcp_sock的slab object_size正好是2528,因此强制转换类型之后inet6部分就越界了。事实上KSAN的报错也是如此:

[   60.682681] The buggy address belongs to the object at ffff880066e4ea80
[   60.682681]  which belongs to the cache TCP of size 2528
[   60.685803] The buggy address is located 0 bytes to the right of
[   60.685803]  2528-byte region [ffff880066e4ea80, ffff880066e4f460)
[   60.689011] The buggy address belongs to the page:
[   60.690323] page:000000002b421986 count:1 mapcount:0 mapping:000000007a7e240e index:0xffff880066e4fffd compound_mapcount: 0
[   60.693179] flags: 0x1fffc0000008100(slab|head)
[   60.694336] raw: 01fffc0000008100 ffff880066e4e000 ffff880066e4fffd 0000000100000003
[   60.696251] raw: ffff88006a4b9348 ffffea0001ab0ca0 ffff8800690afdc0 0000000000000000
[   60.698152] page dumped because: kasan: bad access detected
[   60.699617]
[   60.700021] Memory state around the buggy address:
[   60.701230]  ffff880066e4f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   60.703039]  ffff880066e4f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   60.704813] >ffff880066e4f400: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[   60.706484]                                                        ^
[   60.708036]  ffff880066e4f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   60.709746]  ffff880066e4f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[   60.711449] ==================================================================

TLS

一个还没解释的问题是,内核如何进入到tls_init? 查看堆栈:

  f 0 ffffffff8468e1a0 tls_init
   f 1 ffffffff845724e9 tcp_set_ulp+345
   f 2 ffffffff844db4f6 do_tcp_setsockopt.isra+790
   f 3 ffffffff844dd3c0 tcp_setsockopt+176
   f 4 ffffffff842011b5 sock_common_setsockopt+149
   f 5 ffffffff841fc999 sys_setsockopt+393
   f 6 ffffffff841fc999 sys_setsockopt+393
   f 7 ffffffff8524bb11 entry_SYSCALL_64+113

可以看到是通过setsockopt,关于kernel对TLS的支持见文档 https://www.kernel.org/doc/html/latest/networking/tls.html 。简单来说,进入方法就是:

sock = socket(AF_INET, SOCK_STREAM, 0);
setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

patch

该漏洞的修复如下:

commit d91c3e17f75f218022140dee18cf515292184a8f
Author: Ilya Lesokhin <ilyal@mellanox.com>
Date:   Tue Jan 16 15:31:52 2018 +0200

    net/tls: Only attach to sockets in ESTABLISHED state

    Calling accept on a TCP socket with a TLS ulp attached results
    in two sockets that share the same ulp context.
    The ulp context is freed while a socket is destroyed, so
    after one of the sockets is released, the second second will
    trigger a use after free when it tries to access the ulp context
    attached to it.
    We restrict the TLS ulp to sockets in ESTABLISHED state
    to prevent the scenario above.

    Fixes: 3c4d7559159b ("tls: kernel TLS support")
    Reported-by: syzbot+904e7cd6c5c741609228@syzkaller.appspotmail.com
    Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index e07ee3a..7b7a70e 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -454,6 +454,15 @@ static int tls_init(struct sock *sk)
        struct tls_context *ctx;
        int rc = 0;

+       /* The TLS ulp is currently supported only for TCP sockets
+        * in ESTABLISHED state.
+        * Supporting sockets in LISTEN state will require us
+        * to modify the accept implementation to clone rather then
+        * share the ulp context.
+        */
+       if (sk->sk_state != TCP_ESTABLISHED)
+               return -ENOTSUPP;
+
        /* allocate tls context */
        ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
        if (!ctx) {

在TLS ULP(upper level protocol)中做了进一步限制,明确指定不支持LISTEN状态的socket设置TLS,虽然是偷懒的做法,但是work。

以史为鉴

说起来这是第二次遇到因为sock的prot类型改变导致的漏洞了,这两个漏洞还是几乎相同时间报出来的。

会不会还存在其他类似的漏洞?简单地grep:

$ egrep "sk->sk_prot = " -nr net/
net/core/sock.c:1530:		sk->sk_prot = sk->sk_prot_creator = prot;
net/ipv6/ipv6_sockglue.c:218:				sk->sk_prot = &tcp_prot;
net/ipv6/ipv6_sockglue.c:232:				sk->sk_prot = prot;
net/tls/tls_main.c:58:	sk->sk_prot = &tls_prots[ctx->tx_conf];

第二行引起了WrongZone漏洞,第四行引起了本文的漏洞,第一行是创建的时候(sk_alloc),第三行呢?同样是在IPV6_ADDRFORM的处理分支中,与WrongZone漏洞并列,在UDP的分支中。

  if (sk->sk_protocol == IPPROTO_TCP) {
      struct inet_connection_sock *icsk = inet_csk(sk);
      local_bh_disable();
      sock_prot_inuse_add(net, sk->sk_prot, -1);
      sock_prot_inuse_add(net, &tcp_prot, 1);
      local_bh_enable();
      sk->sk_prot = &tcp_prot;
      icsk->icsk_af_ops = &ipv4_specific;
      sk->sk_socket->ops = &inet_stream_ops;
      sk->sk_family = PF_INET;
      tcp_sync_mss(sk, icsk->icsk_pmtu_cookie);
  } else {
      struct proto *prot = &udp_prot;

      if (sk->sk_protocol == IPPROTO_UDPLITE)
          prot = &udplite_prot;
      local_bh_disable();
      sock_prot_inuse_add(net, sk->sk_prot, -1);
      sock_prot_inuse_add(net, prot, 1);
      local_bh_enable();
      sk->sk_prot = prot;
      sk->sk_socket->ops = &inet_dgram_ops;
      sk->sk_family = PF_INET;
  }

也就是说TCPv6 UDP的socket也会在运行时变化类型(udpv6_prot <--> udp_prot/udplite_prot),这会导致类似的类型混淆问题吗? 这需要跟踪UDP sock的整个生命周期,以及所有可能触达的分支。

当然,根据前面两个漏洞的pattern,范围还可以缩小一些。前面两个漏洞都是发生在sock复制的时候sk_prot_creator没变导致的类型混淆。对于TCP而言,克隆发生在inet_csk_clone_lock,csk表示这是针对connection sock的,内部调用的是sk_clone_lock,搜索所有调用处:

$ egrep "\<sk_clone_lock"  -rn --binary-files=without-match net/
net/ipv4/inet_connection_sock.c:781:	struct sock *newsk = sk_clone_lock(sk, priority);
net/ipv4/inet_connection_sock.c:852:	/* sk_clone_lock locked the socket and set refcnt to 2 */
net/core/sock.c:1638: *	sk_clone_lock - clone a socket, and lock its clone
net/core/sock.c:1644:struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
net/core/sock.c:1752:EXPORT_SYMBOL_GPL(sk_clone_lock);

只有inet_csk_clone_lock一处调用,所以可以认为UDP协议是不受同类漏洞影响的。

总结

如果不看报告内容的话,即便知道这是一个类型混淆,一开始是找不到这个漏洞点的,根本原因是不知道内核对于 socket 还支持 tls 的功能,而且 tls 还可以改变内核 sock 的 prot 类型。这个例子也可以看出代码审计并非万能(毕竟这是 syzcaller fuzz 出来的),但通过审计形成的理解,我们可以通过简单的 grep 找到更多同类的漏洞,比如上面 UDP6 的潜在类型混淆。

虽然这次没有找到真正的漏洞,但要相信这不过是投入产出比的问题,漏洞只是理解系统的副产品,做之前想好自己真正想要的是什么就可以了。

PS:

weibo