Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discard occuring in read only mode? #2

Open
jthornber opened this issue Dec 3, 2013 · 0 comments
Open

discard occuring in read only mode? #2

jthornber opened this issue Dec 3, 2013 · 0 comments

Comments

@jthornber
Copy link
Owner

115:06 < snitm> ejt: why are we allowing discards when the pool transitions to read-only?
15:12 < ejt> maybe we're allowing them to be passed down, but not freeing mappings?
5:29 < snitm> ejt: ok, I see, yeah pool->process_prepared_discard = process_prepared_discard_passdown
15:31 < snitm> ejt: but pool->process_discard = process_discard; .. with no conditional behavior based on read-only mode in process_discard()

jthornber pushed a commit that referenced this issue Feb 17, 2014
Dave Jones reported a use after free in UDP stack :

[ 5059.434216] =========================
[ 5059.434314] [ BUG: held lock freed! ]
[ 5059.434420] 3.13.0-rc3+ #9 Not tainted
[ 5059.434520] -------------------------
[ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there!
[ 5059.434815]  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
[ 5059.435012] 3 locks held by named/863:
[ 5059.435086]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8143054d>] __netif_receive_skb_core+0x11d/0x940
[ 5059.435295]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81467a5e>] ip_local_deliver_finish+0x3e/0x410
[ 5059.435500]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
[ 5059.435734]
stack backtrace:
[ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365]
[ 5059.436052] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
[ 5059.436223]  0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0
[ 5059.436390]  ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0
[ 5059.436554]  ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48
[ 5059.436718] Call Trace:
[ 5059.436769]  <IRQ>  [<ffffffff8153a372>] dump_stack+0x4d/0x66
[ 5059.436904]  [<ffffffff8108cafa>] debug_check_no_locks_freed+0x15a/0x160
[ 5059.437037]  [<ffffffff8141c490>] ? __sk_free+0x110/0x230
[ 5059.437147]  [<ffffffff8112da2a>] kmem_cache_free+0x6a/0x150
[ 5059.437260]  [<ffffffff8141c490>] __sk_free+0x110/0x230
[ 5059.437364]  [<ffffffff8141c5c9>] sk_free+0x19/0x20
[ 5059.437463]  [<ffffffff8141cb25>] sock_edemux+0x25/0x40
[ 5059.437567]  [<ffffffff8141c181>] sock_queue_rcv_skb+0x81/0x280
[ 5059.437685]  [<ffffffff8149bd21>] ? udp_queue_rcv_skb+0xd1/0x4b0
[ 5059.437805]  [<ffffffff81499c82>] __udp_queue_rcv_skb+0x42/0x240
[ 5059.437925]  [<ffffffff81541d25>] ? _raw_spin_lock+0x65/0x70
[ 5059.438038]  [<ffffffff8149bebb>] udp_queue_rcv_skb+0x26b/0x4b0
[ 5059.438155]  [<ffffffff8149c712>] __udp4_lib_rcv+0x152/0xb00
[ 5059.438269]  [<ffffffff8149d7f5>] udp_rcv+0x15/0x20
[ 5059.438367]  [<ffffffff81467b2f>] ip_local_deliver_finish+0x10f/0x410
[ 5059.438492]  [<ffffffff81467a5e>] ? ip_local_deliver_finish+0x3e/0x410
[ 5059.438621]  [<ffffffff81468653>] ip_local_deliver+0x43/0x80
[ 5059.438733]  [<ffffffff81467f70>] ip_rcv_finish+0x140/0x5a0
[ 5059.438843]  [<ffffffff81468926>] ip_rcv+0x296/0x3f0
[ 5059.438945]  [<ffffffff81430b72>] __netif_receive_skb_core+0x742/0x940
[ 5059.439074]  [<ffffffff8143054d>] ? __netif_receive_skb_core+0x11d/0x940
[ 5059.442231]  [<ffffffff8108c81d>] ? trace_hardirqs_on+0xd/0x10
[ 5059.442231]  [<ffffffff81430d83>] __netif_receive_skb+0x13/0x60
[ 5059.442231]  [<ffffffff81431c1e>] netif_receive_skb+0x1e/0x1f0
[ 5059.442231]  [<ffffffff814334e0>] napi_gro_receive+0x70/0xa0
[ 5059.442231]  [<ffffffffa01de426>] rtl8169_poll+0x166/0x700 [r8169]
[ 5059.442231]  [<ffffffff81432bc9>] net_rx_action+0x129/0x1e0
[ 5059.442231]  [<ffffffff810478cd>] __do_softirq+0xed/0x240
[ 5059.442231]  [<ffffffff81047e25>] irq_exit+0x125/0x140
[ 5059.442231]  [<ffffffff81004241>] do_IRQ+0x51/0xc0
[ 5059.442231]  [<ffffffff81542bef>] common_interrupt+0x6f/0x6f

We need to keep a reference on the socket, by using skb_steal_sock()
at the right place.

Note that another patch is needed to fix a race in
udp_sk_rx_dst_set(), as we hold no lock protecting the dst.

Fixes: 421b388 ("udp: ipv4: Add udp early demux")
Reported-by: Dave Jones <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Shawn Bohrer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
I see the following splat with 3.13-rc1 when attempting to perform DMA:

[  253.004516] Alignment trap: not handling instruction e1902f9f at [<c0204b40>]
[  253.004583] Unhandled fault: alignment exception (0x221) at 0xdfdfdfd7
[  253.004646] Internal error: : 221 [#1] PREEMPT SMP ARM
[  253.004691] Modules linked in: dmatest(+) [last unloaded: dmatest]
[  253.004798] CPU: 0 PID: 671 Comm: kthreadd Not tainted 3.13.0-rc1+ #2
[  253.004864] task: df9b0900 ti: df03e000 task.ti: df03e000
[  253.004937] PC is at dmaengine_unmap_put+0x14/0x34
[  253.005010] LR is at pl330_tasklet+0x3c8/0x550
[  253.005087] pc : [<c0204b44>]    lr : [<c0207478>]    psr: a00e0193
[  253.005087] sp : df03fe48  ip : 00000000  fp : df03bf18
[  253.005178] r10: bf00e108  r9 : 00000001  r8 : 00000000
[  253.005245] r7 : df837040  r6 : dfb41800  r5 : df837048  r4 : df837000
[  253.005316] r3 : dfdfdfcf  r2 : dfb41f80  r1 : df837048  r0 : dfdfdfd7
[  253.005384] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  253.005459] Control: 30c5387d  Table: 9fb9ba80  DAC: fffffffd
[  253.005520] Process kthreadd (pid: 671, stack limit = 0xdf03e248)

This is due to desc->txd.unmap containing garbage (uninitialised memory).

Rather than add another dummy initialisation to _init_desc, instead
ensure that the descriptors are zero-initialised during allocation and
remove the dummy, per-field initialisation.

Cc: Andriy Shevchenko <[email protected]>
Acked-by: Jassi Brar <[email protected]>
Signed-off-by: Will Deacon <[email protected]>
Acked-by: Vinod Koul <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
BUG_ON(!vma) assumption is introduced by commit 0bf598d ("mbind:
add BUG_ON(!vma) in new_vma_page()"), however, even if

    address = __vma_address(page, vma);

and

    vma->start < address < vma->end

page_address_in_vma() may still return -EFAULT because of many other
conditions in it.  As a result the while loop in new_vma_page() may end
with vma=NULL.

This patch revert the commit and also fix the potential dereference NULL
pointer reported by Dan.

   http://marc.info/?l=linux-mm&m=137689530323257&w=2

  kernel BUG at mm/mempolicy.c:1204!
  invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU: 3 PID: 7056 Comm: trinity-child3 Not tainted 3.13.0-rc3+ #2
  task: ffff8801ca5295d0 ti: ffff88005ab20000 task.ti: ffff88005ab20000
  RIP: new_vma_page+0x70/0x90
  RSP: 0000:ffff88005ab21db0  EFLAGS: 00010246
  RAX: fffffffffffffff2 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000008040075 RSI: ffff8801c3d74600 RDI: ffffea00079a8b80
  RBP: ffff88005ab21dc8 R08: 0000000000000004 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffff2
  R13: ffffea00079a8b80 R14: 0000000000400000 R15: 0000000000400000

  FS:  00007ff49c6f4740(0000) GS:ffff880244e00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007ff49c68f994 CR3: 000000005a205000 CR4: 00000000001407e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffffea00079a8b80 ffffea00079a8bc0 ffffea00079a8ba0 ffff88005ab21e50
   ffffffff811adc7a 0000000000000000 ffff8801ca5295d0 0000000464e224f8
   0000000000000000 0000000000000002 0000000000000000 ffff88020ce75c00
  Call Trace:
    migrate_pages+0x12a/0x850
    SYSC_mbind+0x513/0x6a0
    SyS_mbind+0xe/0x10
    ia32_do_call+0x13/0x13
  Code: 85 c0 75 2f 4c 89 e1 48 89 da 31 f6 bf da 00 02 00 65 44 8b 04 25 08 f7 1c 00 e8 ec fd ff ff 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 4c 89 e6 48 89 df ba 01 00 00 00 e8 48
  RIP  [<ffffffff8119f200>] new_vma_page+0x70/0x90
   RSP <ffff88005ab21db0>

Signed-off-by: Wanpeng Li <[email protected]>
Reported-by: Dave Jones <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Reviewed-by: Naoya Horiguchi <[email protected]>
Reviewed-by: Bob Liu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Commit 2361613, "of/irq: Refactor interrupt-map parsing" changed
the refcount on the device_node causing an error in of_node_put():

ERROR: Bad of_node_put() on /pci@800000020000000
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-rc3-dirty #2
Call Trace:
[c00000003e403500] [c0000000000144fc] .show_stack+0x7c/0x1f0 (unreliable)
[c00000003e4035d0] [c00000000070f250] .dump_stack+0x88/0xb4
[c00000003e403650] [c0000000005e8768] .of_node_release+0xd8/0xf0
[c00000003e4036e0] [c0000000005eeafc] .of_irq_parse_one+0x10c/0x280
[c00000003e4037a0] [c0000000005efd4c] .of_irq_parse_pci+0x3c/0x1d0
[c00000003e403840] [c000000000038240] .pcibios_setup_device+0xa0/0x2e0
[c00000003e403910] [c0000000000398f0] .pcibios_setup_bus_devices+0x60/0xd0
[c00000003e403990] [c00000000003b3a4] .__of_scan_bus+0x1a4/0x2b0
[c00000003e403a80] [c00000000003a62c] .pcibios_scan_phb+0x30c/0x410
[c00000003e403b60] [c0000000009fe430] .pcibios_init+0x7c/0xd4

This patch adjusts the refcount in the walk of the interrupt tree.
When a match is found, there is no need to increase the refcount
on 'out_irq->np' as 'newpar' is already holding a ref. The refcount
balance between 'ipar' and 'newpar' is maintained in the skiplevel:
goto label.

This patch also removes the usage of the device_node variable 'old'
which seems useless after the latest changes.

Signed-off-by: Cédric Le Goater <[email protected]>
Signed-off-by: Rob Herring <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
The existing serial state notification handling expected older Option
devices, having a hardcoded assumption that the Modem port was always
USB interface #2.  That isn't true for devices from the past few years.

hso_serial_state_notification is a local cache of a USB Communications
Interface Class SERIAL_STATE notification from the device, and the
USB CDC specification (section 6.3, table 67 "Class-Specific Notifications")
defines wIndex as the USB interface the event applies to.  For hso
devices this will always be the Modem port, as the Modem port is the
only port which is set up to receive them by the driver.

So instead of always expecting USB interface #2, instead validate the
notification with the actual USB interface number of the Modem port.

Signed-off-by: Dan Williams <[email protected]>
Tested-by: H. Nikolaus Schaller <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
The way namespace tags are implemented in sysfs is more complicated
than necessary.  As each tag is a pointer value and required to be
non-NULL under a namespace enabled parent, there's no need to record
separately what type each tag is.  If multiple namespace types are
needed, which currently aren't, we can simply compare the tag to a set
of allowed tags in the superblock assuming that the tags, being
pointers, won't have the same value across multiple types.

This patch rips out kobj_ns_type handling from sysfs.  sysfs now has
an enable switch to turn on namespace under a node.  If enabled, all
children are required to have non-NULL namespace tags and filtered
against the super_block's tag.

kobject namespace determination is now performed in
lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
The sanity checks are also moved.  create_dir() is restructured to
ease such addition.  This removes most kobject namespace knowledge
from sysfs proper which will enable proper separation and layering of
sysfs.

This is the second try.  The first one was cb26a31 ("sysfs: drop
kobj_ns_type handling") which tried to automatically enable namespace
if there are children with non-NULL namespace tags; however, it was
broken for symlinks as they should inherit the target's tag iff
namespace is enabled in the parent.  This led to namespace filtering
enabled incorrectly for wireless net class devices through phy80211
symlinks and thus network configuration failure.  a1212d2
("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.

This shouldn't introduce any behavior changes, for real.

v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
    missing and caused build failure.  Reported by kbuild test robot.

Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Linus Torvalds <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Kay Sievers <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: kbuild test robot <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Introduce kernfs interface to manipulate a directory which takes and
returns sysfs_dirents.

create_dir() is renamed to kernfs_create_dir_ns() and its argumantes
and return value are updated.  create_dir() usages are replaced with
kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced
with kernfs_create_dir().  Dup warnings are handled explicitly by
sysfs users of the kernfs interface.

sysfs_enable_ns() is renamed to kernfs_enable_ns().

This patch doesn't introduce any behavior changes.

v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS.

v3: kernfs_enable_ns() added.

v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2"
    so that this patch removes sysfs_enable_ns().

Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Dave Jones got the following lockdep splat:

>  ======================================================
>  [ INFO: possible circular locking dependency detected ]
>  3.12.0-rc3+ #92 Not tainted
>  -------------------------------------------------------
>  trinity-child2/15191 is trying to acquire lock:
>   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
>
> but task is already holding lock:
>   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&ctx->lock){-.-...}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>         [<ffffffff811500ff>] __perf_event_task_sched_out+0x2df/0x5e0
>         [<ffffffff81091b83>] perf_event_task_sched_out+0x93/0xa0
>         [<ffffffff81732052>] __schedule+0x1d2/0xa20
>         [<ffffffff81732f30>] preempt_schedule_irq+0x50/0xb0
>         [<ffffffff817352b6>] retint_kernel+0x26/0x30
>         [<ffffffff813eed04>] tty_flip_buffer_push+0x34/0x50
>         [<ffffffff813f0504>] pty_write+0x54/0x60
>         [<ffffffff813e900d>] n_tty_write+0x32d/0x4e0
>         [<ffffffff813e5838>] tty_write+0x158/0x2d0
>         [<ffffffff811c4850>] vfs_write+0xc0/0x1f0
>         [<ffffffff811c52cc>] SyS_write+0x4c/0xa0
>         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> -> #2 (&rq->lock){-.-.-.}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff81733f90>] _raw_spin_lock+0x40/0x80
>         [<ffffffff810980b2>] wake_up_new_task+0xc2/0x2e0
>         [<ffffffff81054336>] do_fork+0x126/0x460
>         [<ffffffff81054696>] kernel_thread+0x26/0x30
>         [<ffffffff8171ff93>] rest_init+0x23/0x140
>         [<ffffffff81ee1e4b>] start_kernel+0x3f6/0x403
>         [<ffffffff81ee1571>] x86_64_start_reservations+0x2a/0x2c
>         [<ffffffff81ee1664>] x86_64_start_kernel+0xf1/0xf4
>
> -> #1 (&p->pi_lock){-.-.-.}:
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>         [<ffffffff810979d1>] try_to_wake_up+0x31/0x350
>         [<ffffffff81097d62>] default_wake_function+0x12/0x20
>         [<ffffffff81084af8>] autoremove_wake_function+0x18/0x40
>         [<ffffffff8108ea38>] __wake_up_common+0x58/0x90
>         [<ffffffff8108ff59>] __wake_up+0x39/0x50
>         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>         [<ffffffff81111450>] __call_rcu+0x140/0x820
>         [<ffffffff81111b8d>] call_rcu+0x1d/0x20
>         [<ffffffff81093697>] cpu_attach_domain+0x287/0x360
>         [<ffffffff81099d7e>] build_sched_domains+0xe5e/0x10a0
>         [<ffffffff81efa7fc>] sched_init_smp+0x3b7/0x47a
>         [<ffffffff81ee1f4e>] kernel_init_freeable+0xf6/0x202
>         [<ffffffff817200be>] kernel_init+0xe/0x190
>         [<ffffffff8173d22c>] ret_from_fork+0x7c/0xb0
>
> -> #0 (&rdp->nocb_wq){......}:
>         [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>         [<ffffffff810cc243>] lock_acquire+0x93/0x200
>         [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>         [<ffffffff8108ff43>] __wake_up+0x23/0x50
>         [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>         [<ffffffff81111450>] __call_rcu+0x140/0x820
>         [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>         [<ffffffff81149abf>] put_ctx+0x4f/0x70
>         [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>         [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>         [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>         [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>         [<ffffffff8173d4e4>] tracesys+0xdd/0xe2
>
> other info that might help us debug this:
>
> Chain exists of:
>   &rdp->nocb_wq --> &rq->lock --> &ctx->lock
>
>   Possible unsafe locking scenario:
>
>         CPU0                    CPU1
>         ----                    ----
>    lock(&ctx->lock);
>                                 lock(&rq->lock);
>                                 lock(&ctx->lock);
>    lock(&rdp->nocb_wq);
>
>  *** DEADLOCK ***
>
> 1 lock held by trinity-child2/15191:
>  #0:  (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
>
> stack backtrace:
> CPU: 2 PID: 15191 Comm: trinity-child2 Not tainted 3.12.0-rc3+ #92
>  ffffffff82565b70 ffff880070c2dbf8 ffffffff8172a363 ffffffff824edf40
>  ffff880070c2dc38 ffffffff81726741 ffff880070c2dc90 ffff88022383b1c0
>  ffff88022383aac0 0000000000000000 ffff88022383b188 ffff88022383b1c0
> Call Trace:
>  [<ffffffff8172a363>] dump_stack+0x4e/0x82
>  [<ffffffff81726741>] print_circular_bug+0x200/0x20f
>  [<ffffffff810cb7ca>] __lock_acquire+0x191a/0x1be0
>  [<ffffffff810c6439>] ? get_lock_stats+0x19/0x60
>  [<ffffffff8100b2f4>] ? native_sched_clock+0x24/0x80
>  [<ffffffff810cc243>] lock_acquire+0x93/0x200
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8173419b>] _raw_spin_lock_irqsave+0x4b/0x90
>  [<ffffffff8108ff43>] ? __wake_up+0x23/0x50
>  [<ffffffff8108ff43>] __wake_up+0x23/0x50
>  [<ffffffff8110d4f8>] __call_rcu_nocb_enqueue+0xa8/0xc0
>  [<ffffffff81111450>] __call_rcu+0x140/0x820
>  [<ffffffff8109bc8f>] ? local_clock+0x3f/0x50
>  [<ffffffff81111bb0>] kfree_call_rcu+0x20/0x30
>  [<ffffffff81149abf>] put_ctx+0x4f/0x70
>  [<ffffffff81154c3e>] perf_event_exit_task+0x12e/0x230
>  [<ffffffff81056b8d>] do_exit+0x30d/0xcc0
>  [<ffffffff810c9af5>] ? trace_hardirqs_on_caller+0x115/0x1e0
>  [<ffffffff810c9bcd>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff8105893c>] do_group_exit+0x4c/0xc0
>  [<ffffffff810589c4>] SyS_exit_group+0x14/0x20
>  [<ffffffff8173d4e4>] tracesys+0xdd/0xe2

The underlying problem is that perf is invoking call_rcu() with the
scheduler locks held, but in NOCB mode, call_rcu() will with high
probability invoke the scheduler -- which just might want to use its
locks.  The reason that call_rcu() needs to invoke the scheduler is
to wake up the corresponding rcuo callback-offload kthread, which
does the job of starting up a grace period and invoking the callbacks
afterwards.

One solution (championed on a related problem by Lai Jiangshan) is to
simply defer the wakeup to some point where scheduler locks are no longer
held.  Since we don't want to unnecessarily incur the cost of such
deferral, the task before us is threefold:

1.	Determine when it is likely that a relevant scheduler lock is held.

2.	Defer the wakeup in such cases.

3.	Ensure that all deferred wakeups eventually happen, preferably
	sooner rather than later.

We use irqs_disabled_flags() as a proxy for relevant scheduler locks
being held.  This works because the relevant locks are always acquired
with interrupts disabled.  We may defer more often than needed, but that
is at least safe.

The wakeup deferral is tracked via a new field in the per-CPU and
per-RCU-flavor rcu_data structure, namely ->nocb_defer_wakeup.

This flag is checked by the RCU core processing.  The __rcu_pending()
function now checks this flag, which causes rcu_check_callbacks()
to initiate RCU core processing at each scheduling-clock interrupt
where this flag is set.  Of course this is not sufficient because
scheduling-clock interrupts are often turned off (the things we used to
be able to count on!).  So the flags are also checked on entry to any
state that RCU considers to be idle, which includes both NO_HZ_IDLE idle
state and NO_HZ_FULL user-mode-execution state.

This approach should allow call_rcu() to be invoked regardless of what
locks you might be holding, the key word being "should".

Reported-by: Dave Jones <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Cc: Peter Zijlstra <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
ttyA has ld associated to n_gsm, when ttyA is closing, it triggers
to release gsmttyB's ld data dlci[B], then race would happen if gsmttyB
is opening in parallel.

(Note: This patch set differs from previous set in that it uses mutex
instead of spin lock to avoid race, so that it avoids sleeping in automic
context)

Here are race cases we found recently in test:

CASE #1
====================================================================
releasing dlci[B] race with gsmtty_install(gsmttyB), then panic
in gsmtty_open(gsmttyB), as below:

 tty_release(ttyA)                  tty_open(gsmttyB)
     |                                   |
   -----                           gsmtty_install(gsmttyB)
     |                                   |
   -----                    gsm_dlci_alloc(gsmttyB) => alloc dlci[B]
 tty_ldisc_release(ttyA)               -----
     |                                   |
 gsm_dlci_release(dlci[B])             -----
     |                                   |
 gsm_dlci_free(dlci[B])                -----
     |                                   |
   -----                           gsmtty_open(gsmttyB)

 gsmtty_open()
 {
     struct gsm_dlci *dlci = tty->driver_data; => here it uses dlci[B]
     ...
 }

 In gsmtty_open(gsmttyA), it uses dlci[B] which was release, so hit a panic.
=====================================================================

CASE #2
=====================================================================
releasing dlci[0] race with gsmtty_install(gsmttyB), then panic
in gsmtty_open(), as below:

 tty_release(ttyA)                  tty_open(gsmttyB)
     |                                   |
   -----                           gsmtty_install(gsmttyB)
     |                                   |
   -----                    gsm_dlci_alloc(gsmttyB) => alloc dlci[B]
     |                                   |
   -----                         gsmtty_open(gsmttyB) fail
     |                                   |
   -----                           tty_release(gsmttyB)
     |                                   |
   -----                           gsmtty_close(gsmttyB)
     |                                   |
   -----                        gsmtty_detach_dlci(dlci[B])
     |                                   |
   -----                             dlci_put(dlci[B])
     |                                   |
 tty_ldisc_release(ttyA)               -----
     |                                   |
 gsm_dlci_release(dlci[0])             -----
     |                                   |
 gsm_dlci_free(dlci[0])                -----
     |                                   |
   -----                             dlci_put(dlci[0])

 In gsmtty_detach_dlci(dlci[B]), it tries to use dlci[0] which was released,
 then hit panic.
=====================================================================

IMHO, n_gsm tty operations would refer released ldisc,  as long as
gsm_dlci_release() has chance to release ldisc data when some gsmtty operations
are ongoing..

This patch is try to avoid it by:

1) in n_gsm driver, use a global gsm mutex lock to avoid gsm_dlci_release() run in
parallel with gsmtty_install();

2) Increase dlci's ref count in gsmtty_install() instead of in gsmtty_open(), the
purpose is to prevent gsm_dlci_release() releasing dlci after gsmtty_install()
allocats dlci but before gsmtty_open increases dlci's ref count;

3) Decrease dlci's ref count in gsmtty_remove(), a tty framework API, this is the
opposite process of step 2).

Signed-off-by: Chao Bi <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
…to next/dt

From Jason Cooper:
mvebu DT changes for v3.14 (set #2)

 - mvebu
    - Netgear ReadyNAS cleanup
    - add Netgear ReadyNAS 2120

 - kirkwood
    - Netgear ReadyNAS cleanup
    - add Netgear ReadyNAS NV+ v2

* tag 'mvebu-dt-3.14-2' of git://git.infradead.org/linux-mvebu:
  ARM: kirkwood: Add support for NETGEAR ReadyNAS NV+ v2
  ARM: mvebu: Add Netgear ReadyNAS 2120 board
  ARM: mvebu: Fix whitespace in NETGEAR ReadyNAS .dts files
  ARM: mvebu: NETGEAR ReadyNAS 104 .dts cleanup
  ARM: mvebu: NETGEAR ReadyNAS 102 .dts cleanup
  ARM: kirkwood: NETGEAR ReadyNAS Duo v2 .dts cleanup

Signed-off-by: Kevin Hilman <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Commit 4d9b109(tty: Prevent deadlock in n_gsm driver)
tried to close all the virtual ports synchronously before closing the
phycial ports, so the tty_vhangup() is used.

But the tty_unlock/lock() is wrong:
tty_release
 tty_ldisc_release
  tty_lock_pair(tty, o_tty)  < == Here the tty is for physical port
  tty_ldisc_kill
    gsmld_close
      gsm_cleanup_mux
        gsm_dlci_release
          tty = tty_port_tty_get(&dlci->port)
                            < == Here the tty(s) are for virtual port

They are different ttys, so before tty_vhangup(virtual tty), do not need
to call the tty_unlock(virtual tty) at all which causes unbalanced unlock
warning.

When enabling mutex debugging option, we will hit the below warning also:
[   99.276903] =====================================
[   99.282172] [ BUG: bad unlock balance detected! ]
[   99.287442] 3.10.20-261976-gaec5ba0 #44 Tainted: G           O
[   99.293972] -------------------------------------
[   99.299240] mmgr/152 is trying to release lock (&tty->legacy_mutex) at:
[   99.306693] [<c1b2dcad>] mutex_unlock+0xd/0x10
[   99.311669] but there are no more locks to release!
[   99.317131]
[   99.317131] other info that might help us debug this:
[   99.324440] 3 locks held by mmgr/152:
[   99.328542]  #0:  (&tty->legacy_mutex/1){......}, at: [<c1b30ab0>] tty_lock_nested+0x40/0x90
[   99.338116]  #1:  (&tty->ldisc_mutex){......}, at: [<c15dbd02>] tty_ldisc_kill+0x22/0xd0
[   99.347284]  #2:  (&gsm->mutex){......}, at: [<c15e3d83>] gsm_cleanup_mux+0x73/0x170
[   99.356060]
[   99.356060] stack backtrace:
[   99.360932] CPU: 0 PID: 152 Comm: mmgr Tainted: G           O 3.10.20-261976-gaec5ba0 #44
[   99.370086]  ef4a4de0 ef4a4de0 ef4c1d98 c1b27b91 ef4c1db8 c1292655 c1dd10f5 c1b2dcad
[   99.378921]  c1b2dcad ef4a4de0 ef4a528c ffffffff ef4c1dfc c12930dd 00000246 00000000
[   99.387754]  00000000 00000000 c15e1926 00000000 00000001 ddfa7530 00000003 c1b2dcad
[   99.396588] Call Trace:
[   99.399326]  [<c1b27b91>] dump_stack+0x16/0x18
[   99.404307]  [<c1292655>] print_unlock_imbalance_bug+0xe5/0xf0
[   99.410840]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.416110]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.421382]  [<c12930dd>] lock_release_non_nested+0x1cd/0x210
[   99.427818]  [<c15e1926>] ? gsm_destroy_network+0x36/0x130
[   99.433964]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.439235]  [<c12931a2>] lock_release+0x82/0x1c0
[   99.444505]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.449776]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.455047]  [<c1b2dc2f>] __mutex_unlock_slowpath+0x5f/0xd0
[   99.461288]  [<c1b2dcad>] mutex_unlock+0xd/0x10
[   99.466365]  [<c1b30bb1>] tty_unlock+0x21/0x50
[   99.471345]  [<c15e3dd1>] gsm_cleanup_mux+0xc1/0x170
[   99.476906]  [<c15e44d2>] gsmld_close+0x52/0x90
[   99.481983]  [<c15db905>] tty_ldisc_close.isra.1+0x35/0x50
[   99.488127]  [<c15dbd0c>] tty_ldisc_kill+0x2c/0xd0
[   99.493494]  [<c15dc7af>] tty_ldisc_release+0x2f/0x50
[   99.499152]  [<c15d572c>] tty_release+0x37c/0x4b0
[   99.504424]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.509695]  [<c1b2dcad>] ? mutex_unlock+0xd/0x10
[   99.514967]  [<c1372f6e>] ? eventpoll_release_file+0x7e/0x90
[   99.521307]  [<c1335849>] __fput+0xd9/0x200
[   99.525996]  [<c133597d>] ____fput+0xd/0x10
[   99.530685]  [<c125c731>] task_work_run+0x81/0xb0
[   99.535957]  [<c12019e9>] do_notify_resume+0x49/0x70
[   99.541520]  [<c1b30dc4>] work_notifysig+0x29/0x31
[   99.546897] ------------[ cut here ]------------

So here we can call tty_vhangup() directly which is for virtual port.

Reviewed-by: Chao Bi <[email protected]>
Signed-off-by: Liu, Chuansheng <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
…vebu into next/boards

From Jason Cooper:
mvebu defconfig changes for v3.14 (incremental #2)

 - mvebu
    - enable nand support

* tag 'mvebu-defconfig-3.14-2' of git://git.infradead.org/linux-mvebu:
  ARM: mvebu: config: Enable NAND support

Signed-off-by: Olof Johansson <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
As part of normal operaions, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
  hrtimer locks -> timekeeping locks

clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
between the timekeeping the hrtimer subsystem, so that we could
notify the hrtimer subsytem the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeing code.

But unfortunately the lock chains are complex enoguh that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.

Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abrieviated) message:

[  251.100221] ======================================================
[  251.100221] [ INFO: possible circular locking dependency detected ]
[  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[  251.101967] -------------------------------------------------------
[  251.101967] kworker/10:1/4506 is trying to acquire lock:
[  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[  251.101967]
[  251.101967] but task is already holding lock:
[  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] which lock already depends on the new lock.
[  251.101967]
[  251.101967]
[  251.101967] the existing dependency chain (in reverse order) is:
[  251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
[  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[  251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[  251.101967] other info that might help us debug this:
[  251.101967]
[  251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11

[  251.101967]  Possible unsafe locking scenario:
[  251.101967]
[  251.101967]        CPU0                    CPU1
[  251.101967]        ----                    ----
[  251.101967]   lock(hrtimer_bases.lock#11);
[  251.101967]                                lock(&rt_b->rt_runtime_lock);
[  251.101967]                                lock(hrtimer_bases.lock#11);
[  251.101967]   lock(timekeeper_seq);
[  251.101967]
[  251.101967]  *** DEADLOCK ***
[  251.101967]
[  251.101967] 3 locks held by kworker/10:1/4506:
[  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] stack backtrace:
[  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[  251.101967] Workqueue: events clock_was_set_work

So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead using a flag variable to
decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortuantely, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Richard Cochran <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: stable <[email protected]> #3.10+
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
Signed-off-by: John Stultz <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
…bu into next/drivers

From Jason Cooper:
mvebu driver changes for v3.14 (incremental #2)
 - rtc
    - add driver for Intersil ISL12057 chip

* tag 'mvebu-drivers-3.14-2' of git://git.infradead.org/linux-mvebu:
  rtc: Add support for Intersil ISL12057 I2C RTC chip

b the commit.
Signed-off-by: Olof Johansson <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
…nto next/cleanup

From Jason Cooper:
mvebu SoC changes for v3.14 (incremental #2)

 - mvebu
    - some Armada cleanup to prep for a new SoC in mach-mvebu/

* tag 'mvebu-soc-3.14-2' of git://git.infradead.org/linux-mvebu:
  ARM: mvebu: move Armada 370/XP specific definitions to armada-370-xp.h
  ARM: mvebu: remove prototypes of non-existing functions from common.h
  ARM: mvebu: move ARMADA_XP_MAX_CPUS to armada-370-xp.h

Signed-off-by: Olof Johansson <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
…ake #2]

setlocalversion script was testing the presence of .git directory in
order to find out if git is used as SCM to track the current kernel
project. However in some cases, .git is not a directory but can be a
file: when the kernel is a git submodule part of a git super project for
example.

This patch just fixes this by using 'git rev-parse --show-cdup' to check
that the current directory is the kernel git topdir. This has the
advantage to not test and rely on git internal infrastructure directly.

Signed-off-by: Franck Bui-Huu <[email protected]>
Signed-off-by: Michal Marek <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Jon Maloy says:

====================
tipc: link setup and failover improvements

This series consists of four unrelated commits with different purposes.

- Commit #1 is purely cosmetic and pedagogic, hopefully making the
  failover/tunneling logics slightly easier to understand.
- Commit #2 fixes a bug that has always been in the code, but was not
  discovered until very recently.
- Commit #3 fixes a non-fatal race issue in the neighbour discovery
  code.
- Commit #4 removes an unnecessary indirection step during link
  startup.
====================

Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Sometimes we may meet the following lockdep issue.
The root cause is .set_clock callback is executed with spin_lock_irqsave
in sdhci_do_set_ios. However, the IMX set_clock callback will try to access
clk_get_rate which is using a mutex lock.

The fix avoids access mutex in .set_clock callback by initializing the
pltfm_host->clock at probe time and use it later instead of calling
clk_get_rate again in atomic context.

[ INFO: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected ]
3.13.0-rc1+ #285 Not tainted
------------------------------------------------------
kworker/u8:1/29 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 (prepare_lock){+.+...}, at: [<80480b08>] clk_prepare_lock+0x44/0xe4

and this task is already holding:
 (&(&host->lock)->rlock#2){-.-...}, at: [<804611f4>] sdhci_do_set_ios+0x20/0x720
which would create a new lock dependency:
 (&(&host->lock)->rlock#2){-.-...} -> (prepare_lock){+.+...}

but this new dependency connects a HARDIRQ-irq-safe lock:
 (&(&host->lock)->rlock#2){-.-...}
... which became HARDIRQ-irq-safe at:
  [<8005f030>] mark_lock+0x140/0x6ac
  [<80060760>] __lock_acquire+0xb30/0x1cbc
  [<800620d0>] lock_acquire+0x70/0x84
  [<8061d2f0>] _raw_spin_lock+0x30/0x40
  [<80460668>] sdhci_irq+0x24/0xa68
  [<8006b1d4>] handle_irq_event_percpu+0x54/0x18c
  [<8006b350>] handle_irq_event+0x44/0x64
  [<8006e50c>] handle_fasteoi_irq+0xa0/0x170
  [<8006a8f0>] generic_handle_irq+0x30/0x44
  [<8000f238>] handle_IRQ+0x54/0xbc
  [<8000864c>] gic_handle_irq+0x30/0x64
  [<80013024>] __irq_svc+0x44/0x5c
  [<80614c58>] printk+0x38/0x40
  [<804622a8>] sdhci_add_host+0x844/0xbcc
  [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
  [<8032ee88>] platform_drv_probe+0x20/0x50
  [<8032d48c>] driver_probe_device+0x118/0x234
  [<8032d690>] __driver_attach+0x9c/0xa0
  [<8032b89c>] bus_for_each_dev+0x68/0x9c
  [<8032cf44>] driver_attach+0x20/0x28
  [<8032cbc8>] bus_add_driver+0x148/0x1f4
  [<8032dce0>] driver_register+0x80/0x100
  [<8032ee54>] __platform_driver_register+0x50/0x64
  [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
  [<80008980>] do_one_initcall+0x108/0x16c
  [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
  [<80611c50>] kernel_init+0x10/0x120
  [<8000e9c8>] ret_from_fork+0x14/0x2c

to a HARDIRQ-irq-unsafe lock:
 (prepare_lock){+.+...}
... which became HARDIRQ-irq-unsafe at:
...  [<8005f030>] mark_lock+0x140/0x6ac
  [<8005f604>] mark_held_locks+0x68/0x12c
  [<8005f780>] trace_hardirqs_on_caller+0xb8/0x1d8
  [<8005f8b4>] trace_hardirqs_on+0x14/0x18
  [<8061a130>] mutex_trylock+0x180/0x20c
  [<80480ad8>] clk_prepare_lock+0x14/0xe4
  [<804816a4>] clk_notifier_register+0x28/0xf0
  [<80015120>] twd_clk_init+0x50/0x68
  [<80008980>] do_one_initcall+0x108/0x16c
  [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
  [<80611c50>] kernel_init+0x10/0x120
  [<8000e9c8>] ret_from_fork+0x14/0x2c

other info that might help us debug this:

 Possible interrupt unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(prepare_lock);
                               local_irq_disable();
                               lock(&(&host->lock)->rlock#2);
                               lock(prepare_lock);
  <Interrupt>
    lock(&(&host->lock)->rlock#2);

 *** DEADLOCK ***

3 locks held by kworker/u8:1/29:
 #0:  (kmmcd){.+.+.+}, at: [<8003db18>] process_one_work+0x128/0x468
 #1:  ((&(&host->detect)->work)){+.+.+.}, at: [<8003db18>] process_one_work+0x128/0x468
 #2:  (&(&host->lock)->rlock#2){-.-...}, at: [<804611f4>] sdhci_do_set_ios+0x20/0x720

the dependencies between HARDIRQ-irq-safe lock and the holding lock:
-> (&(&host->lock)->rlock#2){-.-...} ops: 330 {
   IN-HARDIRQ-W at:
                    [<8005f030>] mark_lock+0x140/0x6ac
                    [<80060760>] __lock_acquire+0xb30/0x1cbc
                    [<800620d0>] lock_acquire+0x70/0x84
                    [<8061d2f0>] _raw_spin_lock+0x30/0x40
                    [<80460668>] sdhci_irq+0x24/0xa68
                    [<8006b1d4>] handle_irq_event_percpu+0x54/0x18c
                    [<8006b350>] handle_irq_event+0x44/0x64
                    [<8006e50c>] handle_fasteoi_irq+0xa0/0x170
                    [<8006a8f0>] generic_handle_irq+0x30/0x44
                    [<8000f238>] handle_IRQ+0x54/0xbc
                    [<8000864c>] gic_handle_irq+0x30/0x64
                    [<80013024>] __irq_svc+0x44/0x5c
                    [<80614c58>] printk+0x38/0x40
                    [<804622a8>] sdhci_add_host+0x844/0xbcc
                    [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
                    [<8032ee88>] platform_drv_probe+0x20/0x50
                    [<8032d48c>] driver_probe_device+0x118/0x234
                    [<8032d690>] __driver_attach+0x9c/0xa0
                    [<8032b89c>] bus_for_each_dev+0x68/0x9c
                    [<8032cf44>] driver_attach+0x20/0x28
                    [<8032cbc8>] bus_add_driver+0x148/0x1f4
                    [<8032dce0>] driver_register+0x80/0x100
                    [<8032ee54>] __platform_driver_register+0x50/0x64
                    [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
                    [<80008980>] do_one_initcall+0x108/0x16c
                    [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                    [<80611c50>] kernel_init+0x10/0x120
                    [<8000e9c8>] ret_from_fork+0x14/0x2c
   IN-SOFTIRQ-W at:
                    [<8005f030>] mark_lock+0x140/0x6ac
                    [<80060204>] __lock_acquire+0x5d4/0x1cbc
                    [<800620d0>] lock_acquire+0x70/0x84
                    [<8061d40c>] _raw_spin_lock_irqsave+0x40/0x54
                    [<8045e4a4>] sdhci_tasklet_finish+0x1c/0x120
                    [<8002b538>] tasklet_action+0xa0/0x15c
                    [<8002b778>] __do_softirq+0x118/0x290
                    [<8002bcf4>] irq_exit+0xb4/0x10c
                    [<8000f240>] handle_IRQ+0x5c/0xbc
                    [<8000864c>] gic_handle_irq+0x30/0x64
                    [<80013024>] __irq_svc+0x44/0x5c
                    [<80614c58>] printk+0x38/0x40
                    [<804622a8>] sdhci_add_host+0x844/0xbcc
                    [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
                    [<8032ee88>] platform_drv_probe+0x20/0x50
                    [<8032d48c>] driver_probe_device+0x118/0x234
                    [<8032d690>] __driver_attach+0x9c/0xa0
                    [<8032b89c>] bus_for_each_dev+0x68/0x9c
                    [<8032cf44>] driver_attach+0x20/0x28
                    [<8032cbc8>] bus_add_driver+0x148/0x1f4
                    [<8032dce0>] driver_register+0x80/0x100
                    [<8032ee54>] __platform_driver_register+0x50/0x64
                    [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
                    [<80008980>] do_one_initcall+0x108/0x16c
                    [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                    [<80611c50>] kernel_init+0x10/0x120
                    [<8000e9c8>] ret_from_fork+0x14/0x2c
   INITIAL USE at:
                   [<8005f030>] mark_lock+0x140/0x6ac
                   [<8005ff0c>] __lock_acquire+0x2dc/0x1cbc
                   [<800620d0>] lock_acquire+0x70/0x84
                   [<8061d40c>] _raw_spin_lock_irqsave+0x40/0x54
                   [<804611f4>] sdhci_do_set_ios+0x20/0x720
                   [<80461924>] sdhci_set_ios+0x30/0x3c
                   [<8044cea0>] mmc_power_up+0x6c/0xd0
                   [<8044dac4>] mmc_start_host+0x60/0x70
                   [<8044eb3c>] mmc_add_host+0x60/0x88
                   [<8046225c>] sdhci_add_host+0x7f8/0xbcc
                   [<80464948>] sdhci_esdhc_imx_probe+0x378/0x67c
                   [<8032ee88>] platform_drv_probe+0x20/0x50
                   [<8032d48c>] driver_probe_device+0x118/0x234
                   [<8032d690>] __driver_attach+0x9c/0xa0
                   [<8032b89c>] bus_for_each_dev+0x68/0x9c
                   [<8032cf44>] driver_attach+0x20/0x28
                   [<8032cbc8>] bus_add_driver+0x148/0x1f4
                   [<8032dce0>] driver_register+0x80/0x100
                   [<8032ee54>] __platform_driver_register+0x50/0x64
                   [<8084b094>] sdhci_esdhc_imx_driver_init+0x18/0x20
                   [<80008980>] do_one_initcall+0x108/0x16c
                   [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                   [<80611c50>] kernel_init+0x10/0x120
                   [<8000e9c8>] ret_from_fork+0x14/0x2c
 }
 ... key      at: [<80e040e8>] __key.26952+0x0/0x8
 ... acquired at:
   [<8005eb60>] check_usage+0x3d0/0x5c0
   [<8005edac>] check_irq_usage+0x5c/0xb8
   [<80060d38>] __lock_acquire+0x1108/0x1cbc
   [<800620d0>] lock_acquire+0x70/0x84
   [<8061a210>] mutex_lock_nested+0x54/0x3c0
   [<80480b08>] clk_prepare_lock+0x44/0xe4
   [<8048188c>] clk_get_rate+0x14/0x64
   [<8046374c>] esdhc_pltfm_set_clock+0x20/0x2a4
   [<8045d70c>] sdhci_set_clock+0x4c/0x498
   [<80461518>] sdhci_do_set_ios+0x344/0x720
   [<80461924>] sdhci_set_ios+0x30/0x3c
   [<8044c390>] __mmc_set_clock+0x44/0x60
   [<8044cd4c>] mmc_set_clock+0x10/0x14
   [<8044f8f4>] mmc_init_card+0x1b4/0x1520
   [<80450f00>] mmc_attach_mmc+0xb4/0x194
   [<8044da08>] mmc_rescan+0x294/0x2f0
   [<8003db94>] process_one_work+0x1a4/0x468
   [<8003e850>] worker_thread+0x118/0x3e0
   [<80044de0>] kthread+0xd4/0xf0
   [<8000e9c8>] ret_from_fork+0x14/0x2c

the dependencies between the lock to be acquired and HARDIRQ-irq-unsafe lock:
-> (prepare_lock){+.+...} ops: 395 {
   HARDIRQ-ON-W at:
                    [<8005f030>] mark_lock+0x140/0x6ac
                    [<8005f604>] mark_held_locks+0x68/0x12c
                    [<8005f780>] trace_hardirqs_on_caller+0xb8/0x1d8
                    [<8005f8b4>] trace_hardirqs_on+0x14/0x18
                    [<8061a130>] mutex_trylock+0x180/0x20c
                    [<80480ad8>] clk_prepare_lock+0x14/0xe4
                    [<804816a4>] clk_notifier_register+0x28/0xf0
                    [<80015120>] twd_clk_init+0x50/0x68
                    [<80008980>] do_one_initcall+0x108/0x16c
                    [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                    [<80611c50>] kernel_init+0x10/0x120
                    [<8000e9c8>] ret_from_fork+0x14/0x2c
   SOFTIRQ-ON-W at:
                    [<8005f030>] mark_lock+0x140/0x6ac
                    [<8005f604>] mark_held_locks+0x68/0x12c
                    [<8005f7c8>] trace_hardirqs_on_caller+0x100/0x1d8
                    [<8005f8b4>] trace_hardirqs_on+0x14/0x18
                    [<8061a130>] mutex_trylock+0x180/0x20c
                    [<80480ad8>] clk_prepare_lock+0x14/0xe4
                    [<804816a4>] clk_notifier_register+0x28/0xf0
                    [<80015120>] twd_clk_init+0x50/0x68
                    [<80008980>] do_one_initcall+0x108/0x16c
                    [<8081cca4>] kernel_init_freeable+0x10c/0x1d0
                    [<80611c50>] kernel_init+0x10/0x120
                    [<8000e9c8>] ret_from_fork+0x14/0x2c
   INITIAL USE at:
                   [<8005f030>] mark_lock+0x140/0x6ac
                   [<8005ff0c>] __lock_acquire+0x2dc/0x1cbc
                   [<800620d0>] lock_acquire+0x70/0x84
                   [<8061a0c8>] mutex_trylock+0x118/0x20c
                   [<80480ad8>] clk_prepare_lock+0x14/0xe4
                   [<80482af8>] __clk_init+0x1c/0x45c
                   [<8048306c>] _clk_register+0xd0/0x170
                   [<80483148>] clk_register+0x3c/0x7c
                   [<80483b4c>] clk_register_fixed_rate+0x88/0xd8
                   [<80483c04>] of_fixed_clk_setup+0x68/0x94
                   [<8084c6fc>] of_clk_init+0x44/0x68
                   [<808202b0>] time_init+0x2c/0x38
                   [<8081ca14>] start_kernel+0x1e4/0x368
                   [<10008074>] 0x10008074
 }
 ... key      at: [<808afebc>] prepare_lock+0x38/0x48
 ... acquired at:
   [<8005eb94>] check_usage+0x404/0x5c0
   [<8005edac>] check_irq_usage+0x5c/0xb8
   [<80060d38>] __lock_acquire+0x1108/0x1cbc
   [<800620d0>] lock_acquire+0x70/0x84
   [<8061a210>] mutex_lock_nested+0x54/0x3c0
   [<80480b08>] clk_prepare_lock+0x44/0xe4
   [<8048188c>] clk_get_rate+0x14/0x64
   [<8046374c>] esdhc_pltfm_set_clock+0x20/0x2a4
   [<8045d70c>] sdhci_set_clock+0x4c/0x498
   [<80461518>] sdhci_do_set_ios+0x344/0x720
   [<80461924>] sdhci_set_ios+0x30/0x3c
   [<8044c390>] __mmc_set_clock+0x44/0x60
   [<8044cd4c>] mmc_set_clock+0x10/0x14
   [<8044f8f4>] mmc_init_card+0x1b4/0x1520
   [<80450f00>] mmc_attach_mmc+0xb4/0x194
   [<8044da08>] mmc_rescan+0x294/0x2f0
   [<8003db94>] process_one_work+0x1a4/0x468
   [<8003e850>] worker_thread+0x118/0x3e0
   [<80044de0>] kthread+0xd4/0xf0
   [<8000e9c8>] ret_from_fork+0x14/0x2c

stack backtrace:
CPU: 2 PID: 29 Comm: kworker/u8:1 Not tainted 3.13.0-rc1+ #285
Workqueue: kmmcd mmc_rescan
Backtrace:
[<80012160>] (dump_backtrace+0x0/0x10c) from [<80012438>] (show_stack+0x18/0x1c)
 r6:00000000 r5:00000000 r4:8088ecc8 r3:bfa11200
[<80012420>] (show_stack+0x0/0x1c) from [<80616b14>] (dump_stack+0x84/0x9c)
[<80616a90>] (dump_stack+0x0/0x9c) from [<8005ebb4>] (check_usage+0x424/0x5c0)
 r5:80979940 r4:bfa29b44
[<8005e790>] (check_usage+0x0/0x5c0) from [<8005edac>] (check_irq_usage+0x5c/0xb8)
[<8005ed50>] (check_irq_usage+0x0/0xb8) from [<80060d38>] (__lock_acquire+0x1108/0x1cbc)
 r8:bfa115e8 r7:80df9884 r6:80dafa9c r5:00000003 r4:bfa115d0
[<8005fc30>] (__lock_acquire+0x0/0x1cbc) from [<800620d0>] (lock_acquire+0x70/0x84)
[<80062060>] (lock_acquire+0x0/0x84) from [<8061a210>] (mutex_lock_nested+0x54/0x3c0)
 r7:bfa11200 r6:80dafa9c r5:00000000 r4:80480b08
[<8061a1bc>] (mutex_lock_nested+0x0/0x3c0) from [<80480b08>] (clk_prepare_lock+0x44/0xe4)
[<80480ac4>] (clk_prepare_lock+0x0/0xe4) from [<8048188c>] (clk_get_rate+0x14/0x64)
 r6:03197500 r5:bf0e9aa8 r4:bf827400 r3:808ae128
[<80481878>] (clk_get_rate+0x0/0x64) from [<8046374c>] (esdhc_pltfm_set_clock+0x20/0x2a4)
 r5:bf0e9aa8 r4:bf0e9c40
[<8046372c>] (esdhc_pltfm_set_clock+0x0/0x2a4) from [<8045d70c>] (sdhci_set_clock+0x4c/0x498)
[<8045d6c0>] (sdhci_set_clock+0x0/0x498) from [<80461518>] (sdhci_do_set_ios+0x344/0x720)
 r8:0000003b r7:20000113 r6:bf0e9d68 r5:bf0e9aa8 r4:bf0e9c40
r3:00000000
[<804611d4>] (sdhci_do_set_ios+0x0/0x720) from [<80461924>] (sdhci_set_ios+0x30/0x3c)
 r9:00000004 r8:bf131000 r7:bf131048 r6:00000000 r5:bf0e9aa8
r4:bf0e9800
[<804618f4>] (sdhci_set_ios+0x0/0x3c) from [<8044c390>] (__mmc_set_clock+0x44/0x60)
 r5:03197500 r4:bf0e9800
[<8044c34c>] (__mmc_set_clock+0x0/0x60) from [<8044cd4c>] (mmc_set_clock+0x10/0x14)
 r5:00000000 r4:bf0e9800
[<8044cd3c>] (mmc_set_clock+0x0/0x14) from [<8044f8f4>] (mmc_init_card+0x1b4/0x1520)
[<8044f740>] (mmc_init_card+0x0/0x1520) from [<80450f00>] (mmc_attach_mmc+0xb4/0x194)
[<80450e4c>] (mmc_attach_mmc+0x0/0x194) from [<8044da08>] (mmc_rescan+0x294/0x2f0)
 r5:8065f358 r4:bf0e9af8
[<8044d774>] (mmc_rescan+0x0/0x2f0) from [<8003db94>] (process_one_work+0x1a4/0x468)
 r8:00000000 r7:bfa29eb0 r6:bf80dc00 r5:bf0e9af8 r4:bf9e3f00
r3:8044d774
[<8003d9f0>] (process_one_work+0x0/0x468) from [<8003e850>] (worker_thread+0x118/0x3e0)
[<8003e738>] (worker_thread+0x0/0x3e0) from [<80044de0>] (kthread+0xd4/0xf0)
[<80044d0c>] (kthread+0x0/0xf0) from [<8000e9c8>] (ret_from_fork+0x14/0x2c)
 r7:00000000 r6:00000000 r5:80044d0c r4:bf9e7f00

Fixes: 0ddf03c mmc: esdhc-imx: parse max-frequency from devicetree
Signed-off-by: Dong Aisheng <[email protected]>
Acked-by: Shawn Guo <[email protected]>
Tested-by: Philippe De Muyter <[email protected]>
Cc: stable <[email protected]> # 3.13
Signed-off-by: Chris Ball <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Avoid circular mutex lock by pushing the dev->lock to the .fini
callback on each extension.

As em28xx-dvb, em28xx-alsa and em28xx-rc have their own data
structures, and don't touch at the common structure during .fini,
only em28xx-v4l needs to be locked.

[   90.994317] ======================================================
[   90.994356] [ INFO: possible circular locking dependency detected ]
[   90.994395] 3.13.0-rc1+ #24 Not tainted
[   90.994427] -------------------------------------------------------
[   90.994458] khubd/54 is trying to acquire lock:
[   90.994490]  (&card->controls_rwsem){++++.+}, at: [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.994656]
[   90.994656] but task is already holding lock:
[   90.994688]  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]
[   90.994843]
[   90.994843] which lock already depends on the new lock.
[   90.994843]
[   90.994874]
[   90.994874] the existing dependency chain (in reverse order) is:
[   90.994905]
-> #1 (&dev->lock){+.+.+.}:
[   90.995057]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995121]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995182]        [<ffffffff816a5b6c>] mutex_lock_nested+0x5c/0x3c0
[   90.995245]        [<ffffffffa0422cca>] em28xx_vol_put_mute+0x1ba/0x1d0 [em28xx_alsa]
[   90.995309]        [<ffffffffa017813d>] snd_ctl_elem_write+0xfd/0x140 [snd]
[   90.995376]        [<ffffffffa01791c2>] snd_ctl_ioctl+0xe2/0x810 [snd]
[   90.995442]        [<ffffffff811db8b0>] do_vfs_ioctl+0x300/0x520
[   90.995504]        [<ffffffff811dbb51>] SyS_ioctl+0x81/0xa0
[   90.995568]        [<ffffffff816b1929>] system_call_fastpath+0x16/0x1b
[   90.995630]
-> #0 (&card->controls_rwsem){++++.+}:
[   90.995780]        [<ffffffff810b7a47>] check_prevs_add+0x947/0x950
[   90.995841]        [<ffffffff810b8fa3>] __lock_acquire+0xb43/0x1330
[   90.995901]        [<ffffffff810b9f82>] lock_acquire+0xa2/0x120
[   90.995962]        [<ffffffff816a762b>] down_write+0x3b/0xa0
[   90.996022]        [<ffffffffa0177b08>] snd_ctl_dev_free+0x28/0x60 [snd]
[   90.996088]        [<ffffffffa017a255>] snd_device_free+0x65/0x140 [snd]
[   90.996154]        [<ffffffffa017a751>] snd_device_free_all+0x61/0xa0 [snd]
[   90.996219]        [<ffffffffa0173af4>] snd_card_do_free+0x14/0x130 [snd]
[   90.996283]        [<ffffffffa0173f14>] snd_card_free+0x84/0x90 [snd]
[   90.996349]        [<ffffffffa0423397>] em28xx_audio_fini+0x97/0xb0 [em28xx_alsa]
[   90.996411]        [<ffffffffa040dba6>] em28xx_close_extension+0x56/0x90 [em28xx]
[   90.996475]        [<ffffffffa040f639>] em28xx_usb_disconnect+0x79/0x90 [em28xx]
[   90.996539]        [<ffffffff814a06e7>] usb_unbind_interface+0x67/0x1d0
[   90.996620]        [<ffffffff8142920f>] __device_release_driver+0x7f/0xf0
[   90.996682]        [<ffffffff814292a5>] device_release_driver+0x25/0x40
[   90.996742]        [<ffffffff81428b0c>] bus_remove_device+0x11c/0x1a0
[   90.996801]        [<ffffffff81425536>] device_del+0x136/0x1d0
[   90.996863]        [<ffffffff8149e0c0>] usb_disable_device+0xb0/0x290
[   90.996923]        [<ffffffff814930c5>] usb_disconnect+0xb5/0x1d0
[   90.996984]        [<ffffffff81495ab6>] hub_port_connect_change+0xd6/0xad0
[   90.997044]        [<ffffffff814967c3>] hub_events+0x313/0x9b0
[   90.997105]        [<ffffffff81496e95>] hub_thread+0x35/0x170
[   90.997165]        [<ffffffff8108ea2f>] kthread+0xff/0x120
[   90.997226]        [<ffffffff816b187c>] ret_from_fork+0x7c/0xb0
[   90.997287]
[   90.997287] other info that might help us debug this:
[   90.997287]
[   90.997318]  Possible unsafe locking scenario:
[   90.997318]
[   90.997348]        CPU0                    CPU1
[   90.997378]        ----                    ----
[   90.997408]   lock(&dev->lock);
[   90.997497]                                lock(&card->controls_rwsem);
[   90.997607]                                lock(&dev->lock);
[   90.997697]   lock(&card->controls_rwsem);
[   90.997786]
[   90.997786]  *** DEADLOCK ***
[   90.997786]
[   90.997817] 5 locks held by khubd/54:
[   90.997847]  #0:  (&__lockdep_no_validate__){......}, at: [<ffffffff81496564>] hub_events+0xb4/0x9b0
[   90.998025]  #1:  (&__lockdep_no_validate__){......}, at: [<ffffffff81493076>] usb_disconnect+0x66/0x1d0
[   90.998204]  #2:  (&__lockdep_no_validate__){......}, at: [<ffffffff8142929d>] device_release_driver+0x1d/0x40
[   90.998383]  #3:  (em28xx_devlist_mutex){+.+.+.}, at: [<ffffffffa040db77>] em28xx_close_extension+0x27/0x90 [em28xx]
[   90.998567]  #4:  (&dev->lock){+.+.+.}, at: [<ffffffffa040db81>] em28xx_close_extension+0x31/0x90 [em28xx]

Reviewed-by: Frank Schäfer <[email protected]>
Tested-by: Antti Palosaari <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
Paul Durrant says:

====================
make skb_checksum_setup generally available

Both xen-netfront and xen-netback need to be able to set up the partial
checksum offset of an skb and may also need to recalculate the pseudo-
header checksum in the process. This functionality is currently private
and duplicated between the two drivers.

Patch #1 of this series moves the implementation into the core network code
as there is nothing xen-specific about it and it is potentially useful to
any network driver.
Patch #2 removes the private implementation from netback.
Patch #3 removes the private implementation from netfront.

v2:
- Put skb_checksum_setup in skbuff.c rather than dev.c
- remove inline
====================

Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
When the PR host is running on a POWER8 machine in POWER8 mode, it
will use doorbell interrupts for IPIs.  If one of them arrives while
we are in the guest, we pop out of the guest with trap number 0xA00,
which isn't handled by kvmppc_handle_exit_pr, leading to the following
BUG_ON:

[  331.436215] exit_nr=0xa00 | pc=0x1d2c | msr=0x800000000000d032
[  331.437522] ------------[ cut here ]------------
[  331.438296] kernel BUG at arch/powerpc/kvm/book3s_pr.c:982!
[  331.439063] Oops: Exception in kernel mode, sig: 5 [#2]
[  331.439819] SMP NR_CPUS=1024 NUMA pSeries
[  331.440552] Modules linked in: tun nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw virtio_net kvm binfmt_misc ibmvscsi scsi_transport_srp scsi_tgt virtio_blk
[  331.447614] CPU: 11 PID: 1296 Comm: qemu-system-ppc Tainted: G      D      3.11.7-200.2.fc19.ppc64p7 #1
[  331.448920] task: c0000003bdc8c000 ti: c0000003bd32c000 task.ti: c0000003bd32c000
[  331.450088] NIP: d0000000025d6b9c LR: d0000000025d6b98 CTR: c0000000004cfdd0
[  331.451042] REGS: c0000003bd32f420 TRAP: 0700   Tainted: G      D       (3.11.7-200.2.fc19.ppc64p7)
[  331.452331] MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>  CR: 28004824  XER: 20000000
[  331.454616] SOFTE: 1
[  331.455106] CFAR: c000000000848bb8
[  331.455726]
GPR00: d0000000025d6b98 c0000003bd32f6a0 d0000000026017b8 0000000000000032
GPR04: c0000000018627f8 c000000001873208 320d0a3030303030 3030303030643033
GPR08: c000000000c490a8 0000000000000000 0000000000000000 0000000000000002
GPR12: 0000000028004822 c00000000fdc6300 0000000000000000 00000100076ec310
GPR16: 000000002ae343b8 00003ffffd397398 0000000000000000 0000000000000000
GPR20: 00000100076f16f4 00000100076ebe60 0000000000000008 ffffffffffffffff
GPR24: 0000000000000000 0000008001041e60 0000000000000000 0000008001040ce8
GPR28: c0000003a2d80000 0000000000000a00 0000000000000001 c0000003a2681810
[  331.466504] NIP [d0000000025d6b9c] .kvmppc_handle_exit_pr+0x75c/0xa80 [kvm]
[  331.466999] LR [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm]
[  331.467517] Call Trace:
[  331.467909] [c0000003bd32f6a0] [d0000000025d6b98] .kvmppc_handle_exit_pr+0x758/0xa80 [kvm] (unreliable)
[  331.468553] [c0000003bd32f750] [d0000000025d98f0] kvm_start_lightweight+0xb4/0xc4 [kvm]
[  331.469189] [c0000003bd32f920] [d0000000025d7648] .kvmppc_vcpu_run_pr+0xd8/0x270 [kvm]
[  331.469838] [c0000003bd32f9c0] [d0000000025cf748] .kvmppc_vcpu_run+0xc8/0xf0 [kvm]
[  331.470790] [c0000003bd32fa50] [d0000000025cc19c] .kvm_arch_vcpu_ioctl_run+0x5c/0x1b0 [kvm]
[  331.471401] [c0000003bd32fae0] [d0000000025c4888] .kvm_vcpu_ioctl+0x478/0x730 [kvm]
[  331.472026] [c0000003bd32fc90] [c00000000026192c] .do_vfs_ioctl+0x4dc/0x7a0
[  331.472561] [c0000003bd32fd80] [c000000000261cc4] .SyS_ioctl+0xd4/0xf0
[  331.473095] [c0000003bd32fe30] [c000000000009ed8] syscall_exit+0x0/0x98
[  331.473633] Instruction dump:
[  331.473766] 4bfff9b4 2b9d0800 419efc18 60000000 60420000 3d220000 e8bf11a0 e8df12a8
[  331.474733] 7fa4eb78 e8698660 48015165 e8410028 <0fe00000> 813f00e4 3ba00000 39290001
[  331.475386] ---[ end trace 49fc47d994c1f8f2 ]---
[  331.479817]

This fixes the problem by making kvmppc_handle_exit_pr() recognize the
interrupt.  We also need to jump to the doorbell interrupt handler in
book3s_segment.S to handle the interrupt on the way out of the guest.
Having done that, there's nothing further to be done in
kvmppc_handle_exit_pr().

Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexander Graf <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
…t/mmarek/kbuild

Pull kbuild changes from Michal Marek:
 - fix make -s detection with make-4.0
 - fix for scripts/setlocalversion when the kernel repository is a
   submodule
 - do not hardcode ';' in macros that expand to assembler code, as some
   architectures' assemblers use a different character for newline
 - Fix passing --gdwarf-2 to the assembler

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  frv: Remove redundant debugging info flag
  mn10300: Remove redundant debugging info flag
  kbuild: Fix debugging info generation for .S files
  arch: use ASM_NL instead of ';' for assembler new line character in the macro
  kbuild: Fix silent builds with make-4
  Fix detectition of kernel git repository in setlocalversion script [take #2]
jthornber pushed a commit that referenced this issue Feb 17, 2014
… into fixes

mvebu fixes for v3.13 (incremental #2)

 - allow building and booting DT and non-DT plat-orion SoCs
 - catch proper return value for kirkwood_pm_init()
 - properly check return of of_iomap to solve boot hangs (mirabox, others)
 - remove a compile warning on Armada 370 with non-SMP.

* tag 'mvebu-fixes-3.13-2' of git://git.infradead.org/linux-mvebu:
  ARM: mvebu: fix compilation warning on Armada 370 (i.e. non-SMP)
  ARM: mvebu: Fix kernel hang in mvebu_soc_id_init() when of_iomap failed
  ARM: kirkwood: kirkwood_pm_init() should return void
  ARM: orion: provide C-style interrupt handler for MULTI_IRQ_HANDLER

Signed-off-by: Olof Johansson <[email protected]>
jthornber pushed a commit that referenced this issue Feb 17, 2014
sdata->u.ap.request_smps_work can’t be flushed synchronously
under wdev_lock(wdev) since ieee80211_request_smps_ap_work
itself locks the same lock.
While at it, reset the driver_smps_mode when the ap is
stopped to its default: OFF.

This solves:

======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0-ipeer+ #2 Tainted: G           O
-------------------------------------------------------
rmmod/2867 is trying to acquire lock:
  ((&sdata->u.ap.request_smps_work)){+.+...}, at: [<c105b8d0>] flush_work+0x0/0x90

but task is already holding lock:
  (&wdev->mtx){+.+.+.}, at: [<f9b32626>] cfg80211_stop_ap+0x26/0x230 [cfg80211]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&wdev->mtx){+.+.+.}:
        [<c10aefa9>] lock_acquire+0x79/0xe0
        [<c1607a1a>] mutex_lock_nested+0x4a/0x360
        [<fb06288b>] ieee80211_request_smps_ap_work+0x2b/0x50 [mac80211]
        [<c105cdd8>] process_one_work+0x198/0x450
        [<c105d469>] worker_thread+0xf9/0x320
        [<c10669ff>] kthread+0x9f/0xb0
        [<c1613397>] ret_from_kernel_thread+0x1b/0x28

-> #0 ((&sdata->u.ap.request_smps_work)){+.+...}:
        [<c10ae9df>] __lock_acquire+0x183f/0x1910
        [<c10aefa9>] lock_acquire+0x79/0xe0
        [<c105b917>] flush_work+0x47/0x90
        [<c105d867>] __cancel_work_timer+0x67/0xe0
        [<c105d90f>] cancel_work_sync+0xf/0x20
        [<fb0765cc>] ieee80211_stop_ap+0x8c/0x340 [mac80211]
        [<f9b3268c>] cfg80211_stop_ap+0x8c/0x230 [cfg80211]
        [<f9b0d8f9>] cfg80211_leave+0x79/0x100 [cfg80211]
        [<f9b0da72>] cfg80211_netdev_notifier_call+0xf2/0x4f0 [cfg80211]
        [<c160f2c9>] notifier_call_chain+0x59/0x130
        [<c106c6de>] __raw_notifier_call_chain+0x1e/0x30
        [<c106c70f>] raw_notifier_call_chain+0x1f/0x30
        [<c14f8213>] call_netdevice_notifiers_info+0x33/0x70
        [<c14f8263>] call_netdevice_notifiers+0x13/0x20
        [<c14f82a4>] __dev_close_many+0x34/0xb0
        [<c14f83fe>] dev_close_many+0x6e/0xc0
        [<c14f9c77>] rollback_registered_many+0xa7/0x1f0
        [<c14f9dd4>] unregister_netdevice_many+0x14/0x60
        [<fb06f4d9>] ieee80211_remove_interfaces+0xe9/0x170 [mac80211]
        [<fb055116>] ieee80211_unregister_hw+0x56/0x110 [mac80211]
        [<fa3e9396>] iwl_op_mode_mvm_stop+0x26/0xe0 [iwlmvm]
        [<f9b9d8ca>] _iwl_op_mode_stop+0x3a/0x70 [iwlwifi]
        [<f9b9d96f>] iwl_opmode_deregister+0x6f/0x90 [iwlwifi]
        [<fa405179>] __exit_compat+0xd/0x19 [iwlmvm]
        [<c10b8bf9>] SyS_delete_module+0x179/0x2b0
        [<c1613421>] sysenter_do_call+0x12/0x32

Fixes: 687da13 ("mac80211: implement SMPS for AP")
Cc: <[email protected]> [3.13]
Reported-by: Ilan Peer <[email protected]>
Signed-off-by: Emmanuel Grumbach <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
jthornber pushed a commit that referenced this issue Apr 3, 2014
sparc_cpu_model isn't in asm/system.h any more, so remove a comment
about it.

Signed-off-by: David Howells <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: [email protected]
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Apr 3, 2014
Pull sparc fixes from David Miller:
 "Three minor fixes from David Howells and Paul Gortmaker"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  Sparc: sparc_cpu_model isn't in asm/system.h any more [ver #2]
  sparc32: make copy_to/from_user_page() usable from modular code
  sparc32: fix build failure for arch_jump_label_transform
jthornber pushed a commit that referenced this issue Apr 3, 2014
vmxnet3's netpoll driver is incorrectly coded.  It directly calls
vmxnet3_do_poll, which is the driver internal napi poll routine.  As the netpoll
controller method doesn't block real napi polls in any way, there is a potential
for race conditions in which the netpoll controller method and the napi poll
method run concurrently.  The result is data corruption causing panics such as this
one recently observed:
PID: 1371   TASK: ffff88023762caa0  CPU: 1   COMMAND: "rs:main Q:Reg"
 #0 [ffff88023abd5780] machine_kexec at ffffffff81038f3b
 #1 [ffff88023abd57e0] crash_kexec at ffffffff810c5d92
 #2 [ffff88023abd58b0] oops_end at ffffffff8152b570
 #3 [ffff88023abd58e0] die at ffffffff81010e0b
 #4 [ffff88023abd5910] do_trap at ffffffff8152add4
 #5 [ffff88023abd5970] do_invalid_op at ffffffff8100cf95
 #6 [ffff88023abd5a10] invalid_op at ffffffff8100bf9b
    [exception RIP: vmxnet3_rq_rx_complete+1968]
    RIP: ffffffffa00f1e80  RSP: ffff88023abd5ac8  RFLAGS: 00010086
    RAX: 0000000000000000  RBX: ffff88023b5dcee0  RCX: 00000000000000c0
    RDX: 0000000000000000  RSI: 00000000000005f2  RDI: ffff88023b5dcee0
    RBP: ffff88023abd5b48   R8: 0000000000000000   R9: ffff88023a3b6048
    R10: 0000000000000000  R11: 0000000000000002  R12: ffff8802398d4cd8
    R13: ffff88023af35140  R14: ffff88023b60c890  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88023abd5b50] vmxnet3_do_poll at ffffffffa00f204a [vmxnet3]
 #8 [ffff88023abd5b80] vmxnet3_netpoll at ffffffffa00f209c [vmxnet3]
 #9 [ffff88023abd5ba0] netpoll_poll_dev at ffffffff81472bb7

The fix is to do as other drivers do, and have the poll controller call the top
half interrupt handler, which schedules a napi poll properly to recieve frames

Tested by myself, successfully.

Signed-off-by: Neil Horman <[email protected]>
CC: Shreyas Bhatewara <[email protected]>
CC: "VMware, Inc." <[email protected]>
CC: "David S. Miller" <[email protected]>
CC: [email protected]
Reviewed-by: Shreyas N Bhatewara <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Apr 3, 2014
There are two problematic situations.

A deadlock can happen when is_percpu is false because it can get
interrupted while holding the spinlock. Then it executes
ovs_flow_stats_update() in softirq context which tries to get
the same lock.

The second sitation is that when is_percpu is true, the code
correctly disables BH but only for the local CPU, so the
following can happen when locking the remote CPU without
disabling BH:

       CPU#0                            CPU#1
  ovs_flow_stats_get()
   stats_read()
 +->spin_lock remote CPU#1        ovs_flow_stats_get()
 |  <interrupted>                  stats_read()
 |  ...                       +-->  spin_lock remote CPU#0
 |                            |     <interrupted>
 |  ovs_flow_stats_update()   |     ...
 |   spin_lock local CPU#0 <--+     ovs_flow_stats_update()
 +---------------------------------- spin_lock local CPU#1

This patch disables BH for both cases fixing the deadlocks.
Acked-by: Jesse Gross <[email protected]>

=================================
[ INFO: inconsistent lock state ]
3.14.0-rc8-00007-g632b06a #1 Tainted: G          I
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
swapper/0/0 [HC0[0]:SC1[5]:HE1:SE0] takes:
(&(&cpu_stats->lock)->rlock){+.?...}, at: [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff810f973f>] __lock_acquire+0x68f/0x1c40
[<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
[<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
[<ffffffffa05dd9e4>] ovs_flow_stats_get+0xc4/0x1e0 [openvswitch]
[<ffffffffa05da855>] ovs_flow_cmd_fill_info+0x185/0x360 [openvswitch]
[<ffffffffa05daf05>] ovs_flow_cmd_build_info.constprop.27+0x55/0x90 [openvswitch]
[<ffffffffa05db41d>] ovs_flow_cmd_new_or_set+0x4dd/0x570 [openvswitch]
[<ffffffff816c245d>] genl_family_rcv_msg+0x1cd/0x3f0
[<ffffffff816c270e>] genl_rcv_msg+0x8e/0xd0
[<ffffffff816c0239>] netlink_rcv_skb+0xa9/0xc0
[<ffffffff816c0798>] genl_rcv+0x28/0x40
[<ffffffff816bf830>] netlink_unicast+0x100/0x1e0
[<ffffffff816bfc57>] netlink_sendmsg+0x347/0x770
[<ffffffff81668e9c>] sock_sendmsg+0x9c/0xe0
[<ffffffff816692d9>] ___sys_sendmsg+0x3a9/0x3c0
[<ffffffff8166a911>] __sys_sendmsg+0x51/0x90
[<ffffffff8166a962>] SyS_sendmsg+0x12/0x20
[<ffffffff817e3ce9>] system_call_fastpath+0x16/0x1b
irq event stamp: 1740726
hardirqs last  enabled at (1740726): [<ffffffff8175d5e0>] ip6_finish_output2+0x4f0/0x840
hardirqs last disabled at (1740725): [<ffffffff8175d59b>] ip6_finish_output2+0x4ab/0x840
softirqs last  enabled at (1740674): [<ffffffff8109be12>] _local_bh_enable+0x22/0x50
softirqs last disabled at (1740675): [<ffffffff8109db05>] irq_exit+0xc5/0xd0

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&cpu_stats->lock)->rlock);
  <Interrupt>
    lock(&(&cpu_stats->lock)->rlock);

 *** DEADLOCK ***

5 locks held by swapper/0/0:
 #0:  (((&ifa->dad_timer))){+.-...}, at: [<ffffffff810a7155>] call_timer_fn+0x5/0x320
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81788a55>] mld_sendpack+0x5/0x4a0
 #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8175d149>] ip6_finish_output2+0x59/0x840
 #3:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8168ba75>] __dev_queue_xmit+0x5/0x9b0
 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa05e41b5>] internal_dev_xmit+0x5/0x110 [openvswitch]

stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G          I  3.14.0-rc8-00007-g632b06a #1
Hardware name:                  /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
 0000000000000000 0fcf20709903df0c ffff88042d603808 ffffffff817cfe3c
 ffffffff81c134c0 ffff88042d603858 ffffffff817cb6da 0000000000000005
 ffffffff00000001 ffff880400000000 0000000000000006 ffffffff81c134c0
Call Trace:
 <IRQ>  [<ffffffff817cfe3c>] dump_stack+0x4d/0x66
 [<ffffffff817cb6da>] print_usage_bug+0x1f4/0x205
 [<ffffffff810f7f10>] ? check_usage_backwards+0x180/0x180
 [<ffffffff810f8963>] mark_lock+0x223/0x2b0
 [<ffffffff810f96d3>] __lock_acquire+0x623/0x1c40
 [<ffffffff810f5707>] ? __lock_is_held+0x57/0x80
 [<ffffffffa05e26c6>] ? masked_flow_lookup+0x236/0x250 [openvswitch]
 [<ffffffff810fb4e2>] lock_acquire+0xa2/0x1d0
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffff817d8d9e>] _raw_spin_lock+0x3e/0x80
 [<ffffffffa05dd8a1>] ? ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dd8a1>] ovs_flow_stats_update+0x51/0xd0 [openvswitch]
 [<ffffffffa05dcc64>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
 [<ffffffff810f93f7>] ? __lock_acquire+0x347/0x1c40
 [<ffffffffa05e3bea>] ovs_vport_receive+0x2a/0x30 [openvswitch]
 [<ffffffffa05e4218>] internal_dev_xmit+0x68/0x110 [openvswitch]
 [<ffffffffa05e41b5>] ? internal_dev_xmit+0x5/0x110 [openvswitch]
 [<ffffffff8168b4a6>] dev_hard_start_xmit+0x2e6/0x8b0
 [<ffffffff8168be87>] __dev_queue_xmit+0x417/0x9b0
 [<ffffffff8168ba75>] ? __dev_queue_xmit+0x5/0x9b0
 [<ffffffff8175d5e0>] ? ip6_finish_output2+0x4f0/0x840
 [<ffffffff8168c430>] dev_queue_xmit+0x10/0x20
 [<ffffffff8175d641>] ip6_finish_output2+0x551/0x840
 [<ffffffff8176128a>] ? ip6_finish_output+0x9a/0x220
 [<ffffffff8176128a>] ip6_finish_output+0x9a/0x220
 [<ffffffff8176145f>] ip6_output+0x4f/0x1f0
 [<ffffffff81788c29>] mld_sendpack+0x1d9/0x4a0
 [<ffffffff817895b8>] mld_send_initial_cr.part.32+0x88/0xa0
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8178e301>] ipv6_mc_dad_complete+0x31/0x50
 [<ffffffff817690d7>] addrconf_dad_completed+0x147/0x220
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff8176934f>] addrconf_dad_timer+0x19f/0x1c0
 [<ffffffff810a71e9>] call_timer_fn+0x99/0x320
 [<ffffffff810a7155>] ? call_timer_fn+0x5/0x320
 [<ffffffff817691b0>] ? addrconf_dad_completed+0x220/0x220
 [<ffffffff810a76c4>] run_timer_softirq+0x254/0x3b0
 [<ffffffff8109d47d>] __do_softirq+0x12d/0x480

Signed-off-by: Flavio Leitner <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
akiradeveloper referenced this issue Apr 9, 2014
…-format

[writeboost] fix: kfree(buf) places wrongly
jthornber added a commit that referenced this issue Apr 30, 2014
Commit c140e1c ("dm thin: use per thin device deferred bio lists")
introduced the use of an rculist for all active thin devices.  The use
of rcu_read_lock() in process_deferred_bios() can result in a BUG if a
dm_bio_prison_cell must be allocated as a side-effect of bio_detain():

 BUG: sleeping function called from invalid context at mm/mempool.c:203
 in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
 3 locks held by kworker/u8:0/6:
   #0:  ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
   #1:  ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
   #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0

We can't process deferred bios with the rcu lock held, since
dm_bio_prison_cell allocation may block if the bio-prison's cell mempool
is exhausted.

To fix:

- Introduce a refcount and completion field to each thin_c

- Add thin_get/put methods for adjusting the refcount.  If the refcount
  hits zero then the completion is triggered.

- Initialise refcount to 1 when creating thin_c

- When iterating the active_thins list we thin_get() whilst the rcu
  lock is held.

- After the rcu lock is dropped we process the deferred bios for that
  thin.

- When destroying a thin_c we thin_put() and then wait for the
  completion -- to avoid a race between the worker thread iterating
  from that thin_c and destroying the thin_c.

Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
jthornber pushed a commit that referenced this issue Jan 23, 2015
Redefine REALTEK_USB_DEVICE for the desired USB interface for probe().
There are three USB interfaces for the device. USB_CLASS_COMM and
USB_CLASS_CDC_DATA are for ECM mode (config #2). USB_CLASS_VENDOR_SPEC
is for the vendor mode (config #1). However, we are not interesting
in USB_CLASS_CDC_DATA for probe(), so redefine REALTEK_USB_DEVICE
to ignore the USB interface class of USB_CLASS_CDC_DATA.

Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Jan 23, 2015
…nux/kernel/git/aegl/linux

Pull pstore update #2 from Tony Luck:
 "Couple of pstore-ram enhancements to allow use of different memory
  attributes"

* tag 'please-pull-morepstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  pstore-ram: Allow optional mapping with pgprot_noncached
  pstore-ram: Fix hangs by using write-combine mappings
jthornber pushed a commit that referenced this issue Feb 23, 2015
When unloading the module 'g_hid.ko', the urb request will be dequeued and the
completion routine will be excuted. If there is no urb packet, the urb request
will not be added to the endpoint queue and the completion routine pointer in
urb request is NULL.

Accessing to this NULL function pointer will cause the Oops issue reported
below.

Add the code to check if the urb request is in the endpoint queue
or not. If the urb request is not in the endpoint queue, a negative
error code will be returned.

Here is the Oops log:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = dedf0000
[00000000] *pgd=3ede5831, *pte=00000000, *ppte=00000000
Internal error: Oops: 80000007 [#1] ARM
Modules linked in: g_hid(-) usb_f_hid libcomposite
CPU: 0 PID: 923 Comm: rmmod Not tainted 3.18.0+ #2
Hardware name: Atmel SAMA5 (Device Tree)
task: df6b1100 ti: dedf6000 task.ti: dedf6000
PC is at 0x0
LR is at usb_gadget_giveback_request+0xc/0x10
pc : [<00000000>]    lr : [<c02ace88>]    psr: 60000093
sp : dedf7eb0  ip : df572634  fp : 00000000
r10: 00000000  r9 : df52e210  r8 : 60000013
r7 : df6a9858  r6 : df52e210  r5 : df6a9858  r4 : df572600
r3 : 00000000  r2 : ffffff98  r1 : df572600  r0 : df6a9868
Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 3edf0059  DAC: 00000015
Process rmmod (pid: 923, stack limit = 0xdedf6230)
Stack: (0xdedf7eb0 to 0xdedf8000)
7ea0:                                     00000000 c02adbbc df572580 deced608
7ec0: df572600 df6a9868 df572634 c02aed3c df577c00 c01b8608 00000000 df6be27c
7ee0: 00200200 00100100 bf0162f4 c000e544 dedf6000 00000000 00000000 bf010c00
7f00: bf0162cc bf00159c 00000000 df572980 df52e218 00000001 df5729b8 bf0031d0
[..]
[<c02ace88>] (usb_gadget_giveback_request) from [<c02adbbc>] (request_complete+0x64/0x88)
[<c02adbbc>] (request_complete) from [<c02aed3c>] (usba_ep_dequeue+0x70/0x128)
[<c02aed3c>] (usba_ep_dequeue) from [<bf010c00>] (hidg_unbind+0x50/0x7c [usb_f_hid])
[<bf010c00>] (hidg_unbind [usb_f_hid]) from [<bf00159c>] (remove_config.isra.6+0x98/0x9c [libcomposite])
[<bf00159c>] (remove_config.isra.6 [libcomposite]) from [<bf0031d0>] (__composite_unbind+0x34/0x98 [libcomposite])
[<bf0031d0>] (__composite_unbind [libcomposite]) from [<c02acee0>] (usb_gadget_remove_driver+0x50/0x78)
[<c02acee0>] (usb_gadget_remove_driver) from [<c02ad570>] (usb_gadget_unregister_driver+0x64/0x94)
[<c02ad570>] (usb_gadget_unregister_driver) from [<bf0160c0>] (hidg_cleanup+0x10/0x34 [g_hid])
[<bf0160c0>] (hidg_cleanup [g_hid]) from [<c0056748>] (SyS_delete_module+0x118/0x19c)
[<c0056748>] (SyS_delete_module) from [<c000e3c0>] (ret_fast_syscall+0x0/0x30)
Code: bad PC value

Signed-off-by: Songjun Wu <[email protected]>
[[email protected]: reworked the commit message]
Signed-off-by: Nicolas Ferre <[email protected]>
Fixes: 914a3f3 ("USB: add atmel_usba_udc driver")
Cc: <[email protected]> # 2.6.x-ish
Signed-off-by: Felipe Balbi <[email protected]>
jthornber pushed a commit that referenced this issue Feb 23, 2015
This patch is to fix two deadlock cases.
Deadlock 1:
CPU #1
 pinctrl_register-> pinctrl_get ->
 create_pinctrl
 (Holding lock pinctrl_maps_mutex)
 -> get_pinctrl_dev_from_devname
 (Trying to acquire lock pinctrldev_list_mutex)
CPU #0
 pinctrl_unregister
 (Holding lock pinctrldev_list_mutex)
 -> pinctrl_put ->> pinctrl_free ->
 pinctrl_dt_free_maps -> pinctrl_unregister_map
 (Trying to acquire lock pinctrl_maps_mutex)

Simply to say
CPU#1 is holding lock A and trying to acquire lock B,
CPU#0 is holding lock B and trying to acquire lock A.

Deadlock 2:
CPU #3
 pinctrl_register-> pinctrl_get ->
 create_pinctrl
 (Holding lock pinctrl_maps_mutex)
 -> get_pinctrl_dev_from_devname
 (Trying to acquire lock pinctrldev_list_mutex)
CPU #2
 pinctrl_unregister
 (Holding lock pctldev->mutex)
 -> pinctrl_put ->> pinctrl_free ->
 pinctrl_dt_free_maps -> pinctrl_unregister_map
 (Trying to acquire lock pinctrl_maps_mutex)
CPU #0
 tegra_gpio_request
 (Holding lock pinctrldev_list_mutex)
 -> pinctrl_get_device_gpio_range
 (Trying to acquire lock pctldev->mutex)

Simply to say
CPU#3 is holding lock A and trying to acquire lock D,
CPU#2 is holding lock B and trying to acquire lock A,
CPU#0 is holding lock D and trying to acquire lock B.

Cc: Stable <[email protected]>
Signed-off-by: Jim Lin <[email protected]>
Signed-off-by: Linus Walleij <[email protected]>
jthornber pushed a commit that referenced this issue Feb 23, 2015
It is possible for ata_sff_flush_pio_task() to set ap->hsm_task_state to
HSM_ST_IDLE in between the time __ata_sff_port_intr() checks for HSM_ST_IDLE
and before it calls ata_sff_hsm_move() causing ata_sff_hsm_move() to BUG().

This problem is hard to reproduce making this patch hard to verify, but this
fix will prevent the race.

I have not been able to reproduce the problem, but here is a crash dump from
a 2.6.32 kernel.

On examining the ata port's state, its hsm_task_state field has a value of HSM_ST_IDLE:

crash> struct ata_port.hsm_task_state ffff881c1121c000
  hsm_task_state = 0

Normally, this should not be possible as ata_sff_hsm_move() was called from ata_sff_host_intr(),
which checks hsm_task_state and won't call ata_sff_hsm_move() if it has a HSM_ST_IDLE value.

PID: 11053  TASK: ffff8816e846cae0  CPU: 0   COMMAND: "sshd"
 #0 [ffff88008ba03960] machine_kexec at ffffffff81038f3b
 #1 [ffff88008ba039c0] crash_kexec at ffffffff810c5d92
 #2 [ffff88008ba03a90] oops_end at ffffffff8152b510
 #3 [ffff88008ba03ac0] die at ffffffff81010e0b
 #4 [ffff88008ba03af0] do_trap at ffffffff8152ad74
 #5 [ffff88008ba03b50] do_invalid_op at ffffffff8100cf95
 #6 [ffff88008ba03bf0] invalid_op at ffffffff8100bf9b
    [exception RIP: ata_sff_hsm_move+317]
    RIP: ffffffff813a77ad  RSP: ffff88008ba03ca0  RFLAGS: 00010097
    RAX: 0000000000000000  RBX: ffff881c1121dc60  RCX: 0000000000000000
    RDX: ffff881c1121dd10  RSI: ffff881c1121dc60  RDI: ffff881c1121c000
    RBP: ffff88008ba03d00   R8: 0000000000000000   R9: 000000000000002e
    R10: 000000000001003f  R11: 000000000000009b  R12: ffff881c1121c000
    R13: 0000000000000000  R14: 0000000000000050  R15: ffff881c1121dd78
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88008ba03d08] ata_sff_host_intr at ffffffff813a7fbd
 #8 [ffff88008ba03d38] ata_sff_interrupt at ffffffff813a821e
 #9 [ffff88008ba03d78] handle_IRQ_event at ffffffff810e6ec0
--- <IRQ stack> ---
    [exception RIP: pipe_poll+48]
    RIP: ffffffff81192780  RSP: ffff880f26d459b8  RFLAGS: 00000246
    RAX: 0000000000000000  RBX: ffff880f26d459c8  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: 0000000000000000  RDI: ffff881a0539fa80
    RBP: ffffffff8100bb8e   R8: ffff8803b23324a0   R9: 0000000000000000
    R10: ffff880f26d45dd0  R11: 0000000000000008  R12: ffffffff8109b646
    R13: ffff880f26d45948  R14: 0000000000000246  R15: 0000000000000246
    ORIG_RAX: ffffffffffffff10  CS: 0010  SS: 0018
    RIP: 00007f26017435c3  RSP: 00007fffe020c420  RFLAGS: 00000206
    RAX: 0000000000000017  RBX: ffffffff8100b072  RCX: 00007fffe020c45c
    RDX: 00007f2604a3f120  RSI: 00007f2604a3f140  RDI: 000000000000000d
    RBP: 0000000000000000   R8: 00007fffe020e570   R9: 0101010101010101
    R10: 0000000000000000  R11: 0000000000000246  R12: 00007fffe020e5f0
    R13: 00007fffe020e5f4  R14: 00007f26045f373c  R15: 00007fffe020e5e0
    ORIG_RAX: 0000000000000017  CS: 0033  SS: 002b

Somewhere between the ata_sff_hsm_move() check and the ata_sff_host_intr() check, the value changed.
On examining the other cpus to see what else was running, another cpu was running the error handler
routines:

PID: 326    TASK: ffff881c11014aa0  CPU: 1   COMMAND: "scsi_eh_1"
 #0 [ffff88008ba27e90] crash_nmi_callback at ffffffff8102fee6
 #1 [ffff88008ba27ea0] notifier_call_chain at ffffffff8152d515
 #2 [ffff88008ba27ee0] atomic_notifier_call_chain at ffffffff8152d57a
 #3 [ffff88008ba27ef0] notify_die at ffffffff810a154e
 #4 [ffff88008ba27f20] do_nmi at ffffffff8152b1db
 #5 [ffff88008ba27f50] nmi at ffffffff8152aaa0
    [exception RIP: _spin_lock_irqsave+47]
    RIP: ffffffff8152a1ff  RSP: ffff881c11a73aa0  RFLAGS: 00000006
    RAX: 0000000000000001  RBX: ffff881c1121deb8  RCX: 0000000000000000
    RDX: 0000000000000246  RSI: 0000000000000020  RDI: ffff881c122612d8
    RBP: ffff881c11a73aa0   R8: ffff881c17083800   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: ffff881c1121c000
    R13: 000000000000001f  R14: ffff881c1121dd50  R15: ffff881c1121dc60
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #6 [ffff881c11a73aa0] _spin_lock_irqsave at ffffffff8152a1ff
 #7 [ffff881c11a73aa8] ata_exec_internal_sg at ffffffff81396fb5
 #8 [ffff881c11a73b58] ata_exec_internal at ffffffff81397109
 #9 [ffff881c11a73bd8] atapi_eh_request_sense at ffffffff813a34eb

Before it tried to acquire a spinlock, ata_exec_internal_sg() called ata_sff_flush_pio_task().
This function will set ap->hsm_task_state to HSM_ST_IDLE, and has no locking around setting this
value. ata_sff_flush_pio_task() can then race with the interrupt handler and potentially set
HSM_ST_IDLE at a fatal moment, which will trigger a kernel BUG.

v2: Fixup comment in ata_sff_flush_pio_task()

tj: Further updated comment.  Use ap->lock instead of shost lock and
    use the [un]lock_irq variant instead of the irqsave/restore one.

Signed-off-by: David Milburn <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
jthornber pushed a commit that referenced this issue Feb 23, 2015
resp_rsup_opcodes() may get called from atomic context and would need to
use GFP_ATOMIC for allocations:

[ 1237.913419] BUG: sleeping function called from invalid context at mm/slub.c:1262
[ 1237.914865] in_atomic(): 1, irqs_disabled(): 0, pid: 7556, name: trinity-c311
[ 1237.916142] 3 locks held by trinity-c311/7556:
[ 1237.916981] #0: (sb_writers#5){.+.+.+}, at: do_readv_writev (include/linux/fs.h:2346 fs/read_write.c:844)
[ 1237.919713] #1: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:297)
[ 1237.922626] Mutex: counter: -1 owner: trinity-c311
[ 1237.924044] #2: (s_active#51){.+.+.+}, at: kernfs_fop_write (fs/kernfs/file.c:297)
[ 1237.925960] Preemption disabled blk_execute_rq_nowait (block/blk-exec.c:95)
[ 1237.927416]
[ 1237.927680] CPU: 24 PID: 7556 Comm: trinity-c311 Not tainted 3.19.0-rc4-next-20150116-sasha-00054-g4ad498c-dirty #1744
[ 1237.929603]  ffff8804fc9d8000 ffff8804d9bc3548 ffffffff9d439fb2 0000000000000000
[ 1237.931097]  0000000000000000 ffff8804d9bc3588 ffffffff9a18389a ffff8804d9bc3598
[ 1237.932466]  ffffffff9a1b1715 ffffffffa15935d8 ffffffff9e6f8cb1 00000000000004ee
[ 1237.933984] Call Trace:
[ 1237.934434] dump_stack (lib/dump_stack.c:52)
[ 1237.935323] ___might_sleep (kernel/sched/core.c:7339)
[ 1237.936259] ? mark_held_locks (kernel/locking/lockdep.c:2549)
[ 1237.937293] __might_sleep (kernel/sched/core.c:7305)
[ 1237.938272] __kmalloc (mm/slub.c:1262 mm/slub.c:2419 mm/slub.c:2491 mm/slub.c:3291)
[ 1237.939137] ? resp_rsup_opcodes (include/linux/slab.h:435 drivers/scsi/scsi_debug.c:1689)
[ 1237.940173] resp_rsup_opcodes (include/linux/slab.h:435 drivers/scsi/scsi_debug.c:1689)
[ 1237.941211] ? add_host_store (drivers/scsi/scsi_debug.c:1584)
[ 1237.942261] scsi_debug_queuecommand (drivers/scsi/scsi_debug.c:5276)
[ 1237.943404] ? blk_rq_map_sg (block/blk-merge.c:254)
[ 1237.944398] ? scsi_init_sgtable (drivers/scsi/scsi_lib.c:1095)
[ 1237.945402] sdebug_queuecommand_lock_or_not (drivers/scsi/scsi_debug.c:5300)
[ 1237.946735] scsi_dispatch_cmd (drivers/scsi/scsi_lib.c:1706)
[ 1237.947720] scsi_queue_rq (drivers/scsi/scsi_lib.c:1996)
[ 1237.948687] __blk_mq_run_hw_queue (block/blk-mq.c:816)
[ 1237.949796] blk_mq_run_hw_queue (block/blk-mq.c:896)
[ 1237.950903] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:154 kernel/locking/spinlock.c:183)
[ 1237.951862] blk_mq_insert_request (block/blk-mq.c:1037)
[ 1237.952876] blk_execute_rq_nowait (block/blk-exec.c:95)
[ 1237.953981] ? lockdep_init_map (kernel/locking/lockdep.c:3034)
[ 1237.954967] blk_execute_rq (block/blk-exec.c:131)
[ 1237.955929] ? blk_rq_bio_prep (block/blk-core.c:2835)
[ 1237.956913] scsi_execute (drivers/scsi/scsi_lib.c:252)
[ 1237.957821] scsi_execute_req_flags (drivers/scsi/scsi_lib.c:281)
[ 1237.958968] scsi_report_opcode (drivers/scsi/scsi.c:956)
[ 1237.960009] sd_revalidate_disk (drivers/scsi/sd.c:2707 drivers/scsi/sd.c:2792)
[ 1237.961139] revalidate_disk (fs/block_dev.c:1081)
[ 1237.962223] sd_rescan (drivers/scsi/sd.c:1532)
[ 1237.963142] scsi_rescan_device (drivers/scsi/scsi_scan.c:1579)
[ 1237.964165] store_rescan_field (drivers/scsi/scsi_sysfs.c:672)
[ 1237.965254] dev_attr_store (drivers/base/core.c:138)
[ 1237.966319] sysfs_kf_write (fs/sysfs/file.c:131)
[ 1237.967289] kernfs_fop_write (fs/kernfs/file.c:311)
[ 1237.968274] do_readv_writev (fs/read_write.c:722 fs/read_write.c:854)
[ 1237.969295] ? __acct_update_integrals (kernel/tsacct.c:145)
[ 1237.970452] ? kernfs_fop_open (fs/kernfs/file.c:271)
[ 1237.971505] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:154 kernel/locking/spinlock.c:183)
[ 1237.972512] ? context_tracking_user_exit (include/linux/vtime.h:89 include/linux/jump_label.h:114 include/trace/events/context_tracking.h:47 kernel/context_tracking.c:140)
[ 1237.973668] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2578 kernel/locking/lockdep.c:2625)
[ 1237.974882] ? trace_hardirqs_on (kernel/locking/lockdep.c:2633)
[ 1237.975850] vfs_writev (fs/read_write.c:893)
[ 1237.976691] SyS_writev (fs/read_write.c:926 fs/read_write.c:917)
[ 1237.977538] system_call_fastpath (arch/x86/kernel/entry_64.S:423)

Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Douglas Gilbert <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
jthornber pushed a commit that referenced this issue Feb 23, 2015
Commit e61734c ("cgroup: remove cgroup->name") added two extra
newlines to memcg oom kill log messages.  This makes dmesg hard to read
and parse.  The issue affects 3.15+.

Example:

  Task in /t                          <<< extra #1
   killed as a result of limit of /t
                                      <<< extra #2
  memory: usage 102400kB, limit 102400kB, failcnt 274712

Remove the extra newlines from memcg oom kill messages, so the messages
look like:

  Task in /t killed as a result of limit of /t
  memory: usage 102400kB, limit 102400kB, failcnt 240649

Fixes: e61734c ("cgroup: remove cgroup->name")
Signed-off-by: Greg Thelen <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
jthornber pushed a commit that referenced this issue Feb 23, 2015
Ben Hutchings says:

====================
Fixes for sh_eth #2

I'm continuing review and testing of Ethernet support on the R-Car H2
chip.  This series fixes more of the issues I've found, but it won't be
the last set.

These are not tested on any of the other supported chips.
====================

Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Feb 23, 2015
Nicolas Dichtel says:

====================
netns: audit netdevice creation with IFLA_NET_NS_[PID|FD]

When one of these attributes is set, the netdevice is created into the netns
pointed by IFLA_NET_NS_[PID|FD] (see the call to rtnl_create_link() in
rtnl_newlink()). Let's call this netns the dest_net. After this creation, if the
newlink handler exists, it is called with a netns argument that points to the
netns where the netlink message has been received (called src_net in the code)
which is the link netns.
Hence, with one of these attributes, it's possible to create a x-netns
netdevice.

Here is the result of my code review:
- all ip tunnels (sit, ipip, ip6_tunnels, gre[tap][v6], ip_vti[6]) does not
  really allows to use this feature: the netdevice is created in the dest_net
  and the src_net is completely ignored in the newlink handler.
- VLAN properly handles this x-netns creation.
- bridge ignores src_net, which seems fine (NETIF_F_NETNS_LOCAL is set).
- CAIF subsystem is not clear for me (I don't know how it works), but it seems
  to wrongly use src_net. Patch #1 tries to fix this, but it was done only by
  code review (and only compile-tested), so please carefully review it. I may
  miss something.
- HSR subsystem uses src_net to parse IFLA_HSR_SLAVE[1|2], but the netdevice has
  the flag NETIF_F_NETNS_LOCAL, so the question is: does this netdevice really
  supports x-netns? If not, the newlink handler should use the dest_net instead
  of src_net, I can provide the patch.
- ieee802154 uses also src_net and does not have NETIF_F_NETNS_LOCAL. Same
  question: does this netdevice really supports x-netns?
- bonding ignores src_net and flag NETIF_F_NETNS_LOCAL is set, ie x-netns is not
  supported. Fine.
- CAN does not support rtnl/newlink, ok.
- ipvlan uses src_net and does not have NETIF_F_NETNS_LOCAL. After looking at
  the code, it seems that this drivers support x-netns. Am I right?
- macvlan/macvtap uses src_net and seems to have x-netns support.
- team ignores src_net and has the flag NETIF_F_NETNS_LOCAL, ie x-netns is not
  supported. Ok.
- veth uses src_net and have x-netns support ;-) Ok.
- VXLAN didn't properly handle this. The link netns (vxlan->net) is the src_net
  and not dest_net (see patch #2). Note that it was already possible to create a
  x-netns vxlan before the commit f01ec1c ("vxlan: add x-netns support")
  but the nedevice remains broken.

To summarize:
 - CAIF patch must be carefully reviewed
 - for HSR, ieee802154, ipvlan: is x-netns supported?
====================

Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Jun 10, 2015
This fixes a regression introduced in commit 25fedfc, "KVM: PPC:
Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
leads to a user-triggerable oops.

In the case where we try to run a vcore on a physical core that is
not in single-threaded mode, or the vcore has too many threads for
the physical core, we iterate the list of runnable vcpus to make
each one return an EBUSY error to userspace.  Since this involves
taking each vcpu off the runnable_threads list for the vcore, we
need to use list_for_each_entry_safe rather than list_for_each_entry
to traverse the list.  Otherwise the kernel will crash with an oops
message like this:

Unable to handle kernel paging request for data at address 0x000fff88
Faulting instruction address: 0xd00000001e635dc8
Oops: Kernel access of bad area, sig: 11 [#2]
SMP NR_CPUS=1024 NUMA PowerNV
...
CPU: 48 PID: 91256 Comm: qemu-system-ppc Tainted: G      D        3.18.0 #1
task: c00000274e507500 ti: c0000027d1924000 task.ti: c0000027d1924000
NIP: d00000001e635dc8 LR: d00000001e635df8 CTR: c00000000011ba50
REGS: c0000027d19275b0 TRAP: 0300   Tainted: G      D         (3.18.0)
MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 22002824  XER: 00000000
CFAR: c000000000008468 DAR: 00000000000fff88 DSISR: 40000000 SOFTE: 1
GPR00: d00000001e635df8 c0000027d1927830 d00000001e64c850 0000000000000001
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
GPR08: 0000000000200200 0000000000000000 0000000000000000 d00000001e63e588
GPR12: 0000000000002200 c000000007dbc800 c000000fc7800000 000000000000000a
GPR16: fffffffffffffffc c000000fd5439690 c000000fc7801c98 0000000000000001
GPR20: 0000000000000003 c0000027d1927aa8 c000000fd543b348 c000000fd543b350
GPR24: 0000000000000000 c000000fa57f0000 0000000000000030 0000000000000000
GPR28: fffffffffffffff0 c000000fd543b328 00000000000fe468 c000000fd543b300
NIP [d00000001e635dc8] kvmppc_run_core+0x198/0x17c0 [kvm_hv]
LR [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv]
Call Trace:
[c0000027d1927830] [d00000001e635df8] kvmppc_run_core+0x1c8/0x17c0 [kvm_hv] (unreliable)
[c0000027d1927a30] [d00000001e638350] kvmppc_vcpu_run_hv+0x5b0/0xdd0 [kvm_hv]
[c0000027d1927b70] [d00000001e510504] kvmppc_vcpu_run+0x44/0x60 [kvm]
[c0000027d1927ba0] [d00000001e50d4a4] kvm_arch_vcpu_ioctl_run+0x64/0x170 [kvm]
[c0000027d1927be0] [d00000001e504be8] kvm_vcpu_ioctl+0x5e8/0x7a0 [kvm]
[c0000027d1927d40] [c0000000002d6720] do_vfs_ioctl+0x490/0x780
[c0000027d1927de0] [c0000000002d6ae4] SyS_ioctl+0xd4/0xf0
[c0000027d1927e30] [c000000000009358] syscall_exit+0x0/0x98
Instruction dump:
60000000 60420000 387e1b30 38800003 38a00001 38c00000 480087d9 e8410018
ebde1c98 7fbdf040 3bdee368 419e0048 <813e1b20> 939e1b18 2f890001 409effcc
---[ end trace 8cdf50251cca6680 ]---

Fixes: 25fedfc
Signed-off-by: Paul Mackerras <[email protected]>
Reviewed-by: Alexander Graf <[email protected]>
Signed-off-by: Paolo Bonzini <[email protected]>
jthornber pushed a commit that referenced this issue Jun 10, 2015
Currently in snd_pcm_update_hw_ptr0 during interrupt,
we consider there were double acknowledged interrupts when:
1. HW reported pointer is smaller than expected, and
2. Time from last update time (hdelta) is over half a buffer time.

However, when HW reported pointer is only a few bytes smaller than
expected, and when hdelta is just a little larger than half a buffer time
(e.g. ping-pong buffer), it wrongly treats this IRQ as double acknowledged.

The condition #2 uses jiffies, but jiffies is not high resolution
since it is integer. We should consider jiffies inaccuracy.

Signed-off-by: Koro Chen <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
jthornber pushed a commit that referenced this issue Jun 11, 2015
Perf top raise a warning if a kernel sample is collected but kernel map
is restricted. The warning message needs to dereference al.map->dso...

However, previous perf_event__preprocess_sample() doesn't always
guarantee al.map != NULL, for example, when kernel map is restricted.

This patch validates al.map before dereferencing, avoid the segfault.

Before this patch:

 $ cat /proc/sys/kernel/kptr_restrict
 1
 $ perf top -p  120183
 perf: Segmentation fault
 -------- backtrace --------
 /path/to/perf[0x509868]
 /lib64/libc.so.6(+0x3545f)[0x7f9a1540045f]
 /path/to/perf[0x448820]
 /path/to/perf(cmd_top+0xe3c)[0x44a5dc]
 /path/to/perf[0x4766a2]
 /path/to/perf(main+0x5f5)[0x42e545]
 /lib64/libc.so.6(__libc_start_main+0xf4)[0x7f9a153ecbd4]
 /path/to/perf[0x42e674]

And gdb call trace:

 Program received signal SIGSEGV, Segmentation fault.
 perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0)
    at builtin-top.c:736
 736				  !RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION]) ?
 (gdb) bt
 #0  perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0)
     at builtin-top.c:736
 #1  perf_top__mmap_read_idx (top=top@entry=0x7fffffffa8a0, idx=idx@entry=0) at builtin-top.c:855
 #2  0x000000000044a5dd in perf_top__mmap_read (top=0x7fffffffa8a0) at builtin-top.c:872
 #3  __cmd_top (top=0x7fffffffa8a0) at builtin-top.c:997
 #4  cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-top.c:1267
 #5  0x00000000004766a3 in run_builtin (p=p@entry=0x8a6ce8 <commands+264>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdf70)
      at perf.c:371
 #6  0x000000000042e546 in handle_internal_command (argv=0x7fffffffdf70, argc=3) at perf.c:430
 #7  run_argv (argv=0x7fffffffdcf0, argcp=0x7fffffffdcfc) at perf.c:474
 #8  main (argc=3, argv=0x7fffffffdf70) at perf.c:589
 (gdb)

Signed-off-by: Wang Nan <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: David Ahern <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Zefan Li <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
jthornber pushed a commit that referenced this issue Jun 11, 2015
Commit 0223334 ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
before testing clone->q->mq_ops.  It was an oversight to discontinue
that check for 1 of the 2 use-cases for free_rq_clone():
1) free_rq_clone() called when an unmapped original request is requeued
2) free_rq_clone() called in the request-based IO completion path

The clone->q check made sense for case #1 but not for #2.  However, we
cannot just reinstate the check as it'd mask a serious bug in the IO
completion case #2 -- no in-flight request should have an uninitialized
request_queue (basic block layer refcounting _should_ ensure this).

The NULL pointer seen for case #1 is detailed here:
https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html

Fix this free_rq_clone() NULL pointer by simply checking if the
mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
uninitialized clone request is being completed.

Reported-by: Bart Van Assche <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
jthornber pushed a commit that referenced this issue Jun 11, 2015
…/stblinux into fixes

Merge "MAINTAINERS update for Broadcom SoCs for 4.1 #2" from Florian Fainelli:

This pull request contains 3 changes to the MAINTAINERS file for Broadcom SoCs:

- add Ray and Scott for mach-bcm
- remove Christian for mach-bcm
- remove Marc for brcmstb

* tag 'arm-soc/for-4.1/maintainers' of http://github.com/broadcom/stblinux:
  MAINTAINERS: Update brcmstb entry
  MAINTAINERS: Remove Christian Daudt for mach-bcm
  MAINTAINERS: Update mach-bcm maintainers list
jthornber pushed a commit that referenced this issue Aug 31, 2015
In dev_queue_xmit() net_cls protected with rcu-bh.

[  270.730026] ===============================
[  270.730029] [ INFO: suspicious RCU usage. ]
[  270.730033] 4.2.0-rc3+ #2 Not tainted
[  270.730036] -------------------------------
[  270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage!
[  270.730041] other info that might help us debug this:
[  270.730043] rcu_scheduler_active = 1, debug_locks = 1
[  270.730045] 2 locks held by dhclient/748:
[  270.730046]  #0:  (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960
[  270.730085]  #1:  (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960
[  270.730090] stack backtrace:
[  270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2
[  270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[  270.730100]  0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007
[  270.730103]  ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00
[  270.730105]  ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8
[  270.730108] Call Trace:
[  270.730121]  [<ffffffff817ad487>] dump_stack+0x4c/0x65
[  270.730148]  [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120
[  270.730153]  [<ffffffff816a62d2>] task_cls_state+0x92/0xa0
[  270.730158]  [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup]
[  270.730164]  [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0
[  270.730166]  [<ffffffff816ab573>] tc_classify+0x33/0x90
[  270.730170]  [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb]
[  270.730172]  [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960
[  270.730174]  [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960
[  270.730176]  [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20
[  270.730185]  [<ffffffff81787770>] dev_queue_xmit+0x10/0x20
[  270.730187]  [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760
[  270.730190]  [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0
[  270.730203]  [<ffffffff81665245>] ? sock_def_readable+0x5/0x190
[  270.730210]  [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40
[  270.730216]  [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640
[  270.730219]  [<ffffffff8165f367>] sock_sendmsg+0x47/0x50
[  270.730221]  [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0
[  270.730232]  [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0
[  270.730234]  [<ffffffff811fe5b8>] vfs_write+0xb8/0x190
[  270.730236]  [<ffffffff811fe8c2>] SyS_write+0x52/0xb0
[  270.730239]  [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76

Signed-off-by: Konstantin Khlebnikov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Aug 31, 2015
The cgroup attaches inode->i_wb via mark_inode_dirty and when set_page_writeback
is called, __inc_wb_stat() updates i_wb's stat.

So, we need to explicitly call set_page_dirty->__mark_inode_dirty in prior to
any writebacking pages.

This patch should resolve the following kernel panic reported by Andreas Reis.

https://bugzilla.kernel.org/show_bug.cgi?id=101801

--- Comment #2 from Andreas Reis <[email protected]> ---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
PGD 2951ff067 PUD 2df43f067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in:
CPU: 7 PID: 10356 Comm: gcc Tainted: G        W       4.2.0-1-cu #1
Hardware name: Gigabyte Technology Co., Ltd. G1.Sniper M5/G1.Sniper M5, BIOS
T01 02/03/2015
task: ffff880295044f80 ti: ffff880295140000 task.ti: ffff880295140000
RIP: 0010:[<ffffffff8149deea>]  [<ffffffff8149deea>]
__percpu_counter_add+0x1a/0x90
RSP: 0018:ffff880295143ac8  EFLAGS: 00010082
RAX: 0000000000000003 RBX: ffffea000a526d40 RCX: 0000000000000001
RDX: 0000000000000020 RSI: 0000000000000001 RDI: 0000000000000088
RBP: ffff880295143ae8 R08: 0000000000000000 R09: ffff88008f69bb30
R10: 00000000fffffffa R11: 0000000000000000 R12: 0000000000000088
R13: 0000000000000001 R14: ffff88041d099000 R15: ffff880084a205d0
FS:  00007f8549374700(0000) GS:ffff88042f3c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a8 CR3: 000000033e1d5000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 0000000000000000 ffffea000a526d40 ffff880084a20738 ffff880084a20750
 ffff880295143b48 ffffffff811cc91e ffff880000000000 0000000000000296
 0000000000000000 ffff880417090198 0000000000000000 ffffea000a526d40
Call Trace:
 [<ffffffff811cc91e>] __test_set_page_writeback+0xde/0x1d0
 [<ffffffff813fee87>] do_write_data_page+0xe7/0x3a0
 [<ffffffff813faeea>] gc_data_segment+0x5aa/0x640
 [<ffffffff813fb0b8>] do_garbage_collect+0x138/0x150
 [<ffffffff813fb3fe>] f2fs_gc+0x1be/0x3e0
 [<ffffffff81405541>] f2fs_balance_fs+0x81/0x90
 [<ffffffff813ee357>] f2fs_unlink+0x47/0x1d0
 [<ffffffff81239329>] vfs_unlink+0x109/0x1b0
 [<ffffffff8123e3d7>] do_unlinkat+0x287/0x2c0
 [<ffffffff8123ebc6>] SyS_unlink+0x16/0x20
 [<ffffffff81942e2e>] entry_SYSCALL_64_fastpath+0x12/0x71
Code: 41 5e 5d c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 49
89 f5 41 54 49 89 fc 53 48 83 ec 08 65 ff 05 e6 d9 b6 7e <48> 8b 47 20 48 63 ca
65 8b 18 48 63 db 48 01 f3 48 39 cb 7d 0a
RIP  [<ffffffff8149deea>] __percpu_counter_add+0x1a/0x90
 RSP <ffff880295143ac8>
CR2: 00000000000000a8
---[ end trace 5132449a58ed93a3 ]---
note: gcc[10356] exited with preempt_count 2

Signed-off-by: Jaegeuk Kim <[email protected]>
jthornber pushed a commit that referenced this issue Aug 31, 2015
Hayes Wang says:

====================
r8152: issues fix

v2:
Replace patch #2 with "r8152: fix wakeup settings".

v1:
These patches are used to fix issues.
====================

Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Aug 31, 2015
Hayes Wang says:

====================
r8152: device reset

v3:
For patch #2, remove cancel_delayed_work().

v2:
For patch #1, remove usb_autopm_get_interface(), usb_autopm_put_interface(), and
the checking of intf->condition.

For patch #2, replace the original method with usb_queue_reset_device() to reset
the device.

v1:
Although the driver works normally, we find the device may get all 0xff data when
transmitting packets on certain platforms. It would break the device and no packet
could be transmitted. The reset is necessary to recover the hw for this situation.
====================

Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Aug 31, 2015
Nikolay has reported a hang when a memcg reclaim got stuck with the
following backtrace:

PID: 18308  TASK: ffff883d7c9b0a30  CPU: 1   COMMAND: "rsync"
  #0 __schedule at ffffffff815ab152
  #1 schedule at ffffffff815ab76e
  #2 schedule_timeout at ffffffff815ae5e5
  #3 io_schedule_timeout at ffffffff815aad6a
  #4 bit_wait_io at ffffffff815abfc6
  #5 __wait_on_bit at ffffffff815abda5
  #6 wait_on_page_bit at ffffffff8111fd4f
  #7 shrink_page_list at ffffffff81135445
  #8 shrink_inactive_list at ffffffff81135845
  #9 shrink_lruvec at ffffffff81135ead
 #10 shrink_zone at ffffffff811360c3
 #11 shrink_zones at ffffffff81136eff
 #12 do_try_to_free_pages at ffffffff8113712f
 #13 try_to_free_mem_cgroup_pages at ffffffff811372be
 #14 try_charge at ffffffff81189423
 #15 mem_cgroup_try_charge at ffffffff8118c6f5
 #16 __add_to_page_cache_locked at ffffffff8112137d
 #17 add_to_page_cache_lru at ffffffff81121618
 #18 pagecache_get_page at ffffffff8112170b
 #19 grow_dev_page at ffffffff811c8297
 #20 __getblk_slow at ffffffff811c91d6
 #21 __getblk_gfp at ffffffff811c92c1
 #22 ext4_ext_grow_indepth at ffffffff8124565c
 #23 ext4_ext_create_new_leaf at ffffffff81246ca8
 #24 ext4_ext_insert_extent at ffffffff81246f09
 #25 ext4_ext_map_blocks at ffffffff8124a848
 #26 ext4_map_blocks at ffffffff8121a5b7
 #27 mpage_map_one_extent at ffffffff8121b1fa
 #28 mpage_map_and_submit_extent at ffffffff8121f07b
 #29 ext4_writepages at ffffffff8121f6d5
 #30 do_writepages at ffffffff8112c490
 #31 __filemap_fdatawrite_range at ffffffff81120199
 #32 filemap_flush at ffffffff8112041c
 #33 ext4_alloc_da_blocks at ffffffff81219da1
 #34 ext4_rename at ffffffff81229b91
 #35 ext4_rename2 at ffffffff81229e32
 #36 vfs_rename at ffffffff811a08a5
 #37 SYSC_renameat2 at ffffffff811a3ffc
 #38 sys_renameat2 at ffffffff811a408e
 #39 sys_rename at ffffffff8119e51e
 #40 system_call_fastpath at ffffffff815afa89

Dave Chinner has properly pointed out that this is a deadlock in the
reclaim code because ext4 doesn't submit pages which are marked by
PG_writeback right away.

The heuristic was introduced by commit e62e384 ("memcg: prevent OOM
with too many dirty pages") and it was applied only when may_enter_fs
was specified.  The code has been changed by c3b94f4 ("memcg:
further prevent OOM with too many dirty pages") which has removed the
__GFP_FS restriction with a reasoning that we do not get into the fs
code.  But this is not sufficient apparently because the fs doesn't
necessarily submit pages marked PG_writeback for IO right away.

ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
submit the bio.  Instead it tries to map more pages into the bio and
mpage_map_one_extent might trigger memcg charge which might end up
waiting on a page which is marked PG_writeback but hasn't been submitted
yet so we would end up waiting for something that never finishes.

Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
before we go to wait on the writeback.  The page fault path, which is
the only path that triggers memcg oom killer since 3.12, shouldn't
require GFP_NOFS and so we shouldn't reintroduce the premature OOM
killer issue which was originally addressed by the heuristic.

As per David Chinner the xfs is doing similar thing since 2.6.15 already
so ext4 is not the only affected filesystem.  Moreover he notes:

: For example: IO completion might require unwritten extent conversion
: which executes filesystem transactions and GFP_NOFS allocations. The
: writeback flag on the pages can not be cleared until unwritten
: extent conversion completes. Hence memory reclaim cannot wait on
: page writeback to complete in GFP_NOFS context because it is not
: safe to do so, memcg reclaim or otherwise.

Cc: [email protected] # 3.9+
[[email protected]: corrected the control flow]
Fixes: c3b94f4 ("memcg: further prevent OOM with too many dirty pages")
Reported-by: Nikolay Borisov <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
Signed-off-by: Hugh Dickins <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
jthornber pushed a commit that referenced this issue Aug 31, 2015
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments.  As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.

This was motivated by the following lockdep warning:

  ======================================================
   [ INFO: possible circular locking dependency detected ]
   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G        W
  -------------------------------------------------------
   httpd/1597 is trying to acquire lock:
   (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
   but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
   which lock already depends on the new lock.
   the existing dependency chain (in reverse order) is:
   -> #3 (&mm->mmap_sem){++++++}:
        lock_acquire+0xc7/0x270
        __might_fault+0x7a/0xa0
        filldir+0x9e/0x130
        xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
        xfs_readdir+0x1b4/0x330 [xfs]
        xfs_file_readdir+0x2b/0x30 [xfs]
        iterate_dir+0x97/0x130
        SyS_getdents+0x91/0x120
        entry_SYSCALL_64_fastpath+0x12/0x76
   -> #2 (&xfs_dir_ilock_class){++++.+}:
        lock_acquire+0xc7/0x270
        down_read_nested+0x57/0xa0
        xfs_ilock+0x167/0x350 [xfs]
        xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
        xfs_attr_get+0xbd/0x190 [xfs]
        xfs_xattr_get+0x3d/0x70 [xfs]
        generic_getxattr+0x4f/0x70
        inode_doinit_with_dentry+0x162/0x670
        sb_finish_set_opts+0xd9/0x230
        selinux_set_mnt_opts+0x35c/0x660
        superblock_doinit+0x77/0xf0
        delayed_superblock_init+0x10/0x20
        iterate_supers+0xb3/0x110
        selinux_complete_init+0x2f/0x40
        security_load_policy+0x103/0x600
        sel_write_load+0xc1/0x750
        __vfs_write+0x37/0x100
        vfs_write+0xa9/0x1a0
        SyS_write+0x58/0xd0
        entry_SYSCALL_64_fastpath+0x12/0x76
  ...

Signed-off-by: Stephen Smalley <[email protected]>
Reported-by: Morten Stevens <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Acked-by: Paul Moore <[email protected]>
Cc: Manfred Spraul <[email protected]>
Cc: Davidlohr Bueso <[email protected]>
Cc: Prarit Bhargava <[email protected]>
Cc: Eric Paris <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
We must not attempt to send WMI packets while holding the data-lock,
as it may deadlock:

BUG: sleeping function called from invalid context at drivers/net/wireless/ath/ath10k/wmi.c:1824
in_atomic(): 1, irqs_disabled(): 0, pid: 2878, name: wpa_supplicant

=============================================
[ INFO: possible recursive locking detected ]
4.4.6+ #21 Tainted: G        W  O
---------------------------------------------
wpa_supplicant/2878 is trying to acquire lock:
 (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]

but task is already holding lock:
 (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&ar->data_lock)->rlock);
  lock(&(&ar->data_lock)->rlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

4 locks held by wpa_supplicant/2878:
 #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816493ca>] rtnl_lock+0x12/0x14
 #1:  (&ar->conf_mutex){+.+.+.}, at: [<ffffffffa0706932>] ath10k_add_interface+0x3b/0xbda [ath10k_core]
 #2:  (&(&ar->data_lock)->rlock){+.-...}, at: [<ffffffffa070251b>] ath10k_peer_create+0x122/0x1ae [ath10k_core]
 #3:  (rcu_read_lock){......}, at: [<ffffffffa062f304>] rcu_read_lock+0x0/0x66 [mac80211]

stack backtrace:
CPU: 3 PID: 2878 Comm: wpa_supplicant Tainted: G        W  O    4.4.6+ #21
Hardware name: To be filled by O.E.M. To be filled by O.E.M./ChiefRiver, BIOS 4.6.5 06/07/2013
 0000000000000000 ffff8801fcadf8f0 ffffffff8137086d ffffffff82681720
 ffffffff82681720 ffff8801fcadf9b0 ffffffff8112e3be ffff8801fcadf920
 0000000100000000 ffffffff82681720 ffffffffa0721500 ffff8801fcb8d348
Call Trace:
 [<ffffffff8137086d>] dump_stack+0x81/0xb6
 [<ffffffff8112e3be>] __lock_acquire+0xc5b/0xde7
 [<ffffffffa0721500>] ? ath10k_wmi_tx_beacons_iter+0x15/0x11a [ath10k_core]
 [<ffffffff8112d0d0>] ? mark_lock+0x24/0x201
 [<ffffffff8112e908>] lock_acquire+0x132/0x1cb
 [<ffffffff8112e908>] ? lock_acquire+0x132/0x1cb
 [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
 [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
 [<ffffffff816f9e2b>] _raw_spin_lock_bh+0x31/0x40
 [<ffffffffa0721511>] ? ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
 [<ffffffffa0721511>] ath10k_wmi_tx_beacons_iter+0x26/0x11a [ath10k_core]
 [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
 [<ffffffffa062eb18>] __iterate_interfaces+0x9d/0x13d [mac80211]
 [<ffffffffa062f609>] ieee80211_iterate_active_interfaces_atomic+0x32/0x3e [mac80211]
 [<ffffffffa07214eb>] ? ath10k_wmi_cmd_send_nowait+0x1ce/0x1ce [ath10k_core]
 [<ffffffffa071fa9f>] ath10k_wmi_tx_beacons_nowait.isra.13+0x14/0x16 [ath10k_core]
 [<ffffffffa0721676>] ath10k_wmi_cmd_send+0x71/0x242 [ath10k_core]
 [<ffffffffa07023f6>] ath10k_wmi_peer_delete+0x3f/0x42 [ath10k_core]
 [<ffffffffa0702557>] ath10k_peer_create+0x15e/0x1ae [ath10k_core]
 [<ffffffffa0707004>] ath10k_add_interface+0x70d/0xbda [ath10k_core]
 [<ffffffffa05fffcc>] drv_add_interface+0x123/0x1a5 [mac80211]
 [<ffffffffa061554b>] ieee80211_do_open+0x351/0x667 [mac80211]
 [<ffffffffa06158aa>] ieee80211_open+0x49/0x4c [mac80211]
 [<ffffffff8163ecf9>] __dev_open+0x88/0xde
 [<ffffffff8163ef6e>] __dev_change_flags+0xa4/0x13a
 [<ffffffff8163f023>] dev_change_flags+0x1f/0x54
 [<ffffffff816a5532>] devinet_ioctl+0x2b9/0x5c9
 [<ffffffff816514dd>] ? copy_to_user+0x32/0x38
 [<ffffffff816a6115>] inet_ioctl+0x81/0x9d
 [<ffffffff816a6115>] ? inet_ioctl+0x81/0x9d
 [<ffffffff81621cf8>] sock_do_ioctl+0x20/0x3d
 [<ffffffff816223c4>] sock_ioctl+0x222/0x22e
 [<ffffffff8121cf95>] do_vfs_ioctl+0x453/0x4d7
 [<ffffffff81625603>] ? __sys_recvmsg+0x4c/0x5b
 [<ffffffff81225af1>] ? __fget_light+0x48/0x6c
 [<ffffffff8121d06b>] SyS_ioctl+0x52/0x74
 [<ffffffff816fa736>] entry_SYSCALL_64_fastpath+0x16/0x7a

Signed-off-by: Ben Greear <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
Nicolas Dichtel says:

====================
ovs: fix rtnl notifications on interface deletion

There was no rtnl notifications for interfaces (gre, vxlan, geneve) created
by ovs. This problem is fixed by adjusting the creation path.

v1 -> v2:
 - add patch #1 and #4
 - rework error handling in patch #2
====================

Acked-by: Pravin B Shelar <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
When run tipcTS&tipcTC test suite, the following complaint appears:

[   56.926168] ===============================
[   56.926169] [ INFO: suspicious RCU usage. ]
[   56.926171] 4.7.0-rc1+ #160 Not tainted
[   56.926173] -------------------------------
[   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
[   56.926175]
[   56.926175] other info that might help us debug this:
[   56.926175]
[   56.926177]
[   56.926177] rcu_scheduler_active = 1, debug_locks = 1
[   56.926179] 3 locks held by swapper/4/0:
[   56.926180]  #0:  (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340
[   56.926203]  #1:  (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc]
[   56.926212]  #2:  (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926218]
[   56.926218] stack backtrace:
[   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160
[   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   56.926224]  0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0
[   56.926227]  0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120
[   56.926230]  ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88
[   56.926234] Call Trace:
[   56.926235]  <IRQ>  [<ffffffff813c4423>] dump_stack+0x67/0x94
[   56.926250]  [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120
[   56.926256]  [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc]
[   56.926261]  [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
[   56.926266]  [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926273]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926278]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926283]  [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc]
[   56.926288]  [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340
[   56.926291]  [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340
[   56.926296]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926300]  [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390
[   56.926306]  [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130
[   56.926316]  [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2
[   56.926323]  [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0
[   56.926327]  [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60
[   56.926331]  [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90
[   56.926333]  <EOI>  [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0
[   56.926340]  [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0
[   56.926342]  [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20
[   56.926345]  [<ffffffff810adf0f>] default_idle_call+0x2f/0x50
[   56.926347]  [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0
[   56.926353]  [<ffffffff81040ad9>] start_secondary+0xf9/0x100

The warning appears as rtnl_dereference() is wrongly used in
tipc_l2_send_msg() under RCU read lock protection. Instead the proper
usage should be that rcu_dereference_rtnl() is called here.

Fixes: 5b7066c ("tipc: stricter filtering of packets in bearer layer")
Acked-by: Jon Maloy <[email protected]>
Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
While testing the deadline scheduler + cgroup setup I hit this
warning.

[  132.612935] ------------[ cut here ]------------
[  132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80
[  132.612952] Modules linked in: (a ton of modules...)
[  132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2
[  132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
[  132.612982]  0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e
[  132.612984]  0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b
[  132.612985]  00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00
[  132.612986] Call Trace:
[  132.612987]  <IRQ>  [<ffffffff813d229e>] dump_stack+0x63/0x85
[  132.612994]  [<ffffffff810a652b>] __warn+0xcb/0xf0
[  132.612997]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[  132.612999]  [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20
[  132.613000]  [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80
[  132.613008]  [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20
[  132.613010]  [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10
[  132.613015]  [<ffffffff811388ac>] put_css_set+0x5c/0x60
[  132.613016]  [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0
[  132.613017]  [<ffffffff810a3912>] __put_task_struct+0x42/0x140
[  132.613018]  [<ffffffff810e776a>] dl_task_timer+0xca/0x250
[  132.613027]  [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170
[  132.613030]  [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270
[  132.613031]  [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190
[  132.613034]  [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60
[  132.613035]  [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50
[  132.613037]  [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0
[  132.613038]  <EOI>  [<ffffffff81063466>] ? native_safe_halt+0x6/0x10
[  132.613043]  [<ffffffff81037a4e>] default_idle+0x1e/0xd0
[  132.613044]  [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20
[  132.613046]  [<ffffffff810e8fda>] default_idle_call+0x2a/0x40
[  132.613047]  [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340
[  132.613048]  [<ffffffff81050235>] start_secondary+0x155/0x190
[  132.613049] ---[ end trace f91934d162ce9977 ]---

The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt
context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid
this problem - and other problems of sharing a spinlock with an
interrupt.

Cc: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: [email protected]
Cc: [email protected] # 4.5+
Cc: [email protected]
Reviewed-by: Rik van Riel <[email protected]>
Reviewed-by: "Luis Claudio R. Goncalves" <[email protected]>
Signed-off-by: Daniel Bristot de Oliveira <[email protected]>
Acked-by: Zefan Li <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
An ENOMEM when creating a pair tty in tty_ldisc_setup causes a null
pointer dereference in devpts_kill_index because tty->link->driver_data
is NULL.  The oops was triggered with the pty stressor in stress-ng when
in a low memory condition.

tty_init_dev tries to clean up a tty_ldisc_setup ENOMEM error by calling
release_tty, however, this ultimately tries to clean up the NULL pair'd
tty in pty_unix98_remove, triggering the Oops.

Add check to pty_unix98_remove to only clean up fsi if it is not NULL.

Ooops:

[   23.020961] Oops: 0000 [#1] SMP
[   23.020976] Modules linked in: ppdev snd_hda_codec_generic snd_hda_intel snd_hda_codec parport_pc snd_hda_core snd_hwdep parport snd_pcm input_leds joydev snd_timer serio_raw snd soundcore i2c_piix4 mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel qxl aes_x86_64 ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect psmouse sysimgblt floppy fb_sys_fops drm pata_acpi jitterentropy_rng drbg ansi_cprng
[   23.020978] CPU: 0 PID: 1452 Comm: stress-ng-pty Not tainted 4.7.0-rc4+ #2
[   23.020978] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[   23.020979] task: ffff88007ba30000 ti: ffff880078ea8000 task.ti: ffff880078ea8000
[   23.020981] RIP: 0010:[<ffffffff813f11ff>]  [<ffffffff813f11ff>] ida_remove+0x1f/0x120
[   23.020981] RSP: 0018:ffff880078eabb60  EFLAGS: 00010a03
[   23.020982] RAX: 4444444444444567 RBX: 0000000000000000 RCX: 000000000000001f
[   23.020982] RDX: 000000000000014c RSI: 000000000000026f RDI: 0000000000000000
[   23.020982] RBP: ffff880078eabb70 R08: 0000000000000004 R09: 0000000000000036
[   23.020983] R10: 000000000000026f R11: 0000000000000000 R12: 000000000000026f
[   23.020983] R13: 000000000000026f R14: ffff88007c944b40 R15: 000000000000026f
[   23.020984] FS:  00007f9a2f3cc700(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[   23.020984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.020985] CR2: 0000000000000010 CR3: 000000006c81b000 CR4: 00000000001406f0
[   23.020988] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.020988] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.020988] Stack:
[   23.020989]  0000000000000000 000000000000026f ffff880078eabb90 ffffffff812a5a99
[   23.020990]  0000000000000000 00000000fffffff4 ffff880078eabba8 ffffffff814f9cbe
[   23.020991]  ffff88007965c800 ffff880078eabbc8 ffffffff814eef43 fffffffffffffff4
[   23.020991] Call Trace:
[   23.021000]  [<ffffffff812a5a99>] devpts_kill_index+0x29/0x50
[   23.021002]  [<ffffffff814f9cbe>] pty_unix98_remove+0x2e/0x50
[   23.021006]  [<ffffffff814eef43>] release_tty+0xb3/0x1b0
[   23.021007]  [<ffffffff814f18d4>] tty_init_dev+0xd4/0x1c0
[   23.021011]  [<ffffffff814f9fae>] ptmx_open+0xae/0x190
[   23.021013]  [<ffffffff812254ef>] chrdev_open+0xbf/0x1b0
[   23.021015]  [<ffffffff8121d973>] do_dentry_open+0x203/0x310
[   23.021016]  [<ffffffff81225430>] ? cdev_put+0x30/0x30
[   23.021017]  [<ffffffff8121ee44>] vfs_open+0x54/0x80
[   23.021018]  [<ffffffff8122b8fc>] ? may_open+0x8c/0x100
[   23.021019]  [<ffffffff8122f26b>] path_openat+0x2eb/0x1440
[   23.021020]  [<ffffffff81230534>] ? putname+0x54/0x60
[   23.021022]  [<ffffffff814f6f97>] ? n_tty_ioctl_helper+0x27/0x100
[   23.021023]  [<ffffffff81231651>] do_filp_open+0x91/0x100
[   23.021024]  [<ffffffff81230596>] ? getname_flags+0x56/0x1f0
[   23.021026]  [<ffffffff8123fc66>] ? __alloc_fd+0x46/0x190
[   23.021027]  [<ffffffff8121f1e4>] do_sys_open+0x124/0x210
[   23.021028]  [<ffffffff8121f2ee>] SyS_open+0x1e/0x20
[   23.021035]  [<ffffffff81845576>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[   23.021044] Code: 63 28 45 31 e4 eb dd 0f 1f 44 00 00 55 4c 63 d6 48 ba 89 88 88 88 88 88 88 88 4c 89 d0 b9 1f 00 00 00 48 f7 e2 48 89 e5 41 54 53 <8b> 47 10 48 89 fb 8d 3c c5 00 00 00 00 48 c1 ea 09 b8 01 00 00
[   23.021045] RIP  [<ffffffff813f11ff>] ida_remove+0x1f/0x120
[   23.021045]  RSP <ffff880078eabb60>
[   23.021046] CR2: 0000000000000010

Signed-off-by: Colin Ian King <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.

Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.

Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.

This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()

  Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
  Oops: Bad kernel stack pointer, sig: 6 [#1]
  CPU: 0 PID: 2006 Comm: tm-execed Not tainted
  NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
  REGS: c00000003ffefd40 TRAP: 0700   Not tainted
  MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]>  CR: 00000000  XER: 00000000
  CFAR: c0000000000098b4 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
  GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
  NIP [c000000000009980] fast_exception_return+0xb0/0xb8
  LR [0000000000000000]           (null)
  Call Trace:
  Instruction dump:
  f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
  e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b

  Kernel BUG at c000000000043e80 [verbose debug info unavailable]
  Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
  Oops: Unrecoverable exception, sig: 6 [#2]
  CPU: 0 PID: 2006 Comm: tm-execed Tainted: G      D
  task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
  NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
  REGS: c00000003ffef7e0 TRAP: 0700   Tainted: G      D
  MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]>  CR: 28002828  XER: 00000000
  CFAR: c000000000015a20 SOFTE: 0
  PACATMSCRATCH: b00000010000d033
  GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
  GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
  GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
  GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
  GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
  GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
  NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
  LR [c000000000015a24] __switch_to+0x1f4/0x420
  Call Trace:
  Instruction dump:
  7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
  4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020

This fixes CVE-2016-5828.

Fixes: bc2a940 ("powerpc: Hook in new transactional memory code")
Cc: [email protected] # v3.9+
Signed-off-by: Cyril Bur <[email protected]>
Signed-off-by: Michael Ellerman <[email protected]>
jthornber pushed a commit that referenced this issue Jul 20, 2016
The USB core contains a bug that can show up when a USB-3 host
controller is removed.  If the primary (USB-2) hcd structure is
released before the shared (USB-3) hcd, the core will try to do a
double-free of the common bandwidth_mutex.

The problem was described in graphical form by Chung-Geol Kim, who
first reported it:

=================================================
     At *remove USB(3.0) Storage
     sequence <1> --> <5> ((Problem Case))
=================================================
                                  VOLD
------------------------------------|------------
                                 (uevent)
                            ________|_________
                           |<1>               |
                           |dwc3_otg_sm_work  |
                           |usb_put_hcd       |
                           |peer_hcd(kref=2)|
                           |__________________|
                            ________|_________
                           |<2>               |
                           |New USB BUS #2    |
                           |                  |
                           |peer_hcd(kref=1)  |
                           |                  |
                         --(Link)-bandXX_mutex|
                         | |__________________|
                         |
    ___________________  |
   |<3>                | |
   |dwc3_otg_sm_work   | |
   |usb_put_hcd        | |
   |primary_hcd(kref=1)| |
   |___________________| |
    _________|_________  |
   |<4>                | |
   |New USB BUS #1     | |
   |hcd_release        | |
   |primary_hcd(kref=0)| |
   |                   | |
   |bandXX_mutex(free) |<-
   |___________________|
                               (( VOLD ))
                            ______|___________
                           |<5>               |
                           |      SCSI        |
                           |usb_put_hcd       |
                           |peer_hcd(kref=0)  |
                           |*hcd_release      |
                           |bandXX_mutex(free*)|<- double free
                           |__________________|

=================================================

This happens because hcd_release() frees the bandwidth_mutex whenever
it sees a primary hcd being released (which is not a very good idea
in any case), but in the course of releasing the primary hcd, it
changes the pointers in the shared hcd in such a way that the shared
hcd will appear to be primary when it gets released.

This patch fixes the problem by changing hcd_release() so that it
deallocates the bandwidth_mutex only when the _last_ hcd structure
referencing it is released.  The patch also removes an unnecessary
test, so that when an hcd is released, both the shared_hcd and
primary_hcd pointers in the hcd's peer will be cleared.

Signed-off-by: Alan Stern <[email protected]>
Reported-by: Chung-Geol Kim <[email protected]>
Tested-by: Chung-Geol Kim <[email protected]>
CC: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
jthornber pushed a commit that referenced this issue Jul 28, 2016
I found a race condition triggering VM_BUG_ON() in freeze_page(), when
running a testcase with 3 processes:
  - process 1: keep writing thp,
  - process 2: keep clearing soft-dirty bits from virtual address of process 1
  - process 3: call migratepages for process 1,

The kernel message is like this:

  kernel BUG at /src/linux-dev/mm/huge_memory.c:3096!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio
  CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000
  RIP: 0010:[<ffffffff811f8e06>]  [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590
  RSP: 0018:ffff88007cdd3b70  EFLAGS: 00010202
  RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000
  RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000
  RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000
  R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000
  R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080
  FS:  00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000             CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0
  Call Trace:
    ? list_del+0xd/0x30
    queue_pages_pte_range+0x4d1/0x590
    __walk_page_range+0x204/0x4e0
    walk_page_range+0x71/0xf0
    queue_pages_range+0x75/0x90
    ? queue_pages_hugetlb+0x190/0x190
    ? new_node_page+0xc0/0xc0
    ? change_prot_numa+0x40/0x40
    migrate_to_node+0x71/0xd0
    do_migrate_pages+0x1c3/0x210
    SyS_migrate_pages+0x261/0x290
    entry_SYSCALL_64_fastpath+0x1a/0xa4
  Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff
  RIP   split_huge_page_to_list+0x496/0x590

I'm not sure of the full scenario of the reproduction, but my debug
showed that split_huge_pmd_address(freeze=true) returned without running
main code of pmd splitting because pmd_present(*pmd) in precheck somehow
returned 0.  If this happens, the subsequent try_to_unmap() fails and
returns non-zero (because page_mapcount() still > 0), and finally
VM_BUG_ON() fires.  This patch tries to fix it by prechecking pmd state
inside ptl.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Naoya Horiguchi <[email protected]>
Signed-off-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
jthornber pushed a commit that referenced this issue Feb 1, 2017
…s enabled

Nils Holland and Klaus Ethgen have reported unexpected OOM killer
invocations with 32b kernel starting with 4.8 kernels

	kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
	kworker/u4:5 cpuset=/ mems_allowed=0
	CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
	[...]
	Mem-Info:
	active_anon:58685 inactive_anon:90 isolated_anon:0
	 active_file:274324 inactive_file:281962 isolated_file:0
	 unevictable:0 dirty:649 writeback:0 unstable:0
	 slab_reclaimable:40662 slab_unreclaimable:17754
	 mapped:7382 shmem:202 pagetables:351 bounce:0
	 free:206736 free_pcp:332 free_cma:0
	Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
	DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
	lowmem_reserve[]: 0 813 3474 3474
	Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
	lowmem_reserve[]: 0 0 21292 21292
	HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

the oom killer is clearly pre-mature because there there is still a lot
of page cache in the zone Normal which should satisfy this lowmem
request.  Further debugging has shown that the reclaim cannot make any
forward progress because the page cache is hidden in the active list
which doesn't get rotated because inactive_list_is_low is not memcg
aware.

The code simply subtracts per-zone highmem counters from the respective
memcg's lru sizes which doesn't make any sense.  We can simply end up
always seeing the resulting active and inactive counts 0 and return
false.  This issue is not limited to 32b kernels but in practice the
effect on systems without CONFIG_HIGHMEM would be much harder to notice
because we do not invoke the OOM killer for allocations requests
targeting < ZONE_NORMAL.

Fix the issue by tracking per zone lru page counts in mem_cgroup_per_node
and subtract per-memcg highmem counts when memcg is enabled.  Introduce
helper lruvec_zone_lru_size which redirects to either zone counters or
mem_cgroup_get_zone_lru_size when appropriate.

We are losing empty LRU but non-zero lru size detection introduced by
ca70723 ("mm: update_lru_size warn and reset bad lru_size") because
of the inherent zone vs. node discrepancy.

Fixes: f8d1a31 ("mm: consider whether to decivate based on eligible zones inactive ratio")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Nils Holland <[email protected]>
Tested-by: Nils Holland <[email protected]>
Reported-by: Klaus Ethgen <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Reviewed-by: Vladimir Davydov <[email protected]>
Cc: <[email protected]>	[4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
jthornber pushed a commit that referenced this issue Feb 1, 2017
Mathieu reported that the LTTNG modules are broken as of 4.10-rc1 due to
the removal of the cpu hotplug notifiers.

Usually I don't care much about out of tree modules, but LTTNG is widely
used in distros. There are two ways to solve that:

1) Reserve a hotplug state for LTTNG

2) Add a dynamic range for the prepare states.

While #1 is the simplest solution, #2 is the proper one as we can convert
in tree users, which do not care about ordering, to the dynamic range as
well.

Add a dynamic range which allows LTTNG to request states in the prepare
stage.

Reported-and-tested-by: Mathieu Desnoyers <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Reviewed-by: Mathieu Desnoyers <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Sewior <[email protected]>
Cc: Steven Rostedt <[email protected]>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701101353010.3401@nanos
Signed-off-by: Thomas Gleixner <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant