bags (Multisets)

Bags are a multiset implementation for Python. Currently, bags have constant time inclusion testing but can only contain hashable elements due to the implementation.

There are three classes provided:

Bag

An abstract base class for bags.

bag

A mutable (unhashable) Bag.

frozenbag

An immutable (implements collections.abc.Hashable) Bag.

The Bag base class implements collections.abc.Sized, collections.abc.Iterable and collections.abc.Container as well as collections.abc.Collection starting in Python 3.6 and the polyfilled Collection for Python < 3.6.

Set Operations

Bags use python operators for multiset operations:

  • __add__ (a + b): The sum of two multisets

  • __sub__ (a - b): The difference between a and b

  • __and__ (a & b): The intersection of a and b

  • __or__ (a | b): The union of a and b

  • __xor__ (a ^ b): The symmetric difference between a and b

bag has the equivalent in-place operators defined.

Comparison Methods

Bags are comparable only to other Bags. Ordering comparisons are done setwise.

>>> bag('ac') <= bag('ab')
False
>>> bag('ac') >= bag('ab')
False
>>> bag('a') <= bag('a') < bag('aa')
True
>>> bag('aa') <= bag('a')
False

Compared to existing similar implementations

collections.Counter

Counters don’t really behave like Collections - Sized, Iterable, Containers

Adding and Removing

>>> c = Counter()
>>> c['a'] += 1
>>> c['a'] -= 1
>>> 'a' in c
True
>>> b = bag()
>>> b.add('a')
>>> 'a' in b
True
>>> b.remove('a')
>>> 'a' in b
False

len

>>> c = Counter()
>>> c['a'] += 1
>>> len(c)
1
>>> c['a'] -= 1
>>> len(c)
1
>>> c['a'] += 2
>>> len(c)
1
>>> len(Counter('aaabbc'))
3
>>> b = bag()
>>> b.add('a')
>>> len(b)
1
>>> b.remove('a')
>>> len(b)
0
>>> len(bag('aaabbc'))
6

Iterating

>>> for item in Counter('aaa'): print(item)
a
>>> for item in bag('aaa'): print(item)
a
a
a

Compared to Standard Types

bag vs. list

  • Inclusion testing is O(1)

  • Adding and removing elements is O(1)

  • Cannot add mutable elements

  • Elements aren’t ordered

bag vs. set

  • Can add multiple instances of equal elements

New Methods

These are bag methods that are not implementing an abstract method from a standard Python ABC.

num_unique_elements

Returns the number of unique elements in the bag. O(1)

unique_elements()

Returns a set of all the unique elements in the bag. O(1)

nlargest(n=None)

Returns the n most common elements and their counts from most common to least. If n is None then all elements are returned. O(n log n)

copy()

Returns a shallow copy of self. O(self.num_unique_elements())

isdisjoint(other: Iterable)

Tests if self is disjoint with any other Iterable. O(len(other))

issubset(other: Iterable)

Tests if self is a subset of another Iterable.

issuperset(other: Iterable)

Tests if self is a superset of another Iterable.

from_mapping(map: Mapping)

Classmethod to create a bag from a Mapping that maps elements to counts.

The following are only for mutable bags (not frozenbags).

  • pop()

  • add(elem)

  • discard(elem)

  • remove(elem)

  • clear()

API

Bag

class collections_extended.Bag(iterable=None)[source]

Bases: collections.abc.Collection

Base class for bag classes.

Base class for bag and frozenbag. Is not mutable and not hashable, so there’s no reason to use this instead of either bag or frozenbag.

__init__(iterable=None)[source]

Create a new bag.

If iterable isn’t given, is None or is empty then the bag starts empty. Otherwise each element from iterable will be added to the bag however many times it appears.

This runs in O(len(iterable))

copy()[source]

Create a shallow copy of self.

This runs in O(len(self.num_unique_elements()))

num_unique_elements()[source]

Return the number of unique elements.

This runs in O(1) time

unique_elements()[source]

Return a view of unique elements in this bag.

count(value)[source]

Return the number of value present in this bag.

If value is not in the bag no Error is raised, instead 0 is returned.

This runs in O(1) time

Parameters

value – The element of self to get the count of

Returns

The count of value in self

Return type

int

nlargest(n=None)[source]

List the n most common elements and their counts. List is from the most common to the least. If n is None, the list all element counts. Run time should be O(m log m) where m is len(self) Args: n (int): The number of elements to return

Deprecated since version 1.0: Use heapq.nlargest(n, self.counts(), key=itemgetter(1)) instead or sorted(self.counts(), reverse=True, key=itemgetter(1)) for n=None

counts()[source]

Return a view of the unique elements in self and their counts.

New in version 1.0.3.

classmethod from_mapping(mapping)[source]

Create a bag from a dict of elem->count.

Each key in the dict is added if the value is > 0.

Raises

ValueError – If any count is < 0.

__len__()[source]

Return the cardinality of the bag.

This runs in O(1)

__contains__(value)[source]

Return the multiplicity of the element.

This runs in O(1)

__iter__()[source]

Iterate through all elements.

Multiple copies will be returned if they exist.

issubset(other)[source]

Check that every element in self has a count <= in other.

Parameters

other (Iterable) –

issuperset(other)[source]

Check that every element in self has a count >= in other.

Parameters

other (Iterable) –

__and__(other)[source]

Intersection is the minimum of corresponding counts.

This runs in O(l + n) where:
  • n is self.num_unique_elements()

  • l = 1 if other is a bag else l = len(other)

isdisjoint(other)[source]

Return if this bag is disjoint with the passed collection.

This runs in O(len(other))

__or__(other)[source]

Union is the maximum of all elements.

This runs in O(m + n) where:
  • n = self.num_unique_elements()

  • m = other.num_unique_elements() if other is a bag else m = len(other)

__add__(other)[source]

Return a new bag also containing all the elements of other.

self + other = self & other + self | other

This runs in O(m + n) where:
  • n is self.num_unique_elements()

  • m is len(other)

Parameters

other (Iterable) – elements to add to self

__sub__(other)[source]

Difference between the sets.

For normal sets this is all x s.t. x in self and x not in other. For bags this is count(x) = max(0, self.count(x)-other.count(x))

This runs in O(m + n) where:
  • n is self.num_unique_elements()

  • m is len(other)

Parameters

other (Iterable) – elements to remove

__mul__(other)[source]

Cartesian product with other.

product(other, operator=None)[source]

Cartesian product of the two sets.

Optionally, pass an operator to combine elements instead of creating a tuple.

This should run in O(m*n+l) where:
  • m is the number of unique elements in self

  • n is the number of unique elements in other

  • l is 0 if other is a bag, else l is the len(other)

Parameters
  • other (Iterable) – The iterable to take the product with.

  • operator (Callable) – A function that accepts an element from self and other and returns a combined value to include in the return value.

__xor__(other)[source]

Symmetric difference between the sets.

other can be any iterable.

This runs in O(m + n) where:

m = len(self) n = len(other)

bag

class collections_extended.bag(iterable=None)[source]

Bases: collections_extended.bags.Bag

bag is a mutable unhashable bag.

__init__(iterable=None)

Create a new bag.

If iterable isn’t given, is None or is empty then the bag starts empty. Otherwise each element from iterable will be added to the bag however many times it appears.

This runs in O(len(iterable))

pop()[source]

Remove and return an element of self.

add(elem)[source]

Add elem to self.

discard(elem)[source]

Remove elem from this bag, silent if it isn’t present.

remove(elem)[source]

Remove elem from this bag, raising a ValueError if it isn’t present.

Parameters

elem – object to remove from self

Raises

ValueError – if the elem isn’t present

discard_all(other)[source]

Discard all of the elems from other.

remove_all(other)[source]

Remove all of the elems from other.

Raises a ValueError if the multiplicity of any elem in other is greater than in self.

clear()[source]

Remove all elements from this bag.

frozenbag

class collections_extended.frozenbag(iterable=None)[source]

Bases: collections_extended.bags.Bag, collections.abc.Hashable

frozenbag is an immutable, hashable bag.

__init__(iterable=None)

Create a new bag.

If iterable isn’t given, is None or is empty then the bag starts empty. Otherwise each element from iterable will be added to the bag however many times it appears.

This runs in O(len(iterable))

__hash__()[source]

Compute the hash value of a frozenbag.

Views

class collections_extended.CountsView(bag)[source]

Bases: collections_extended.bags.BagView

A view for the unique items and their counts in a bag.

New in version 1.0.

class collections_extended.UniqueElementsView(bag)[source]

Bases: collections_extended.bags.BagView

A view for the unique items and their counts in a bag.

New in version 1.0.