Add a hash-based `factorize` primitive

## Summary

Add a hash-based `factorize(array)` that returns `(uniques, codes)` for a 1D array of
**any** dtype, in O(n), where `array[i] == uniques[codes[i]]`. This is the standard
"dense integer coding" of a categorical column (cf. `pandas.factorize`,
`numpy.unique(..., return_inverse=True)`), but computed via arraykit's existing hash
table rather than a sort.

## Motivation

Grouping, pivoting, and index de-duplication all reduce to the same primitive: map each
element of an array to a dense integer code and recover the distinct values. Today the
only way to do this in the numpy/arraykit stack is **sort-based**:

- `numpy.unique(a, return_inverse=True)` — argsort of all *n* elements.
- arraykit's own `ufunc_unique1d_indexer` (used by `array_to_groups_and_locations`) — also `argsort`.

The sort is the bottleneck, and it is worst exactly where grouping is most common:
**string keys**. Measured on `n = 100_000` rows, `k = 5_000` groups (`<U` array):

| step | time |
|---|---|
| `np.unique(a, return_inverse=True)` (sort) | **14.9 ms** |
| `pd.factorize(a)` on the same `<U` array (khash) | 7.8 ms |
| `pd.factorize` on the `object`-dtype equivalent (khash + cached hashes) | 2.9 ms |
| downstream reduction (`np.bincount`) | 0.12 ms |

Factorization is ~99% of the cost; the reduction is free. A hash-based factorize roughly
halves the time on `<U` arrays, and the gap to pandas' 2.9 ms is then purely the string
*representation* (object-interned, cached-hash strings vs. fixed-width `<U`), not the
algorithm.

Concrete downstream case: static-frame's `Frame.pivot` on a string key currently
factorizes with `np.unique` and is ~3.6× slower than the pandas equivalent end-to-end
(13.8 ms vs 3.8 ms), with essentially all of the difference in the sort.

## Proposed API

```python
def factorize(
    array: np.ndarray,
    *,
    sort: bool = False,
) -> tuple[np.ndarray, np.ndarray]:
    """
    Return (uniques, codes) such that array[i] == uniques[codes[i]].

    uniques: 1D array of the distinct values, in first-appearance order
             (sort=False) or ascending sorted order (sort=True).
    codes:   1D intp array, len == len(array), values in [0, len(uniques)).
    """
```

Semantics mirror `pandas.factorize` / `numpy.unique(return_inverse=True)`:

- **Ordering.** Default `sort=False` returns uniques in first-appearance (hash-insertion)
  order — the cheap, natural output. `sort=True` additionally argsorts the *k* uniques
  and remaps, which is O(n + k log k) and still far cheaper than sorting *n* elements
  when k << n. (Callers that need sorted output — e.g. index construction — get it
  without a full re-sort.)
- **codes dtype** `intp`, matching what `np.bincount` and fancy-indexing consume with no
  copy.

## Why arraykit, and why this is low-risk

arraykit already contains the entire hash table this needs: `AutoMap` / `FrozenAutoMap`
map values to sequential integers in insertion order, and `FrozenAutoMap.get_all(array)`
is a vectorized O(n) code lookup that already works across **every** dtype we tried —
`int*`, `uint*`, `float64`, `<U`, `S`, `datetime64`, and `object` (verified: `uniques[codes]`
round-trips for all of them).

The only reason these maps can't factorize today is that they **raise `NonUniqueError`
on a duplicate insert** — they are maps, not factorizers. A `factorize` entry point is
therefore a small addition over proven code: run the existing insertion hashing, but on a
repeated key return the already-assigned code instead of raising. No new hash table, no
new dtype handling.

## Benefits across dtypes

Because it reuses the existing all-dtype hash, `factorize` helps everywhere, though the
*magnitude* and *kind* of benefit vary:

| dtype | today | with hash `factorize` |
|---|---|---|
| `U` / `S` (strings) | sort — slowest case | **large speedup** (sort of variable-length strings is the worst case) |
| `object` | sort via Python compares — very slow | **large speedup** |
| `float` | *no O(n) coding possible; grouping falls back to sort* | **new O(n) path** (floats can't be dense-coded) |
| sparse `int` / `datetime64` | dense coding bails when range ≫ n → sort fallback | **new O(n) path** for the sparse case |
| dense `int`, `bool` | already O(n) via `key - min` dense coding | little/no change (dense coding already optimal) |

So it is not only a string optimization: it gives numeric/temporal **float keys and
sparse-integer keys a vectorized grouping path they don't have at all today**, while
leaving the already-optimal dense-integer path alone.

## Design considerations / edge cases

- **NaN.** `FrozenAutoMap` currently raises `KeyError` on a `NaN` float key (since
  `nan != nan`). `factorize` must define this explicitly — either treat all NaN as one
  group, or emit a sentinel code (`-1`, as `pandas.factorize(use_na_sentinel=True)` does).
- **Immutability.** arraykit's maps require immutable arrays; `factorize` should either
  accept mutable input or document the requirement.
- **Empty / single-value / all-unique** arrays should be handled (degenerate but common).
- **Relationship to existing API.** `factorize` generalizes `ufunc_unique1d_indexer`
  (which is sort-based) and complements `get_new_indexers_and_screen` (which already
  re-screens *integer* codes). A shared C core could back all three.

## Downstream impact in static-frame

- `pivot_group_reduce_1d` gains a hash path for string keys and, newly, for float and
  sparse-integer keys.
- `array_to_groups_and_locations` / `iter_group` can drop their sort for the unsorted case.
- Index construction / set operations that currently sort to dedup.


dtype	today	with hash `factorize`
`U` / `S` (strings)	sort — slowest case	large speedup (sort of variable-length strings is the worst case)
`object`	sort via Python compares — very slow	large speedup
`float`	no O(n) coding possible; grouping falls back to sort	new O(n) path (floats can't be dense-coded)
sparse `int` / `datetime64`	dense coding bails when range ≫ n → sort fallback	new O(n) path for the sparse case
dense `int`, `bool`	already O(n) via `key - min` dense coding	little/no change (dense coding already optimal)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add a hash-based `factorize` primitive #212

Summary

Motivation

Proposed API

Why arraykit, and why this is low-risk

Benefits across dtypes

Design considerations / edge cases

Downstream impact in static-frame

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

step	time
`np.unique(a, return_inverse=True)` (sort)	14.9 ms
`pd.factorize(a)` on the same `<U` array (khash)	7.8 ms
`pd.factorize` on the `object`-dtype equivalent (khash + cached hashes)	2.9 ms
downstream reduction (`np.bincount`)	0.12 ms

Uh oh!

Add a hash-based factorize primitive #212

Description

Summary

Motivation

Proposed API

Why arraykit, and why this is low-risk

Benefits across dtypes

Design considerations / edge cases

Downstream impact in static-frame

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Add a hash-based `factorize` primitive #212