Merged
Commits
88 commits
6aed859
Add `remove()` and `repack()` to `ZipFile`
danny0838 May 24, 2025
5453dbc
📜🤖 Added by blurb_it.
blurb-it[bot] May 24, 2025
80ab2e2
Fix and optimize test code
danny0838 May 24, 2025
72c2a66
Handle common setups with `setUpClass`
danny0838 May 24, 2025
a4b410b
Add tests for mode `w` and `x` for `remove()`
danny0838 May 24, 2025
a9e85c6
Introduce `_calc_initial_entry_offset` and refactor
danny0838 May 24, 2025
236cd06
Optimize `_calc_initial_entry_offset` by introducing cache
danny0838 May 24, 2025
bdc58c7
Introduce `_validate_local_file_entry` and refactor
danny0838 May 24, 2025
c3c8345
Introduce `_debug` and refactor
danny0838 May 24, 2025
1b7d75a
Introduce `_move_entry_data` and rework chunk_size passing
danny0838 May 25, 2025
51c9254
Refactor `_validate_local_file_entry`
danny0838 May 25, 2025
0d971d8
Add `strict_descriptor` option
danny0838 May 25, 2025
8f0a504
Fix and improve validation tests
danny0838 May 25, 2025
0cb8682
Remove obsolete NameToInfo updating
danny0838 May 25, 2025
a788a00
Use `zinfo` rather than `info`
danny0838 May 25, 2025
ae01b8c
Raise on overlapping file blocks
danny0838 May 25, 2025
edee203
Rework writing protection
danny0838 May 25, 2025
555ac78
Update doc
danny0838 May 25, 2025
95fde31
Fix typo
danny0838 May 26, 2025
8a448e4
Add test for bytes between file entries
danny0838 May 26, 2025
4c35eb2
Check `testzip()` after zip file closed
danny0838 May 26, 2025
926338c
Support `repack(removed)`
danny0838 May 26, 2025
e76f9a1
Fix bytes between entries be removed when `removed` is passed
danny0838 May 26, 2025
93f4c25
Fix bad test code
danny0838 May 26, 2025
9e94209
Revise docstring
danny0838 May 27, 2025
3ef72c6
Add `tearDown` for tests
danny0838 May 28, 2025
fbf7588
Rename methods and parameters
danny0838 May 28, 2025
81a419a
Adjust parameter order
danny0838 May 28, 2025
c62a455
Optimize code and revise comment
danny0838 May 28, 2025
a05353c
Improve debug for `_ZipRepacker.repack()`
danny0838 May 29, 2025
3d0240c
Rework `_validate_local_file_entry_sequence` to return size or None
danny0838 May 29, 2025
31c4c93
Rework `_validate_local_file_entry_sequence` to allow passing no `che…
danny0838 May 29, 2025
f8fade1
Introduce `_scan_data_descriptor_no_sig_by_decompression`
danny0838 May 30, 2025
c80d21b
Strip only entries immediately following a referenced entry
danny0838 May 29, 2025
e1caea9
Adjust method names
danny0838 May 30, 2025
2b23d46
Add memory usage test
danny0838 May 30, 2025
de4f15b
Fix rst
danny0838 May 30, 2025
ea3259f
Optimize code
danny0838 Jun 1, 2025
fef92c4
Fix and optimize `_iter_scan_signature`
danny0838 Jun 1, 2025
8067b0c
Fix `_scan_data_descriptor`
danny0838 Jun 1, 2025
92d3a9c
Fix and optimize `_scan_data_descriptor_no_sig`
danny0838 Jun 1, 2025
b5d7ae3
Rename `_trace_compressed_block_end`
danny0838 Jun 1, 2025
1d5ec61
Fix `_scan_data_descriptor_no_sig_by_decompression`
danny0838 Jun 1, 2025
db9d0d6
Add tests for `_ZipRepacker`
danny0838 Jun 1, 2025
aaa566c
Remove unneeded import
danny0838 Jun 1, 2025
578c7c8
Add requirements
danny0838 Jun 1, 2025
c470c33
Fix `_scan_data_descriptor_no_sig_by_decompression` when library not …
danny0838 Jun 1, 2025
b1dcb07
Test with pre-calculated CRC
danny0838 Jun 1, 2025
04cddef
Remove unneeded import
danny0838 Jun 1, 2025
797a62c
Fix and optimize `repack`
danny0838 Jun 1, 2025
3b2f232
Remove unneeded catch type
danny0838 Jun 14, 2025
cb549c9
Patch more explicitly
danny0838 Jun 14, 2025
0f50a6f
Remove unneeded variables
danny0838 Jun 14, 2025
c759b63
Improve dependency check for decompression tests
danny0838 Jun 14, 2025
1ece5b1
Refactor and optimize `RepackHelperMixin`
danny0838 Jun 14, 2025
ce88616
Update NEWS
danny0838 Jun 20, 2025
5f093e5
Sync with danny0838/zipremove@1691ca25bf971cf1e45d5ed7d22c512636f20cb8
danny0838 Jun 20, 2025
11c0937
Revise NEWS
danny0838 Jun 20, 2025
4b2176e
Sync with danny0838/zipremove@1843d87b70e6cb129fb55446eaf4486a87d2af4d
danny0838 Jun 21, 2025
d9824ce
Fix timezone related timestamp issue
danny0838 Jun 21, 2025
85811ab
Simplify tests with data descriptors
danny0838 Jun 22, 2025
748ac63
Sync with danny0838/zipremove@e79042768f3c2541e0226f6bed3a9ff2ee04fac0
danny0838 Jun 23, 2025
001a8d0
Sync with danny0838/zipremove@87bcdb50411a355d24c35f31dcbe4273c0568cf8
danny0838 Jun 24, 2025
3a364ce
Sync with danny0838/zipremove@6a78bd15de87afde510f8a1b6364365c6e17f252
danny0838 Jun 25, 2025
0832528
Sync with danny0838/zipremove@092f98b4d7b3a0cd335fe4ba64e7090ebb3dc6da
danny0838 Jun 27, 2025
f20ec5d
Revise doc for `repack`
danny0838 Jun 28, 2025
8e69c09
Revise doc for `remove`
danny0838 Jun 28, 2025
725b1a3
Update `data_offset`
danny0838 Jun 29, 2025
9e82bb7
Revise doc for `repack`
danny0838 Jul 1, 2025
93db94a
Revise doc for `repack`
danny0838 Jul 2, 2025
72673e0
Sync with danny0838/zipremove@8bedf7c9b891acadc3393d2f1267b78bd9b5a49a
danny0838 Jul 3, 2025
e926a95
Sync with danny0838/zipremove@86a240bf019fe9212b1e72c963306186163fb8b8
danny0838 Jul 22, 2025
ee3b753
Sync with danny0838/zipremove@8ad341bdd28a78033d7111a0532cb4714349aee3
danny0838 Jun 20, 2026
5cbe51b
Merge remote-tracking branch 'origin/main' into gh-51067-2
gpshead Jun 20, 2026
3dde04e
docs: clarify unspecified multiple entry removal, suggest ZipInfo
gpshead Jun 20, 2026
59d7b0d
whatsnew entry
gpshead Jun 20, 2026
9376c2d
gh-51067: Default ZipFile.repack() to strict_descriptor=True
gpshead Jun 20, 2026
cbae620
gh-51067: Document reliable space reclamation in ZipFile.repack()
gpshead Jun 20, 2026
e0fcc69
gh-51067: Bound _validate_local_file_entry to the scanned gap
gpshead Jun 20, 2026
adcd088
gh-51067: Guard _copy_bytes against short reads
gpshead Jun 20, 2026
b6dfda7
gh-51067: Spell out ZipFile.repack() keyword arguments
gpshead Jun 20, 2026
645910a
gh-51067: Move NEWS entry to Library section
gpshead Jun 20, 2026
01b493e
gh-51067: Fix docstring typos in _ZipRepacker
gpshead Jun 20, 2026
d339b6c
gh-51067: Drop unreachable struct.error handler in no-sig DD scan
gpshead Jun 20, 2026
b6937c5
gh-51067: Use io.DEFAULT_BUFFER_SIZE in no-sig DD scan
gpshead Jun 20, 2026
e7125e7
gh-51067: Replace ComparableZipInfo class with a function
gpshead Jun 20, 2026
eb29a30
gh-51067: Add _REPACK_CHUNK_SIZE constant
gpshead Jun 20, 2026
b70a80b
gh-51067: Add tests for entry-size overshoot and _copy_bytes EOF
gpshead Jun 20, 2026
88 changes: 88 additions & 0 deletions Doc/library/zipfile.rst
@@ -550,6 +550,94 @@ ZipFile objects
.. versionadded:: 3.11


.. method:: ZipFile.remove(zinfo_or_arcname)

Removes a member entry from the archive's central directory.
*zinfo_or_arcname* may be the full path of the member or a :class:`ZipInfo`
instance. If multiple members share the same full path and the path is
given as a string, only one of them is removed and which one is unspecified;
it should not be relied upon. Pass the specific :class:`ZipInfo` instance to
remove a particular member.
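
As a minimal sketch of the duplicate-name case, the following builds an archive with two members called ``ham.txt`` and selects one specific :class:`ZipInfo` to pass to :meth:`remove`. The file name and the choice of "oldest" entry are illustrative assumptions, and the call is guarded with ``hasattr`` because :meth:`remove` exists only in Python versions that include this change.

```python
import io
import warnings
import zipfile

# Build an in-memory archive that contains two members with the same name.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("ham.txt", b"first")
    with warnings.catch_warnings():
        # zipfile emits a "Duplicate name" UserWarning for the second write.
        warnings.simplefilter("ignore", UserWarning)
        zf.writestr("ham.txt", b"second")

with zipfile.ZipFile(buf, "a") as zf:
    # infolist() keeps every entry, even when names collide.
    duplicates = [zi for zi in zf.infolist() if zi.filename == "ham.txt"]
    oldest = duplicates[0]
    # Passing the ZipInfo (not the name) makes the choice unambiguous.
    if hasattr(zf, "remove"):  # only available with this change applied
        removed = zf.remove(oldest)
```

Selecting from ``infolist()`` rather than calling ``getinfo()`` matters here, because a name lookup can only ever return one of the colliding entries.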

The archive must be opened with mode ``'w'``, ``'x'`` or ``'a'``.

Returns the removed :class:`ZipInfo` instance.

Calling :meth:`remove` on a closed ZipFile will raise a :exc:`ValueError`.

.. note::
This method only removes the member's entry from the central directory,
making it inaccessible to most tools. The member's local file entry,
including content and metadata, remains in the archive and is still
recoverable using forensic tools. Call :meth:`repack` afterwards to
remove the local file entry and reclaim space; pass the returned
:class:`ZipInfo` to :meth:`repack` to ensure the data is removed
regardless of how the entry was written.
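
To illustrate why a member stays recoverable until :meth:`repack` is called, here is a hedged sketch that reads a local file entry straight from the raw archive bytes, without consulting the central directory. It uses only the existing ``zipfile`` and ``struct`` APIs; the member name and payload are made up for the example, and the entry is stored uncompressed so the content can be sliced out directly.

```python
import io
import struct
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:  # ZIP_STORED is the default
    zf.writestr("secret.txt", b"still here")
    zinfo = zf.getinfo("secret.txt")

raw = buf.getvalue()

# Fixed 30-byte local file header: signature, version, flags, method,
# mod time, mod date, CRC-32, compressed size, uncompressed size,
# file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
fields = LOCAL_HEADER.unpack_from(raw, zinfo.header_offset)
signature, name_len, extra_len = fields[0], fields[9], fields[10]

# The name, extra field and data follow the fixed header directly.
name_start = zinfo.header_offset + LOCAL_HEADER.size
name = raw[name_start:name_start + name_len]
data_start = name_start + name_len + extra_len
data = raw[data_start:data_start + zinfo.compress_size]
```

Nothing in this lookup depends on the central directory, so dropping the central directory record alone leaves both ``name`` and ``data`` in the file.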

.. versionadded:: next


.. method:: ZipFile.repack(removed=None, *, \
strict_descriptor=True[, chunk_size])

Rewrites the archive to remove unreferenced local file entries, shrinking
its file size. The archive must be opened with mode ``'a'``.

If *removed* is provided, it must be a sequence of :class:`ZipInfo` objects
representing the recently removed members, and only their corresponding
local file entries will be removed. Otherwise, the archive is scanned to
locate and remove local file entries that are no longer referenced in the
central directory.

Passing *removed* is the most reliable way to reclaim space: the
corresponding local file entries are located directly from the central
directory and removed regardless of how they were written, whereas the scan
used when *removed* is omitted may leave some entries in place (see
*strict_descriptor* below). To remove members and reclaim their space in a
single step::

with ZipFile('spam.zip', 'a') as myzip:
removed = [myzip.remove(name) for name in ('ham.txt', 'eggs.txt')]
myzip.repack(removed)

When scanning, *strict_descriptor* controls how entries written with an
unsigned *data descriptor* are handled. A data descriptor is an optional
record holding an entry's CRC and sizes, stored just after the entry's data;
it is used when the archive is written to a non-seekable stream, and is
*signed* when it begins with a marker signature or *unsigned* otherwise.
Unsigned descriptors have been deprecated by the `PKZIP Application Note`_
since version 6.3.0 (released in 2006) and are written only by some legacy
tools; signed descriptors—written by Python and other modern tools—are always
detected. When *strict_descriptor* is true (the default), only signed data
descriptors are detected, so an unreferenced entry written with an unsigned
descriptor is not located and its space is not reclaimed by the scan.
Setting ``strict_descriptor=False`` additionally detects unsigned
descriptors, at the cost of a significantly slower scan (around 100 to 1000
times slower in the worst case), which may be exploitable as a
denial-of-service vector on untrusted input. This does not affect entries
without a data descriptor, and is not needed when *removed* is provided.
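
The following sketch shows one way to check whether an entry was written with a data descriptor, and that Python writes the signed form. It assumes a hypothetical ``Unseekable`` wrapper (not part of ``zipfile``) to force streaming output; bit 3 (``0x08``) of ``ZipInfo.flag_bits`` is the general-purpose flag that marks a data descriptor, and ``PK\x07\x08`` is the descriptor signature.

```python
import io
import zipfile


class Unseekable:
    """Hypothetical write-only wrapper that hides tell() and seek()."""

    def __init__(self, fp):
        self._fp = fp

    def write(self, data):
        return self._fp.write(data)

    def flush(self):
        self._fp.flush()


sink = io.BytesIO()
# Writing to a non-seekable stream makes zipfile emit data descriptors.
with zipfile.ZipFile(Unseekable(sink), "w") as zf:
    zf.writestr("ham.txt", b"spam spam spam")

raw = sink.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    zinfo = zf.getinfo("ham.txt")
    # Bit 3 of the general-purpose flags marks a data descriptor.
    uses_descriptor = bool(zinfo.flag_bits & 0x08)

# A signed descriptor starts with the marker signature PK\x07\x08.
has_signature = b"PK\x07\x08" in raw
```

Entries like this one are found by the default scan; only descriptors lacking the signature need ``strict_descriptor=False``.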

*chunk_size* may be specified to control the buffer size when moving
entry data (default is 1 MiB).
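
As a rough illustration of what a chunked move looks like, the sketch below shifts a block of bytes toward the start of a seekable file using a bounded buffer. ``move_bytes`` is a hypothetical helper written for this example; it is not the code used by :meth:`repack`, and it only assumes that data moves to a lower offset, as happens when a gap is closed.

```python
import io


def move_bytes(fp, src, dst, length, chunk_size=1024 * 1024):
    """Move `length` bytes from offset `src` to `dst` (dst <= src).

    Hypothetical helper: at most `chunk_size` bytes are held in memory.
    """
    moved = 0
    while moved < length:
        fp.seek(src + moved)
        chunk = fp.read(min(chunk_size, length - moved))
        if not chunk:
            # Guard against a short read instead of looping forever.
            raise EOFError("unexpected end of file while moving data")
        fp.seek(dst + moved)
        fp.write(chunk)
        moved += len(chunk)


# "XXXX" stands in for a removed entry; "123456" is the data that follows.
buf = io.BytesIO(b"AAAA" + b"XXXX" + b"123456")
move_bytes(buf, src=8, dst=4, length=6, chunk_size=4)
buf.truncate(10)  # drop the now-unused tail
result = buf.getvalue()
```

A smaller chunk size lowers peak memory use at the cost of more read and write calls, which is the trade-off *chunk_size* exposes.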

Calling :meth:`repack` on a closed ZipFile will raise a :exc:`ValueError`.

.. note::
The scanning algorithm is heuristic-based and assumes that the ZIP file
is normally structured—for example, with local file entries stored
consecutively, without overlap or interleaved binary data. Prepended
binary data, such as a self-extractor stub, is recognized and preserved
unless it happens to contain bytes that coincidentally resemble a valid
local file entry in multiple respects—an extremely rare case. Embedded
ZIP payloads are also handled correctly, as long as they follow normal
structure. However, the algorithm does not guarantee correctness or
safety on untrusted or intentionally crafted input. It is generally
recommended to provide the *removed* argument for better reliability and
performance.

.. versionadded:: next


The following data attributes are also available:

.. attribute:: ZipFile.filename
9 changes: 9 additions & 0 deletions Doc/whatsnew/3.16.rst
@@ -172,6 +172,15 @@ xml
instead of failing later, when encountering non-ASCII data.
(Contributed by Serhiy Storchaka in :gh:`62259`.)

zipfile
-------

* Add :meth:`ZipFile.remove() <zipfile.ZipFile.remove>` to remove a member
from an archive's central directory, and
:meth:`ZipFile.repack() <zipfile.ZipFile.repack>` to reclaim the space used
by the local file entries of removed members.
(Contributed by Danny Lin in :gh:`51067`.)

.. Add improved modules above alphabetically, not here at the end.
Optimizations