SEEK_HOLE at EOF
Alan Somers
2024-04-04 18:14:45 UTC
tldr; there are two problems:
1) tmpfs handles SEEK_HOLE differently than other file systems
2) everything else handles SEEK_HOLE at EOF poorly, IMHO

Details:

According to lseek(2), SEEK_HOLE should return the start of the next
hole greater than or equal to the supplied offset. Also, each file
has a zero-sized virtual hole at the very end of the file. So I would
expect that calling SEEK_HOLE at EOF would return the file's size.
However, the man page also says that SEEK_HOLE will return ENXIO when
the offset points to EOF. Those two statements seem contradictory to
me. The first behavior seems more logical. I would expect SEEK_HOLE
to work the same way both at EOF and at any other file offset.

What does the spec say?

There is no POSIX standard for this. It was invented by Solaris, and
Illumos's man page does not clearly say what should happen at EOF.
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file". That would seem to indicate
behavior 1: SEEK_HOLE should return the file's size at EOF. Only
beyond EOF should it return ENXIO.

But what do other implementations do?

Contrary to its man page, Linux behaves mostly like FreeBSD. SEEK_HOLE
returns ENXIO at EOF on most file systems. I tested a number of file
systems on both FreeBSD and Linux. Most of them return ENXIO. The
only two outliers are FreeBSD's tmpfs and Linux's NFS client.

         FreeBSD    Linux
=======  =========  =========
UFS      ENXIO
ZFS      ENXIO
tmpfs    file size  ENXIO
msdosfs  ENXIO      ENXIO
ext2fs   ENXIO      ENXIO
xfs                 ENXIO
tarfs    ENXIO
nfs      ENXIO      file size
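
One cell of the table above could be reproduced with a probe along these lines. This is only a minimal sketch in Python, not the program that was used to produce the table; `probe_seek_hole_at_eof` is a hypothetical helper name, and `os.SEEK_HOLE` is only defined on platforms whose lseek(2) supports it.

```python
import errno
import os
import tempfile


def probe_seek_hole_at_eof(fd):
    """Call lseek(fd, size, SEEK_HOLE) and report what the file system does.

    Returns the resulting offset on success, or the symbolic errno name
    (for example "ENXIO") on failure.
    """
    size = os.fstat(fd).st_size
    try:
        return os.lseek(fd, size, os.SEEK_HOLE)
    except OSError as e:
        return errno.errorcode.get(e.errno, str(e.errno))


if __name__ == "__main__":
    # The temporary file lands in the default temp directory; pass dir=...
    # to TemporaryFile to probe a different file system.
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 4096)
        f.flush()
        print(probe_seek_hole_at_eof(f.fileno()))
```

A result equal to the file size corresponds to "file size" in the table, and the string "ENXIO" corresponds to the other behavior.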

So what should we change? Clearly, it's bad for tmpfs to be
inconsistent. My preference would be for everything to behave like
tmpfs, but it's currently losing the popularity contest. Anybody else
have thoughts?

-Alan


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Rick Macklem
2024-04-04 20:56:31 UTC
Post by Alan Somers
1) tmpfs handles SEEK_HOLE differently than other file systems
2) everything else handles SEEK_HOLE at EOF poorly, IMHO
According to lseek(2), SEEK_HOLE should return the start of the next
hole greater than or equal to the supplied offset. Also, each file
has a zero-sized virtual hole at the very end of the file. So I would
expect that calling SEEK_HOLE at EOF would return the file's size.
However, the man page also says that SEEK_HOLE will return ENXIO when
the offset points to EOF. Those two statements seem contradictory to
me. The first behavior seems more logical. I would expect SEEK_HOLE
to work the same way both at EOF and at any other file offset.
What does the spec say?
There is no POSIX standard for this. It was invented by Solaris, and
Illumos's man page does not clearly say what should happen at EOF.
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file". That would seem to indicate
behavior 1: SEEK_HOLE should return the file's size at EOF. Only
beyond EOF should it return ENXIO.
Well, there is the Austin Group stuff (never ratified by POSIX as I
understand it).

Here's what it says about SEEK_HOLE and offset:
If whence is SEEK_HOLE, the file offset shall be set to the smallest
location of a byte within a hole and not less than offset, except that
if offset falls within the last hole, then the file offset may be set
to the file size instead. It shall be an error if offset is greater
or equal to the size of the file.
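
To make the quoted rule concrete, here is a small in-memory model of it in Python. It is a sketch under stated assumptions, not any kernel's implementation: the file is described by a hypothetical list of data extents, and the model always returns the smallest hole offset (the text also permits returning the file size when the offset is already in the last hole).

```python
import errno


def seek_hole(data_extents, size, offset):
    """Model of SEEK_HOLE as described in the draft text quoted above.

    data_extents: sorted, non-overlapping (start, end) byte ranges that
    contain data; end is exclusive. Everything else below size is a hole.
    """
    if offset >= size:
        # "It shall be an error if offset is greater or equal to the size"
        raise OSError(errno.ENXIO, "offset at or beyond EOF")
    pos = offset
    for start, end in data_extents:
        if pos < start:
            return pos      # pos is already inside a hole
        if pos < end:
            pos = end       # skip to the end of this data extent
    # pos is now in a trailing hole, or equals size (the virtual hole).
    return pos
```

With data in [0, 4096) and [8192, 12288) of a 16384-byte file, `seek_hole` returns 4096 for offset 0 and 12288 for offset 8192, while any offset at or past 16384 raises ENXIO.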

I'd suggest we follow this, since it is the closest to a standard that there is.

rick

Alan Somers
2024-04-04 20:59:25 UTC
Post by Rick Macklem
Post by Alan Somers
1) tmpfs handles SEEK_HOLE differently than other file systems
2) everything else handles SEEK_HOLE at EOF poorly, IMHO
According to lseek(2), SEEK_HOLE should return the start of the next
hole greater than or equal to the supplied offset. Also, each file
has a zero-sized virtual hole at the very end of the file. So I would
expect that calling SEEK_HOLE at EOF would return the file's size.
However, the man page also says that SEEK_HOLE will return ENXIO when
the offset points to EOF. Those two statements seem contradictory to
me. The first behavior seems more logical. I would expect SEEK_HOLE
to work the same way both at EOF and at any other file offset.
What does the spec say?
There is no POSIX standard for this. It was invented by Solaris, and
Illumos's man page does not clearly say what should happen at EOF.
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file". That would seem to indicate
behavior 1: SEEK_HOLE should return the file's size at EOF. Only
beyond EOF should it return ENXIO.
Well, there is the Austin Group stuff (never ratified by POSIX as I
understand it).
If whence is SEEK_HOLE, the file offset shall be set to the smallest
location of a byte within a hole and not less than offset, except that
if offset falls within the last hole, then the file offset may be set
to the file size instead. It shall be an error if offset is greater
or equal to the size of the file.
I'd suggest we follow this, since it is the closest to a standard that there is.
That sounds like behavior 2: return ENXIO at EOF. For reference, do
you have a link to that somewhere?


Rick Macklem
2024-04-04 21:38:43 UTC
Post by Alan Somers
Post by Rick Macklem
Post by Alan Somers
1) tmpfs handles SEEK_HOLE differently than other file systems
2) everything else handles SEEK_HOLE at EOF poorly, IMHO
According to lseek(2), SEEK_HOLE should return the start of the next
hole greater than or equal to the supplied offset. Also, each file
has a zero-sized virtual hole at the very end of the file. So I would
expect that calling SEEK_HOLE at EOF would return the file's size.
However, the man page also says that SEEK_HOLE will return ENXIO when
the offset points to EOF. Those two statements seem contradictory to
me. The first behavior seems more logical. I would expect SEEK_HOLE
to work the same way both at EOF and at any other file offset.
What does the spec say?
There is no POSIX standard for this. It was invented by Solaris, and
Illumos's man page does not clearly say what should happen at EOF.
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file". That would seem to indicate
behavior 1: SEEK_HOLE should return the file's size at EOF. Only
beyond EOF should it return ENXIO.
Well, there is the Austin Group stuff (never ratified by POSIX as I
understand it).
If whence is SEEK_HOLE, the file offset shall be set to the smallest
location of a byte within a hole and not less than offset, except that
if offset falls within the last hole, then the file offset may be set
to the file size instead. It shall be an error if offset is greater
or equal to the size of the file.
I'd suggest we follow this, since it is the closest to a standard that there is.
That sounds like behavior 2: return ENXIO at EOF. For reference, do
you have a link to that somewhere?
0000415: add SEEK_HOLE, SEEK_DATA to lseek - Austin Group Defect
Tracker (austingroupbugs.net)
If this doesn't give you a link (gmail never shows the raw URL for me),
just google "SEEK_HOLE austin group".

rick


Steffen Nurpmeso
2024-04-04 22:16:29 UTC
Rick Macklem wrote in
<CAM5tNy5btZGYz3Ya-***@mail.gmail.com>:
|On Thu, Apr 4, 2024 at 1:59 PM Alan Somers <***@freebsd.org> wrote:
|>
|> On Thu, Apr 4, 2024 at 2:56 PM Rick Macklem <***@gmailcom> \
|> wrote:
|>>
|>> On Thu, Apr 4, 2024 at 11:15 AM Alan Somers <***@freebsd.org> wrote:
...
|>> Here's what it says about SEEK_HOLE and offset:
|>> If whence is SEEK_HOLE, the file offset shall be set to the smallest
|>> location of a byte within a hole and not less than offset, except that
|>> if offset falls within the last hole, then the file offset may be set
|>> to the file size instead. It shall be an error if offset is greater
|>> or equal to the size of the file.
|>>
|>> I'd suggest we follow this, since it is the closest to a standard \
|>> that there is.
|>
|> That sounds like behavior 2: return ENXIO at EOF. For reference, do
|> you have a link to that somewhere?
|0000415: add SEEK_HOLE, SEEK_DATA to lseek - Austin Group Defect
|Tracker (austingroupbugs.net)
|If this doesn't give you a link (gmail never shows the raw url for me)
|just google
|"SEEK_HOLE austin group".

just a few lines further below


46396 [ENXIO] The whence argument is SEEK_HOLE or SEEK_DATA, and offset is greater
46397 than or equal to the file size; or the whence argument is SEEK_DATA and the
46398 offset falls beyond the last byte not within a hole.
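
Read literally, the [ENXIO] clause above amounts to a simple predicate. The Python sketch below is one interpretation of that wording; the function name, the symbolic whence values, and the `last_data_end` parameter are all introduced here for illustration.

```python
SEEK_DATA = "SEEK_DATA"   # symbolic stand-ins, not the real lseek constants
SEEK_HOLE = "SEEK_HOLE"


def draft_enxio(whence, offset, size, last_data_end):
    """Return True if the quoted [ENXIO] clause applies.

    last_data_end is the offset just past the last byte that is not within
    a hole (0 for a file that contains no data at all).
    """
    if whence in (SEEK_HOLE, SEEK_DATA) and offset >= size:
        return True
    if whence == SEEK_DATA and offset >= last_data_end:
        return True
    return False
```

Under this reading, SEEK_HOLE with offset equal to the file size is an error, which matches "behavior 2" from earlier in the thread.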

...
--End of <CAM5tNy5btZGYz3Ya-***@mail.gmail\
.com>

--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)


Warner Losh
2024-04-04 22:44:54 UTC
Post by Rick Macklem
Post by Alan Somers
Post by Rick Macklem
Post by Alan Somers
1) tmpfs handles SEEK_HOLE differently than other file systems
2) everything else handles SEEK_HOLE at EOF poorly, IMHO
According to lseek(2), SEEK_HOLE should return the start of the next
hole greater than or equal to the supplied offset. Also, each file
has a zero-sized virtual hole at the very end of the file. So I would
expect that calling SEEK_HOLE at EOF would return the file's size.
However, the man page also says that SEEK_HOLE will return ENXIO when
the offset points to EOF. Those two statements seem contradictory to
me. The first behavior seems more logical. I would expect SEEK_HOLE
to work the same way both at EOF and at any other file offset.
What does the spec say?
There is no POSIX standard for this. It was invented by Solaris, and
Illumos's man page does not clearly say what should happen at EOF.
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file". That would seem to indicate
behavior 1: SEEK_HOLE should return the file's size at EOF. Only
beyond EOF should it return ENXIO.
Well, there is the Austin Group stuff (never ratified by POSIX as I
understand it).
If whence is SEEK_HOLE, the file offset shall be set to the smallest
location of a byte within a hole and not less than offset, except that
if offset falls within the last hole, then the file offset may be set
to the file size instead. It shall be an error if offset is greater
or equal to the size of the file.
I'd suggest we follow this, since it is the closest to a standard that
there is.
Post by Alan Somers
That sounds like behavior 2: return ENXIO at EOF. For reference, do
you have a link to that somewhere?
0000415: add SEEK_HOLE, SEEK_DATA to lseek - Austin Group Defect
Tracker (austingroupbugs.net)
If this doesn't give you a link (gmail never shows the raw url for me)
just google
"SEEK_HOLE austin group".
You have to join the mailing list to have access. It's easy to do. You can
then download the latest draft (which I think is the ballot draft, so will
be quite close to final, usually just 'typos' and such are corrected before
the published standard). This will be the next POSIX.1 standard, likely this
year.

So it's kinda hard to give an exact link :(.

Warner
Poul-Henning Kamp
2024-04-05 05:43:13 UTC
--------
Post by Alan Somers
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file".
[...]
Contrary to its man page, Linux behaves mostly like FreeBSD. SEEK_HOLE
returns ENXIO at EOF on most file systems.
Just two minor quibbles:

If the file position is EOF, then you /are/ "beyond the end of the file"
because a read(2) would not be able to return any data.

And returning ENXIO is more informative than returning the size of the
file, since it atomically tells you that there are no more holes.

If it returned the size of the file, you would have to make another
syscall (opening a race) to check if what you got was EOF or a hole.
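
The difference can be seen in what a caller has to write. The following Python sketch copes with both behaviors; `next_hole` is a hypothetical helper, and its `lseek` and `fstat` parameters exist only so it can be exercised with stand-ins instead of a real file.

```python
import errno
import os


def next_hole(fd, offset, lseek=os.lseek, fstat=os.fstat):
    """Return the offset of the next real hole at or after offset, or None.

    Handles both behaviors discussed in the thread.
    """
    try:
        pos = lseek(fd, offset, os.SEEK_HOLE)
    except OSError as e:
        if e.errno == errno.ENXIO:
            return None     # one syscall: the kernel reported "no more holes"
        raise
    # A successful return may be a real hole or just EOF. Telling them apart
    # takes a second syscall, and the file can change size in between.
    if pos >= fstat(fd).st_size:
        return None
    return pos
```

When the call at EOF fails with ENXIO, the answer comes from a single syscall; when it returns the file size, the extra fstat(2) is what opens the race window described above.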
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


alan somers
2024-04-05 14:13:18 UTC
Post by Poul-Henning Kamp
--------
Post by Alan Somers
Post by Poul-Henning Kamp
If the file position is EOF, then you /are/ "beyond the end of the file"
because a read(2) would not be able to return any data.
Do you distinguish between "at EOF" and "beyond EOF"? And does it not
trouble you that calling SEEK_HOLE from the beginning of the "virtual
hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
beginning of any real hole will return the current offset?
EOF is where the file ends and there's no "hole" there, because there's
no more file on the other side of that "hole".
When you stand on a cliff, the ocean is not "a hole in the landscape",
it's where the landscape ends.
Except there is a hole at EOF, a virtual hole. The draft spec
specifically says "all seekable files shall have a virtual hole
starting at the current size of the file".
Post by Poul-Henning Kamp
Post by Alan Somers
Post by Poul-Henning Kamp
And returning ENXIO is more informative than returning the size of the
file, since it atomically tells you that there are no more holes.
Ahh, that's a good point. It's the first point I've heard in favor of
this option. Are you aware of any applications that need to know
that?
No, but that should not get in the way of good syscall architecture :-)
It might be useful for archivers which try to be smart about sparse files.
I imagine that most archivers would work like this:
    ofs = 0
    loop {
        let start = lseek(fd, ofs, SEEK_DATA);
        if ENXIO {
            // No more data regions
            break
        }
        // Search from start, not ofs: ofs may itself lie in a hole
        let end = lseek(fd, start, SEEK_HOLE);
        // Thanks to the virtual hole, we should never have ENXIO here
        assert!(!ENXIO)
        copy(fd, start, end - start, ...)
        ofs = end
    }
    truncate(output_file, fd.fsize)

Since archivers really only care about data regions, not holes, I
don't think that they would usually call SEEK_HOLE at EOF.
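
For reference, the loop sketched above could be written as runnable Python roughly as follows. This is an illustrative sketch, not code from any existing archiver: `data_regions` is a hypothetical name, the `lseek` parameter is there only so the loop can be tested with a stand-in, and the SEEK_HOLE search starts from `start` on the assumption that the previous offset may itself lie in a hole.

```python
import errno
import os


def data_regions(fd, lseek=os.lseek):
    """Yield (start, end) for each data region of fd, end exclusive."""
    ofs = 0
    while True:
        try:
            start = lseek(fd, ofs, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                return      # no more data regions
            raise
        # start is inside data, so it is below the file size and SEEK_HOLE
        # should succeed under either EOF behavior (virtual hole at EOF).
        end = lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        ofs = end
```

An archiver would copy each yielded region and finally truncate the output to the source file's size. Note that the loop never issues SEEK_HOLE at EOF, so it does not depend on which of the two behaviors the file system implements.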

Poul-Henning Kamp
2024-04-05 14:18:36 UTC
--------
The draft spec specifically says "all seekable files shall have a virtual hole
starting at the current size of the file".
I have never subscribed to the notion that people standardizing C and UNIX were
particularly competent, so that carries no water with me.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Rick Macklem
2024-04-05 14:26:33 UTC
Post by Poul-Henning Kamp
--------
The draft spec specifically says "all seekable files shall have a virtual hole
starting at the current size of the file".
I have never subscribed to the notion that people standardizing C and UNIX were
particularly competent, so that carries no water with me.
Sure, but choosing to be non-conformant just creates portability problems.

rick

Dag-Erling Smørgrav
2024-04-05 11:25:39 UTC
Post by Warner Losh
Post by Rick Macklem
That sounds like behavior 2: return ENXIO at EOF.  For reference, do
you have a link to that somewhere?
0000415: add SEEK_HOLE, SEEK_DATA to lseek - Austin Group Defect
Tracker (austingroupbugs.net)
If this doesn't give you a link (gmail never shows the raw url for me)
just google "SEEK_HOLE austin group".
You have to join the mailing list to have access.
To get access to the drafts, yes, but the defect tracker is open:

https://austingroupbugs.net/view.php?id=415

DES
--
Dag-Erling Smørgrav - ***@FreeBSD.org


Alan Somers
2024-04-05 13:41:31 UTC
Post by Poul-Henning Kamp
--------
Post by Alan Somers
Linux's man page is clear: "whence is SEEK_DATA or SEEK_HOLE, and
offset is beyond the end of the file".
[...]
Contrary to its man page, Linux behaves mostly like FreeBSD. SEEK_HOLE
returns ENXIO at EOF on most file systems.
If the file position is EOF, then you /are/ "beyond the end of the file"
because a read(2) would not be able to return any data.
Do you distinguish between "at EOF" and "beyond EOF"? And does it not
trouble you that calling SEEK_HOLE from the beginning of the "virtual
hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
beginning of any real hole will return the current offset?
Post by Poul-Henning Kamp
And returning ENXIO is more informative than returning the size of the
file, since it atomically tells you that there are no more holes.
Ahh, that's a good point. It's the first point I've heard in favor of
this option. Are you aware of any applications that need to know
that?

Poul-Henning Kamp
2024-04-05 13:54:01 UTC
--------
Post by Alan Somers
Post by Poul-Henning Kamp
If the file position is EOF, then you /are/ "beyond the end of the file"
because a read(2) would not be able to return any data.
Do you distinguish between "at EOF" and "beyond EOF"? And does it not
trouble you that calling SEEK_HOLE from the beginning of the "virtual
hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
beginning of any real hole will return the current offset?
EOF is where the file ends and there's no "hole" there, because there's
no more file on the other side of that "hole".

When you stand on a cliff, the ocean is not "a hole in the landscape",
it's where the landscape ends.
Post by Alan Somers
Post by Poul-Henning Kamp
And returning ENXIO is more informative than returning the size of the
file, since it atomically tells you that there are no more holes.
Ahh, that's a good point. It's the first point I've heard in favor of
this option. Are you aware of any applications that need to know
that?
No, but that should not get in the way of good syscall architecture :-)

It might be useful for archivers which try to be smart about sparse files.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
***@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.


Rick Macklem
2024-04-05 14:23:20 UTC
Post by alan somers
Post by Poul-Henning Kamp
--------
Post by Alan Somers
Post by Poul-Henning Kamp
If the file position is EOF, then you /are/ "beyond the end of the file"
because a read(2) would not be able to return any data.
Do you distinguish between "at EOF" and "beyond EOF"?
As a bit of an aside, NFSv4.2 does differentiate between "at EOF"
and "beyond EOF" for its Seek operation.
The fun part is that Linux did not implement what is in the RFC and shipped
to many users before the "bug" was noticed (and still does not conform to the
RFC, afaik). As such, there are now two ways to do it: the RFC way or the Linux
way. Selecting between them is what the sysctl vfs.nfsd.linux42server does.
Post by alan somers
Post by Poul-Henning Kamp
Post by Alan Somers
And does it not
trouble you that calling SEEK_HOLE from the beginning of the "virtual
hole at EOF" will return ENXIO, even though calling SEEK_HOLE from the
beginning of any real hole will return the current offset?
EOF is where the file ends and there's no "hole" there, because there's
no more file on the other side of that "hole".
When you stand on a cliff, the ocean is not "a hole in the landscape",
it's where the landscape ends.
Except there is a hole at EOF, a virtual hole. The draft spec
specifically says "all seekable files shall have a virtual hole
starting at the current size of the file".
I think that they used the term "virtual" to indicate this is not a real hole
and I think it was a good idea, since it allows file systems that do not
support holes to support SEEK_DATA.
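
As an illustration of that point, a file system with no notion of holes can still answer both requests by treating the whole file as one data region followed by the virtual hole. The Python sketch below is a model under that assumption, following the draft's ENXIO-at-EOF rule; the function name and the symbolic whence values are made up for this example.

```python
import errno

SEEK_DATA = "SEEK_DATA"   # symbolic stand-ins, not the real lseek constants
SEEK_HOLE = "SEEK_HOLE"


def seek_without_hole_support(size, offset, whence):
    """SEEK_DATA/SEEK_HOLE for a file system that never stores holes.

    The whole file is one data region, followed by the virtual hole at EOF.
    """
    if offset >= size:
        raise OSError(errno.ENXIO, "offset at or beyond EOF")
    if whence == SEEK_DATA:
        return offset       # every byte below size is data
    if whence == SEEK_HOLE:
        return size         # the only hole is the virtual one at EOF
    raise ValueError("unsupported whence")
```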

However, I still believe that conforming to the Austin Group draft is
preferable.

rick