[dpdk-dev] [RFC PATCH 0/2] pmdinfogen: rewrite in Python
Neil Horman
nhorman at tuxdriver.com
Tue Jun 23 13:28:06 CEST 2020
On Mon, Jun 22, 2020 at 10:39:46PM +0300, Dmitry Kozlyuk wrote:
> On Mon, 22 Jun 2020 08:41:17 -0400, Neil Horman wrote:
> > On Mon, Jun 22, 2020 at 03:45:01AM +0300, Dmitry Kozlyuk wrote:
> [snip]
> > > 1. No standard ELF or COFF module for Python
> > > (amount of Python code without libelf on par with C code using it).
> > Thats not really true, pyelftools is pretty widely used, and packaged for
> > most(all) major distributions. Requiring this kicks the can for (2) above down
> > the road a bit, but I would prefer to see that used rather than a custom
> > implementation, just to reduce the maint. burden
>
> I must have underestimated pyelftools spread. I'll look into using it. The
> less new code the better, I agree here. Windows users can get it via PyPI.
>
Ack.
>
> > > 2. struct rte_pci_id must be synchronized with header file
> > > (it's a few lines that never change).
> > >
> > I think you can just use ctypes and the native python C interface to just import
> > the rte_pci_id structure from the native dpdk header to avoid this, no?
>
> Not sure I follow. RFC script uses ctypes to describe struct rte_pci_id
> in Python. If you're suggesting to create a Python module in C just to include
> a header with a single small structure, I'd say it's not worth the trouble.
>
Yeah, apologies, i misread what you were saying here, and didn't bother to check
the code. what you're doing makes sense.
>
> > > 1. Support for >65K section headers seems present in current pmdinfogen.
> > > However, the data it reads is not used after. Is it really needed?
> > >
> > Its used.
> > The support you're referring to is primarily the ability to extract the true
> > number of section headers from the ELF file (in the event that its more than
> > 64k). Without that, on very large binaries, you may not be able to find the
> > symbol table in the ELF file.
>
> I was talking about these fields of struct elf_info:
>
> /* if Nth symbol table entry has .st_shndx = SHN_XINDEX,
> * take shndx from symtab_shndx_start[N] instead
> */
> Elf32_Word *symtab_shndx_start;
> Elf32_Word *symtab_shndx_stop;
>
> They are not used after being filled in parse_elf() and their endianness
> fixed in the end despite the comment.
>
Its been a while since I wrote this, so I need to go back and look closely, but
as you say, these values aren't used after parse_elf completes, but they are(I
think) used to correct the endianess of the section header index table, so that
get_sym_value works properly. You would (again I think), only note a problem if
you were parsing an ELF file for an architecture with an endianess opposite that
of what you are building on (i.e. an x86 system cross compiling for a powerpc
target).
>
> > > 2. How much error-handling is required? This is a build-time tool,
> > > and Python gives nice stacktraces. However, segfaults are possible
> > > in Python version due to pointer handling. IMO, error checking
> > > must be just sufficient to prevent silent segfaults.
> > >
> > I'm not really sure what the question is here. You need to handle errors in the
> > parsing process, we can't just have the tool crashing during the build. How
> > thats handled is an implementation detail I would think.
>
> Consider a snippet from pmdinfogen.c:
>
> /* Check if file offset is correct */
> if (hdr->e_shoff > info->size) {
> fprintf(stderr, "section header offset=%lu in file '%s' "
> "is bigger than filesize=%lu\n",
> (unsigned long)hdr->e_shoff,
> filename, info->size);
> return 0;
> }
>
> It is required in C, because otherwise pmdinfogen would crash without
> diagnostic. With Python, most errors like this result in a crash with a trace
> to the error line and a message from ctypes (or ELF parsing library). I'm
> arguing for leaving only the checks that prevent *silent* crashes (this can
> happen with ctypes) which are hard to diagnose. Motivation is to keep the
> code simple.
>
Hmm, I'd defer to others opinions on that. As a build tool it may be ok to just
crash with a backtrace, but I'm personally not a fan of that. Id rather see a
diagnostic error and have the tool exits with an appropriate exit code, but lets
see what others say.
>
> > > 3. On Unix, pmdinfogen is called for each object file extracted with ar
> > > from an .a library by a shell script. On Windows, other tools
> > > have to be used, shell script will not work. On the other hand, COFF
> > > library format is quite simple. Would it be appropriate for
> > > pmdinfogen to handle it to avoid intermediate script?
> > >
> > I suppose you could, but extracting objects from archives seems outside the
> > scope of what pmdinfogen normally does. What is the problem with using a shell
> > script on windows? I thought WSL had support for executing bash syntax shell
> > scripts there? Why not key off an environment variable to make the relevant
> > scripts do the right thing on your environment?
>
> WSL2 is a Linux VM integrated in Windows, you can only cross-compile from
> there. Native builds can use Python or PowerShell, which is as capable as Unix
> shells, but incompatible with them. To call lib.exe (MSVC analog of ar) is
> probably simpler then parsing COFF, yes. So meson could select one of the
> following for a Windows target (objects are COFF):
>
> Host Toolchain Archive Script Extractor
> ------- --------- ------- --------- ---------
> Linux MinGW .a sh ar
> Windows MinGW .a PowerShell ar
> Windows Clang .lib PowerShell lib
I think if I read you right, what you're suggesting here is that meson setup a
build time marco $ARCHIVETOOL, and set it to ar or lib.exe depending on the
build environment, and just have the script in question use $ARCHIVER? If so,
I'd be good with that. Do we have to worry about the use of divergent command
line options for each tool as well?
Neil
>
> --
> Dmitry Kozlyuk
>
More information about the dev
mailing list